Estimation of Displayed Items by User Behavior – An Application of the German Tank Problem in Tech Platforms

Imagine you are shopping on Amazon, and a list of 50 items is displayed after a search. You scroll down, click an item, continue scrolling, and click on a few more. How does an analyst know whether an item was actually displayed on the screen, in order to calculate the click rate (clicked/displayed)? How do they know if you saw only 15, or 20, or all 50 items? Is there a scientific way to estimate the furthest point you scrolled based on your clicks, and therefore how many items were actually displayed? It turns out this is “The German Tank Problem”.
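To make the idea concrete, here is a minimal sketch of the classic German tank estimator applied to click positions. It assumes clicks fall uniformly at random over the positions that were displayed, which is a strong simplification and not necessarily the approach taken in the full post. The function name `estimate_displayed` and its parameters are hypothetical, chosen only for illustration.

```python
def estimate_displayed(clicked_positions, list_length=None):
    """Estimate how many items were displayed from 1-based click positions.

    Uses the standard German tank estimator m + m/k - 1, where m is the
    largest clicked position and k is the number of distinct clicks.
    Assumption: clicks are uniform over the displayed positions.
    """
    positions = set(clicked_positions)  # ignore repeat clicks on the same item
    if not positions:
        raise ValueError("need at least one click to estimate")
    k = len(positions)
    m = max(positions)
    estimate = m + m / k - 1
    if list_length is not None:
        # The user cannot have seen more items than the list contains.
        estimate = min(estimate, list_length)
    return estimate


clicks = [3, 7, 12]
displayed = estimate_displayed(clicks, list_length=50)
click_rate = len(set(clicks)) / displayed
print(displayed, click_rate)
```

With clicks at positions 3, 7, and 12, the estimate is 12 + 12/3 - 1 = 15 displayed items, so the click rate would be 3/15 rather than 3/50.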

Cognitive Minimalist

In this post, I want to quickly introduce an idea that I haven’t seen anywhere else. It may be obvious to some people, but even though it is a simple idea, some might still benefit from it.

Most minimalists dedicate themselves to reducing the number of objects they own, or some similar metric such as the amount of money they spend or the size of their apartment: all physical entities. I’m proposing instead the idea of the “cognitive minimalist”, which is to reduce the amount of cognition needed: in other words, mental cost, psychological effort, or cognitive resources.

How To Fit A Machine Learning Model To A Kaggle Dataset In 8 Lines

import pandas as pd
train_transaction_raw = pd.read_csv('data/ieee-fraud-detection.zip Folder/train_transaction.csv')

import TEF
train_transaction = TEF.auto_set_dtypes(train_transaction_raw, set_object=[0])

TEF.dfmeta(train_transaction)
TEF.plot_1var(train_transaction)
TEF.plot_1var_by_cat_y(train_transaction, 'isFraud')

TEF.fit(train_transaction, 'isFraud', verbose=2)

Disclaimer and Caveat

Every ML practitioner knows it is risky to fit a model without understanding the data. The purpose of this article is only to introduce the general usage of TEF, not to explore the data in detail. With this code, we can get only a rough understanding of the dataset.

In the following section I will walk through this code for the IEEE fraud detection dataset. A more detailed exploration, feature engineering, and model selection may be published in the future.
