Geek

How To Fit A Machine Learning Model To A Kaggle Dataset In 8 Lines

2019-09-01

import pandas as pd
train_transaction_raw = pd.read_csv('data/ieee-fraud-detection.zip Folder/train_transaction.csv')

import TEF
train_transaction = TEF.auto_set_dtypes(train_transaction_raw, set_object=[0])

TEF.dfmeta(train_transaction)
TEF.plot_1var(train_transaction)
TEF.plot_1var_by_cat_y(train_transaction, 'isFraud')

TEF.fit(train_transaction, 'isFraud', verbose=2)

Disclaimer and Caveat

Every ML practitioner knows that fitting a model without understanding the data is risky. The purpose of this article is only to introduce the general usage of TEF, not to explore the data in detail. With these few lines of code, we can only get a rough understanding of the dataset.

In the following sections I will walk through this code for the IEEE fraud detection dataset. A more detailed exploration, feature engineering, and model selection may be published in the future.


Academic

Featured Articles Abstract

2017-12-25

This is a list of my featured articles. Some are summarized below. If you want a translated version, please don’t hesitate to contact me.
For your reference, I was 20 years old (a sophomore) in 2014 and 16 in 2010.


Academic

Swear Words in Review: Regiospecificity and Predictability

2015-12-03

Abstract

This report aims to answer the following two questions. 1. Does the use of swear words show any regiospecificity that results in heterogeneity in the data? 2. Does the use of swear words in customers’ reviews have an impact on the ratings they give? Can it predict the stars they give a business? The main analyses are an ANOVA across metropolitan areas and a multiple regression on ratings. Results indicate that the usage of swear words differs by region and that 25 of 45 swear words have predictive power for the rating a customer gave. All code and files can be obtained from the link at the end.
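To make the first question concrete, here is a minimal sketch of a one-way ANOVA F statistic computed by hand in Python. It is not the report's code: the city names and counts are made up for illustration, and the helper `one_way_anova_f` is a hypothetical name. The idea is only to show how between-region variance is compared with within-region variance.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups (regions)
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical toy data: swear-word counts per review, grouped by city.
# These numbers are invented and do not come from the report.
counts_by_city = {
    "city_a": [0, 1, 0, 2, 1],
    "city_b": [2, 3, 1, 4, 2],
    "city_c": [0, 0, 1, 0, 1],
}

f_stat = one_way_anova_f(list(counts_by_city.values()))
print(f"F = {f_stat:.2f}")
```

A large F value suggests that mean usage differs between regions more than chance variation within a region would explain. A real analysis would also need a p-value and a check of the ANOVA assumptions.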

Academic

Data Science Capstone Quiz

2015-10-13

Introduction

All quiz questions are from the Coursera Data Science Capstone course.
All .json files are provided by Yelp.
Data sources are hidden for privacy reasons.


An Academic Geek