Tag Archives: Statistics

Estimation of Displayed Items by User Behavior – An Application of the German Tank Problem in Tech Platforms

2024-08-05

Imagine you are shopping on Amazon and a search returns a list of 50 items. You scroll down, click an item, continue scrolling, and click on a few more. To calculate the click rate (clicked/displayed), an analyst needs to know which items were actually displayed on your screen. Did you see only 15 items, or 20, or all 50? Is there a principled way to estimate the furthest point you scrolled to based on your clicks, and therefore how many items were actually displayed? It turns out this is “The German Tank Problem”.
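To make the idea concrete, here is a minimal sketch of the classic German tank estimator applied to click positions. It assumes that the clicked positions behave like a uniform random sample, without replacement, from the items that were displayed, which real browsing behavior may well violate. The function name, the `list_length` parameter, and the example positions are hypothetical and only for illustration; this is not necessarily the method used in the full post.

```python
def estimate_items_displayed(clicked_positions, list_length=None):
    """Estimate how many items were displayed from 1-based click positions.

    Uses the standard German tank estimator m + m/k - 1, where m is the
    largest clicked position and k is the number of distinct clicks.
    Assumption: clicks are a uniform sample (without replacement) of the
    displayed positions.
    """
    positions = set(clicked_positions)  # repeated clicks on one item count once
    if not positions:
        raise ValueError("at least one clicked position is required")
    if min(positions) < 1:
        raise ValueError("positions must be 1-based")

    k = len(positions)  # number of distinct clicked items
    m = max(positions)  # deepest clicked position
    estimate = m + m / k - 1

    # Optionally cap the estimate at the length of the result list.
    if list_length is not None:
        estimate = min(estimate, list_length)
    return estimate


# Hypothetical example: clicks on the 3rd, 8th, and 12th items.
print(estimate_items_displayed([3, 8, 12], list_length=50))
```

The estimate is always at least the deepest clicked position, and the gap above it shrinks as the number of clicks grows, which matches the intuition that many clicks leave little room for unseen items below the last one.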

Academic

Swear Words in Review: Regiospecificity and Predictability

2015-12-03

Abstract

This report aims to answer the following two questions. 1. Does the use of swear words show regiospecificity that results in heterogeneity in the data? 2. Does the use of swear words in customers’ reviews have an impact on the ratings they give, and can it predict the stars they give a business? The main analyses are ANOVA by metropolitan area and multiple regression on ratings. Results indicate that swear-word usage differs by region and that 25 of 45 swear words have predictive power for the rating a customer gave. All code and files can be obtained from the link at the end.

Academic

Data Science Capstone Quiz

2015-10-13

Introduction

All quiz questions are from the Coursera Data Science Capstone course.
All .json files are provided by Yelp.
The data sources are hidden for privacy reasons.

TLL