TLL’s Exploratory Functions
Example
With one line of code
TEF.dfmeta(titanic, check_possible_error=False)
it converts any dataset from an uninformative original form:
| | survived | passenger_id | pclass | age | birth | deck |
---|---|---|---|---|---|---|
0 | False | 1 | 3 | 22.0 | 1890-04-19 | NaN |
1 | True | 2 | 1 | 38.0 | 1874-04-23 | C |
2 | True | 3 | 3 | 26.0 | 1886-04-20 | NaN |
3 | True | 4 | 1 | 35.0 | 1877-04-22 | C |
4 | False | 5 | 3 | 35.0 | 1877-04-22 | NaN |
to an informative metadata dataframe:
col name | idx | dtype | NaNs | unique counts | summary | summary plot | row 59 | row 86 | row 216 |
---|---|---|---|---|---|---|---|---|---|
survived | 0 | bool | 0 0% | 2 0% | False 62% True 38% | | False | False | True |
passenger_id | 1 | object | 0 0% | 891 100% | other 100% | | 60 | 87 | 217 |
pclass | 2 | int64 | 0 0% | 3 0% | 3 55% 1 24% 2 21% | | 3 | 3 | 3 |
age | 3 | float64 | 177 20% | 89 10% | [0.42, 20.125, 28.0, 38.0, 80.0] mean: 29.70 std: 14.53 cv: 0.49 skew: 0.39* log skew: -2.30 | | 11 | 16 | 27 |
birth | 4 | datetime64[ns] | 177 20% | 72 8% | 1832-05-03 1874-04-23 1884-04-20 1892-04-18 1912-04-14 | | 1901-04-17 00:00:00 | 1896-04-17 00:00:00 | 1885-04-20 00:00:00 |
deck | 5 | category | 688 77% | 8 1% | nan 77% C 7% B 5% D 4% E 4% A 2% F 1% G 0% | | nan | nan | nan |
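To make the `NaNs` and `unique counts` columns concrete, here is a minimal pandas-only sketch that produces cells in the same "count percent" shape. It is not TEF's implementation: `basic_meta` is a hypothetical helper, the toy data is made up, and counting NaN as a distinct value is an assumption inferred from the `deck` row, which lists 8 values including `nan`.

```python
import numpy as np
import pandas as pd


def basic_meta(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column dtype, NaN count and unique count."""
    n_rows = max(len(df), 1)  # avoid dividing by zero on an empty frame
    rows = []
    for idx, col in enumerate(df.columns):
        n_nan = int(df[col].isna().sum())
        # Assumption: NaN is counted as one of the unique values.
        n_unique = int(df[col].nunique(dropna=False))
        rows.append({
            "col name": col,
            "idx": idx,
            "dtype": str(df[col].dtype),
            "NaNs": f"{n_nan} {n_nan / n_rows:.0%}",
            "unique counts": f"{n_unique} {n_unique / n_rows:.0%}",
        })
    return pd.DataFrame(rows).set_index("col name")


# Toy data for illustration only, not the real Titanic dataset.
toy = pd.DataFrame({
    "survived": [False, True, True, True, False],
    "pclass": [3, 1, 3, 1, 3],
    "age": [22.0, np.nan, 26.0, 35.0, 35.0],
})
meta = basic_meta(toy)
print(meta)
```

The percentages are shares of the row count, which is why a column such as `passenger_id` with one value per row shows 100% in the table above.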
or, with 2 more lines, to one that also includes descriptions and feature importance:
col name | idx | dtype | description | NaNs | unique counts | summary | summary plot | fitted feature importance | row 255 | row 657 | row 884 |
---|---|---|---|---|---|---|---|---|---|---|---|
survived | 0 | bool | Survived (1) or died (0) | 0 0% | 2 0% | False 62% True 38% | | | True | False | False |
passenger_id | 1 | object | Unique ID of the passenger | 0 0% | 891 100% | other 100% | | | 256 | 658 | 885 |
pclass | 2 | int64 | Passenger’s class (1st, 2nd, or 3rd) | 0 0% | 3 0% | 3 55% 1 24% 2 21% | | 1/4 1.03 40% | 3 | 3 | 3 |
age | 3 | float64 | Passenger’s age | 177 20% | 89 10% | [0.42, 20.125, 28.0, 38.0, 80.0] mean: 29.70 std: 14.53 cv: 0.49 skew: 0.39* log skew: -2.30 | | 4/4 0.06 2% | 29 | 32 | 25 |
birth | 4 | datetime64[ns] | Created by subtracting age from the date the Titanic sank | 177 20% | 72 8% | 1832-05-03 1874-04-23 1884-04-20 1892-04-18 1912-04-14 | | 3/4 0.65 25% | 1883-04-21 00:00:00 | 1880-04-21 00:00:00 | 1887-04-20 00:00:00 |
deck | 5 | category | | 688 77% | 8 1% | nan 77% C 7% B 5% D 4% E 4% A 2% F 1% G 0% | | 2/4 0.83 32% | nan | nan | nan |
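The `fitted feature importance` cells such as `1/4 1.03 40%` appear to read as rank out of the number of fitted features, the raw importance score, and that score's share of the total. This reading is an assumption inferred from the numbers in the table, not something the project states. The sketch below uses a hypothetical helper, `format_importance`, to show that the rounded scores reproduce the ranks and percentages shown.

```python
def format_importance(scores: dict[str, float]) -> dict[str, str]:
    """Hypothetical helper: format scores as 'rank/total score share%'."""
    total = sum(scores.values())
    # Highest score gets rank 1.
    ranked = sorted(scores, key=scores.get, reverse=True)
    cells = {}
    for rank, name in enumerate(ranked, start=1):
        share = scores[name] / total if total else 0.0
        cells[name] = f"{rank}/{len(ranked)} {scores[name]:.2f} {share:.0%}"
    return cells


# Rounded scores copied from the table above.
scores = {"pclass": 1.03, "age": 0.06, "birth": 0.65, "deck": 0.83}
cells = format_importance(scores)
for name, cell in cells.items():
    print(f"{name}: {cell}")
```

Under this reading, `survived` and `passenger_id` have empty cells because the target and the ID column would not be used as features.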
- Quick Start
- Documentation
- Release Notes
- How to fit a machine learning model to a Kaggle dataset in 8 lines
Other Links
Related Packages
- dabl by Andreas Mueller
- FuzzyWuzzy
Dependencies
- numpy, pandas, seaborn, matplotlib
- io, warnings, re
- scipy
- IPython
- sklearn
- xgboost
- base64
Feel free to request features or leave feedback under any related blog post, on GitHub, or by email.