QUALITY CHECK your PANDAS dataframe - missing values, inconsistencies, correlation, text analysis.

Published: 15 November 2021
Channel: Discover AI

Pandas df profiling! The QUALITY of your input DATA matters most! Here is a Python lib for an ultra-fast quality check on your PANDAS dataframe. It checks your potentially heterogeneous tabular data for missing values, inconsistencies, correlations and text properties.

Pandas profiling. Pandas dataframe profiling. Profiling of a pandas dataframe. Pandas_profiling. Missing values.

With visual output: Pandas_profiling v3.
https://pandas-profiling.github.io/pa...
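
A minimal sketch of producing the HTML report, assuming the v3-era package name `pandas_profiling` (the project has since been renamed `ydata-profiling`). The toy dataframe, the helper name `build_report` and the output path are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical toy data standing in for your own dataframe.
df = pd.DataFrame(
    {
        "budget": [10.0, 20.0, None, 40.0],
        "staff": [1, 2, 3, 4],
        "title": ["Alpha", "Beta", "Gamma", "Delta"],
    }
)


def build_report(frame: pd.DataFrame, out_path: str = "report.html") -> str:
    """Write an interactive HTML profile of `frame` and return the path."""
    # Imported lazily so the rest of the script runs without the library.
    # Assumes pandas_profiling v3 is installed.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(frame, title="Data quality report")
    profile.to_file(out_path)
    return out_path


# Uncomment once pandas_profiling is installed:
# build_report(df)
```

In JupyterLab the same `ProfileReport` object can also be displayed inline with `profile.to_notebook_iframe()` instead of being written to a file.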

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:
1. Type inference: detect the types of columns in a dataframe.
2. Essentials: type, unique values, missing values.
3. Statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range.
4. Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation.
5. Most frequent values.
6. Histograms.
7. Correlations: highlighting of highly correlated variables, plus Spearman, Pearson and Kendall matrices.
8. Missing values: matrix, count, heatmap and dendrogram of missing values.
9. Duplicate rows: lists the most frequently occurring duplicate rows.
10. Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
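
To get a feel for what the report contains, several of the statistics listed above can be reproduced by hand with plain pandas. This is only a rough sketch on a hypothetical toy dataframe, not the library's own implementation; the column names are invented for illustration:

```python
import unicodedata
from collections import Counter

import pandas as pd

# Hypothetical frame with one missing value and one duplicated row.
df = pd.DataFrame(
    {
        "budget": [10.0, 20.0, None, 40.0, 20.0],
        "staff": [1, 2, 3, 4, 2],
        "title": ["Alpha", "Beta", "Gamma", "Delta", "Beta"],
    }
)

# Essentials per column: inferred type, distinct values, missing values.
essentials = pd.DataFrame(
    {
        "dtype": df.dtypes.astype(str),
        "unique": df.nunique(),
        "missing": df.isna().sum(),
    }
)

# Quantile and descriptive statistics for one numeric column.
budget = df["budget"]
q1, median, q3 = budget.quantile([0.25, 0.5, 0.75])
stats = {
    "min": budget.min(),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": budget.max(),
    "range": budget.max() - budget.min(),
    "iqr": q3 - q1,
    "mean": budget.mean(),
    "std": budget.std(),
    "cv": budget.std() / budget.mean(),      # coefficient of variation
    "mad": (budget - median).abs().median(),  # median absolute deviation
}

# Most frequent values of a categorical column.
top_titles = df["title"].value_counts().head(3)

# Correlation matrices over the numeric columns.
numeric = df.select_dtypes("number")
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

# Every row that occurs more than once.
duplicate_rows = df[df.duplicated(keep=False)]

# Crude text analysis: Unicode general categories (Lu = uppercase letter, ...).
char_categories = Counter(
    unicodedata.category(ch) for ch in "".join(df["title"])
)

print(essentials)
print(stats)
print(duplicate_rows)
```

The profiling library computes these per column automatically and adds the plots (histograms, missing-value matrix, heatmap, dendrogram) on top.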

python lib.
#JupyterLab
#Pandas_profiling
#Pandas_dataframe
Pandas_df.

00:00 Install Pandas_profiling lib
01:05 Profile Report
02:56 35K project descriptions
06:29 Variable Analysis
09:42 Deep Dive