How I Approach New Datasets (5 THINGS TO LOOK OUT FOR)

Published: 06 November 2024
on channel: Mo Chen

I'M HERE TO HELP YOU LAND YOUR NEXT TECH & DATA JOB
🌐 My website: https://mochen.info/

LET'S COLLABORATE
🤝 Sponsorships & long-term partnerships: https://mo-chen.notion.site/partnerships

ABOUT ME
I'm Mo, and I work as a data analytics manager and content creator. I make videos about how best to land your next job in the tech & data space, whether you're a beginner, a career switcher, or someone looking to take their tech & data career to the next level.

VIDEO SUMMARY
I’ll walk you through the 5 areas you should check whenever you’re faced with a new dataset: Content & Relevance (e.g., where the data comes from and whether it carries any potential biases), Data Quality (e.g., missing values, duplicates), Data Structure & Types, Outliers (e.g., minimum, maximum), and Data Distribution & Summary Statistics (e.g., mean, median, mode, histogram). I’ll give you practical examples using a Google Play Store dataset.
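
For a rough idea of what the data quality, data types, outlier, and summary statistics checks can look like in Python, here is a minimal sketch using only the standard library. It is not the code from the video: the toy rows, the column names, and the helper functions are all made up for illustration.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical toy rows: column names and values are invented for this sketch
# and are NOT taken from the Google Play Store dataset used in the video.
rows = [
    {"app": "A", "rating": 4.5, "installs": 1000},
    {"app": "B", "rating": None, "installs": 500},   # missing rating
    {"app": "C", "rating": 4.0, "installs": 1000},
    {"app": "A", "rating": 4.5, "installs": 1000},   # exact duplicate of row 1
    {"app": "D", "rating": 19.0, "installs": 200},   # suspicious on a 1-5 scale
]

def missing_counts(rows):
    """Data quality: count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for col, val in row.items():
            counts[col] += val is None  # True adds 1, False adds 0
    return dict(counts)

def duplicate_count(rows):
    """Data quality: count rows that exactly repeat another row."""
    seen = Counter(
        tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows
    )
    return sum(n - 1 for n in seen.values())

def column_types(rows):
    """Structure & types: Python types seen per column, ignoring missing values."""
    types = {}
    for row in rows:
        for col, val in row.items():
            if val is not None:
                types.setdefault(col, set()).add(type(val).__name__)
    return types

def summarize(rows, col):
    """Outliers and summary statistics for one numeric column."""
    values = [row[col] for row in rows if row[col] is not None]
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }

print(missing_counts(rows))
print(duplicate_count(rows))
print(column_types(rows))
print(summarize(rows, "rating"))
```

In this toy data, the maximum rating sits far above the median, which is the kind of gap that flags a value worth investigating. A histogram of the column would be the natural next step for looking at the distribution.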

TIMESTAMPS
00:00 Overview: Tooling (Excel, Python), Dataset
01:20 Check 1
01:53 Check 2 (with Practical Example)
05:34 Check 3 (with Practical Example)
09:39 Check 4 (with Practical Example)
12:05 Check 5 (with Practical Example)

PROJECT FILES
Kaggle dataset link ➡ https://www.kaggle.com/datasets/lava1...
GitHub repo ➡ https://github.com/mochen862/new-data...