Want to learn more? Take the full course at https://learn.datacamp.com/courses/in... at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
In this video, I'll introduce the histogram. The histogram is a type of visualization that's very useful to explore your data. It can help you to get an idea about the distribution of your variables. To see how it works, imagine 12 values between 0 and 6. I've put them along a number line here. To build a histogram for these values, you can divide the line into equal chunks, called bins. Suppose you go for 3 bins, that each has a width of 2. Next, you count how many data points sit inside each bin. There are 4 data points in the first bin, 6 in the second bin and 2 in the third bin. Finally, you draw a bar for each bin. The height of the bar corresponds to the number of data points that fall in this bin.
The result is a histogram, which gives us a nice overview of how the 12 values are distributed. Most values are in the middle, but there are more values below 2 than there are values above 4.
Of course, also matplotlib is able to build histograms. As before, you should start by importing the pyplot package that's inside matplotlib. Next, you can use the hist() function. Let's open up its documentation. There's a bunch of arguments you can specify, but the first two here are the most important ones. x should be a list of values you want to build a histogram for. You can use the second argument, bins, to tell Python how many bins the data should be divided. Based on this number, hist() will automatically find appropriate boundaries for all bins, and calculate how may values are in each one. If you don't specify the bins argument, it will be 10 by default.
So to generate the histogram that you've seen before, let's start by building a list with the 12 values. Next, you simply call hist() and pass this list as an input, so it's matched to the argument x. I also specified the bins argument to be 3, so that the values are divided into three bins. If you finally call the show function, a nice histogram results.
Histograms are really useful to give a bigger picture. As an example, have a look at this so-called population pyramid. The age distribution is shown, for both males and females, in the European union. Notice that the histograms are flipped 90 degrees; the bins are horizontal now. The bins are largest for the ages 40 to 44, where there are 20 million males and 20 million females. They are the so called baby boomers. These are figures of the year 2010. What do you think will have changed in 2050? Let's have a look. The distribution is flatter, and the baby boom generation has gotten older. With the blink of an eye, you can easily see how demographics will be changing over time. That's the true power of histograms at work here!
Now head over to the exercises to experiment with histograms yourself!
#DataCamp #PythonTutorial #Histogram #IntermediatePython