Data visualization is the graphical representation of information and data. It helps in identifying trends, patterns, and outliers that might go unnoticed in a plain data table. Bar charts, line graphs, histograms, and pie charts are some popular ways to visualize data.
In Python, we use powerful libraries like Pandas and Matplotlib to:
- Manipulate and process data easily.
- Generate high-quality, visually appealing charts.
In the code we are analyzing, we will focus on using Pandas to aggregate sales and tips data and Matplotlib for data visualization with a bar chart.
Data Visualization with matplotlib: Build Bar Chart in 7 easy steps
Understanding the code step by step:
Step 1: Importing Necessary Libraries
import pandas as pd # for dataset manipulation
import matplotlib.pyplot as plt # for data visualization
import numpy as np # for numeric arrays (bar positions)
The code begins by importing three essential libraries:
- Pandas: A data manipulation library used to handle structured data, like reading and aggregating the CSV file.
- Matplotlib: A plotting library used for generating graphs and visualizations.
- NumPy: A numerical computing library. It provides tools such as arange() to generate ranges of values (useful for bar positioning).
Step 2: Reading the Dataset
df = pd.read_csv('tips.csv')
Here, we load the dataset using pd.read_csv(). This function reads the CSV file and stores the data in a DataFrame, which is a table-like data structure in Pandas.
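Before aggregating, it can help to check what was actually loaded. The following is a minimal sketch, not part of the original code: the rows in csv_text are made-up stand-ins for tips.csv, and an in-memory buffer is used only so the snippet runs without the file.

```python
import io
import pandas as pd

# Hypothetical rows standing in for tips.csv (the real file has more rows and columns).
csv_text = """total_bill,tip,day
16.99,1.01,Sun
10.34,1.66,Sun
20.65,3.35,Sat
28.97,3.00,Fri
"""

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.dtypes)   # column types inferred from the data
print(df.shape)    # (number of rows, number of columns)
```

With the real file, replacing io.StringIO(csv_text) with 'tips.csv' gives the same kind of overview.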
Step 3: Aggregating Data using Pandas
ndf = df.groupby('day').agg(
total_sale=('total_bill', 'sum'),
total_tip=('tip', 'sum')
).reset_index()
print(ndf)
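In this step, we group the data by the day column using the groupby() function. Then, we use the agg() function to:
- Sum the total bills (total_bill) for each day.
- Sum the tips (tip) for each day.
Finally, reset_index() turns day from the group index back into a regular column, and print(ndf) shows the aggregated table.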
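To see what this aggregation produces, here is a small sketch on a hand-made DataFrame. The values are hypothetical and chosen so the sums are easy to verify by eye; only the column names match the dataset used in this post.

```python
import pandas as pd

# Hypothetical data: two Sunday bills, one Saturday bill, one Friday bill.
df = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# Named aggregation: new_column=(source_column, aggregation_function)
ndf = df.groupby("day").agg(
    total_sale=("total_bill", "sum"),
    total_tip=("tip", "sum"),
).reset_index()  # move 'day' from the index back into a column

print(ndf)
```

The two Sunday rows collapse into a single row with total_sale 30.0 and total_tip 3.0. Note that groupby() sorts the group keys by default, so the days come out in alphabetical order (Fri, Sat, Sun) rather than calendar order.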
Step 4: Setting up the Bar Chart with Matplotlib
x_pos = np.arange(len(ndf))
plt.title('Sale vs Tips on Days', color='red')
- Creating positions for bars: We use np.arange() to generate an array representing the positions of the bars on the x-axis. This is important for spacing out the bars correctly.
- Adding a title: plt.title() sets the title of the chart. The color='red' argument makes the title appear in red.
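As a quick illustration of the bar positions, the sketch below uses a hypothetical count of four groups (one per day) in place of len(ndf).

```python
import numpy as np

n_groups = 4  # hypothetical: number of rows in the aggregated table

x_pos = np.arange(n_groups)  # evenly spaced integer positions, one per group
print(x_pos)                 # [0 1 2 3]

# Adding a scalar shifts every position by the same amount.
print(x_pos + 0.2)           # [0.2 1.2 2.2 3.2]
```

Because x_pos is a NumPy array, an expression like x_pos + 0.2 shifts all positions at once, which is how a second set of bars can be offset from the first.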
Step 5: Customizing the Chart for Better Insights
plt.bar(x_pos, ndf['total_sale'], width=0.5, label='sale', color='Green')
plt.bar(x_pos + 0.2, ndf['total_tip'], width=0.3, label='tips', color='orange')
- Creating two bar plots: We generate two sets of bars: one for sales and another for tips.
  - The first plt.bar() plots the sales data.
  - The second plt.bar() plots the tips data, shifted slightly to the right (x_pos + 0.2) and drawn narrower so that it stays visible. With these widths the tips bar still partly overlaps the sales bar and is drawn on top of it.
- Customizing colors and widths:
  - The color parameter sets the bar colors.
  - The width parameter controls the width of the bars.
  - The label parameter provides a name for the legend.
These customizations improve the readability of the chart by clearly distinguishing between sales and tips.
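With a width of 0.5 centered on x_pos and a width of 0.3 centered on x_pos + 0.2, the two bars for a day share part of the same horizontal range. If fully separate, side-by-side bars are preferred, one common approach is to give both series the same width and shift them by half a width in opposite directions. The sketch below is an alternative layout, not the original code; the day names and totals are illustrative values, and it uses Matplotlib's Axes interface instead of the plt.* functions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative values only, standing in for ndf['day'], ndf['total_sale'], ndf['total_tip'].
days = ["Fri", "Sat", "Sun", "Thur"]
total_sale = np.array([300.0, 1700.0, 1600.0, 1100.0])
total_tip = np.array([50.0, 260.0, 250.0, 170.0])

x_pos = np.arange(len(days))
width = 0.35  # same width for both series

fig, ax = plt.subplots()
# Shift each series by half a bar width so the pair sits centered on x_pos.
sale_bars = ax.bar(x_pos - width / 2, total_sale, width=width, label="sale", color="green")
tip_bars = ax.bar(x_pos + width / 2, total_tip, width=width, label="tips", color="orange")

ax.set_xticks(x_pos)        # one tick per day, centered between the two bars
ax.set_xticklabels(days)
ax.legend()
```

Calling plt.show() afterwards displays the figure. Each sales bar now ends exactly where the matching tips bar begins, so neither hides the other.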
Step 6: Displaying the Final Chart
plt.xlabel('Days', color='green')
plt.ylabel('Sale', color='green')
plt.xticks(x_pos, ndf['day'])
plt.legend()
plt.grid()
plt.show()
Adding axis labels: plt.xlabel() and plt.ylabel() define the labels for the x and y axes, respectively. These labels help viewers understand what the chart represents.
Setting x-axis ticks: plt.xticks() positions the day names under each day's bars. We use x_pos as the positions and ndf['day'] as the labels.
Adding a legend and grid: plt.legend() displays a legend to explain what each bar color represents. plt.grid() adds grid lines, making it easier to compare values visually.
Displaying the chart: Finally, plt.show() renders the chart and displays it to the user.
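plt.show() opens a window, which is not always what you want, for example when a script runs on a server or the chart is needed for a report. A possible alternative is to write the figure to an image file with savefig(). The sketch below assumes a non-interactive backend ("Agg"), made-up bar heights, and a hypothetical file name in the system's temporary directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [3, 5, 2], color="green")  # made-up bar heights
ax.set_xlabel("Days")
ax.set_ylabel("Sale")
ax.grid(axis="y")  # horizontal grid lines only

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "sale_vs_tips.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # free the figure once it has been written
```

The bbox_inches="tight" argument trims extra whitespace around the chart, and dpi controls the resolution of the saved image.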
Output and Insights
The plot generated by this code compares total sales and total tips for each day in the dataset. Some key takeaways include:
- Identifying Trends: You can quickly see if sales are increasing or decreasing on specific days.
- Comparing Performance: It’s easy to compare performance between the sales and tips on a particular day.
The Code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df=pd.read_csv('tips.csv')
ndf=df.groupby('day').agg(
total_sale=('total_bill','sum'),
total_tip=('tip','sum')
).reset_index()
print(ndf)
x_pos=np.arange(len(ndf))
plt.title('Sale vs Tips on Days',color='red')
plt.bar(x_pos,ndf['total_sale'],width=0.5,label='sale',color='Green')
plt.bar(x_pos+0.2,ndf['total_tip'],width=0.3,label='tips',color='orange')
plt.xlabel('Days',color='green')
plt.ylabel('Sale',color='green')
plt.xticks(x_pos,ndf['day'])
plt.legend()
plt.grid()
plt.show()