IPL Analysis:
IPL analysis plays a major role in running a team: it informs decisions such as team selection and the batting or bowling order. We will use an IPL ball-by-ball dataset from Kaggle and dig for insights with pandas, whose built-in plotting (backed by matplotlib) draws the charts. So the only other thing you need to analyze the data is the dataset itself.
Step 1: Importing Libraries
We will start by importing all the necessary libraries before analyzing the data.
import pandas as pd
Step 2: Download The Dataset
We will be working with IPL data from 2008 to 2020. Before anything else, download the dataset from here.
Step 3: Importing The Dataset
df = pd.read_csv('IPL Ball-by-Ball 2008-2020.csv')
df.head()
IPL Dataset Information:
df.info()
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   id                193468 non-null  int64
 1   inning            193468 non-null  int64
 2   over              193468 non-null  int64
 3   ball              193468 non-null  int64
 4   batsman           193468 non-null  object
 5   non_striker       193468 non-null  object
 6   bowler            193468 non-null  object
 7   batsman_runs      193468 non-null  int64
 8   extra_runs        193468 non-null  int64
 9   total_runs        193468 non-null  int64
 10  non_boundary      193468 non-null  int64
 11  is_wicket         193468 non-null  int64
 12  dismissal_kind    9495 non-null    object
 13  player_dismissed  9495 non-null    object
 14  fielder           6784 non-null    object
 15  extras_type       10233 non-null   object
 16  batting_team      193468 non-null  object
 17  bowling_team      193277 non-null  object
dtypes: int64(9), object(9)
memory usage: 26.6+ MB
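The non-null counts above show that some columns are empty for most deliveries. As a minimal sketch, using a few made-up rows rather than the real dataset, the number of missing values per column can be read off directly with `isna().sum()`:

```python
import pandas as pd

# Made-up rows (not real IPL data): only the first delivery is a wicket.
toy = pd.DataFrame({
    "bowler":         ["P", "Q", "R"],
    "dismissal_kind": ["caught", None, None],
    "fielder":        ["F", None, None],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing = toy.isna().sum()
print(missing)
```

On the real DataFrame the equivalent call would be `df.isna().sum()`.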
IPL Analysis 1: List Of Seasons
You can list all the seasons in the dataset by calling unique() on the season column, which returns each season only once. Like this:
df.season.unique()
array([2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2019,
2018, 2020, 2021], dtype=int64)
IPL Analysis 2: IPL Matches Season Wise
The number of IPL matches played in each season can be counted using match_id.
df.groupby(['match_id','season']).count().index.droplevel(level=0).value_counts().sort_index().plot(kind='bar')
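The chained expression above is compact but hard to read. As a minimal sketch, assuming the same `match_id` and `season` column names and using made-up rows rather than the real dataset, the same count can be obtained by counting distinct match ids per season:

```python
import pandas as pd

# Made-up ball-by-ball rows (not real IPL data): three matches over two seasons.
toy = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2, 3, 3],
    "season":   [2008, 2008, 2008, 2008, 2008, 2009, 2009],
})

# Every match spans many rows, so count distinct match ids, not rows.
matches_per_season = toy.groupby("season")["match_id"].nunique()
print(matches_per_season)
```

Calling `.plot(kind='bar')` on the result would draw the same kind of chart. The same pattern works for venues by grouping on `venue` instead of `season`.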
IPL Analysis 3: Most IPL Matches Played In a Stadium
According to our analysis, the most IPL matches have been played at M. Chinnaswamy Stadium.
We group by venue and match id to count how many matches were played at each stadium.
%matplotlib inline
df.groupby(['venue','match_id']).count().droplevel(level=1).index.value_counts().sort_values(ascending=False)[:10].plot(kind='bar')
IPL Analysis 4: Number of IPL Matches Played By Each Team
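# Count distinct matches per team; value_counts() alone would count deliveries, not matches.
df.groupby('bowling_team')['match_id'].nunique().sort_values(ascending=False).plot(kind='barh')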
IPL Analysis 5: Most Runs Scored by IPL Teams
We group the rows by batting team and add up all the runs each team has scored.
No wonder Mumbai Indians top the list.
%matplotlib inline
df.groupby(['batting_team'])['run'].sum().sort_values(ascending=False).plot(kind='barh')
IPL Analysis 6: Most IPL Runs by a Batsman
We group the rows by striker and add up the runs. Virat Kohli tops the list.
df.groupby(['striker'])['runs_off_bat'].sum().sort_values(ascending=False)[:10].plot(kind='bar')
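Average Runs by Teams in the Powerplay
# Select the 'run' column before summing so that non-numeric columns are not aggregated.
# Note: [2:] skips the first two entries of the sorted result.
df[df['over']<6].groupby(['match_id','batting_team'])['run'].sum().groupby('batting_team').mean().sort_values(ascending=False)[2:].plot(kind='barh')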
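The powerplay calculation is easier to follow when split into steps. The sketch below uses made-up rows rather than the real dataset and assumes that the `over` column is 0-indexed, so that overs 0 to 5 are the six powerplay overs. If the column is 1-indexed, the filter would need to be `<= 6` instead.

```python
import pandas as pd

# Made-up rows (not real IPL data); 'over' is assumed to be 0-indexed.
toy = pd.DataFrame({
    "match_id":     [1, 1, 1, 2, 2, 2, 1, 1],
    "batting_team": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "over":         [0, 3, 7, 1, 5, 6, 0, 2],
    "run":          [4, 6, 1, 2, 4, 6, 1, 1],
})

# Step 1: keep only deliveries bowled in the powerplay.
powerplay = toy[toy["over"] < 6]

# Step 2: total powerplay runs for each team in each match.
per_match = powerplay.groupby(["match_id", "batting_team"])["run"].sum()

# Step 3: average those per-match totals for each team.
avg_powerplay = (
    per_match.groupby(level="batting_team").mean().sort_values(ascending=False)
)
print(avg_powerplay)
```

Summing per match first matters: taking the mean of `run` directly would give the average per delivery, not the average powerplay score.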
Most IPL Centuries by a Player
runs = df.groupby(['striker','match_id'])['runs_off_bat'].sum()
runs[runs >= 100].droplevel(level=1).groupby('striker').count().sort_values(ascending=False)[:10].plot(kind='barh')
Most IPL Fifties by a Player
runs = df.groupby(['striker','start_date'])['runs_off_bat'].sum()
data= runs[runs >= 50].droplevel(level=1).groupby('striker').count().sort_values(ascending=False)[:10].plot(kind='barh')
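The century and fifty counts follow the same pattern: total each batsman's runs per match, then count the totals that reach a threshold. The sketch below wraps that in a hypothetical helper, `count_scores_between`, which is not part of pandas or of the original code. It uses made-up rows with scaled-down thresholds. Scorecards usually count a fifty as a score from 50 to 99, so the optional upper bound is there to keep centuries out of a fifties tally. That convention is an assumption here, since the expression above counts every score of 50 or more.

```python
import pandas as pd

def count_scores_between(balls, low, high=None):
    """Hypothetical helper: innings per striker with low <= score < high."""
    # Total runs for each striker in each match.
    totals = balls.groupby(["striker", "match_id"])["runs_off_bat"].sum()
    mask = totals >= low
    if high is not None:
        mask = mask & (totals < high)
    # Count the qualifying innings for each striker.
    return totals[mask].groupby(level="striker").count()

# Made-up rows (not real IPL data): X scores 12 and 5, Y scores 12 and 10.
toy = pd.DataFrame({
    "striker":      ["X", "X", "X", "X", "X", "Y", "Y", "Y", "Y"],
    "match_id":     [1, 1, 1, 2, 2, 1, 1, 2, 2],
    "runs_off_bat": [6, 4, 2, 4, 1, 6, 6, 6, 4],
})

at_least_10 = count_scores_between(toy, 10)     # stands in for centuries
five_to_nine = count_scores_between(toy, 5, 10) # stands in for fifties
print(at_least_10)
print(five_to_nine)
```

On the real data the calls would be `count_scores_between(df, 100)` and `count_scores_between(df, 50, 100)`.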
Most Sixes in an IPL Innings
df[df['runs_off_bat'] == 6].groupby(['start_date','striker']).count()['season'].sort_values(ascending=False).droplevel(level=0)[:10].plot(kind='barh')
Most Fours (4s) Hit by a Batsman
data = df[df['runs_off_bat'] == 4]['striker'].value_counts()[:10].plot(kind='bar')
Most Runs in an IPL Season by a Player
df.groupby(['striker','season'])['runs_off_bat'].sum().sort_values(ascending=False)[:10].plot(kind='bar')
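Sorting and slicing works, but `nlargest` expresses "top N" more directly. A minimal sketch on made-up rows, assuming the same `striker`, `season` and `runs_off_bat` column names:

```python
import pandas as pd

# Made-up rows (not real IPL data).
toy = pd.DataFrame({
    "striker":      ["X", "X", "X", "Y", "Y", "Y", "Y", "Y"],
    "season":       [2008, 2008, 2009, 2008, 2008, 2008, 2009, 2009],
    "runs_off_bat": [4, 6, 1, 6, 6, 2, 4, 4],
})

# Total runs for each (striker, season) pair.
season_runs = toy.groupby(["striker", "season"])["runs_off_bat"].sum()

# Keep the two highest season totals, largest first.
top_seasons = season_runs.nlargest(2)
print(top_seasons)
```

The result is indexed by (striker, season), so the same batsman can appear more than once if he had several big seasons.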
No. of Sixes in IPL Seasons
data = df[df['runs_off_bat'] == 6].groupby('season').count()['match_id'].sort_values(ascending=False).plot(kind='barh')
Highest Individual IPL Score
df.groupby(['striker','start_date'])['runs_off_bat'].sum().sort_values(ascending=False)[:10].plot(kind='barh')
Most Runs Conceded by a Bowler in an Innings
df.groupby(['bowler','start_date'])['run'].sum().droplevel(level=1).sort_values(ascending=False)[:10].plot(kind='barh')
Most IPL Wickets by a Bowler
# Dismissals credited to the bowler; a list with isin() matches whole values only.
lst = ['caught', 'bowled', 'lbw', 'stumped', 'caught and bowled', 'hit wicket']
df[df['wicket_type'].isin(lst)]['bowler'].value_counts()[:10].plot(kind='barh')
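The wicket filter is the one place where the exact membership test matters. In Python, `in` applied to a comma-joined string is a substring test, so fragments such as `'c'` or `'bow'` would count as matches, and a missing value would raise an error. The sketch below uses made-up rows and the same `wicket_type` column name to show `isin` with a list, which only matches whole values and treats missing values as non-matching.

```python
import pandas as pd

# Made-up rows (not real IPL data); None marks deliveries without a wicket.
toy = pd.DataFrame({
    "bowler":      ["P", "P", "Q", "Q", "Q"],
    "wicket_type": ["caught", "run out", None, "bowled", "lbw"],
})

# Dismissal kinds credited to the bowler (run outs are not).
bowler_wickets = ["caught", "bowled", "lbw", "stumped",
                  "caught and bowled", "hit wicket"]

# A substring test on a joined string accepts fragments, which is not wanted.
fragment_matches = "c" in "caught,bowled,lbw"

# isin() compares whole values and returns False for missing entries.
credited = toy[toy["wicket_type"].isin(bowler_wickets)]
wickets = credited["bowler"].value_counts()
print(wickets)
```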
Most Dot Balls by a Bowler
data = df[df['run'] == 0].groupby('bowler').count()['match_id'].sort_values(ascending=False)[:10].plot(kind='barh')
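An alternative way to count dot balls is to sum a boolean mask per bowler, which avoids picking an arbitrary column such as `match_id` to count. This is a sketch on made-up rows, assuming `run` holds the total runs from each delivery:

```python
import pandas as pd

# Made-up rows (not real IPL data).
toy = pd.DataFrame({
    "bowler": ["P", "P", "P", "Q", "Q"],
    "run":    [0, 0, 1, 0, 4],
})

# True where no run came from the delivery; summing True values counts them.
is_dot = toy["run"] == 0
dot_balls = is_dot.groupby(toy["bowler"]).sum().sort_values(ascending=False)
print(dot_balls)
```

Unlike filtering first, this keeps bowlers who never bowled a dot ball in the result, with a count of zero.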
Most Wickets by an IPL Team
lst = ['caught', 'bowled', 'lbw', 'stumped', 'caught and bowled', 'hit wicket']
data = df[df['wicket_type'].isin(lst)]['bowling_team'].value_counts()
Extras Received by Each Batting Team
df.groupby(['batting_team'])['extras'].agg('sum').sort_values(ascending=False).plot(kind='barh')
Most No Balls by an IPL Team
# No balls are bowled by the fielding side, so group by bowling_team.
df.groupby(['bowling_team'])['noballs'].agg('sum').sort_values(ascending=False).plot(kind='bar')
As you may have noticed, we have analyzed a lot using just pandas and matplotlib. Analyses like these can already inform some important decisions. Imagine a data analyst doing a post-mortem of the data and digging out insights far more complex than these.
This is what you do as a data analyst in any company: you improve the decision-making process by providing insights like these.