Netflix Data Analysis with the help of Python and Pandas

Netflix Data Analysis:

Netflix has been a data-driven company since its inception. It uses data to extract insights and applies them to improve its services.

Let us do a simple project with the help of pandas to answer some questions about Netflix data.

Download data from here :

Import Libraries

import pandas as pd
import seaborn as sns

Import Dataset

netflix = pd.read_csv(r"C:\Users\abhis\Downloads\netflix.csv")
netflix  

| Sno | show_id | type | title | director | cast | country | date_added | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm… |
| 1 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm… |
| 2 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban… | South Africa | September 24, 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t… |
| 3 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi… | NaN | September 24, 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act… | To protect his family from a powerful drug lor… |

You are reading netflix data analysis.

Simple Analysis

Top 5 Records

netflix.head()

Last 5 Records:

netflix.tail()

Shape of Dataset

netflix.shape
(8810, 11)

Size – Total number of elements (Values) in the dataset

netflix.size
96910
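
The two attributes are related: size is simply rows × columns, which is why 8810 × 11 gives the 96910 shown above. A minimal sketch on a toy DataFrame (the values are made up for illustration and are not from the Netflix file):

```python
import pandas as pd

# Toy frame standing in for the Netflix data (illustrative values only).
df = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "type": ["Movie", "TV Show", "TV Show"],
})

rows, cols = df.shape      # shape is a (rows, columns) tuple
total_cells = df.size      # size counts every cell, including missing ones

print(rows, cols, total_cells)
```

Because size counts cells rather than records, shape is usually the more useful of the two when checking how many rows were loaded.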

Column names:

netflix.columns
Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

dtypes:

netflix.dtypes

show_id        object
type           object
title          object
director       object
cast           object
country        object
date_added     object
rating         object
duration       object
listed_in      object
description    object
dtype: object

Info

netflix.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8810 entries, 0 to 8809
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   show_id      8810 non-null   object
 1   type         8810 non-null   object
 2   title        8810 non-null   object
 3   director     6176 non-null   object
 4   cast         7984 non-null   object
 5   country      7977 non-null   object
 6   date_added   8800 non-null   object
 7   rating       8806 non-null   object
 8   duration     8807 non-null   object
 9   listed_in    8810 non-null   object
 10  description  8810 non-null   object
dtypes: object(11)
memory usage: 757.2+ KB

Are there any duplicate records? If yes, remove them.

To check Duplicate Records

netflix[netflix.duplicated()]

Remove Duplicate Records

netflix.drop_duplicates(inplace = True)
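
It can help to count the duplicates before dropping them, and to know that drop_duplicates() keeps the original index labels. The sketch below uses a toy DataFrame with one repeated row; the variable names (df, deduped and so on) are hypothetical and not part of the original notebook:

```python
import pandas as pd

# Toy data with one exact duplicate row (illustrative only).
df = pd.DataFrame({
    "show_id": ["s1", "s1", "s2"],
    "title": ["Dick Johnson Is Dead", "Dick Johnson Is Dead", "Blood & Water"],
})

n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row
deduped = df.drop_duplicates()              # returns a new frame; df is unchanged

# The surviving rows keep their old labels, so the index now has a gap.
print(list(deduped.index))

# Optional: renumber the rows 0..n-1 after dropping.
deduped_reindexed = deduped.reset_index(drop=True)
```

This is why, after removing duplicates from the Netflix data, the index in later output can skip a number.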

Are there any null values in our dataset? Show with a heatmap.

isnull()

netflix.isnull()

Count null values

netflix.isnull().sum()
show_id           0
type              0
title             0
director       2634
cast            825
country         831
date_added       10
rating            4
duration          3
listed_in         0
description       0
dtype: int64

Heatmap

sns.heatmap(netflix.isnull())
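
Raw counts are easier to compare when expressed as a share of all rows. One possible way to do that, sketched on a toy DataFrame (the data and the name null_share are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy data with missing values in two columns (illustrative only).
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", "India", np.nan, "Japan"],
})

# The mean of a boolean column is the fraction of True values,
# so isnull().mean() gives the fraction of missing cells per column.
null_share = df.isnull().mean().mul(100).round(1)
null_share = null_share.sort_values(ascending=False)

print(null_share)
```

Sorting puts the columns with the most missing data first, which is the same information the heatmap conveys visually.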

For ‘Squid Game’, what are the show_id and director of the show?

To check whether Squid Game is in the title column, any of the following three filters works:

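# Exact match on the title column
netflix.loc[netflix['title']=='Squid Game']
# Same result using isin(), which also accepts several titles at once
netflix[netflix['title'].isin(['Squid Game'])]
# Substring match: returns every title containing 'Squid Game'
netflix[netflix['title'].str.contains('Squid Game')]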
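
The filters above return the whole record. Since the question only asks for the show_id and director, .loc can also take a list of columns as its second argument. A small sketch with placeholder data (the titles and names here are invented, not taken from the dataset):

```python
import pandas as pd

# Placeholder rows; the real values come from the Netflix CSV.
df = pd.DataFrame({
    "show_id": ["s1", "s2"],
    "title": ["Example Show", "Another Title"],
    "director": ["Jane Doe", None],
})

# Row filter first, then the list of columns to keep.
match = df.loc[df["title"] == "Example Show", ["show_id", "director"]]

print(match)
```

Applied to the Netflix DataFrame, the same pattern would use 'Squid Game' as the title to match.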

In which year was the highest number of TV shows and movies released? Show with a bar graph.

To perform an operation on date_added, we must first check its data type and, if it is not datetime, convert it. Note that date_added is the date a title was added to Netflix, which this analysis treats as its release date.

Convert it into DateTime:

netflix['date_added'].dtypes
dtype('O')
netflix['date_added'] = pd.to_datetime(netflix['date_added'])
netflix['date_added'].dtypes
dtype('<M8[ns]')
netflix['date_added'].dt.year.value_counts()
2019.0    2016
2020.0    1879
2018.0    1649
2021.0    1498
2017.0    1188
2016.0     429
2015.0      82
2014.0      24
2011.0      13
2013.0      11
2012.0       3
2009.0       2
2008.0       2
2010.0       1
Name: date_added, dtype: int64

Bar Graph

netflix['date_added'].dt.year.value_counts().plot(kind='bar')
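
If the date column contains stray whitespace or entries that are not dates, a plain pd.to_datetime() call may fail or guess inconsistently. The sketch below shows one defensive variant; it assumes the "Month day, year" format seen in the rows above, and the messy values in raw are invented for illustration:

```python
import pandas as pd

# Invented sample: one clean date, one with a leading space,
# one missing value and one unparseable string.
raw = pd.Series(["September 25, 2021", " August 4, 2017", None, "not a date"])

# Strip whitespace, parse with an explicit format, and turn failures into NaT.
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

# A nullable integer dtype avoids years showing up as 2021.0.
years = parsed.dt.year.astype("Int64")

# Count titles per year in chronological order (missing years are skipped).
counts = years.value_counts().sort_index()

print(counts)
```

Calling counts.plot(kind='bar') on the result would then draw the bars in year order rather than by frequency.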

How many movies and TV shows are in this dataset? Show with a graph.

groupby

netflix.groupby('type')['type'].count()
type
Movie      6131
TV Show    2676
Name: type, dtype: int64

Bar Graph:

# Name the column explicitly with x= so it is not mistaken for the data argument
sns.countplot(x='type', data=netflix)
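
The same counts can also be obtained with value_counts(), which additionally reports each category as a proportion when normalize=True is passed. A minimal sketch on toy data (not the real file):

```python
import pandas as pd

# Toy column with three movies and one TV show (illustrative only).
df = pd.DataFrame({"type": ["Movie", "Movie", "TV Show", "Movie"]})

counts = df["type"].value_counts()                 # absolute counts per type
shares = df["type"].value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```

Unlike groupby(...).count(), value_counts() sorts the result from most to least frequent by default.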

Show all the movies that were released in the year 2021

Create a column ‘release year’:

netflix['release year'] = netflix['date_added'].dt.year
netflix['release year']
0       2021.0
2       2021.0
3       2021.0
4       2021.0
5       2021.0
         ...  
8805    2019.0
8806    2019.0
8807    2019.0
8808    2020.0
8809    2019.0
Name: release year, Length: 8807, dtype: float64
netflix.loc[(netflix['release year']==2021) & (netflix['type']=='Movie'),['title']]

Show only the titles of movies released only in India

netflix.loc[(netflix['type']=='Movie') & (netflix['country']=='India'),['title']]
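
The equality test above matches rows whose country is exactly 'India'. If a country field can hold several comma-separated names (an assumption here, by analogy with the cast column), co-productions are excluded. The sketch below contrasts the two filters on invented data:

```python
import numpy as np
import pandas as pd

# Invented rows, including a hypothetical multi-country entry and a missing one.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "Movie", "Movie", "TV Show"],
    "country": ["India", "India, United States", np.nan, "India"],
})

is_movie = df["type"] == "Movie"

# Exact equality: only rows where India is the sole country listed.
only_india = df.loc[is_movie & (df["country"] == "India"), "title"]

# Substring match: any row that mentions India; na=False treats missing as no match.
any_india = df.loc[is_movie & df["country"].str.contains("India", na=False), "title"]

print(list(only_india), list(any_india))
```

Which filter is right depends on whether "released in India only" should include titles shared with other countries.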

Show the top 10 directors with the most shows and movies.

netflix['director'].value_counts()[:10]
Rajiv Chilaka             19
Raúl Campos, Jan Suter    18
Marcus Raboy              16
Suhas Kadav               16
Jay Karas                 14
Cathy Garcia-Molina       13
Martin Scorsese           12
Youssef Chahine           12
Jay Chapman               12
Steven Spielberg          11
Name: director, dtype: int64
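
Notice that "Raúl Campos, Jan Suter" is counted as a single director because value_counts() works on the whole string. To credit each person separately, one option is to split the string and explode it into one row per name. This is a sketch with made-up names, and it assumes co-directors are always separated by a comma and a space:

```python
import pandas as pd

# Made-up director strings, one of them a co-directed title.
df = pd.DataFrame({
    "director": ["A One, B Two", "A One", None, "B Two", "A One"],
})

individual = (
    df["director"]
    .dropna()            # ignore titles without a listed director
    .str.split(", ")     # "A One, B Two" -> ["A One", "B Two"]
    .explode()           # one row per individual name
)

top = individual.value_counts().head(10)

print(top)
```

With this approach a co-directed title counts once for each of its directors, so the totals can differ from the ranking shown above.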

Show all the records of Indian Comedy Movies

netflix.loc[(netflix['country']=='India') & (netflix['listed_in'].str.contains('Comedies')) & (netflix['type']=='Movie')]

In how many movies is ‘Will Smith’ cast?

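# dropna() removes rows with a missing value in any column, so str.contains() gets no NaN
netflixx = netflix.dropna()
netflixx.loc[netflixx['cast'].str.contains('Will Smith')]
# Count the matching rows
netflixx.loc[netflixx['cast'].str.contains('Will Smith')]['cast'].count()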
10
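
Because dropna() discards a row when any column is missing, a title with Will Smith in the cast but no listed director would be left out of the count. An alternative is to keep all rows and pass na=False to str.contains(). The sketch below compares both on invented rows and also restricts the count to movies, as the question asks:

```python
import numpy as np
import pandas as pd

# Invented rows: the first has a cast entry but no director.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "director": [np.nan, "Some Director", "Other Director", "Third Director"],
    "cast": ["Will Smith, Co Star", "Another Actor", "Will Smith", np.nan],
})

# na=False makes rows with a missing cast count as "no match".
in_cast = df["cast"].str.contains("Will Smith", na=False)

n_titles = int(in_cast.sum())                              # all title types
n_movies = int((in_cast & (df["type"] == "Movie")).sum())  # movies only

# For comparison: the dropna() approach loses the first row.
n_after_dropna = int(df.dropna()["cast"].str.contains("Will Smith").sum())

print(n_titles, n_movies, n_after_dropna)
```

On the real data the two approaches may or may not agree; the point is only that they answer slightly different questions.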

What are the different ratings defined by Netflix?

netflix['rating'].nunique()
17
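
nunique() answers "how many", while unique() lists the labels themselves. A short sketch on a made-up Series shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up ratings with one repeat and one missing value.
ratings = pd.Series(["PG-13", "TV-MA", "TV-MA", np.nan, "TV-14"])

n_ratings = ratings.nunique()                # number of distinct labels, NaN excluded
labels = sorted(ratings.dropna().unique())   # the distinct labels themselves

print(n_ratings, labels)
```

On the Netflix DataFrame, netflix['rating'].unique() would list the 17 labels counted above, plus NaN if any ratings are missing.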

How many movies got the TV-14 rating in India in 2021?

netflix.loc[(netflix['rating']=='TV-14') & (netflix['country']=='India') & (netflix['release year']==2021) & (netflix['type']=='Movie')]

How many TV shows got the ‘R’ rating after the year 2018?

netflix.loc[(netflix['type']=='TV Show') & (netflix['rating']=='R') & (netflix['release year']>2018)]
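
The filter returns the matching records; to answer "how many" directly, the boolean mask can be summed, since True counts as 1. A minimal sketch on invented rows that reuses the 'release year' column name from above:

```python
import pandas as pd

# Invented rows for illustration.
df = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "TV Show"],
    "rating": ["R", "TV-MA", "R", "R"],
    "release year": [2019.0, 2020.0, 2021.0, 2017.0],
})

mask = (
    (df["type"] == "TV Show")
    & (df["rating"] == "R")
    & (df["release year"] > 2018)
)

n_matches = int(mask.sum())   # number of rows satisfying all three conditions

print(n_matches)
```

len(df.loc[mask]) gives the same number if the filtered rows are needed as well.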

Conclusion:

Netflix, a giant streaming platform, has made it big using big data analytics. It is one of the most prominent examples of how advances in technology can help a brand become famous and successful. Netflix is not alone in this: other companies, such as Amazon, also make use of big data analytics.

Learn to analyze data like never before. We have placed many students through our well-structured course and guided learning.

Check here : Best Data Analytics Course with Python – Consoleflare
