Monday, January 30, 2017

Data Visualisation : Conditional Plots



In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import warnings
warnings.filterwarnings('ignore')

Data

In [2]:
titanic = pd.read_csv("titanic.csv")
In [3]:
print(titanic.head())
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  
In [4]:
titanic.shape
Out[4]:
(891, 12)
In [11]:
cols = ['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
titanic = titanic[cols].dropna()

Removed unwanted columns and dropped rows with missing values

In [12]:
titanic.head()
Out[12]:
   Survived  Pclass     Sex   Age  SibSp  Parch     Fare Embarked
0         0       3    male  22.0      1      0   7.2500        S
1         1       1  female  38.0      1      0  71.2833        C
2         1       3  female  26.0      0      0   7.9250        S
3         1       1  female  35.0      1      0  53.1000        S
4         0       3    male  35.0      0      0   8.0500        S
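
Note that dropna() removes every row that has a missing value in any of the kept columns, so the frame gets shorter as well as narrower. A minimal sketch of how one might check this, using a small made-up frame (the values are illustrative, not the Titanic data):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Titanic frame; values are illustrative only.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
    "Cabin": [None, "C85", None, None],
})

cols = ["Survived", "Age", "Embarked"]

# Count missing values per kept column before dropping anything.
missing_per_column = toy[cols].isnull().sum()

# Keep the chosen columns, then drop rows with any missing value.
cleaned = toy[cols].dropna()
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print("rows dropped:", rows_dropped)
```

Running the same two checks on the real frame before and after the dropna() call shows how many passengers the cleaning step discards.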

Visualizing using seaborn

Histogram

In [19]:
sns.distplot(titanic["Fare"])
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x119948c88>
In [20]:
sns.distplot(titanic["Age"])
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x119991320>
In [16]:
plt.boxplot(titanic["Age"])
Out[16]:
{'boxes': [<matplotlib.lines.Line2D at 0x1198a3ef0>],
 'caps': [<matplotlib.lines.Line2D at 0x1198b9a90>,
  <matplotlib.lines.Line2D at 0x119a3ee10>],
 'fliers': [<matplotlib.lines.Line2D at 0x11ce79b00>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D at 0x119a3e860>],
 'whiskers': [<matplotlib.lines.Line2D at 0x1198a30b8>,
  <matplotlib.lines.Line2D at 0x1198b9e10>]}
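
The dictionary returned by plt.boxplot holds the line objects for each part of the figure. To see what those parts represent, here is a rough sketch that computes the same summary by hand, assuming matplotlib's default whisker rule of 1.5 times the interquartile range. The ages are made up for illustration:

```python
import numpy as np

# Illustrative ages, not taken from the data set.
ages = np.array([2.0, 19.0, 22.0, 26.0, 28.0, 35.0, 38.0, 54.0, 80.0])

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# Whiskers extend to the most extreme points within 1.5 * IQR of the box.
low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = ages[(ages >= low_limit) & (ages <= high_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the whiskers is drawn as a flier (outlier).
fliers = ages[(ages < low_limit) | (ages > high_limit)]
print(q1, median, q3, whisker_low, whisker_high, fliers)
```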

Generating a kernel density plot

In [22]:
sns.kdeplot(titanic["Age"], shade= True)
plt.xlabel("Age")
Out[22]:
<matplotlib.text.Text at 0x11d3d5b70>
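
A kernel density estimate places a small smooth bump (here a Gaussian) on every observation and averages them. The following is a minimal sketch of that idea in NumPy, using the five ages from the head() output. The function name and the bandwidth of 5 years are arbitrary assumptions. seaborn selects a bandwidth automatically, so its curve will differ.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

ages = np.array([22.0, 38.0, 26.0, 35.0, 35.0])
grid = np.linspace(-20.0, 100.0, 1201)
density = gaussian_kde_sketch(ages, grid, bandwidth=5.0)

# A density should integrate to roughly 1 over a wide enough grid.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve.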

Conditional distribution using a single condition

In [31]:
g = sns.FacetGrid(titanic, col = "Survived", size = 5)
g.map(sns.kdeplot, "Age")
Out[31]:
<seaborn.axisgrid.FacetGrid at 0x1209bd400>
In [32]:
g = sns.FacetGrid(titanic, col = "Pclass",size = 5)
g.map(sns.kdeplot,"Age")
Out[32]:
<seaborn.axisgrid.FacetGrid at 0x120bef0f0>

Creating a Conditional Plot using two conditions

Tip: use the row parameter
In [33]:
g = sns.FacetGrid(titanic, row = "Pclass",col = "Survived", size = 5)
g.map(sns.kdeplot, "Age")
Out[33]:
<seaborn.axisgrid.FacetGrid at 0x120c02780>
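
With two conditions, each panel shows only the passengers who match one row value and one column value, so some panels may rest on few observations. A quick way to check the panel sizes before reading much into the curves is a cross-tabulation. The frame below is a made-up stand-in:

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3, 3, 2],
    "Survived": [1, 1, 0, 0, 0, 1, 0, 1],
})

# Rows and columns mirror the row= and col= arguments of the grid.
panel_counts = pd.crosstab(toy["Pclass"], toy["Survived"])
print(panel_counts)
```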

Creating a Conditional Plot using three conditions

Tip: use the hue parameter
In [39]:
g = sns.FacetGrid(titanic, row = "Pclass", col = "Survived", hue = "Sex", size = 5)
g.map(sns.kdeplot, "Age", shade = True)
g.add_legend()
Out[39]:
<seaborn.axisgrid.FacetGrid at 0x11d7c7b70>