## Wednesday, January 25, 2017

### Scales in Data Science

• Ratio Scale
  • Units are equally spaced and there is a true zero
  • Mathematical operations such as +, -, *, / are all valid
  • Example: height and weight
• Interval Scale
  • Units are equally spaced, but there is no true zero value
  • Example: temperature in Celsius or Fahrenheit, where zero doesn't indicate an absence of temperature
• Ordinal Scale
  • The order of the values is important, but the values are not evenly spaced
  • Example: grades such as A+, A, A-
• Nominal Scale (Categorical)
  • Very common in data science. The values are categories of data.
  • The order of the categories is not important
  • Example: sports teams
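
The difference between the ratio and interval scales can be sketched with a few made-up numbers (the weights and temperatures below are illustrative assumptions, not data from this notebook). Ratios of weights are meaningful, while the ratio of two Celsius readings changes as soon as the same temperatures are expressed in Kelvin, which has a true zero.

```python
# Ratio scale: weight has a true zero, so ratios are meaningful.
weight_a_kg = 80.0
weight_b_kg = 40.0
weight_ratio = weight_a_kg / weight_b_kg  # a is twice as heavy as b

# Interval scale: Celsius has an arbitrary zero.
# Differences are meaningful, ratios are not.
temp_a_c = 20.0
temp_b_c = 10.0
celsius_diff = temp_a_c - temp_b_c    # a is 10 degrees warmer than b
celsius_ratio = temp_a_c / temp_b_c   # 2.0, but a is not "twice as hot"

# The same two temperatures in Kelvin give a very different ratio.
kelvin_ratio = (temp_a_c + 273.15) / (temp_b_c + 273.15)

print(weight_ratio, celsius_diff, celsius_ratio, round(kelvin_ratio, 3))
```
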
In :
import pandas as pd
import numpy as np

In :
student = ["alex","bob","cynthia","daniel","evans"]
tshirt = ["L","XL","S","M","L"]

In :
df = pd.DataFrame(data = tshirt, index=student)

In :
df = df.rename(columns={0:"tshirt"})

In :
df

Out:
tshirt
alex L
bob XL
cynthia S
daniel M
evans L

### Nominal Scale (Setting type as category)

In :
df["tshirt"].astype("category")

Out:
alex        L
bob        XL
cynthia     S
daniel      M
evans       L
Name: tshirt, dtype: category
Categories (4, object): [L, M, S, XL]
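
As a rough sketch of what the category dtype stores, the snippet below rebuilds the same t-shirt data as a standalone Series (the names `tshirt` and `nominal` are only for this example) and inspects its categories and integer codes. It also shows that an unordered categorical supports equality checks but not order comparisons.

```python
import pandas as pd

tshirt = pd.Series(
    ["L", "XL", "S", "M", "L"],
    index=["alex", "bob", "cynthia", "daniel", "evans"],
    name="tshirt",
)
nominal = tshirt.astype("category")

# Distinct labels, sorted lexically because no order was specified.
categories = list(nominal.cat.categories)
# Integer position of each value within the categories.
codes = nominal.cat.codes.tolist()
is_ordered = nominal.cat.ordered

# Equality is fine on a nominal scale.
equal_to_l = (nominal == "L").tolist()

# Order comparisons are rejected for unordered categoricals.
try:
    nominal < "M"
    order_comparison_allowed = True
except TypeError:
    order_comparison_allowed = False

print(categories, codes, is_ordered, order_comparison_allowed)
```
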

### Ordinal scale (ordered = True)

In :
# Pass a CategoricalDtype to set the category order explicitly.
size_type = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
df = df["tshirt"].astype(size_type)

In :
df

Out:
alex        L
bob        XL
cynthia     S
daniel      M
evans       L
Name: tshirt, dtype: category
Categories (4, object): [S < M < L < XL]
In :
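# Compare alex's size against daniel's size (a scalar); two Series with
# different index labels cannot be compared directly.
df.loc[["alex"]] < df.loc["daniel"]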

Out:
alex    False
Name: tshirt, dtype: bool
In :
df.loc["alex"]

Out:
'L'
In :
df.loc["daniel"]

Out:
'M'
In :
df >="S"

Out:
alex       True
bob        True
cynthia    True
daniel     True
evans      True
Name: tshirt, dtype: bool
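
Once the categories are ordered, the same comparisons can drive filtering, sorting, and min/max. The following is a minimal sketch on a standalone copy of the t-shirt data (the names `sizes` and `size_type` are chosen for this example only).

```python
import pandas as pd

size_type = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
sizes = pd.Series(
    ["L", "XL", "S", "M", "L"],
    index=["alex", "bob", "cynthia", "daniel", "evans"],
    name="tshirt",
).astype(size_type)

# Boolean mask: who wears something larger than M?
larger_than_m = sizes[sizes > "M"].index.tolist()

# Sorting follows the category order S < M < L < XL, not the alphabet.
sorted_sizes = sizes.sort_values(kind="mergesort")
sorted_values = sorted_sizes.tolist()

# min and max are defined for ordered categoricals.
smallest, largest = sizes.min(), sizes.max()

print(larger_than_m, sorted_values, smallest, largest)
```
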

### cut function

In :
s = pd.Series([9,8,10,1,2,3,6,7,4,5])

pd.cut(s, 3)

Out:
0       (7, 10]
1       (7, 10]
2       (7, 10]
3    (0.991, 4]
4    (0.991, 4]
5    (0.991, 4]
6        (4, 7]
7        (4, 7]
8    (0.991, 4]
9        (4, 7]
dtype: category
Categories (3, object): [(0.991, 4] < (4, 7] < (7, 10]]
In :
s

Out:
0     9
1     8
2    10
3     1
4     2
5     3
6     6
7     7
8     4
9     5
dtype: int64
In :
# You can also add labels for the sizes [Small < Medium < Large].
pd.cut(s, 3, labels=['Small', 'Medium', 'Large'])

Out:
0     Large
1     Large
2     Large
3     Small
4     Small
5     Small
6    Medium
7    Medium
8     Small
9    Medium
dtype: category
Categories (3, object): [Small < Medium < Large]
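
When `pd.cut` is given a number of bins, it splits the range of the data into equal-width intervals and extends the range slightly so the minimum value is included, which is why the lowest edge above is 0.991 rather than 1. The sketch below uses `retbins=True` to return the computed edges, and then passes explicit edges instead (the edges `[0, 3, 7, 10]` are an arbitrary choice for illustration).

```python
import pandas as pd

s = pd.Series([9, 8, 10, 1, 2, 3, 6, 7, 4, 5])
labels = ["Small", "Medium", "Large"]

# retbins=True also returns the bin edges that pd.cut computed.
binned, edges = pd.cut(s, 3, labels=labels, retbins=True)
counts = binned.value_counts().to_dict()

# Explicit edges give the intervals (0, 3], (3, 7], (7, 10].
explicit = pd.cut(s, bins=[0, 3, 7, 10], labels=labels)
explicit_values = explicit.tolist()

print(edges)
print(counts)
print(explicit_values)
```

With equal-width bins the value 4 falls in Small, while with the explicit edges above it falls in Medium.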