Scales in Data Science¶
- Ratio Scale
  - Units are equally spaced and the scale has a true zero.
  - All arithmetic operations (+, -, *, /) are valid.
  - Example: height and weight.
- Interval Scale
  - Units are equally spaced, but there is no true zero.
  - For example, 0 °C or 0 °F does not mean an absence of temperature, so differences are meaningful but ratios are not.
- Ordinal Scale
  - The order of the values matters, but the spacing between them is not necessarily even.
  - Example: grades such as A+, A, A-.
- Nominal Scale (Categorical)
  - Very common in data science: the values are category labels.
  - The categories have no inherent order.
  - Example: sports teams.
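The difference between an interval and a ratio scale can be checked with a little arithmetic. The sketch below is only an illustration: the two temperatures are made-up values, and the Kelvin conversion is brought in here as an example of a scale with a true zero.

```python
# Two illustrative temperatures on the Celsius (interval) scale
celsius_a, celsius_b = 10.0, 20.0

# A naive ratio on an interval scale: numerically 2.0, but 20 °C is not
# "twice as hot" as 10 °C because 0 °C is not a true zero.
naive_ratio = celsius_b / celsius_a

# Kelvin has a true zero (absolute zero), so ratios are meaningful there.
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
kelvin_ratio = kelvin_b / kelvin_a

# Differences are valid on both scales and agree with each other.
diff_celsius = celsius_b - celsius_a
diff_kelvin = kelvin_b - kelvin_a

print(naive_ratio, round(kelvin_ratio, 3), diff_celsius, round(diff_kelvin, 2))
```

The ratio changes when the zero point moves, while the difference does not, which is the practical meaning of "no true zero".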
In [1]:
import pandas as pd
import numpy as np
In [2]:
student = ["alex","bob","cynthia","daniel","evans"]
tshirt = ["L","XL","S","M","L"]
In [3]:
df = pd.DataFrame(data = tshirt, index=student)
In [4]:
df = df.rename(columns={0:"tshirt"})
In [5]:
df
Out[5]:
Nominal Scale (Setting type as category)¶
In [6]:
df["tshirt"].astype("category")
Out[6]:
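As a rough sketch of what "nominal" means in pandas, the example below rebuilds the same t-shirt data as a standalone Series (the name `sizes` is only for illustration) and checks which operations an unordered categorical allows.

```python
import pandas as pd

# Same illustrative data as the cells above
student = ["alex", "bob", "cynthia", "daniel", "evans"]
tshirt = ["L", "XL", "S", "M", "L"]
sizes = pd.Series(tshirt, index=student, name="tshirt").astype("category")

# Categories are inferred from the data and stored in lexical order,
# but the categorical itself is unordered.
categories = list(sizes.cat.categories)
is_ordered = sizes.cat.ordered

# Equality tests are fine on nominal data.
is_large = sizes == "L"

# Order comparisons are rejected for unordered categoricals.
try:
    sizes < "M"
    comparison_error = None
except TypeError as exc:
    comparison_error = exc

print(categories, is_ordered, comparison_error)
```

On a nominal scale only equality is meaningful, so pandas raises a `TypeError` instead of silently comparing the labels as strings.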
Ordinal scale (ordered = True)¶
In [7]:
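# Recent pandas versions need a CategoricalDtype; astype no longer accepts categories/ordered directly.
df = df["tshirt"].astype(pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True))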
In [8]:
df
Out[8]:
In [9]:
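# Compare by category code (S=0 < M=1 < L=2 < XL=3); differently labeled Series cannot be compared directly.
df.cat.codes["alex"] < df.cat.codes["daniel"]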
Out[9]:
In [10]:
df.loc["alex"]
Out[10]:
In [11]:
df.loc["daniel"]
Out[11]:
In [12]:
df >= "S"
Out[12]:
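Once the categories are ordered, operations that depend on order become available. The following is a minimal sketch on a freshly built Series (the names `size_dtype` and `sizes` are illustrative, not part of the cells above).

```python
import pandas as pd

student = ["alex", "bob", "cynthia", "daniel", "evans"]
tshirt = ["L", "XL", "S", "M", "L"]

# Ordered categorical: S < M < L < XL
size_dtype = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
sizes = pd.Series(tshirt, index=student, name="tshirt").astype(size_dtype)

# Sorting follows the category order, not alphabetical order.
sorted_sizes = sizes.sort_values()

# min/max are defined for ordered categoricals.
smallest, largest = sizes.min(), sizes.max()

# Each value is stored as an integer code: its position in the category list.
codes = sizes.cat.codes

# Comparisons against a category use the declared order.
at_least_large = sizes >= "L"

print(sorted_sizes.tolist(), smallest, largest)
```

The integer codes are what make the comparisons work: "XL" compares greater than "L" because its code is higher, even though "L" would sort first as a plain string.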
cut function¶
In [13]:
s = pd.Series([9, 8, 10, 1, 2, 3, 6, 7, 4, 5])
# Split the values into 3 equal-width bins; the result is an ordered categorical.
pd.cut(s, 3)
Out[13]:
In [14]:
s
Out[14]:
In [15]:
# You can also add labels for the sizes [Small < Medium < Large].
pd.cut(s, 3, labels=['Small', 'Medium', 'Large'])
Out[15]:
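Besides a bin count, `pd.cut` also accepts explicit bin edges. The sketch below assumes the edges 0, 3, 7 and 10, chosen only for illustration; with the default `right=True` they define the intervals (0, 3], (3, 7] and (7, 10].

```python
import pandas as pd

s = pd.Series([9, 8, 10, 1, 2, 3, 6, 7, 4, 5])

# Explicit, hand-picked edges instead of 3 equal-width bins
binned = pd.cut(s, bins=[0, 3, 7, 10], labels=["Small", "Medium", "Large"])

# The result is an ordered categorical, so Small < Medium < Large holds.
is_ordered = binned.cat.ordered

# Count how many values fall into each bin.
counts = binned.value_counts(sort=False)

print(binned.tolist())
print(counts.to_dict())
```

Explicit edges are useful when the bin boundaries carry meaning of their own, whereas an integer bin count simply divides the observed range evenly.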