Numpy library¶
NumPy (Numerical Python) arrays are very similar to Python lists: we can index and slice them in the same way. However, there are some important differences. All the elements of a NumPy array must have the same data type, NumPy provides additional functions such as mean() and std(), and it supports multi-dimensional arrays.
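As a quick illustration of the last two points, the sketch below builds a small two-dimensional array and shows that mixing integers and floats in one array stores everything as a single float type. The values are made up for illustration.

```python
import numpy as np

# A 2 x 3 array: NumPy arrays can have more than one dimension
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)         # (2, 3)
print(matrix[1, 2])         # row 1, column 2 -> 6
print(matrix.mean(axis=0))  # mean of each column -> [2.5 3.5 4.5]

# All elements share one dtype: the int 1 is stored as the float 1.0
mixed = np.array([1, 2.5])
print(mixed.dtype)          # float64
```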
In [3]:
import numpy as np

# First 20 countries with employment data
countries = np.array([
'Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
'Belize', 'Benin', 'Bhutan', 'Bolivia',
'Bosnia and Herzegovina'
])
# Employment data in 2007 for those 20 countries
employment = np.array([
55.70000076, 51.40000153, 50.5 , 75.69999695,
58.40000153, 40.09999847, 61.5 , 57.09999847,
60.90000153, 66.59999847, 60.40000153, 68.09999847,
66.90000153, 53.40000153, 48.59999847, 56.79999924,
71.59999847, 58.40000153, 70.40000153, 41.20000076
])
In [4]:
print(countries)
In [5]:
print(employment)
Accessing elements¶
In [6]:
countries[1:5]
In [7]:
countries[10]
Element types¶
In [8]:
print(type(countries))
In [9]:
print(countries.dtype)
In [10]:
print(np.array([1,2,3]).dtype)
In [11]:
print(np.array([True,False,True]).dtype)
In [12]:
print(np.array(['a','b','c']).dtype)
In [13]:
print(np.array(["hello","hi"]).dtype)
In [14]:
print(np.array(["hell","hi"]).dtype)
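The last two cells differ only in the length of the longest string. For string data NumPy picks a fixed-width Unicode dtype (`<U` followed by a length) just wide enough for the longest element. One consequence, sketched below, is that assigning a longer string into an existing array silently truncates it.

```python
import numpy as np

words = np.array(["hell", "hi"])
print(words.dtype)                    # <U4: Unicode strings of at most 4 characters
print(words.dtype == np.dtype("U4"))  # True

# Assigning a longer string does not resize the array; it is cut to 4 characters
words[1] = "hello"
print(words[1])                       # hell
```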
numpy functions¶
In [15]:
print(employment.max())
print(employment.min())
print(employment.mean())
print(employment.std())
Highest employment rate country:¶
In [16]:
def max_employment(countries, employment):
    # Linear scan that remembers the largest value seen so far
    max_country = ''
    max_value = 0
    for i in range(len(countries)):
        if employment[i] > max_value:
            max_value = employment[i]
            max_country = countries[i]
    print(max_country, max_value)

max_employment(countries, employment)
In [17]:
print(employment.max())
max_pos = employment.argmax() #position of maximum value
print(countries[max_pos])
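`argmax()` returns only the single largest position. A natural extension, sketched below with `np.argsort` (not used in the original notebook), is to rank all countries and take the top three. The data are the same 20 countries and employment rates as above, redefined so the cell runs on its own.

```python
import numpy as np

countries = np.array([
    'Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
    'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
    'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
    'Belize', 'Benin', 'Bhutan', 'Bolivia',
    'Bosnia and Herzegovina'
])
employment = np.array([
    55.70000076, 51.40000153, 50.5, 75.69999695,
    58.40000153, 40.09999847, 61.5, 57.09999847,
    60.90000153, 66.59999847, 60.40000153, 68.09999847,
    66.90000153, 53.40000153, 48.59999847, 56.79999924,
    71.59999847, 58.40000153, 70.40000153, 41.20000076
])

# argsort gives the indices that would sort the array in ascending order;
# reversing them gives descending order
order = np.argsort(employment)[::-1]
top3 = countries[order[:3]]
print(top3)
```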
+ operator in numpy vs standard python¶
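In numpy, an array behaves like a vector of numbers: adding two arrays of the same length adds them element by element (vector addition). With standard Python lists, + concatenates the two lists instead.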
In [18]:
num1 = np.array([1,2,3])
num2 = np.array([4,5,6])
print(num1+num2)
In [19]:
num1 = [1,2,3]
num2 = [4,5,6]
print(num1+num2)
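The same contrast holds for other operators: multiplying a list by an integer repeats it, while multiplying an array scales each element. To get element-wise addition from plain lists you have to loop explicitly, for example with `zip`, as in this sketch.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]

print(list1 * 2)            # [1, 2, 3, 1, 2, 3]  (repetition)
print(np.array(list1) * 2)  # [2 4 6]             (element-wise)

# Element-wise addition of plain lists needs an explicit loop
summed = [x + y for x, y in zip(list1, list2)]
print(summed)               # [5, 7, 9]
```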
standardizing a numpy array (normalization)¶
$z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
In [20]:
arr = np.array([10,20,30,90,80,50,100,35,80])
print("mean = "+ str(arr.mean()))
print("std deviation = " + str(arr.std()))
arr = (arr - arr.mean())/arr.std()
print(arr)
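A quick sanity check on the result: after standardization the values should have mean 0 and standard deviation 1, up to floating-point rounding. Note that `ndarray.std()` computes the population standard deviation (`ddof=0`) by default, which matches the formula above.

```python
import numpy as np

arr = np.array([10, 20, 30, 90, 80, 50, 100, 35, 80])
z = (arr - arr.mean()) / arr.std()

# The standardized values are centred on 0 with unit spread
print(np.isclose(z.mean(), 0.0))  # True
print(np.isclose(z.std(), 1.0))   # True
```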
Boolean index arrays in numpy¶
In [21]:
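a = [1,2,3,4,5]
b = [True,True,False,False,False]
try:
    print(a[b])
except TypeError as err:  # plain lists do not support boolean indexing
    print("TypeError:", err)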
In [26]:
a = np.array(a)
b = np.array(b)
print(a[b])
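In practice the boolean index array is usually produced by a comparison rather than written by hand. Conditions can be combined element-wise with `&` (and) and `|` (or); each comparison needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

mask = a > 2
print(mask)                  # [False False  True  True  True]
print(a[mask])               # [3 4 5]

# Combine conditions with & / |, parenthesizing each comparison
print(a[(a > 1) & (a < 5)])  # [2 3 4]
```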
+ vs +=¶
In [27]:
a = np.array([1,2,3])
b = a
a = a+ np.array([4,5,6])
print(b)
In [28]:
a = np.array([1,2,3])
b = a
a+=np.array([4,5,6])
print(b)
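The difference is that `a = a + ...` builds a new array and rebinds the name `a` to it, leaving `b` pointing at the old array, while `a += ...` modifies the existing array in place, so `b` (which names the same object) sees the change. Checking object identity with `is` makes this visible; the variable names below are just for illustration.

```python
import numpy as np

a1 = np.array([1, 2, 3])
b1 = a1
a1 = a1 + np.array([4, 5, 6])  # creates a new array; a1 is rebound
print(a1 is b1)                # False
print(b1)                      # [1 2 3]

a2 = np.array([1, 2, 3])
b2 = a2
a2 += np.array([4, 5, 6])      # updates the shared array in place
print(a2 is b2)                # True
print(b2)                      # [5 7 9]
```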
slices: copy (python list) vs view (numpy array)¶
In [29]:
li = [1,2,3,4,5]
sl = li[0:3]
sl[0] = 100
print(li)
In [30]:
li = np.array([1,2,3,4,5,6])
sl = li[0:3]
sl[0] = 100
print(li)
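A list slice is a new list (a copy), whereas a NumPy slice is a view onto the original array's data, so writing to it changes the original. If you need an independent slice, call `.copy()`; `np.shares_memory` can confirm whether two arrays overlap.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

view = arr[0:3]                  # a view: shares memory with arr
independent = arr[0:3].copy()    # an independent copy

independent[0] = 100
print(arr)                              # [1 2 3 4 5 6]  (unchanged)
print(np.shares_memory(arr, view))      # True
print(np.shares_memory(arr, independent))  # False
```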
Pandas demo¶
In [31]:
life_expectancy_values = [74.7, 75. , 83.4, 57.6, 74.6, 75.4, 72.3, 81.5, 80.2,
70.3, 72.1, 76.4, 68.1, 75.2, 69.8, 79.4, 70.8, 62.7,
67.3, 70.6]
gdp_values = [ 1681.61390973, 2155.48523109, 21495.80508273, 562.98768478,
13495.1274663 , 9388.68852258, 1424.19056199, 24765.54890176,
27036.48733192, 1945.63754911, 21721.61840978, 13373.21993972,
483.97086804, 9783.98417323, 2253.46411147, 25034.66692293,
3680.91642923, 366.04496652, 1175.92638695, 1132.21387981]
In [32]:
import pandas as pd

life_expectancy = pd.Series(life_expectancy_values)
gdp = pd.Series(gdp_values)
In [33]:
print(life_expectancy)
In [34]:
print(gdp)
Accessing Series elements using indexing¶
In [35]:
print(gdp[0:5])
pandas functions¶
In [36]:
print(gdp.mean())
print(gdp.max())
print(gdp.min())
In [37]:
print(life_expectancy.mean())
print(life_expectancy.max())
print(life_expectancy.min())
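One detail to watch when moving between the two libraries: pandas `Series.std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `ndarray.std()` defaults to the population standard deviation (`ddof=0`), so the same data can give different results. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(np.array(values).std())         # 2.0   (ddof=0, population)
print(pd.Series(values).std())        # about 2.138 (ddof=1, sample)

# Passing ddof explicitly makes the two agree
print(pd.Series(values).std(ddof=0))  # 2.0
```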
In [40]:
import matplotlib.pyplot as plt

plt.plot(life_expectancy)
In [42]:
plt.plot(gdp)
In [43]:
life_mean = life_expectancy.mean()
gdp_mean = gdp.mean()
# Boolean Series: True where the value is above/below the mean
gdp_above = gdp > gdp_mean
life_above = life_expectancy > life_mean
gdp_below = gdp < gdp_mean
life_below = life_expectancy < life_mean
In [44]:
# Element-wise AND; Python's `and` does not combine Series element by element
relation_above = gdp_above & life_above
relation_below = gdp_below & life_below
In [45]:
print(relation_above)
In [46]:
print(relation_below)
In [47]:
# Countries where GDP and life expectancy are NOT on the same side of their means
false_count = 0
for i in range(len(relation_above)):
    if not relation_above[i] and not relation_below[i]:
        false_count = false_count + 1
print(false_count)
In [48]:
print(gdp_above)
In [49]:
print(life_above)
In [50]:
# Count countries where both values are above the mean, or both are not
true_count = 0
for i in range(len(life_above)):
    if gdp_above[i] and life_above[i]:
        true_count = true_count + 1
    if not gdp_above[i] and not life_above[i]:
        true_count = true_count + 1
print(true_count)
print(len(life_above) - true_count)
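The loops above can also be written without explicit iteration. Comparing the two boolean Series with `==` gives `True` where GDP and life expectancy are on the same side of their means, and `.sum()` counts the `True` values. The sketch below redefines the data from this section so it runs on its own.

```python
import pandas as pd

life_expectancy = pd.Series([74.7, 75., 83.4, 57.6, 74.6, 75.4, 72.3, 81.5, 80.2,
                             70.3, 72.1, 76.4, 68.1, 75.2, 69.8, 79.4, 70.8, 62.7,
                             67.3, 70.6])
gdp = pd.Series([1681.61390973, 2155.48523109, 21495.80508273, 562.98768478,
                 13495.1274663, 9388.68852258, 1424.19056199, 24765.54890176,
                 27036.48733192, 1945.63754911, 21721.61840978, 13373.21993972,
                 483.97086804, 9783.98417323, 2253.46411147, 25034.66692293,
                 3680.91642923, 366.04496652, 1175.92638695, 1132.21387981])

gdp_above = gdp > gdp.mean()
life_above = life_expectancy > life_expectancy.mean()

# True where both are above their mean, or both are not above it
same_direction = (gdp_above == life_above).sum()
different_direction = (gdp_above != life_above).sum()
print(same_direction, different_direction)
```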
In [51]:
countries = [
'Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
'Belize', 'Benin', 'Bhutan', 'Bolivia',
'Bosnia and Herzegovina'
]
employment_values = [
55.70000076, 51.40000153, 50.5 , 75.69999695,
58.40000153, 40.09999847, 61.5 , 57.09999847,
60.90000153, 66.59999847, 60.40000153, 68.09999847,
66.90000153, 53.40000153, 48.59999847, 56.79999924,
71.59999847, 58.40000153, 70.40000153, 41.20000076
]
In [52]:
employment = pd.Series(employment_values,index=countries)
print(employment)
loc vs iloc in pandas series¶
In [53]:
print(employment.iloc[0])
print(employment.loc["Algeria"])
In [54]:
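print(employment.idxmax())  # label of the largest value ('Angola')
print(employment.argmax())  # integer position of the largest value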
print(employment.loc["Angola"])
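A related difference shows up when slicing: `iloc` slices by position and excludes the end point, like Python lists, while `loc` slices by label and includes the end label. A sketch using the first five countries from above:

```python
import pandas as pd

employment = pd.Series(
    [55.70000076, 51.40000153, 50.5, 75.69999695, 58.40000153],
    index=['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina'],
)

print(employment.iloc[1:3])                # Albania, Algeria (end position excluded)
print(employment.loc['Albania':'Angola'])  # Albania, Algeria, Angola (end label included)
```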
addition when the indexes are the same¶
In [55]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s1+s2)
addition when some indexes are not the same¶
In [56]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'e'])
print(s1+s2)
addition when all the element indexes are different¶
In [57]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['e', 'f', 'g', 'h'])
print(s1+s2)
dropna, fillna¶
In [58]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'e'])
s3 = s1+s2
print(s3)
s3 = s3.dropna()
print(s3)
In [59]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'e'])
s3 = s1+s2
s3 = s3.fillna(0)
print(s3)
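`fillna(0)` replaces the missing sums with 0 after the addition. If you instead want a label that exists in only one Series to keep its own value, the arithmetic method `Series.add` accepts a `fill_value` that stands in for the missing side before adding:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'e'])

# Missing entries are treated as 0 before adding, so 'd' keeps 4 and 'e' keeps 40
total = s1.add(s2, fill_value=0)
print(total)
```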
apply function¶
In [60]:
s = pd.Series([1, 2, 3, 4, 5])

def add_one(x):
    return x + 1

# apply calls add_one on every element
print(s.apply(add_one))
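`apply` calls the Python function once per element, which is flexible but typically slower than vectorized operations on large Series. For simple arithmetic like this, the vectorized expression `s + 1` gives the same result; `apply` also accepts a `lambda` for one-off functions.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

via_apply = s.apply(lambda x: x + 1)
vectorized = s + 1
print(via_apply.equals(vectorized))  # True
```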