Dear all,
You have three datasets:
Download Homework 1 Notebook and Dataset - You need to unzip the hw2 files.
Upload the homework to Ninova on time!
Before starting the homework, run the code below to import the necessary libraries:
import pandas as pd
import numpy as np
%matplotlib inline
Answer each question and follow the instructions carefully:
Import/read the three datasets. You need to replace the Nones in the code below. Use the pd.read_csv() function.
gdp = None #Replace None with your code --- GDP
pop = None #Replace None with your code --- POPULATION
life = None #Replace None with your code --- LIFE EXPECTANCY
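As a rough illustration of how pd.read_csv() behaves, the sketch below reads a tiny CSV from an in-memory string instead of a file. The country names, column names, and values are made up for the example and are not taken from the homework datasets.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the real homework files differ.
csv_text = """Country Name,1995,1996
Atlantis,100.0,110.0
Utopia,200.0,220.0
"""

# pd.read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

# head() returns the first rows (five by default) so you can inspect the layout.
preview = toy.head()
```

With a real file you would pass its path as a string in place of the io.StringIO object.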
Use the head() function to see the structure of the datasets:
# Your code here for GDP
# Your code here for population
# Your code here for Life Expectancy
You will be working with your assigned country. BE CAREFUL! Check Ninova to find out which country is assigned to you.
Select your own country and remove the rest:
gdp = None #Replace None with your code --- GDP
pop = None #Replace None with your code --- POPULATION
life = None #Replace None with your code --- LIFE EXPECTANCY
HINT: You must use boolean conditions to select your country. For instance gdp[gdp['Country Name'] == 'Turkey']
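To see why the boolean condition in the hint works, here is a minimal sketch on a made-up table (the country names and values are hypothetical). The comparison produces a True/False Series, and indexing with it keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical toy table, not the homework data.
toy = pd.DataFrame({
    "Country Name": ["Atlantis", "Utopia", "Erewhon"],
    "1995": [100.0, 200.0, 300.0],
    "1996": [110.0, 220.0, 330.0],
})

# Comparing a column to a value gives one boolean per row.
mask = toy["Country Name"] == "Utopia"

# Indexing with the mask keeps only the rows where it is True.
selected = toy[mask]
```

Note that the selected row keeps its original index label (1 here), which is why the index is renamed later in the homework.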
When we dropped the other countries, we obtained three separate series. Now drop all columns except the years between 1995 and 2016 for each of the three datasets. You have many options. One way is slicing your datasets using iloc.
BE CAREFUL!!! Before replacing the Nones below, try to see how slicing works in an empty cell. For instance, check what gdp.iloc[:,:] looks like in a cell. Your result must look like the image below:
gdp = None #Replace None with your code --- GDP
pop = None #Replace None with your code --- POPULATION
life = None #Replace None with your code --- LIFE EXPECTANCY
After this operation your datasets should look like this:
gdp[gdp['Country Name'] == 'Turkey'].iloc[:,-23:-1]
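The slicing rules can be tried out on a small made-up frame first. The sketch below assumes a hypothetical column layout (two descriptive columns followed by year columns); it shows that iloc slices by position with an exclusive end, that negative positions count from the right, and that loc slices by label with both endpoints included.

```python
import pandas as pd

# Hypothetical one-row frame; the column layout is an assumption, not the homework data.
toy = pd.DataFrame(
    [["Atlantis", "ATL", 1.0, 2.0, 3.0, 4.0, 5.0]],
    columns=["Country Name", "Country Code", "1994", "1995", "1996", "1997", "1998"],
)

# Positional slicing: all rows, columns at positions 3 and 4 (the end is exclusive).
by_position = toy.iloc[:, 3:5]

# Negative positions count from the right; this picks the same two columns.
from_the_end = toy.iloc[:, -4:-2]

# Label-based slicing with loc includes both endpoints.
by_label = toy.loc[:, "1995":"1996"]
```

All three selections return the columns 1995 and 1996. Which style is safer depends on whether the column positions or the column labels are more reliable in your file.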
Change the index names for all three datasets. Example: gdp.index = ['GDP']
# Index name for gdp should be GDP
# Index name for pop should be Population
# Index name for life should be Life Expectation
Use the pd.concat() function to combine the three datasets:
df = None
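A minimal sketch of pd.concat() on two made-up one-row frames follows; the names a, b, and stacked are placeholders for illustration. With the default axis=0, the frames are stacked vertically and columns with the same label are aligned.

```python
import pandas as pd

# Two hypothetical one-row frames that share the same year columns.
a = pd.DataFrame([[1.0, 2.0]], columns=["1995", "1996"], index=["A"])
b = pd.DataFrame([[10.0, 20.0]], columns=["1995", "1996"], index=["B"])

# pd.concat takes a list of frames; the default axis=0 stacks them row-wise.
stacked = pd.concat([a, b])
```

The result has one row per input frame, labeled by the index names that were set beforehand.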
Now transpose the dataset:
#Run this cell but do not make changes. Otherwise you cannot do other questions.
df = df.T
Now the index names have become column names.
#Run this cell but do not make changes. Otherwise you cannot do other questions.
df['GDP']=pd.to_numeric(df['GDP'])
df['Population']=pd.to_numeric(df['Population'])
df['Life Expectation']=pd.to_numeric(df['Life Expectation'])
Calculate the GDP per capita by dividing the GDP by the population. Use the divide() method of pandas:
df['GDP per capita'] = None
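For reference, divide() performs element-wise division between two Series aligned on their index. The sketch below uses hypothetical column names (total, count, per_unit) rather than the homework columns.

```python
import pandas as pd

# Hypothetical toy frame indexed by year.
toy = pd.DataFrame(
    {"total": [100.0, 150.0], "count": [4.0, 5.0]},
    index=["1995", "1996"],
)

# Series.divide(other) is the method form of the / operator.
toy["per_unit"] = toy["total"].divide(toy["count"])
```

Writing toy["total"] / toy["count"] gives the same values.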
Visualize each series separately. For instance: df['GDP'].plot()
# Plot GDP
# Plot GDP per capita
# Plot population
# Plot life expectancy
Comment on the dataset. You can use the plots above to support your comments.
-------------> Comments here
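Check the correlation between Life Expectancy and GDP per capita. You can use the np.corrcoef() function, which returns the correlation coefficient (np.correlate() computes a cross-correlation of two sequences instead).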
#code here
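As a hedged illustration of measuring correlation between two columns, the sketch below uses made-up columns x and y. np.corrcoef() returns a 2x2 matrix of Pearson correlation coefficients, and the off-diagonal entry is the correlation between the two inputs; the pandas method Series.corr() gives the same number directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a nearly linear relationship.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.1, 5.9, 8.2]})

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the correlation of x with y.
matrix = np.corrcoef(toy["x"], toy["y"])
r_numpy = matrix[0, 1]

# pandas computes the same Pearson coefficient with Series.corr().
r_pandas = toy["x"].corr(toy["y"])
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship.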
Save df to an Excel file.
# Your code here
You do not need to upload the Excel file.
You need to have a file.xlsx at the end of the homework. You can see the example output for Turkey.