Homework 1

Dear all,

You have three datasets:

  1. gdp.csv: GDP of some countries.
  2. pop.csv: Population of some countries.
  3. life.csv: Life expectancy of some countries.

Download the Homework 1 notebook and datasets. You need to unzip the hw2 files.

Upload the homework to Ninova on time!

Before starting the homework, run the code below to import the necessary libraries:

In [ ]:
import pandas as pd
import numpy as np
%matplotlib inline

Answer each question and follow the instructions carefully:

1. Import datasets

Read the three datasets. Replace each None in the code below with your own code. Use the pd.read_csv() function.

In [ ]:
gdp = None #Replace None with your code --- GDP
pop = None #Replace None with your code --- POPULATION
life = None #Replace None with your code --- LIFE EXPECTANCY

Use the head() method to see the structure of each dataset:

In [ ]:
# Your code here for GDP
In [ ]:
# Your code here for population
In [ ]:
# Your code here for Life Expectancy
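
As a rough illustration of how pd.read_csv() and head() behave, here is a minimal sketch on a made-up table. The country names and numbers are hypothetical, and the real files may have different columns; the sketch reads from an in-memory string only so that it runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical miniature CSV; the real gdp.csv has its own columns and values.
toy_csv = io.StringIO(
    "Country Name,2014,2015,2016\n"
    "Atlantis,100.0,110.0,120.0\n"
    "Lemuria,50.0,55.0,60.0\n"
)

toy = pd.read_csv(toy_csv)  # accepts a file path or a file-like object
preview = toy.head(1)       # first row only; head() defaults to 5 rows
print(preview)
```

With a real file you would pass the file name as a string instead of the in-memory object.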

2. Select your own country

You will be working with your assigned country. BE CAREFUL! You must use the country assigned to you, so check Ninova.

Select your own country and remove the rest:

In [ ]:
gdp = None #Replace None with your code --- GDP
In [ ]:
pop = None #Replace None with your code --- POPULATION
In [ ]:
life = None #Replace None with your code --- LIFE EXPECTANCY

HINT: You must use boolean conditions to select your country. For instance gdp[gdp['Country Name'] == 'Turkey']
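
To see what a boolean condition does before applying it to the real data, here is a small sketch on a made-up table (the country names and values are hypothetical). The comparison produces one True/False value per row, and indexing with that mask keeps only the True rows.

```python
import pandas as pd

# Hypothetical toy table, not the homework data.
toy = pd.DataFrame(
    {
        "Country Name": ["Atlantis", "Lemuria"],
        "2015": [110.0, 55.0],
        "2016": [120.0, 60.0],
    }
)

mask = toy["Country Name"] == "Lemuria"  # boolean Series, one entry per row
selected = toy[mask]                     # keeps only the rows where mask is True
print(mask.tolist())
print(selected)
```

Note that the selected row keeps its original index label, which is why a later step renames the index.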

3. Exclusion

When we dropped the other countries, we obtained three separate series. Now drop all columns except the years between 1995 and 2016 for each of the three datasets. You have many options. One way is to slice your datasets using iloc.

BE CAREFUL!!! Before replacing the Nones below, try out how slicing works in an empty cell. For instance, see what gdp.iloc[:,:] looks like in a cell. Your result must look like the image below:

In [ ]:
gdp = None #Replace None with your code --- GDP
In [ ]:
pop = None #Replace None with your code --- POPULATION
In [ ]:
life = None #Replace None with your code --- LIFE EXPECTANCY

After this operation your datasets should look like this:

In [ ]:
gdp[gdp['Country Name'] == 'Turkey'].iloc[:,-23:-1]
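
The slicing rules can be checked on a small made-up table first. The sketch below is only an illustration with hypothetical column names: iloc slices by position and excludes the stop position, a negative stop counts from the end, and loc slices by label and includes both endpoints.

```python
import pandas as pd

# Hypothetical one-row table with a name column followed by four year columns.
toy = pd.DataFrame(
    [["Atlantis", 1.0, 2.0, 3.0, 4.0]],
    columns=["Country Name", "2013", "2014", "2015", "2016"],
)

by_position = toy.iloc[:, 2:4]        # positions 2 and 3; the stop (4) is excluded
from_the_end = toy.iloc[:, -3:-1]     # same columns, counted from the end
by_label = toy.loc[:, "2014":"2015"]  # label slices include both endpoints

print(by_position.columns.tolist())
```

All three expressions pick the same two columns here, so you can use whichever form is easier to verify against the column list of your own dataset.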

4. Change Index Names

Change the index names for all three datasets. Example: gdp.index = ['GDP']

In [ ]:
# Index name for gdp should be GDP
In [ ]:
# Index name for pop should be Population
In [ ]:
# Index name for life should be Life Expectation
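
As a small illustration of what assigning to .index does, here is a sketch on a made-up one-row table (the label "Some Label" and the values are hypothetical). The list you assign must have exactly one label per row.

```python
import pandas as pd

# Hypothetical one-row table whose row label is a leftover integer.
toy = pd.DataFrame([[1.0, 2.0]], columns=["2015", "2016"], index=[7])

toy.index = ["Some Label"]  # one new label for the single row
print(toy)
```

After the assignment, the row can be looked up by its new label, for example toy.loc["Some Label"].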

5. Combine the datasets

Use the pd.concat() function to combine the three datasets:

In [ ]:
df = None

Now transpose the dataset:

In [ ]:
#Run this cell but do not make changes. Otherwise you cannot do other questions.
df = df.T

The index names have now become the column names.

In [ ]:
#Run this cell but do not make changes. Otherwise you cannot do other questions.
df['GDP']=pd.to_numeric(df['GDP'])
df['Population']=pd.to_numeric(df['Population'])
df['Life Expectation']=pd.to_numeric(df['Life Expectation'])
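
The three operations in this step can be followed on a made-up example. The sketch below uses two hypothetical one-row tables labelled "A" and "B", with values stored as text to show why pd.to_numeric() can be needed; it is not the homework data.

```python
import pandas as pd

# Two hypothetical one-row tables sharing the same year columns.
a = pd.DataFrame([["10", "20"]], columns=["2015", "2016"], index=["A"])
b = pd.DataFrame([["1", "2"]], columns=["2015", "2016"], index=["B"])

stacked = pd.concat([a, b])  # rows are stacked on top of each other
flipped = stacked.T          # years become the index, "A" and "B" become columns

# The values above are text; convert one column to real numbers.
flipped["A"] = pd.to_numeric(flipped["A"])
print(flipped)
```

pd.concat() takes a list of tables and, by default, stacks them row by row, which is why each table needs its own index label before combining.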

6. GDP per capita

Calculate the GDP per capita by dividing GDP by population. Use the divide() method of pandas:

In [ ]:
df['GDP per capita'] = None
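
For reference, divide() works element by element and aligns rows by index label. A minimal sketch with hypothetical column names and values:

```python
import pandas as pd

# Hypothetical table: a total and a count per year.
toy = pd.DataFrame(
    {"total": [100.0, 90.0], "count": [4.0, 3.0]},
    index=["2015", "2016"],
)

# Series.divide() is the method form of the / operator.
toy["ratio"] = toy["total"].divide(toy["count"])
print(toy)
```

Writing toy["total"] / toy["count"] gives the same result; the method form is the one requested in this step.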

7. Data Visualization

Visualize each variable separately. For instance: df['GDP'].plot()

In [ ]:
# Plot GDP
In [ ]:
# Plot GDP per capita
In [ ]:
# Plot population
In [ ]:
# Plot life expectancy
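
A plot is easier to comment on when it has a title and axis labels. The sketch below shows one possible way to add them on a made-up series; the column name and values are hypothetical, and the backend line is only there so that the example also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Hypothetical yearly values.
toy = pd.DataFrame({"value": [1.0, 2.0, 4.0]}, index=["2014", "2015", "2016"])

ax = toy["value"].plot(title="value over time")  # plot() returns a matplotlib Axes
ax.set_xlabel("Year")
ax.set_ylabel("value")
```

Because plot() returns the Axes object, any further matplotlib styling can be applied to ax in the same cell.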

8. Comments

Comment on the data set. You can base your comments on the plots above.

-------------> Comments here

9. Correlations

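Check the correlation between Life Expectancy and GDP per capita. You can use the np.corrcoef() function, which returns the correlation coefficient; note that np.correlate() computes a cross-correlation instead.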

In [ ]:
#code here
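
When choosing a NumPy function here, keep in mind that np.corrcoef() returns the Pearson correlation coefficient (a value between -1 and 1), while np.correlate() returns an unnormalised cross-correlation, that is, a sum of products. A small sketch on made-up numbers shows the difference:

```python
import numpy as np

# Hypothetical values: y is exactly twice x, so they are perfectly correlated.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
raw = np.correlate(x, y)     # default mode "valid": a single sum of products

print(r)
print(raw)
```

Here r is 1 up to floating-point rounding, whereas raw is simply 1*2 + 2*4 + 3*6 + 4*8 = 60 and says nothing about the strength of the relationship on its own.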

10. Save Excel

Save df to an Excel file.

In [ ]:
# Your code here
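
pandas writes Excel files with DataFrame.to_excel(), which relies on a separately installed Excel engine such as openpyxl. The sketch below is an illustration on a made-up table; the file name toy.xlsx and the temporary folder are assumptions chosen so that the example does not overwrite anything.

```python
import os
import tempfile
import pandas as pd

# Hypothetical table and file name.
toy = pd.DataFrame({"value": [1.0, 2.0]}, index=["2015", "2016"])
path = os.path.join(tempfile.mkdtemp(), "toy.xlsx")

try:
    toy.to_excel(path)  # writing .xlsx needs an Excel engine such as openpyxl
    saved = True
except ImportError:
    saved = False
    print("No Excel engine installed; try: pip install openpyxl")

print(saved)
```

If the call fails with an ImportError, installing openpyxl and rerunning the cell is usually enough.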

You do not need to upload the Excel file.

EXAMPLE

You need to have a file.xlsx at the end of the homework. You can see the example output for Turkey.

file.xlsx