Homework 1¶

Dear all,

You have three datasets:

gdp.csv: GDP of some countries.
pop.csv: Population of some countries.
life.csv: Life expectancy of some countries.

Download the Homework 1 notebook and dataset. You need to unzip the hw2 files.

Upload the homework to Ninova on time!

Before starting the homework, run the following code to import the necessary libraries:

import pandas as pd
import numpy as np
%matplotlib inline

Answer each question and follow the instructions carefully:

1. Import datasets¶

Read the three datasets. Replace each None in the code below with your own code. Use the pd.read_csv() function.

gdp = None #Replace None with your code --- GDP
pop = None #Replace None with your code --- POPULATION
life = None #Replace None with your code --- LIFE EXPECTANCY

Use the head() method to see the structure of each dataset:

# Your code here for GDP

# Your code here for population

# Your code here for Life Expectancy
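As a rough illustration of what pd.read_csv() and head() do, here is a minimal sketch on a tiny in-memory CSV. The countries and values are made up and only stand in for a file such as gdp.csv; the real files have their own columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file (made-up countries and values).
csv_text = """Country Name,1995,1996,1997
Freedonia,10.0,11.0,12.5
Sylvania,20.0,21.0,19.5
"""

# pd.read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

# head() returns the first rows (5 by default) so the layout can be inspected.
print(toy.head())
```

Note that the year columns are read as text labels such as '1995', not as integers.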

2. Select your own country¶

You will be working with your assigned country. BE CAREFUL! Use only the country assigned to you; check Ninova to find out which one it is.

Select your own country and remove the rest:

gdp = None #Replace None with your code --- GDP

pop = None #Replace None with your code --- POPULATION

life = None #Replace None with your code --- LIFE EXPECTANCY

HINT: You must use a boolean condition to select your country. For instance: gdp[gdp['Country Name'] == 'Turkey']
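The hint can be unpacked into two steps: build a boolean mask, then index the DataFrame with it. The sketch below uses a made-up table (the country names and values are hypothetical), so it only shows the mechanism.

```python
import pandas as pd

# Made-up data with the same 'Country Name' column used in the hint.
toy = pd.DataFrame({
    "Country Name": ["Freedonia", "Sylvania", "Osterlich"],
    "1995": [10.0, 20.0, 30.0],
    "1996": [11.0, 21.0, 31.0],
})

# Step 1: a boolean Series, True only on the rows that match.
mask = toy["Country Name"] == "Sylvania"

# Step 2: indexing with the mask keeps only the True rows.
selected = toy[mask]
print(selected)
```

The result is still a DataFrame (here with a single row), and it keeps the original row label.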

3. Exclusion¶

When we dropped the other countries we obtained three separate single-row datasets. Now drop all columns except the years from 1995 to 2016 for each of the three datasets. You have many options. One way is to slice your datasets using iloc.

BE CAREFUL!!! Before replacing the Nones below, try out slicing in an empty cell to see how it works. For instance, check what gdp.iloc[:,:] returns. Your result must look like the example at the end of this section:

gdp = None #Replace None with your code --- GDP

pop = None #Replace None with your code --- POPULATION

life = None #Replace None with your code --- LIFE EXPECTANCY

After this operation your datasets should look like this:

gdp[gdp['Country Name'] == 'Turkey'].iloc[:,-23:-1]
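To see how column slicing behaves before applying it to the real data, here is a small sketch on a made-up single-row table. The column layout is an assumption for illustration; the positions in the real files will differ.

```python
import pandas as pd

# Made-up single-row table: two text columns followed by year columns.
row = pd.DataFrame(
    [["Freedonia", "FRE", 1.0, 2.0, 3.0, 4.0, 5.0]],
    columns=["Country Name", "Country Code", "1994", "1995", "1996", "1997", "1998"],
)

# iloc slices by position; the end position is excluded.
by_position = row.iloc[:, 3:5]

# Negative positions count from the right: -4:-1 skips the last column.
from_end = row.iloc[:, -4:-1]

# loc slices by label; here the end label is included.
by_label = row.loc[:, "1995":"1997"]

print(list(by_position.columns))
print(list(from_end.columns))
print(list(by_label.columns))
```

Slicing by label with loc can be easier to check by eye than counting positions, since the year labels appear directly in the code.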

4. Change Index Names¶

Change the index names for all three datasets. Example: gdp.index = ['GDP']

# Index name for gdp should be GDP

# Index name for pop should be Population

# Index name for life should be Life Expectation

5. Combine the datasets¶

Use the pd.concat() function to combine the three datasets:

df = None

Now transpose the dataset:

# Run this cell without making changes; the remaining questions depend on it.
df = df.T

Now the index names have become the column names.

# Run this cell without making changes; the remaining questions depend on it.
df['GDP'] = pd.to_numeric(df['GDP'])
df['Population'] = pd.to_numeric(df['Population'])
df['Life Expectation'] = pd.to_numeric(df['Life Expectation'])
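The combine, transpose, and convert steps can be traced on a toy example. The two single-row tables below are made up, and the cast to object dtype is only an assumption used to imitate values that are not yet stored as numbers.

```python
import pandas as pd

years = ["1995", "1996", "1997"]

# Two made-up single-row tables, each already labelled through its index.
a = pd.DataFrame([[10.0, 11.0, 12.0]], columns=years, index=["GDP"])
b = pd.DataFrame([[2.0, 2.0, 3.0]], columns=years, index=["Population"])

# Stack the rows; astype(object) imitates non-numeric storage (assumption).
wide = pd.concat([a, b]).astype(object)

# Transposing turns the row labels into column names and years into the index.
tall = wide.T

# pd.to_numeric converts each column back to a numeric dtype.
tall["GDP"] = pd.to_numeric(tall["GDP"])
tall["Population"] = pd.to_numeric(tall["Population"])

print(tall.dtypes)
```

Numeric dtypes matter for the later steps, since arithmetic and plotting expect numbers rather than generic objects.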

6. GDP per capita¶

Calculate GDP per capita by dividing GDP by population. Use the divide() method of the pandas Series:

df['GDP per capita'] = None
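For reference, divide() works element by element and matches rows by index label. A minimal sketch on made-up numbers (the column names mirror the homework, the values do not):

```python
import pandas as pd

# Made-up values indexed by year.
toy = pd.DataFrame(
    {"GDP": [100.0, 120.0, 150.0], "Population": [10.0, 12.0, 10.0]},
    index=["1995", "1996", "1997"],
)

# Series.divide pairs up values that share the same index label.
ratio = toy["GDP"].divide(toy["Population"])

# The / operator performs the same element-wise division.
same_ratio = toy["GDP"] / toy["Population"]

print(ratio)
```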

7. Data Visualization¶

Visualize each series separately. For instance: df['GDP'].plot()

# Plot GDP

# Plot GDP per capita

# Plot population

# Plot life expectancy
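A plot is easier to comment on when it has a title and axis labels. The sketch below shows one way to add them on made-up data; it assumes matplotlib is installed, and the matplotlib.use("Agg") line is only there so that the code also runs as a plain script (inside the notebook, %matplotlib inline already takes care of display).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, only needed outside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up series indexed by year.
toy = pd.DataFrame({"GDP": [100.0, 120.0, 150.0]}, index=["1995", "1996", "1997"])

# Series.plot() draws a line chart and returns the matplotlib Axes.
ax = toy["GDP"].plot(title="GDP (toy data)")
ax.set_xlabel("Year")
ax.set_ylabel("GDP")

# Release the figure once it is no longer needed.
plt.close(ax.figure)
```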

8. Comments¶

Comment on the dataset. You can base your comments on the plots above.

-------------> Comments here

9. Correlations¶

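Check the correlation between Life Expectancy and GDP per capita. You can use the np.corrcoef() function or the pandas Series.corr() method. (np.correlate() computes a signal-processing cross-correlation, not a correlation coefficient.)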

#code here
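Assuming the quantity wanted here is the Pearson correlation coefficient, a minimal sketch on made-up numbers could look like this. Both calls exist in NumPy and pandas; the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up values; the column names mirror the homework DataFrame.
toy = pd.DataFrame({
    "Life Expectation": [70.0, 71.0, 73.0, 74.0],
    "GDP per capita": [10.0, 12.0, 13.0, 17.0],
})

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
r_numpy = np.corrcoef(toy["Life Expectation"], toy["GDP per capita"])[0, 1]

# Series.corr computes the Pearson coefficient by default.
r_pandas = toy["Life Expectation"].corr(toy["GDP per capita"])

print(r_numpy, r_pandas)
```

The coefficient lies between -1 and 1; values near 1 mean the two series tend to rise together.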

10. Save Excel¶

Save df to an Excel file.

# Your code here
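As a sketch of the round trip, the example below writes a made-up DataFrame with to_excel() and reads it back. It assumes an Excel engine such as openpyxl is installed; the temporary path is hypothetical and only keeps the example from leaving files behind.

```python
import os
import tempfile

import pandas as pd

# Made-up data indexed by year.
toy = pd.DataFrame({"GDP": [100.0, 120.0]}, index=["1995", "1996"])

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "toy.xlsx")

# to_excel writes the index as the first column by default.
toy.to_excel(out_path)

# Reading the file back is a quick check that it was written correctly.
restored = pd.read_excel(out_path, index_col=0)
print(restored)
```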

You do not need to upload the Excel file.

EXAMPLE¶

You need to have a file.xlsx at the end of the homework. You can see the example output for Turkey.

file.xlsx