Quiz 2

Dear all,

The dataset used in this quiz is taken from the Kaggle World Happiness Report competition.

You are going to perform some statistical analysis on the dataset. Follow the instructions and answer the questions. Do not forget to run the cell below:

In [ ]:
import numpy as np
import pandas as pd

How to Upload the Quiz?

You need to upload the quiz to Ninova on time. If you upload it late, your quiz will not be graded. There will be no make-up quiz.

You need to upload your solutions as an HTML file. In Jupyter Notebook, follow: File > Download as > HTML (.html). Then upload the HTML file to Ninova.

Datasets

Download the datasets before starting:

1. Import three datasets

We have three different datasets for the years 2015, 2016, and 2017. The files are in CSV format, so you need to use the pd.read_csv() function to read them. Assign each dataset to a separate variable named year15, year16, and year17:

In [ ]:
year15 = None #replace None with your code
year16 = None #replace None with your code
year17 = None #replace None with your code
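
As a rough sketch of how pd.read_csv() behaves, the cell below reads a tiny made-up CSV from memory instead of from disk. The column names (Country, Score), the country names, and the values are all invented for illustration and are not the quiz data. In the quiz you would pass the path of each downloaded file instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for one of the quiz files.
toy_csv = io.StringIO(
    "Country,Score\n"
    "Freedonia,7.5\n"
    "Borduria,6.1\n"
    "Syldavia,5.8\n"
)

# pd.read_csv accepts a file path or any file-like object.
toy = pd.read_csv(toy_csv)

# head(n) returns the first n rows (5 rows when n is omitted).
print(toy.head(2))
```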

Inspect the structure of each dataset using the head() function.

In [ ]:
#Your code here for 2015
In [ ]:
#Your code here for 2016
In [ ]:
#Your code here for 2017

2. Descriptive Statistics

Run the .describe() function on all three dataframes to see their descriptive statistics, then comment on the results.

In [ ]:
#Your code here for 2015
In [ ]:
#Your code here for 2016
In [ ]:
#Your code here for 2017
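
To show what .describe() returns, here is a small sketch on invented data (the Country and Score columns are assumptions, not the quiz columns). By default it summarizes only the numeric columns, and the result is itself a dataframe, so individual statistics can be looked up with .loc.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "Country": ["Freedonia", "Borduria", "Syldavia", "Ruritania"],
    "Score": [2.0, 4.0, 6.0, 8.0],
})

# describe() reports count, mean, std, min, quartiles, and max
# for numeric columns; the text column "Country" is skipped.
stats = toy.describe()
print(stats)

# Rows of the summary are indexed by the statistic name.
mean_score = stats.loc["mean", "Score"]
```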

----------> MAKE COMMENT HERE

3. Add 'year' column

Add a new column, year, to all datasets. The value of the column must be equal to the year of the dataset. For instance, year16['year'] = 2016. Do this for all dataframes.

In [ ]:
#Your code here for 2015
In [ ]:
#Your code here for 2016
In [ ]:
#Your code here for 2017
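
One detail worth seeing in isolation: assigning a single value to a new column repeats that value on every row. The sketch below uses an invented two-row dataframe, not the quiz data.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({"Country": ["Freedonia", "Borduria"], "Score": [7.0, 6.0]})

# A scalar on the right-hand side is broadcast to all rows.
toy["year"] = 2015
print(toy)
```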

4. Combine all the datasets

After tagging all the datasets with a 'year' column, we can combine them into one dataframe. Use the pd.concat() function to combine them. Assign the result to a new variable called df.

In [ ]:
#Your code here
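
A minimal sketch of pd.concat() on two invented dataframes (names and values are made up). Passing ignore_index=True is one possible choice here, not a requirement of the quiz: it renumbers the rows from 0 so that the original per-file row labels are not repeated in the result.

```python
import pandas as pd

# Made-up data for illustration only.
toy_a = pd.DataFrame({"Country": ["Freedonia", "Borduria"], "Score": [7.0, 6.0], "year": 2015})
toy_b = pd.DataFrame({"Country": ["Freedonia", "Syldavia"], "Score": [7.2, 5.5], "year": 2016})

# Stack the rows of both frames; columns are matched by name.
combined = pd.concat([toy_a, toy_b], ignore_index=True)
print(combined)
```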

Let's see what the dataset looks like:

In [ ]:
#Your code here

5. Delete Column

Delete the column Unnamed: 0 from the combined dataset, df:

In [ ]:
#Your code here
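
A column named Unnamed: 0 usually appears when a CSV file was saved together with its row index and then read back without declaring an index column. The sketch below, on invented data, shows one way to remove such a column with drop(). Note that drop() returns a new dataframe and leaves the original unchanged unless you reassign the result.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Country": ["Freedonia", "Borduria"],
    "Score": [7.0, 6.0],
})

# columns= names the columns to remove; the result is a new dataframe.
cleaned = toy.drop(columns=["Unnamed: 0"])
print(cleaned.columns.tolist())
```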

6. Find some countries

It looks like some countries are not listed in all datasets. Find which ones are missing. (You need to show your code for the answer.)

In [ ]:
#Your code here

Hint: Search Google for "pandas counting values".
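
To illustrate the idea behind the hint, here is a sketch on invented data. It assumes a column named Country and that each country appears at most once per year. Both are assumptions about the data, so check them against your own dataframe. A country that is present in every year is counted as many times as there are years, so any smaller count points to a missing year.

```python
import pandas as pd

# Made-up combined data for illustration only.
combined = pd.DataFrame({
    "Country": ["Freedonia", "Borduria", "Syldavia",
                "Freedonia", "Borduria",
                "Freedonia", "Syldavia"],
    "year": [2015, 2015, 2015, 2016, 2016, 2017, 2017],
})

# Number of rows per country across all years.
counts = combined["Country"].value_counts()

# Number of distinct years in the combined data.
n_years = combined["year"].nunique()

# Countries with fewer rows than years are absent from at least one year.
missing_somewhere = sorted(counts[counts < n_years].index)
print(missing_somewhere)
```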

---------->LIST THE COUNTRIES HERE THAT ARE NOT LISTED IN ALL DATASETS

GRADING

Each question is worth 20 points. There is one extra question.