Dear all,
The dataset used in this quiz is taken from the Kaggle World Happiness Report competition.
You are going to perform some statistical analysis on the dataset. Follow the instructions and answer the questions. Do not forget to run the cell below:
import numpy as np
import pandas as pd
You need to upload your quiz to Ninova on time. If you upload it late, your quiz will not be graded. There will not be a make-up quiz.
You need to upload your solutions as an HTML file. Inside Jupyter Notebook, follow: File > Download as > HTML (.html). Then upload the HTML file to Ninova.
We have three different datasets for the years 2015, 2016, and 2017. The files are in CSV format, so you need to use the pd.read_csv() function to read them. Assign each dataset to a separate variable, named year15, year16, and year17:
year15 = None #replace None with your code
year16 = None #replace None with your code
year17 = None #replace None with your code
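As a rough illustration of how pd.read_csv() behaves, here is a minimal sketch that reads made-up CSV text from memory instead of a file. The column names, country names, and values are assumptions for the example only, not the contents of the quiz files. Note how a header with an empty first field produces a column named Unnamed: 0.

```python
import io
import pandas as pd

# Toy CSV text standing in for a file on disk (all values are made up).
# The header starts with an empty field, as happens when an index was saved.
csv_text = """,Country,Score
0,Country A,7.5
1,Country B,6.1
"""

# pd.read_csv accepts a file path or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)
print(list(toy.columns))
```

With a real file you would pass its path as the first argument instead of the io.StringIO object.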
See the structure of the datasets using the head() function.
#Your code here for 2015
#Your code here for 2016
#Your code here for 2017
Run the .describe() function for all variables to see descriptive statistics, and comment on the results.
#Your code here for 2015
#Your code here for 2016
#Your code here for 2017
----------> MAKE COMMENT HERE
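For reference, a small sketch of what head() and describe() return, using a toy dataframe whose column names and values are assumptions made up for this example:

```python
import pandas as pd

# Toy data; the real datasets have different columns and values.
toy = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Score": [7.0, 6.0, 5.0],
})

# head(n) returns the first n rows (n defaults to 5).
first_rows = toy.head(2)

# describe() summarizes numeric columns by default:
# count, mean, std, min, quartiles, and max.
summary = toy.describe()

print(first_rows)
print(summary)
```

Text columns such as Country are left out of the default describe() output, which is worth keeping in mind when commenting on the statistics.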
Add a new column, year, to all datasets. The value of the column must be equal to the year of the dataset. For instance, year16['year'] = 2016. Do it for all dataframes.
#Your code here for 2015
#Your code here for 2016
#Your code here for 2017
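The assignment pattern above can be sketched on a toy dataframe (toy15 and its contents are hypothetical): assigning a single value to a new column name repeats that value in every row.

```python
import pandas as pd

# Hypothetical stand-in for one of the yearly dataframes.
toy15 = pd.DataFrame({"Country": ["Country A", "Country B"]})

# A scalar assigned to a new column is broadcast to all rows.
toy15["year"] = 2015

print(toy15)
```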
After tagging all the datasets with the 'year' column, we can combine them into one dataframe. Use the pd.concat() function to combine them. Assign the result to a new variable called df.
#Your code here
Let's see what the combined dataset looks like:
#Your code here
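A minimal sketch of row-wise concatenation with pd.concat(), using two made-up dataframes (toy15, toy16, and combined are hypothetical names). Passing ignore_index=True is one possible choice, assumed here so that the result gets a fresh 0..n-1 index rather than repeating each input's index.

```python
import pandas as pd

# Two toy dataframes that already carry a year column (values are made up).
toy15 = pd.DataFrame({"Country": ["Country A", "Country B"], "year": 2015})
toy16 = pd.DataFrame({"Country": ["Country A"], "year": 2016})

# pd.concat takes a list of dataframes and stacks them row-wise by default.
combined = pd.concat([toy15, toy16], ignore_index=True)

print(combined)
```

Without ignore_index=True, the original row labels are kept, so labels such as 0 would appear once per input dataframe.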
Delete the column Unnamed: 0 from the combined dataset, df:
#Your code here
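One common way to remove a column is DataFrame.drop() with the columns argument, sketched here on a toy dataframe whose contents are made up:

```python
import pandas as pd

# Toy dataframe with a leftover index column.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Country": ["Country A", "Country B"],
})

# drop() returns a new dataframe, so the result is assigned back.
toy = toy.drop(columns=["Unnamed: 0"])

print(list(toy.columns))
```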
It looks like some countries are not listed in all datasets. Find which ones are missing. (You need to show your code for the answer.)
#Your code here
pandas counting values (from Google)
----------> LIST THE COUNTRIES HERE THAT ARE NOT LISTED IN ALL DATASETS
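One possible approach to this kind of question, sketched on made-up data: count how many times each country appears with value_counts() and keep those that appear fewer times than there are years. The dataframe below is hypothetical, and the sketch assumes each country appears at most once per year.

```python
import pandas as pd

# Toy combined dataframe: Country A is in all three years,
# Country B is missing from 2016, Country C appears only in 2017.
combined = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country A",
                "Country A", "Country B", "Country C"],
    "year": [2015, 2015, 2016, 2017, 2017, 2017],
})

n_years = combined["year"].nunique()          # number of distinct years
counts = combined["Country"].value_counts()   # appearances per country

# Countries that appear in fewer years than the total are missing somewhere.
missing = sorted(counts[counts < n_years].index)

print(missing)
```

If a country name is spelled differently across years, it would be counted as separate countries, so the real data may need a closer look.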
Each question is worth 20 points. There is one extra question.