Quiz 2¶

Dear all,

The dataset used in this quiz is taken from the World Happiness Report competition on Kaggle.

You are going to perform some statistical analysis on the dataset. Follow the instructions and answer the questions. Do not forget to run the cell below:

import numpy as np
import pandas as pd

How to Upload the Quiz?¶

You need to upload the quiz to Ninova on time. Late uploads will not be graded, and there will be no make-up quiz.

You need to upload your solutions as an HTML file. In Jupyter Notebook, choose File > Download as > HTML (.html), then upload the HTML file to Ninova.

Datasets¶

Download the datasets before starting:

1. Import three datasets¶

We have three datasets, one each for the years 2015, 2016, and 2017. The files are in CSV format, so you need to use the pd.read_csv() function to read them. Assign each dataset to a separate variable, named year15, year16, and year17:

year15 = pd.read_csv('2015.csv')
year16 = pd.read_csv('2016.csv')
year17 = pd.read_csv('2017.csv')
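As a minimal sketch of what pd.read_csv() does with these files, the example below reads an in-memory string instead of a file on disk. The two rows are copied from the 2015 head() output in this notebook. The name sample15 and the exact header layout of the real CSV file are assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for '2015.csv' (assumed layout): two rows from the 2015 head() output.
# The first header field is empty, so pandas names that column 'Unnamed: 0'.
csv_text = """,Country,Happiness Rank,Happiness Score,Freedom,Generosity
0,Switzerland,1,7.587,0.66557,0.29678
1,Iceland,2,7.561,0.62877,0.43630
"""

# read_csv accepts a file path or any file-like object.
sample15 = pd.read_csv(io.StringIO(csv_text))

print(sample15.columns.tolist())
print(sample15.shape)
```

If the real files have the same empty first header field, that would explain the 'Unnamed: 0' column that is deleted in step 5.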

See the structure of the datasets using the head() function.

year15.head()

year16.head()

year17.head()

2. Descriptive Statistics¶

Run the .describe() function on each dataset to see descriptive statistics for all variables, and comment on the results.

year15.describe()

year16.describe()

year17.describe()

----------> MAKE COMMENT HERE
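One way to make the three describe() tables easier to comment on is to pull the same row out of each and line the years up side by side. The sketch below does this for the mean row. The toy15, toy16, and toy17 frames are hypothetical stand-ins built from the first three rows of each head() output, so the numbers are not the full-dataset means.

```python
import pandas as pd

# Toy stand-ins (first three rows of each head() output, two columns only).
toy15 = pd.DataFrame({"Happiness Score": [7.587, 7.561, 7.527],
                      "Freedom": [0.66557, 0.62877, 0.64938]})
toy16 = pd.DataFrame({"Happiness Score": [7.526, 7.509, 7.501],
                      "Freedom": [0.57941, 0.58557, 0.56624]})
toy17 = pd.DataFrame({"Happiness Score": [7.537, 7.522, 7.504],
                      "Freedom": [0.635423, 0.626007, 0.627163]})

# Take the 'mean' row of each describe() table and use the years as columns.
means = pd.DataFrame({
    2015: toy15.describe().loc["mean"],
    2016: toy16.describe().loc["mean"],
    2017: toy17.describe().loc["mean"],
})

print(means)
```

Replacing "mean" with "std", "min", or "max" gives the same side-by-side view for the other statistics.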

3. Add 'year' column¶

Add a new column, year, to all datasets. The value of the column must equal the year of the dataset. For instance, year16['year'] = 2016. Do this for all dataframes.

year15['year'] = 2015

year16['year'] = 2016

year17['year'] = 2017
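Assigning a single value to a column broadcasts it to every row. As an alternative to three separate statements, the same tagging can be sketched as a loop. The toy frames below are hypothetical stand-ins for year15, year16, and year17.

```python
import pandas as pd

# Toy stand-ins for the three yearly dataframes.
toy15 = pd.DataFrame({"Country": ["Switzerland", "Iceland"]})
toy16 = pd.DataFrame({"Country": ["Denmark", "Switzerland"]})
toy17 = pd.DataFrame({"Country": ["Norway", "Denmark"]})

# Pair each dataframe with its year and tag it in place.
for frame, year in [(toy15, 2015), (toy16, 2016), (toy17, 2017)]:
    frame["year"] = year  # the scalar is repeated for every row

print(toy16)
```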

4. Combine all the datasets¶

After tagging all the datasets with the 'year' column, we can combine them into one dataframe. Use the pd.concat() function to combine them, and assign the result to a new variable called df.

df = pd.concat([year15, year16, year17])

Let's see what the dataset looks like:

df.head()
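One detail worth knowing about pd.concat(): by default it keeps each input's original row labels, so labels repeat in the combined dataframe. The sketch below shows this on hypothetical toy frames and shows the ignore_index=True option, which renumbers the rows. The quiz itself does not require this option.

```python
import pandas as pd

# Toy stand-ins for two of the tagged yearly dataframes.
toy15 = pd.DataFrame({"Country": ["Switzerland", "Iceland"], "year": 2015})
toy16 = pd.DataFrame({"Country": ["Denmark", "Switzerland"], "year": 2016})

# Default behaviour: row labels 0 and 1 appear once per input frame.
combined = pd.concat([toy15, toy16])
print(combined.index.tolist())

# ignore_index=True discards the old labels and numbers rows from 0.
renumbered = pd.concat([toy15, toy16], ignore_index=True)
print(renumbered.index.tolist())
```

With repeated labels, combined.loc[0] returns one row per year rather than a single row, which is easy to miss when inspecting the data.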

5. Delete Column¶

Delete the column Unnamed: 0 from the combined dataset, df:

del df['Unnamed: 0']
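The del statement removes the column in place and raises a KeyError if the cell is run a second time. As a sketch of an alternative, DataFrame.drop() returns a new dataframe and can be told to ignore a missing column. The toy frame below is a hypothetical stand-in for df.

```python
import pandas as pd

# Toy stand-in for the combined dataframe.
toy = pd.DataFrame({"Unnamed: 0": [0, 1],
                    "Country": ["Norway", "Denmark"],
                    "year": [2017, 2017]})

# drop() returns a new dataframe and leaves `toy` unchanged.
trimmed = toy.drop(columns=["Unnamed: 0"])

# errors="ignore" makes the call safe to repeat once the column is gone.
trimmed_again = trimmed.drop(columns=["Unnamed: 0"], errors="ignore")

print(trimmed.columns.tolist())
```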

6. Find some countries¶

It looks like some countries are not listed in all datasets. Find which ones are missing. (You need to show your code for the answer.)

df['Country'].value_counts() < 3

Iraq                        False
Uzbekistan                  False
Turkmenistan                False
Ecuador                     False
Ukraine                     False
Qatar                       False
Turkey                      False
Switzerland                 False
Israel                      False
Montenegro                  False
Armenia                     False
Serbia                      False
Brazil                      False
Mauritius                   False
Uruguay                     False
El Salvador                 False
Guatemala                   False
Palestinian Territories     False
United Arab Emirates        False
Egypt                       False
Cyprus                      False
Norway                      False
Madagascar                  False
South Korea                 False
Luxembourg                  False
Greece                      False
Czech Republic              False
Bahrain                     False
Guinea                      False
Spain                       False
                            ...  
Lithuania                   False
Ivory Coast                 False
Hungary                     False
Uganda                      False
Cambodia                    False
Liberia                     False
Angola                      False
New Zealand                 False
Honduras                    False
Cameroon                    False
Comoros                      True
Belize                       True
Somalia                      True
Lesotho                      True
South Sudan                  True
Hong Kong                    True
Suriname                     True
Mozambique                   True
Taiwan                       True
Central African Republic     True
Namibia                      True
Laos                         True
Oman                         True
Hong Kong S.A.R., China      True
Djibouti                     True
Puerto Rico                  True
Somaliland Region            True
Taiwan Province of China     True
Somaliland region            True
Swaziland                    True
Name: Country, Length: 166, dtype: bool
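The comparison above returns True or False for every country. To list only the countries that appear fewer than three times, the counts can be filtered with that same condition. The sketch below uses a hypothetical toy frame. The country names come from the output above, but the year assignments are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df; the year values here are illustrative only.
toy = pd.DataFrame({
    "Country": ["Norway", "Norway", "Norway",
                "Somaliland region", "Somaliland Region", "Oman"],
    "year": [2015, 2016, 2017, 2015, 2016, 2015],
})

# Count how many rows each country has across the combined data.
counts = toy["Country"].value_counts()

# Keep only the countries that appear in fewer than three yearly datasets.
missing = sorted(counts[counts < 3].index)

print(missing)
```

Some entries in the output above look like spelling variants of the same place (for example "Somaliland Region" and "Somaliland region", or "Hong Kong" and "Hong Kong S.A.R., China"), so a country may be flagged because its name differs between years rather than because it is absent.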

Hint: Search Google for `pandas counting values`¶

LIST THE COUNTRIES HERE THAT ARE NOT LISTED IN ALL DATASETS

GRADING¶

Each question is worth 20 points. There is one extra question.

	Unnamed: 0	Country	Happiness Rank	Happiness Score	Freedom	Generosity
0	0	Switzerland	1	7.587	0.66557	0.29678
1	1	Iceland	2	7.561	0.62877	0.43630
2	2	Denmark	3	7.527	0.64938	0.34139
3	3	Norway	4	7.522	0.66973	0.34699
4	4	Canada	5	7.427	0.63297	0.45811

	Unnamed: 0	Country	Happiness Rank	Happiness Score	Freedom	Generosity
0	0	Denmark	1	7.526	0.57941	0.36171
1	1	Switzerland	2	7.509	0.58557	0.28083
2	2	Iceland	3	7.501	0.56624	0.47678
3	3	Norway	4	7.498	0.59609	0.37895
4	4	Finland	5	7.413	0.57104	0.25492

	Unnamed: 0	Country	Happiness Rank	Happiness Score	Freedom	Generosity
0	0	Norway	1	7.537	0.635423	0.362012
1	1	Denmark	2	7.522	0.626007	0.355280
2	2	Iceland	3	7.504	0.627163	0.475540
3	3	Switzerland	4	7.494	0.620071	0.290549
4	4	Finland	5	7.469	0.617951	0.245483

	Unnamed: 0	Happiness Rank	Happiness Score	Freedom	Generosity
count	158.000000	158.000000	158.000000	158.000000	158.000000
mean	78.500000	79.493671	5.375734	0.428615	0.237296
std	45.754781	45.754363	1.145010	0.150693	0.126685
min	0.000000	1.000000	2.839000	0.000000	0.000000
25%	39.250000	40.250000	4.526000	0.328330	0.150553
50%	78.500000	79.500000	5.232500	0.435515	0.216130
75%	117.750000	118.750000	6.243750	0.549092	0.309883
max	157.000000	158.000000	7.587000	0.669730	0.795880

	Unnamed: 0	Happiness Rank	Happiness Score	Freedom	Generosity
count	157.000000	157.000000	157.000000	157.000000	157.000000
mean	78.000000	78.980892	5.382185	0.370994	0.242635
std	45.466105	45.466030	1.141674	0.145507	0.133756
min	0.000000	1.000000	2.905000	0.000000	0.000000
25%	39.000000	40.000000	4.404000	0.257480	0.154570
50%	78.000000	79.000000	5.314000	0.397470	0.222450
75%	117.000000	118.000000	6.269000	0.484530	0.311850
max	156.000000	157.000000	7.526000	0.608480	0.819710

	Unnamed: 0	Happiness Rank	Happiness Score	Freedom	Generosity
count	155.000000	155.000000	155.000000	155.000000	155.000000
mean	77.000000	78.000000	5.354019	0.408786	0.246883
std	44.888751	44.888751	1.131230	0.149997	0.134780
min	0.000000	1.000000	2.693000	0.000000	0.000000
25%	38.500000	39.500000	4.505500	0.303677	0.154106
50%	77.000000	78.000000	5.279000	0.437454	0.231538
75%	115.500000	116.500000	6.101500	0.516561	0.323762
max	154.000000	155.000000	7.537000	0.658249	0.838075