Quiz 2

Dear all,

The dataset used in this quiz is taken from the Kaggle World Happiness Report competition.

You are going to perform some statistical analysis on the dataset. Follow the instructions and answer the questions. Do not forget to run the cell below:

In [1]:
import numpy as np
import pandas as pd

How to Upload the Quiz?

You need to upload the quiz to Ninova on time. If you upload late, your quiz will not be graded. There will be no make-up quiz.

You need to upload your solutions as an HTML file. In Jupyter Notebook, go to File > Download as > HTML (.html), then upload the HTML file to Ninova.

Datasets

Download the datasets before starting:

1. Import three datasets

We have three different datasets, one each for the years 2015, 2016, and 2017. The files are in CSV format, so use the pd.read_csv() function to read them. Assign each dataset to a separate variable: year15, year16, and year17:

In [2]:
year15 = pd.read_csv('2015.csv')
year16 = pd.read_csv('2016.csv')
year17 = pd.read_csv('2017.csv')
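The `Unnamed: 0` column in every `head()` output is most likely a saved row index: `DataFrame.to_csv()` writes the index by default with an empty header, and `pd.read_csv()` names that unnamed column `Unnamed: 0` when reading it back. If that is how these files were produced, passing `index_col=0` makes the column the row index instead of a data column (and then there is nothing left to delete in step 5). A minimal sketch; the two-row CSV text is a hypothetical stand-in for `2015.csv`, read through `io.StringIO` so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for 2015.csv: the first header field is empty,
# which is what DataFrame.to_csv() writes for an unnamed index.
csv_text = """,Country,Happiness Rank,Happiness Score
0,Switzerland,1,7.587
1,Iceland,2,7.561
"""

# Default read: the unnamed first column becomes a data column "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())
# ['Unnamed: 0', 'Country', 'Happiness Rank', 'Happiness Score']

# index_col=0: the first column is used as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
# ['Country', 'Happiness Rank', 'Happiness Score']
```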

Inspect the structure of each dataset with the head() function.

In [3]:
year15.head()
Out[3]:
   Unnamed: 0      Country  Happiness Rank  Happiness Score  Freedom  Generosity
0           0  Switzerland               1            7.587  0.66557     0.29678
1           1      Iceland               2            7.561  0.62877     0.43630
2           2      Denmark               3            7.527  0.64938     0.34139
3           3       Norway               4            7.522  0.66973     0.34699
4           4       Canada               5            7.427  0.63297     0.45811
In [4]:
year16.head()
Out[4]:
   Unnamed: 0      Country  Happiness Rank  Happiness Score  Freedom  Generosity
0           0      Denmark               1            7.526  0.57941     0.36171
1           1  Switzerland               2            7.509  0.58557     0.28083
2           2      Iceland               3            7.501  0.56624     0.47678
3           3       Norway               4            7.498  0.59609     0.37895
4           4      Finland               5            7.413  0.57104     0.25492
In [5]:
year17.head()
Out[5]:
   Unnamed: 0      Country  Happiness Rank  Happiness Score   Freedom  Generosity
0           0       Norway               1            7.537  0.635423    0.362012
1           1      Denmark               2            7.522  0.626007    0.355280
2           2      Iceland               3            7.504  0.627163    0.475540
3           3  Switzerland               4            7.494  0.620071    0.290549
4           4      Finland               5            7.469  0.617951    0.245483

2. Descriptive Statistics

Run the .describe() function on each of the three datasets to see the descriptive statistics of every numeric variable, then comment on the results.

In [6]:
year15.describe()
Out[6]:
       Unnamed: 0  Happiness Rank  Happiness Score     Freedom  Generosity
count  158.000000      158.000000       158.000000  158.000000  158.000000
mean    78.500000       79.493671         5.375734    0.428615    0.237296
std     45.754781       45.754363         1.145010    0.150693    0.126685
min      0.000000        1.000000         2.839000    0.000000    0.000000
25%     39.250000       40.250000         4.526000    0.328330    0.150553
50%     78.500000       79.500000         5.232500    0.435515    0.216130
75%    117.750000      118.750000         6.243750    0.549092    0.309883
max    157.000000      158.000000         7.587000    0.669730    0.795880
In [7]:
year16.describe()
Out[7]:
       Unnamed: 0  Happiness Rank  Happiness Score     Freedom  Generosity
count  157.000000      157.000000       157.000000  157.000000  157.000000
mean    78.000000       78.980892         5.382185    0.370994    0.242635
std     45.466105       45.466030         1.141674    0.145507    0.133756
min      0.000000        1.000000         2.905000    0.000000    0.000000
25%     39.000000       40.000000         4.404000    0.257480    0.154570
50%     78.000000       79.000000         5.314000    0.397470    0.222450
75%    117.000000      118.000000         6.269000    0.484530    0.311850
max    156.000000      157.000000         7.526000    0.608480    0.819710
In [8]:
year17.describe()
Out[8]:
       Unnamed: 0  Happiness Rank  Happiness Score     Freedom  Generosity
count  155.000000      155.000000       155.000000  155.000000  155.000000
mean    77.000000       78.000000         5.354019    0.408786    0.246883
std     44.888751       44.888751         1.131230    0.149997    0.134780
min      0.000000        1.000000         2.693000    0.000000    0.000000
25%     38.500000       39.500000         4.505500    0.303677    0.154106
50%     77.000000       78.000000         5.279000    0.437454    0.231538
75%    115.500000      116.500000         6.101500    0.516561    0.323762
max    154.000000      155.000000         7.537000    0.658249    0.838075

----------> MAKE COMMENT HERE
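Comparing three separate describe() tables by eye is tedious. One option is to put a single variable's summary for each year side by side: `pd.concat()` accepts a dict of Series, and with `axis=1` the dict keys become the column labels. A minimal sketch; the four-row frames are hypothetical toy data standing in for `year15`, `year16`, and `year17` (in the notebook you would pass the real dataframes).

```python
import pandas as pd

# Hypothetical toy data standing in for year15, year16, year17.
year15 = pd.DataFrame({"Happiness Score": [7.5, 6.0, 4.5, 3.0]})
year16 = pd.DataFrame({"Happiness Score": [7.4, 6.1, 4.4, 2.9]})
year17 = pd.DataFrame({"Happiness Score": [7.5, 5.9, 4.6, 2.7]})

frames = {2015: year15, 2016: year16, 2017: year17}

# One describe() Series per year; with axis=1 the dict keys become columns.
summary = pd.concat(
    {year: frame["Happiness Score"].describe() for year, frame in frames.items()},
    axis=1,
)
print(summary.round(3))
```

`summary.T` gives one row per year if that layout reads better.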

3. Add 'year' column

Add a new column, year, to each dataset. The value of the column must equal the dataset's year; for instance, year16['year'] = 2016. Do this for all three dataframes.

In [9]:
year15['year'] = 2015
In [10]:
year16['year'] = 2016
In [11]:
year17['year'] = 2017
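The three assignments can also be written as a single loop over a dict that maps each year to its dataframe. This sketch uses `DataFrame.assign()`, which returns a new dataframe with the extra column and leaves the original untouched; that differs from `year16['year'] = 2016`, which modifies `year16` in place. The small frames are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical placeholders for the three yearly dataframes.
year15 = pd.DataFrame({"Country": ["Switzerland", "Iceland"]})
year16 = pd.DataFrame({"Country": ["Denmark", "Switzerland"]})
year17 = pd.DataFrame({"Country": ["Norway", "Denmark"]})

frames = {2015: year15, 2016: year16, 2017: year17}

# assign() returns a copy with a constant 'year' column; the originals are untouched.
tagged = {year: frame.assign(year=year) for year, frame in frames.items()}

print(tagged[2016])
print("year" in year16.columns)  # False: assign() did not modify year16
```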

4. Combine all datasets

After tagging all datasets with the 'year' column, we can combine them into one dataframe. Use the pd.concat() function to combine them, and assign the result to a new variable called df.

In [12]:
df = pd.concat([year15, year16, year17])
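Each yearly file has its own index starting at 0, and `pd.concat()` keeps those labels by default, so the combined `df` repeats index labels (once per year); `df.loc[0]` then returns several rows rather than one. Passing `ignore_index=True` builds a fresh 0..n-1 index instead. A minimal sketch with hypothetical two-row frames:

```python
import pandas as pd

# Hypothetical placeholders, already tagged with 'year'.
year15 = pd.DataFrame({"Country": ["Switzerland", "Iceland"], "year": 2015})
year16 = pd.DataFrame({"Country": ["Denmark", "Switzerland"], "year": 2016})
year17 = pd.DataFrame({"Country": ["Norway", "Denmark"], "year": 2017})

# Default: the original row labels are kept, so 0 and 1 each appear three times.
stacked = pd.concat([year15, year16, year17])
print(stacked.index.tolist())  # [0, 1, 0, 1, 0, 1]
print(len(stacked.loc[0]))     # 3 rows share the label 0

# ignore_index=True: a fresh 0..5 index for the combined frame.
df = pd.concat([year15, year16, year17], ignore_index=True)
print(df.index.tolist())       # [0, 1, 2, 3, 4, 5]
```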

Let's see what the combined dataset looks like:

In [13]:
df.head()
Out[13]:
   Unnamed: 0      Country  Happiness Rank  Happiness Score  Freedom  Generosity  year
0           0  Switzerland               1            7.587  0.66557     0.29678  2015
1           1      Iceland               2            7.561  0.62877     0.43630  2015
2           2      Denmark               3            7.527  0.64938     0.34139  2015
3           3       Norway               4            7.522  0.66973     0.34699  2015
4           4       Canada               5            7.427  0.63297     0.45811  2015

5. Delete a column

Delete the column Unnamed: 0 from the combined dataset, df:

In [14]:
del df['Unnamed: 0']
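`del` removes the column in place and raises a `KeyError` if the column is already gone, for example when the cell is run a second time. `DataFrame.drop(columns=...)` returns a new dataframe, and `errors='ignore'` skips labels that are not present, so the cell is safe to re-run. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical stand-in for the combined dataframe.
df = pd.DataFrame({"Unnamed: 0": [0, 1], "Country": ["Switzerland", "Iceland"]})

# drop() returns a new frame; errors="ignore" makes re-running this line harmless.
df = df.drop(columns=["Unnamed: 0"], errors="ignore")
df = df.drop(columns=["Unnamed: 0"], errors="ignore")  # second run: no KeyError

print(df.columns.tolist())  # ['Country']
```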

6. Find some countries

It looks like some countries are not listed in all datasets. Find which ones are missing. (You need to show your code for the answer.)

In [15]:
df['Country'].value_counts() < 3
Out[15]:
Iraq                        False
Uzbekistan                  False
Turkmenistan                False
Ecuador                     False
Ukraine                     False
Qatar                       False
Turkey                      False
Switzerland                 False
Israel                      False
Montenegro                  False
Armenia                     False
Serbia                      False
Brazil                      False
Mauritius                   False
Uruguay                     False
El Salvador                 False
Guatemala                   False
Palestinian Territories     False
United Arab Emirates        False
Egypt                       False
Cyprus                      False
Norway                      False
Madagascar                  False
South Korea                 False
Luxembourg                  False
Greece                      False
Czech Republic              False
Bahrain                     False
Guinea                      False
Spain                       False
                            ...  
Lithuania                   False
Ivory Coast                 False
Hungary                     False
Uganda                      False
Cambodia                    False
Liberia                     False
Angola                      False
New Zealand                 False
Honduras                    False
Cameroon                    False
Comoros                      True
Belize                       True
Somalia                      True
Lesotho                      True
South Sudan                  True
Hong Kong                    True
Suriname                     True
Mozambique                   True
Taiwan                       True
Central African Republic     True
Namibia                      True
Laos                         True
Oman                         True
Hong Kong S.A.R., China      True
Djibouti                     True
Puerto Rico                  True
Somaliland Region            True
Taiwan Province of China     True
Somaliland region            True
Swaziland                    True
Name: Country, Length: 166, dtype: bool

Hint: search Google for how to count values in pandas.
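The boolean Series above has to be scanned by eye, and its middle is truncated (`...`). Using the condition as a mask on the counts keeps only the countries that appear fewer than three times, and `pd.crosstab()` of country against year shows which years each one is missing from. A minimal sketch; the small `df` is a hypothetical stand-in for the combined dataset. The output above also suggests that some pairs (for example `Taiwan` and `Taiwan Province of China`, or `Somaliland Region` and `Somaliland region`) may be the same place spelled differently in different years.

```python
import pandas as pd

# Hypothetical stand-in for the combined df (Country and year columns only).
df = pd.DataFrame({
    "Country": ["Norway", "Oman", "Norway", "Norway", "Laos", "Oman"],
    "year":    [2015,     2015,   2016,     2017,     2016,   2017],
})

# Keep only the countries that appear in fewer than 3 yearly files.
counts = df["Country"].value_counts()
missing = counts[counts < 3].index.tolist()
print(sorted(missing))  # ['Laos', 'Oman']

# Presence table: 1 if the country appears in that year's file, 0 otherwise.
presence = pd.crosstab(df["Country"], df["year"])
print(presence.loc[sorted(missing)])
```

`value_counts()` counts rows; if a country could appear twice in the same file, `df.groupby('Country')['year'].nunique()` would count distinct years instead.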

LIST HERE THE COUNTRIES THAT ARE NOT LISTED IN ALL DATASETS

GRADING

Each question is worth 20 points. There is one extra question.