Introduction to Numpy & Pandas

Python Libraries

Python has a number of built-in functions, some of which we have already seen. These cover only the basics of Python programming, though. With additional libraries you can download YouTube videos, access the Twitter API, run GPU computations on big data, create games or web pages, build robots, and so on.

Python also has its Standard Library, with modules such as math, os, sys, random and time. But most of the enjoyable applications written in Python rely on external libraries.

Also check out 100 Numpy Exercises and Solutions.

We will cover two fundamental external libraries: NumPy and pandas. Both are essential Python libraries, especially for data science.

Download this notebook to practice.

Numpy

NumPy integrates well with other Python libraries. In a way, it is the little brother of pandas :). Let's import it:

In [1]:
import numpy as np

Arrays are similar to lists, but every element of an array has the same type, which makes elementwise math fast and convenient.

In [2]:
stock_list = [3.5, 5, 2, 8, 4.2]
In [3]:
returns = np.array(stock_list)
print('List: ', stock_list)
print('Array: ', returns)
print(returns.shape)
print(type(returns))
List:  [3.5, 5, 2, 8, 4.2]
Array:  [ 3.5  5.   2.   8.   4.2]
(5,)
<class 'numpy.ndarray'>
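
To make the difference between a list and an array concrete, here is a minimal sketch that reuses stock_list from above. The names doubled_list and doubled_array are only illustrative.

```python
import numpy as np

stock_list = [3.5, 5, 2, 8, 4.2]
returns = np.array(stock_list)

# Multiplying a list repeats it; multiplying an array scales every element.
doubled_list = stock_list * 2
doubled_array = returns * 2

print(len(doubled_list))  # 10: the list was concatenated with itself
print(doubled_array)      # every element multiplied by 2

# An array holds a single dtype: the ints 5, 2 and 8 were converted to floats.
print(returns.dtype)      # float64
```

This is why the array is the better fit for numerical work: arithmetic applies to the values, not to the container.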

Be careful about the notation:

In [4]:
np.array(1,2,3,4)    # WRONG
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9aa09435aea3> in <module>()
----> 1 np.array(1,2,3,4)    # WRONG

ValueError: only 2 non-keyword arguments accepted
In [5]:
np.array([1,2,3,4])  # RIGHT
Out[5]:
array([1, 2, 3, 4])

We can create a two-dimensional array with:

In [6]:
A = np.array([[1, 2], [3, 4]])
print(A)
print(A.shape)
print(type(A))
[[1 2]
 [3 4]]
(2, 2)
<class 'numpy.ndarray'>

Arrays are indexed in much the same way as lists in Python. Indexing starts at $0$ and ends at $n-1$, where $n$ is the length of the array.

In [7]:
print(returns[0], returns[len(returns) - 1])
3.5 4.2
In [8]:
print(returns[1:3])
[ 5.  2.]
In [9]:
print(A[:, 0])
[1 3]
In [10]:
print(A[0, :])
[1 2]

Passing only one index to a 2-dimensional array also returns the row with that index, which gives us another way to access individual rows.

In [11]:
print(type(A[0,:]))
<class 'numpy.ndarray'>
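
The single-index form can be compared directly with the explicit slice. The sketch below rebuilds the same A as above; the variable names are only illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

first_row_short = A[0]    # single index: selects along the first axis (rows)
first_row_full = A[0, :]  # explicit form: row 0, all columns
last_row = A[-1]          # negative indices count from the end, as with lists

print(first_row_short)                                  # [1 2]
print(np.array_equal(first_row_short, first_row_full))  # True
print(last_row)                                         # [3 4]
```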

Indexing with one index per dimension returns only the individual element.

In [12]:
print(A[1, 1])
4

Complex Numbers:

In [13]:
c = np.array( [ [1,2], [3,4] ], dtype=complex )
print(c)
[[ 1.+0.j  2.+0.j]
 [ 3.+0.j  4.+0.j]]

Array Functions

In [14]:
print(np.log(returns))
[ 1.25276297  1.60943791  0.69314718  2.07944154  1.43508453]
In [15]:
print(np.mean(returns))
4.54
In [16]:
print(np.max(returns))
8.0
In [17]:
print(np.min(returns))
2.0
In [18]:
print(np.exp(returns))
[   33.11545196   148.4131591      7.3890561   2980.95798704    66.68633104]
In [19]:
print(np.sqrt(returns))
[ 1.87082869  2.23606798  1.41421356  2.82842712  2.04939015]

Further Reading: Universal Functions

You can apply basic math operations to NumPy arrays:

In [20]:
print(returns * 0)
[ 0.  0.  0.  0.  0.]
In [21]:
print(returns * 2)
[  7.   10.    4.   16.    8.4]
In [22]:
print(returns * 2 + 10)
[ 17.   20.   14.   26.   18.4]
In [23]:
print("Mean: ", np.mean(returns),' --- ', "Std Dev: ", np.std(returns))
Mean:  4.54  ---  Std Dev:  1.99158228552
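
The same operators also work between two arrays of the same shape. The sketch below assumes a hypothetical weights array (not part of the original data) and also shows that np.std computes the population standard deviation unless ddof=1 is passed.

```python
import numpy as np

returns = np.array([3.5, 5, 2, 8, 4.2])
# Hypothetical weights, chosen only for illustration; they sum to 1.
weights = np.array([0.1, 0.2, 0.3, 0.2, 0.2])

weighted = returns * weights     # elementwise product, same shape as the inputs
weighted_mean = np.sum(weighted)
print(weighted.shape)            # (5,)
print(weighted_mean)             # weighted average of the returns

pop_std = np.std(returns)             # divides by n (ddof=0), as in the cell above
sample_std = np.std(returns, ddof=1)  # divides by n - 1
print(pop_std, sample_std)
```
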
In [24]:
N = 10
assets = np.zeros((N, 100))
returns = np.zeros((N, 100))

This function, zeros(), creates a NumPy array with the given dimensions that is entirely filled with $0$. We can pass a single value or a tuple of as many dimensions as we like. Passing in the tuple (N, 100) returns a two-dimensional array with $N$ rows and $100$ columns, so our result is an $N \times 100$ array.
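
A short sketch of the single-value and tuple forms side by side. The variable names are only illustrative.

```python
import numpy as np

N = 10
one_dim = np.zeros(5)                    # single value: 1-D array of length 5
two_dim = np.zeros((N, 100))             # tuple: N rows, 100 columns
three_dim = np.zeros((2, 3, 4))          # more dimensions work the same way
int_zeros = np.zeros((2, 2), dtype=int)  # the default dtype is float64

print(one_dim.shape)    # (5,)
print(two_dim.shape)    # (10, 100)
print(three_dim.shape)  # (2, 3, 4)
print(two_dim.dtype, int_zeros.dtype)
```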

You can create a matrix filled with 1's instead of 0's:

In [25]:
N = 10
assets = np.ones((N, 100))
returns = np.ones((N, 100))

np.arange is an alternative to the built-in range function. It returns an array, so it works directly with other NumPy functions:

In [26]:
print(np.arange(6))
[0 1 2 3 4 5]
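
Like range, np.arange also accepts start, stop and step arguments, and unlike range the step may be a float. A small sketch:

```python
import numpy as np

evens = np.arange(2, 10, 2)       # start, stop (exclusive), step
quarters = np.arange(0, 1, 0.25)  # float steps are allowed
squares = np.arange(6) ** 2       # the result is an array, so math applies directly

print(evens)     # [2 4 6 8]
print(quarters)  # [0.   0.25 0.5  0.75]
print(squares)   # [ 0  1  4  9 16 25]
```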

You can create ranges with shapes:

In [27]:
print(np.arange(12).reshape(4,3))
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
In [28]:
print(np.arange(24).reshape(2,3,4))
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

You can reshape the dimensions with reshape(x,y,z,...):

In [29]:
arr = np.arange(12).reshape(4,3)
print(arr)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
In [30]:
print(arr.reshape(12,1))
[[ 0]
 [ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]]
In [31]:
print(arr.reshape(1,12))
[[ 0  1  2  3  4  5  6  7  8  9 10 11]]

You will get an error if the new shape does not match the number of elements:

In [32]:
print(arr.reshape(1,4))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-0f3a5b9b5b3e> in <module>()
----> 1 print(arr.reshape(1,4))

ValueError: cannot reshape array of size 12 into shape (1,4)
In [33]:
print(arr.reshape(1,13))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-9bdbaa0929fc> in <module>()
----> 1 print(arr.reshape(1,13))

ValueError: cannot reshape array of size 12 into shape (1,13)
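
To avoid working out a dimension by hand, reshape accepts -1 for at most one dimension and infers it from the array size. A minimal sketch, reusing arr from above:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

two_rows = arr.reshape(2, -1)   # 12 elements / 2 rows -> 6 columns
four_cols = arr.reshape(-1, 4)  # 12 elements / 4 columns -> 3 rows
flat = arr.reshape(-1)          # flatten to one dimension

print(two_rows.shape)   # (2, 6)
print(four_cols.shape)  # (3, 4)
print(flat.shape)       # (12,)

# The inferred dimension must still divide the size evenly.
try:
    arr.reshape(5, -1)
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```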

Random values with a given shape:

In [34]:
r_arr = np.random.random((2,2))
print(r_arr)
[[ 0.02636317  0.49639583]
 [ 0.88345342  0.61187671]]
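
The values above change on every run. If you need reproducible numbers, one option is a seeded generator. The sketch below uses np.random.default_rng (available in NumPy 1.17 and later) rather than the np.random.random call used above; the seed 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed produce the same values.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.random((2, 2))   # uniform values in [0, 1), shape (2, 2)
second = rng_b.random((2, 2))

print(first)
print(np.array_equal(first, second))  # True
```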

Save your values to a text file:

In [35]:
np.savetxt('test.txt', r_arr, delimiter=',')

You can save the array in NumPy's binary .npy format instead:

In [36]:
np.save('f.npy', r_arr)

Then load it again:

In [37]:
loaded = np.load('f.npy')
print(loaded)
[[ 0.02636317  0.49639583]
 [ 0.88345342  0.61187671]]
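
The text file can be read back as well, with np.loadtxt and the same delimiter. The sketch below does both round trips in a temporary directory, which is an assumption made here to keep the example from leaving files behind.

```python
import os
import tempfile

import numpy as np

r_arr = np.random.random((2, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    # Text round trip: the delimiter must match on save and load.
    txt_path = os.path.join(tmp_dir, 'test.txt')
    np.savetxt(txt_path, r_arr, delimiter=',')
    from_txt = np.loadtxt(txt_path, delimiter=',')

    # Binary round trip: .npy also stores the shape and dtype.
    npy_path = os.path.join(tmp_dir, 'f.npy')
    np.save(npy_path, r_arr)
    from_npy = np.load(npy_path)

print(np.allclose(from_txt, r_arr))     # True
print(np.array_equal(from_npy, r_arr))  # True
```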

NaN values

In [38]:
print(np.nan)
nan
In [39]:
v = np.array([1, 2, np.nan, 4, 5])
print(v)
[  1.   2.  nan   4.   5.]
In [40]:
print(np.isnan(v))
[False False  True False False]
In [41]:
print(np.mean(v))
nan

A single NaN makes the mean NaN. One solution is to mask out the NaN values first:

In [42]:
ix = ~np.isnan(v) # the ~ indicates a logical not, inverting the bools
print(np.mean(v[ix])) # We can also just write v = v[~np.isnan(v)]
3.0

Or:

In [43]:
np.nanmean(v)
Out[43]:
3.0
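
np.nanmean has siblings for other statistics. The sketch below shows a few of them on the same v, and also why np.isnan is needed in the first place: NaN does not compare equal to itself.

```python
import numpy as np

v = np.array([1, 2, np.nan, 4, 5])

nan_equal = (np.nan == np.nan)  # False, so `v == np.nan` cannot find NaNs
n_missing = np.isnan(v).sum()   # count of NaN entries

total = np.nansum(v)    # 1 + 2 + 4 + 5
largest = np.nanmax(v)
smallest = np.nanmin(v)

print(nan_equal, n_missing)      # False 1
print(total, largest, smallest)  # 12.0 5.0 1.0
```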

Matrix Operations

Dot/Scalar Product

Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the dot function or method:

In [44]:
A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0],[3,4]] )
In [45]:
A * B
Out[45]:
array([[2, 0],
       [0, 4]])
In [46]:
A.dot(B)
Out[46]:
array([[5, 4],
       [3, 4]])
In [47]:
B.dot(A)
Out[47]:
array([[2, 2],
       [3, 7]])

Or:

In [48]:
np.dot(A,B)
Out[48]:
array([[5, 4],
       [3, 4]])
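
In Python 3.5 and later, with a reasonably recent NumPy, the @ operator is a third way to write the matrix product. A minimal sketch with the same A and B:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

product = A @ B  # matrix product, same result as A.dot(B)
print(product)

print(np.array_equal(product, np.dot(A, B)))  # True
print(np.array_equal(A @ B, B @ A))           # False: the matrix product is not commutative
```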

Cross Product

In [49]:
np.cross(A,B)
Out[49]:
array([-2, -3])
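
Here np.cross treats each row of A and B as a 2-element vector and returns the z-component of the cross product for each pair of rows. As far as I know, newer NumPy releases deprecate this 2-element form, so the sketch below pads the rows with a zero z-component and uses 3-element vectors instead. The names A3, B3 and manual_z are only illustrative.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

# Pad each 2-element row vector with a zero z-component.
A3 = np.column_stack([A, np.zeros(len(A), dtype=A.dtype)])
B3 = np.column_stack([B, np.zeros(len(B), dtype=B.dtype)])

cross_3d = np.cross(A3, B3)  # one 3-element cross product per row
z = cross_3d[:, 2]           # only the z-component is non-zero
print(cross_3d)
print(z)                     # [-2 -3]

# The same numbers by hand: x1*y2 - y1*x2 for each pair of rows.
manual_z = A[:, 0] * B[:, 1] - A[:, 1] * B[:, 0]
print(np.array_equal(z, manual_z))  # True
```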

Exercise - Linearization

In [50]:
# data is created with:
# np.savetxt('data.txt', np.log(10*np.random.random(1000)))

Download the data.txt

In [51]:
non_linear = np.loadtxt('data.txt')
In [52]:
import matplotlib.pyplot as plt 
In [53]:
non_linear = np.sort(non_linear)
plt.plot(non_linear)
plt.show()
In [54]:
plt.plot((4 + non_linear) ** 6)  # same values as np.square(4 + non_linear) cubed
plt.show()
In [55]:
plt.plot(np.exp(non_linear))
plt.show()
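
The plots can be backed up with a number. The sketch below does not read data.txt. It regenerates similar data with a seeded generator (an assumption; the seed and np.random.default_rng are not from the original) and measures how linear the sorted values are before and after np.exp, using the correlation with the sample index.

```python
import numpy as np

# Stand-in for data.txt, built the same way as in the comment above,
# but with a seeded generator so the result is reproducible.
rng = np.random.default_rng(0)
non_linear = np.sort(np.log(10 * rng.random(1000)))

# exp undoes the log, leaving sorted uniform values, which lie close to a line.
linearized = np.exp(non_linear)

x = np.arange(len(non_linear))
r_before = np.corrcoef(x, non_linear)[0, 1]
r_after = np.corrcoef(x, linearized)[0, 1]

print("correlation before exp:", r_before)
print("correlation after exp: ", r_after)  # much closer to 1
```

A correlation close to 1 after the transform suggests that np.exp is the right linearization for this data, which matches how the data was created.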