# Introduction to Numpy & Pandas

## Python Libraries

- [Built-in Functions](https://docs.python.org/3/library/functions.html)
- [The Python Standard Library](https://docs.python.org/3/library/index.html)

Python has a number of built-in functions, some of which we have already seen. These functions, however, cover only the basics of Python programming. With additional libraries you can download YouTube videos, access the Twitter API, run GPU-based analysis on big data, create games or web pages, build robots, and so on.

Python also has its Standard Library, with modules such as `math`, `os`, `sys`, `random`, and `time`. But the most enjoyable applications written in Python are built with other, external libraries.

- [20 Python libraries you can’t live without](https://pythontips.com/2013/07/30/20-python-libraries-you-cant-live-without/)
- [20 Great Python Libraries You Must Know](http://blog.stoneriverelearning.com/20-great-python-libraries-you-must-know/)

We will cover two fundamental external libraries: NumPy and pandas. They are essential Python libraries, especially for data science.

Download this [notebook](/bil113e/notebooks/numpy.ipynb) to practice.

## Numpy

NumPy integrates closely with other Python libraries. In a way, it is like the little brother of pandas :). Let's import it:

In [None]:
import numpy as np

Arrays are similar to lists, but they store elements of a single data type and support fast elementwise operations.
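A minimal sketch of how an array behaves differently from a list in practice. The variable names and sample values here are hypothetical and chosen only for illustration.

```python
import numpy as np

prices_list = [3.5, 5, 2, 8, 4.2]      # hypothetical sample values
prices_arr = np.array(prices_list)

# Multiplying a list repeats it; multiplying an array scales each element.
doubled_list = prices_list * 2
doubled_arr = prices_arr * 2

print(len(doubled_list))   # the list was concatenated with itself
print(doubled_arr)         # each element multiplied by 2
print(prices_arr.dtype)    # all elements share one data type
```

The integers in the list are converted to floats so that every element of the array has the same type.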

In [None]:
stock_list = [3.5, 5, 2, 8, 4.2]

In [None]:
returns = np.array(stock_list)
print('List: ', stock_list)
print('Array: ', returns)
print(returns.shape)
print(type(returns))

Be careful about the notation:

In [None]:
np.array(1,2,3,4)    # WRONG: raises an error, the values must be passed as one sequence

In [None]:
np.array([1,2,3,4])  # RIGHT

We can create a two-dimensional array by:

In [None]:
A = np.array([[1, 2], [3, 4]])
print(A)
print(A.shape)
print(type(A))

Arrays are indexed in much the same way as lists in Python. Indexing begins at $0$ and ends at $n-1$, where $n$ is the length of the array.

In [None]:
print(returns[0], returns[len(returns) - 1])

In [None]:
print(returns[1:3])

In [None]:
print(A[:, 0])

In [None]:
print(A[0, :])

Passing only one index to a 2-dimensional array also returns the row with the given index, providing us with another way to access individual rows.

In [None]:
print(type(A[0,:]))
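As a small illustration of the single-index shortcut described above, the sketch below compares `M[0]` with `M[0, :]`. The name `M` is hypothetical and used only to avoid overwriting `A`.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

row_single = M[0]       # one index: selects the first row
row_slice = M[0, :]     # explicit row index plus a full column slice

print(row_single)
print(np.array_equal(row_single, row_slice))   # both forms give the same row
print(row_single.shape)                        # the row is a 1-dimensional array
```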

Indexing with both a row and a column returns only that single element.

In [None]:
print(A[1, 1])

#### Complex Numbers:

In [None]:
c = np.array( [ [1,2], [3,4] ], dtype=complex )
print(c)

### Array Functions ###

In [None]:
print(np.log(returns))

In [None]:
print(np.mean(returns))

In [None]:
print(np.max(returns))

In [None]:
print(np.min(returns))

In [None]:
print(np.exp(returns))

In [None]:
print(np.sqrt(returns))

Further Reading: [Universal Functions](https://docs.scipy.org/doc/numpy/user/quickstart.html#universal-functions)

You can apply basic arithmetic operations directly to NumPy arrays:

In [None]:
print(returns * 0)

In [None]:
print(returns * 2)

In [None]:
print(returns * 2 + 10)

In [None]:
print("Mean: ", np.mean(returns),' --- ', "Std Dev: ", np.std(returns))

In [None]:
N = 10
assets = np.zeros((N, 100))
returns = np.zeros((N, 100))

This function, `zeros()`, creates a NumPy array with the given dimensions that is entirely filled with $0$. We can pass a single value or a tuple of as many dimensions as we like. Passing in the tuple `(N, 100)` returns a two-dimensional array with $N$ rows and $100$ columns, so our result is an $N \times 100$ array.
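The sketch below shows the difference between passing a single value and passing a tuple as the shape. The variable names are hypothetical.

```python
import numpy as np

flat = np.zeros(3)          # single value: one-dimensional array of length 3
grid = np.zeros((2, 4))     # tuple: 2 rows and 4 columns
cube = np.zeros((2, 3, 4))  # a tuple can have as many dimensions as needed

print(flat.shape, grid.shape, cube.shape)
print(grid.dtype)           # the zeros are floats by default
```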

You can create a matrix filled with 1s instead of 0s:

In [None]:
N = 10
assets = np.ones((N, 100))
returns = np.ones((N, 100))

`np.arange()` is an alternative to the built-in `range` function. It returns an array, so it is better integrated with NumPy functions:

In [None]:
print(np.arange(6))
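A short sketch of how `np.arange` compares with `range`. The variable names are hypothetical, and the step of `0.25` was chosen because it is exactly representable as a float.

```python
import numpy as np

evens = np.arange(0, 10, 2)       # start, stop (exclusive), step, like range()
quarters = np.arange(0, 1, 0.25)  # unlike range(), non-integer steps are allowed

print(evens)
print(quarters)
print(evens ** 2)                 # the result is an array, so arithmetic is elementwise
```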

You can create ranges with a given shape:

In [None]:
print(np.arange(12).reshape(4,3))

In [None]:
print(np.arange(24).reshape(2,3,4))

You can `reshape` the dimensions with `reshape(x,y,z,...)`:

In [None]:
arr = np.arange(12).reshape(4,3)
print(arr)

In [None]:
print(arr.reshape(12,1))

In [None]:
print(arr.reshape(1,12))

You will get an error if the new shape does not match the number of elements:

In [None]:
print(arr.reshape(1,4))

In [None]:
print(arr.reshape(1,13))
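The following sketch tries several shapes on a 12-element array and records which ones succeed. A reshape only works when the product of the new dimensions equals the number of elements. The names `grid` and `outcomes` are hypothetical.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)   # 12 elements in total

outcomes = {}
for shape in [(2, 6), (1, 4), (1, 13)]:
    try:
        grid.reshape(shape)
        outcomes[shape] = "ok"
    except ValueError as err:
        # Raised because the shape does not hold exactly 12 elements.
        outcomes[shape] = "ValueError"
        print(shape, "->", err)

print(outcomes)
```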

Random values with a given shape:

In [None]:
r_arr = np.random.random((2,2))
print(r_arr)

Save your values to a text file:

In [None]:
np.savetxt('test.txt', r_arr, delimiter=',')

Alternatively, you can save an array in NumPy's binary `.npy` format:

In [None]:
np.save('f.npy', r_arr)

Then load it again:

In [None]:
loaded = np.load('f.npy')
print(loaded)
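To check that both saving methods preserve the data, the sketch below performs a round trip through a text file and a `.npy` file. It writes into a temporary directory so that no files are left behind. The file names and variable names are hypothetical.

```python
import os
import tempfile

import numpy as np

values = np.random.random((2, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "values.txt")
    npy_path = os.path.join(tmp_dir, "values.npy")

    # Text round trip: the same delimiter is needed when reading back.
    np.savetxt(txt_path, values, delimiter=",")
    from_txt = np.loadtxt(txt_path, delimiter=",")

    # Binary round trip: shape and dtype are stored in the file.
    np.save(npy_path, values)
    from_npy = np.load(npy_path)

print(np.allclose(values, from_txt))
print(np.array_equal(values, from_npy))
```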

### NaN values

In [None]:
print(np.nan)

In [None]:
v = np.array([1, 2, np.nan, 4, 5])
print(v)

In [None]:
print(np.isnan(v))

In [None]:
print(np.mean(v))

The mean is `nan` because NaN propagates through arithmetic. One solution is to exclude the NaN values:

In [None]:
ix = ~np.isnan(v) # the ~ indicates a logical not, inverting the bools
print(np.mean(v[ix])) # We can also just write v = v[~np.isnan(v)]

Or:

In [None]:
np.nanmean(v)
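A small sketch confirming that the two approaches above agree, and showing why `np.isnan` is needed rather than a comparison with `==`. The name `sample` is hypothetical.

```python
import numpy as np

sample = np.array([1, 2, np.nan, 4, 5])

print(np.nan == np.nan)               # False: NaN is not equal to itself
print(np.isnan(np.sum(sample)))       # True: one NaN makes the whole sum NaN

filtered_mean = np.mean(sample[~np.isnan(sample)])  # drop NaN values first
nan_mean = np.nanmean(sample)                       # ignore NaN values directly

print(filtered_mean, nan_mean)        # both are 3.0
```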

## Matrix Operations
### Dot/Scalar Product

Unlike in many matrix languages, the product operator `*` operates elementwise in NumPy arrays. The matrix product can be performed using the `dot` function or method:

In [None]:
A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0],[3,4]] )

In [None]:
A * B

In [None]:
A.dot(B)

In [None]:
B.dot(A)

Or:

In [None]:
np.dot(A,B)
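Python also provides the `@` operator for matrix multiplication, which for two-dimensional arrays gives the same result as `dot`. The sketch below uses the same values as above under the hypothetical names `left` and `right`.

```python
import numpy as np

left = np.array([[1, 1], [0, 1]])
right = np.array([[2, 0], [3, 4]])

elementwise = left * right      # multiplies matching positions
matrix_product = left @ right   # rows of left times columns of right

print(elementwise)      # [[2 0] [0 4]]
print(matrix_product)   # [[5 4] [3 4]]
print(np.array_equal(matrix_product, left.dot(right)))
```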

### Cross Product

In [None]:
np.cross(A,B)
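In the cell above, `np.cross` treats each row of the two arrays as a two-component vector. Recent NumPy versions may emit a deprecation warning for two-component vectors, so the sketch below uses three-component vectors instead. The variable names are hypothetical.

```python
import numpy as np

x_axis = np.array([1, 0, 0])
y_axis = np.array([0, 1, 0])

z_axis = np.cross(x_axis, y_axis)
print(z_axis)                         # [0 0 1]

# The cross product is perpendicular to both inputs.
print(np.dot(z_axis, x_axis), np.dot(z_axis, y_axis))

# Swapping the arguments flips the sign.
print(np.cross(y_axis, x_axis))       # [ 0  0 -1]
```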

## Exercise - Linearization

In [None]:
# data is created with:
# np.savetxt('data.txt', np.log(10*np.random.random(1000)))

Download [data.txt](/bil113e/assets/data.txt):

In [None]:
non_linear = np.loadtxt('data.txt')

In [None]:
import matplotlib.pyplot as plt 

In [None]:
non_linear = np.sort(non_linear)
plt.plot(non_linear)
plt.show()

In [None]:
plt.plot((4 + non_linear) ** 6)  # same as multiplying np.square(4 + non_linear) by itself three times
plt.show()

In [None]:
plt.plot(np.exp(non_linear))
plt.show()
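One way to judge the result without a plot is to measure how closely the sorted values follow a straight line. The sketch below regenerates data with the recipe from the comment at the start of this exercise and compares the correlation with the position index before and after applying `np.exp`. The fixed seed, the `linearity` helper, and the use of correlation as a measure are assumptions made for illustration, not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, assumed here for reproducibility
sample = np.sort(np.log(10 * rng.random(1000)))
index = np.arange(sample.size)

def linearity(values):
    """Pearson correlation between the values and their position index."""
    return np.corrcoef(index, values)[0, 1]

before = linearity(sample)
after = linearity(np.exp(sample))

print(before)   # noticeably below 1: the sorted data is curved
print(after)    # very close to 1: exp undoes the log, leaving sorted uniform values
```

Because the data was created by taking the logarithm of uniform random numbers, `np.exp` reverses that step, and sorted uniform values lie close to a straight line.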

## Resources ##
- https://www.quantopian.com/lectures/introduction-to-numpy
- https://docs.scipy.org/doc/numpy/user/quickstart.html#universal-functions
- https://docs.scipy.org/doc/numpy/user/index.html
- https://www.datacamp.com/community/tutorials/python-numpy-tutorial