Python has a number of built-in functions, some of which we have already seen. These functions, however, are only the basics of Python programming. With Python you can also download YouTube videos, access the Twitter API, run GPU-based analysis of big data, create games or web pages, build robots, and so on.
Python also has its Standard Library, with modules such as math, os, sys, random, and time. But most of the enjoyable applications written in Python rely on external libraries.
Also check out 100 Numpy Exercises and Solutions.
We will cover two fundamental external libraries: NumPy and pandas. Both are essential in Python, especially for data science.
Download this notebook to practice.
NumPy integrates well with other Python libraries. In a way, it is the little brother of pandas :). Let's import it:
import numpy as np
Arrays are similar to lists, but they store elements of a single data type and support fast elementwise operations.
stock_list = [3.5, 5, 2, 8, 4.2]
returns = np.array(stock_list)
print('List: ', stock_list)
print('Array: ', returns)
print(returns.shape)
print(type(returns))
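To make the difference between lists and arrays concrete, here is a minimal sketch. The names `prices_list` and `prices` are illustrative and not part of the original notebook. Multiplying a list repeats it, while multiplying an array scales every element, and the array stores all of its values with one common data type.

```python
import numpy as np

prices_list = [3.5, 5, 2]          # a list may mix floats and ints
prices = np.array(prices_list)     # the array converts everything to one dtype

doubled_list = prices_list * 2     # list repetition: six elements
doubled_array = prices * 2         # elementwise multiplication: three elements

print(doubled_list)
print(doubled_array)
print(prices.dtype)
```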
Be careful about the notation:
np.array(1,2,3,4) # WRONG: raises a TypeError
np.array([1,2,3,4]) # RIGHT: pass the values as a single list
We can create a two-dimensional array by passing a list of lists:
A = np.array([[1, 2], [3, 4]])
print(A)
print(A.shape)
print(type(A))
Arrays are indexed in much the same way as lists in Python. Indexing starts at $0$ and ends at $n-1$, where $n$ is the length of the array.
print(returns[0], returns[len(returns) - 1])
print(returns[1:3])
print(A[:, 0])
print(A[0, :])
Passing only one index to a 2-dimensional array also returns the row with the given index, which gives us another way to access individual rows.
print(type(A[0,:]))
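As a small sketch of the single-index shortcut, the snippet below compares it with the explicit row slice. The variable names `first_row_short` and `first_row_full` are made up for this illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

first_row_short = A[0]      # a single index selects a whole row
first_row_full = A[0, :]    # explicit form: row 0, all columns

print(first_row_short)
print(np.array_equal(first_row_short, first_row_full))
print(A[-1])                # negative indices count from the end, as with lists
```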
Passing the indices of an individual element returns only that element.
print(A[1, 1])
c = np.array( [ [1,2], [3,4] ], dtype=complex )
print(c)
print(np.log(returns))
print(np.mean(returns))
print(np.max(returns))
print(np.min(returns))
print(np.exp(returns))
print(np.sqrt(returns))
Further Reading: Universal Functions
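Functions such as np.sqrt are applied to every element of the array in one call. The sketch below, with an illustrative array named `values`, compares that with an explicit loop over the standard math module. It is only meant to show the idea, not to measure performance.

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(values)                                 # one call, applied to every element
roots_loop = np.array([math.sqrt(x) for x in values])   # equivalent explicit loop

print(roots)
print(np.allclose(roots, roots_loop))
```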
You can apply basic arithmetic operations directly to NumPy arrays:
print(returns * 0)
print(returns * 2)
print(returns * 2 + 10)
print("Mean: ", np.mean(returns),' --- ', "Std Dev: ", np.std(returns))
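The same arithmetic also works between two arrays of the same shape, element by element. A minimal sketch, using two made-up arrays `returns_a` and `returns_b`:

```python
import numpy as np

returns_a = np.array([1.0, 2.0, 3.0])
returns_b = np.array([10.0, 20.0, 30.0])

total = returns_a + returns_b      # [11. 22. 33.]
product = returns_a * returns_b    # [10. 40. 90.]
ratio = returns_b / returns_a      # [10. 10. 10.]

print(total)
print(product)
print(ratio)
```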
N = 10
assets = np.zeros((N, 100))
returns = np.zeros((N, 100))
This function, zeros(), creates a NumPy array with the given dimensions that is entirely filled with $0$. We can pass a single value or a tuple of as many dimensions as we like. Passing in the tuple (N, 100) returns a two-dimensional array with $N$ rows and $100$ columns, that is, an $N \times 100$ array.
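To see the difference between passing a single value and passing a tuple, here is a short sketch. The names `flat`, `grid`, and `cube` are chosen only for this example.

```python
import numpy as np

flat = np.zeros(3)            # single value: one-dimensional array of length 3
grid = np.zeros((2, 4))       # tuple: 2 rows and 4 columns
cube = np.zeros((2, 3, 4))    # three dimensions

print(flat.shape, grid.shape, cube.shape)
print(grid)
```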
You can create a matrix filled with 1s instead of 0s:
N = 10
assets = np.ones((N, 100))
returns = np.ones((N, 100))
np.arange is an alternative to the built-in range function. It returns an array, so it integrates better with other NumPy functions:
print(np.arange(6))
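Like range, np.arange also accepts a start, a stop, and a step. A brief sketch with illustrative variable names:

```python
import numpy as np

evens = np.arange(2, 10, 2)        # start, stop (exclusive), step
quarters = np.arange(0, 1, 0.25)   # unlike range, non-integer steps are allowed
scaled = np.arange(6) * 2          # the result is an array, so arithmetic works directly

print(evens)
print(quarters)
print(scaled)
```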
You can create ranges with shapes:
print(np.arange(12).reshape(4,3))
print(np.arange(24).reshape(2,3,4))
You can change the dimensions with reshape(x,y,z,...):
arr = np.arange(12).reshape(4,3)
print(arr)
print(arr.reshape(12,1))
print(arr.reshape(1,12))
You will get an error if the new shape does not match the number of elements:
print(arr.reshape(1,4))
print(arr.reshape(1,13))
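If you want to handle the failure instead of stopping the program, one option is to catch the ValueError. The sketch below also shows that passing -1 lets NumPy work out one dimension for you. The names `error_message` and `inferred` are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

try:
    arr.reshape(1, 13)            # 13 slots cannot hold 12 elements
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

inferred = arr.reshape(2, -1)     # -1 asks NumPy to infer the missing dimension
print(inferred.shape)
```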
Random values with a given shape:
r_arr = np.random.random((2,2))
print(r_arr)
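Random values change on every run. If you need reproducible numbers, one possible approach is a seeded generator created with np.random.default_rng. In this sketch the seed value 42 and the variable names are arbitrary choices.

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)   # the seed value is an arbitrary choice
rng_b = np.random.default_rng(seed=42)   # same seed, so the same sequence

sample_a = rng_a.random((2, 2))          # uniform values in [0, 1)
sample_b = rng_b.random((2, 2))

print(sample_a)
print(np.array_equal(sample_a, sample_b))
```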
Save your values:
np.savetxt('test.txt', r_arr, delimiter=',')
You can save the array in NumPy's binary .npy format instead:
np.save('f.npy', r_arr)
Then load it again:
loaded = np.load('f.npy')
print(loaded)
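Here is a self-contained sketch of both round trips, text and binary. It writes into a temporary directory so that no files are left behind. The file names `demo.txt` and `demo.npy` and the array `data` are made up for this example.

```python
import os
import tempfile
import numpy as np

data = np.array([[0.1, 0.2], [0.3, 0.4]])

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, "demo.txt")
    npy_path = os.path.join(tmp_dir, "demo.npy")

    np.savetxt(txt_path, data, delimiter=",")        # human-readable text file
    from_txt = np.loadtxt(txt_path, delimiter=",")   # use the same delimiter to read it back

    np.save(npy_path, data)                          # binary file, keeps dtype and shape
    from_npy = np.load(npy_path)

print(np.allclose(data, from_txt))
print(np.array_equal(data, from_npy))
```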
print(np.nan)
v = np.array([1, 2, np.nan, 4, 5])
print(v)
print(np.isnan(v))
print(np.mean(v))
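The mean is nan because any arithmetic involving nan yields nan. One solution is to filter out the nan values first: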
ix = ~np.isnan(v) # the ~ indicates a logical not, inverting the bools
print(np.mean(v[ix])) # We can also just write v = v[~np.isnan(v)]
Or:
np.nanmean(v)
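NumPy has several other functions that skip nan values in the same way. A short sketch on the same kind of array:

```python
import numpy as np

v = np.array([1, 2, np.nan, 4, 5])

total = np.nansum(v)       # 12.0, nan is ignored
largest = np.nanmax(v)     # 5.0
average = np.nanmean(v)    # 3.0, the mean of the four remaining values

print(total, largest, average)
```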
Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the dot function or method:
A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0],[3,4]] )
A * B
A.dot(B)
B.dot(A)
Or:
np.dot(A,B)
np.cross(A,B) # cross product of the row vectors, not a matrix product
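In Python 3.5 and later, the @ operator is another way to write the matrix product. The sketch below uses the same two matrices to contrast it with the elementwise product. The result names are illustrative.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

elementwise = A * B    # [[2, 0], [0, 4]]
matrix_ab = A @ B      # [[5, 4], [3, 4]], same as A.dot(B)
matrix_ba = B @ A      # [[2, 2], [3, 7]], matrix products do not commute in general

print(elementwise)
print(matrix_ab)
print(matrix_ba)
```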
# data is created with:
# np.savetxt('data.txt', np.log(10*np.random.random(1000)))
Download the data.txt
non_linear = np.loadtxt('data.txt')
import matplotlib.pyplot as plt
non_linear = np.sort(non_linear)
plt.plot(non_linear)
plt.show()
plt.plot((4 + non_linear) ** 6)  # same as cubing np.square(4 + non_linear)
plt.show()
plt.plot(np.exp(non_linear))
plt.show()