We covered some basic notions of plotting in the pandas course. Here we cover matplotlib.pyplot for data visualization in Python.
Let's start by importing the necessary libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = np.arange(0, 5, 0.1)
y = np.sin(x)
plt.plot(x, y)
Depending on your environment, nothing may appear except a line of text describing the plot object. To render the figure, call:
plt.show()
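The line of text comes from the return value of plt.plot(). The following is a minimal sketch of that behaviour, assuming a non-interactive "Agg" backend so that it also runs outside a notebook; the variable name lines is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 5, 0.1)
y = np.sin(x)

# plt.plot returns a list holding one Line2D object per plotted line;
# a notebook echoes this list as text when it is the last expression of a cell
lines = plt.plot(x, y)
print(type(lines[0]).__name__)
print(len(lines))

plt.close("all")
```

Assigning the result to a variable, as done here, is one way to keep that text out of the cell output.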
If you don't want to write this each time you plot, you can run this magic command once in a Jupyter notebook:
%matplotlib inline
X = np.linspace(-np.pi, np.pi, 256, endpoint=True) # generating 256 linear points between -pi and +pi
C, S = np.cos(X), np.sin(X)
plt.plot(X, C)
plt.plot(X, S)
We actually combined two plots, one of the sin() function and one of the cos() function, in a single figure.
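The two curves end up together because consecutive plt.plot() calls draw on the current axes. Here is a small sketch of that idea, assuming a headless "Agg" backend; the names fig, ax and n_lines are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

fig, ax = plt.subplots()   # creates a figure and makes ax the current axes
plt.plot(X, np.cos(X))     # first line goes onto the current axes
plt.plot(X, np.sin(X))     # second line goes onto the same axes

n_lines = len(ax.lines)    # number of lines the axes now holds
same_axes = plt.gca() is ax
print(n_lines, same_axes)

plt.close("all")
```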
This plot uses the default settings. You can customize almost every feature of the plot; see the matplotlib.pyplot documentation for the full list of options.
For example you can change the size of the plot:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C)
plt.plot(X, S)
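The figsize argument is given in inches and dpi in dots per inch, so the pixel size of the figure is their product. A short sketch of that arithmetic, assuming a headless "Agg" backend (interactive backends on high-resolution screens may scale the dpi):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6), dpi=80)

# size in inches times dots per inch gives the size in pixels
width_px, height_px = fig.get_size_inches() * fig.dpi
print(round(width_px), round(height_px))

plt.close("all")
```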
You can change the colors, line widths, and line styles:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=1.0, linestyle="-")
plt.plot(X, S, color="green", linewidth=3.0, linestyle=":")
You can also set the axis limits and the tick positions:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=1.0, linestyle="-")
plt.plot(X, S, color="green", linewidth=3.0, linestyle=":")
# Set x limits
plt.xlim(-4.0,4.0)
# Set x ticks
plt.xticks(np.linspace(-4,4,9,endpoint=True))
# Set y limits
plt.ylim(-1.0,1.0)
# Set y ticks
plt.yticks(np.linspace(-1,1,20,endpoint=True))
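It can help to look at the tick positions that np.linspace produces before passing them to plt.xticks() or plt.yticks(). The sketch below is only illustrative and assumes a headless "Agg" backend; the variable names are not part of the tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

x_ticks = np.linspace(-4, 4, 9, endpoint=True)    # 9 points: one tick per integer
y_ticks_20 = np.linspace(-1, 1, 20, endpoint=True)  # 20 points: spacing of 2/19
y_ticks_21 = np.linspace(-1, 1, 21, endpoint=True)  # 21 points: spacing of 0.1

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
plt.figure(figsize=(8, 6), dpi=80)
plt.plot(X, np.cos(X))
plt.xlim(-4.0, 4.0)
plt.xticks(x_ticks)
plt.ylim(-1.0, 1.0)
plt.yticks(y_ticks_21)

xlim = plt.xlim()  # calling xlim() without arguments returns the current limits
print(x_ticks)
print(y_ticks_20[1] - y_ticks_20[0], y_ticks_21[1] - y_ticks_21[0])

plt.close("all")
```

With 20 points the ticks fall on awkward fractions, while 21 points give round steps of 0.1; which one to use is a matter of taste.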
Let's add a grid:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=3.0, linestyle="-")
plt.plot(X, S, color="green", linewidth=3.0, linestyle="-")
plt.xlim(-4.0,4.0)
# Set x ticks
plt.xticks(np.linspace(-4,4,9,endpoint=True))
# Set y limits
plt.ylim(-1.0,1.0)
# Set y ticks
plt.yticks(np.linspace(-1,1,20,endpoint=True))
plt.grid(color='red')
We can label the ticks:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=3.0, linestyle="-")
plt.plot(X, S, color="green", linewidth=3.0, linestyle="-")
plt.xlim(-4.0,4.0)
plt.grid(color='grey')
# Set x ticks and their labels
plt.xticks([-np.pi, -np.pi/2, 0, np.pi/2, np.pi],
           [r'$-\pi$', r'$-\pi/2$', r'$0$', r'$+\pi/2$', r'$+\pi$'])
# Set y ticks and their labels
plt.yticks([-1, 0, +1],
           [r'$-1$', r'$0$', r'$+1$'])
Within the $ signs you can use LaTeX-style math formatting. Prefix the string with r (a raw string) so that Python does not interpret the backslashes.
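The same math formatting works in titles and axis labels, not only in tick labels. A minimal sketch, assuming a headless "Agg" backend; the title and label texts are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

fig, ax = plt.subplots(figsize=(8, 6), dpi=80)
ax.plot(X, np.sin(X))

# raw strings (r'...') keep the backslashes intact for matplotlib's math renderer
title = r'$\sin(x)$ on $[-\pi, +\pi]$'
ax.set_title(title)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y = \sin(x)$')

fig.canvas.draw()  # force rendering so that a malformed expression would raise here
print(ax.get_title())

plt.close("all")
```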
We can also save the plot as an image file:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=3.0, linestyle="-")
plt.plot(X, S, color="green", linewidth=3.0, linestyle="-")
plt.xlim(-4.0,4.0)
# Set x ticks
plt.xticks(np.linspace(-4,4,9,endpoint=True))
# Set y limits
plt.ylim(-1.0,1.0)
# Set y ticks
plt.yticks(np.linspace(-1,1,10,endpoint=True))
plt.grid(color='black')
plt.savefig('fig.png')
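plt.savefig() accepts a few options that are often useful, such as dpi for the resolution and bbox_inches="tight" to trim the surrounding whitespace. The sketch below writes to a temporary directory, which is an assumption made only so that the example leaves no files behind; it also assumes a headless "Agg" backend.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

plt.figure(figsize=(8, 6), dpi=80)
plt.plot(X, np.cos(X), color="blue", linewidth=3.0)
plt.plot(X, np.sin(X), color="green", linewidth=3.0)

# hypothetical output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "fig.png")

# dpi sets the resolution of the saved file; bbox_inches="tight" trims the margins
plt.savefig(png_path, dpi=150, bbox_inches="tight")
saved = os.path.exists(png_path)
print(saved)

plt.close("all")
```

Call savefig() before plt.show() when both are used in a script, since the figure may be cleared once the window is closed.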
Add a legend:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=3.0, linestyle="-", label = 'cos()')
plt.plot(X, S, color="green", linewidth=3.0, linestyle="-", label='sin()')
plt.xlim(-4.0,4.0)
plt.xticks(np.linspace(-4,4,9,endpoint=True))
plt.yticks(np.linspace(-1,1,10,endpoint=True))
plt.legend(loc='upper left', frameon=False)
Annotate some points:
plt.figure(figsize=(8,6), dpi=80)
plt.plot(X, C, color="blue", linewidth=3.0, linestyle="-", label = 'cos()')
plt.plot(X, S, color="green", linewidth=3.0, linestyle="-", label='sin()')
plt.xlim(-4.0,4.0)
plt.xticks(np.linspace(-4,4,9,endpoint=True))
plt.yticks(np.linspace(-1,1,10,endpoint=True))
plt.legend(loc='upper left', frameon=False)
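# Pass the text as the first positional argument; the old keyword s= is not accepted by recent matplotlib versions
plt.annotate('sin(0) = 0', xy=(0, 0),
             xytext=(+10, -30), textcoords='offset points', fontsize=12,
             arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2"))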
Explanation of the annotate() arguments:
- xy: a tuple that gives the coordinates of the point to annotate.
- textcoords: the coordinate system used for xytext. It can be (1) 'offset points' - offset (in points) from the xy value; (2) 'offset pixels' - offset (in pixels) from the xy value.
For now you do not need to learn the details. It is enough to know what is possible when plotting data.
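As a small illustration of these arguments, the sketch below creates the same annotation through the axes object and reads back what was stored. It assumes a headless "Agg" backend, and the name ann is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

fig, ax = plt.subplots(figsize=(8, 6), dpi=80)
ax.plot(X, np.sin(X), color="green", linewidth=3.0, label='sin()')

ann = ax.annotate('sin(0) = 0',
                  xy=(0, 0),                     # the point being annotated (data coordinates)
                  xytext=(+10, -30),             # where the text goes
                  textcoords='offset points',    # xytext is an offset from xy, in points
                  fontsize=12,
                  arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=.2"))

print(ann.get_text(), ann.xy)

plt.close("all")
```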
Do you remember the dataset we manipulated in the pandas course? We saved it as a CSV file named movies_new.csv. Download the dataset and import it.
mv = pd.read_csv('movies_new.csv',encoding='latin1')
mv.head()
Let's find out the categories:
cat_names = mv.columns
cat_names = cat_names[4:]
print(cat_names, '\n',len(cat_names))
We have 19 categories.
cat = mv[cat_names].sum().sort_values()
cat
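The counting step works because each category column holds 0 or 1, so summing a column counts the movies in that category. Here is a sketch on a tiny made-up table; the titles, column names and values are hypothetical and are not taken from movies_new.csv.

```python
import pandas as pd

# hypothetical data: one row per movie, one 0/1 indicator column per category
toy = pd.DataFrame({
    "title":  ["A", "B", "C", "D"],
    "Comedy": [1, 0, 1, 1],
    "Drama":  [0, 1, 1, 0],
    "Horror": [0, 0, 0, 1],
})

genre_cols = toy.columns[1:]                   # skip the non-category column
counts = toy[genre_cols].sum().sort_values()   # column sums, smallest first
print(counts)
```

Because sort_values() sorts in ascending order, the most frequent categories end up at the end of the Series.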
Let's make a bar chart of the counts:
plt.figure(figsize=(12,5), dpi=80)
plt.bar(cat.index, cat.values, color='darkred')
plt.xticks(rotation=90, fontsize=12)
plt.xlabel('Categories',fontsize=15,color='darkblue')
plt.ylabel('Counts',fontsize=15,color='darkblue')
Maybe a pie chart works better?
plt.figure(figsize=(10,10), dpi=80)
exp= [0] * 16 + [0.1] + [0] * 2
plt.pie(cat, explode = exp, labels=cat.index,shadow=True)
For a clearer chart we can drop the smallest categories and keep only the 12 largest:
pop_cat = cat[-12:]
plt.figure(figsize=(12,12), dpi=80)
plt.pie(pop_cat, labels=pop_cat.index,shadow=True)
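Dropping the small categories changes the proportions shown in the pie. One possible alternative, sketched here on made-up numbers, is to merge them into a single "Other" slice. The data, the "Other" label and the variable names are assumptions for illustration, and a headless "Agg" backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical category counts, sorted ascending like cat in the text
toy_cat = pd.Series({"A": 2, "B": 3, "C": 5, "D": 40, "E": 50}).sort_values()

top = toy_cat.iloc[-2:]             # keep the 2 largest categories
other = toy_cat.iloc[:-2].sum()     # merge everything else into one number
grouped = pd.concat([pd.Series({"Other": other}), top])

plt.figure(figsize=(6, 6), dpi=80)
plt.pie(grouped, labels=grouped.index, autopct="%1.1f%%")  # autopct prints the percentages
print(grouped)

plt.close("all")
```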
The second thing we can do with our data is count the number of movies for each year.
yrs = mv['year'].value_counts()
yrs.head()
yrs = yrs.sort_index()
plt.figure(figsize=(12,5), dpi=80)
plt.plot(yrs, linewidth=3.0, linestyle="-")
plt.xticks(rotation=90, fontsize=11)
plt.grid(color='black')
plt.xlabel('Years',fontsize=15,color='darkblue')
plt.ylabel('Counts',fontsize=15,color='darkblue')
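The call to sort_index() matters because value_counts() orders its result by count, not by year, and a line plot needs the years in chronological order. A sketch on a tiny made-up column (the years below are hypothetical):

```python
import pandas as pd

# hypothetical data: release year of six movies
toy = pd.DataFrame({"year": [1995, 1994, 1995, 1996, 1995, 1994]})

yrs_by_count = toy["year"].value_counts()   # most frequent year first
yrs_by_year = yrs_by_count.sort_index()     # chronological order

print(yrs_by_count)
print(yrs_by_year)
```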
The examples below show how advanced the plots made with matplotlib can be. They are taken from: https://www.labri.fr/perso/nrougier/teaching/matplotlib/
n = 1024
X = np.random.normal(0,1,n)
Y = np.random.normal(0,1,n)
T = np.arctan2(Y,X)
plt.axes([0.025,0.025,0.95,0.95])
plt.scatter(X,Y, s=75, c=T, alpha=.5)
plt.xlim(-1.5,1.5), plt.xticks([])
plt.ylim(-1.5,1.5), plt.yticks([])
# savefig('../figures/scatter_ex.png',dpi=48)
plt.show()
n = 12
X = np.arange(n)
Y1 = (1-X/float(n)) * np.random.uniform(0.5,1.0,n)
Y2 = (1-X/float(n)) * np.random.uniform(0.5,1.0,n)
plt.axes([0.025,0.025,0.95,0.95])
plt.bar(X, +Y1, facecolor='#9999ff', edgecolor='white')
plt.bar(X, -Y2, facecolor='#ff9999', edgecolor='white')
# Bars are centered on x by default in matplotlib >= 2.0, so the labels need no horizontal shift
for x, y in zip(X, Y1):
    plt.text(x, y+0.05, '%.2f' % y, ha='center', va='bottom')
for x, y in zip(X, Y2):
    plt.text(x, -y-0.05, '%.2f' % y, ha='center', va='top')
plt.xlim(-.5,n), plt.xticks([])
plt.ylim(-1.25,+1.25), plt.yticks([])
# savefig('../figures/bar_ex.png', dpi=48)
plt.show()
def f(x, y):
    return (1-x/2+x**5+y**3)*np.exp(-x**2-y**2)
n = 256
x = np.linspace(-3,3,n)
y = np.linspace(-3,3,n)
X,Y = np.meshgrid(x,y)
plt.axes([0.025,0.025,0.95,0.95])
plt.contourf(X, Y, f(X,Y), 8, alpha=.75, cmap=plt.cm.hot)
# contour expects the plural keyword linewidths
C = plt.contour(X, Y, f(X,Y), 8, colors='black', linewidths=.5)
plt.clabel(C, inline=True, fontsize=10)
plt.xticks([]), plt.yticks([])
plt.show()
n = 20
Z = np.ones(n)
Z[-1] *= 2
plt.axes([0.025,0.025,0.95,0.95])
plt.pie(Z, explode=Z*.05, colors = ['%f' % (i/float(n)) for i in range(n)])
plt.gca().set_aspect('equal')
plt.xticks([]), plt.yticks([])
n = 8
X,Y = np.mgrid[0:n,0:n]
T = np.arctan2(Y-n/2.0, X-n/2.0)
R = 10+np.sqrt((Y-n/2.0)**2+(X-n/2.0)**2)
U,V = R*np.cos(T), R*np.sin(T)
plt.axes([0.025,0.025,0.95,0.95])
plt.quiver(X,Y,U,V,R, alpha=.5)
plt.quiver(X,Y,U,V, edgecolor='k', facecolor='None', linewidth=.5)
plt.xlim(-1,n), plt.xticks([])
plt.ylim(-1,n), plt.yticks([])
ax = plt.axes([0.025,0.025,0.95,0.95], polar=True)
N = 20
theta = np.arange(0.0, 2*np.pi, 2*np.pi/N)
radii = 10*np.random.rand(N)
width = np.pi/4*np.random.rand(N)
bars = plt.bar(theta, radii, width=width, bottom=0.0)
for r, bar in zip(radii, bars):
    bar.set_facecolor(plt.cm.jet(r/10.))
    bar.set_alpha(0.5)
ax.set_xticklabels([])
ax.set_yticklabels([])
# savefig('../figures/polar_ex.png',dpi=48)
plt.show()
eqs = []
eqs.append((r"$W^{3\beta}_{\delta_1 \rho_1 \sigma_2} = U^{3\beta}_{\delta_1 \rho_1} + \frac{1}{8 \pi 2} \int^{\alpha_2}_{\alpha_2} d \alpha^\prime_2 \left[\frac{ U^{2\beta}_{\delta_1 \rho_1} - \alpha^\prime_2U^{1\beta}_{\rho_1 \sigma_2} }{U^{0\beta}_{\rho_1 \sigma_2}}\right]$"))
eqs.append((r"$\frac{d\rho}{d t} + \rho \vec{v}\cdot\nabla\vec{v} = -\nabla p + \mu\nabla^2 \vec{v} + \rho \vec{g}$"))
eqs.append((r"$\int_{-\infty}^\infty e^{-x^2}dx=\sqrt{\pi}$"))
eqs.append((r"$E = mc^2 = \sqrt{{m_0}^2c^4 + p^2c^2}$"))
eqs.append((r"$F_G = G\frac{m_1m_2}{r^2}$"))
plt.axes([0.025,0.025,0.95,0.95])
for i in range(24):
    index = np.random.randint(0, len(eqs))
    eq = eqs[index]
    size = np.random.uniform(12, 32)
    x, y = np.random.uniform(0, 1, 2)
    alpha = np.random.uniform(0.25, .75)
    plt.text(x, y, eq, ha='center', va='center', color="#11557c", alpha=alpha,
             transform=plt.gca().transAxes, fontsize=size, clip_on=True)
plt.xticks([]), plt.yticks([])
# savefig('../figures/text_ex.png',dpi=48)
plt.show()
Matplotlib is a fundamental plotting library, and many other plotting libraries are built on top of it. Let's check out one of them, seaborn.
import seaborn as sns
tips = sns.load_dataset("tips")
sns.violinplot(x = "total_bill", data=tips)
seaborn is based on matplotlib, so we can use matplotlib features with seaborn. For instance:
plt.figure(figsize=(8,4), dpi=80)
tips = sns.load_dataset("tips")
sns.violinplot(x = "total_bill", data=tips)
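This works because seaborn's axes-level functions draw on a regular matplotlib Axes and return it. Below is a minimal sketch of that idea; it uses a small hand-made DataFrame in place of the tips dataset so that it runs offline, and the numbers, title and label are made up. A headless "Agg" backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# hypothetical stand-in for the tips dataset
df = pd.DataFrame({"total_bill": [10.3, 21.0, 23.7, 16.9, 24.6, 25.3, 8.8, 26.9]})

fig, ax = plt.subplots(figsize=(8, 4), dpi=80)
returned = sns.violinplot(x="total_bill", data=df, ax=ax)  # draw on our own Axes

# the returned object is the same matplotlib Axes, so the usual methods apply
ax.set_title("Distribution of total bill")
ax.set_xlabel("Total bill")
print(returned is ax)

plt.close("all")
```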
plt.figure(figsize=(7,7), dpi=80)
# Load iris data
iris = sns.load_dataset("iris")
# Construct iris plot
sns.swarmplot(x="species", y="petal_length", data=iris)
titanic = sns.load_dataset("titanic")
# Set up a categorical bar plot (catplot replaces factorplot, which was removed from recent seaborn versions)
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic, kind="bar", palette="muted", legend=False)
seaborn ships with five built-in styles: white, dark, whitegrid, darkgrid, and ticks.
sns.set() # default seaborn style
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic, kind="bar", palette="muted", legend=False)
sns.set_style('whitegrid') # white background with grid lines
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic, kind="bar", palette="muted", legend=False)
sns.set_style('dark') # dark background without grid lines
titanic = sns.load_dataset("titanic")
g = sns.catplot(x="class", y="survived", hue="sex", data=titanic, kind="bar", palette="muted", legend=False)
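sns.set_style() changes the style for every plot that follows. If a style is wanted for a single plot only, one option is the sns.axes_style() context manager, sketched below with made-up data and a headless "Agg" backend as assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

# the style applies only to axes created inside the with block
with sns.axes_style("whitegrid"):
    fig, ax = plt.subplots(figsize=(8, 4), dpi=80)
    ax.plot(X, np.sin(X))

# axes_style() also returns the style parameters, which can be inspected like a dict
grid_in_white = sns.axes_style("white")["axes.grid"]
grid_in_whitegrid = sns.axes_style("whitegrid")["axes.grid"]
print(grid_in_white, grid_in_whitegrid)

plt.close("all")
```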
Check out the DataCamp Seaborn Tutorial.
The objective of this course is to introduce Python's basic plotting library. One important point: you do not need to grasp every detail of these features. This course should give you an intuition for how to plot data. Get into the habit of looking up what you need in the documentation, or on Google :).
Check out other data visualization libraries: 10 Useful Python Data Visualization Libraries for Any Discipline