View the iPython notebook version of this post. (Much nicer formatting)
In finance, market data is price and trade data for a given instrument like a
stock, currency pair, or futures contract. Often times this data is visualized
as a chart of historical data.
Python is an excellent language for working with financial data. Python syntax
is compact for data exploration using an interactive shell like iPython. There
is also strong support for downloading, manipulating, and visualizing financial
market data through popular open source libraries like Pandas and Matplotlib.
# download free end of day historical stock data # from yahoo finance using pandas import pandas.io.data as web from datetime import datetime end = datetime.now() start = datetime(end.year - 5, end.month, end.day) df = web.DataReader("SPY", 'yahoo', start, end) print df.tail()
Open High Low Close Volume Adj Close
Date
2014-06-05 193.41 194.65 192.70 194.45 92103000 194.45
2014-06-06 194.87 195.43 194.78 195.38 78696000 195.38
2014-06-09 195.35 196.05 195.17 195.58 65119000 195.58
2014-06-10 195.34 195.64 194.92 195.60 57129000 195.60
2014-06-11 194.90 195.12 194.48 194.92 68711600 194.92
[5 rows x 6 columns]
The above is a Pandas DataFrame, a two-dimensional tabular, column-oriented data
structure with rich, high-performance time series functionality built on top of
NumPy’s array-computing features. A DataFrame provides many of the capabilities
of a spreadsheet and relational database with flexible handling of missing data
and integration with Matplotlib for visualization.
# summary statistics accross the whole DataFrame df.describe()
Open | High | Low | Close | Volume | Adj Close | |
---|---|---|---|---|---|---|
count | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1.258000e+03 | 1258.000000 |
mean | 136.639539 | 137.375763 | 135.841924 | 136.679245 | 1.685675e+08 | 130.850079 |
std | 25.695177 | 25.643999 | 25.745236 | 25.700838 | 7.963665e+07 | 28.457251 |
min | 87.700000 | 88.490000 | 87.000000 | 87.960000 | 3.967750e+07 | 79.860000 |
25% | 116.085000 | 117.275000 | 115.080000 | 116.295000 | 1.153016e+08 | 107.932500 |
50% | 132.555000 | 133.220000 | 131.640000 | 132.500000 | 1.524652e+08 | 125.405000 |
75% | 154.820000 | 155.540000 | 154.150000 | 154.907500 | 2.046574e+08 | 151.655000 |
max | 195.350000 | 196.050000 | 195.170000 | 195.600000 | 7.178287e+08 | 195.600000 |
8 rows × 6 columns
Slicing a DataFrame’s column yields a Series that can be operated on alone as
seen below.
# Closing price for most recent 5 trading days df[['Close']].tail()
Close | |
---|---|
Date | |
2014-06-05 | 194.45 |
2014-06-06 | 195.38 |
2014-06-09 | 195.58 |
2014-06-10 | 195.60 |
2014-06-11 | 194.92 |
5 rows × 1 columns
# volume statistics vol = df[['Volume']] print "Min: %s Max: %s Average: %s" % (vol.min().values[0], vol.max().values[0], vol.mean().values[0])
Min: 39677500 Max: 717828700 Average: 168567548.649
A wrapper around Matplotlib produces preformatted two-dimensional charts.
# plot the historical closing prices and volume using matplotlib plots = df[['Close', 'Volume']].plot(subplots=True, figsize=(10, 10)) plt.show()
# chart a basic 50 period moving average of the closing price import pandas as pd df['ma50'] = pd.rolling_mean(df['Close'], 50) df['ma200'] = pd.rolling_mean(df['Close'], 200) plots = df[['Close', 'ma50', 'ma200']].plot(subplots=False, figsize=(10, 4)) plt.show()
For more details on Pandas and matplotlib see: