Python, Pandas and Matplotlib 1.2 : Data frames and HDF5 storage

After running into Unicode indexing errors in Python 2.7, I found a solution: installing the Anaconda Python 3.4 distribution. The installer also sets up your system environment variables.

This example shows an easy way to capture the ticker data from btc-e.com for all currency pairs. HDF5 (Hierarchical Data Format, version 5) and the pandas DataFrame prove to be powerful for quick analysis of large data sets.

After you have installed the distribution, open the Anaconda Launcher:

[Image: Anaconda Launcher]

Launch an IPython notebook. A new browser window should open at localhost:8888/tree or similar. Under Files, click the New button and select “Python 3” in the Notebooks section.

To begin, import pandas and the time library by doing the following:

[Screenshot 1: import statements]

Also, set a variable named url to the BTC-e API ticker URL. In this instance we are concerned with capturing data for each pair available on BTC-e.

Next comes the soul of our data capture, pd.read_json().

[Screenshot 2: data capture loop]

I recommend setting the delay with time.sleep(2). The API information on BTC-e states that the data refreshes at 0.5 Hz, that is, once every 2 seconds, so polling any faster should not yield new data.
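The 2-second delay follows directly from the 0.5 Hz refresh rate (1 / 0.5 Hz = 2 s). Below is a minimal sketch of a polling loop built on that delay. The names poll, fetch, handle and max_polls are hypothetical and are not part of the original code; fetch stands in for whatever reads the ticker.

```python
import time

REFRESH_HZ = 0.5             # refresh rate quoted in the text
DELAY_S = 1.0 / REFRESH_HZ   # seconds to wait between polls


def poll(fetch, handle, delay=DELAY_S, max_polls=None):
    """Call handle(fetch()) repeatedly, sleeping `delay` seconds after each poll.

    `fetch` and `handle` are hypothetical callables supplied by the caller.
    With max_polls=None the loop runs until it is interrupted.
    """
    count = 0
    try:
        while max_polls is None or count < max_polls:
            handle(fetch())
            count += 1
            time.sleep(delay)
    except KeyboardInterrupt:
        pass  # interrupting the notebook cell ends the loop cleanly
    return count
```

Passing max_polls is handy while testing in a notebook, since it avoids having to interrupt the kernel to stop the loop.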

We simply read the JSON from the API URL into a data frame named frame and transpose it, so that each row holds one currency pair. Next, we define keylist as the values of the index (the pair names, e.g. 'btc_usd', which are used as the data-frame keys) by setting keylist = frame.index.values.

In the for loop we then separate out each pair, with its index set to the 'updated' series. We then use store.append('key', data) to append each key and data set to pd.HDFStore('filename.h5').
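The read-and-transpose step can be sketched offline as follows. The payload below is a made-up sample, not real BTC-e output: the pair names and the 'updated' field come from the text, while 'high', 'low' and 'last' are assumed field names. In the live loop, the StringIO object would be replaced by the url variable.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample payload shaped like a ticker response:
# one top-level key per currency pair, one nested key per field.
sample_json = """
{
  "btc_usd": {"high": 236.0, "low": 229.5, "last": 230.5, "updated": 1433116800},
  "ltc_usd": {"high": 1.875, "low": 1.75, "last": 1.8125, "updated": 1433116800}
}
"""

# read_json puts the pairs in the columns; transposing makes each pair a row.
frame = pd.read_json(StringIO(sample_json)).transpose()

# The index now holds the pair names, which serve as the storage keys.
keylist = frame.index.values

print(frame)
print(keylist)
```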
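A minimal sketch of that per-pair loop, assuming frame has one row per pair and an 'updated' column. The helper name append_ticker is hypothetical, and the sample frame is made up ('last' is an assumed field name). The store argument can be any object with an append(key, data) method, which is the interface pd.HDFStore provides.

```python
import pandas as pd


def append_ticker(store, frame):
    """Append one row per currency pair to `store`, keyed by the pair name."""
    keylist = frame.index.values
    for key in keylist:
        # Keep a one-row frame for this pair, indexed by its 'updated' value.
        data = frame.loc[[key]].set_index("updated")
        store.append(key, data)


# Hypothetical sample standing in for one transposed ticker reading.
frame = pd.DataFrame(
    {"last": [230.5, 1.75], "updated": [1433116800, 1433116800]},
    index=["btc_usd", "ltc_usd"],
)

# Typical use, assuming the PyTables package is installed:
#     with pd.HDFStore("filename.h5") as store:
#         append_ticker(store, frame)
```

Calling append_ticker once per poll grows each pair's table by one row, so every key ends up holding a time series indexed by 'updated'.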

Once you stop the loop, you can view the data with the following:

pd.read_hdf('filename.h5', 'key')

Use any pair available on BTC-e as the key, such as 'ltc_usd' or 'ppc_usd', to get the logged data.

[Screenshot 3: logged data returned by pd.read_hdf]

Here’s a quick example of plotting the data.

[Figure: plot of the logged data]
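As a rough sketch of such a plot, the snippet below builds a small made-up frame shaped like one pair's logged data and plots it. The 'last' column name and all values are assumptions; with real data, the frame would come from pd.read_hdf instead. The 'updated' index is treated as Unix time in seconds, which is also an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside a notebook

import pandas as pd

# Hypothetical logged data for one pair, indexed by 'updated'.
data = pd.DataFrame(
    {"last": [230.5, 230.75, 229.5, 231.0]},
    index=pd.Index(
        [1433116800, 1433116802, 1433116805, 1433116807], name="updated"
    ),
)

# Assumption: 'updated' holds Unix timestamps in seconds.
data.index = pd.to_datetime(data.index, unit="s")

ax = data["last"].plot(title="btc_usd last price")
ax.set_xlabel("time (UTC)")
ax.set_ylabel("price")
```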

Next we will learn about real-time plotting using matplotlib!

Python, Pandas and Matplotlib 1.1 : Analysis and Storage of Large Time-Series Data from the BTC-e API

First off, I would like to thank Wes McKinney for his great work on the pandas library over the past years. His book Python for Data Analysis (~$25 on Amazon) is fantastic and a great addition to your home library.

This post is an introduction to help you better understand the resources and tools available to you in Python and the pandas library. If you have been following the MATLAB data capture tutorials, you should transition into these Python tutorials fairly easily.

Our main objective is to capture the JSON data from the BTC-e API and continually store and analyze it to our liking. To accomplish this we will be using Data Frames (not to be confused with the data frames found in “R”) and Hierarchical Indexing.

Now, I started off hacking away in Python 2.7 with the following Libraries/Packages/IDE:

  • IPython
  • Pandas
  • Matplotlib
  • Spyder
  • Numpy
  • SQLite3

I recommend downloading the Python(x,y) package.

It will provide you with the following:

  • IPython 2.4.1-10
  • Pandas 0.16.2-15
  • Matplotlib 1.4.3-7
  • Spyder 2.3.5.2-17
    • If you’re comfortable working in the MATLAB environment the Spyder IDE will feel like your home away from home.
  • Numpy 1.9.2-8

Our main working environments will be the Spyder IDE along with instances of IPython Notebooks. I have found that this is a great combination for testing out new strategies.

Once you have gathered the necessary resources for this tutorial, continue on to Part 1.2. If you have any trouble, feel free to leave comments or contact embeddedthought directly.