R – Pivot Billions (https://pivotbillions.com)

Real Net Profit: 150% in just 4 Months
https://pivotbillions.com/real-net-profit-150-in-just-4-months/ (Fri, 08 Feb 2019)

 

Developing a currency trading model that is profitable after commissions, using Pivot Billions and R.

Needle, meet haystack. Searching for the right combination of features to build a consistent trading model can be quite difficult and takes many, many iterations. By incorporating Pivot Billions and R into my research process, I was able to dramatically improve the efficiency of each iteration, making it possible to actually find that needle in the haystack. Pivot Billions provided the raw power and scalability, while R provided the higher-level manipulations and processes that allowed me to dive deep into my financial data and start to understand the underlying trends.

Utilizing Pivot Billions’ accurate financial backtesting simulator, I was able to quickly test each version of my model as I developed it and see how it would perform in the real market. From testing initial, general trading strategies to exploring individual and grouped features to see their distribution in my data and their effect on the trading strategies, my research process made heavy use of both tools. Adding features easily across all 143 million rows of my data in Pivot Billions, and being able to access, test, and simulate the effect of trading on those features from within my R code, led to a very promising model ready for live trading.
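
The R side of each iteration can stay very small. Below is a minimal sketch of one pass of that loop, where pb_add_column() and pb_simulate() are hypothetical stand-ins for the Pivot Billions column-creation and backtesting calls (the real interface may differ):

    # pb_add_column() and pb_simulate() are hypothetical stand-ins for the
    # Pivot Billions column-creation and backtesting interface, not its real API.
    candidate_features <- list(
      mom_30 = "close - close_30min_ago",   # illustrative feature formulas
      spread = "ask - bid"
    )

    results <- data.frame(feature = names(candidate_features), net_profit = NA_real_)

    for (i in seq_along(candidate_features)) {
      pb_add_column(name    = names(candidate_features)[i],
                    formula = candidate_features[[i]])        # add the feature across all 143M rows
      sim <- pb_simulate(strategy = "baseline",
                         feature  = names(candidate_features)[i])
      results$net_profit[i] <- sim$net_profit                 # keep the simulated profit for ranking
    }

    results[order(-results$net_profit), ]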

After implementing this model in my real live trading account, I was able to achieve over 150% net profit in just four months! While there are still some small drawdowns, the overall profit is very consistent and reaches strong profitability in a short amount of time.

I am continuing to trade this model and follow its performance. In the meantime, I am working on minimizing its drawdowns and maximizing my profit by incorporating AI. Check out my Pivot Billions and Deep Learning post to see some of my preliminary results.

R NewYorkers Feeling the Holiday Spirit? Here’s Your Tip
https://pivotbillions.com/r-newyorkers-feeling-the-holiday-spirit-heres-your-tip/ (Wed, 09 Jan 2019)

The holiday season brings with it a degree of cheer and joy that many claim makes people act friendlier toward each other. I wanted to see how this effect translates into action, so I decided to look into tips for New York green taxis both during the holiday season and during the rest of the year. To start, I streamed all of the green taxi data files from the public NYC Taxi and Limousine Commission Trip Record Data for 2017-07-01 to 2018-06-30 (the most recent year of green taxi data) into Pivot Billions and enhanced the data with two new columns: holidayseason and tip_percent.
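
For reference, both columns are simple row-level rules. Here is a rough R equivalent, assuming the standard TLC column names lpep_dropoff_datetime, tip_amount, and fare_amount; the exact tip formula and holiday window used inside Pivot Billions may differ:

    library(data.table)

    # one month of the public TLC green taxi data (file name follows the TLC naming convention)
    trips <- fread("green_tripdata_2017-12.csv")

    # tip as a percentage of the fare (assumed formula; zero-fare rows would need filtering)
    trips[, tip_percent := 100 * tip_amount / fare_amount]

    # flag trips in an assumed holiday window, roughly late November through New Year's
    trips[, holidayseason := {
      md <- format(as.Date(lpep_dropoff_datetime), "%m-%d")
      as.integer(md >= "11-20" | md <= "01-02")
    }]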

Many rows weren’t relevant to this analysis since cash payments don’t carry tip records, so I filtered out cash payments using the payment_type column in Pivot Billions, bringing the total down to roughly 5 million rows.

To dive into the data, I used Pivot Billions’ pivot feature to quickly reorganize all of the filtered data by where the passengers were dropped off (DOLocationID) and whether the trip occurred during the holiday season. My more than 9 million original rows of data were now condensed into a much more manageable 513-row summary. After downloading this new view of the data from Pivot Billions, I switched my focus to visualizing and analyzing it in R.
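
The same reorganization can be reproduced in R on a smaller extract with a grouped summary. Here is a sketch with dplyr, reusing the columns from the snippet above (cash is payment_type 2 in the TLC data dictionary):

    library(dplyr)

    dolocation_summary <- trips %>%
      filter(payment_type != 2) %>%                    # drop cash trips, which carry no tip records
      group_by(DOLocationID, holidayseason) %>%
      summarise(avg_tip_percent = mean(tip_percent, na.rm = TRUE),
                trips = n(),
                .groups = "drop")                      # one row per drop-off zone and season flag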

Now that the data was shrunk down to a size R can easily handle, I loaded the Taxi Zone Shapefile and my newly downloaded DOLocationID_holiday_tips.csv file into R. This was a simple process: I copied the shapefile from the data source and the Pivot Billions-processed file onto my machine running R, then joined them by setting LocationID equal to DOLocationID.

I then defined a new metric, “Holiday Effect,” that tracks the percentage difference in average tips between the holiday season and the rest of the year, and added some supporting information to make the data informative and explorable. The result was a very clear and powerful visualization of the green taxi data.
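
A sketch of that join and the Holiday Effect calculation in R, using the sf package (the post doesn’t name the spatial library, and the column names in the exported CSV are assumed to match the summary sketched above):

    library(sf)
    library(dplyr)
    library(tidyr)

    zones <- st_read("taxi_zones/taxi_zones.shp")      # Taxi Zone Shapefile from the TLC site
    tips  <- read.csv("DOLocationID_holiday_tips.csv") # downloaded from Pivot Billions

    # put holiday vs. non-holiday average tips side by side, then take the percentage difference
    effect <- tips %>%
      pivot_wider(id_cols = DOLocationID,
                  names_from = holidayseason,
                  values_from = avg_tip_percent,
                  names_prefix = "season_") %>%
      mutate(holiday_effect = 100 * (season_1 - season_0) / season_0)

    # join onto the zone polygons (LocationID in the shapefile matches DOLocationID) and map it
    zones_effect <- left_join(zones, effect, by = c("LocationID" = "DOLocationID"))
    plot(zones_effect["holiday_effect"])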

It is immediately clear that some regions show far more positive holiday effects (green areas) than negative effects (orange areas), and vice versa. Utilizing R’s powerful indexing abilities, it is easy to narrow the data down by location and explore which areas of New York experience the effect the most. The Bronx and Brooklyn experience more negative effects, whereas Queens is evenly split between positive and negative. Manhattan and Newark Airport, however, have a much higher proportion of positive effects during the holiday season.

Though most of New York is affected by the holidays, for better or worse, people headed to Manhattan and Newark Airport seem to be feeling the holiday spirit the most.

To create this visualization yourself, you can download my R code, DOLocationID_holiday_tips.csv, and the public data’s shapefile. You can also run the same code with "DOLocationID_holiday_tips.csv" replaced by "PULocationID_holiday_tips.csv" and DOLocationID replaced by PULocationID to view the holiday effect on tips by pick-up location.

Pivot Billions and Deep Learning enhanced trading models achieve 30% net profit
https://pivotbillions.com/pivot-billions-and-deep-learning-enhanced-trading-models-achieve-100-net-profit/ (Mon, 24 Dec 2018)

Deep Learning has revolutionized the fields of image classification, personal assistance, competitive board game play, and many more. The financial currency markets, however, have been surprisingly slow to adopt it. In our efforts to create a profitable and accurate trading model, we came upon the question: what if financial currency data could be represented as an image? The answer: it can!

There are many ways to reshape currency data into an image, but each requires a great deal of processing power and research. We powered our analysis with Pivot Billions, which allowed us to load and analyze our data quickly and to reshape it into an image using its custom module. While we could have reshaped the data so that each row held the last X ticks for a tick data point or the last Y minutes for each minute of data, we already had working models from our initial research without deep learning.

See prior posts:

Therefore, we decided to take the signals from one of our models and enhance them with the power of deep learning and AI.

By incorporating Pivot Billions into our Keras workflow, we were immediately able to prepare the data so that each signal’s row contained the 100 minutes of history prior to that signal, and to set up our training and testing datasets. We then loaded this data into Keras and worked on developing our deep learning model. We often needed to change which features’ 100-minute histories were stored and fed into the deep learning, so we made full use of Pivot Billions’ speed to shorten our iteration turnaround time. After many iterations, we arrived at a deep learning model that learned our base model’s weaknesses and strengths and accurately predicted the profitable signals.
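
A minimal sketch of the Keras side, assuming the prepared arrays x_train/x_test hold the 100 one-minute observations preceding each signal for a handful of features, and y_train/y_test mark whether the signal was profitable. The architecture below is illustrative, not our production model:

    library(keras)

    n_minutes  <- 100     # history window prepared in Pivot Billions
    n_features <- 4       # e.g. price and a few engineered columns (assumed)

    model <- keras_model_sequential() %>%
      layer_conv_1d(filters = 32, kernel_size = 5, activation = "relu",
                    input_shape = c(n_minutes, n_features)) %>%
      layer_max_pooling_1d(pool_size = 2) %>%
      layer_flatten() %>%
      layer_dense(units = 32, activation = "relu") %>%
      layer_dense(units = 1, activation = "sigmoid")   # probability the signal is worth taking

    model %>% compile(optimizer = "adam",
                      loss = "binary_crossentropy",
                      metrics = "accuracy")

    history <- model %>% fit(x_train, y_train,
                             epochs = 20, batch_size = 256,
                             validation_data = list(x_test, y_test))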

Using this enhanced model, we could achieve much more stable profit throughout our data. We took raw signals that looked like this:

which were profitable but highly volatile due to periods of noisy and underperforming trades, and turned them into this:

The periods of drawdown are greatly reduced and our profitability is much more stable, allowing us to achieve nearly 30% profit in less than 7 months. We’ll continue our research, so look forward to another blog post featuring even better AI-enhanced models!

 

Powering Insight Through Massive Optimization
https://pivotbillions.com/powering-insight-through-massive-optimization/ (Thu, 08 Nov 2018)

 

Data comes with a price. Accuracy comes with an even greater price. And the two together can demand enormous resources. That’s why it is important to make your research process as efficient as possible and to use any tools that can help you. This is particularly true if you are trying to develop a currency trading model based on highly granular, tick-by-tick data.

The Pivot Billions team has been working on this use case because it’s an interesting and challenging real-world application of Pivot Billions. Each currency pair we wanted to model starts with historical tick data covering a 5-year period, roughly 140 million rows that need to be analyzed. That is already large, and it’s for just one currency pair; our goal was to develop models for multiple currencies.

Incorporating Pivot Billions allowed me to access the full granularity of my currencies’ tick data throughout my research cycle. From loading and accessing all of the data, to enhancing it with additional features, to exploring its distribution across those features, and even pivoting the data by those features to explore their effect, my massive dataset became something I could actually tame.

Even when you have access to and control of your data, developing consistent trading models takes many, many iterations. What had taken me minutes or even hours in other tools was reduced to mere seconds using Pivot Billions. While waiting a few minutes for results may be acceptable for a single test simulation, it becomes an insurmountable hurdle across many iterations and can prevent any optimization or extensive modeling from being run on large amounts of data.

For example, if your test simulation on 1 year of tick data takes 60 seconds on average and you explore 1,000 trading models, it will take roughly 17 hours just to run them. If you need to run the same simulations over a 5-year period and assume run time grows linearly, that becomes about 85 hours. This is of course an oversimplification, and simulations can take much longer depending on the complexity of the model, but in general you can see that the time to run just 1,000 models can be ridiculously long.
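
Those back-of-the-envelope numbers are easy to verify (the 5-year figure assumes perfectly linear scaling, as stated):

    sims             <- 1000    # candidate trading models
    secs_per_1yr_sim <- 60      # average run time on 1 year of tick data

    hours_1yr <- sims * secs_per_1yr_sim / 3600   # ~16.7 hours, i.e. roughly 17
    hours_5yr <- hours_1yr * 5                    # ~83 hours, close to the 85 quoted
    c(one_year = hours_1yr, five_years = hours_5yr)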

That's why we developed our own time series database and trading module, running on the Pivot Billions data processing platform, to achieve extremely fast simulation times. This reduced the run time for a simulation over 5 years of tick data to about 10 seconds on average, so the 5-year workload that took about 85 hours in other trading simulators takes about 7 hours in Pivot Billions. That's over a 90% reduction in total run time! And that was early in our development; since then we've made the simulation almost 1,000 times faster.

With this capability, I was finally able to dive fully into my features and run optimizations from R, incorporating hundreds of thousands of simulations across hundreds of millions of raw tick data points, to find trading models that kept making a profit over a five-year timeframe.
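
The optimization itself is just a parameter sweep driven from R. A sketch of the pattern, where run_simulation() is a hypothetical wrapper around the Pivot Billions trading module and the parameter names are purely illustrative:

    # run_simulation() is a hypothetical stand-in for the call that drives the
    # Pivot Billions trading module; the real interface and parameters differ.
    grid <- expand.grid(entry_threshold = seq(0.0002, 0.0010, by = 0.0002),
                        take_profit     = c(10, 20, 30),   # pips
                        stop_loss       = c(10, 20, 30))   # pips

    grid$net_profit <- vapply(seq_len(nrow(grid)), function(i) {
      run_simulation(pair            = "EUR/USD",
                     entry_threshold = grid$entry_threshold[i],
                     take_profit     = grid$take_profit[i],
                     stop_loss       = grid$stop_loss[i])$net_profit
    }, numeric(1))

    head(grid[order(-grid$net_profit), ])   # most profitable parameterizations first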

Even after developing my trading model for one currency, I was able to seamlessly transition to another currency and apply the same process. Quickly diving into the many possible features and optimizing a vast number of parameterizations to find a model that works consistently was even easier the second time around, and soon I had my second model. This process continued until I had four profitable, consistent, and viable models for the EUR/USD, AUD/JPY, EUR/JPY, and USD/JPY currency pairs.

The whole process can be accomplished using the free Pivot Billions Docker image. If you need help using these features, contact us at info@pivotbillions.com.

Streamlining EDA (exploratory data analysis) with Pivot Billions enhances workflows in R.
https://pivotbillions.com/streamlining-eda-exploratory-data-analysis-with-pivot-billions-enhances-workflows-in-r/ (Wed, 31 Oct 2018)

Incorporating Pivot Billions into your R analysis workflow can dramatically improve the research cycle and your ability to get results.

R is a great statistical analysis tool that a wide variety of data analysts use to analyze and model data. But R is limited by how much data it can load onto your machine, and it tends to slow down dramatically past a certain number of data points. To achieve faster turnaround times when using R, we incorporate Pivot Billions into the workflow for fast data exploration and enhancement.

R users will appreciate that even after the data is loaded, you can still modify and interact with it on the fly from Pivot Billions' interface. Adding new features, such as calculations based on existing columns, is handled through a column creation function directly accessible from the Pivot Billions UI. This lets any user quickly add features to the data even after it has been imported, and then easily export the data to R.

As a real-world example, we loaded over 444 MB of EUR/USD currency tick data, approximately 9 million rows, with the goal of predicting price increases in the currency pair. Using Pivot Billions installed locally on a laptop, we were able to explore the raw data files, add transformation rules to enhance the data, and load all of it into the Pivot Billions in-memory database in less than two minutes. In this case, Pivot Billions acts as both a data warehouse and an EDA tool.

From the report interface, we quickly added some new features, including:

  1. (delta_maxmin_300) - the difference between the maximum and minimum close prices over the last 300 minutes
  2. (delta_CO) - the difference between the current minute’s close and open prices
  3. (delta_NcC) - the difference between the next minute’s (future) close price and the current close price

This last feature is the value we’re most interested in being able to determine. If we can discover rules that govern whether the price will go up in the next minute, we can apply that to our currency trading strategy.
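
All three features are simple window operations on minute bars. Here is a sketch in R with dplyr and zoo, assuming a data frame named bars with one row per minute and open and close columns (an assumed layout, not the exact export):

    library(dplyr)
    library(zoo)

    bars <- bars %>%
      mutate(
        delta_maxmin_300 = rollapply(close, width = 300,
                                     FUN = function(x) max(x) - min(x),
                                     align = "right", fill = NA),  # close-price range over the last 300 minutes
        delta_CO  = close - open,                                  # current minute's close minus open
        delta_NcC = lead(close) - close                            # next minute's close minus current close
      )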

To start working with the data in R, we simply download it, with all of the newly added features, from Pivot Billions and read it into R. By analyzing and visualizing the data in R, we can quickly drill down into the key features and how they relate to the currency price. We explore the relationship between the mid-term max-min spread (delta_maxmin_300), the close-open difference (delta_CO), and the change from the current close to the next minute's close (delta_NcC). It appears that selecting certain thresholds for these features can accurately predict an increase in the close price for the next minute.
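
A quick way to sanity-check such a rule in R is to filter on candidate thresholds and measure how often the next-minute close actually rose. The cutoffs below are placeholders, not the thresholds we settled on:

    bars %>%
      filter(delta_maxmin_300 > 0.0008,     # placeholder thresholds for illustration only
             delta_CO > 0) %>%
      summarise(p_next_close_up = mean(delta_NcC > 0, na.rm = TRUE),
                n_minutes       = n())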

To further validate our findings, we applied the same process to a much larger data set of 135 million rows (5 years of currency tick data) on a memory-optimized EC2 instance in AWS. The results showed the same behavior on the larger data set.

Although we ultimately derive our predictive model in R, we used Pivot Billions as our workhorse to get the large quantity of data prepared, enhanced, and into a format that can be used quickly and efficiently within R. To follow along and view the R code for this example use case, go through our Pivot Billions and R Visualization Demo.

Blazing Fast Financial Backtesting from R
https://pivotbillions.com/blazing-fast-financial-backtesting-from-r/ (Thu, 25 Oct 2018)

 

As a data scientist developing and testing financial models in R, I have consistently run into data size limitations, the need for large or distributed compute clusters, and long waits for my results to be processed and returned. That's why I was genuinely impressed with how our recently released Docker image of Pivot Billions enables me to perform extremely fast backtesting of financial models from R in real time. The sheer power and efficiency of the tool was surprising: it was able to backtest my model on over 140 million rows of data in less than 9 seconds. The fact that it's a free solution makes it a big plus for the data science community.

Pivot Billions not only allowed simulations across massive amounts of financial data from the comfort of R, it did so from a free Docker image running on a single r4.large Amazon EC2 instance ($0.133/hour). This made it a lot simpler to develop my financial model, as each iteration of my testing finished in seconds and the whole process was much more interactive.

The simplicity of the R integration let me control my model and testing parameters, and view and analyze the results, directly from R. Behind the scenes, our custom-built time series database and trading module handled the brunt of the work. Having access to the whole 5.5 years of raw currency tick data let me see the full granularity of my trading strategy in real time, greatly enhancing my development process.
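
Timing a run from R is as simple as wrapping the call in system.time(). In the sketch below, pb_backtest() is a placeholder for the backtesting call provided in the linked R script; the real function name and arguments live in that script:

    # pb_backtest() is a placeholder; see the Backtesting R Script linked below
    # for the actual call and its parameters.
    elapsed <- system.time(
      result <- pb_backtest(pair = "EUR/USD",
                            take_profit = 20, stop_loss = 20)   # illustrative parameters
    )

    elapsed["elapsed"]    # wall-clock seconds for the full ~143M-row simulation
    result$net_profit     # summary returned to R for analysis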

Enhancing R with Pivot Billions created a pretty powerful combination.

If you want to try it out for yourself, I've posted a link to the R script below. Just download it and open it in RStudio.

Links:
Backtesting R Script

Details of Setup:

  • 1 memory-optimized r4.large Amazon EC2 instance: $0.133 / Hour
  • 7.5 GB used
  • 5.5 Years of Raw Tick Data Analyzed
  • 143,143,685 Rows of Data Analyzed in 8.37 Seconds

Backtesting Detail:
The time series database and trading module were developed internally as a project specifically for backtesting financial models quickly, simply, and accurately. Both the database and the module have been verified against real-world trades through a broker and allow risk-free exploration and testing of your models in real time. They are included in the Docker version of Pivot Billions for free. If you need help using these features, contact us at info@pivotbillions.com.
