Tuesday, July 7, 2015

Aspiring Hari Seldon - Developing Price Predictions (Part 1)

With the Prosper Market Show on semi-hiatus for the summer, I'm trying to put the time toward developing some new utilities for analysis.  Though I have plenty on the to-do list to work through, I wanted to toy around with some future-looking tools to help illustrate some of the intuition I see moving forward.

Specifically, after looking at some examples, I wanted to try out Monte Carlo analysis.  The idea is that if you take a decent prediction methodology and generate thousands of predictions, you can average out the noise to get at the true signal.

Step 1: Generating Predictions

Knowing nothing at the start, I picked Geometric Brownian Motion (GBM) as the predictor function.  It's relatively popular, well supported, and handily available in an R library (the sde package).  I won't even begin to claim I'm an expert on what's going on under the hood, but the general concept is pretty straightforward:

Next Price = Current Price * (1 + drift + (expected variance)*(normal random number))

The variance can be tuned as a percentage to express how far any individual roll can move, i.e. how widely the random number generator will vary.  Finding the right numbers took a bit of hackery, but if you want to see the full methodology, check out the raw code here.  Also, it's worth noting that GBM does take a drift/interest term into account, but I've zeroed out that number for the time being.
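For reference, here's a minimal sketch of the kind of call the script makes; the starting price, volatility, and horizon below are illustrative placeholders, not the values from the real code:

    library(sde)

    start_price <- 850e6   # placeholder starting price in ISK, not from the real data
    sigma       <- 0.02    # daily "variance" knob, expressed as a percentage move
    days        <- 60      # how far forward to walk

    # GBM() walks one random path of N steps; with T = N, each step is one day.
    # r is the drift/interest term, zeroed out as noted above.
    path <- GBM(x = start_price, r = 0, sigma = sigma, T = days, N = days)
    plot(path, xlab = "days forward", ylab = "predicted price")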

Using CREST data as a basis, we can pick a point in time and generate some predictions looking forward.  But due to the random-variance methodology, some predictions are good, and others are not:


These are 3 independent predictions using low/med/high variance seeds in the GBM function.  The "high" line trending down and the "low" line trending up is simply a matter of which rolls happened to win.  Looking more closely, the bounds on the low prediction are much tighter, while the high prediction is much wilder.  Given another set of rolls, which prediction looks best will change.

The weakness to be aware of is compounding error.  Because the GBM function is a recursive, random guessing function, once predictions start to stray off into the bushes there's little reason to expect they will come back.  The model starts walking from a single price, and the further it walks, the wider the end results will land.

Step 2: Generating a Lot of Predictions

Happy with the general behavior of the model, I moved on to generating a lot of guesses.  The randomness should wash out given enough trials.
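In R, the Monte Carlo step is just a loop over the single-path sketch from earlier; the trial count here is arbitrary:

    # walk many independent paths from the same starting point,
    # then summarize across trials for each day
    trials <- 1000
    paths  <- replicate(trials, as.numeric(GBM(x = start_price, r = 0,
                                               sigma = sigma, T = days, N = days)))

    # rows = days, columns = trials; collapse to a low/median/high band per day
    daily_band <- apply(paths, 1, quantile, probs = c(0.25, 0.5, 0.75))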

Though I was able to wash out the randomness, the end result is not nearly as useful as I was expecting:


Summarizing the predictions by day gives us a much different picture.  The randomness is gone, but we're left with a less useful "cone of possibility".  Though a "zone of possibility" is definitely the end result we're looking for, this picture is not a useful automation of that concept: the median prediction is roughly "the price will stay the same", which is not a useful prediction for most items.

What is going on?  Well, here we can actually see the GBM function for what it is and why it is breaking down for our predictions:
  1. It assumes +/- variation is equally probable.  The rolls come from a normal distribution centered on zero, so the predictions spread symmetrically around the starting price over time.
  2. It takes no history into account by default.  The function takes a single starting price and a single variance parameter, and understands nothing about max/min variance.
The equal-probability assumption is particularly absurd for PLEX, where the movement has been largely positive.


Just looking at the variations for the last year, almost 60% of the movements were positive.  We could further enhance our predictor by looking at a more recent window.
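The check itself is one line; assuming a vector of daily prices pulled from the CREST history (called prices here, a stand-in name), it looks something like:

    # fraction of day-over-day movements that were positive over the lookback
    returns <- diff(prices) / head(prices, -1)   # daily % change
    mean(returns > 0)                            # ~0.6 for PLEX over the last year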

Step 3: Back to the Drawing Board

After chewing on this problem for a few days, I do have some things to follow up on.  Though there are other models to try in the sde package, I think we have some options for getting more useful predictions out.

Convolve Predictions?

This is one I've gone back and forth on.  I'd like to be able to "sum" together the predictions to get the general "tone" of what is going on.  The trouble is that it's been 5 years since I took a digital signals class, and my attempts so far have just been guesses.  Though R is ready out-of-the-box to do FFTs/convolutions, I need to better understand what I'm asking the program to do; currently, all attempts run off to infinity.
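For my own notes, the shape of the thing I'm trying to compute looks roughly like this: bin two sets of end-of-window prices onto a shared grid and convolve the normalized counts to get the distribution of their sum.  Treat this as a guess at the right question, not a working method:

    # rough sketch of the convolution idea, reusing the Monte Carlo paths from earlier
    final  <- paths[days + 1, ]                      # end-of-window prices
    grid   <- seq(min(final), max(final), length.out = 100)
    h1     <- hist(final[1:500],    breaks = grid, plot = FALSE)$counts
    h2     <- hist(final[501:1000], breaks = grid, plot = FALSE)$counts
    # note: R's convolve() needs rev() on one input to match the textbook definition
    summed <- convolve(h1 / sum(h1), rev(h2 / sum(h2)), type = "open")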

Exponential Predictions Using Volatility

After seeing the outcomes of the GBM experiments, a different trend pops out instead.  If I just pick a low/med/high variance out of the distribution of daily movements, I could generate a better window in the short term: simply project the same %variance forward each day.


Another option is to get witty with the actual high/low bounds to narrow the prediction using our existing price-flagging system.  I just picked the 25th/75th percentiles, but we could narrow those bands with a smarter lookback that characterizes how "wild" the last period has been.
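A rough sketch of the exponential projection, reusing the daily returns computed above; the 25th/75th percentiles and the 30-day horizon are just the first numbers I grabbed:

    # compound a fixed daily % move forward to build low / median / high bands
    q <- quantile(returns, probs = c(0.25, 0.50, 0.75))
    horizon <- 30
    low_band  <- start_price * (1 + q[[1]])^(1:horizon)
    mid_band  <- start_price * (1 + q[[2]])^(1:horizon)
    high_band <- start_price * (1 + q[[3]])^(1:horizon)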

Get Smart: Filter/Augment GBM Outputs

The last option is to roll some of my own ideas into the random-guessing function: use the historical variance as a seed for the guessing function, and/or drop predictions that don't at least fit the last 10 days well before letting them walk forward into the next 10.  I'm not yet convinced either is better than the far simpler exponential predictor above, because I would expect the exponential pattern to still wash out in the end, especially if we stick with a Monte Carlo style approach.

The hope would be to start the random function on a linear path (perhaps taking the last 2 weeks into account), then use the exponential variance as high/low bounds to build a channel.
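In sketch form, the filtering idea would be to start the walks 10 days in the past, keep only the paths that roughly reproduce the actual last 10 days, and treat the survivors as the forward predictions.  The 5% tolerance below is an arbitrary placeholder:

    # walk candidate paths starting from the price 10 days ago
    actual_tail <- tail(prices, 10)
    candidates  <- replicate(trials, as.numeric(GBM(x = actual_tail[1], r = 0,
                                                    sigma = sigma, T = days, N = days)))

    # score each path on how well its first 10 days match reality, keep the good ones
    backcast_err <- apply(candidates[1:10, , drop = FALSE], 2,
                          function(p) mean(abs(p - actual_tail) / actual_tail))
    keepers <- candidates[, backcast_err < 0.05, drop = FALSE]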

EDIT: Using RSI as a prediction filter yielded some interesting results.  There is still some tweaking to do, especially for items that have had a recent spike into very unstable territory, but the initial forays seem promising.
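I'm leaning on the TTR package for the indicator itself; the rule below is only a rough guess at what an RSI filter could look like (textbook 70/30 thresholds, deliberately crude keep/drop logic), not necessarily the final form:

    library(TTR)

    # RSI over the price history; 14 days is the TTR default window
    rsi_now <- tail(RSI(prices, n = 14), 1)

    # crude filter: if the item looks overbought, distrust paths that keep climbing;
    # if oversold, distrust paths that keep falling
    end_move <- paths[days + 1, ] / start_price - 1
    filtered <- if (rsi_now > 70) {
      paths[, end_move <= 0, drop = FALSE]
    } else if (rsi_now < 30) {
      paths[, end_move >= 0, drop = FALSE]
    } else {
      paths
    }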



Conclusions

I'm really disappointed with the outcomes of this project; a lot of articles online made this sound like a magic pill.  Instead, we see the underlying nature of the model once the randomness is removed.  It's not a complete loss, but I was hoping for something a little wittier than a linear fit; CCP Quant has pointed out the forecast package in R for more tinkering.

The nice thing to come out of this exercise has been how quickly I could experiment in R.  Though there are still some considerable hurdles between these experiments and actual inclusion in the show, I was able to pick up some powerful packages, dig into the problem, and iterate quickly.  I continue to be impressed with R as an exploration platform, but I still have some hesitance about integrating it in more powerful ways.