Sunday, November 16, 2014

How It's Made: Price Flagging

I owe my partner in crime, Etienne Erquilenne, a huge debt for adding a much-needed second pair of hands to this whole Prosper project.  His IRL expertise is completely invaluable, and it has freed me up to accelerate the development schedule measurably.  But, as I've expressed on the blog before, I'm very much a fan of open sourcing.  So let's look under the hood.

Volume Flagging

Volume data was straightforward.  Volumes never go negative and rarely jump by orders of magnitude, so it was pretty easy to wrap the values up into a normal-ish histogram.  Below is Tritanium:
1yr of Tritanium Volumes - The Forge

Not perfectly normal, but close enough that we can use percentiles to check the "sigma levels".
For those who aren't statistics nerds: we can characterize values by how far they deviate from the mean.  Roughly 68% of values should fall within +/- 1 deviation (sigma) of the mean, and roughly 95% within +/- 2 sigma, so anything beyond that is rare.  As we move linearly away from the mean, the number of values we expect to see at that level drops off exponentially.

Since volumes are largely well behaved, I used this principle of sigma flagging to highlight extreme outliers to report for the Prosper show.  
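The sigma-flagging idea above can be sketched in a few lines.  This is an illustrative helper, not Prosper's actual code; the 2-sigma threshold and the toy volumes are assumptions:

```python
from statistics import mean, pstdev

def sigma_flag(volumes, n_sigma=2.0):
    """Flag days whose volume deviates more than n_sigma from the mean.

    Hypothetical sketch of the sigma-flagging principle; the real tool
    may use percentile cuts instead of a raw standard deviation.
    """
    mu = mean(volumes)
    sigma = pstdev(volumes)  # population standard deviation
    return [abs(v - mu) > n_sigma * sigma for v in volumes]

# toy daily volumes with one extreme spike on day 6
flags = sigma_flag([100, 98, 103, 101, 99, 250, 97])
```

Only the spike clears the 2-sigma bar; the ordinary days stay unflagged, which is exactly the "highlight extreme outliers" behavior the show needs.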

Price Flagging

Price values aren't so well behaved, and using the same approach is not going to flag useful data:
Just looking for straight price outliers isn't useful.  The only things that will flag are long rises/declines that represent the extremes of the last year.  We'd like to use the same extremity methodology for prices, but a different approach is required.

Deviation From a Trend

The inspiration came from the Bollinger Band chart.  Simply put, it draws a simple-moving-average trend line and then moving-deviation bands around it (the red lines on the candlestick chart above).  If we instead characterized the distance from that trend, we'd be able to say things like "this is an extreme deviation".

This is a far more "normal" plot.  Also, Etienne rolled in a simple-moving-median to compensate for items whose behavior might be outrageously bimodal after a paradigm shift due to a patch.
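The deviation-from-trend idea can be sketched Bollinger-style: roll a mean (and a median, for patch-shifted items) over the price series and express each close as a distance from the trend in deviations.  The 20-day window is an assumption, not necessarily what the tool uses:

```python
import pandas as pd

def trend_deviation(prices, window=20):
    """Distance of each close from a rolling trend, measured in deviations.

    Sketch of the approach; window size and the median variant are assumptions.
    """
    s = pd.Series(prices, dtype=float)
    sma = s.rolling(window).mean()    # simple-moving-average trend line
    smm = s.rolling(window).median()  # moving median, robust to regime shifts
    sd = s.rolling(window).std()      # moving deviation (the band width)
    return (s - sma) / sd, (s - smm) / sd
```

A flat price series followed by a sudden jump produces a large deviation on the jump day, which is the "extreme deviation" signal we want to flag.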

Unfortunately, without some sort of second filter, we're going to flag everything every week, which isn't useful.  So Etienne added a voting scheme and a "highest votes" binning technique to properly classify the resulting flags.

Results So Far

So far, this is my favorite validation that the pull is working as intended:

Here we see a peak last week, and a drastic crash in progress.  Though we would have reviewed the data anyway (fuel is a forced group in the tools), finding it in the expected flagging group is a great sign.

To explain what I see in the first graph: there's a spike from pre-Phoebe stockpiling, then a rapid dump once the patch hit.  What I also see is a heavy overcorrection in the price, dumping it much lower than really makes sense.  If you were watching this product, this would be a great opportunity to buy in hopes of a snap-back.  The bottom RSI chart reinforces this: closer inspection shows the product crossing heavily into "oversold" territory, strongly signalling an artificially low price.  Of course, balance these price signals against the volume flags (perhaps slightly anemic) to temper expectations.
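For reference, the RSI in that bottom chart compares average gains against average losses over a lookback window; readings below 30 are conventionally read as "oversold".  A minimal sketch using a plain average (Wilder's smoothed variant, which charting tools typically use, differs slightly):

```python
def rsi(closes, period=14):
    """Simple RSI: 100 - 100 / (1 + avg_gain / avg_loss) over the lookback.

    Plain-average sketch for illustration; not necessarily the tool's exact RSI.
    """
    deltas = [b - a for a, b in zip(closes, closes[1:])][-period:]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if losses == 0:
        return 100.0  # no losing days (or flat prices): maximally overbought
    return 100.0 - 100.0 / (1.0 + gains / losses)
```

A steady decline pins the RSI at the bottom of its 0-100 range, which is the "crossing heavily into oversold" shape described above.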

Under the hood, the tool is checking the closing average against the white-dotted moving-average line.  Though the moving average will catch up, right now the distance from the trend is WAY out of whack.  Since the item voted "very abnormally low" for 5 days, along with 2-3 votes for "abnormally high", it was going to end up in the charting group regardless.

Great... but

Now I have a new problem... too much good data.  To keep the outlier segment inside 15 minutes, I have to filter the pick list down to 15-25 items, but Etienne's new tool flagged 500 items, and the true-positive rate is astounding.  For those looking to get into market automation, this methodology is extremely powerful and should boil down opportunities like nothing EVE has seen before.  Though we still lack the means to automate "black swan" events like expansion releases, the flagging methodology is very useful for the active trader.
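One way to whittle 500 flags down to the 15-25 the segment can hold is to rank items by the strength of their winning vote and take the top of the list.  A sketch under assumptions (the ranking key and the cutoff are mine, not the tool's):

```python
def top_picks(flags, n=20):
    """Keep the strongest N flags for the show's 15-minute segment.

    `flags` maps item name -> winning-vote count; both the key and the
    default cutoff are illustrative assumptions.
    """
    ranked = sorted(flags.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, votes in ranked[:n]]

# toy example: rank three flagged items by vote strength, keep the top two
picks = top_picks({'Tritanium': 5, 'Fuel Block': 7, 'Pyerite': 3}, n=2)
```

Any tie-breaking (e.g. by deviation magnitude or ISK volume) could be folded into the sort key.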

Also, for all of its power, we're up against a mismatch between the show format and the goals.  Two of the chief goals of the show are to showcase investment opportunities and general trend information going into the weekend.  Unfortunately, the flags are only good for a very short window.  Many of the flags peak during the week, after the action has already expired.  So, if the trend isn't cooking on Tues-Weds, the show will miss the opportunity to report it.

Lastly, it's a little hard to use this as a direct day-to-day trading tool because of the way the CREST feeds update.  Rumors are that CCP is going to roll out a "one day" CREST feed to simplify keeping the databases up to date.

What's Next?

We have some tasks on the table, but the short term goals are:
  1. Bring in destruction data
  2. Do inter-hub analysis
  3. Build indexes
Things are moving along pretty well.  I expect zkillboard data to be live by the first week of December, and a few more QOL updates should make show prep easier.  Also, rumored new CREST feeds should improve the quality of the data we're pulling (or make our lives even more difficult).  Regardless, with new hardware coming in at home, the ability to automate more should make things move more smoothly.