Monday, October 14, 2013

Fool's Errand

I found today's Nobel prize in Economics interesting, especially since it's partially related to my project.

The prize winners, all vastly more qualified than me, state through their research that you can't know short term price fluctuations, but should be able to map longer term trends.  I might be in trouble, since my project is looking to do the opposite: chart with decreasing certainty a small number of weeks into the future.

The end product here is that I may not be able to do what I want with all this data.  But if I don't try, because it's "impossible", then I will never know.  I'd like to take this moment to talk through some of my dissenter's opinions.

Imperfect Data

This is the most common dissent I hear when people hear what I am trying to do: "But the out-of-game feeds are imperfect.  How could you possibly know EXACTLY [pick your metric]?"  I always end up countering with a classical engineering retort: "But I can get close enough"

If I may extend the metaphor, imagine you couldn't possibly see something with the naked eye (ISS flying overhead, for example).  If I could get a telescope to take a half-decent black-and-white picture of it, would that not be "close enough" for practical purposes to show you that it was there and what it kinda looked like?  I may not be able to provide the stunning HD pictures NASA can, but something beats nothing.

Exploring the frontier is all about using what you can to get what you need.  I may not be able to tell you EXACTLY how many noob frigates died this year, but I can tell you it's on the order of ~250K and probably under 300K.  Just because I can't know the EXACT number doesn't mean a good estimate has no value.  

Understanding Limitations

It's important to know the relative accuracy of the data you're collecting, and what your blind spots may be.  As far as kill data goes, these are the assumptions I am using:
  • PVP-kill quality
    • 95% quality.  API-only kills should provide extremely good coverage
    • HS kills will be less thourough.  But gaps should be very small
  • Other kill quality (NPC kills, CONCORD kills, self-destructs)
    • No way to view these kills.  zKB filters NPC-only kills before adding them to DB
    • These kills should account for a very low percentage of destruction data
The thinking goes: if something dies in PVP, it should get to zKB somehow.  It only takes one key to get the data.  Either from the victim or the killer, or their corps.  Now, it is possible to have kills unaccounted for, where the killer (killing blow) or victim or their corps don't have a key in zKB/eve-kill's records.  But losing sleep over the last ~10% that I can know is not worth derailing the 90% I'm already getting.

The things that worry me that I'm not seeing:
  • PVE deaths: BS/BC/T2 losses to rats
  • Suicide bombers: Attacker km's are ignored
  • Self destruct data: small segment of pod data not being tracked
  • NPC corp data that might be missed because killer doesn't have correct keys
The hope is these groups account for a very small fraction of the data out there.  

Prediction Quality

I expect to get a decent idea of the future price of something (trend up or down, by how much?) and network all those predictions together to feed to a machine that will automatically task out my manufacturing lines.  If the tuning is strong enough, getting a leg up on the shipping margin economy is a second avenue for activity.  

I catch flak when I describe this because people get mired in the fine details.  I might predict a 20% rise in price on a weapon, but only see a 10% rise.  That's still enough to pocket profit, and I'm better off having some numbers-based prediction than spending a ton of my time scouring the numbers and playing by "gut feeling".  Large repetitive math is EXACTLY what computers are for, and if I can tune the machine to have even a sliver of intuition, then I am ahead of my competition.

Today, I am using today's-cost and today's-profit to say that when I do get to market, I will be somewhere close to that prediction.  I also watch market order volume to make sure what I bring to market is a suitably small percentage of actual sales, so as not to be the downward force.  This is fine in products that swing slowly (most modules) but can be extremely troublesome in ships where bubbles are constantly forming/popping with fickle player tastes.  

My preliminary data doesn't show kills as a predictive metric.  But with kill data being extremely spiky (weekend warriors) I may not be looking at the groupings correctly yet.  So far, only pure-market numbers look like the trend setters.  This is probably because the kill data I am scraping isn't as publicly and easy to access as eve-central data.  But there has to be some amount of weight to put into "replacement" behavior rather than just purely buying and selling commodities without any other basis in reality.

I am wondering if I should get in touch with Chribba or Ripard Teg for their PCU numbers too.  Since player participation is pretty directly related to profitability.

Progress Update

In the end, you won't know unless you try.  Even if the data is purely scientific it's been extremely interesting to get a look at what is destroyed.  Raw dumps for those interested.

As interesting as that is, you get a very similar picture with sales data

If you overlay the two charts, the levels and spikes line up pretty similarly.
As for data parsed so far: total ~3.15M mails parsed
Frigate 902227
Cruiser 314795
Battleship 77331
Industrial 81221
Capsule 929040
Shuttle 3153
Rookie ship 245810
Assault Frigate 109897
Heavy Assault Cruiser 164
Deep Space Transport 30
Combat Battlecruiser 387
Destroyer 314605
Mining Barge 658
Command Ship 6280
Interdictor 32810
Exhumer 23516
Carrier 4873
Covert Ops 537
Interceptor 276
Logistics 14973
Force Recon Ship 177
Stealth Bomber 96750
Capital Industrial Ship 247
Electronic Attack Ship 32
Heavy Interdiction Cruiser 137
Black Ops 19
Marauder 27
Combat Recon Ship 53
Strategic Cruiser 377
Prototype Exploration Ship 236
Attack Battlecruiser 192
Blockade Runner 164