Thursday, October 10, 2013

A Little Less Talk: Part 2 - zKB cooking with fire

Like Frankenstein's monster, the parts are coming together.  zKB crawling is almost ready for its first full-time passes.  Figured it would be worth blogging some of the work.

Throughput is a Bitch

After 2 days of running the "pre-alpha" full-flow version of my binner, the progress was as follows:
Frigate: 319,689 (to Aug 1)
Rookie ship: 241,695
Logistics: 14,736
Capital Industrial Ship: 245
Prototype Exploration Ship: 186
That's a lot of mails parsed, but my rate was something like 400 kills/minute.  That's abysmally slow, and means I would have needed several more days to have any hope of getting the whole destruction picture.

Thankfully, the dudes behind zKB just added some keys to better communicate server status.  My dry run was pulling 2,600 kills/minute and should run stable at up to 3,800 kills/minute.  Still pretty slow compared to the market data (10,000 entries/minute), but I'll take the sizable improvement.

For those playing along, three throughput keys were added to the HTTP response headers:
  • X-Bin-Attempts-Allowed
  • X-Bin-Requests
  • X-Bin-Seconds-Between-Request
Leveraging these keys lets me set the between-call waits on the fly.  As the budget changes, I can adapt and pull "as fast as possible" within the rules.  I would like to implement a more dynamic back-off routine that keeps a steadier stream, but so far that hasn't yielded better throughput.
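As a rough sketch of what that on-the-fly pacing looks like (the helper name is mine, and the header semantics are my reading of the keys above: X-Bin-Requests as calls made in the current window, X-Bin-Attempts-Allowed as the window's budget):

```python
def next_wait(headers, default_wait=1.0):
    """Pick the sleep before the next zKB call from the throughput headers.

    Header semantics are assumed, not documented here: X-Bin-Requests is
    calls made so far in the window, X-Bin-Attempts-Allowed is the budget.
    """
    try:
        used = int(headers.get("X-Bin-Requests", 0))
        allowed = int(headers.get("X-Bin-Attempts-Allowed", 0))
        wait = float(headers.get("X-Bin-Seconds-Between-Request", default_wait))
    except (TypeError, ValueError):
        return default_wait  # malformed headers: fall back to a safe pause
    if allowed and used >= 0.9 * allowed:
        return wait * 2  # close to the budget, so back off harder
    return wait

# Plenty of budget left -> just honor the advertised spacing.
print(next_wait({"X-Bin-Attempts-Allowed": "3600",
                 "X-Bin-Requests": "100",
                 "X-Bin-Seconds-Between-Request": "0.5"}))  # -> 0.5
```

Sleeping `next_wait(resp.headers)` between calls keeps the crawler inside whatever budget the server is advertising at that moment.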

Still a Database Scrub

Originally, I had the scraper set up dynamic "bins" from a file and push those into a table.  The output can be found on my gdoc dump.  While this is practical for serving from SQL to the user, it is neither efficient nor elegant.  By relying on the data dump for translation, I'm now only storing the required information:
  • Date destroyed
  • Week destroyed (because I don't want to do the date->week conversion)
  • typeID
  • typeGroup (also for easy grouping)
  • systemID
  • destroyed count
I could stand to lose Week/typeGroup from the DB, but I like having the quicker grouping handy... and being numbers instead of strings, they are much cheaper to store and compare.
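The slimmed-down table boils down to something like this (a sketch using SQLite; the table and column names are mine, and the sample row uses EVE's public IDs: 587 is the Rifter typeID, 25 the Frigate groupID, 30000142 is Jita):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; the real DB lives elsewhere
conn.execute("""
    CREATE TABLE kill_bins (
        dateDestroyed TEXT,     -- day the kill happened
        weekDestroyed INTEGER,  -- precomputed so date->week never runs at query time
        typeID        INTEGER,  -- what died
        typeGroup     INTEGER,  -- group of what died, for cheap GROUP BY
        systemID      INTEGER,  -- where it died
        destroyed     INTEGER   -- kill count for this bin
    )
""")
conn.execute("INSERT INTO kill_bins VALUES (?, ?, ?, ?, ?, ?)",
             ("2013-10-10", 41, 587, 25, 30000142, 3))  # 3 Rifters down in Jita
row = conn.execute("SELECT destroyed FROM kill_bins WHERE typeGroup = 25").fetchone()
print(row[0])  # -> 3
```

Everything but the date is an integer, which is what keeps the table small and the grouping fast.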

Results

Frigate: 27
Cruiser: 1
Industrial: 51
Capsule: 1
Shuttle: 5
Rookie ship: 50,349
Assault Frigate: 3
Heavy Assault Cruiser: 1
Deep Space Transport: 9
Combat Battlecruiser: 4
Destroyer: 4
Mining Barge: 19
Interdictor: 2
Exhumer: 63
Covert Ops: 2
Force Recon Ship: 2
Stealth Bomber: 5
Capital Industrial Ship: 247
Prototype Exploration Ship: 189
Blockade Runner: 39
In ~30 minutes, I was able to crunch nearly 50,000 kills.  The numbers aren't as tidy as before (I should add a boolean to separate ships killed from cargo destroyed), but this is leaps and bounds better than before.  Odds are good that by Monday I'll have a nearly complete picture of destruction statistics in EVE.

To Do

  • Add something to watch for repeated killIDs
  • Clean up my home PC so I can parse this data at home
  • Better test the "polite snooze" routine
The zKB devs have asked me to contribute this feature so they can serve the data themselves.  I would be more than happy to open that data to the world through them, but since kill data is such a small segment of my project, I would rather focus on my own goals for the time being.  If I get to the point where I can hire contributors, I might loop back and contribute to them.

Also, zKB has a service like EMDR that streams live data to listeners.  If I can get most of the parts I'd like stable on the "cron" data, I'd be more than happy to switch feeds over.  Unfortunately, since I have no reliable web space to catch these live feeds, I can't get the reliability I need from them at this time.