Throughput is a Bitch
After 2 days of running the "pre-alpha" full-flow version of my binner, the progress was as follows:
Frigate | 319,689 (to Aug 1) |
Rookie ship | 241,695 |
Logistics | 14,736 |
Capital Industrial Ship | 245 |
Prototype Exploration Ship | 186 |
That's a lot of mails parsed, but my rate was something like 400 kills/minute. This is abysmally slow, and means I would have needed several days to have any hope of getting the whole destruction picture.
Thankfully, the dudes behind zKB just added some keys to better communicate server status. My dry run was pulling 2,600 kills/minute and should run stably at up to 3,800 kills/minute. Still pretty slow compared to the market data (10,000 entries/minute), but I'll take the sizable improvement.
For those playing along, 3 throughput keys were added to the HTTP header:
- X-Bin-Attempts-Allowed
- X-Bin-Requests
- X-Bin-Seconds-Between-Request

Leveraging these keys lets me set the between-call waits on the fly. As the budget changes, I can adapt and pull "as fast as possible" within the rules. I would like to implement a more dynamic back-off routine that keeps a steadier stream, but that isn't yielding better throughput at the moment.
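As a rough sketch of the on-the-fly wait adjustment: the header names come from the post, but their exact semantics (and the back-off rule) are my assumptions, not zKB's documented behavior.

```python
def wait_from_headers(headers, fallback=10.0):
    """Derive the between-call sleep from zKB's throughput headers.

    Assumed semantics: X-Bin-Seconds-Between-Request is the server's
    requested pacing, X-Bin-Attempts-Allowed is the request budget, and
    X-Bin-Requests is how many we've already used.
    """
    try:
        base = float(headers.get("X-Bin-Seconds-Between-Request", fallback))
    except (TypeError, ValueError):
        base = fallback
    allowed = int(headers.get("X-Bin-Attempts-Allowed", 1) or 1)
    used = int(headers.get("X-Bin-Requests", 0) or 0)
    remaining = max(allowed - used, 1)
    # Crude back-off: stretch the wait when the budget is nearly spent,
    # otherwise pull "as fast as possible" at the server's stated pace.
    if remaining <= allowed * 0.1:
        return base * 2
    return base
```

Called after every response, this keeps the scraper at the server's advertised pace without hard-coding a sleep value.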
Still a Database Scrub
Originally, I was making the scraper set up dynamic "bins" from a file and push those into a table. The output can be found on my gdoc dump. While this is practical for serving from SQL to the user, it is neither efficient nor elegant. By relying on the data dump for translation, I'm now only storing the required information:
- Date destroyed
- Week destroyed (because I don't want to do the date->week conversion)
- typeID
- typeGroup (also for easy grouping)
- systemID
- destroyed count
I could stand to lose Week/typeGroup from the DB, but I like having the quicker grouping handy... and being numbers instead of strings, they are much smaller to deal with.
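The slimmed-down table can be sketched like this; the column and table names are my guesses, not the actual schema, and the inserted IDs are just example values.

```python
import sqlite3
from datetime import date

# Minimal sketch of the slimmed-down kill table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE kills (
        kill_date   TEXT,     -- date destroyed (YYYY-MM-DD)
        kill_week   INTEGER,  -- precomputed so queries skip date->week math
        typeID      INTEGER,
        typeGroup   INTEGER,  -- kept for cheap GROUP BY; ints beat strings
        systemID    INTEGER,
        destroyed   INTEGER   -- destroyed count
    )
""")

def week_of(d):
    """The date->week conversion the post avoids doing at query time."""
    return date.fromisoformat(d).isocalendar()[1]

# Example row with made-up type/group/system IDs.
conn.execute(
    "INSERT INTO kills VALUES (?, ?, ?, ?, ?, ?)",
    ("2013-08-01", week_of("2013-08-01"), 587, 25, 30000142, 1),
)
```

Storing the ISO week alongside the date trades a few bytes per row for a plain integer GROUP BY on every weekly report.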
Results
Frigate | 27 |
Cruiser | 1 |
Industrial | 51 |
Capsule | 1 |
Shuttle | 5 |
Rookie ship | 50,349 |
Assault Frigate | 3 |
Heavy Assault Cruiser | 1 |
Deep Space Transport | 9 |
Combat Battlecruiser | 4 |
Destroyer | 4 |
Mining Barge | 19 |
Interdictor | 2 |
Exhumer | 63 |
Covert Ops | 2 |
Force Recon Ship | 2 |
Stealth Bomber | 5 |
Capital Industrial Ship | 247 |
Prototype Exploration Ship | 189 |
Blockade Runner | 39 |
In ~30 minutes, I was able to crunch nearly 50,000 kills. The numbers aren't as tidy as before (I should add a boolean to separate ships killed from cargo destroyed), but this is leaps and bounds better than the old rate. Odds are good that by Monday I'll have a nearly complete picture of destruction statistics in EVE.
To Do
- Add something to watch for repeated killIDs
- Clean up home PC so I can parse this data at home
- Better test "polite snooze" routine
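The repeated-killID item above could be as simple as a seen-set in front of the binner; `parse_and_store` here is a hypothetical placeholder for the real binning code, not anything from the post.

```python
seen = set()

def bin_kill(kill, seen=seen):
    """Skip a kill we've already binned; return True if it was new."""
    kill_id = kill["killID"]
    if kill_id in seen:
        return False  # already counted, don't double-bin
    seen.add(kill_id)
    # parse_and_store(kill)  # hypothetical downstream binning step
    return True
```

An in-memory set works for a single run; across runs the same check would need to hit the database (e.g. a unique index on killID).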
The zKB devs have asked me to contribute this feature to them so they can serve the data themselves. I would be more than happy to open up that data to the world through them, but seeing as kill data is such a small segment of my project, I would rather focus on my goals for the time being. If I can get to the point where I am able to hire contributors, then I might be able to loop back and contribute to them.
Also, zKB has a service like EMDR that throws live data to listeners. If I can get most of the parts I'd like stable on the "cron" data, then I'd gladly switch feeds over. Unfortunately, since I have no reliable web space to catch these live feeds, I can't get the reliability I need from them at this time.