
Monday, April 17, 2017

Aspiring Hari Seldon - Part 3 - Releasing REST Forecasts

Check out part 2

After playing with Prophet, I wanted a way to distribute the data more publicly.  I could have just incorporated it into our R templates and made it part of the show, but I felt like that was unfair to the general public.

I started from the Flask-RESTful app we were already providing for EVE Mogul.  Though it turned out to be a complete ground-up rewrite, I learned a lot about Flask and testing along the way.

What They Don't Tell You

One of the biggest pains of self-teaching is that it's very easy to learn enough to be useful, but not enough to be good.  The Flask documentation is good at laying out the bare bones of what does what, but it never sells a viable project structure, which leads to a lot of pain later.

Since the original OHLC feed was based loosely off a work project (as a way to test out corners on my own time), it inherited a design flaw.  It turns out there's a very particular shape a Flask project is supposed to take, and deviating from it causes a lot of problems.

Some pointers for avoiding the same pitfalls: by using the prescribed Flask structure, all the pieces from launcher to tests make a lot more sense.  Testing in particular was one of the more difficult pieces of the picture to get right.
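For illustration, here's a minimal sketch of the application-factory shape the docs eventually steer you toward (the names create_app and Forecast are made up for this example, not the actual ProsperAPI code):

import flask
import flask_restful

class Forecast(flask_restful.Resource):
    # illustrative endpoint -- a real one would return cached Prophet output
    def get(self, type_id):
        return {'type_id': type_id, 'status': 'stub'}

def create_app(config=None):
    # application factory: build the app fresh for the launcher AND for every test
    app = flask.Flask(__name__)
    if config:
        app.config.update(config)
    api = flask_restful.Api(app)
    api.add_resource(Forecast, '/forecast/<int:type_id>')
    return app

if __name__ == '__main__':
    create_app().run(debug=True)

The payoff shows up in testing: a test can call create_app() with a throwaway config and poke the service through app.test_client(), instead of fighting a module-level app object.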

About The Endpoint

Because running the predictions is CPU intensive, I wanted to incorporate two features (rough sketch after the list):
  • API keying
  • Caching
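As a rough sketch of how those two pieces could hang together (hand-rolled and illustrative only -- the key list, timeout, and decorator name here are invented, not the service's actual implementation):

import time
import functools
from flask import request, abort

VALID_KEYS = {'example-key'}       # illustrative; real keys are issued per developer
CACHE_SECONDS = 3600
_cache = {}                        # (path, sorted query args) -> (timestamp, payload)

def keyed_and_cached(func):
    # reject unknown API keys, and reuse results so the ~15s prediction only runs once an hour
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if request.args.get('api_key') not in VALID_KEYS:
            abort(403)
        cache_key = (request.path, tuple(sorted(request.args.items())))
        hit = _cache.get(cache_key)
        if hit and time.time() - hit[0] < CACHE_SECONDS:
            return hit[1]
        result = func(*args, **kwargs)
        _cache[cache_key] = (time.time(), result)
        return result
    return wrapper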
Though I'm happy to share the source and open it up to the community, I'm not ready for a full public release.  TL;DR: there are a couple of Flask eccentricities, the gdoc integration could cause some serious issues, and uncached requests can run north of 15 seconds, which could be a problem for some platforms.

To help get around "walled garden" accusations, I've done a few things:
  1. I've left copious notes and automation on how to deploy the service on your own webhost
  2. I've provided API keys to some other market devs such as EVE Mogul and Adam4EVE
  3. I am happy to distribute keys on-request to other devs
The goal is to get the content out to the widest audience possible, even if the raw data is a little unwieldy.  And due to my limitations as a developer, this is my compromise.  

Predictions In The Wild

Adam4EVE



Though my API service is designed to give you soup-to-nuts everything you need to plot in the REST payload, Ethan02 over at Adam4EVE added his own DB to keep us honest.  I totally love this!  As of right now, it's still in their DEV branch, but expect to see more from them soon!

EVE-Mogul


Extending the existing OHLC candlestick plot, EVE Mogul will let you keep close tabs on what you're currently trading, and this is an excellent chance to gut-check your investments.

Conclusions

Check out the source: ProsperAPI

I didn't get to share all the other super-nerdy #devfleet stuff (like travis-ci integration, or PyPI release).  I will probably try to release more notes on python stuff directly on Medium going forward.

This was an eye-opening project in a lot of ways, and it opens the door for more micro-service REST work in the future.  Also, I do plan to have the PLEX split covered before CCP releases it on May 9th.

Tuesday, October 18, 2016

Up And Down - Maintaining an OHLC Endpoint and Deploying Flask Restful


This is the first part of a more technical devblog.  I will be writing up more specifics in part 2, but I wanted to talk about the ups and downs behind the scenes with our EVE Mogul partnership.  The issues are mostly my failings; Jeronica, Randomboy50, and the rest of the team have been amazing given my shoddy uptime.

Prosper's OHLC Feed

I forgot to blog about this since the plans for Prosper's v2 codebase have only recently solidified, but we have a CREST markethistory -> OHLC feed hosted at eveprosper.com. The purpose was to run Flask/REST through its paces, but Jeronica over at EVE Mogul whipped up a front-end and Roeden at Neocom has been using it in their trading forays.

SSO Login Required To View

This originally served me well as a learning experience, but keeping a REST endpoint up isn't as simple as I originally expected.  From Flask's lack of out-of-the-box multithread support to some of the linux FUBARs below, it's been a wild ride.  And now that players are legitimately counting on this resource as part of their toolchain, I figured it's time to get my act together.

The Litany of SNAFUs

What really brought the house of cards down was our move from a traditional hosting service to a full r/homelab solution.  Prosper has been living alongside some other nerd projects (minecraft, arma, mumble, etc.), and this move gets Prosper off the other customers' shitlist when Wednesday night rolls around and I hammer the box generating the show's plots.  Unfortunately, in exchange for the added performance, we now live under a benevolent tinkerer; restarts and reconfigs are more common than before.  It's a huge upgrade, and I can't thank Randomboy50 enough for the support, but nothing is truly free (except the minerals you mine yourself™).

#nofilter #bareisbeautiful


This need for stability runs headlong into a shitty part of python: package deployment.  Though wheeling up and distributing individual python libraries is easy, deploying python as a service is not.  There will be a second blog on the specifics, but out of the box you're largely stuck hand-rolling deploy scripts, which can get really hairy if you're not careful about virtualenvs.

Thankfully, work turned me on to dh-virtualenv, and though we're now grossly overengineered with a service .deb installer, we have a properly deployed linux service that should be far more robust going forward.  It does mean there are now separate "build" and "deploy" steps for updates, but being tied into systemctl means the endpoint should be much less likely to go down.

With the last few months of work, I still expect a large amount of reengineering in our quest for a Quandl-like EVE service, but with the installer built in we can keep the endpoint up with a lot less effort going forward.  We are still behind on the ProsperWarehouse rollout and getting the scrapers rewritten, but those modules should be a cakewalk to deploy now that ProsperAPI is properly built up.

Also, I've worked in a Discord logging handler, which will be useful for monitoring, but notes on that later ;)

Tuesday, July 7, 2015

Aspiring Hari Seldon - Developing Price Predictions (Part 1)

With the Prosper Market Show on semi-hiatus for the summer, I'm trying to put the time toward developing some new utilities for analysis.  Though I have plenty on the to-do list to work through, I wanted to toy around with some future-looking tools to help illustrate some of the intuition I see moving forward.

Specifically, after looking at some examples, I wanted to try out Monte Carlo analysis.  The idea is that if you take a decent prediction methodology and generate thousands of predictions, you can average out the noise to get at the true signal.

Step 1: Generating Predictions

Knowing nothing at the start, I picked up Geometric Brownian Motion (GBM) as the predictor function.  It's relatively popular and well supported, and it's handily available in an R library.  I won't even begin to claim I'm an expert on what's going on under the hood, but the general concept is pretty straightforward:

Next Price = Current Price + (expected variance)*(normal-random number)

The variance can be tuned as a percentage to express how far any individual roll will move, and how widely the random-number generator will vary.  To find the numbers, I had to do a bit of hackery, but if you want to see the full methodology, check out the raw code here.  It's also worth noting that GBM includes a drift/interest term, but I've zeroed that out for the time being.
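The actual experiments were in R, but for illustration the same idea fits in a few lines of Python/numpy (drift zeroed out; the price and volatility numbers below are made up):

import numpy as np

def gbm_paths(start_price, daily_volatility, n_days, n_paths=1):
    # zero-drift walk in log-price: each day, multiply the price by exp(sigma * Z)
    rng = np.random.default_rng()
    log_steps = rng.normal(loc=0.0, scale=daily_volatility, size=(n_paths, n_days))
    return start_price * np.exp(np.cumsum(log_steps, axis=1))

# one low-, one medium-, and one high-variance walk, like the three lines plotted below
low, med, high = (gbm_paths(800e6, vol, n_days=30)[0] for vol in (0.01, 0.03, 0.06))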

Using CREST data as a basis, we can pick a point in time and generate some predictions looking forward.  But due to the random-variance methodology, some predictions are good, and others are not:


These are 3 independent predictions using a low/med/high variance seed fed into the GBM function.  The "high" line trending down and the "low" line trending up are a matter of which rolls are winning.  If you look more closely, the bounds on the low seed are much tighter, where the high seed is much wilder.  Given another set of rolls, the best predictions will change.

The weakness we should be aware of is compounding error.  Because the GBM function is a recursive, random guessing function, if predictions start to stray off into the bushes, there's no reason to expect they will come back.  This model uses a single price to start walking, and the further it walks, the wider the end results will land.

Step 2: Generate a lot of predictions

Happy with the general behavior of the model, the next step was to generate a lot of guesses; the randomness should wash out given enough trials.

Though I was able to wash out the randomness, the end result is not nearly as useful as I was expecting:


Summarizing the predictions by day gives us a much different picture.  The randomness is gone, but we're left with a less useful "cone of possibility".  Though the end result we're looking for is definitely a "zone of possibility", this picture is not a useful automation of that concept.  Specifically, the median prediction is roughly "the price will stay the same", which is not a useful prediction for most items.
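In the same illustrative Python terms as the earlier sketch, the by-day summary is just a percentile across the path axis:

import numpy as np

# reusing the gbm_paths sketch from Step 1; all numbers are illustrative
paths = gbm_paths(800e6, daily_volatility=0.03, n_days=30, n_paths=5000)
cone_low, cone_median, cone_high = np.percentile(paths, [25, 50, 75], axis=0)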

What is going on?  Well, here we can actually see the GBM function for what it is and why it is breaking down for our predictions:
  1. It assumes +/- variation is equally probable.  The distributions are strictly normal, which means they stay centered around the starting price, +/- variance, over time.
  2. It takes no history into account by default.  The function takes a single price and a single variance variable, and understands nothing about max/min variance.
This is particularly absurd for PLEX, where the variance/volatility is largely positive.


Just looking at the variations for the last year, almost 60% of the movements were positive.  We could further enhance our predictor by looking at a more recent window.
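Measuring that skew is trivial once the history is an array (illustrative Python; price_history here is assumed to be the last year of daily average prices):

import numpy as np

daily_moves = np.diff(price_history)            # day-over-day changes
pct_positive = (daily_moves > 0).mean()         # ~0.6 for PLEX over the last year, per the chart above
recent_pct = (daily_moves[-90:] > 0).mean()     # the same check over a more recent window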

Step 3: Back to the Drawing Board

After chewing on this problem for a few days, I do have some things to follow up on.  Though there are some other models to try in the sde package, I think we have some options for getting more useful predictions out.

Convolve Predictions?

This is one I've gone back and forth about.  I'd like to be able to "sum" together the predictions to get the general "tone" of what is going on.  Except that it's been 5 years since I took a digital signals class, and my attempts so far have just been guesses.  Though R is ready out-of-the-box to do FFT/convolutions, I need to better understand what I'm asking the program to do.  Currently, all attempts run off to infinity.

Exponential Predictions Using Volatility

After seeing the outcomes of the GBM experiments, I'm instead seeing a different trend pop out.  If I just pick a low/med/high variance out of the distribution, I can generate a better window in the short term: simply project the same %variance forward each day.


Another option is to get witty with the actual high/low bounds to narrow the prediction off our existing price-flagging system.  I just picked 25/75th percentiles, but we could narrow those bands with a smarter lookback to characterize how "wild" the last period has been.
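A sketch of that simpler projection (Python again, purely illustrative): pick percentile daily moves out of a lookback window and compound them forward.

import numpy as np

def variance_band(price_history, horizon, lookback=90, bands=(25, 50, 75)):
    # project the last price forward by compounding low/med/high percentile daily returns
    recent = np.asarray(price_history[-lookback:], dtype=float)
    returns = np.diff(recent) / recent[:-1]
    picks = np.percentile(returns, bands)
    days = np.arange(1, horizon + 1)
    return {band: recent[-1] * (1 + r) ** days for band, r in zip(bands, picks)}

Narrowing the bands (say 35/65) or shortening the lookback would be the "smarter lookback" knob mentioned above.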

Get Smart: Filter/Augment GBM Outputs

The last option is to roll some of my own ideas into the random-guessing function: use the historical variance as a seed for the guessing function, and/or drop predictions that don't at least fit the last 10 days well before letting them walk forward into the next 10.  I'm not yet convinced either is better than the far simpler exponential predictor above, because I would expect the exponential pattern to still wash out in the end, especially if we stick with a Monte Carlo style approach.

The hope would be to start a random function on a linear path (perhaps take 2wks into account) then have exponential variance as the high/low bounds to build a channel.

EDIT: Using RSI as a prediction filter yielded some interesting results.  There is still some tweaking to do, especially for items that have had a recent spike into very unstable territory, but initial forays seem promising.
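For reference, RSI itself is cheap to compute; here's a rough Python approximation (plain rolling averages instead of Wilder's smoothing, so treat the numbers as ballpark):

import numpy as np

def simple_rsi(prices, window=14):
    # RSI = 100 - 100 / (1 + average gain / average loss) over the window
    moves = np.diff(np.asarray(prices, dtype=float))[-window:]
    avg_gain = np.where(moves > 0, moves, 0.0).mean()
    avg_loss = np.where(moves < 0, -moves, 0.0).mean()
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)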



Conclusions

I'm really disappointed with the outcomes of this project; a lot of articles online made this sound like a magic pill.  Instead, we see the underlying nature of the model once the randomness is removed.  It's not a complete loss, but I was hoping for something a little wittier than a linear fit; CCP Quant has pointed out the forecast package in R for more tinkering.

The nice thing out of this exercise has been being able to quickly experiment in R.  Though there are still some pretty considerable hurdles between these experiments and actual inclusion in the show, I was able to work with some pretty powerful packages extremely quickly to really dig into the problem and iterate quickly.  I continue to be impressed with R as an exploration platform, but still have some hesitance on integrating it in more powerful ways.  

Sunday, November 16, 2014

How It's Made: Price Flagging

I owe my partner in crime, Etienne Erquilenne, a huge debt for adding a much needed second pair of hands onto this whole Prosper project.  His IRL expertise is completely invaluable, and has freed me up to accelerate the development schedule measurably.  But, as I've expressed on the blog before, I'm very much a fan of open sourcing.  So let's look under the hood.

Volume Flagging

Volume data was straightforward.  Since volumes never go negative, and rarely jump by orders of magnitude, it was pretty easy to wrap the values up into a normal-ish histogram.  Below is Tritanium:
1yr of Tritanium Volumes - The Forge

Not perfectly normal, but close enough that we can use percentiles to check the "sigma levels".
For those who aren't statistics nerds: we can say things about values depending on how far they deviate from the mean.  +/- 1 deviation (sigma) covers relatively normal behavior; values beyond +/- 2 deviations should be rare.  As we move linearly away from the mean, we should see exponentially fewer values at those levels.

Since volumes are largely well behaved, I used this principle of sigma flagging to highlight extreme outliers to report for the Prosper show.  
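As a rough Python illustration of that percentile flagging (numpy assumed; the cutoffs below are examples, not the show's actual thresholds):

import numpy as np

def sigma_flag(volume_history, todays_volume):
    # flag a volume that lands outside the central ~95% (roughly +/- 2 sigma) of the last year
    low, high = np.percentile(volume_history, [2.5, 97.5])
    if todays_volume > high:
        return 'abnormally high'
    if todays_volume < low:
        return 'abnormally low'
    return 'normal'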

Price Flagging

Price values aren't so well behaved, and using the same approach is not going to flag useful data:
Just looking for straight price outliers isn't useful.  The only things that will flag are long rises/declines that represent the extremes of the last year.  We'd like to use the same extremity methodology for prices, but a different approach is required.

Deviation From a Trend

The inspiration came from the Bollinger Band chart.  Simply put, it draws a simple-moving-average trend line and then moving-deviation bands around it (the red lines on the candlestick chart above).  If we instead characterize the distance from a trend, we can say things like "this is an extreme deviation".

This is a far more "normal" plot.  Also, Etienne rolled in a simple-moving-median to compensate for items that might have outrageously bimodal behavior because of a paradigm shift due to a patch.

Unfortunately, without some sort of second filter, we're going to flag everything every week, and that's not a useful filter.  So Etienne added a voting scheme and a "highest votes" binning technique to properly classify the resulting flags.
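In rough pandas terms the whole pipeline looks something like the sketch below -- illustrative only, since Etienne's window sizes, vote labels, and thresholds are his own:

import numpy as np
import pandas as pd

def trend_votes(prices, window=20):
    # score each day by its distance from a rolling trend, then tally votes over the last week
    series = pd.Series(prices, dtype=float)
    trend = series.rolling(window).median()        # moving median copes with patch-driven shifts
    spread = series.rolling(window).std()
    zscore = (series - trend) / spread
    labels = ['very abnormally low', 'abnormally low', 'normal',
              'abnormally high', 'very abnormally high']
    votes = pd.cut(zscore, bins=[-np.inf, -2, -1, 1, 2, np.inf], labels=labels)
    return votes.tail(5).value_counts()            # the "highest votes" bin decides the flag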



Results So Far

So far, this is my favorite validation that the pull is working as intended:


Here we see a peak last week, and a drastic crash in progress.  Though we would have reviewed the data anyway (fuel is a forced group in the tools), finding it in the expected flagging group is a great sign.

To explain what I see in the first graph: there's a spike from pre-Phoebe stockpiling, then a rapid dump-off once the patch hit.  What I also see above is a heavy overcorrection in the price, dumping it much lower than really makes sense.  If you were watching this product, this would be a great opportunity to buy in hope of a snap-back.  The bottom RSI chart especially shows the product crossing heavily into "oversold" territory, strongly signalling an artificially low price.  Of course, balance these price signals against the volume flags (perhaps slightly anemic) to temper expectations.

Under the hood, the tool is checking the closing average against the white dotted moving-average line.  Though the moving average will catch up, right now the distance from the trend is WAY out of whack.  Since it voted "very abnormally low" for 5 days, along with 2-3 votes for "abnormally high", this item was going to end up in the charting group regardless.

Great... but

Now I have a new problem... too much good data.  To keep the outlier segment inside 15 minutes, I have to filter the pick list to 15-25 items.  Etienne's new tool flagged 500 items, and the true-positive rate is astounding.  For those looking to get into some powerful market automation, this methodology is extremely powerful and should help boil down opportunities like nothing EVE has seen before.  Though we still lack the means to automate "black swan" events like expansion releases, the flagging methodology is very useful for the active trader.

Also, for all of its power, we're up against a problem where the show format and the goals don't match.  Two of the chief goals of the show are to showcase investment opportunities and general trend information going into the weekend.  Unfortunately, the flags are very good for only a very short period.  Many of the flags pop earlier in the week, and by show time the action has already expired.  So, if the trend isn't cooking on Tues-Weds, the show will miss the opportunity to report it.

Lastly, it's a little hard to use this as a direct day-to-day trading tool because of the way the CREST feeds update.  Rumors are that CCP is going to roll out a "one day" CREST feed to simplify keeping the dbs up to date.  

What's Next?

We have some tasks on the table, but the short term goals are:
  1. Bring in destruction data
  2. Do inter-hub analysis
  3. Build indexes
Things are moving along pretty well.  I expect that zkillboard data will be live by the first week of December, and a few more QOL updates should make the show prep move along easier.  Also, rumors of new CREST feeds should improve the quality of data we're pulling (or make our lives even more difficult).  Regardless, with new hardware coming in at home, the ability to automate more should make things move more smoothly.

Monday, July 28, 2014

Weekend Update

Crunched on some work this weekend.  Getting dangerously close to complete on the spreadsheet front, and pushed some updates out.  Just posting this as a quick update for those that may have missed them.

Crius Industry Tools for Google Drive - Update

I pushed some updates to the script this weekend.  This should address some bugs with the getPOS() routine, as well as add support for some features that @CCP_Nullarbor added to the Teams CREST feeds.
  • Fixed issue with getPOS(): Was trying to fetch location names with the wrong function call.  This issue has been resolved and all valid TQ corp keys should return good data now
  • Fixed issue with getIndustryJobs(): Was not fetching personal jobs correctly.  Changed key validation to be correct now.  Can fetch corporate or personal jobs automatically off a single key/vcode
  • Added teamName/activity to Teams and TeamsAuction feeds: Were "TODO" placeholders.  Now reports in-game names and valid activity information
Go fetch a fresh copy from either source, and keep reporting bugs.
I still need to clean up some errors and hanging bits.  I'd like to push this tool to proper add-in status, but I am a little unclear on what it takes to get from what I have now to "release ready".  Also, it's probably time to bite the bullet and buy some icon/banner designs.

More Crius Utilities for Devs

This one won't be as useful for the veteran 3rd-party devs, but should be invaluable to the amateurs.  I put together some SQL queries and pushed the resulting CSVs to the GitHub repo.  If you're trying to update your spreadsheets with new data, these feeds should save you a boatload of time.  If you're not sure what to do with the CSV files, I suggest you read up on pivot tables.

Let's Build a Spreadsheet Series

I did a pretty long set of streams on Sunday.  The goal was to help show off the spreadsheet fu and help people learn to build better spreadsheets with the tools at hand.  I will have the raw streams up on YouTube and Twitch tonight, but I want to boil down the lesson plans into 5-15min function lessons.  I still need to write up lesson plans and get my hands around video editing, but it's a goal for this August to put together a series that will help people build their own calculators.

Sunday, July 20, 2014

Building Better Spreadsheets - Crius Toolset

Crius releases on Tuesday, and most industrialists are scrambling to replace their spreadsheets (I know I am).  What's worse, the job-cost equation requires live data from the game, which can be difficult for someone relying on spreadsheet feeds.  Fear not!  I have great news!

Crius Feeds In Google Spreadsheets

These functions let you read in the CREST/API feeds of common calls and import them directly into your spreadsheet.  Furthermore, the triggers are set up to automatically refresh the data periodically so your spreadsheet will always be up-to-date.  Though this is not an exhaustive API tool, and still could use some more features, it should be a huge leg up for any spreadsheet jockey.

Most of the feeds are designed to dump the entire feed and don't offer much filtering in the call.  Instead, they are meant to be used in a reference sheet that can then be leveraged with VLOOKUP() or QUERY().  This might lead to some complexity issues down the line, so I intend to eventually add some finer-grained calls that just return single-line data.

Getting Started

Method 1: Clone the Master Spreadsheet

This will give you the tools and triggers, but will not stay up to date with the master sheet.  Until I can wrap up the code as a stand-alone Drive app, this will be the most foolproof way to get a copy:
  1. Open the spreadsheet
  2. Go to File -> Make a Copy
  3. Set the name of your copy (do not share with collaborators)
  4. Remove any extra sheets/calls you need to and start developing your spreadsheet
This method is the easiest to start with, but has the issue that it will not keep current with updates.  

Method 2: Copy-Paste from Github

The codebase is free and open source, and is designed to be copy-pasted into the gdoc script interface.  This method is a little more tedious, but makes it easy to copy in updates as they come out.
  1. Get plain-text code from the GitHub repo
  2. In your spreadsheet, go to Tools -> Script Editor...
  3. This opens a new window.  Select "Blank Project" from the initialization prompt
  4. Copy the raw code into code.js space
  5. Set the name of the project
  6. Save the changes
  7. Configure the app triggers.  Set get/All functions to 1hr timers

This will give you all the utilities in a fresh, or existing codebase.  Also, configuring the triggers appropriately will keep the data up-to-date automatically.  It's technically optional, but without time triggers, it will require a fresh open to pull fresh data.

Also, as updates come out, you'll be able to drop in the new code.  I expect to keep this project backwards compatible, so each drop in should ADD features.  Though, of course, if you go editing the code, you will need to be more careful about dropping in changes.  

Function List

  • getPOS(keyID, vCode, header_bool, verbose_bool, test_server_bool)
  • getFacilities(keyID, vCode, header_bool, verbose_bool, test_server_bool)
  • getIndustryJobs(keyID, vCode, header_bool, verbose_bool, test_server_bool)
  • getAvgVolume(days, item_id, region_id)
  • getVolumes(days, item_id, region_id)
  • AllItemPrices(header_bool, test_server_bool)
  • AllSystemIndexes(header_bool, test_server_bool)
  • AllTeams(header_bool, verbose_bool, test_server_bool)
  • AllAuctions(header_bool, verbose_bool, test_server_bool)
The functions are designed to be referenced as simply as possible.  CREST feeds like AllItemPrices and AllSystemIndexes can be referenced without arguments if desired.  The classic API feeds are designed to return as much information as they can, with internal switches to try to use the /corp/Locations feed when possible.  Most feeds also come with a verbose_bool switch to add/remove ugly or less useful raw-ID data.  Lastly, the test_server_bool has been left in the release; for TQ this value can be either blank or false.

Function guide below the cut

Wednesday, July 16, 2014

Can I Play With MA(C)Dness?

Blame this one entirely on @K162space.  He got me playing with Quantmod in R.  Also, shout out to CCP Quant for showing off the original bits, and CCP Foxfour for the CREST market history.

After all my EVE data experiments, I've had a very hard time finding predictive correlations between datasets.  Even bringing in destruction data showed no predictive properties, only correlation that corroborates market volumes.  I've also tried my best to get into Varakoh's analysis, but have never been able to internalize his methods enough to roll them into my own (and getting reliable candlesticks for EVE data is a pain).

Then Blake over at K162space.com sent me this tidbit:

Quantmod opened a whole new realm of tools I never knew about (full listing here).  The most interesting to me has been Moving Average Convergence Divergence (MACD).  I wouldn't be so bold as to say "use this to make money", but the results are extremely interesting when using IRL stock data.

INTC - YTD
For those who have never seen a MACD plot, the theory is relatively simple: different line crosses are signals for buy/sell operations (specifics here and here).  Though the signals are imperfect, and will by no means catch every peak and valley, they can be an excellent sanity check on the longer trend.  For many IRL stocks these trends correlate strongly, and MACD is a popular tool among amateurs.  It is less useful for minute-to-minute daytrading or arbitrage, but can be a powerful data source for the long-term investor.

Can this be useful for an EVE trader?  Well....

Let's look at a few plots:
PLEX - YTD - click to embiggen
Tritanium - YTD - click to embiggen
Phenolic Composites - YTD
I had to do a little fudging to get open/close bars to work correctly (though high=close=avg is roughly the same).  The trend information is interesting, but the crossover signals aren't lining up well.  Using the traditional 12,26,9 configuration, many of the crossover signals arrive 2-3 days late.  If you ended up using these charts as-is, I think you'd end up at best breaking even.  Though there are some secondary things you could do like buy and sell in the swings up and down, these charts aren't going to be immediately useful.

I then started playing with shorter windows, and pairing a fast and a slow chart might be a better way forward.  Unfortunately, I'm blindly fiddling knobs right now, but I'm actively hunting down documentation to better fine-tune the charts.  I was thinking a 10-15d window might be more accurate for the fast chart, and a 25-30d window would serve well as the "slow" chart.
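Since quantmod hides the arithmetic, here's the same calculation as an illustrative Python/pandas sketch with the windows exposed as knobs (daily_averages stands in for the CREST history; my actual charts were built in R):

import pandas as pd

def macd(prices, fast=12, slow=26, signal=9):
    # MACD line = fast EMA - slow EMA; crossovers against its own EMA are the buy/sell signals
    series = pd.Series(prices, dtype=float)
    macd_line = (series.ewm(span=fast, adjust=False).mean()
                 - series.ewm(span=slow, adjust=False).mean())
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line   # last term is the histogram

fast_chart = macd(daily_averages, fast=6, slow=13, signal=5)   # a "fast" pairing to experiment with
slow_chart = macd(daily_averages, fast=12, slow=26, signal=9)  # the classic configuration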

Also, I think the weakness has a bit to do with the quality of data here.  Where MACD is looking for close prices, we're feeding it daily averages.  This might be a good excuse to finally work on a better snapshot tool using eve-central data.  Though I had a lot of trouble processing down the raw archive, starting up a stupid-script that pulls periodically from the traditional API would be a quick solution that could start crunching in the background.  Once I can get my industry tools refactored, I expect to get back into the data mining game pretty seriously.

In the meantime, be sure to subscribe to my channel on Twitch, and follow on Twitter.  I will be doing some random streams over the next week as I get back into a reasonable routine again.

Sunday, June 22, 2014

Crius stuff - Getting Ahead in Code Development

http://community.eveonline.com/news/dev-blogs/upcoming-api-changes-for-industry/

Crius isn't that far off, and I'm starting to get a little anxious about getting started on new industry tool code.  Though CCP_Foxfour has been extremely outspoken on Twitter and Reddit about API rollouts, it's been harder to get the other base math information needed to make calculations.

Beta Data Dump

Thanks to FuzzySteve and CCP_Nullarbor for dropping this pro-tip in my stream.  You can take a look at the current BPO data inside the Singularity client files:
<sisi client path>\bin\staticdata\blueprint.db
There you can find a SQLite db of the individual BPO data, in roughly the same format as will be delivered in the SDE when Crius launches.  I had a little trouble exporting the whole thing because I'm a noob and SQLite TEXT/BLOB rules are weird, so I ended up whipping together a little Python script to dump it as CSV.  The final version will be published as YAML, but should still be the same JSON-string style for extraction.
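The dump script itself is nothing fancy; a rough sketch of the approach (a generic table-by-table dump, since the Singularity schema is a moving target):

import csv
import sqlite3

def dump_sqlite_to_csv(db_path):
    # write every table in a SQLite file out as <table>.csv
    conn = sqlite3.connect(db_path)
    conn.text_factory = lambda raw: raw.decode('utf-8', errors='replace')   # dodge the weird TEXT decoding
    tables = [row[0] for row in
              conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        cursor = conn.execute('SELECT * FROM {}'.format(table))
        with open(table + '.csv', 'w') as handle:
            writer = csv.writer(handle)
            writer.writerow([column[0] for column in cursor.description])
            writer.writerows(cursor)
    conn.close()

dump_sqlite_to_csv('blueprint.db')   # point this at <sisi client path>\bin\staticdata\blueprint.db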

Just to reiterate: we probably won't have a real set of attribute tables like before.  Instead, each blueprint will be an individual object with all of its attributes inside it.  There's nothing stopping you from expanding/transforming this data back into tables, but don't expect CCP or Fuzzwork to do it for you (EDIT: maybe after release, but it will be a custom project).

New APIs

The API is dead, long live the API

Until SSO rolls out, we're still going to have the classic-API for private data, and CREST for public/global data.  I will need more time to whip up real guides, and I will probably just update dev wikis rather than post wall-o-text here on the blog.

Classic-API Feeds

  • [char|corp]/IndustryJobs: updated format, will break classic tools
  • [char|corp]/IndustryJobsHistory: updated format, will break classic tools
  • corp/Facilities: will list corp-controlled facilities like POS/Outpost

CREST Feeds

  • /industry/
    • more info soon(tm)
Personally, I continue to use Entity's eveapi Python module for classic API queries, mostly because Python + XML kinda sucks.  Thankfully CREST is JSON and much easier to handle.  I still haven't seen an all-in-one CREST module, so you're going to be stuck writing handlers for each new API feed.  This shouldn't be too much trouble, since the only things that need to change between requests are the address and perhaps the HTTP request headers.
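In practice, the "handler" for a CREST feed is only a few lines; something like this sketch using the requests library (the User-Agent string is a placeholder):

import requests

def fetch_crest(url):
    # generic CREST GET: only the address (and maybe the headers) changes between feeds
    response = requests.get(url,
                            headers={'User-Agent': 'your-app-and-contact-info-here',
                                     'Accept-Encoding': 'gzip'},
                            timeout=30)
    response.raise_for_status()
    return response.json()       # CREST is JSON, so the parsed dict is ready to use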

Tuesday, January 14, 2014

Everything You Never Wanted to Know: eveapi

This is going to be slightly misleading.  The purpose of this post is to showcase Entity's eveapi Python module, more than the official EVE API.  The ins-and-outs of large scale EVE API work require much more space than I have here, and I am not completely familiar with all the traps and holes.  Instead, this is meant as a first-pass guide to the API and how to leverage it in your custom code.  Also, this will only be about "v2" read-only options, since CREST is something else entirely and mostly a pipedream still.

EVE API Basics

The EVE API provides a read-only portal for apps to get game data.  This can be anything from skill plans, to industrial jobs, to wallet transactions, and more.  The API is accessed using a keyID/vCode combo, which is controlled through your account management page.  There, any feed can be enabled or disabled, and the key itself can be set to expire or be deleted entirely.  This gives account owners the means to have many API keys, one for each particular app or service.

A query looks like:
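(Illustrative example with made-up credentials -- the real keyID/vCode come from account management:)

https://api.eveonline.com/char/CharacterSheet.xml.aspx?keyID=1234567&vCode=abcDEF123&characterID=90000001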


The EVE API's returns follow the XML DOM structure.  This is a parsable tree that many languages have easy methods to handle.  Javascript and Perl are my two personal favorites... Python's XML handling is kinda terrible.

Feeds

There are four kinds of feeds: Account, Character, Corporation, and generic.  Each group has its own usage scheme and authentication requirements, with similar behavior within the group.

Account Feeds

Account feeds are used to get general information about the key and do not require any special access.  They take a given keyID/vCode combo and return information about that key.  APIKeyInfo will tell you what type of key (Corporation or Character) and what feeds it can access.  Characters will tell you what characters are accessible by the account/key and general info like corporation name and characterID.  These feeds are meant to be used as a validation step.  If you are writing your own app, it's a good idea to validate against these resources, so you can handle errors such as API expiration or invalid key more gracefully.

Character Feeds

I won't break down every feed one-by-one, but Character feeds are meant to give individual character data.  This can be troublesome if you are given an all-characters key, since the character list will need to be pulled from the Account/Characters feed.

All of the Character feeds require a third piece of the key: CharacterID.  So, if you wanted all the character sheets of one account, you would have to ask up to three times, once for each CharacterID.

Corporation Feeds

Like Character feeds, Corporation feeds require a specific API key to access.  These keys can only be generated by CEOs/directors.  The access/information is completely separated from Character feeds: one cannot access the other at all.

The documentation says Corporation feeds need a CharacterID, but I believe most don't.  Also, Starbase Detail requires the itemID of the tower in question (found in the Starbase List).  As always, check the documentation for feed specifics.

Generic Feeds

All other API feeds fall under a generic category and do not even require a keyID/vCode combo.  If you're looking for map info, or a characterID name conversion, or some basic stats, they can be accessed directly without a special key.  Also, cache and limits to these APIs tend to be much more generous.

Accessing Feeds

Authentication is a two-step process.  First, the key must be validated: make sure it's valid, the expected type, and not expired.  Second, the key must carry the right "accessMask" bit for the feed you want.  If either check fails, the API will reject your request.

Every feed has a binary access mask.  To check if you have the right key for the feed in question:

# api_access_mask: from account/APIKeyInfo; feed_access_mask: the feed's documented bit
if (api_access_mask & feed_access_mask) == feed_access_mask:
    pass    # key has access -- safe to request this feed
else:
    pass    # skip the call -- the API will return an HTTP 503 error
Using this snippet means you can access any feed the key gives access to, rather than hardcoding for a specific one-size-only key.  Seriously... I want to crush hands when a tool won't take an "everything enabled" key.

Best Practices

  • Follow cachedUntil guidelines
    • Should re-return same data until cachedUntil expires
    • Can lead to bans if ignored
  • Don't forget User-Agent in your request header
    • Gives admins a way to contact developers who are causing problems
  • Use gzip encoding in your request
  • Check the return header for more info

Using eveapi

Go crawl through some of the API documentation and look at the different returns each feed has.  Though there are some constant themes, each feed has completely different return structures.  If you were to write a new API tool from scratch, you could be faced with a feed-by-feed custom cruncher.  Instead of writing 70-80 custom functions, eveapi lets you get straight to the meat of each feed with a simple set of calls.

How does eveapi work?  It utilizes Python's magic methods to build classes and objects dynamically as they are called.  Though there is no specific auth.char.CharacterSheet() entry in the codebase, it builds the query on-the-fly from the names of each call.  There is a level of elegance and future-proofing that makes eveapi absolutely incredible.

Installing

It's meant to be an importable module, and it's pretty easy to copy down a version from GitHub.  To keep the copy in your repository current with the upstream repository, use git submodule.
From there, it's a simple import eveapi and you're ready to rock!  Since eveapi is just a single Python module, it's very easy to include with a project for deployment where you may not have control of the Python environment, like AppEngine.

Working with eveapi

eveapi comes with a tutorial program to show how some of its features work.  Here's a little more ELI5 approach:

1: Create an auth object for the key in question

Here we have 2 objects (sketched below):
  • auth is a key-specific access token.  Used for any direct Character or Corporation feed access
  • api is the global api token.  Used for generic queries 
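A minimal sketch of that setup (the key values are obviously made up):

import eveapi

api = eveapi.EVEAPIConnection()                     # global token for generic queries
auth = api.auth(keyID=1234567, vCode='abcDEF123')   # key-specific token for char/corp feeds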

2: Validate the key credentials

Though an exception will be thrown when you try to access something you shouldn't, it's best to avoid the exception entirely.  With this code, we pull down the keyInfo, fetch the character list (which can also come from Characters), and validate that we can ask for the Character Sheet.  We could further validate that the account is even active with AccountStatus, but that requires more access than may be given.
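Roughly, continuing the sketch above (the CharacterSheet access-mask bit here is an assumption on my part -- check the API docs for the real value):

CHARACTER_SHEET_MASK = 8    # assumed bit for char/CharacterSheet; verify against the docs

key_info = auth.account.APIKeyInfo()         # key type, expiry, and accessMask
characters = auth.account.Characters()       # who this key can actually see
can_read_sheet = bool(int(key_info.key.accessMask) & CHARACTER_SHEET_MASK)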

3: Pull down relevant information
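And the pull itself, still in sketch form -- the attribute names come straight from the feed's XML, which is the whole point:

for character in characters.characters:
    sheet = auth.char.CharacterSheet(characterID=character.characterID)
    print(character.name, sheet.balance)     # any rowset or tag from the feed works the same way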



The art of eveapi is that it accepts the names straight from the <rowset> (or any other tag) and lets you crawl through in plain English instead of:
 ...getElementsByTagName(...)[0].firstChild.nodeValue
--OR--
 for row in rowsets[0].getElementsByTagName("row"):
    row.getAttribute(...)
Each feed can be accessed by name: just combine the /type/APIname without the .xml.aspx.

The contents can be accessed by name as well.  If it's a list, check the <rowset name="[this one]">; if it's a text item, just reference the name.  It's deceptively simple!


Instead of getting hung up on the actual structure of each individual API, each element can be accessed directly, regardless of whether it's a text node, attribute, tag, etc.  This leaves more time for contributors to work on building real tools rather than becoming mired in feed-by-feed minutia.

This is by no means an exhaustive guide for EVE API or the eveapi module.  But this should cover the basics for those who may have been overwhelmed by the API feeds previously.

Tuesday, December 31, 2013

The Sublime Art of BPO Fu

I don't know how many readers out there have been ambitious enough to dig into the EVE Data Dump for more than just typeID/typeName conversions.  There is a ton of data in there, and traversing it can be a real trip for the uninitiated.

One segment that has always proven personally daunting has been trying to scrape out BPO data.  Though the queries look easy to start, they quickly become cumbersome trying to handle all the data you actually care about.  Some of the linking you probably want:
  • T2 Products: what does the invention step cost?
  • T2 Products: what is the source BPO?
  • T2 Products: what decryptor(s) should I use?
  • T2 Products: default runs yield?
  • Can I build the sub-components (capital, T2, T3)?
  • What is the product's group/category?
Truly, the "basic" functionality of a materials list is pretty easy to put together.  Simply combine the "base materials" and "extra materials" queries and apply the appropriate math.  Most of the pain comes around T2 blueprints.  So much of the accounting for T2 is interdependent.  Also, since T2 BPOs exist, the attributes are a little screwy when trying to account for the two production paths.

My buddy Valkrr had a pretty decent XML/JSON tool that took care of all the little nuances, but it lacked a standalone updater.  Since he has quit EVE for the foreseeable future, I was SOL when it came to recreating his utility as the SDE updates.

BPO_Builder.py command-line script
You will need to download the script + scraper.ini to parse FuzzySteve's MySQL releases.  As of now it requires whatever SDE you want to scrape to be mounted locally.  There are no special options; it just dumps the result files for use elsewhere.  In a future release I'd like to internalize it into the app in question so it can refresh the values at launch time.

The Goal: A Formatted File Of BPO Data


The idea here is you could crawl along BPOs (or their products) and get all the data you'd care to know in one return.  Invention info, dependent builds, linked blueprints, math data... the only thing missing is required skills for the builds (filtered out).

As of this post, the script returns an XML file formatted similarly to the above.  I am working to push out a JSON version for a lighter-weight release.  The idea is that this script can be run once per release to give you a standardized view of all the BPO data you could need for a program.  I'm personally pushing this to replace my gawd-awful Perl kit builder, which stored much of this data manually.

The Pain of the SDE

If you are completely lacking in SQL-fu, the SDE can be really obnoxious to traverse.  Thankfully, FuzzySteve is incredibly easy to get a hold of and is an immense help in those circumstances when you're just stuck.  He was instrumental in helping with the T1/T2 BPO mapping.  

The absolute worst part of dealing with the Data Dump is the few little pitfalls out there.  The Data Dump is littered with bugs, and they've been there for a very long time.  Some bugs I ran into:
  • Meta Level (dgmTypeAttributes.attributeID=633) is split between valueInt/valueFloat
    • fixed with COALESCE(dgmTypeAttributes.valueInt,dgmTypeAttributes.valueFloat,0)
    • Though metaLevel is never a float in-game, someone at CCP flubbed about 20% of the items into the wrong type
  • Capital T2 rigs (Anti-EM/Anti-Explosive armor rigs) report the wrong metaGroupID in invmetatypes
    • Reported bug
    • Added manual repair to my tool 
  • Mercoxit Mining Crystal I lacking a metaLevel (unlike other mining crystals)
    • Reported bug
    • Added manual repair

What's Next

This will enable two goals of mine.  First, to be able to further crunch scraper data to answer questions like "How much Tritanium was destroyed".  Second, to enable me to move away from spreadsheets to more sustainable apps.  I'd still like to do something to hook into google spreadsheets, only because of ease of sharing, but trying to build a similar tool for price searching is unsustainable.

Go ahead and give my script a whirl.  Tell me if there's anything you need added to it.  The dump should be quick and easy to use.  It's a bit larger than I expected, but slurping in XML/JSON should be pretty easy to handle.

Wednesday, December 4, 2013

Better Piloting Through Chemistry

With all the recent PVP I've been doing in Aideron Robotics, chiefly against Russians in Old Man Gang, we've been faced with a higher class of solo/small-gang PVPer than any of us is used to.  Namely, the kind that ALWAYS has fleet boosts and employs pirate implant sets.  These have made for some very tough nuts to crack, and have made defending Heydieles a real challenge.

Our answer has been to respond, as much as we can, in kind.  I've pushed two of my booster characters into the system with a full suite of fleet boosters, and I've been keeping a steady stream of fitted ships on contract where pilots can quickly grab them and get back into the fight.  With the recent tide of allies and a generous US holiday, we've been able to turn the system in our favor.  Aideron Robotics has recently taken away OMG's POCOs (#1, #2) and really stomped down the Caldari challenge to the system.

This isn't enough, and we've been scrambling for more to answer OMG.  There's no way we're going to push pilots into pirate sets and expect them to beat OMG at their own game, but we can leverage Combat Boosters!

The big problem with boosters, though, is convincing people to try them.  With the really steep penalties for use, and the pain of transport/sale, most pilots completely discount them.  Properly utilized, though, boosters can be a real force multiplier in the right roles/ships.

Unfortunately, there are no really good definitive guides on drug use.  Ripard Teg's Fit of the Week segment usually highlights individual boosters when they make sense in a fit, and there are some written guides explaining how they work in wiki wall-o-text fashion, but there isn't a great go-to guide.  To push boosters on our greener members, and keep them in our FCs' minds for utilization, we need something better.

Making Infographics

Recently, Aideron Robotics has been pushing a "Making a Better Pilot" series: similar to a lot of TEST/CFC propaganda, cute infographics that try to curb bad behaviors or illustrate less-intuitive piloting ideas.



Since my moon mining flow chart was so well received, especially with the siphon additions, I figured this was a good chance to fill a need.  Also, I'm eyeing Booster production, but the market throughput is kind of anemic, so I figured we could kill a lot of birds with one stone here:  Increase demand in general, improve Aideron performance in PVP, line the industry wallet, and contribute something to the overall meta (which I have been neglecting for the last month or so).

How It's Made

For the graphically challenged, Google Drawing in Google Docs is an absolute lifesaver.  Pair the image dumps (link) with some text, a little graph magic, and GIMP.  The biggest problem I ran into is that I wanted a radial bar graph... and had no easy way to make one.  Seriously... why is this so difficult?

What I Wanted

What I Made

The hope was to have a bit of a gauge to illustrate the various grade + skill combinations, and to show that, with the appropriate skills, the chance of incurring a truly unacceptable penalty is very low.  Unfortunately, explaining probability to the masses is always a frustrating endeavor, though I think I illustrated the reality pretty decently by pairing my chart with a character-sheet view.


Released infographics after the break!

Tuesday, October 22, 2013

A Little Less Talk: EMD scraper v2

In my fervor to get at one subset of data, I wrote myself into a corner.  So I spent this last weekend ripping out the inner workings of my pricefetch script and bringing it in line with the style/stability of my zkb scraper.

Code at Github

This exercise was painful because I had to essentially start over and rework the entire tool from top to bottom.  This did give me the chance to clean up a lot of errors (data backfill was bugged all along), and now things are pretty and fast.  I still have the issue of "fast as you please, there's still xGB's to parse", but I think I've worked the tool down into a sweet spot for effort/speed.

I owe a lot of the recent progress to Valkrr and Lukas Rox.  Seeing as I am so painfully green with databases, they've been exceptionally helpful in cleaning up some of the pitfalls I've run into.

What Changed?

Where pricefetch was designed to grab everything from one region, EMD_scraper is designed to grab everything from everywhere.  To accomplish this I put in two modes for scraping:
  • --regionfast
  • --itemfast
These handles define the method of scraping.  --regionfast will attempt to pull as many regions as possible, resulting in one-item-per-call returns.  --itemfast does the opposite, trying to pull as many items as possible, one region at a time.  Also, unlike zKB_scraper, which goes in dictionary order, regions have been placed in a "most relevant" order in this release: big hubs first, then highsec, lowsec, nullsec.  It still accepts smaller lists, and you can modify the lookup.json values to your heart's content as well.

This also necessitated some updates to the crash handler.  Crashes now dump the entire progress so far (region,item) and the script modifies the outgoing calls to skip region/item combinations already run.  I'd really like a more efficient crash/fetch routine, trying to get the full 10k returns each query... but I can't know the limits ahead of time with the current layouts.  I'll take 10k max with 5-7k avg returns rather than try to dynamically update the query.  EMD isn't designed to crawl like zKB.

I'm not wholly pleased with how --itemfast runs.  I may have to rewrite it to crawl through all items in one region before moving on to the next; it currently blasts through a large number of items and then increments the region.

Beautification

Coding on my own, I have this habit of scrawling down code/files willy-nilly until I can get a stable working midpoint.  Since my professional code habits stem from more time spent repairing code or tacking features onto an existing project, I lack a lot of intuition on building foundations.

Repository Maintenance

When I first created the Prosper repository (about a year ago now), I spent a good deal of time trying to create a monolithic DB scraper/builder.  With this second try, I wanted to split the tasks into finer pieces and make the code more independent.  By adopting a "first, make it run" mentality, I could at least get to a manageable midpoint with data, rather than burning a bunch of effort on crafting expert code.  This resulted in a lot of duplicated work, and since the paradigm shifted so far, I figured I might as well gut the original code and promote the new scripts to "DB_builder" status.

I am banking all of my examples to a scraps directory, but I need to make sure I am adding them all to the repository.  Thankfully, I find myself ransacking those samples to help move the project forward.  Much of the zKB urllib2 code was previously written.  Also, many of the item lookup JSONs were pre-existing.

One item tacked onto the TODO list, though, is to add more sample data dumps to the SQL portion of the repository.  I was avoiding tracking these to keep the repo from getting too large, but as Valkrr pointed out, at least keeping the SQL scripts for common queries would be useful as examples.

Death to Global Variables

A good Samaritan swung by my code and pointed out that I should de-commit some globals, like db_username/db_password, and replace them with configuration files.  After a little back-and-forth, he was gracious enough to add the .ini handlers into the zkb script for me.

I figured it was a good time to add some extra functionality, so I rolled those changes into a more complete set.  Now the zKB and EMD scrapers both pull from the same .ini, as will any other outgoing scraper (EVE-Central, eveoffline?).  I'd like to compartmentalize internal and external scrapers into different .ini files, but we'll see how long that lasts.
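For anyone rolling their own, it's just the stock config module; a sketch (Python 3's configparser, with made-up file/section/option names rather than the repo's actual layout):

import configparser

config = configparser.ConfigParser()
config.read('scraper_config.ini')                   # one .ini shared by every outgoing scraper

db_user = config.get('database', 'db_username')     # section/option names are illustrative
db_password = config.get('database', 'db_password')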

Cleaner, Clearer Code

If you look at the previous version of the EMD_scraper, you'll see a lot of commented code around working code.  I left a lot of the trial-and-error in the first version.  I have since cleaned a lot of that out, leaving only some quick handles in there for debug printing.

I would like to take another pass at these scripts down the line to make very-pretty output, instead of the progress dumping to the command line.  This is purely cosmetic though, so expect the priority to be extremely low.

SQL-Fu

I seriously underestimated how much trouble data warehousing would be.  I have spent a lot of time over the last week trying to understand where I am going wrong and what steps I am missing.

Steps so far:
  • Reduce DB size by reducing strings
    • Removed itemname from priceDB
  • Design the DB to have the data, use queries to make the form
    • Abandoned "binning" directly from zKB data
    • Instead save by system, binning can be handled in a second-pass method
  • OPTIMIZE TABLE is your friend
  • Custom INDEXes for common queries: added some, need to read more
  • CCP and NVIDIA are sloppy with their previous patch cleanup:
    • Check C:\Program Files\CCP\EVE\
    • Check C:\nvidia\ 
  • MySQL is a hog

JOIN and SUM(IF(..)): two flavors that don't go well together

One bug I mentioned is that some of my queries return hilariously high values.  In my Neutron Blaster Cannon II experiment, the raw numbers were 10x what was in the DB.  When Powers from the #tweetfleet asked for freighter data, I was returning something like 138x.  It seems I've been confused about the order of operations in SQL -- most likely the JOIN fans the rows out before the SUM runs, so every price row gets counted once per matching kill row and the aggregates get inflated accordingly.  Aggregating in subqueries first, then joining the already-summed results, should bring the numbers back in line.

This is why I really want to get the "bridge" scripts done so I can just splice together the tables I want to have all the data I need.  Since the data is local, rescraping should be mostly trivial, and it would give me data stores in the shapes I need to move onto the next step of the machine.

Tuesday, October 15, 2013

Objective Complete: zKB Data Get

3.75M Kills parsed (2013 so far)
17.5M Entries
40hr estimated parsing time

Frigate 905,329
Cruiser 315,493
Battleship 77,617
Industrial 81,642
Capsule 929,041
Titan 25
Shuttle 41,814
Rookie ship 246,308
Assault Frigate 110,147
Heavy Assault Cruiser 30,583
Deep Space Transport 2,421
Combat Battlecruiser 170,480
Destroyer 332,165
Mining Barge 54,804
Dreadnought 2,218
Freighter 1,960
Command Ship 6,340
Interdictor 32,956
Exhumer 24,032
Carrier 4,873
Supercarrier 113
Covert Ops 39,401
Interceptor 52,546
Logistics 15,082
Force Recon Ship 24,132
Stealth Bomber 97,226
Capital Industrial Ship 247
Electronic Attack Ship 5,940
Heavy Interdiction Cruiser 3,961
Black Ops 1,129
Marauder 1,375
Jump Freighter 861
Combat Recon Ship 8,368
Industrial Command Ship 2,986
Strategic Cruiser 32,309
Prototype Exploration Ship 265
Attack Battlecruiser 82,652
Blockade Runner 11,583


Remaining To-Do

  1. Investigate count bug
    • Initial dump is 10x expected values on items?
  2. Finish "prettying" for release
  3. Update pricefetch to scrape all regions for full market picture
  4. Find a way to maintain/release .sql dump of data generated
  5. mySQL optimization and "bridge" scripts for smaller passes

Progress So Far

I have to thank a bunch of people for helping me get to this point where I have at least a passable crawler and data set to munch on.  I would like to get EVE-Central's dumps processed before moving onto the data science step, but we will see what happens.

Extra special thanks to:
I still have a lot of work to go between "working" and "good", but being able to stand upright and get my hands on this data is exceptionally awesome.

Finally, I can put together data like this: