
Monday, April 17, 2017

Aspiring Hari Seldon - Part 3 - Releasing REST Forecasts

Check out part 2

After playing with Prophet, I wanted a way to distribute the data more publicly.  I could have just incorporated it into our R templates and made it part of the show, but I felt like that was unfair to the general public.

I started from the Flask-RESTful app we were already providing for EVE Mogul. Though this turned out to be a complete ground-up rewrite, I learned a lot about Flask and testing along the way.

What They Don't Tell You

One of the biggest pains of self-teaching is that it's very easy to learn enough to be useful, but not enough to be good.  The Flask documentation is particularly bad about this: it gives you the bare bones of what does what, but never sells a viable project shape, which leads to a lot of pain later.

Since the original OHLC feed was based loosely on a work project (as a way to test out corners on my own time), it repeated a design flaw.  It turns out there's a very particular shape to a Flask project, and deviating from it causes a lot of problems.

Some examples of how to avoid the pitfalls: by using the prescribed Flask structure, all the pieces from launcher to test make a lot more sense.  Testing in particular was one of the more difficult pieces of the picture (see the sketch below).
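For illustration, here's roughly the shape I mean; a minimal sketch of the application-factory pattern, with names of my own choosing rather than ProsperAPI's actual modules:

    # prosper_api/__init__.py -- hypothetical layout, not the real repo
    from flask import Flask

    def create_app(settings=None):
        """One factory that wires the whole app, so launcher and tests agree."""
        app = Flask(__name__)
        if settings:
            app.config.update(settings)

        # blueprints / Flask-RESTful resources get registered here
        return app

    # launcher:  app = create_app()
    # test code: client = create_app({'TESTING': True}).test_client()

The big win is that tests build their own throwaway copy of the app instead of fighting a module-level singleton.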

About The Endpoint

Because running predictions is CPU-intensive, I wanted to incorporate two features:
  • API keying
  • Caching
Though I'm happy to share the source and open up to the community, I'm not ready for a whole-hog release.  TL;DR: there are a couple of Flask eccentricities, and the gdoc integration could cause some serious issues.  Also, uncached performance can run north of 15s, which could cause problems for some platforms.
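As a rough sketch of those two features (assuming Flask-RESTful plus the Flask-Caching extension; the key store and forecast call are stand-ins, not the real ProsperAPI internals):

    from flask import Flask, request, abort
    from flask_restful import Api, Resource
    from flask_caching import Cache

    app = Flask(__name__)
    api = Api(app)
    cache = Cache(app, config={'CACHE_TYPE': 'simple'})

    VALID_KEYS = {'example-key'}  # hypothetical key store

    def run_forecast(type_id):
        """Stand-in for the CPU-heavy Prophet call (north of 15s uncached)."""
        return {'type_id': type_id, 'prediction': []}

    class Forecast(Resource):
        def get(self, type_id):
            if request.args.get('api_key') not in VALID_KEYS:
                abort(401)  # API keying: no key, no forecast

            cache_key = 'forecast-{}'.format(type_id)
            payload = cache.get(cache_key)
            if payload is None:  # caching: only predict on a miss
                payload = run_forecast(type_id)
                cache.set(cache_key, payload, timeout=3600)
            return payload

    api.add_resource(Forecast, '/forecast/<int:type_id>')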

To help get around "walled garden" accusations, I've done a few things:
  1. I've left copious notes and automation on how to deploy the service on your own webhost
  2. I've provided API keys to some other market devs such as EVE Mogul and Adam4EVE
  3. I am happy to distribute keys on-request to other devs
The goal is to get the content out to the widest audience possible, even if the raw data is a little unwieldy.  Given my limitations as a developer, this is my compromise.

Predictions In The Wild

Adam4EVE



Though my API service is designed to give you soup-to-nuts everything you need to plot in the REST payload, Ethan02 over at Adam4EVE added his own DB to keep us honest.  I totally love this!  As of right now, it's still in their DEV branch, but expect to see more from them soon!

EVE-Mogul


Extending the existing OHLC candlestick plot, EVE Mogul will let you keep close tabs on what you're currently trading, and this is an excellent chance to gut-check your investments.

Conclusions

Check out the source: ProsperAPI

I didn't get to share all the other super-nerdy #devfleet stuff (like travis-ci integration, or PyPI release).  I will probably try to release more notes on python stuff directly on Medium going forward.

This was an eye-opening project in a lot of ways, and it opens the door for more microservice REST work in the future.  Also, I plan to have the PLEX split covered before CCP releases it on May 9th.

Tuesday, October 18, 2016

Up And Down - Maintaining an OHLC Endpoint and Deploying Flask-RESTful


This is the first part of a more technical devblog. I will write up more specifics in part 2, but I wanted to talk about the ups and downs behind the scenes with our EVE Mogul partnership. The issues are mostly my own failings; Jeronica, Randomboy50, and the rest of the team have been amazing given my shoddy uptime.

Prosper's OHLC Feed

I forgot to blog about this since the plans for Prosper's v2 codebase have only recently solidified, but we have a CREST markethistory -> OHLC feed hosted at eveprosper.com. The purpose was to run Flask/REST through its paces, but Jeronica over at EVE Mogul whipped up a front-end and Roeden at Neocom has been using it in their trading forays.

SSO Login Required To View

This originally served me well as a learning experience, but keeping a REST endpoint up isn't as simple as I originally expected. From Flask's lack of out-of-the-box multithreading support to some more Linux FUBARs below, it's been a wild ride. And now that players are legitimately counting on this resource as part of their toolchain, I figured it's time to get my act together.
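To put the Flask point in concrete terms (a minimal sketch, not the actual feed code):

    from flask import Flask

    app = Flask(__name__)

    @app.route('/ohlc')
    def ohlc():
        return 'stub payload'  # stand-in for the real OHLC feed

    if __name__ == '__main__':
        # the dev server handles one request at a time unless you opt in;
        # threaded=True helps, but a proper WSGI server is the real fix
        app.run(threaded=True)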

The Litany of SNAFUs

What really brought the house of cards down was our move from a traditional hosting service to a full r/homelab solution. Prosper has been living alongside some other nerd projects (Minecraft, Arma, Mumble, etc.), and this move gets Prosper off the shitlist with the other customers when Wednesday night rolls around and I hammer the box generating the show's plots. Unfortunately, for the added performance, we trade being under a benevolent tinkerer; restarts and reconfigs are more common than before. It's a huge upgrade, and I can't thank Randomboy50 enough for the support, but nothing is truly free (except the minerals you mine yourself™).

#nofilter #bareisbeautiful


This need for stability runs headlong into a shitty part of Python: package deployment. Though wheeling up and distributing individual Python libraries is easy, deploying Python as a service is not. There will be a second blog on the specifics, but out of the box you're largely stuck with magic project-deploying scripts, which can get really hairy if you're not careful about virtualenvs.

Thankfully, work turned me on to dh-virtualenv, and though we're now grossly overengineered with a service .deb installer, we have a properly deployed Linux service that should be far more robust going forward. It does mean there are now separate "build" and "deploy" steps for updates, but now that we're tied into systemctl, the endpoint should be much less likely to go down.

Even with the last few months of work, I still expect a large amount of reengineering in our quest for a Quandl-like EVE service, but with the installer built we can keep the endpoint up with a lot less effort going forward. We are still behind on the ProsperWarehouse rollout and getting the scrapers rewritten, but those modules should be a cakewalk to deploy now that ProsperAPI is properly built up.

Also, I've worked in a Discord logging handler, which will be useful for monitoring; more notes on that later ;)
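The idea, in sketch form (a hypothetical handler, not the actual Prosper code; the webhook URL is a placeholder):

    import logging
    import requests

    class DiscordWebhookHandler(logging.Handler):
        """Push WARNING+ log records to a Discord channel via webhook."""
        def __init__(self, webhook_url, level=logging.WARNING):
            super().__init__(level)
            self.webhook_url = webhook_url

        def emit(self, record):
            try:
                requests.post(
                    self.webhook_url,
                    json={'content': self.format(record)},
                    timeout=5,
                )
            except requests.RequestException:
                self.handleError(record)  # never let logging crash the app

    logging.getLogger('prosper').addHandler(
        DiscordWebhookHandler('https://discordapp.com/api/webhooks/<id>/<token>'))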

Friday, September 23, 2016

ProsperWarehouse - Building Less-Bad Python Code

EVE Prosper is first and foremost a data science project.  And though hack-and-slash has gotten us this far, we need a proper design and environment if we want to actually expand coverage rather than just chase R/CREST/SQL bugs.


There has been some work moving Prosper to a v2 codebase (follow the new github projects here), and ProsperWarehouse is a big step toward that design.  This interface should open up a whole new field of projects, so it's critical to nail this design on the first pass before moving on.

What The Hell Is This Even
Building a Database Abstraction Layer (DAL).  

Up to now we have used ODBC, but cross-platform deployment and database-specific weirdness have caused problems, such as painful ARM and macOS support.  Furthermore, relying only on ODBC means we can't integrate non-SQL sources like MongoDB or Influx into our stack without rewriting huge chunks of code.  Lastly, we have relied on raw SQL and string hacks sprinkled all over the original codebase, making updates a nightmare.

There are two goals of this project:
  1. Reduce complexity for other apps by giving standardized get/put methods.
  2. Allow easier conversion of datastore technologies: change the connection without changing behavior.
By adopting Pandas as the actual data transporter, everything can talk the same talk and move data around with very little effort.  Though some complexity comes from cramming noSQL-style data into traditional dataframes, that complexity can be abstracted under the hood so the same structures always come back when asked.

How Does It Work?
The Magic of Abstract Methods

I've never been a great object-oriented developer, and I've been especially weak with parent/child relationships.  Recent projects at work have taught me some better tenets of API design and implementation, and I wanted to apply those lessons somewhere personal.



Database Layer

Holds generic information about the connection; essentially the bulk of the API skeleton.  Whatever Database() defines will need to be filled in by its children.  This container doesn't do much work, but it acts as the structure for the whole project under the hood.
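A minimal sketch of that skeleton (illustrative names, not the real ProsperWarehouse classes):

    from abc import ABC, abstractmethod

    class Database(ABC):
        """Generic connection container; children must fill in the blanks."""
        def __init__(self, datasource_name):
            self.datasource_name = datasource_name

        @abstractmethod
        def get_connection(self):
            """Build the technology-specific connection."""

        @abstractmethod
        def get_data(self, **kwargs):
            """Fetch records as a pandas DataFrame."""

        @abstractmethod
        def put_data(self, payload):
            """Write a pandas DataFrame back to the store."""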

Technology Layer

Right now that's only SQLTable(), but this layer is designed to hold/init all the technology-specific weirdness: connections, query lingo, test infrastructure, configurations.  It's meant to be interchangeable, so you could pull out SQLTable and replace it with a MongoDB- or Influx-specific structure.  This isn't 100% foolproof with the way some of the test hooks are built right now, but with standardized input/output, conversion shouldn't be a catastrophe.
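Continuing that sketch, a technology layer might look something like this (sqlite stands in for the real driver; again, illustrative rather than verbatim):

    import sqlite3  # stand-in engine for the sketch

    import pandas as pd

    class SQLTable(Database):  # Database from the sketch above
        """SQL-specific plumbing; datasource name doubles as table name."""
        def get_connection(self):
            return sqlite3.connect('prosper.db')

        def get_data(self, **kwargs):
            # filters elided for brevity; values would be parameterized
            query = 'SELECT * FROM {}'.format(self.datasource_name)
            with self.get_connection() as conn:
                return pd.read_sql(query, conn)

        def put_data(self, payload):
            with self.get_connection() as conn:
                payload.to_sql(self.datasource_name, conn,
                               if_exists='append', index=False)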

Datasource Layer

A connection-per-resource is the goal going forward.  This means we give up JOIN functionality inside SQL, but gain an easier-to-manage resource that can be abstracted.  All of the validation, connection setup/testing, and any special-snowflake modifications go in this layer.  Also, because these have been broken out into their own .py files, debug tests can be built into __main__ as a way for humans to actually fix problems without having to rely on shoddy debug/logging.

This adds a lot of overhead for initializing a new datasource.  In return for that effort we get the ability to test/use/change those connections as needed rather than going up a layer and fixing everything that connected to that source.  It's not free, but it should pay for itself in faster development down the line.
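Rounding out the sketch, a datasource then becomes a thin, testable file of its own (hypothetical names, building on the SQLTable sketch above):

    class SnapshotEVECentral(SQLTable):
        """Special-snowflake validation/setup for one specific resource."""
        def __init__(self):
            super().__init__('snapshot_evecentral')
            # connection tests / schema validation would go here

    if __name__ == '__main__':
        # human-runnable debug hook: poke the connection without the full stack
        source = SnapshotEVECentral()
        print(source.get_data().head())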

Importlib Magic

The real heavy lifter for the project isn't just the API object design, but a helper that turns an ugly set of imports/inits into a far simpler fetch_data_source() call.  I would really like to dedicate a blog to this, but the TL;DR is that importlib lets us interact with modules more like function pointers.  This was useful for a work project because we could execute modules by string rather than using a "main.py" structure that would need to import/execute every module in sequence.  It should mean you import one module and get all the dependent structure automagically.

Without importlib, every datasource would have to be imported by hand, something like this (illustrative module names):
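    # illustrative only -- one hand-maintained import per datasource
    from table_configs import snapshot_evecentral
    from table_configs import zkillboard_stats
    from table_configs import emd_killmails

    snapshot = snapshot_evecentral.SnapshotEVECentral()
    zkb = zkillboard_stats.ZKillboardStats()
    killmails = emd_killmails.EMDKillmails()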



Instead, it can now look like this (a sketch of the helper; the get_table() factory is hypothetical):
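    # sketch of the helper: resolve and init a datasource by string
    import importlib

    def fetch_data_source(source_name, package='table_configs'):
        """Import a datasource module by name and hand back its table object."""
        module = importlib.import_module(
            '{}.{}'.format(package, source_name))
        return module.get_table()  # hypothetical per-module factory

    snapshot = fetch_data_source('snapshot_evecentral')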


A small change, but it cleans up overhead and lets new sources be loaded far more easily.  It also means you could fork the repo and build your own table_config path without going crazy trying to wire up every import.

A Lot Of Work For What Exactly?

The point is to simplify access to the databases.  With a unified design there, we can very easily lay the groundwork for a Quandl-like REST API.  Also, with the query logic simplified and unified, apps that fetch and process the data go from 100+ lines of SQL to 2-3 lines of connection code.
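In practice, a consuming script shrinks to something like this (hypothetical, building on the fetch_data_source() helper above):

    # the whole "query" from a cron script's point of view
    snapshot = fetch_data_source('snapshot_evecentral')
    data = snapshot.get_data()  # a pandas DataFrame, same shape every time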

By abstracting a painful piece of the puzzle, this should make collaboration easier.  It also buys us the ability to use local-only dummy sources for testing without production data, so collaborators can run in a "headless mode".  Though I doubt I'll get much assistance updating the Warehouse code itself, it's a price worth paying if it makes tedious work like new cron scripts or REST API design less arduous to test and less prone to SQL injection.