
Monday, April 17, 2017

Aspiring Hari Seldon - Part 3 - Releasing REST Forecasts

Check out part 2

After playing with Prophet, I wanted a way to distribute the data more publicly.  I could have just incorporated it into our R templates and made it part of the show, but I felt like that was unfair to the general public.

I started from the Flask-RESTful app we were already providing for EVE Mogul, though it needed a ground-up rework. That complete rewrite taught me a lot about Flask and about testing along the way.

What They Don't Tell You

One of the biggest pains of self-teaching is that it's very easy to learn enough to be useful, but not enough to be good.  The Flask documentation is particularly bad about this: it gives you the bare bones of what does what, but never sells a viable project shape, which leads to a lot of pain later.

Since the original OHLC feed was based loosely on a work project (as a way to test out corners on my own time), it repeated a design flaw.  It turns out there's a very particular Flask shape, and deviating from it causes a lot of problems.

Some examples of how to avoid those pitfalls: by using the prescribed Flask structure, all the pieces, from launcher to tests, make a lot more sense.  Testing in particular was one of the more difficult parts of the picture.
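For reference, here is a minimal sketch of that kind of structure, an application-factory layout with Flask-RESTful. The module and endpoint names are hypothetical, not the actual ProsperAPI layout:

```python
# hypothetical layout sketch -- not the actual ProsperAPI structure
from flask import Flask
from flask_restful import Api, Resource


class OHLCEndpoint(Resource):
    """Placeholder resource; the real endpoint would serve market history."""
    def get(self, type_id):
        return {'type_id': type_id, 'ohlc': []}


def create_app(config_object=None):
    """Application factory: building the app inside a function lets tests
    spin up isolated instances with their own configuration."""
    app = Flask(__name__)
    if config_object:
        app.config.from_object(config_object)

    api = Api(app)
    api.add_resource(OHLCEndpoint, '/OHLC/<string:type_id>')
    return app


if __name__ == '__main__':
    create_app().run(debug=True)
```

With this shape, the launcher just calls `create_app()`, and the test suite can build its own instance and use Flask's built-in test client instead of hitting a live server.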

About The Endpoint

Because running predictions is CPU-intensive, I wanted to incorporate two features:
  • API keying
  • Caching
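As a rough illustration of those two features in Flask-RESTful terms (the key list, cache, and `run_forecast` stub below are hypothetical stand-ins, not how ProsperAPI actually wires them):

```python
from functools import wraps

from flask import request, abort
from flask_restful import Resource

VALID_KEYS = {'demo-key'}   # hypothetical: real keys would live in config, not source
_FORECAST_CACHE = {}        # hypothetical in-memory cache; a file/DB cache survives restarts


def require_api_key(func):
    """Reject any request that doesn't carry a recognized api_key parameter."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if request.args.get('api_key') not in VALID_KEYS:
            abort(401)
        return func(*args, **kwargs)
    return wrapper


def run_forecast(type_id):
    """Placeholder for the CPU-heavy prediction call."""
    return {'type_id': type_id, 'prediction': []}


class ForecastEndpoint(Resource):
    method_decorators = [require_api_key]   # applied to every HTTP method on this resource

    def get(self, type_id):
        # only pay the expensive prediction cost on a cache miss
        if type_id not in _FORECAST_CACHE:
            _FORECAST_CACHE[type_id] = run_forecast(type_id)
        return _FORECAST_CACHE[type_id]
```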
Though I'm happy to share the source and open up to the community, I'm not ready for a whole-hog public release.  TL;DR: there are a couple of Flask eccentricities, the gdoc integration could cause some serious issues, and uncached performance can run north of 15s, which could be a problem for some platforms.

To help get around "walled garden" accusations, I've done a few things:
  1. I've left copious notes and automation on how to deploy the service on your own webhost
  2. I've provided API keys to some other market devs such as EVE Mogul and Adam4EVE
  3. I am happy to distribute keys on-request to other devs
The goal is to get the content out to the widest audience possible, even if the raw data is a little unwieldy.  And due to my limitations as a developer, this is my compromise.  

Predictions In The Wild

Adam4EVE



Though my API service is designed to give you soup-to-nuts everything you need to plot in the REST payload, Ethan02 over at Adam4EVE added his own DB to keep us honest.  I totally love this!  As of right now, it's still in their DEV branch, but expect to see more from them soon!

EVE-Mogul


Extending the existing OHLC candlestick plot, EVE Mogul will let you keep close tabs on what you're currently trading, and this is an excellent chance to gut-check your investments.

Conclusions

Check out the source: ProsperAPI

I didn't get to share all the other super-nerdy #devfleet stuff (like travis-ci integration, or PyPI release).  I will probably try to release more notes on python stuff directly on Medium going forward.

This was an eye-opening project in a lot of ways.  It opens the door for more micro-service REST stuff in the future.  Also, I do plan to have the PLEX split covered before CCP releases it on May 9th.

Thursday, March 2, 2017

Aspiring Hari Seldon - Part 2

It's been quite a while, but I have a follow-up to this old post.

A Crash Course

Tinkering with prices is difficult, and most players may not understand why.  Though we all interact with the price of things, unspinning the mess of how and why becomes complicated fast.  

What's worse, the price of a thing doesn't behave the way traditional fitting tools expect; it's a collection of day-to-day ups and downs, the solution to a complicated network of factors.  This is why I've had so much trouble designing forecasts: the starting point is critical, and the walking methods aren't strictly obvious.

After dancing around this problem for as long as I have, I've come up with a few best-practices for approaching Prosper's economic reporting:
  1. Make it normal or linear: hard math is hard, keep analysis as simple as possible
  2. Figure out connections: storytelling can quickly connect seemingly disparate points
  3. Assume players aren't dumb: everyone is trying to win
  4. Look for disproof: try to eliminate weaknesses and errors in tools

Prophet - A New Crack At Forecasting

My boss sent me a link to Prophet.  And, of course, I threw EVE data at it!
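Feeding price history into Prophet is only a few lines. Here's a minimal sketch, assuming a daily price series in a CSV (the file name and column names below are made up):

```python
import pandas as pd
from fbprophet import Prophet  # newer releases ship the package as `prophet`

# hypothetical input: daily market history with `date` and `avg_price` columns
history = pd.read_csv('plex_history.csv')
df = pd.DataFrame({
    'ds': pd.to_datetime(history['date']),  # Prophet expects a `ds` datetime column
    'y': history['avg_price'],              # ...and a `y` value column
})

model = Prophet()
model.fit(df)

future = model.make_future_dataframe(periods=60)  # extend 60 days past the history
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
```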
On the one hand, that PLEX prediction is pretty hilarious, while on the other, the injector forecast is very close to one I'd publish on the show.

So, what is going on here?  I still think we're running into the issue of putting linear-style modeling on a non-linear problem, but we're getting much closer to the gut-version I would like to illustrate on the show.  

What's more, we can apply a previous lesson and use the forecaster to predict the day-to-day volatility as a second opinion.  Using a GBM-style method, we get this:
Getting a second opinion in this case is a good way to counteract the problem of runaway models: forecasts that get stuck in a runaway up or down swing.  Though it's not foolproof, we're reaching a "good enough" level to actually start considering it in our tooling.
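For the curious, a GBM-style second opinion can be sketched in a few lines of NumPy. This is an illustrative version under simple drift/volatility assumptions, not the exact code behind the show's charts:

```python
import numpy as np
import pandas as pd


def gbm_second_opinion(prices, horizon=30, n_paths=1000):
    """Estimate drift/volatility from daily log returns, then simulate
    geometric-Brownian-motion paths as a sanity check on the main forecast."""
    log_returns = np.log(prices / prices.shift(1)).dropna()
    mu = log_returns.mean()     # average daily drift
    sigma = log_returns.std()   # daily volatility

    shocks = np.random.normal(mu, sigma, size=(n_paths, horizon))
    paths = prices.iloc[-1] * np.exp(np.cumsum(shocks, axis=1))

    return pd.DataFrame({
        'median': np.median(paths, axis=0),
        'low': np.percentile(paths, 10, axis=0),
        'high': np.percentile(paths, 90, axis=0),
    })
```

If the main forecast wanders far outside this envelope, that's a hint the model has run away.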

Let's Go!


As promising as these initial findings are, I still worry they aren't a good replacement for more robust methods.  I'm still investigating the following:
  1. Single-dimension: only using price data, not including volume for supply/demand accounting
  2. Limited test scope: only run on CREST history data, not extended history data yet
  3. Time series rigidity: designed for daily data
  4. No extended grading done: need to test predictions vs reality
But the future is promising.  There are some key features I'm loving in this library:
  • Built-in changepoint trending: predicting discontinuities is very powerful
  • Python/R library parity: easy development/testing
  • Built-in week/month/season accounting
  • EXTREMELY FAST
There's so much more to play with, and I'm excited to dive deeper.  I'm looking into including these forecasts sparingly into the Prosper show going forward, as a better way to illustrate my gut feelings.  And hopefully we can incorporate these forecasts in a more robust feature in the near future!
(Chart caption: TypeNames redacted while debugging)


Subscribe on Twitch and YouTube

Thursday, October 27, 2016

Favorite Python Packages 01 - Making a chatbot

I had to write a logging handler for work that pushed errors up to HipChat. It turns out the process was so easy I couldn't resist adding a chat handler to ProsperCommon (especially given my hacky email handler). Despite my love for Slack, Discord became the tool of choice because it's easy to stand up and tear down chats with a lot of flexibility. I also skipped Slack for now because the tweetfleet server blows past the 10k message buffer on a daily basis.
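For context, a chat handler boils down to subclassing `logging.Handler` and POSTing to a webhook. A rough sketch (not the actual ProsperCommon implementation; the webhook URL is a placeholder):

```python
import logging

import requests


class WebhookHandler(logging.Handler):
    """Push log records to a chat webhook (Discord-style payload shown here)."""

    def __init__(self, webhook_url, level=logging.ERROR):
        super().__init__(level)
        self.webhook_url = webhook_url

    def emit(self, record):
        try:
            payload = {'content': self.format(record)}  # Discord webhooks take `content`
            requests.post(self.webhook_url, json=payload, timeout=5)
        except Exception:
            self.handleError(record)


logger = logging.getLogger('prosper.demo')
logger.addHandler(WebhookHandler('https://discord.com/api/webhooks/<id>/<token>'))
logger.error('Something broke in the pipeline')
```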

So, let's cook up a chatbot! Discord's API offerings are dizzying; this should be easy!

Discord.py - Making Chatbots Easy

Since Discord relies on an OAuth2 connection, and chats are inherently asynchronous, cooking up a bot from scratch would hurt. Discord.py to the rescue! This library has exceptional API coverage and is easy to use.

My one gripe is the documentation. Docs are sparse in places, but I'll forgive that sin thanks to their example code and an active community on the Discord API Guild. I also had some trouble getting off the ground with the Discord API docs, specifically getting the correct tokens, but once the bot was authenticated, it was off to the races!
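To give a flavor of how little code a bot takes, here's a minimal sketch written against a current discord.py release (the API has shifted since this post; older versions used `client.send_message` instead of `channel.send`, and the token below is a placeholder):

```python
import discord

intents = discord.Intents.default()
intents.message_content = True      # required in discord.py 2.x to read message text
client = discord.Client(intents=intents)


@client.event
async def on_ready():
    print(f'Logged in as {client.user}')


@client.event
async def on_message(message):
    if message.author == client.user:
        return                      # ignore the bot's own messages
    if message.content.startswith('!hello'):
        await message.channel.send('Hello from ProsperBot!')


client.run('YOUR_BOT_TOKEN')        # placeholder token
```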


TinyDB - The Easy Object-store

Pinging the internet for data is not free, whether because of rate limits or round-trip times. Tools like SQLite are great for lightweight, portable data storage, but they also require schema design. MongoDB is a powerful NoSQL solution, but it's heavy to stand up (and I'm not in love with the query language). TinyDB comes to the rescue as a way to get the JSON/NoSQL storage of MongoDB with none of the server/auth standup.

This shines when paired with REST endpoints. It's easy to push/pop entries around and keep the same raw JSON in the archive as what's coming from the endpoint. Also, it's as easy as JSON to add more keys for searching. I'm still not in love with my cache-timer implementation in ProsperBot, but fetching from cache is 100x faster than an internet call. Lastly, debugging is easy since the output is raw JSON, though this could lead to compression issues down the line.
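The cache-timer idea amounts to storing a timestamp next to each payload. A simplified sketch (file name, field names, and freshness window here are hypothetical, not ProsperBot's actual implementation):

```python
import time

from tinydb import TinyDB, Query

db = TinyDB('prosper_cache.json')   # hypothetical cache file
CACHE_MAX_AGE = 300                 # seconds; hypothetical freshness window


def cached_fetch(ticker, fetch_func):
    """Return a fresh cached payload if one exists, otherwise refetch and store it."""
    record = Query()
    hits = db.search(record.ticker == ticker)
    if hits and time.time() - hits[0]['cache_time'] < CACHE_MAX_AGE:
        return hits[0]['payload']

    payload = fetch_func(ticker)    # hypothetical REST call returning a dict
    db.upsert(
        {'ticker': ticker, 'payload': payload, 'cache_time': time.time()},
        record.ticker == ticker,
    )
    return payload
```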


Quick pro-tip about TinyDB: get ujson. This pure-C implementation of the JSON library is a great drop-in replacement. It can also be baked into libraries like Requests. ujson makes handling JSON lightning fast! Also, TinyDB has a wide array of extensions, and I will be looking into MongoDB hooks at a future date.
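The drop-in part is nearly literal; for plain encode/decode work it's usually just an import swap (though ujson doesn't cover every corner of the stdlib API):

```python
import ujson as json   # same loads/dumps surface for the common cases

# hypothetical payload for illustration
payload = json.loads('{"type_id": 34, "avg_price": 5.5}')
print(json.dumps(payload))
```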

NLTK - Processing Text Made Easy

The number one problem I have with stock quotes: it takes 2-3 extra clicks to figure out WHY the price moved for the day. Google/Yahoo/etc provide great single-stock pages that give news summaries, but when you open a ticker or phone widget, only the raw numbers are reported. If I'm going to make a quote bot, why not include some information and save people a search?

The good news: Google and Yahoo both provide a by-ticker API of relevant news articles. The bad news: they return 10-15 articles per query, and the data isn't particularly ranked or scored at the source. I could have gambled on the first article being the best, or stacked a publisher priority order, but all I wanted was:
Good news when the stock is up.  Bad news when the stock is down
NLTK to the rescue. I have wanted to try my hand at sentiment/language analysis since I saw a local talk on Analyzing P2P Lending Data. Putting headlines through the vader_lexicon tools did exactly what I wanted and was blazing fast.
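A minimal sketch of the vader approach (the headlines below are made up for illustration):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')      # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()
headlines = [
    'Shares surge after record quarterly earnings',
    'Company hit with lawsuit as stock tumbles',
]

# `compound` runs from -1 (very negative) to +1 (very positive)
scored = sorted(
    (analyzer.polarity_scores(headline)['compound'], headline)
    for headline in headlines
)
most_negative, most_positive = scored[0], scored[-1]
print(most_positive, most_negative)
```

Pair the most positive headline with an up day and the most negative with a down day, and you're most of the way to "good news when the stock is up, bad news when the stock is down."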

After playing with this quick demo of NLTK, I'm excited to expand this toolset.  If I can find the time, I'd very much like to write up a new Discord bot for grading a community and highlighting troublemakers statistically, rather than bluntly relying on block lists and word blacklists.

Let's See It!


I'm going to save the "how to get [stock] data" question for another blog.  There's a wide world of APIs and support out there, and digging into them is worth a post of its own.  For the impatient, I used these two articles as a springboard to get started:
Though designing the bot's language may require some creative design for EVE topics, standing up the bot should be easy.  I've been able to add functions at a uniquely fast pace (0.5-1 day per feature), and standing up the whole bot took just a few evenings once I got through the roadblocks.  The libraries above are excellent tools to have in your toolbox, and I'm excited to dig deeper into their functionality beyond the small `hello world` functions written so far!