Thursday, October 27, 2016

Favorite Python Packages 01 - Making a chatbot

I had to write a logging handler for work that pushed errors up to HipChat. The process turned out to be so easy that I could not resist adding a chat handler to ProsperCommon (especially given my hacky email handler). Despite my love for Slack, Discord became the tool of choice because it's easy to stand up and tear down chats with a lot of flexibility. I also skipped Slack for now because the tweetfleet server blows past the 10k message buffer on a daily basis.
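For a sense of how little code a chat handler needs, here is a minimal sketch. It is not the ProsperCommon implementation: the class name `ChatWebhookHandler`, the `send` hook, and the User-Agent string are hypothetical, and the `{"content": ...}` payload assumes a Discord-style webhook (other services name the field differently).

```python
import json
import logging
import urllib.request


class ChatWebhookHandler(logging.Handler):
    """Forward log records at or above `level` to a chat webhook."""

    def __init__(self, webhook_url, level=logging.ERROR, send=None):
        super().__init__(level=level)
        self.webhook_url = webhook_url
        # `send` is injectable so the handler can be exercised without a network.
        self._send = send or self._post_json

    def _post_json(self, url, payload):
        # Passing `data` makes urllib issue a POST request.
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "User-Agent": "chat-log-handler-sketch",  # hypothetical name
            },
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()

    def emit(self, record):
        try:
            # Assumption: the webhook accepts a JSON body with a "content" field.
            self._send(self.webhook_url, {"content": self.format(record)})
        except Exception:
            # A chat outage should never crash the application being logged.
            self.handleError(record)
```

Attach it with `logger.addHandler(ChatWebhookHandler(url))` like any other handler; anything below `ERROR` stays out of the chat room by default.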

So, let's cook up a chatbot! Discord's API offerings are dizzying; this should be easy!

Making Chatbots Easy

Since Discord relies on an OAuth2 connection, and chats are inherently asynchronous, cooking up a bot from scratch would hurt. Thankfully, an existing Python library comes to the rescue! It has exceptional API coverage and is easy to use.

My one gripe is the documentation. The docs are sparse in places, but I'll forgive that sin given the example code and the active community on the Discord API Guild. I also had some trouble getting off the ground with the Discord API docs, specifically with getting the correct tokens the bot needs to work. Once the bot was authenticated, though, it was off to the races!

TinyDB - The Easy Object-store

Pinging the internet for data is not free, whether because of rate limits or round-trip times. Tools like SQLite are great for lightweight/portable data storage, but they also require schema design. MongoDB is a powerful NoSQL solution, but it is heavy to stand up (and I'm not in love with the query language). TinyDB comes to the rescue as a way to get the JSON/NoSQL storage of MongoDB with none of the server/auth standup.

This shines when paired with REST endpoints. It's easy to push/pop entries around and keep the same raw JSON in the archive as what's coming from the endpoint. It's also as easy as JSON to add more keys for searching. I'm still not in love with my cache-timer implementation in ProsperBot, but fetching from cache is 100x faster than an internet call. Lastly, debugging is easy since the output is raw JSON, though storing everything uncompressed could lead to file-size issues down the line.
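The cache-timer idea can be sketched with nothing but the standard library. To be clear, this is neither TinyDB nor the ProsperBot code: a plain JSON file stands in for the TinyDB store, and `JsonCache`, `max_age_seconds`, and the `fetch` callback are hypothetical names chosen for illustration.

```python
import json
import time
from pathlib import Path


class JsonCache:
    """Keep raw endpoint payloads in a JSON file, each stamped with its fetch time."""

    def __init__(self, path, max_age_seconds=300, clock=time.time):
        self.path = Path(path)
        self.max_age_seconds = max_age_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def _load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def get(self, key, fetch):
        """Return the cached payload for `key`; call `fetch(key)` only when stale."""
        entries = self._load()
        entry = entries.get(key)
        now = self._clock()
        if entry is not None and now - entry["cached_at"] < self.max_age_seconds:
            return entry["payload"]
        # Cache miss or expired entry: hit the endpoint and store the raw JSON.
        payload = fetch(key)
        entries[key] = {"cached_at": now, "payload": payload}
        self.path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
        return payload
```

The payload is stored untouched next to its timestamp, so the file on disk matches what the endpoint returned and stays readable while debugging.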

Quick pro-tip about TinyDB: get ujson. This C implementation of a JSON encoder/decoder is a great drop-in replacement for the standard json module. It can also be baked into libraries like Requests. ujson makes handling JSON lightning fast! Also, TinyDB has a wide array of extensions, and I will be looking into MongoDB hooks at a future date.
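For plain `dumps`/`loads` calls, the swap can be as small as an import with a fallback. This is a sketch of the general pattern rather than anything specific to TinyDB, and the `quote` payload is made up for the example.

```python
try:
    import ujson as json  # C-accelerated encoder/decoder, if installed
except ImportError:
    import json  # standard library fallback with the same dumps/loads calls

quote = {"ticker": "ABC", "price": 12.5}  # hypothetical payload
encoded = json.dumps(quote)
decoded = json.loads(encoded)
```

The rest of the module keeps calling `json.dumps` and `json.loads` either way. Less common keyword arguments are not guaranteed to match between the two libraries, so check before relying on them.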

NLTK - Processing Text Made Easy

The number one problem I have with stock quotes: it takes 2-3 extra clicks to figure out WHY the price moved for the day. Google/Yahoo/etc provide great single-stock pages that give news summaries, but when you open a ticker or phone widget, only the raw numbers are reported. If I'm going to make a quote bot, why not include some information and save people a search?

The good news, Google/Yahoo both give a by-ticker API of relevant news articles. The bad news, they yield 10-15 articles in the query. Furthermore, the data isn't particularly ranked/scored from the source. I could have gambled with first-article being the best, or stacked a publisher priority order, but all I wanted was:

Good news when the stock is up. Bad news when the stock is down.

NLTK to the rescue. I have wanted to try my hand at sentiment/language analysis since I saw a local talk on Analyzing P2P Lending Data. Putting headlines through the vader_lexicon tools did exactly what I wanted and was blazing fast.
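The selection rule itself is tiny once each headline has a sentiment score. Below is a rough sketch, not the bot's actual code: `pick_headline` and `make_vader_scorer` are hypothetical names, and the scorer is passed in so the rule can be tried without NLTK installed. The VADER helper uses NLTK's `SentimentIntensityAnalyzer` and needs the `vader_lexicon` data downloaded first.

```python
def pick_headline(headlines, price_change, score):
    """Pick the most positive headline when the stock is up, the most negative when down."""
    if not headlines:
        return None
    chooser = max if price_change >= 0 else min
    return chooser(headlines, key=score)


def make_vader_scorer():
    """Build a scorer from NLTK's VADER analyzer (requires nltk and the vader_lexicon data)."""
    # Imported lazily so the selection logic above works without NLTK.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    # "compound" is VADER's single normalized score in the range [-1, 1].
    return lambda text: analyzer.polarity_scores(text)["compound"]
```

With NLTK set up, `pick_headline(headlines, change, make_vader_scorer())` applies the rule; any other function that maps text to a number can stand in for the scorer.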

After playing with this quick demo of NLTK, I'm excited to expand this toolset. If I can find the time, I'd very much like to write a new Discord bot for grading a community and highlighting troublemakers statistically, rather than bluntly relying on block lists and word blacklists.

Let's See It!

I'm going to save the "how to get [stock] data" question for another blog. There's a wide world of APIs and support out there, and digging into them is worth a whole blog. For the impatient, I used these two articles as a springboard to get started:

Though designing the bot language may require some creative design for EVE topics, standing up the bot should be easy. I've been able to add functions at a uniquely fast pace (half a day to a day per feature), and standing up the whole bot took just a few evenings once I got through the roadblocks. The libraries above are excellent tools to have in your toolbox, and I'm excited to dig deeper into their functionality beyond the small `hello world` functions written so far!