Thursday, February 4, 2016

How I Data Science - Hunting For Trends

Longtime friend Blake at k162space.com has been pumping out some really exciting work around NPC-kill rates.  He recently poked me about taking that raw data forward to something with more teeth, and after futzing with it for about an hour, I came out with some interesting results.

His inquiry reminded me of a common question I receive: How do I get into data science?  So, this blog is going to be a bit longer and more technical than the recent fare.  We're going to walk step-by-step through the investigation process and I'll try to illustrate what I see as we go along.  The readouts were generated using JMP, only because it's faster to use than R.  This entire process can be done in R, and I can revisit with more specific R samples if it's requested.

Let's take a look:

Getting Started

Raw Data

I like the Forecaster's Toolbox as a jumping-off point.  We're looking for a few things when we start:

  • Basic visualizations: look for obvious trends
    • Simple time-series graph shows no obvious correlation
    • Simple scatter plots in case there's something obvious
  • Check scales: linear vs log vs sqrt
    • SUM_factionKills does not have much variance
    • log/sqrt price doesn't seem like a good idea
  • Skew (time-shift) data
    • Shift +/- a few days to look for lead/lag
  • Other useful indicators


This gives us some basic trends to start comparing.  Nothing is jumping out of the data yet.
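
The scale check from the list above can be sketched in a few lines of Python.  This is only an illustration, not the JMP workflow behind the readouts: the sample values and the helper names `coefficient_of_variation` and `scale_report` are made up for the example.  It compares the relative spread of a series on linear, log, and sqrt scales.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Relative spread: population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

def scale_report(values):
    """Compare the relative spread of a series on linear, log, and sqrt scales."""
    return {
        "linear": coefficient_of_variation(values),
        "log": coefficient_of_variation([math.log(v) for v in values]),
        "sqrt": coefficient_of_variation([math.sqrt(v) for v in values]),
    }

# Hypothetical daily NPC-kill totals (illustrative numbers, not the real data set)
sample_kills = [720_000, 760_000, 780_000, 790_000, 810_000, 850_000]
report = scale_report(sample_kills)
```

If the relative spread is already small on the linear scale, as it is for a series like SUM_factionKills, a log or sqrt transform compresses it further and is unlikely to reveal anything new.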

Scatter Plot


What we really want is an NPC-kills vs price correlation.  With that kind of relationship, we can basically automate investments off a single number.

At first crack, we're really not seeing anything.  The vertical spread, especially around the median factionKills value (~780k/day), shows no viable trend to predict with.  At best, we're seeing that variation/volatility is highest on median days, but that doesn't give us a trend to leverage.  It only hints at which days orders might complete, not what price we can expect to get.

Looking at the skewed data (+1 to +5 days), there is some better clustering, but not better trending.  There's still a very obvious failure of the vertical line test, which makes it very hard to find an X->Y trend.  Also, some of the price outliers that were obvious in the first time-series graph are proving to be problematic in this clustering view.
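
For readers following along without JMP, the lead/lag check can be approximated by shifting one series against the other and measuring the correlation at each offset.  The sketch below is a stand-in under stated assumptions: the two series are synthetic (the price is constructed to react to kills two days later), and `pearson` and `lagged_correlation` are illustrative helpers, not part of any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(kills, prices, lag):
    """Correlate kills on day t with price on day t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(kills, prices)
    return pearson(kills[:-lag], prices[lag:])

# Synthetic series: price is built to move inversely to kills two days later
kills = [700, 820, 760, 900, 650, 780, 840, 720, 690, 810]
prices = [100, 100] + [200 - k / 10 for k in kills[:-2]]

by_lag = {lag: lagged_correlation(kills, prices, lag) for lag in range(0, 6)}
best_lag = min(by_lag, key=by_lag.get)  # offset with the most negative correlation
```

On real data the picture is rarely this clean, and with a short sample every extra day of lag throws away a data point, so a "best" lag should be treated with suspicion.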

Back to the Drawing Board

As I have said in previous dev blogs, I really like it when we can find a normal-shaped trend.  Even if we can find a correlation, it will be really hard to use effectively if it's not either linear OR normal.  Working outside those bounds gets difficult fast, so let's try to get back to the sorts of things we know well.


Thankfully, SUM_factionKills is reasonably normal.  As we have discussed previously, prices really aren't, but deviation/volatility are normal-shaped trends.  This is starting to look like something we can at least statistically flag on, even if a linear relationship might be out of reach.
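
A quick numeric sanity check for "reasonably normal" is to look at skewness and excess kurtosis, both of which are near zero for a normal distribution.  This is a rough screen rather than a formal normality test, and it is not necessarily how the JMP distribution readouts were judged; the helper names and sample values below are illustrative.

```python
import statistics

def skewness(values):
    """Third standardized moment (population form); 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0

# Illustrative samples: one symmetric, one with a long right tail
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

symmetric_skew = skewness(symmetric)
tailed_skew = skewness(right_tailed)
```

A strongly positive skew, like the right-tailed sample here, is the shape raw prices tend to show when a few outliers sit far above the pack.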

Now that we've clipped out the high-flier and zoomed in on just the Machariel, things are looking a lot more useful.  Though the price/5d avg trends are essentially random, the deviation trend is looking far more linear.  This is extremely promising.
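
The deviation series can be reproduced roughly as follows.  The exact definition of the 5-day average used in the readouts isn't spelled out here, so this sketch assumes a trailing window that includes the current day and expresses the deviation as a fraction of that average.  The prices are made-up numbers and both function names are illustrative.

```python
def trailing_mean(values, window=5):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def deviation_from_trend(prices, window=5):
    """Fractional deviation of each price from its trailing moving average."""
    averages = trailing_mean(prices, window)
    return [
        None if avg is None else (price - avg) / avg
        for price, avg in zip(prices, averages)
    ]

# Hypothetical daily prices (illustrative only)
prices = [100, 102, 98, 101, 99, 105, 95]
dev = deviation_from_trend(prices)
```

Pairing each non-None entry of `dev` with that day's kill count gives the points for the deviation-vs-kills scatter.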

What I See

This preliminary result supports a baseline assumption:
Higher ratting counts will lead to more NPC drops hitting the market, and the increased supply will drive down the price.
With this little bit of data, we can see a pretty strong correlation between deviation from the 5-day trend and total NPC kills in Angel space.  That isn't to say we've "solved" the system yet.  There are still a lot of troubling points:

  • The sample size is just barely big enough to work with.
    • I don't like declaring trends without at least 60 days of data to back them up
  • There are still some troubling fliers
    • Though the low and high ends of the graph are telling, there are some points around 800k-850k that make me slightly worried.
  • Deviation/Volatility should be 0-centered.
    • The data covers a local period of decline.  Without positive swings, it's hard to confirm the "less ratting = higher prices" part of the equation
  • Flavor of the Month (FOTM)
    • Though the Machariel has been traditionally popular, it's easy to miss the forest for the trees without other indicators such as total sell volumes and other activity metrics

This is an extremely interesting first result out of the data at hand.  Though there are still plenty of points to be cautious about, this is enough confirmation to keep digging and collecting data.  Also, this being a derivative trend, I worry about leveraging it directly without a second signal to back it up.

Also, just to show the entire picture, we might need to include a fit-quality metric as a go/no-go boundary.  While the Machariel and Dramiel are traditionally popular, the Cynabal isn't as strong.
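
One candidate for that fit-quality metric is the R² of a simple least-squares line through the deviation-vs-kills points.  The sketch below is one possible gate, not a settled choice: the 0.5 cutoff is an arbitrary placeholder, the data points are invented, and `usable_signal` is a hypothetical helper name.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def usable_signal(xs, ys, min_r2=0.5):
    """Go/no-go: accept the fit only if R^2 clears the (placeholder) threshold."""
    slope, intercept = linear_fit(xs, ys)
    return r_squared(xs, ys, slope, intercept) >= min_r2

# Invented points: daily kills (thousands) vs deviation from the 5-day average
kills = [700, 750, 800, 850, 900]
tight = [0.04, 0.02, 0.0, -0.02, -0.04]   # clean inverse relationship
noisy = [0.01, -0.03, 0.02, -0.02, 0.01]  # no usable relationship
```

With these numbers `usable_signal(kills, tight)` passes and `usable_signal(kills, noisy)` does not.  R² says nothing about the sign of the slope, so a reversed relationship would still need a separate check.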


Specifically troubling is the Dramiel graph, which shows the reverse of the correlation we'd expect.  This could be a signal that demand is driving the price more than supply alone.  Again, the best approach will probably be multi-factor, but this is a very interesting step toward something.  Paired with a market-side predictor, this could be a very useful second source to validate against, or a means to seed forecasts for items that aren't directly manufactured.

Also, I try very hard to test both positive and negative cases.  It's easy to accept when a model shows promise, and hard to accept where it might fail.  The second thing I always do in these kinds of searches is try to find a case that breaks the tool, and understand why.  This is why I'm not particularly a fan of things like MACD, where it feels like a 50/50 shot on whether the signal is true or not.  Even more so with candlestick reading.

Regardless, the NPC kill rates are a very interesting trendline that I look forward to messing with more.  At the absolute least, there are still interesting things to be said about where players are spending their time, and there are still a lot of trends left to pick out of this data set.

38 comments:

Unknown said...

Way over my head and insanely interesting as usual.

Is there an advantage to using JMP over R other than speed? R seems more powerful to me.
