[three]Bean
statscache - thoughts on a new Fedora Infrastructure backend service
May 26, 2015 | categories: fedora, datagrepper, statscache View CommentsWe've been working on a new backend service called statscache. It's not even close to done, but it has gotten to the point that it deserves an introduction.
A little preamble: the Fedora Infrastructure team runs ~40 web services. Community ongoings are channeled through irc meetings (for which we have a bot), wiki pages (which we host), and more. Packages are built in our buildsystem. QA feedback goes through lots of channels, but particularly the updates system. There are lots of systems with many heterogenous interfaces. About three years ago, we started linking them all together with a common message bus on the backend and this has granted us a handful of advantages. One of them is that we now have a common history for all Fedora development activity (and to a lesser extent, community activity).
There is a web interface to this common history called datagrepper. Here are some example queries to get the feel for it:
- all messages from the last five minutes
- all karma commands from IRC
- all messages about the package firefox
On top of this history API, we can build other things -- the first of which was the release engineering dashboard. It is a pure html/js app -- it has no backend server components of its own (no python) -- it directs your browser to make many different queries to the datagrepper API. It 1) asks for all the recent message types for X, Y, and Z categories, 2) locally filters out the irrelevant ones, and 3) tries to render only the latest events.
It is arguably useful. QA likes it so they can see what new things are available to test. In the future, perhaps the websites team can use it to get the latest AMIs for images uploaded to amazon, so they can in turn update getfedora.org.
It is arguably slow. I mean, that thing really crawls when you try to load the page and we've already put some tweaks in place to try to make it incrementally faster. We need a new architecture.
Enter statscache. The releng dash is pulling raw data from the server to the browser, and then computing some 'latest values' from there to display. Why don't we compute and cache those latest values in a server-side service instead? This way they'll be ready and available for snappy delivery to web clients and we won't have to stress out the master archive DB with all those queries trawling for gems.
@rtnpro and I have been working on it for the past few months and have a nice basis for the framework. It can currently cache some rudimentary stuff and most of the releng-dash information, but we have big plans. It is pluggable -- so if there's a new "thing you want to know about", you can write a statscache plugin for it, install it, and we'll start tracking that statistics over time. There are all sorts of metrics -- both the well understood kind and the half-baked kind -- that we can track and have available for visualization.
We can then plug those graphs in as widgets to the larger Fedora Hubs effort we're embarking on (visit the wiki page to learn about it). Imagine user profile pages there with nice d3.js graphs of personal and aggregate community activity. Something in the style of the calendar of contributions graph that GitHub puts on user profile pages would be a perfect fit (but for Fedora activity -- not GitHub activity).
Check out the code:
- the main repo
- the plugins repo
At this point we need:
- New plugins of all kinds. What kinds of running stats/metrics would be interesting?
- By writing plugins that will flex the API of the framework, we want to find edge cases that cannot be easily coded. With those we can in turn adjust the framework now -- early -- instead of 6 months from now when we have other code relying on this.
- A set of example visualizations would be nice. I don't think statscache should host or serve the visualization, but it will help to build a few toy ones in an examples/ directory to make sure the statscache API can be used sanely. We've been doing this with a statscache branch of the releng dash repo.
- Unit/functional test cases. We have some, but could use more.
- Stress testing. With a handful of plugins, how much does the backend suffer under load?
- Plugin discovery. It would be nice to have an API endpoint we can query to find out what plugins are installed and active on the server.
- Chrome around the web interface? It currently serves only JSON responses, but a nice little documentation page that will introduce a new user to the API would be good (kind of like datagrepper itself).
- A deployment plan. We're pretty good at doing this now so it shouldn't be problematic.