[three]Bean

monitoring fedmsg process health in collectd

May 01, 2014 | categories: fedmsg, fedora, collectd

Happy May Day! Almost a year ago, we started monitoring fedmsg throughput in collectd.

Nowadays, we have many more message types on the bus and a much higher volume too, and I started to worry about the performance of the daemons handling those messages. We have log files in /var/log/fedmsg/, but reading them requires someone familiar with them to go and look. Too manual.

Well, over a month ago we cooked up an idea to expose more of the fedmsg-hub's internals for monitoring. That has all been implemented and released after some sprint work at PyCon. Janez Nemanič is working on Nagios checks for all of this, and just yesterday I wrote the collectd plugin that pulls that information in for visualization. Take a look:

Here's the "backlog" of the Fedora Badges backend: a graph of how many messages have arrived in its internal queue but have not yet been handled. Smaller numbers are better here. As you can see, the badges awarder mostly stays on top of its workload; it can award badges almost as rapidly as it is notified of events.

https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=badges-backend01.phx2.fedoraproject.org;plugin=fedmsg;plugin_instance=hub;type=queue_length;type_instance=FedoraBadgesConsumer_backlog;begin=-86400

Here is the same graph for summershum, a daemon that watches the bus: when new source tarballs are uploaded to the lookaside cache, it downloads them, extracts their contents, and computes and stores hashes of all the source files. Its graph has a different profile. Lookaside uploads occur relatively infrequently, but when they do, summershum takes on a significant workload:

https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=summershum01.phx2.fedoraproject.org;plugin=fedmsg;plugin_instance=hub;type=queue_length;type_instance=SummerShumConsumer_backlog;begin=-86400

Lastly, and this is my favorite, this is the backend for Fedora Notifications. It has some inefficiencies that need to be dealt with, which you can plainly see from the profile here. On some occasions, its backlog will grow to hundreds of messages.

https://admin.fedoraproject.org/collectd/bin/graph.cgi?hostname=notifs-backend01.phx2.fedoraproject.org;plugin=fedmsg;plugin_instance=hub;type=queue_length;type_instance=FMNConsumer_backlog;begin=-86400

For the curious, its workload looks like this: "when a message arrives, compare it against the rules defined for every user in our database. If there are any matches, forward that message on to that user." The inefficiencies stem almost entirely from database queries: for every message that arrives, it pulls every rule for every user out of the database, across the network. Unnecessary. We could cache all of that in memory and have the backend invalidate its own cache whenever a user changes a preference rule in the frontend. We could signal that such a change has been made with, you guessed it, a fedmsg message. Such a change is probably all that's needed...
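Here is a minimal sketch of that caching idea in Python. Everything in it is hypothetical: `RuleCache`, the `load_rules` callable (standing in for the database query), the representation of rules as predicates, and the `.fmn.preference.update` topic suffix used as the invalidation signal are assumptions, not the notifications backend's actual code or topics.

```python
class RuleCache:
    """Hypothetical sketch: keep every user's rules in memory.

    Instead of querying the database for every incoming message, load the
    rules once and reload them only after a preference-change message.
    """

    # Assumed topic suffix for "a user changed a preference in the frontend".
    INVALIDATE_SUFFIX = ".fmn.preference.update"

    def __init__(self, load_rules):
        # load_rules() is a caller-supplied function that queries the database
        # and returns {username: [rule, ...]}, where each rule is a predicate.
        self.load_rules = load_rules
        self._rules = None

    def rules(self):
        # Hit the database only when the cache is empty (first use or invalidated).
        if self._rules is None:
            self._rules = self.load_rules()
        return self._rules

    def consume(self, message):
        # A preference-change message just drops the cache; the next
        # ordinary message triggers a fresh load from the database.
        if message["topic"].endswith(self.INVALIDATE_SUFFIX):
            self._rules = None
            return []
        # Otherwise, return the users with at least one matching rule.
        return [user for user, rules in self.rules().items()
                if any(rule(message) for rule in rules)]
```

With this arrangement the database is queried on the first message and on the first message after each invalidation; every other message is matched against the in-memory copy.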

...but now we have graphs. And with graphs we can measure how much improvement we do or don't get. Much nicer than guessing.
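For reference, the graph URLs above encode the collectd identifier being plotted: plugin `fedmsg`, plugin instance `hub`, type `queue_length`, type instance `<Consumer>_backlog`. One way to feed such values to collectd is its exec plugin, which reads `PUTVAL` lines from a program's standard output. The sketch below only formats such a line; it is an assumption, not necessarily how the plugin described above works, and where the backlog number comes from is left out (the value in the example is made up).

```python
def putval_line(host, consumer, backlog, interval=10):
    """Format one collectd exec-plugin PUTVAL line for a consumer's backlog.

    The identifier mirrors the graph URLs: plugin "fedmsg", plugin
    instance "hub", type "queue_length", type instance "<Consumer>_backlog".
    """
    identifier = "{0}/fedmsg-hub/queue_length-{1}_backlog".format(host, consumer)
    # "N" tells collectd to timestamp the value with the current time.
    return 'PUTVAL "{0}" interval={1} N:{2}'.format(identifier, interval, backlog)


if __name__ == "__main__":
    # Made-up value; a real plugin would read the backlog from the running hub.
    print(putval_line("badges-backend01.phx2.fedoraproject.org",
                      "FedoraBadgesConsumer", 3))
```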


PkgDB ACL changes should be much quicker now

Apr 02, 2014 | categories: fedmsg, fedora, pkgdb

So, it used to be that when someone was granted commit access to a package in the Fedora PackageDB (pkgdb), the webapp simply wrote to a database table indicating the new relationship. Every hour, a cronjob would run that queried the state of that database and then rewrote the ACLs for gitolite -- the software that manages access to our package repositories.

Consequently, we had lots of waiting: you would request commit access to a repository, then wait for an owner to grant you rights, then wait for that cronjob to run before you could actually push.

With the new fedmsg consumer we now have in place, those gitolite ACLs are rewritten in response to fedmsg messages from pkgdb. It is much faster, although not instantaneous. Ballpark: 2 minutes.
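The post doesn't say how that consumer is structured internally. As one hypothetical design, a consumer could avoid rewriting the ACLs once per message during a burst of pkgdb changes by batching: note when the first relevant message arrives, then rewrite once after a short window. In this sketch, `GitoliteACLTrigger`, the `regenerate` callback, the 60-second window, and matching topics on `.pkgdb.` are all assumptions.

```python
import time


class GitoliteACLTrigger:
    """Hypothetical sketch: rewrite gitolite ACLs after pkgdb messages.

    Messages arriving within one window are batched, so the (expensive)
    rewrite runs once per batch rather than once per message.
    """

    def __init__(self, regenerate, delay=60.0, clock=time.monotonic):
        self.regenerate = regenerate  # caller-supplied: rewrites the ACL file
        self.delay = delay            # seconds to batch messages before rewriting
        self.clock = clock            # injectable for testing
        self.pending_since = None     # when the first unprocessed message arrived

    def consume(self, topic):
        # Only pkgdb messages matter; matching on ".pkgdb." is an assumption.
        if ".pkgdb." in topic and self.pending_since is None:
            self.pending_since = self.clock()

    def tick(self):
        # Call periodically (e.g. from a looping timer). Returns True if it ran.
        if self.pending_since is None:
            return False
        if self.clock() - self.pending_since < self.delay:
            return False
        self.pending_since = None
        self.regenerate()
        return True
```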

Happy hacking!


A fedmsg widget for your site

Apr 01, 2014 | categories: fedmsg, widgetry, fedora, datagrepper

Not an April Fool's joke: more cool fedmsg tools from the Fedora Infrastructure Team. The datagrepper app now provides a little self-expanding javascript widget that you can embed on your blog or website. I have it installed on my blog; if you look there (here?), it should show up on the right-hand side of the screen.

Here's the example from the datagrepper docs:

<html>
  <body>
    <h1>My Site</h1>
    <p class="lead">Welcome to my site.</p>
    <p>Here is my latest Fedora activity:</p>

    <script
      id="datagrepper-widget"
      src="https://apps.fedoraproject.org/datagrepper/widget.js?css=true"
      data-user="ralph"
      data-rows_per_page="40">
    </script>

    <footer>Happy Hacking!</footer>
  </body>
</html>

Just copy-and-paste that into a file called testing-stuff.html on your machine, and then open that file in your browser. You should see something like this:

http://threebean.org/blog/static/images/datagrepper-widget-user.png

Like the docs say, you can change the data- attributes on the <script> tag to perform different kinds of queries. For instance, the following would render a widget showing only Bodhi events about the Firefox package:

<script
  id="datagrepper-widget"
  src="https://apps.fedoraproject.org/datagrepper/widget.js?css=true"
  data-category="bodhi"
  data-package="firefox"
  data-rows_per_page="20">
</script>

You might want to include such a thing on a status page for a project you're working on.

http://threebean.org/blog/static/images/datagrepper-widget-bodhi.png
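The widget's data- attributes look like datagrepper query parameters. Assuming each `data-<name>` attribute maps to a `/raw` query parameter of the same name (an assumption about the widget's internals, not something the docs above state), this sketch builds the equivalent query URL; opening it should return the matching messages as JSON. `widget_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

DATAGREPPER_RAW = "https://apps.fedoraproject.org/datagrepper/raw"


def widget_query_url(**attrs):
    """Turn widget-style data- attributes into a datagrepper /raw query URL.

    Assumption: data-<name> maps to a /raw query parameter named <name>.
    Parameters are sorted only to make the URL deterministic.
    """
    return DATAGREPPER_RAW + "?" + urlencode(sorted(attrs.items()))


# Roughly the query behind the Bodhi/Firefox widget above.
print(widget_query_url(category="bodhi", package="firefox", rows_per_page=20))
```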

You can query any of the fedmsg topics (see the fedmsg docs for the full list). For instance, the topic org.fedoraproject.prod.fedbadges.badge.award renders a feed of the latest Fedora Badges awards:

<script
  id="datagrepper-widget"
  src="https://apps.fedoraproject.org/datagrepper/widget.js?css=true"
  data-topic="org.fedoraproject.prod.fedbadges.badge.award"
  data-rows_per_page="40">
</script>
http://threebean.org/blog/static/images/datagrepper-widget-badges.png
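fedmsg topics like the one above follow a dotted layout: a prefix (`org.fedoraproject`), an environment (`prod`), a category such as `fedbadges` or `bodhi`, and then an object and event (`badge.award`). Here is a small sketch for pulling those parts apart, assuming that layout; the category part appears to be what the widget's `data-category` attribute filters on, as in the `bodhi` example. `split_topic` is a hypothetical helper.

```python
def split_topic(topic, prefix="org.fedoraproject"):
    """Split a fedmsg topic into (environment, category, remainder).

    Assumes the <prefix>.<environment>.<category>.<object>.<event> layout,
    e.g. org.fedoraproject.prod.fedbadges.badge.award.
    """
    if not topic.startswith(prefix + "."):
        raise ValueError("unexpected topic prefix: %r" % topic)
    # Split off environment and category; keep the rest of the topic whole.
    environment, category, rest = topic[len(prefix) + 1:].split(".", 2)
    return environment, category, rest


print(split_topic("org.fedoraproject.prod.fedbadges.badge.award"))
```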

Please let me know in #fedora-apps on freenode if you have any questions (or if you find some cool use for it -- I love hearing that stuff).


Threading Moksha

Mar 27, 2014 | categories: fedmsg, fedora, moksha

We have this rad tool in Fedora Infrastructure we use for passing around server-side messages called fedmsg. It uses zeromq behind the scenes and it is built on top of a framework Luke Macken made called Moksha (which is in turn built on top of Twisted).

To cut to the chase: I want to be able to measure how backlogged some of our message-processing consumers are. Here's a diagram of how moksha works as things stand now:

http://threebean.org/blog/static/images/moksha/moksha-as-is.png

Furthermore, here's a depiction of Twisted's own event loop; all of Moksha's code that I'll be discussing lives below in the "our code" section:

http://krondo.com/blog/wp-content/uploads/2009/08/reactor-doread.png

Now, when a message arrives, it is picked up by one of the backends (in our case, the zeromq one) and handed off to the moksha dispatcher ("our code"). The dispatcher then hands that message to any locally registered message consumers that might want it, one after another, in series (a consumer is just a Python class that defines a .consume(self, message) method).

Some of these message consumers are quite fast: the datanommer consumer just stuffs the message into a postgres database for later analysis. The ircbot consumer just formats the message and sends it off to freenode (although it throttles itself so as not to get kicked for being spammy). Other consumers take longer to handle individual messages. The Fedora Badges message consumer has to compare the message against a couple hundred different rules, and some of those rules involve large database queries -- not quick. The Fedora Notifications consumer has to compare the message against just as many rules and then ultimately forward it on to IRC, email, or Google Cloud Messaging for Android -- not quick.
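A simplified sketch of that serial dispatch (not Moksha's actual code): consumers are plain objects with a `topic` string and a `.consume(message)` method, and topic matching is reduced to a string prefix as a simplification. The `Dispatcher` name is hypothetical.

```python
class Dispatcher:
    """Simplified sketch of serial dispatch, not Moksha's implementation."""

    def __init__(self):
        self.consumers = []

    def register(self, consumer):
        # consumer: any object with a .topic prefix and a .consume(message) method.
        self.consumers.append(consumer)

    def dispatch(self, message):
        # Hand the message to each interested consumer, one after another.
        # A slow consumer delays every consumer after it, and the next message.
        for consumer in self.consumers:
            if message["topic"].startswith(consumer.topic):
                consumer.consume(message)
```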

At the time of this writing, we have 2,752,890 messages in our message store which has been operating since October 2012. That averages about 4 messages per minute (quite low), but we often have relatively large spikes in volume, around 120 messages per second. How much does that backlog our consumer processes? How long does it take them to catch up? We can eyeball the logs and make guesses, but I'd really like to measure and track it.
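A quick check of that average, assuming the message store started on 1 October 2012 (the post only says "since October 2012"):

```python
from datetime import date

messages = 2752890
# Assumption: count from 1 October 2012 up to this post's date, 27 March 2014.
minutes = (date(2014, 3, 27) - date(2012, 10, 1)).days * 24 * 60

average_per_minute = messages / minutes
spike_per_minute = 120 * 60  # a spike of ~120 messages per second

print("average: %.1f msg/min" % average_per_minute)
print("spike is ~%d times the average" % int(spike_per_minute / average_per_minute))
```

That works out to roughly 3.5 messages per minute (about 3.7 if the store started at the end of October), which the post rounds to about 4. A 120 messages-per-second spike is on the order of 2,000 times the average rate.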

Here's an idea. We split the moksha dispatcher into a main "enqueuing" thread and a secondary "dispatching" thread.

http://threebean.org/blog/static/images/moksha/moksha-2-threads.png

The logic for the enqueuer is simple: "when a message arrives, put it on the work queue". The logic for the new secondary dispatcher thread is also simple: "when I find a message in the queue, hand it off to each of my registered consumers in serial". Only when the last consumer has finished a message does the dispatcher thread then return to its work queue to get the next message. The dispatcher thread works much like it did before, but we introduce a little buffer in front of it that we can measure (with collectd, in our case).
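A minimal sketch of the two-thread split using Python's standard `queue` and `threading` modules. `ThreadedDispatcher` is a hypothetical name; in Moksha the enqueue call would presumably come from the Twisted reactor thread. The work queue's `qsize()` is the buffer that could be measured.

```python
import logging
import queue
import threading


class ThreadedDispatcher:
    """Sketch: the main thread enqueues, one dispatcher thread consumes."""

    def __init__(self, consumers):
        self.consumers = consumers  # objects with a .consume(message) method
        self.work = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def enqueue(self, message):
        # Called from the main thread: just buffer the message and return.
        self.work.put(message)

    def backlog(self):
        # Approximate number of waiting messages; what collectd would graph.
        return self.work.qsize()

    def _run(self):
        while True:
            message = self.work.get()
            # Hand the message to every consumer in series, as before.
            for consumer in self.consumers:
                try:
                    consumer.consume(message)
                except Exception:
                    logging.exception("consumer %r failed", consumer)
            self.work.task_done()
```

Note that `qsize()` is approximate and does not count the message the dispatcher thread is currently working on.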

Perhaps we can take it further. Give each consumer its own thread and work queue so they can work in parallel:

http://threebean.org/blog/static/images/moksha/moksha-many-threads.png

Here, the enqueuer changes: "when a message arrives, put it in each consumer queue that is registered for this kind of message." Each consumer now is managed by its own thread which picks its own messages off of its own queue and handles them as they can. The advantage here is that we can measure just how backlogged each particular consumer becomes, not just the whole hub.
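A hypothetical sketch of the per-consumer variant, using Python's standard `queue` and `threading` modules: each `ConsumerWorker` owns a queue and a thread, and `FanOutHub.enqueue` copies each message onto the queue of every worker whose topic prefix matches. The class names and prefix matching are assumptions.

```python
import logging
import queue
import threading


class ConsumerWorker:
    """One consumer, its own work queue, and its own thread."""

    def __init__(self, consumer, topic):
        self.consumer = consumer
        self.topic = topic  # prefix match; a simplification
        self.work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            message = self.work.get()
            try:
                self.consumer.consume(message)
            except Exception:
                logging.exception("consumer %r failed", self.consumer)
            finally:
                self.work.task_done()


class FanOutHub:
    def __init__(self, workers):
        self.workers = workers

    def enqueue(self, message):
        # Put the message on the queue of every worker registered for it.
        for worker in self.workers:
            if message["topic"].startswith(worker.topic):
                worker.work.put(message)

    def backlogs(self):
        # Per-consumer backlog, keyed by consumer class name.
        return {type(w.consumer).__name__: w.work.qsize() for w in self.workers}
```

Keying backlogs by class name assumes each consumer class is registered once. The sketch does nothing about shared state between consumers, such as a shared IRC connection; that would still need its own locking.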

Things might get tricky, as some consumers might have been hacked together to share state that they shouldn't -- I know the notifications backend does some silly stuff, sharing access to the irc connection between consumers. That can be dealt with, though.

So, I dunno, good idea? Bad idea? Lemme know in #fedora-apps or #moksha on freenode.


Querying user activity

Mar 24, 2014 | categories: fedmsg, fedora, datagrepper

In July, I wrote about some tools you can use to query Fedora package history. This post is just to point out that you can use the same approach to query user history. (It is the same data source that we use in Fedora Badges queries -- it also comes with a nice HTML output). Here's some example output from the console:

~❯ ./userwat ralph
2014-03-24T00:15:30 ralph submitted datagrepper-0.4.0-2.el6 to stable https://admin.fedoraproject.org/updates/datagrepper-0.4.0-2.el6
2014-03-24T00:15:28 ralph submitted python-fedbadges-0.4.1-1.el6 to stable https://admin.fedoraproject.org/updates/python-fedbadges-0.4.1-1.el6
2014-03-24T00:15:28 ralph submitted python-taskw-0.8.1-1.el6 to stable https://admin.fedoraproject.org/updates/python-taskw-0.8.1-1.el6
2014-03-24T00:15:27 ralph submitted python-tahrir-api-0.6.0-2.el6 to stable https://admin.fedoraproject.org/updates/python-tahrir-api-0.6.0-2.el6
2014-03-24T00:15:27 ralph submitted python-fedbadges-0.4.0-1.el6 to stable https://admin.fedoraproject.org/updates/python-fedbadges-0.4.0-1.el6
2014-03-23T13:51:16 ralph updated a ticket on the fedora-badges trac instance https://fedorahosted.org/fedora-badges/ticket/122
2014-03-21T17:08:21 ralph's packages.yml playbook run completed
2014-03-21T17:03:38 ralph started an ansible run of packages.yml
2014-03-21T16:31:48 ralph updated a ticket on the fedora-badges trac instance https://fedorahosted.org/fedora-badges/ticket/213
2014-03-21T15:09:10 ralph's python-bugzilla2fedmsg-0.1.3-1.el6 tagged into dist-6E-epel-testing by bodhi http://koji.fedoraproject.org/koji/taginfo?tagID=137

The tool isn't packaged at all, but here's the script if you'd like to copy and use it:

#!/usr/bin/env python
""" userwat - A script to query a user's history from fedmsg.
Author: Ralph Bean
License: LGPLv2+
"""

from __future__ import print_function

import datetime
import sys

import requests


def format_date(stamp):
    return datetime.datetime.fromtimestamp(stamp).isoformat()


def make_request(user, page):
    response = requests.get(
        "https://apps.fedoraproject.org/datagrepper/raw",
        params=dict(
            meta=["subtitle", "link", "title"],
            start=1,  # epoch seconds: search from the beginning of history
            user=[user],
            rows_per_page=100,
            page=page,
        )
    )
    response.raise_for_status()
    return response.json()


def main(user):
    # The first request also tells us how many pages there are; reuse it
    # instead of fetching page 1 a second time.
    results = make_request(user, page=1)

    for page in range(1, results['pages'] + 1):
        if page > 1:
            results = make_request(user, page=page)

        for msg in results['raw_messages']:
            print(format_date(msg['timestamp']), end=' ')
            #print(msg['meta']['title'], end=' ')
            print(msg['meta']['subtitle'], end=' ')
            print(msg['meta']['link'])


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage:  userwat <FAS_USERNAME>", file=sys.stderr)
        sys.exit(1)

    username = sys.argv[1]
    main(username)
