[three]Bean
Querying fedmsg history for package details by example
Jun 11, 2013 | categories: fedmsg, datanommer, fedora, datagrepper
In case you missed it, you can query fedmsg history now with the datagrepper API.
I wrote up an example here to show how you might use it in a script. This will print out the whole history of the hovercraft package (at least everything that is published via fedmsg anyways). In time, I hope to add it to python-pkgwat-api and as a subcommand of the pkgwat command line tool. Until then, you can use:
    #!/usr/bin/env python
    """ Query the history of a package using datagrepper!

    Check out the api at https://apps.fedoraproject.org/datagrepper/

    :Author: Ralph Bean <rbean@redhat.com>
    :License: LGPLv2+

    """

    import datetime
    import re
    import requests

    regexp = re.compile(
        r'<p class="success">Your ur1 is: '
        '<a href="(?P<shorturl>.+)">(?P=shorturl)</a></p>')


    def shorten_url(longurl):
        response = requests.post("http://ur1.ca/", data=dict(longurl=longurl))
        return regexp.search(response.text).groupdict()['shorturl']


    def get_data(package, rows=20):
        url = "https://apps.fedoraproject.org/datagrepper/raw/"
        response = requests.get(
            url,
            params=dict(
                package=package,
                delta=9999999,
                meta=['subtitle', 'link'],
                rows_per_page=rows,
                order='desc',
            ),
        )
        data = response.json()
        return data.get('raw_messages', [])


    def print_data(package, links=False):
        for message in get_data(package, 40):
            dt = datetime.datetime.fromtimestamp(message['timestamp'])
            print dt.strftime("%Y/%m/%d"),
            print message['meta']['subtitle'],
            if links:
                print shorten_url(message['meta']['link'])
            else:
                print


    if __name__ == '__main__':
        print_data(package='hovercraft', links=False)
And here's the output (generated with links=True, which is why each line ends in a short URL):
- 2013/06/10 ralph's hovercraft-1.1-3.fc18 untagged from f18-updates-pending by bodhi http://ur1.ca/ea99c
- 2013/06/10 ralph's hovercraft-1.1-3.fc18 tagged into f18-updates by bodhi http://ur1.ca/ea99e
- 2013/06/10 ralph's hovercraft-1.1-3.fc18 untagged from f18-updates-testing by bodhi http://ur1.ca/ea99f
- 2013/06/10 ralph's hovercraft-1.1-3.fc19 tagged into f19-updates-pending by bodhi http://ur1.ca/ea99x
- 2013/06/10 ralph's hovercraft-1.1-3.fc18 tagged into f18-updates-pending by bodhi http://ur1.ca/ea99c
- 2013/06/10 ralph submitted hovercraft-1.1-3.fc18 to stable http://ur1.ca/ea99y
- 2013/06/10 ralph submitted hovercraft-1.1-3.fc19 to stable http://ur1.ca/ea99z
- 2013/05/21 ralph's hovercraft-1.1-3.fc18 untagged from f18-updates-testing-pending by bodhi http://ur1.ca/ea9a0
- 2013/05/21 ralph's hovercraft-1.1-3.fc18 tagged into f18-updates-testing by bodhi http://ur1.ca/ea99f
- 2013/05/21 ralph's hovercraft-1.1-3.fc18 untagged from f18-updates-candidate by bodhi http://ur1.ca/ea9a1
- 2013/05/21 ralph's hovercraft-1.1-3.fc19 untagged from f19-updates-testing-pending by bodhi http://ur1.ca/ea9a2
- 2013/05/21 ralph's hovercraft-1.1-3.fc19 tagged into f19-updates-testing by bodhi http://ur1.ca/ea9a3
- 2013/05/21 ralph's hovercraft-1.1-3.fc19 untagged from f19-updates-candidate by bodhi http://ur1.ca/ea9a4
- 2013/05/21 ralph's hovercraft-1.1-3.fc18 tagged into f18-updates-testing-pending by bodhi http://ur1.ca/ea9a0
- 2013/05/21 ralph submitted hovercraft-1.1-3.fc18 to testing http://ur1.ca/ea99y
- 2013/05/21 ralph's hovercraft-1.1-3.fc17 failed to build http://ur1.ca/ea9a5
- 2013/05/21 hovercraft-1.1-3.fc17 started building http://ur1.ca/ea9a5
- 2013/05/21 ralph pushed to hovercraft (f17). "Add BR on python3-manuel." http://ur1.ca/ea9a6
- 2013/05/21 ralph's hovercraft-1.1-3.fc18 tagged into f18-updates-candidate by ralph http://ur1.ca/ea9a1
- 2013/05/21 ralph's hovercraft-1.1-3.fc18 completed http://ur1.ca/ea9a7
- 2013/05/21 hovercraft-1.1-3.fc18 started building http://ur1.ca/ea9a7
- 2013/05/21 ralph pushed to hovercraft (f18). "Add BR on python3-manuel." http://ur1.ca/ea9a8
- 2013/05/20 ralph's hovercraft-1.1-3.fc19 tagged into f19-updates-testing-pending by bodhi http://ur1.ca/ea9a2
- 2013/05/20 ralph submitted hovercraft-1.1-3.fc19 to testing http://ur1.ca/ea99z
- 2013/05/20 ralph's hovercraft-1.1-3.fc19 tagged into f19-updates-candidate by ralph http://ur1.ca/ea9a4
- 2013/05/20 ralph's hovercraft-1.1-3.fc19 completed http://ur1.ca/ea9ab
- 2013/05/20 ralph's hovercraft-1.1-3.fc20 tagged into f20 by ralph http://ur1.ca/ea9ac
- 2013/05/20 ralph's hovercraft-1.1-3.fc20 completed http://ur1.ca/ea9ad
- 2013/05/20 hovercraft-1.1-3.fc19 started building http://ur1.ca/ea9ab
- 2013/05/20 hovercraft-1.1-3.fc20 started building http://ur1.ca/ea9ad
- 2013/05/20 ralph pushed to hovercraft (f19). "Add BR on python3-manuel." http://ur1.ca/ea9ae
- 2013/05/20 ralph pushed to hovercraft (master). "Add BR on python3-manuel." http://ur1.ca/ea9af
- 2013/05/20 ralph's hovercraft-1.1-2.fc19 failed to build http://ur1.ca/ea9ag
- 2013/05/20 hovercraft-1.1-2.fc19 started building http://ur1.ca/ea9ag
- 2013/05/20 ralph's hovercraft-1.1-2.fc19 failed to build http://ur1.ca/ea9ag
- 2013/05/20 hovercraft-1.1-2.fc19 started building http://ur1.ca/ea9ag
- 2013/05/20 ralph's hovercraft-1.1-2.fc19 failed to build http://ur1.ca/ea9ag
- 2013/05/20 hovercraft-1.1-2.fc19 started building http://ur1.ca/ea9ag
- 2013/05/20 ralph's hovercraft-1.1-2.fc19 failed to build http://ur1.ca/ea9ag
- 2013/05/20 hovercraft-1.1-2.fc19 started building http://ur1.ca/ea9ag
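The script above is Python 2 (note the print statements). For anyone on Python 3, here is a minimal sketch of just the formatting step. The function name `format_message` and the sample record are made up for illustration; the only things taken from the script are the `timestamp` and `meta` keys. It also formats the date in UTC rather than local time, which is a deliberate deviation so the result does not depend on the machine it runs on.

```python
from datetime import datetime, timezone


def format_message(message, shorten=None):
    """Render one datagrepper message as 'YYYY/MM/DD subtitle [link]'."""
    # UTC keeps the date independent of the local timezone
    # (the original script uses local time instead).
    dt = datetime.fromtimestamp(message['timestamp'], tz=timezone.utc)
    parts = [dt.strftime("%Y/%m/%d"), message['meta']['subtitle']]
    # Only append a link when the caller supplies a shortener callable.
    if shorten is not None:
        parts.append(shorten(message['meta']['link']))
    return " ".join(parts)


# Hypothetical sample record, shaped like one entry of 'raw_messages'.
sample = {
    'timestamp': 1370865600.0,
    'meta': {
        'subtitle': 'ralph submitted hovercraft-1.1-3.fc18 to stable',
        'link': 'https://example.org/some/long/link',
    },
}

print(format_message(sample))
# Pass any callable as the shortener; the identity function keeps the URL.
print(format_message(sample, shorten=lambda url: url))
```

Keeping the formatting separate from the HTTP call makes it easy to try out on canned data before pointing it at the real API.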
Distributing jobs via hashmaths
Jun 10, 2013 | categories: python, aosa, fedmsg, fedora
As things stand, we can't load balance across multiple fedmsg-hub daemons (we can and do balance across multiple fedmsg-gateway instances, though; that is another story).
For the hubs though, here's a scheme that might work. However.. is it fast enough?
    #!/usr/bin/env python
    """ Distribute jobs via.. "hashspace"?

    Actually give this a run.  See what it does.
    I learned about it from the mailman3 chapter in AOSA
    http://www.aosabook.org/en/mailman.html
    """

    import json
    import hashlib


    class Consumer(object):
        def __init__(self, i, n):
            self.i = i
            self.n = n

        def should_I_process_this(self, msg):
            """ True if this message falls in "our" portion of the hash space.

            This seems great, except I bet its pretty expensive.
            Can you come up with an equivalent, "good enough" method that is
            more efficient?
            """
            as_a_string = json.dumps(msg)
            as_a_hash = hashlib.md5(as_a_string).hexdigest()
            as_a_long = int(as_a_hash, 16)
            return (as_a_long % self.n) == self.i


    def demonstration(msg):
        """ Handy printing utility to show the results """
        print "* Who takes this message?  %r" % msg
        for consumer in consumers:
            print "  *", consumer.i, "of", consumer.n, "hubs.",
            print "  Process this one?", consumer.should_I_process_this(msg)


    if __name__ == '__main__':
        # Say we have 4 moksha-hubs each running the same set of consumers.
        # For story-telling sake, we'll say we're dealing here with the datanommer
        # consumer.  Let's pretend it has to do some heavy scrubbing on the message
        # before it throws it in the DB, so we need to load-balance that.
        # As things stand now with fedmsg.. we can't do that *hand waving*.
        # This is a potential scheme to make it possible.

        # We have 4 moksha-hubs, one on each of 4 machines.
        N = 4
        consumers = [Consumer(i, N) for i in range(N)]

        # Fedmsg delivers a message.  All 4 of our hubs receive it.
        # They each take the md5 sum of the message, convert that to a long
        # and then mod that by the number of moksha-hubs.  If that remainder is
        # *their* id, then they claim the message and process it.
        demonstration(msg={'blah': 'I am a message, lol.'})
        demonstration(msg={'blah': 'I am a different message.'})
        demonstration(msg={'blah': 'me too.'})

        # Since md5 sums are "effectively" random, this should distribute across
        # all our nodes mostly-evenly.
As I was typing this post up, Toshio Kuratomi mentioned that I should look into zlib.adler32 and binascii.crc32 if I am concerned about speed (which I am).
Perhaps some benchmarking is in order?
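A rough sketch of what such a benchmark could look like, written for Python 3. The helper names (`serialize`, `owner_md5`, `owner_adler32`, `owner_crc32`, `distribution`) are invented here, and `sort_keys=True` is an addition that is not in the script above; it just makes sure every hub serializes the same dict to the same bytes. No timings are quoted because they depend entirely on the machine.

```python
import binascii
import hashlib
import json
import timeit
import zlib


def serialize(msg):
    # sort_keys is an assumption added here so that every process
    # produces identical bytes for the same message.
    return json.dumps(msg, sort_keys=True).encode('utf-8')


def owner_md5(payload, n):
    # Same idea as the Consumer class: md5 -> big integer -> modulo.
    return int(hashlib.md5(payload).hexdigest(), 16) % n


def owner_adler32(payload, n):
    # zlib.adler32 returns an unsigned 32-bit integer on Python 3.
    return zlib.adler32(payload) % n


def owner_crc32(payload, n):
    # binascii.crc32 also returns an unsigned 32-bit integer on Python 3.
    return binascii.crc32(payload) % n


def distribution(func, n, samples=1000):
    """Count how many synthetic messages each of the n hubs would claim."""
    counts = [0] * n
    for i in range(samples):
        counts[func(serialize({'blah': 'message %d' % i}), n)] += 1
    return counts


if __name__ == '__main__':
    N = 4
    payload = serialize({'blah': 'I am a message, lol.'})
    for func in (owner_md5, owner_adler32, owner_crc32):
        seconds = timeit.timeit(lambda: func(payload, N), number=100000)
        print("%-14s %.3fs per 100k calls, spread: %r" % (
            func.__name__, seconds, distribution(func, N)))
```

Speed is only half of the question: a cheaper checksum is only "good enough" if it still spreads messages roughly evenly across hubs, which is why the sketch prints the per-hub counts next to the timing.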
Writing plugins for Mailman 3
May 23, 2013 | categories: python, fedmsg, mailman, fedora
GNU Mailman 3 (the long-awaited revamp of the widely-used, widely-despised GNU Mailman 2) is on the way, and Fedora's Aurélien Bompard has been working on a new frontend for it (here's a before and after screencast showing some of it off). I set out to write a fedmsg plugin for mm3 so we can add it to cool visualizations, gather more data on non-development contributions to Fedora (which is always hard to quantify), and to support future fedmsg use cases we haven't yet thought of.
I searched for documentation, but didn't find anything directly on how to write plugins. I found Barry's chapter in AOSA to be a really helpful guide before diving into the mailman3 source itself. This blog post is meant to relay my findings: two (of many) different ways to write plugins for mailman3.
Adding a new Handler to the Pipeline
At the end of the day, all we want to do is publish a ØMQ message on a zmq.PUB socket for every mail that makes its way through mailman.
I learned that at its core mailman is composed of two sequential processing steps. First, a chain of rules that make moderation decisions about a message (should it be posted to the list? rejected? discarded?). Second, a pipeline of handlers that perform manipulation operations on a message (should special text be added to the end? headers? should it be archived? added to the digest?).
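To make the two phases concrete, here is a toy model in plain Python. This is not Mailman's API: the function names, the decision strings, and the dict-based "message" are all made up, and real Mailman chains are more involved than "first rule that fires wins". It only illustrates the ordering: moderation decides first, and modification runs only for accepted messages.

```python
def run_chain(rules, msg):
    """Phase one: return the first moderation decision any rule makes."""
    for rule in rules:
        decision = rule(msg)
        if decision is not None:
            return decision
    # Toy default: a message no rule objects to is accepted.
    return 'accept'


def run_pipeline(handlers, msg):
    """Phase two: let every handler modify the message in turn."""
    for handler in handlers:
        handler(msg)
    return msg


def too_big(msg):
    # Hypothetical rule: reject anything with a body over 40 characters.
    return 'reject' if len(msg['body']) > 40 else None


def add_footer(msg):
    # Hypothetical handler: append text to the end of the body.
    msg['body'] += '\n-- sent via the list'


def tag_subject(msg):
    # Hypothetical handler: decorate the subject line.
    msg['subject'] = '[list] ' + msg['subject']


def process(msg, rules, handlers):
    """Run moderation, then modification only if the message was accepted."""
    decision = run_chain(rules, msg)
    if decision != 'accept':
        return decision, msg
    return decision, run_pipeline(handlers, msg)


decision, result = process(
    {'subject': 'hello', 'body': 'short mail'},
    rules=[too_big],
    handlers=[add_footer, tag_subject],
)
print(decision, result)
```

A handler that publishes to fedmsg would sit in the second list: by the time it runs, the message has already been cleared for delivery.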
I came up with this template while trying to figure out how to add another handler to that second pipeline. It works (but it's not the approach we ended up using; read further):
    """ An example template for setting up a custom pipeline for Mailman 3.

    Message processing in mailman 3 is split into two phases, "moderation"
    and "modification".  This pipeline is for the second phase which will
    only be invoked *after* a message has been cleared for delivery.

    In order to have this module imported and setup by mailman, our
    ``initialize`` function needs to be called.  This can be accomplished with
    the mailman 3 ``post_hook`` in the config file::

        [mailman]
        post_hook: mm3_custom_handler_template.initialize

    After our ``initialize`` function has been called, the
    'my-custom-posting-pipeline' should be available internally to mailman.
    In mailman 3, each mailing list can have its *own* pipeline; precisely
    which pipeline gets used at runtime is configured in the database --
    through Postorius.

    :Author: Ralph Bean <rbean@redhat.com>

    """

    from __future__ import absolute_import, print_function, unicode_literals

    import logging

    from zope.interface import implementer
    from zope.interface.verify import verifyObject

    from mailman.config import config
    from mailman.core.i18n import _
    from mailman.core.pipelines import PostingPipeline
    from mailman.interfaces.handler import IHandler

    __all__ = [
        'CustomHandler',
        'CustomPostingPipeline',
        'initialize',
    ]

    elog = logging.getLogger("mailman.error")


    @implementer(IHandler)
    class CustomHandler:
        """ Do something.. anything with the message. """

        name = 'my-custom-handler'
        description = _('Do something.. anything with the message.')

        def process(self, mlist, msg, msgdata):
            """ See `IHandler` """
            elog.error("CUSTOM HANDLER: %r %r %r" % (mlist, msg, msgdata))


    class CustomPostingPipeline(PostingPipeline):
        """ A custom posting pipeline that adds our custom handler """

        name = 'my-custom-posting-pipeline'
        description = _('My custom posting pipeline')

        _default_handlers = PostingPipeline._default_handlers + (
            'my-custom-handler',
        )


    def initialize():
        """ Initialize our custom objects.

        This should be called as the `config.mailman.post_hook` callable
        during the third phase of initialization, *after* the other default
        pipelines have been set up.

        """

        # Initialize our handler and make it available
        handler = CustomHandler()
        verifyObject(IHandler, handler)
        assert handler.name not in config.handlers, (
            'Duplicate handler "{0}" found in {1}'.format(
                handler.name, CustomHandler))
        config.handlers[handler.name] = handler

        # Initialize our pipeline and make it available
        pipeline = CustomPostingPipeline()
        config.pipelines[pipeline.name] = pipeline
The above approach works, but it involves a lot of hacking to get mailman to load our code into the pipeline. We have to occupy the mailman post_hook and then kind-of hot-patch our pipeline into the list of existing pipelines.
A benefit of this approach is that we could use Postorius (the DB) to control which mailing lists included our plugin and which didn't. The site administrator can leave some decisions up to the list administrators.
I ended up abandoning the above approach and instead landed on...
Adding a second Archiver
One of the Handlers in the default pipeline is the to-archive Handler. It has a somewhat nicer API for defining multiple destinations for archival. One of those is typically HyperKitty (or... kittystore)... but you can add as many as you like.
I wrote this "archiver" (and threw it up on github, pypi, and fedora). Barring tweaks and modifications, I think it's the approach we'll end up using down the road:
    """ Publish notifications about mails to the fedmsg bus.

    Enable this by adding the following to your mailman.cfg file::

        [archiver.fedmsg]
        # The class implementing the IArchiver interface.
        class: mailman3_fedmsg_plugin.Archiver
        enable: yes

    You can exclude certain lists from fedmsg publication by
    adding them to a 'mailman.excluded_lists' list in /etc/fedmsg.d/::

        config = {
            'mailman.excluded_lists': ['bugzilla', 'commits'],
        }

    """

    from zope.interface import implements

    from mailman.interfaces.archiver import IArchiver

    import socket
    import fedmsg
    import fedmsg.config


    class Archiver(object):
        """ A mailman 3 archiver that forwards messages to the fedmsg bus. """

        implements(IArchiver)
        name = "fedmsg"

        # This is a list of the headers we're interested in publishing.
        keys = [
            "archived-at",
            "delivered-to",
            "from",
            "cc",
            "to",
            "in-reply-to",
            "message-id",
            "subject",
            "x-message-id-hash",
            "references",
            "x-mailman-rule-hits",
            "x-mailman-rule-misses",
        ]

        def __init__(self):
            """ Just initialize fedmsg. """
            hostname = socket.gethostname()
            if not getattr(getattr(fedmsg, '__local', None), '__context', None):
                fedmsg.init(name="mailman.%s" % hostname)
            self.config = fedmsg.config.load_config()

        def archive_message(self, mlist, msg):
            """Send the message to the "archiver".

            In our case, we just publish it to the fedmsg bus.

            :param mlist: The IMailingList object.
            :param msg: The message object.
            """

            if mlist.list_name in self.config.get('mailman.excluded_lists', []):
                return

            format = lambda value: value and unicode(value)
            msg_metadata = dict([(k, format(msg.get(k))) for k in self.keys])
            lst_metadata = dict(list_name=mlist.list_name)

            fedmsg.publish(topic='receive', modname='mailman',
                           msg=dict(msg=msg_metadata, mlist=lst_metadata))

        def list_url(self, mlist):
            """ This doesn't make sense for fedmsg.
            But we must implement for IArchiver.
            """
            return None

        def permalink(self, mlist, msg):
            """ This doesn't make sense for fedmsg.
            But we must implement for IArchiver.
            """
            return None
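To see what the published payload would roughly look like without a running Mailman or fedmsg, the header-extraction step from `archive_message` can be tried on a plain standard-library message. This is a Python 3 sketch: `extract_headers`, the shortened `KEYS` list, the sample addresses, and the list name 'devel' are all made up for illustration, and `str` stands in for the Python 2 `unicode` call.

```python
import json
from email.message import Message

# A shortened, illustrative subset of the archiver's header list.
KEYS = ["from", "to", "subject", "message-id", "in-reply-to"]


def extract_headers(msg, keys=KEYS):
    """Collect the selected headers, mirroring archive_message's logic."""
    result = {}
    for key in keys:
        # Header lookup is case-insensitive; absent headers come back None.
        value = msg.get(key)
        # `value and str(value)` leaves None untouched, stringifies the rest.
        result[key] = value and str(value)
    return result


# Hypothetical sample mail.
mail = Message()
mail['From'] = 'ralph@example.org'
mail['To'] = 'devel@lists.example.org'
mail['Subject'] = 'Hello'
mail['Message-ID'] = '<abc123@example.org>'

# Same overall shape as the dict handed to fedmsg.publish above.
payload = dict(msg=extract_headers(mail), mlist=dict(list_name='devel'))
print(json.dumps(payload, indent=2, sort_keys=True))
```

Headers the mail does not carry (here `in-reply-to`) show up as null rather than being dropped, so consumers can rely on a fixed set of keys.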
FedoraHosted gets on the bus
May 14, 2013 | categories: fedmsg, fedorahosted, fedora
Last week I wrote a plugin to make trac emit fedmsg messages. We installed it first on a few projects to make sure it worked ok, and everything seems fine. If you're watching the #fedora-fedmsg channel on freenode, you may have already noticed some trac.ticket.update messages go by.
It is currently enabled for the following projects:
If you want it enabled for YOUR fedorahosted project, please open an infrastructure ticket and we'll get to it as soon as we can. It's really pretty easy to set up; someone on the infrastructure team just needs to follow this documented Standard Operating Procedure.
Querying fedmsg history
May 14, 2013 | categories: fedmsg, datanommer, fedora, datagrepper
Yesterday, Ian Weller and I got the first iteration of "datagrepper" into production. It is a JSON API that lets you query the history of the fedmsg bus. In case you're confused: it is related to, but not the same thing as, "datanommer". You can check out the datagrepper docs on its index page as well as on its reference page.
It is another building block. With it, we can now:
- Use it as a reliability resource for other fedmsg projects.
  - Say you have a daemon that listens for fedmsg messages to act on... but it crashes. When it gets restarted it can ask datagrepper "What messages did I miss since this morning at 4:03am?" and catch up on those as it can.
- Build apps that query it to show "the latest so many messages that meet such and such criteria."
  - Imagine an HTML5 mobile app that shows you the latest of anything and everything in Fedora. (pingou is at work on this).
  - Imagine package-centric UI widgets that show the latest Fedora-wide events pertaining to a certain package. We could embed them in the Fedora Packages app.
  - Imagine user-centric UI widgets that show the latest activity of developers. You could embed yours in your blog or wiki page.
- Statistics! The whole dataset is available to you and updated in real time. Can you tell any cool stories with it?
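The "what did I miss since 4:03am?" case from the list above could be sketched like this in Python 3. Treat the details as assumptions to check against the datagrepper docs: the `start` and `page` query arguments, `order='asc'`, and the `pages` field in the response are how I understand the API, and `missed_messages` and `fetch_page` are names made up for this example. The fetch function is injectable so the paging loop can be exercised without touching the network.

```python
import json
import urllib.parse
import urllib.request

DATAGREPPER_URL = "https://apps.fedoraproject.org/datagrepper/raw/"


def fetch_page(params):
    """Fetch one page of results from datagrepper and decode the JSON."""
    query = urllib.parse.urlencode(params, doseq=True)
    with urllib.request.urlopen(DATAGREPPER_URL + "?" + query) as response:
        return json.loads(response.read().decode('utf-8'))


def missed_messages(since, fetch=fetch_page, rows=100):
    """Yield messages published after `since` (seconds since the epoch)."""
    page = 1
    while True:
        # 'start', 'page' and 'order' are assumed query arguments;
        # 'rows_per_page' appears in the example script earlier on.
        data = fetch(dict(start=since, order='asc',
                          page=page, rows_per_page=rows))
        for message in data.get('raw_messages', []):
            yield message
        # 'pages' is assumed to hold the total number of result pages.
        if page >= data.get('pages', 1):
            break
        page += 1
```

A restarted daemon would call something like `for msg in missed_messages(last_seen_timestamp): handle(msg)` before going back to listening on the bus, where `last_seen_timestamp` and `handle` are whatever the daemon already keeps around.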
It is, like I mentioned, an initial release so please be gentle. We have a big list of plans and bugs to crack on. If you run into issues or simply have questions, feel free to file a bug or ping us in #fedora-apps on freenode.