
Writing plugins for Mailman 3

May 23, 2013 | categories: python, fedmsg, mailman, fedora

GNU Mailman 3 (the long-awaited revamp of the widely-used, widely-despised GNU Mailman 2) is on the way, and Fedora's Aurélien Bompard has been working on a new frontend for it (here's a before and after screencast showing some of it off). I set out to write a fedmsg plugin for mm3 so we can add it to our cool visualizations, gather more data on non-development contributions to Fedora (which are always hard to quantify), and support future fedmsg use cases we haven't yet thought of.

I searched for documentation, but didn't find anything directly on how to write plugins. I found Barry's chapter in AOSA to be a really helpful guide before diving into the mailman3 source itself. This blog post is meant to relay my findings: two (of many) different ways to write plugins for mailman3.

Adding a new Handler to the Pipeline

At the end of the day, all we want to do is publish a ØMQ message on a zmq.PUB socket for every mail that makes its way through mailman.
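
fedmsg takes care of the socket handling for us, but the underlying idea is just this (a minimal pyzmq sketch; the port and topic here are made up for illustration):

import json

import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind('tcp://*:3000')

# fedmsg frames each message as a (topic, JSON payload) multipart message.
socket.send_multipart([
    b'org.example.dev.mailman.receive',
    json.dumps({'msg': 'hello, world'}).encode('utf-8'),
])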

I learned that, at its core, mailman is composed of two sequential processing steps. First, a chain of rules makes moderation decisions about a message (should it be posted to the list? rejected? discarded?). Second, a pipeline of handlers performs modification operations on the message (should special text be appended to the end? should headers be added? should it be archived? added to the digest?).

I came up with this template while trying to figure out how to add another handler to that second pipeline. It works! (But it's not the approach we ended up using -- read on.)

""" An example template for setting up a custom pipeline for Mailman 3.

Message processing in mailman 3 is split into two phases, "moderation"
and "modification".  This pipeline is for the second phase which
will only be invoked *after* a message has been cleared for delivery.

In order to have this module imported and setup by mailman, our ``initialize``
function needs to be called.  This can be accomplished with the mailman 3
``post_hook`` in the config file::

    [mailman]
    post_hook: mm3_custom_handler_template.initialize

After our ``initialize`` function has been called, the
'custom-posting-pipeline' should be available internally to mailman.
In mailman 3, each mailing list can have its *own* pipeline; precisely
which pipeline gets used at runtime is configured in the database --
through Postorius.

:Author: Ralph Bean <rbean@redhat.com>

"""

from __future__ import absolute_import, print_function, unicode_literals

import logging

from zope.interface import implementer
from zope.interface.verify import verifyObject

from mailman.config import config
from mailman.core.i18n import _
from mailman.core.pipelines import PostingPipeline
from mailman.interfaces.handler import IHandler

__all__ = [
    'CustomHandler',
    'CustomPostingPipeline',
    'initialize',
]

elog = logging.getLogger("mailman.error")


@implementer(IHandler)
class CustomHandler:
    """ Do something.. anything with the message. """

    name = 'my-custom-handler'
    description = _('Do something.. anything with the message.')

    def process(self, mlist, msg, msgdata):
        """ See `IHandler` """
        elog.error("CUSTOM HANDLER: %r %r %r" % (mlist, msg, msgdata))


class CustomPostingPipeline(PostingPipeline):
    """ A custom posting pipeline that adds our custom handler """

    name = 'my-custom-posting-pipeline'
    description = _('My custom posting pipeline')

    _default_handlers = PostingPipeline._default_handlers + (
        'my-custom-handler',
    )


def initialize():
    """ Initialize our custom objects.

    This should be called as the `config.mailman.post_hook` callable
    during the third phase of initialization, *after* the other default
    pipelines have been set up.

    """

    # Initialize our handler and make it available
    handler = CustomHandler()
    verifyObject(IHandler, handler)
    assert handler.name not in config.handlers, (
        'Duplicate handler "{0}" found in {1}'.format(
            handler.name, CustomHandler))
    config.handlers[handler.name] = handler

    # Initialize our pipeline and make it available
    pipeline = CustomPostingPipeline()
    config.pipelines[pipeline.name] = pipeline

The above approach works, but it involves a fair amount of hackery to get mailman to load our code into the pipeline. We have to claim mailman's single post_hook for ourselves and then hot-patch our pipeline into the registry of existing pipelines at startup.

A benefit of this approach is that we could use Postorius (i.e., the database) to control which mailing lists include our handler and which don't; the site administrator can leave some decisions up to the list administrators.
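
For example, once the pipeline is registered, pointing a particular list at it boils down to flipping one attribute. Here's a rough sketch from a mailman shell session (the API and attribute names are my reading of the mailman 3 source at the time, so double-check them):

# Run from within `mailman shell` (sketch; attribute names may differ
# in your version of mailman 3).
from zope.component import getUtility

from mailman.config import config
from mailman.interfaces.listmanager import IListManager

mlist = getUtility(IListManager).get('devel@lists.example.com')
mlist.posting_pipeline = 'my-custom-posting-pipeline'
config.db.commit()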

I ended up abandoning the above approach and instead landed on...

Adding a second Archiver

One of the Handlers in the default pipeline is the to-archive Handler. It has a somewhat nicer API for defining multiple archival destinations. One of those is typically HyperKitty (or... kittystore), but you can add as many as you like.

I wrote this "archiver" (and threw it up on github, pypi, and fedora). Barring tweaks and modifications, I think it's the approach we'll end up using down the road:

""" Publish notifications about mails to the fedmsg bus.

Enable this by adding the following to your mailman.cfg file::

    [archiver.fedmsg]
    # The class implementing the IArchiver interface.
    class: mailman3_fedmsg_plugin.Archiver
    enable: yes

You can exclude certain lists from fedmsg publication by
adding them to a 'mailman.excluded_lists' list in /etc/fedmsg.d/::

    config = {
        'mailman.excluded_lists': ['bugzilla', 'commits'],
    }

"""

from zope.interface import implements
from mailman.interfaces.archiver import IArchiver

import socket
import fedmsg
import fedmsg.config


class Archiver(object):
    """ A mailman 3 archiver that forwards messages to the fedmsg bus. """

    implements(IArchiver)
    name = "fedmsg"

    # This is a list of the headers we're interested in publishing.
    keys = [
        "archived-at",
        "delivered-to",
        "from",
        "cc",
        "to",
        "in-reply-to",
        "message-id",
        "subject",
        "x-message-id-hash",
        "references",
        "x-mailman-rule-hits",
        "x-mailman-rule-misses",
    ]

    def __init__(self):
        """ Just initialize fedmsg. """
        hostname = socket.gethostname()
        if not getattr(getattr(fedmsg, '__local', None), '__context', None):
            fedmsg.init(name="mailman.%s" % hostname)
        self.config = fedmsg.config.load_config()


    def archive_message(self, mlist, msg):
        """Send the message to the "archiver".

        In our case, we just publish it to the fedmsg bus.

        :param mlist: The IMailingList object.
        :param msg: The message object.
        """

        if mlist.list_name in self.config.get('mailman.excluded_lists', []):
            return

        format = lambda value: value and unicode(value)
        msg_metadata = dict([(k, format(msg.get(k))) for k in self.keys])
        lst_metadata = dict(list_name=mlist.list_name)

        fedmsg.publish(topic='receive', modname='mailman',
                       msg=dict(msg=msg_metadata, mlist=lst_metadata))

    def list_url(self, mlist):
        """ This doesn't make sense for fedmsg.
        But we must implement for IArchiver.
        """
        return None

    def permalink(self, mlist, msg):
        """ This doesn't make sense for fedmsg.
        But we must implement for IArchiver.
        """
        return None
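
If you want to poke at it by hand, a quick smoke test looks something like the following sketch (the fake list and headers are invented for illustration, and it assumes fedmsg is already configured on the box):

from email.message import Message

from mailman3_fedmsg_plugin import Archiver

# A stand-in for the IMailingList object; only list_name is consulted above.
class FakeList(object):
    list_name = 'devel'

msg = Message()
msg['From'] = 'someone@example.com'
msg['Subject'] = 'An example subject'
msg['Message-ID'] = '<example@example.com>'

archiver = Archiver()                      # initializes fedmsg
archiver.archive_message(FakeList(), msg)  # publishes to the bus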

Now Available - List of fedmsg Topics

Apr 03, 2013 | categories: python, fedmsg, fedora

fedmsg (the Fedora Infrastructure Message Bus) now has a full list of published message topics with example messages and more.

For instance, one of the topics on which the koji build system emits messages is org.fedoraproject.prod.buildsys.build.state.change. The docs there show:

  • The topic.
  • Some description of the event.
  • An example message in JSON format. This is what you would see if you ran $ fedmsg-tail --really-pretty.
  • Some description of what you would get if you passed the example message into some of the functions in the fedmsg.meta python module. This is the stuff you see in the #fedora-fedmsg irc channel and on the identi.ca bot. (A sketch of calling those functions follows just below.)
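
Here's a rough sketch of calling those fedmsg.meta functions yourself (example_message.json is a hypothetical file holding one of the JSON examples copied from the topic docs):

import json

import fedmsg.config
import fedmsg.meta

# Load fedmsg config from /etc/fedmsg.d/ and prime the topic processors.
config = fedmsg.config.load_config()
fedmsg.meta.make_processors(**config)

# Hypothetical file containing an example message from the topic docs.
msg = json.load(open('example_message.json'))

print(fedmsg.meta.msg2title(msg, **config))      # short human-readable title
print(fedmsg.meta.msg2subtitle(msg, **config))   # the one-line IRC-style summary
print(fedmsg.meta.msg2link(msg, **config))       # a URL with more details
print(fedmsg.meta.msg2usernames(msg, **config))  # set of affected FAS usernames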

Just in case you didn't know, you can listen to the public fedmsg bus (with no configuration) by running:

$ sudo yum install fedmsg
$ fedmsg-tail --really-pretty

If you want to program something that responds to fedmsg messages, check out the docs on consuming messages from python -- the new list of topics should come in handy.
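
The simplest possible python consumer is only a few lines (a sketch; it reads the bus endpoints from your /etc/fedmsg.d/ config):

import fedmsg

# Yields (name, endpoint, topic, message) tuples as messages arrive on the bus.
for name, endpoint, topic, msg in fedmsg.tail_messages():
    if topic.endswith('buildsys.build.state.change'):
        print(topic)
        print(msg)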


Live visualization of fedmsg activity

Mar 04, 2013 | categories: python, fedmsg, zeromq, fedora, gource

Another crazy idea -- piping ØMQ into gource!

If you have problems with the video, you can download it directly here in ogg or webm.

As before, I used gravatar.com to grab the avatar images, using the FAS_USERNAME@fedoraproject.org formula.

Props to decause for the idea.


Async Caching & the Fedora Packages App

Feb 07, 2013 | categories: python, cache, ruby, fedora

Backstory

Luke (lmacken) Macken, John (J5) Palmieri, and Máirín (mizmo/miami) Duffy created the Fedora Packages webapp; its initial release was about this time last year at FUDCon Blacksburg. It is cool!

Over the summer, I wrote a python API and CLI tool called pkgwat to query it. David Davis then wrote his own ruby bindings, which now power the Is It Fedora Ruby? webapp.

In late 2012 as part of another scheme, I redirected the popular https://bugz.fedoraproject.org/$SOME_PACKAGE alias from an older app to the packages app's bug list interface.

There was a problem, though: developers reported that the page didn't work and was too slow!

Darn.


Caching

There were a number of small bugs (including an SSL timeout when querying Red Hat's bugzilla instance). Once those were cleared, the latency issue remained. For packages like the kernel, our shiny webapp absolutely crawled; it would sometimes take a minute and a half for the AJAX requests to complete. We were confused -- caching had been written into the app from day one, so... what gives?

To my surprise, I found that our app servers were being blocked from talking to our memcached servers by iptables (I'm not sure how we missed that one). Once we flipped that bit, there was some improvement, but... we could do better.


Caching with dogpile.cache

I had read Mike Bayer's post on dogpile.cache back in Spring 2012. We had been using Beaker, so I decided to try dogpile.cache out. It worked! It was cool...

...and I was completely mistaken about what it did. Here is what it actually does:

  1. When a request for a cache item comes in and that item doesn't exist, it blocks while generating that value.
  2. If a second request for the same item comes in before the value has finished generating, the second request blocks and simply waits for the first request's value.
  3. Once the value is finished generating, it is stuffed in the cache and both the first and second threads/procs return the value "simultaneously".
  4. Subsequent requests for the item happily return the cached value.
  5. Once the expiration passes, the value remains in the cache but is now considered "stale".
  6. The next request behaves the same as the very first: it blocks while regenerating the value for the cache.
  7. Other requests that come in while the cache is being refreshed now do not block; they happily return the stale value. This is awesome. When a value becomes stale, only one thread/proc is elected to refresh the cache; all the others return snappily (happily).

The above is what dogpile.cache actually does (to the best of my story-telling abilities).
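
For reference, wiring a function up to dogpile.cache looks roughly like this (a minimal sketch, assuming a memcached daemon at 127.0.0.1:11211; the bugzilla query is a hypothetical stand-in for the expensive work):

from dogpile.cache import make_region

# One region, backed by memcached, with a five-minute expiration.
region = make_region().configure(
    'dogpile.cache.memcached',
    expiration_time=300,
    arguments={'url': ['127.0.0.1:11211']},
)

@region.cache_on_arguments()
def get_bugs(package):
    # Hypothetical stand-in for the expensive bugzilla/xapian work.
    return query_bugzilla(package)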

In my imagination, I thought that step 6 didn't actually block. I thought that dogpile.cache would spin off a background thread to refresh the cache asynchronously and that step 6's request would return immediately. This would mean that once cache entries had been filled, the app would feel quick and responsive forever!

It did not work that way, so I submitted a patch. Now it does. :)
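
The hook that grew out of that patch is dogpile.cache's async_creation_runner. A thread-based runner (roughly the example from the dogpile.cache docs) looks like this:

import threading

from dogpile.cache import make_region

def async_creation_runner(cache, somekey, creator, mutex):
    """ Generate a new value in a background thread.

    The request that triggered the refresh returns the stale value
    immediately; this thread fills the cache and releases the
    dogpile lock when it's done.
    """
    def runner():
        try:
            value = creator()
            cache.set(somekey, value)
        finally:
            mutex.release()

    thread = threading.Thread(target=runner)
    thread.start()

region = make_region(
    async_creation_runner=async_creation_runner,
).configure(
    'dogpile.cache.memcached',
    expiration_time=300,
    arguments={'url': ['127.0.0.1:11211']},
)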


Threads

A programmer had a problem.
He thought to himself, "I know, I'll solve it with threads!".
has Now problems. two he

With dogpile.cache now behaving magically and the Fedora Packages webapp patched to take advantage of it, I deployed a new release to staging the day before FUDCon and then again to production at FUDCon on Saturday, January 19th, 2013. At this point our memcached servers promptly lost their minds.

http://threebean.org/sad_memcached.png

Mike Bayer sagely warned that the approach was creepy.

Those threads weren't cleaning up their memcached connections properly, and enough of them were being created that the number of concurrent connections brought down the cache daemon. To make it all worse, so many changes were made so rapidly at FUDCon that isolating the cause took some time. Other environment-wide FUDCon changes to my fedmsg system were causing unrelated issues, and I spent the following week putting out all kinds of fires in all kinds of contexts in what seemed like a never-ending rush. (New fires were started in the process, too, like unearthing a PyCurl issue in python-fedora and porting the underlying mechanism to python-requests, only to encounter bundling, compatibility, and security bugs there.)


A long-running queue

During that rush, I reimplemented my async_creator.

Where before it would start up a new thread, do the work, update the cache, and release the mutex, now it would simply drop a note in a redis queue with python-retask. With that came a corresponding worker daemon that picks those tasks off the queue one at a time, does the work, refreshes the cache and releases the distributed dogpile lock before moving on to the next item.
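
The shape of it is roughly this (a sketch with invented names, not the actual packages-app code):

import time

from retask.task import Task
from retask.queue import Queue

# Producer side (the web app's async_creator): just leave a note.
queue = Queue('cache_worker')          # hypothetical queue name
queue.connect()
queue.enqueue(Task({'fn': 'get_bugs', 'args': ['kernel']}))

# Worker side (a separate daemon): handle notes one at a time.
queue = Queue('cache_worker')
queue.connect()
while True:
    task = queue.dequeue()
    if task is None:                   # queue is empty
        time.sleep(1)
        continue
    # Hypothetical helper: do the work, refresh the cache, and release
    # the distributed dogpile lock for this key.
    handle(task.data)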

The app seems to be working well now. Any issues? Please file them. I am going to take a nap.


WebSockets on OpenShift (Moksha in the Cloud)

Jan 07, 2013 | categories: python, fedora, moksha, openshift

I've been waiting on the OpenShift team to crack the WebSocket nut for a while, and they finally got it back in December. To try it out, I set up the Moksha Demo Dashboard on two different gears.

It wasn't too tricky. I created two OpenShift "DIY"-type apps, one for the WSGI app and another for the WebSocket server (the moksha-hub). All the work in those two repos is done in the .openshift/action_hooks directories (the code is actually just installed from PyPI). Additionally, the diy/development.ini files hold all the configuration.

It's live now at http://mokshademo-threebean.rhcloud.com/, but it's the same demo as before. Other apps still in the development pipeline should be more interesting when they arrive.

http://threebean.org/moksha-screenshot-2012-10-25.png
