[three]Bean

Fedora Photowall

Sep 04, 2014 | categories: python, fedora

If you didn't already know it, there is a Fedora Badge for associating a libravatar with your Fedora account.

https://badges.fedoraproject.org/pngs/mugshot.png

A fun by-product of having such a thing is that we have a list of FAS usernames of people who have public avatars with predictable URLs. With that, I wrote a script that pulls down that list and assembles a "photo wall" of a random subset. Check it out:

http://threebean.org/blog/static/images/montage2.png

I wrote it as a part of another project that I ended up junking, but the script is neat on its own.

Perhaps we can use it as a splash image for a Fedora website someday (say, next year's Flock site or a future iteration of FAS?). It might make a fun desktop wallpaper, too.

Here's the script if you'd like to re-use and modify it. You can tweak the dimensions variable to change the number of columns and rows in the output.

#!/usr/bin/env python
""" fedora-photowall.py - Make a photo montage of Fedora contributors.

Dependencies:
 $ sudo yum install python-sh python-beautifulsoup4 python-requests ImageMagick

Usage:   python fedora-photowall.py
Author:  Ralph Bean
License: LGPLv2+

"""

import hashlib
import os
import random
import urllib

import sh
import bs4
import requests

dimensions = (12, 5)

datadir = './data'
avatar_dir = datadir + '/avatars'
montage_dir = datadir + '/montage'


def make_directories():
    for directory in (datadir, avatar_dir, montage_dir):
        try:
            os.makedirs(directory)
        except OSError:
            pass


def avatars(N):
    url = 'https://badges.fedoraproject.org/badge/mugshot/full'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text)
    last_pane = soup.findAll(attrs={'class': 'grid-100'})[-1]
    persons = last_pane.findAll('a')

    persons = random.sample(persons, N)

    for person in persons:
        name = person.text.strip()
        openid = 'http://%s.id.fedoraproject.org/' % name
        hash = hashlib.sha256(openid).hexdigest()
        url = "https://seccdn.libravatar.org/avatar/%s" % hash
        yield (name, url)


def make_montage(candidates):
    """ Pull down avatars to disk and stich with imagemagick """

    filenames = []
    for name, url in candidates:
        filename = os.path.join(avatar_dir, name)
        if not os.path.exists(filename):
            print "Grabbing", name, "at", url
            urllib.urlretrieve(url, filename=filename)
        else:
            print "Already have", name, "at", filename
        filenames.append(filename)

    args = filenames + [montage_dir + '/montage.png']
    sh.montage('-tile', '%ix%i' % dimensions, '-geometry', '+0+0', *args)
    print "Output in", montage_dir


def main():
    make_directories()
    N = dimensions[0] * dimensions[1]
    candidates = avatars(N)
    make_montage(candidates)

if __name__ == '__main__':
    main()
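
The "predictable URLs" part is the piece most likely to be useful elsewhere, so here is a minimal sketch of just that step, assuming Python 3 (the script above is Python 2). The function name avatar_url and the username 'someone' are placeholders for illustration; the OpenID pattern and the seccdn.libravatar.org prefix are copied from the script.

```python
import hashlib


def avatar_url(fas_username):
    """Build the libravatar URL for a FAS username, as the script does."""
    # The OpenID identity is what gets hashed, not the bare username.
    openid = 'http://%s.id.fedoraproject.org/' % fas_username
    # Python 3's hashlib wants bytes, so encode the string first.
    digest = hashlib.sha256(openid.encode('utf-8')).hexdigest()
    return 'https://seccdn.libravatar.org/avatar/%s' % digest


if __name__ == '__main__':
    # 'someone' is a placeholder username, not a real account.
    print(avatar_url('someone'))
```

The only change from the original logic is the explicit encode() call, which Python 2 did not need.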

And another example of output:

http://threebean.org/blog/static/images/montage4.png

Cheers!


A tiny optimization

Sep 03, 2013 | categories: python, nitpicking, fedora

Talking over a pull request with @pypingou, we found that one method of constructing a set from a list of stripped strings was slightly faster than another:

#!/usr/bin/env python
""" Timing stuff.

::
    $ python timeittest.py
    set(map(str.strip, ['wat '] * 200))
    30.9805839062
    set([s.strip() for s in ['wat '] * 200])
    31.884624958
"""

import timeit


def measure(stmt):
    print stmt
    results = timeit.timeit(stmt)
    print results


measure("set(map(str.strip, ['wat '] * 200))")
measure("set([s.strip() for s in ['wat '] * 200])")

Admission: Pierre bullied me into blogging about this!


UPDATE: Folks in the comments recommended using a generator or itertools.imap. The results are significantly better. Here they are:

import itertools; [s.strip() for s in ['wat '] * 200]
28.2224271297
import itertools; (s.strip() for s in ['wat '] * 200)
3.0280148983
import itertools; map(str.strip, ['wat '] * 200)
25.7294211388
import itertools; itertools.imap(str.strip, ['wat '] * 200)
2.3925549984

UPDATE (again): Comments further reveal that the update above is misleading -- the generators aren't actually doing any work there. If we force them to spin out, we get results like these:

import itertools; set([s.strip() for s in ['wat '] * 200])
33.4951019287
import itertools; set((s.strip() for s in ['wat '] * 200))
35.5591659546
import itertools; set(map(str.strip, ['wat '] * 200))
33.7568879128
import itertools; set(itertools.imap(str.strip, ['wat '] * 200))
35.9931280613

No clear benefit for use of imap or generators.
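
For anyone repeating this on Python 3, here is a rough sketch of the same comparison. It assumes Python 3, where map is already lazy and itertools.imap no longer exists, so wrapping each expression in set() is what forces the work to happen. The set comprehension is an extra candidate that was not part of the original test, and number=10000 is an arbitrary choice to keep the run short. No timings are quoted here since they will vary by machine.

```python
import timeit

STATEMENTS = [
    "set([s.strip() for s in ['wat '] * 200])",
    "set(s.strip() for s in ['wat '] * 200)",
    "set(map(str.strip, ['wat '] * 200))",
    # Not in the original comparison: a set comprehension.
    "{s.strip() for s in ['wat '] * 200}",
]


def measure(stmt, number=10000):
    """Return the seconds taken to run stmt `number` times."""
    return timeit.timeit(stmt, number=number)


if __name__ == '__main__':
    for stmt in STATEMENTS:
        print(stmt)
        print(measure(stmt))
```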


Distributing jobs via hashmaths

Jun 10, 2013 | categories: python, aosa, fedmsg, fedora

As things stand, we can't load-balance across multiple fedmsg-hub daemons. (We can and do balance across multiple fedmsg-gateway instances, but that is another story.)

For the hubs, though, here's a scheme that might work. But is it fast enough?

#!/usr/bin/env python
""" Distribute jobs via.. "hashspace"?

Actually give this a run.  See what it does.

I learned about it from the mailman3 chapter in AOSA
http://www.aosabook.org/en/mailman.html
"""

import json
import hashlib


class Consumer(object):

    def __init__(self, i, n):
        self.i = i
        self.n = n

    def should_I_process_this(self, msg):
        """ True if this message falls in "our" portion of the hash space.

        This seems great, except I bet it's pretty expensive.

        Can you come up with an equivalent, "good enough" method that is
        more efficient?
        """
        as_a_string = json.dumps(msg)
        as_a_hash = hashlib.md5(as_a_string).hexdigest()
        as_a_long = int(as_a_hash, 16)
        return (as_a_long % self.n) == self.i


def demonstration(msg):
    """ Handy printing utility to show the results """

    print "* Who takes this message? %r" % msg
    for consumer in consumers:
        print "  *", consumer.i, "of", consumer.n, "hubs.",
        print "  Process this one?", consumer.should_I_process_this(msg)

if __name__ == '__main__':
    # Say we have 4 moksha-hubs each running the same set of consumers.
    # For story-telling sake, we'll say we're dealing here with the datanommer
    # consumer.  Let's pretend it has to do some heavy scrubbing on the message
    # before it throws it in the DB, so we need to load-balance that.

    # As things stand now with fedmsg.. we can't do that *hand waving*.
    # This is a potential scheme to make it possible.

    # We have 4 moksha-hubs, one on each of 4 machines.
    N = 4
    consumers = [Consumer(i, N) for i in range(N)]

    # Fedmsg delivers a message.  All 4 of our hubs receive it.
    # They each take the md5 sum of the message, convert that to a long
    # and then mod that by the number of moksha-hubs.  If that remainder is
    # *their* id, then they claim the message and process it.
    demonstration(msg={'blah': 'I am a message, lol.'})
    demonstration(msg={'blah': 'I am a different message.'})
    demonstration(msg={'blah': 'me too.'})

    # Since md5 sums are "effectively" random, this should distribute across
    # all our nodes mostly-evenly.

As I was typing this post up, Toshio Kuratomi mentioned that I should look into zlib.adler32 and binascii.crc32 if I am concerned about speed (which I am).

Perhaps some benchmarking is in order?
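
A starting point for that benchmark might look like the sketch below, assuming Python 3. The helper names (as_bytes, md5_bucket and friends) are made up for illustration, and sort_keys=True is an added assumption so that every hub serializes the same message the same way. It only measures speed; whether the cheaper checksums spread messages evenly enough across hubs would need checking separately.

```python
import binascii
import hashlib
import json
import timeit
import zlib


def as_bytes(msg):
    # sort_keys gives every hub the same serialization of the same message.
    return json.dumps(msg, sort_keys=True).encode('utf-8')


def md5_bucket(msg, n):
    """The scheme from the post: md5, as an integer, mod n."""
    return int(hashlib.md5(as_bytes(msg)).hexdigest(), 16) % n


def adler32_bucket(msg, n):
    # Mask to an unsigned 32-bit value so the result is the same everywhere.
    return (zlib.adler32(as_bytes(msg)) & 0xffffffff) % n


def crc32_bucket(msg, n):
    return (binascii.crc32(as_bytes(msg)) & 0xffffffff) % n


CANDIDATES = [md5_bucket, adler32_bucket, crc32_bucket]

if __name__ == '__main__':
    msg = {'blah': 'I am a message, lol.'}
    for func in CANDIDATES:
        seconds = timeit.timeit(lambda: func(msg, 4), number=100000)
        print(func.__name__, 'bucket:', func(msg, 4), 'seconds:', seconds)
```

Each function returns the index of the one hub that should claim the message, which is equivalent to each hub comparing the remainder against its own id.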


ToscaWidgets2 Sprint Postponed

Jun 04, 2013 | categories: python, tw2

Last week, Moritz Schlarb announced a tw2 sprint set for this coming weekend.

Unfortunately, we discovered in #toscawidgets that we need to reschedule. The new dates are tentatively set for August 26th - 28th, 2013.

If you have any items you want us to take up or you'd like to participate, please see our laundry list.


Writing plugins for Mailman 3

May 23, 2013 | categories: python, fedmsg, mailman, fedora

GNU Mailman 3 (the long-awaited revamp of the widely-used, widely-despised GNU Mailman 2) is on the way, and Fedora's Aurélien Bompard has been working on a new frontend for it (here's a before and after screencast showing some of it off). I set out to write a fedmsg plugin for mm3 so we can add it to cool visualizations, gather more data on non-development contributions to Fedora (which is always hard to quantify), and to support future fedmsg use cases we haven't yet thought of.

I searched for documentation, but didn't find anything directly on how to write plugins. I found Barry's chapter in AOSA to be a really helpful guide before diving into the mailman3 source itself. This blog post is meant to relay my findings: two (of many) different ways to write plugins for mailman3.

Adding a new Handler to the Pipeline

At the end of the day, all we want to do is publish a ØMQ message on a zmq.PUB socket for every mail that makes its way through mailman.

I learned that at its core mailman is composed of two sequential processing steps. First, a chain of rules that make moderation decisions about a message (should it be posted to the list? rejected? discarded?). Second, a pipeline of handlers that perform manipulation operations on a message (should special text be added to the end? headers? should it be archived? added to the digest?).
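
To make the two phases concrete, here is a toy model in plain Python. It does not use mailman's API at all: every name in it (rule_too_big, handler_add_footer, process and so on) is hypothetical, and real chains can do more than simply accept or drop a message. It only shows the ordering, with handlers running solely on messages that the rules have cleared.

```python
def rule_too_big(msg):
    """Hypothetical moderation rule: hit when the body is very long."""
    return len(msg['body']) > 1000


def handler_add_footer(msg):
    """Hypothetical handler: append special text to the end."""
    msg['body'] += '\n-- \nsent via the list'


def handler_tag_subject(msg):
    """Hypothetical handler: prefix the subject with a list tag."""
    msg['subject'] = '[list] ' + msg['subject']


RULES = [rule_too_big]
HANDLERS = [handler_add_footer, handler_tag_subject]


def process(msg):
    # Phase one: moderation. Any rule hit stops the message here.
    for rule in RULES:
        if rule(msg):
            return None
    # Phase two: modification. Only runs for messages cleared above.
    for handler in HANDLERS:
        handler(msg)
    return msg


if __name__ == '__main__':
    print(process({'subject': 'hi', 'body': 'short message'}))
    print(process({'subject': 'hi', 'body': 'x' * 5000}))
```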

I came up with this template while trying to figure out how to add another handler to that second pipeline. It works, but it's not the approach we ended up using (read on):

""" An example template for setting up a custom pipeline for Mailman 3.

Message processing in mailman 3 is split into two phases, "moderation"
and "modification".  This pipeline is for the second phase which
will only be invoked *after* a message has been cleared for delivery.

In order to have this module imported and setup by mailman, our ``initialize``
function needs to be called.  This can be accomplished with the mailman 3
``post_hook`` in the config file::

    [mailman]
    post_hook: mm3_custom_handler_template.initialize

After our ``initialize`` function has been called, the
'custom-posting-pipeline' should be available internally to mailman.
In mailman 3, each mailing list can have its *own* pipeline; precisely
which pipeline gets used at runtime is configured in the database --
through Postorius.

:Author: Ralph Bean <rbean@redhat.com>

"""

from __future__ import absolute_import, print_function, unicode_literals

import logging

from zope.interface import implementer
from zope.interface.verify import verifyObject

from mailman.config import config
from mailman.core.i18n import _
from mailman.core.pipelines import PostingPipeline
from mailman.interfaces.handler import IHandler

__all__ = [
    'CustomHandler',
    'CustomPostingPipeline',
    'initialize',
]

elog = logging.getLogger("mailman.error")


@implementer(IHandler)
class CustomHandler:
    """ Do something.. anything with the message. """

    name = 'my-custom-handler'
    description = _('Do something.. anything with the message.')

    def process(self, mlist, msg, msgdata):
        """ See `IHandler` """
        elog.error("CUSTOM HANDLER: %r %r %r" % (mlist, msg, msgdata))


class CustomPostingPipeline(PostingPipeline):
    """ A custom posting pipeline that adds our custom handler """

    name = 'my-custom-posting-pipeline'
    description = _('My custom posting pipeline')

    _default_handlers = PostingPipeline._default_handlers + (
        'my-custom-handler',
    )


def initialize():
    """ Initialize our custom objects.

    This should be called as the `config.mailman.post_hook` callable
    during the third phase of initialization, *after* the other default
    pipelines have been set up.

    """

    # Initialize our handler and make it available
    handler = CustomHandler()
    verifyObject(IHandler, handler)
    assert handler.name not in config.handlers, (
        'Duplicate handler "{0}" found in {1}'.format(
            handler.name, CustomHandler))
    config.handlers[handler.name] = handler

    # Initialize our pipeline and make it available
    pipeline = CustomPostingPipeline()
    config.pipelines[pipeline.name] = pipeline

The above approach works, but it involves a lot of hacking to get mailman to load our code into the pipeline. We have to occupy the mailman post_hook and then kind-of hot-patch our pipeline into the list of existing pipelines.

A benefit of this approach is that we could use Postorius (the DB) to control which mailing lists include our plugin and which don't. The site administrator can leave some decisions up to the list administrators.

I ended up abandoning the above approach and instead landed on...

Adding a second Archiver

One of the Handlers in the default pipeline is the to-archive Handler. It has a somewhat nicer API for defining multiple destinations for archival. One of those is typically HyperKitty (or... kittystore)... but you can add as many as you like.

I wrote this "archiver" (and threw it up on github, pypi, and fedora). Barring tweaks and modifications, I think it's the approach we'll end up using down the road:

""" Publish notifications about mails to the fedmsg bus.

Enable this by adding the following to your mailman.cfg file::

    [archiver.fedmsg]
    # The class implementing the IArchiver interface.
    class: mailman3_fedmsg_plugin.Archiver
    enable: yes

You can exclude certain lists from fedmsg publication by
adding them to a 'mailman.excluded_lists' list in /etc/fedmsg.d/::

    config = {
        'mailman.excluded_lists': ['bugzilla', 'commits'],
    }

"""

from zope.interface import implements
from mailman.interfaces.archiver import IArchiver

import socket
import fedmsg
import fedmsg.config


class Archiver(object):
    """ A mailman 3 archiver that forwards messages to the fedmsg bus. """

    implements(IArchiver)
    name = "fedmsg"

    # This is a list of the headers we're interested in publishing.
    keys = [
        "archived-at",
        "delivered-to",
        "from",
        "cc",
        "to",
        "in-reply-to",
        "message-id",
        "subject",
        "x-message-id-hash",
        "references",
        "x-mailman-rule-hits",
        "x-mailman-rule-misses",
    ]

    def __init__(self):
        """ Just initialize fedmsg. """
        hostname = socket.gethostname()
        if not getattr(getattr(fedmsg, '__local', None), '__context', None):
            fedmsg.init(name="mailman.%s" % hostname)
        self.config = fedmsg.config.load_config()


    def archive_message(self, mlist, msg):
        """Send the message to the "archiver".

        In our case, we just publish it to the fedmsg bus.

        :param mlist: The IMailingList object.
        :param msg: The message object.
        """

        if mlist.list_name in self.config.get('mailman.excluded_lists', []):
            return

        format = lambda value: value and unicode(value)
        msg_metadata = dict([(k, format(msg.get(k))) for k in self.keys])
        lst_metadata = dict(list_name=mlist.list_name)

        fedmsg.publish(topic='receive', modname='mailman',
                       msg=dict(msg=msg_metadata, mlist=lst_metadata))

    def list_url(self, mlist):
        """ This doesn't make sense for fedmsg.
        But we must implement for IArchiver.
        """
        return None

    def permalink(self, mlist, msg):
        """ This doesn't make sense for fedmsg.
        But we must implement for IArchiver.
        """
        return None
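
To see what the header-extraction step in archive_message produces, here is a small stand-alone sketch that needs neither mailman nor fedmsg. It assumes Python 3 (so str stands in for unicode), uses only the standard library's email module, and the sample message, the shortened KEYS list, and the extract_metadata name are all invented for illustration.

```python
import email

# A shortened, illustrative subset of the headers the archiver publishes.
KEYS = ['from', 'to', 'cc', 'subject', 'message-id']

# A made-up message; the addresses are placeholders.
RAW = """\
From: alice@example.com
To: devel@lists.example.com
Subject: Hello
Message-ID: <1234@example.com>

Body text.
"""


def extract_metadata(msg, keys=KEYS):
    """Map each header name to its text value, or None when absent."""
    # Header lookup with msg.get() is case-insensitive.
    return dict((k, msg.get(k) and str(msg.get(k))) for k in keys)


if __name__ == '__main__':
    msg = email.message_from_string(RAW)
    print(extract_metadata(msg))
```

Headers missing from the mail come through as None rather than being dropped, so every published message carries the same set of keys.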