Work continues on Factory 2.0...
Recall that we have 1000 different problems we're trying to solve, but we're attempting to focus on an isolated subset for now: problems we've picked so that their solutions can enable higher-level problem solving in the coming months. The work currently includes a focus on:
- Preparing elementary build infrastructure for the Fedora 26 Alpha release.
- Deserializing pipeline processes that could be done more quickly in parallel.
- Building a dependency chain database, so that we can build smarter rebuild automation and pipeline analytics.
- Monitoring pipeline performance metrics, so that as we later improve things we can be sure we had an effect.
Overall, I'm happy with the work output of this sprint, but there are complications.
- We made significant progress on the module build infrastructure. The stack is maturing - the number of new features will be slowing as we start to prepare it for production for F26. Our proposals were approved by FESCo last week. We still have a long way to go, but we're currently on target to build an F26 Edition out of the base modules. Status: Green.
- Our deserialization effort hit some road bumps as we negotiate how to interface with existing test-execution environments. We're not blocked at the moment and are in dialogue with our partners, but we will slip a few weeks on our MVP deliverable here. Status: Yellow.
- The dependency chain MVP is ready for deployment efforts in Sprint 5. We underestimated how much this work would depend on message bus enablement pre-requisites, which has pushed our delivery date back a few weeks. Status: Yellow.
- Our performance metrics work is progressing, but much more slowly than we had initially expected. We promised a delivery date of Dec. 1st for this, but we will miss it by a wide margin and need to adjust expectations. The primary factor is that the metrics are simply more complex than we anticipated: the pipeline is complex, so extracting meaningful measurements from it is complex. A separate factor is some unpredictability with write access to the integration lab's ELK instance we depend on. The delay here does not block our other efforts. Status: Red.
mbs-build-profiles, by threebean
Here we show how we're re-using the "installation profiles" feature of modulemd to define the buildsystem "build groups" for a module. It's a natural extension of the install profiles metadata which furthermore helps unblock the base-runtime team in their quest to produce the first generational core!
mbs-config, by fivaldi
In this video I explain changes to the configuration internals of the Module Build Service. The aim is to split configuration data from logic so that it is simpler for contributors and users to understand and use.
modulemd1-mock, by jkaluza
In this demo I cover the support for the modulemd-1.0 format in the Module Build Service, describe the build-order feature, and show how a module using this feature builds with the mock builder.
resultsdb-updater, by mprahl
This video shows the new microservice ResultsDB-Updater which listens on the CI message bus for test results and adds them to ResultsDB.
umb-brew, by mikeb
A demo of message publication from Brew to the Unified Message Bus. Explains the topic hierarchy and the message format.
umb-dist-git-demo, by mikeb
A demo showing the publication of messages from a dist-git repo to the Unified Message Bus. Shows how the repos are configured for publication on the server side, and explains a little bit about message format.
This was our first full sprint with the new team! Welcome, Jan Kaluza, Courtney Pacheco, Vera Karas, and Stanislav Ochotnicky. We're glad to have Filip Valder join us in sprint 4 starting today.
Our top priority in sprint 3 was making sure that the base runtime team isn't blocked. They have a big job ahead of them to curate and build a collection of base modules at the core of the distro, and they need to use our prototype build tooling to do it. Anytime they're blocked, the Factory 2.0 team chases down the solution -- fixing tracebacks and developing new features. Cheers to Matt Prahl and Jan Kaluza for staying on top of this.
Meanwhile, we're continuing apace with the Dependency Chain and Deserialization epics that we originally scheduled for work this quarter. Mike Bonnet has been chasing down difficult technical pre-requisites for the latter (message bus enablement), Matt Prahl demoed his dependency chain web UI, and Courtney Pacheco is giving shape to our metrics project (so we can have some confidence that future pipeline changes we make actually improve the state of affairs).
As always, we're double-tasked with laying the groundwork for work in future quarters. Thanks to Stanislav Ochotnicky for starting the conversation with Platform representatives about workflow changes and continuous integration, and thanks to all those who participated in this round of resultsdb/CI discussions.
f26-changes, by threebean
I talk about this in the video, but the changes are designed specifically to limit the amount of risk we impose on the rest of the release process and the Modularity initiative. We can use the F26 release to assess the viability of Modularity and decide then how far we want to go in the Fedora 27 timeframe.
mbs-mock-backend, by jkaluza
In this demo I describe what is Mock Builder and why it is useful to Module Build Service. I also show it in action briefly.
pdc-tangle-web, by mprahl
In this demo I talk about an Angular2 web app I wrote called PDC Tangle Web. The app queries PDC based on the user's search criteria to show an artifact's dependencies. This is a beta version, so please keep in mind more features will come, but feel free to make suggestions on GitHub for what you'd like to see added.
resultsdb, by threebean
This one's just pointing you to a blog post that is a transcription of an internal document on resultsdb and our plans for it.
I know that showing off a blog post isn't all that exciting, but I'm glad we've gotten to this point. It is the product of a long process of discussion and eventual agreement.
This post is primarily about taking some of the lessons we learned in Fedora QA infrastructure and applying them to some internal Red Hat pipelines as well as generalizing the pattern to other parts of Fedora's infrastructure.
- ResultsDB is a database for storing results. Unsurprising!
- It is a passive system; it doesn't actively do anything.
- It has an HTTP REST interface. You POST new results to it and GET them out.
- It was written by Josef Skladanka of Tim Flink's Fedora QA team.
It was originally written as a part of the larger Taskotron system, but we're talking about it independently here because it's such a gem!
What problems can we solve with it?
In formal Factory 2.0 problem statement terms, this helps us solve the Serialization and Automation problems directly, and indirectly all of the problems that depend on those two.
Beyond that, let's look at fragmentation. The goal of the "Central CI" project in Red Hat was to consolidate all of the fragmentation around various CI solutions. This was a success in terms of operational and capex costs -- instead of 100 different CI systems running on 100 different servers sitting under 100 different desks, we have one Central CI infrastructure backed by OpenStack serving on-demand Jenkins masters. Win. A side-effect of this has been that teams can request and configure their own Jenkins masters, without commonality in their configuration. While teams are incentivized to move to a common test execution tool (Jenkins), there's no common way to organize jobs and results. While we reduced fragmentation at one level, it remains untouched at another. People informally speak of this as the problem of "the fourteen Jenkins masters" of Platform QE.
Beyond Jenkins, some Red Hat PnT DevOps tools perform tasks that are QE-esque yet are not part of the Central CI infrastructure. Notably, the Errata Tool (which plays a very similar role to Fedora's Bodhi system) directly runs jobs like covscan, rpmgrill, rpmdiff, and TPS/insanity that are unnecessarily tied to the "release checklist" phase of the workflow. They could benefit from the common infrastructure of Central CI. (The Errata Tool developers are burdened by having to think about scheduling and storing test results while developing the release checklist application. This thing is too big.)
One option could be to attempt to corral all of the various dev and QE groups into getting onto the same platform and configuring their jobs the same way. That's a possibility, but there is a high cost to achieving that level of social coordination.
Instead, we intend to use resultsdb and a small number of messagebus hooks to insulate consuming services from the details of job execution.
Getting data out of resultsdb
Resultsdb, unsurprisingly, stores results. A result must be associated with a testcase, which is just a namespaced name (for example, general.rpmlint). It must also be associated with an item, which you can think about as the unique name of a build artifact produced by some RCM tool: the nevra of an rpm is a typical value for the item field indicating that a particular result is associated with a particular rpm.
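To make the testcase/item pairing concrete, here is a sketch of the kind of record resultsdb stores. The exact schema is resultsdb's own; the dict below is illustrative, not authoritative, and the outcome value is an assumption.

```python
# An illustrative resultsdb-style record, based on the fields described
# above.  The "item" ties the result to a build artifact (here, an rpm
# nevra); the "testcase" namespaces what was tested.
result = {
    "testcase": "dist.rpmlint",                 # namespaced testcase name
    "item": "python-gradunwarp-1.0.3-1.fc24",   # unique artifact name (rpm nevra)
    "type": "koji_build",                       # what kind of artifact "item" names
    "outcome": "PASSED",                        # e.g. PASSED or FAILED (assumed values)
}
```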
Take a look at some examples of queries to the Fedora QA production instance of taskotron, to get an idea for what this thing can store:
- A list of known testcases https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0/testcases
- Information on the depcheck testcase https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0/testcases/dist.depcheck
- All known results for the depcheck testcase https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0/testcases/dist.depcheck/results
- Only depcheck results associated with builds https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0/testcases/dist.depcheck/results?type=koji_build
- All rpmlint results associated with the python-gradunwarp-1.0.3-1.fc24 build https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0/testcases/dist.rpmlint/results?item=python-gradunwarp-1.0.3-1.fc24
- All results of any testcase associated with that same build https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0/results?item=python-gradunwarp-1.0.3-1.fc24
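The queries above all follow the same pattern, so a small helper can build them. This is a sketch against the endpoint layout shown in the URLs listed in this post; the `results_url` function is our own convenience, not part of any resultsdb client library.

```python
from urllib.parse import urlencode

# Base URL of the Fedora QA production instance, as used in the
# example links above.
BASE = "https://taskotron.fedoraproject.org/resultsdb_api//api/v1.0"

def results_url(testcase=None, **params):
    """Build a resultsdb query URL, optionally scoped to one testcase."""
    if testcase:
        url = "%s/testcases/%s/results" % (BASE, testcase)
    else:
        url = "%s/results" % BASE
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url

# Only depcheck results associated with koji builds:
print(results_url("dist.depcheck", type="koji_build"))
# All results of any testcase for a particular build:
print(results_url(item="python-gradunwarp-1.0.3-1.fc24"))
```

Fetching a URL like that with any HTTP client returns the JSON result records described earlier.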
For the Release Checklist
For the Errata Tool problems described in the introduction, we need to:
- Set up Jenkins jobs that do exactly what the Errata Tool processes do today: rpmgrill, covscan, rpmdiff, TPS/Insanity. Ondrej Hudlicky's group is working on this.
- Ingest data from the bus about those jobs, and store it in resultsdb. The Factory 2.0 team will be working on that.
- Write and stand up an accompanying waiverdb service that allows overriding an immutable result in resultsdb. We can re-use this in Fedora to level up the Bodhi/taskotron interaction.
- The Errata Tool needs to be modified to refer to resultsdb's stored results instead of its own.
- We can decommission Errata Tool's scheduling and storage of QE-esque activities. Hooray!
Note that, in Fedora, the Bodhi Updates System already works along these lines to gate updates on their resultsdb status: a subset of testcases are declared as required. However, if a testcase fails erroneously, a developer must change the requirements associated with the update to get it out the door. This is silly. Writing and deploying something like waiverdb will make that much more straightforward.
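Since waiverdb doesn't exist yet, here is a minimal sketch of the gating logic we have in mind: a failed result can be overridden by a matching waiver without mutating the (immutable) result itself. All names and fields here are hypothetical.

```python
# Hypothetical sketch: combine immutable resultsdb results with
# waiverdb-style waivers.  A waiver matches on (testcase, item) and
# overrides a failure.
def passed_or_waived(result, waivers):
    """True if the result passed, or if a matching waiver overrides it."""
    if result["outcome"] == "PASSED":
        return True
    return any(
        w["testcase"] == result["testcase"] and w["item"] == result["item"]
        for w in waivers
    )

result = {"testcase": "dist.depcheck", "item": "foo-1.0-1.fc26", "outcome": "FAILED"}
waivers = [{"testcase": "dist.depcheck", "item": "foo-1.0-1.fc26"}]
print(passed_or_waived(result, waivers))   # the waiver overrides the failure
```

This keeps the result itself untouched; the override lives in a separate service, which is the whole point.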
On expanding this pattern in Fedora
Note also that the fedimg tool, used to upload newly composed images to AWS, currently has no gating in place at all. It uploads everything. While talking about how we actually want to introduce gating into its workflow, it was proposed that it should query the cloud-specific test executor called autocloud. Our answer here should be no. Autocloud should store its results in resultsdb, and fedimg should consult resultsdb to know if an image is good or not. This insulates fedimg's code from the details of autocloud and enables us to more flexibly change out QE methods and tools in the future.
For Rebuild Automation
For Fedora Modularity, we know we need to build and deploy tools to automate rebuilds. In order to avoid unnecessary rebuilds of Tier 2 and Tier 3 artifacts, we'll want to first ensure that Tier 1 artifacts are "good". The rebuild tooling we design will need to:
- Refer to resultsdb to gather testcase results. It should not query test-execution systems directly for the reasons mentioned above.
- Have configurable policy. Resultsdb gives us access to all test results. Do we block rebuilds if one test fails? How do we introduce new experimental tests while not blocking the rebuild process? A constrained subset of the total set of testcases should be used on a per-product/per-component basis to define the rebuild criteria: a policy.
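The "constrained subset" idea above can be sketched as a per-component policy: only testcases named in the policy gate the rebuild, so experimental tests can report results without blocking anything. The policy contents below are invented for illustration.

```python
# Hypothetical per-component rebuild policy.  A component with no
# entry is gated on nothing; testcases absent from the policy
# (e.g. experimental ones) never block a rebuild.
POLICY = {
    "glibc": ["dist.depcheck", "dist.rpmlint"],
}

def ok_to_rebuild(component, results, policy=POLICY):
    """True if every required testcase for this component has PASSED."""
    required = policy.get(component, [])
    outcomes = {r["testcase"]: r["outcome"] for r in results}
    return all(outcomes.get(tc) == "PASSED" for tc in required)

results = [
    {"testcase": "dist.depcheck", "outcome": "PASSED"},
    {"testcase": "dist.rpmlint", "outcome": "PASSED"},
    {"testcase": "dist.experimental-check", "outcome": "FAILED"},  # ignored
]
print(ok_to_rebuild("glibc", results))
```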
Putting data in resultsdb
- Resultsdb receives new results by way of an HTTP POST.
- In Fedora, the Taskotron system puts results directly into resultsdb.
- Internally, we'll need a level of indirection due to the social coordination issue described above. Any QE process that wants to have its results stored in resultsdb (and therefore be considered in PnT DevOps rebuild and release processes) will need to publish to the unified message bus or the CI-bus using the “CI-Metrics” format driven by Jiri Canderle.
- The Factory 2.0 team will write, deploy and maintain a service that listens for those messages, formats them appropriately, and stores them in resultsdb.
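The translation service in that last bullet might look roughly like this: consume a message from the bus and reshape it into a payload for an HTTP POST to resultsdb. The incoming field names below are placeholders; the real "CI-Metrics" format is defined elsewhere.

```python
# Hypothetical sketch of the bus-to-resultsdb translation.  The message
# fields (test_name, artifact, status) are placeholders standing in for
# the actual CI-Metrics format.
def message_to_result(msg):
    """Map a bus message onto a resultsdb-style result payload."""
    return {
        "testcase": msg["test_name"],
        "item": msg["artifact"],
        "outcome": "PASSED" if msg["status"] == "pass" else "FAILED",
    }

msg = {"test_name": "platform.rpmdiff", "artifact": "bash-4.4-1.el7", "status": "pass"}
payload = message_to_result(msg)
# A real service would now POST `payload` to resultsdb's HTTP API.
print(payload)
```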
Today marks the end of Sprint #2 for the so-called Factory 2.0 project. So far, it's been a small-ish team with Matthew Prahl and myself (and part-time support from the awesome Michael Bonnet). We know which way we want to go now and we're going to pick up a few more people in the coming weeks.
We write status reports internally, but we're doing work in the open, in Fedora. We should probably bridge that gap and I figured the easiest way to share would be to post our demo videos here as well:
Running fedmsg services on other message bus backends
Storing rpm dependencies in PDC
Storing container dependencies in PDC
Querying dependencies from PDC
Standing up resultsdb with ansible
I'll try my best to post all of our future sprint report videos here. No secrets! The best place to find us is in #fedora-admin. Happy Hacking!
Recently, over in a pull request review, we found the need to list all of the existing fedmsg topics (to see if some code we were writing would or wouldn't stumble on any of them).
Way back when, we added a feature to datagrepper that would let you list all of the unique topics, but it never worked and was eventually removed.
Here's the next best thing:
```python
#!/usr/bin/env python
import pprint

from fedmsg_meta_fedora_infrastructure import (
    tests, doc_utilities,
)
from fedmsg.tests.test_meta import Unspecified

# Gather all the message-metadata test classes that carry a sample message.
classes = doc_utilities.load_classes(tests)
classes = [cls for cls in classes if hasattr(cls.context, 'msg')]

topics = []
for cls in classes:
    if cls.context.msg is not Unspecified:
        topics.append(cls.context.msg['topic']
                      .replace('.stg.', '.prod.')
                      .replace('.dev.', '.prod.'))

# Unique and sort
topics = sorted(set(topics))

pprint.pprint(topics)
```