Can maintaining package dependencies in RPM be magic?

One of the biggest complaints I hear about packaging software is packaging up all the dependencies, and then being responsible for keeping them current. Obviously, this is not the only complaint.. but i’m not talking about the others at the moment.

A long time ago i had the idea for a website where you could aggregate status updates for software projects you care about, thus allowing you to go get all your updates without having to subscribe to billions of lists. I had this idea back when still installing things manually, because the package ecosystem was nowhere near as complete as it is today (in debian or fedora). Sometime after that i registered (my software update feed) with the intention of one day writing such a tool.

That was a long time ago, and obviously this kind of thing is an issue. So some really clever people (okay, at the very least more motivated) in Fedora came up with Anitya, a tool for monitoring releases. They even provide a public and freely consumable installation.

Anitya publishes events to fedmsg (fedora infratructure message bus), which can be listened to as an unauthenticated feed.

Stanislav Ochotnický created a small proof of concept project called fedwatch which listens to fedmsg and will pass data from it to scripts in its run directory. My idea was to take release information and pass it to jobs in jenkins that are setup to bump the relevant information in a spec file with rpmdev-bumpspec (>=1.8.5) and build the new version. In theory this would cover lots of releases with minimal overhead, allowing the outlying changes to require hands on. I’ve put some completely unfinished code here.

Some RPM packaging thoughts and resources

Hopefully in the next two days my topic of ‘Using system level packaging’ will get picked up at our internal unconference. In preparation for that I figured I would compile a little bit of my data into a post so that I point people to it as a reference.

First off, try reading this old blog post by myself.

Now that that diatribe is out of the way, on to some advice for developers.

Intended Audience

Did you start your project by defining the problem? As part of that definition did you consider who your target audience is?  In my opinion this is a very important part of the process related to system packaging.

Let’s say that I’m working on a project that will be useful to run on OpenStack servers.  If you utilize the latest python libraries installed via pip and virtualenv, many systems that OpenStack runs on will not support those versions.  However, if you follow the global requirements that OpenStack projects follow, it is easier for your project to fit into that environment.

What if have brilliant idea to create a next generation orchestration tool? Knowing that this would be an awesome tool for enterprises, who typically are willing to buy support even for your free open-source project, it makes sense to consider that from day one.  As of 2013, many enterprise shops are running a mixture of RHEL5 and RHEL6.  That means that python 2.3 and 2.4, respectively, are the default python version available to you. In some cases there are additional version available, but we’ll get to that next.

3rd Party Repositories for EL distributions

Now maybe you have decided that Red Hat Enterprise Linux and it’s derivatives are your target and have started gnashing your teeth.  “I don’t want to have to package all these dependencies!”  Luckily, some wonderful communities have popped up to help.  The tried and true repository, Extra Packages for Enterprise Linux (EPEL) has been around for a while, and provides a nice stable extension to the base EL distribution.  Both fortunately and unfortunately, it has similar stability and version restrictions to base EL.

If you are using CentOS there are several additional repositories available for you.

Going back to the OpenStack example above there is a fairly recent repository set that can be extremely help as well.  Red Hat Deployed OpenStack provides the OpenStack global requirements, all nice and packaged up for public consumption. Here is some information about these repositories.

Software Collections

Software Collections is an extremely interesting new concept with lots of potential stemming from the Fedora community and embraced by Red Hat. Generically speaking the goal is to provide a consistent framework for parallel installation of different versions of software.  It supports anything from newer versions of Python and Ruby to new versions of databases like PostgreSQL and MySQL. By providing a framework it becomes very possible for you to readily package your entire dependency stack in a way that can be installed in a standard RPMish way, but still isolated from the rest of the system!

That’s it for now.

puppet tricks: staging puppet

As I have been learning puppet @dayjob one of the things I have been striving to deal with is order of operations.  Puppet supports a few resource references, such as before, after, notify, and subscribe. But my classes were quickly becoming slightly painful to define all these in, when the reality was there was not always hard dependencies so much as a preferred order.  After having issues with this for a while and researching other parts of puppet I stumbled across some mention of run stages, which were added in the 2.6.0 release of puppet.  If you read through the language guide they are mentioned.  There has always been a single default stage, main.  But now you add as many as you want.  To define a stage you have to go into a manifest such as your site.pp and define the stages, like so:

That defines the existence of two stages, a pre stage for before main and a post for after main.  But I have not defined any ordering.  To do that we can do the following, still in site.pp:

Thus telling puppet how to order these stages.  An alternate way would be:

It all depends on your style. So now that we have created the alternate stages, and told puppet what the ordering of these stages is, how do we associate our classes inside them?  It is fairly simple, when you are including a class or module you pass it in as a class parameter.  To do this they introduced an alternate method of “including” a class.  Before you would use one of these two methods:

In this the base class requires that the users class is done before it, and then includes the packages class. Its fairly basic. Transitioning this to stages comes out like this:

It is very similar to calling a define.  In production I ended up where adding my base class in the pre stage of a lot of classes, and which became kinda burdensome. I knew that there were universal bits that belonged in the pre stage, and universal bits that did not. To simplify I settled on the following:

With this setup I do not have to worry about defining the stages multiple times. I even took it further by doing the same concept for the different groups that are also applied to systems, so the universal base and the group base are both configured as in the last example. I have not tried it with the post stage, as I do not use one yet, but I would imagine it would work just as above. Here is an untested example:

Maybe this seems fairly obvious to people already using stages, but it took me a bit to arrive here, so hopefully it helps you out.


UPDATE: PuppetLabs’ stdlib module provides a ‘deeper’ staging setup.  Here is the manifest in github.

git reference guide – part one

I’m still getting used to utilizing git for my version control. The part I like most is the merge handling. So here is another reference post for me, hopefully it will help me remember bits of my git work flow. Mostly basics, and some I do not need to remind myself, but it does not hurt to document.

  • Checkout repository
  • Add a file to the index
  • See current status
  • See differences between current changes and committed changes
  • Stash changes without committing them
  • Update local repository from remote
  • Commit changes in index
  • Generate a patch from local commit

    Some useful options

    • –find-renames, -M n%
    • –output-directory, -o dir
    • –numbered, -n
    • –unnumbered, -N
    • –signoff, -s
  • Directly send locally committed patch via e-mail (see man page for Gmail config)
  • Apply a patch set
  • Push changes to remote

That is it for now, but I know there will be at least one more of this because I have not touched on branching and switching around between repositories.

Renaming images utilizing time taken metadata in linux

This is more of a reminder for myself, but i figured I’d put it here.  The wifey wanted me to fix the naming on some pictures so that they were named based on their date.  Instead of manually doing so a quick search on Google showed me that the identify program that comes with the ImageMagick package in Linux will give me access to the data on the command line.  Taking the data and some awk-fu I threw together this quick one liner:

Since I am actually pretty prone to trial and error as I make my one-liners, I prefer for the command to print my commands out instead of executing them.  Makes it easier to spot errors before execution, and is just a simple copy-paste away from execution.

I’d break this out into a moreable block, but the awk section kinda goes on and on,  but here goes a breakdown

So given a set of files named IMG_2204.JPG, IMG_2840.JPG, IMG_3204.JPG as a source into this we pull the following date:modify results (in order):

And the final output from the script is:

All the users gather round

Linux has two classification of accounts.  System accounts and User accounts.  System accounts are delineated as any account with a UID lower than the defined UID_MIN value in the /etc/login.defs file, with the UID of 0 being reserved for the root account.  Red Hat based distributions systems set UID_MIN to 500, which is a deviation from the upstream project, shadow-utils, which uses of 1000.  Some of these UIDs are considered to be statically allocated and others for dynamically allocated.  In the upstream distribution Fedora there are currently static UIDs up to 173.  There is no clear definition of where the dynamically allocated UIDs start, but within Fedora as of version 16 and higher there is currently a plan to help define this more clearly.  One part of that plan is that Fedora upping their definition of UID_MIN to the upstream 1000.  If the feature makes it in this will still not effect RHEL until version 7 at the earliest.  I’m honestly not sure if any other distribution has a clearer definition of the usage of these, but if not maybe that will change.

The primary use for system accounts is for any application that needs a dedicated user.  Some good examples of this are tomcat, mysql, and httpd. One of the biggest benefits of having a designated space for system accounts is that you can define a specific UID, and have that application user have that same UID on every system.  Take for example a case where a user, such as myapp, owns millions of files on a system.  If the myapp user was created without defining that it is a system account, then myapp would get a UID in the 500+ range, we will use 502 for the sake of this example.  Now say I need to keep these files synchronized with a backup system.  However on the backup system there were already several more users than on my production system and so myapp was assigned the UID of 509.  What about 502? That is assigned to gswift. Well now if my sync of the files preserves the file ownership, the user swiftg now has ownership of all of those files, because sync is a based on the UID, not the readable mapping.  The same thing could occur if you were migrating from one server to a new one.

So, where am I going with this?  I think it is important for developers to remember that any time you are creating a user on the system for your application, it should be in the system account area.  Luckily most do, especially when they include their software in a public distribution.

Introducing ‘rhcpdl’ project

I’ve been a Red Hat customer for over a decade now.  One of the things that has been a common work flow for me is the download of the ISOs from Red Hat’s web site (Previously through RHN, now Customer Portal). Because I usually store these on a central machine that is not my desktop, I often just copy the download URL, and wget it from my storage server’s cli. With the changes introduced by the new Customer Portal the URLs have changed in such a way that this process is much more difficult, although still do able. I complained through the support channel, and after >6m of waiting finally got back a response stating that this is not something they are interested in fixing. I have a hard time believing myself and a few others I know are the only ones affected by this so I have begun a protest of the process.  I’m also pretty sure that the only people this affects are the paying customers.

But in the nature of our community and open source my protest is not just a bunch of whining (although one could consider the explanation of the background for my protest whining, but take it as you will), but an actually attempt to “fix” the issue.

Step 1: I wrote and published a utility (rhcpdl) that effectively restores this functionality
Step 2: Attempt to get people to use/back that project so that maybe RH will realize they need to fix the issue

You can get more information from the project page at There is also RPMs for RHEL5 and 6, and a SRPM available for download.

Remember, the goal of this project is obsolescence :)

Global Variables and Namespaces in python

Recently I had someone come ask me for a bit more information about working with global variables. For those new to python, this might be something helpful, so I figured I’d share.   Personally, for ease of reference, I specify my global variables names in ALLUPPERCASE. This helps distinguish them since I use that naming standard nowhere else in my code.

In a python application you have multiple namespaces. Each namespace is intended to be completely isolated, so you can use the same name in multiple namespaces without conflict. The global namespace is the only one where this does not hold strictly true.  If the below is not enough, a good and more in depth explanation is available here: A guide to Python Namespaces.

A global variable can be called from inside any namespace, but without a special call any changes stay inside that local namespace. If you state inside your function/class/whatever that you are using the global variable as a global, then you changes take place in the global name space.

Here is some sample code that show this in action: