Tag Archives: open source

All posts relating to Open Source software, mostly but not exclusively UNIX focused.

Delicious Entropy

I run a large GNU/Linux server with KVM for running numerous virtual machine guests, including build hosts used to package and compile software for different GNU/Linux distributions and other operating systems.

I recently ran into an issue where a kernel compile hung indefinitely whilst GPG tried to sign kernel modules as part of the build process, due to the virtual machine guest running out of available entropy and being unable to proceed until more random data was available.

Bro, I’m stalled as bro!

On Linux there are two sources of random data – /dev/random, which provides high quality random data, and /dev/urandom, which provides an unlimited amount of pseudo-random data based on a seed value initially taken from the random pool.

Linux generates this random data by collecting entropy from somewhat-random events, such as disk activity, network activity, keyboard and mouse input and other sources. When the pool of entropy is exhausted, /dev/random will block (ie force reading processes to wait) until more is available, whereas /dev/urandom will continue to serve pseudo-random data, although its output is not considered as secure as that of /dev/random.

On a workstation or single server this tends to be enough to generate sufficient random data for most applications (although if you’re doing certain tasks you may still have an issue). Virtual machines, on the other hand, lack hardware sources of entropy such as disks or keyboards, so it’s very easy to quickly exhaust the available entropy pool and have some applications block until more is available.
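As an aside (not part of my original testing), you can check how much entropy the kernel currently has available at any point by reading a couple of values from /proc – handy for watching the pool drain and refill whilst running the dd tests below:

# cat /proc/sys/kernel/random/poolsize
# cat /proc/sys/kernel/random/entropy_avail

The first value is the total size of the entropy pool in bits and the second is how much is currently available – wrap the second command in watch and you can see the pool collapse the moment anything starts reading from /dev/random.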

Applications like Apache (with mod_ssl) and OpenSSL use /dev/urandom so aren’t impacted by shortages of entropy, but some signing processes, such as GPG, require /dev/random and can be impacted if the source of entropy is exhausted – which is exactly what happened to my kernel signing process.


It’s pretty easy to test how quickly a Linux system re-fills the entropy pool by reading data from /dev/random, forcing the pool to empty and be repopulated.

# dd if=/dev/random of=/dev/null count=1000
0+1000 records in
16+1 records out
8496 bytes (8.5 kB) copied, 149.849 s, 0.1 kB/s

The host doing this test has around 12 physical hard disks, 10 active KVM virtual machines spewing out packets and an unfiltered WAN link feeding in random junk – all of which is good for generating a decent amount of entropy. The numbers may look pretty bad, but compare them with the amount of entropy generated by my laptop…

# dd if=/dev/random of=/dev/null count=1000
0+1000 records in
16+1 records out
8409 bytes (8.4 kB) copied, 1389.95 s, 0.0 kB/s

The rate of entropy generation on my laptop is quite depressing – but at least my laptop has a keyboard, mouse and hardware environmental values to help add something to the entropy sources.

When I run the same test on a virtual machine guest, which lacks all these physical sources, it comes to a grinding halt:

# dd if=/dev/random of=/dev/null count=10000
0+24 records in
0+0 records out
0 bytes (0 B) copied, 1865.68 s, 0.0 kB/s

I was forced to kill the above test after it stalled indefinitely, thanks to the guest running out of available entropy and being unable to generate enough to complete the test. :-(

Even when performing an intensive activity such as compiling a large software library, it still takes considerable time to complete this test on a VM:

# dd if=/dev/random of=/dev/null count=1000
0+1000 records in
15+1 records out
8018 bytes (8.0 kB) copied, 2560.36 s, 0.0 kB/s

It seems that without the random data generated by active physical hardware, the VM guest simply can’t complete the test in any reasonable time. And whilst some applications like an HTTPS website would continue to operate fine, others like a build host GPG-signing packages may fail and hang indefinitely, unable to obtain the required volume of random data to complete its key generation process.


For times when this lack of entropy becomes an issue for your applications, it is possible to obtain additional entropy from a hardware source – this can be as simple as feeding in analog noise from a sound card, or as sophisticated as a dedicated hardware random number generator or functionality built into certain CPUs, designed to be extremely random and unpredictable.
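As a quick check (an aside, not something from my original investigation), you can see whether your CPU offers that built-in random number functionality – Intel’s RDRAND instruction on recent chips – by looking at the CPU flags:

# grep -o -m1 rdrand /proc/cpuinfo

If that prints “rdrand”, tools like rngd from rng-tools can feed the CPU’s generator into the kernel entropy pool; no output means the CPU lacks the feature.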

A while ago I picked up a pair of Simtec Electronics’ Entropy Keys, small USB devices which generate truly random data by a clever method of abusing semiconductors, and connected one to my primary KVM server.

The device ships with an open source daemon that takes random data from the key and injects it into the Linux entropy pool for use by all applications reading from /dev/random. It instantly makes a huge difference to the available volume, generating almost 3.9KB/s of random data.

Gain entropy with just 1 easy repayment! Call now!

After starting the daemon and re-running the test, the performance looks much better:

# dd if=/dev/random of=/dev/null count=1000
0+1000 records in
145+1 records out
74504 bytes (75 kB) copied, 21.8926 s, 3.4 kB/s

The numbers are still low, but the reality is you generally only need a few bytes at a time, rather than the massive volumes this test demands – for general signing usage, 3.4kB/s is a huge volume to have.

So whilst this test doesn’t reflect the real way /dev/random is used, it does illustrate the difference in data volume a proper random number generator can make. And whilst this might not be a common problem thanks to the low volume of random data required for most applications to function, the increasing use of virtualisation makes it an issue that people may bump into more often in future.

Now that I have my host server getting a reliable and steady flow of random data, my next step is to share that data with the virtual machines running on the host – as I’m doing all my signing in guests, it’s vital that I get that random data through to them.

I’m in the process of investigating a few different options and will cover these in a follow up blog post, as it’s a somewhat sizeable topic in its own right.

WordPress & SSL Fixes

I’ve been using WordPress for this blog for a number of years now – at some point I realised that whilst writing my own code is fun, there’s no need to reinvent yet-another-fucking-blog-platform and ended up selecting WordPress for my content, on the basis of its strong and active development and community.

Generally it’s pretty good, but there are times it disappoints, such as WordPress expecting servers to have FTP for unpacking updates and plugins (it’s 2013 guys, SFTP at least!), setting cookies excessively which makes caching layers more complex, and doing stupid stuff like storing full URLs inside the database for page links and image resources.

The latter has been impacting me in particular. Visitors to my site have had the option of using HTTP or HTTPS (SSL secured) access methods for some time, but annoyingly whenever I posted an article with images, WordPress included all the images using http://. This mixed content prevents browsers from showing the lock icon (best case) or throws up a nasty error (worst case), depending on the browser and its level of concern for user safety around mismatched content.

Dubious Firefox is dubious about this site, no lock icon of security here!

Despite having accessed the site on https://, WordPress still uses http:// for my images.

I could work around this by setting the WordPress base URL for my site to be https://www.jethrocarr.com, but then images served at the unsecured http:// site would also be served via SSL, which is just adding pointless load to the server (not that SSL termination really adds much load these days, but damnit, I’m being a purist here!).

I was hoping that it was a misconfiguration of my WordPress setup, but reading online it seems that this is a known issue with WordPress and a whole bunch of modules, hacks and themes have sprung up to fix/workaround the issue…

Of course there’s an easier way – fix it at the webserver layer! Both Nginx and Apache have modules to do substitutions in page content on load: for Nginx there’s HttpSubModule and for Apache there’s mod_substitute. In my case with stock Apache 2.2 on CentOS 5, I was able to fix the whole issue by adding the following to my SSL vhost configuration:

# Fix SSL URLs thanks to WordPress hardcoding http:// links to images :'(
<Location />
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|http://www.example.com|https://www.example.com|"
</Location>
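For anyone running Nginx rather than Apache, the equivalent using HttpSubModule should look roughly like the following – note this is just a sketch with placeholder domains, as my own setup is Apache and I haven’t run this config myself:

location / {
    # ... existing proxy/fastcgi configuration here ...
    sub_filter      'http://www.example.com' 'https://www.example.com';
    sub_filter_once off;
}

The sub_filter_once off directive makes Nginx replace every occurrence in the page rather than just the first one, which is what you want when a post is full of image links.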

Following this change, things look much better:

The lock icon of browser approval!

The lock icon of browser approval!

All media files are now https://, not http://

All media files are now https://, not http://

Technically this substitution will have some level of performance impact, as it has to process the generated HTML content and check for strings to replace, but the impact is so low that I wasn’t able to measure it amongst the usual variation of page response times – and it’s not going to be anywhere near as slow as mod_php and WordPress itself anyway. ;-)

Finally, if you haven’t already, you probably want to change the following in wp-config.php:

define('FORCE_SSL_ADMIN', true);

This forces all WordPress logins and wp-admin activities to take place under HTTPS, which is a pretty good idea if you ever post to your blog from an unsecured network.

The Apache that wanted to be root

I’ve run into an issue a couple of times where some web applications on my server have broken following a restart of Apache, when the application in question calls external programs.

What seems to happen is that when an administrator restarts Apache during general maintenance of that server, Apache picks up some unwanted environmental settings from the root user account – in particular, the variable HOME ends up getting set to the home directory of the root user account (/root).

Generally it won’t be an issue for web applications, but if they call an external application (in my case, Git), that external application may use the HOME environment to try and read or write configuration files.

# tail -n1 error.log
fatal: unable to access '/root/.config/git/config': Permission denied

In my case, Git kept dying with a fatal error, which led to a very confused sysadmin wondering why a process running as Apache was trying to read from the root user’s account…

By looking at the environmental settings for the Apache worker processes, we can see what’s happening. After a normal boot, the environmental variables look something like the below:

# ps aux | grep httpd
root     10173  0.0  1.6  27532  8496 ?        Ss   22:42   0:00 /usr/sbin/httpd
apache   10175  0.1  2.8  37560 14692 ?        S    22:42   0:01 /usr/sbin/httpd
apache   10176  0.1  2.8  37836 14952 ?        S    22:42   0:01 /usr/sbin/httpd
apache   10177  0.1  2.8  37332 14876 ?        S    22:42   0:01 /usr/sbin/httpd
apache   10178  0.1  2.8  37560 14692 ?        S    22:42   0:01 /usr/sbin/httpd

# cat /proc/10175/environ
TERM=dumbPATH=/sbin:/usr/sbin:/bin:/usr/binPWD=/LANG=CSHLVL=2_=/usr/sbin/httpd

Because Apache has been started by init, it has a nice clean environment. But after a restart by the root user, it’s clear that some cruft from the root user account has been pulled into the application environment variables:

# cat /proc/10175/environ

HOSTNAME=localhostSHELL=/bin/bashTERM=xtermHISTSIZE=1000USER=root:
MAIL=/var/spool/mail/rootPATH=/sbin:/usr/sbin:/bin:/usr/bin
INPUTRC=/etc/inputrcPWD=/rootLANG=CSHLVL=3HOME=/rootLOGNAME=root
LESSOPEN=|/usr/bin/lesspipe.sh %sG_BROKEN_FILENAMES=1_=/usr/sbin/httpd

Because of these settings, external programs relying on the value of HOME will try to read/write to a directory that they aren’t permitted to use.
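As an aside, the reason the environ output above appears as one big run-together line is that the values in /proc/PID/environ are NUL-separated; piping the output through tr makes it much easier to read:

# cat /proc/10175/environ | tr '\0' '\n'

With each variable on its own line, spotting the offending HOME=/root entry becomes trivial.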

Debian-based systems fix this issue by unsetting certain environment variables (including HOME) in the bootscript for Apache, based on the rules in /etc/apache2/envvars.

To fix the issue on a RHEL/CentOS host, you can instead just append a replacement HOME setting into /etc/sysconfig/httpd. This particular configuration file is read at server startup and isn’t overwritten when Apache gets upgraded.

cat >> /etc/sysconfig/httpd << "EOF"
# Correct Apache's home directory
HOME=/var/www
EOF

Following a restart, Apache should now show the correct HOME environmental variable and your application should function as expected.
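A quick way to confirm this (adjust the PID selection to suit your system – this just grabs the first httpd process it finds) is to check the environment of the running processes again:

# cat /proc/$(pgrep httpd | head -n1)/environ | tr '\0' '\n' | grep '^HOME'

which should now report HOME=/var/www rather than HOME=/root.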

Awstats 7.2 + extras RPMs

I’ve been a long term user of Awstats for reporting on visitor traffic to my websites. Whilst it’s a little dated, its simplicity and reliance only on the web server logs make it ideal for any application – not just general websites such as blogs, but also more specialised sites such as my package repositories, which can’t make use of more sophisticated client-side Javascript tracking methods as files are being downloaded by non-browser clients.

Simple web 1.0 goodness. No fancy AJAX graphs here son!

That repository server in particular (repos.jethrocarr.com) is now pushing 20-40GB of traffic per month to around 2500-3000 servers. Unfortunately Awstats doesn’t differentiate between general purpose file grabbers and the Yum downloader for RPM-based distributions, which makes it difficult to see whether downloads are coming from real machines or from mirror scripts scanning and re-downloading files.

I also run dual-stack IPv4 and IPv6 – Awstats includes some useful GeoIP modules to look up where user traffic comes from, but it doesn’t support mixed IPv4 and IPv6 by default, and as my IPv6 traffic usage increases this could become a problem, with the “Unknown” country counter steadily increasing.

To fix this, I’ve written a patch for adding Yum user agent support and also merged in a patch by Sven Strickroth which adds a geoip6 module that does both IPv4 and IPv6 country lookups using the popular MaxMind GeoLite databases.

I’ve built packages for CentOS/RHEL/etc 5 & 6, which are available at my repositories at repos.jethrocarr.com. The awstats package I’ve built includes these two patches and also pulls in a current copy of MaxMind’s GeoIP database and required dependencies, so you’re all good to go immediately.

If you’re after the patches themselves, you can download them directly:

NamedManager 1.6.0

I’ve just finished up a few changes to NamedManager this weekend and released version 1.6.0. It provides a few bug fixes and small improvements, as well as the addition of support for IPv6 PTR (reverse) records, so you can now maintain both forward and reverse DNS for both IPv4 and IPv6 with NamedManager.

IPv6 AAAA records on a domain

When you add records with NamedManager, you can have a reverse PTR record added for your particular A or AAAA record by ticking a checkbox. NamedManager then generates the appropriate reverse record for you, simplifying the process of managing DNS.

IPv6 PTR records

If you’re interested in NamedManager, you can download it from my project website (Tarball or Git) or from GitHub, or grab RPMs for RHEL/CentOS 5/6.

MONA, Hobart

I was down in Hobart a couple weeks ago for PyCon AU 2013, a Python programming conference organised by a friend of mine. Whilst I don’t do that much in Python currently, it was just a good excuse to go hang out with a bunch of interesting people and friends for a couple days and to get out of Sydney for a bit.

I’ve been to Hobart before – it’s a nice place for a visit, with a very NZ-like climate and fauna and an interesting small-town feel, with great coffee, bars and distilleries mixed in.

mmm, cool fresh air – just like back home!

Soaking up some fresh air and sun before retreating into a dark room with my laptop for the rest of the day.

This Hobart pub has a better beer selection than most of the places near my home and work in Sydney’s CBD – it’s like being back in New Zealand again!

One of my other big motivators was to go and visit MONA, the Museum of Old and New Art, a massive underground museum created by an eccentric wealthy Tasmanian who has built an amazing collection of contemporary art.

It’s an absolutely stunning collection, worth making a weekend visit to Hobart purely to check it out. Sadly I only had a couple hours allocated to explore it, but I could have spent a whole day there – particularly with them having a bar in the museum!

I am the data lord!

Is that the source up there?

Where old art and new art meets.

It’s easy to get out there with a short ferry trip from the town, so sit back with a craft beer or go and admire all the street art around the boat whilst it whisks you past Hobart.

Art! And not just me, the stuff on the wall!

Upon arrival, iPod-based guides are handed out, which are actually much more useful than the traditional approach of having everyone crowding around a small plaque on the wall, as they allow you to select and read in your own space and time.

One of the nice things about MONA is how they manage to not take themselves too seriously, with easter eggs and other playful pokes at themselves around the place.

Love it!

I thoroughly enjoyed my trip to MONA and certainly recommend it as a must-see if you visit Hobart – it’s worth making the trip down solely to see it.

Git & GitHub Enabled

GitHub Goodness

I’ve been developing software for a little while now and have built up a few repositories for my applications, with all my open source ones being available publicly. Sometimes people find some of my applications useful and I get thanks, patches or PHP hate mail. :-)

For a long time I’ve been using SVN for storing my code, along with Indefero as my project tracker at projects.jethrocarr.com. It’s a good combination, very similar to Google Code in many respects and generally a lightweight application with a good core feature set.

The only issue has been that with the explosion in popularity of Git and the socialisation of coding on sites such as GitHub, users have gotten tired of the “diff and email patch” approach to submitting contributions and want to take advantage of shinier features such as pull requests, which make contributing much easier, as well as more recognisable to others.

Whilst I’m keen to do as much as possible to make it easier to get commits and users, switching to any one particular hosted provider is of concern to me – whilst they may be popular now, will they still be as popular in 10 years’ time? (Remember SourceForge, anyone?)

The solution: undergoing the pain of migrating my existing repositories from SVN to Git (a lot more messing around than you might think) opens up the ability to pull and push to multiple repositories, which means that I now have all my open source projects on GitHub and, in addition, my own hosted Indefero server holding a full copy of all my code.

This allows me to engage with users on GitHub, whilst still maintaining control of my own issue tracker and full copies of my repositories and data. It also stops users from setting up their own GitHub repositories of my projects and having them confused with official ones – with my own in place, there’s a recognised starting point for forks to occur from.

I’m going to trial this for a few months – if it all works well, I’m going to take a look at adding easy support to Indefero to create and push/pull from a GitHub repository automatically as part of creating a new project. And if needed I could add additional Git providers to mirror to as well (eg Bitbucket) should other popular hives of activity appear.
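For anyone curious about the mechanics of pushing to multiple repositories, the Git side of it is pretty simple – here’s a rough sketch (the repository URLs are placeholders, not my real ones):

# add both my own server and GitHub as push destinations for origin
git remote set-url --add --push origin git@my-own-server.example.com:myproject.git
git remote set-url --add --push origin git@github.com:example-user/myproject.git

# a single push now updates both repositories
git push origin master

One gotcha with --add --push is that the first pushurl you add replaces the default push URL, so you need to explicitly list every destination – including your original server – as done above.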

PRISM Break

The EFF has put together a handy website for anyone looking to replace some of their current proprietary/cloud controlled systems with their own components.

You can check out their guide at: http://prism-break.org/

Generally it’s pretty good, although I have concerns around a couple of their recommendations:

  • DuckDuckGo is a hosted proprietary service, so whilst they claim to not track or record searches, it’s entirely possible that they could be legally forced to do so for a particular user/IP address and have a gag order on that. Having said this, it sounds like they’re the type of company that would push back against such requests as much as possible.
  • Moving from Gmail to something like Riseup is just replacing one centralised provider with another – it doesn’t add any additional protection against PRISM.

As always, the only truly secure solution (excluding security bugs etc) is one you control entirely. If a leak of your data must be avoided at all costs, you need to be running your own server.

python-twitter 1.0 for API 1.1

With Twitter turning off the older API 1.0 today in favour of API 1.1, developers of bots and applications that used the older API need to either upgrade their apps, or watch them die a sad and lonely death.

I have a couple bots written using the python-twitter module which broke – thankfully it’s just an easy case of updating the module to version 1.0 (an unfortunate version number – they should have made it 1.1 to match Twitter). ;-)

If you’re using RHEL/CentOS/etc, EPEL includes a python-twitter package, but it’s way out of date. Instead, I have RPMs of version 1.0 available for EL5/6 in my repositories. You will want to enable both EPEL and the “amberdms-os” repository before you can install the RPM – EPEL includes a number of Python dependencies I don’t ship myself.
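Once both repositories are enabled, installing is a one-liner – a sketch that assumes you’ve already set up the repository config per the instructions on repos.jethrocarr.com:

# yum install python-twitter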

Amberdms Billing System 2.0.0

It’s been a long while since my last release of the Amberdms Billing System (ABS), but at last I’ve finished merging in and testing all the new features that were worked on during my time running Amberdms with two other great coders, and have prepared a new stable release and documentation.

If you’re not familiar with ABS, it’s an open source billing system providing double-entry accounting, invoicing (with PDF generation), customer management, service billing (including telco usage billing) and time sheeting and billing functionality.

It’s been used by a couple smaller ISPs in New Zealand as well as various open source users around the world and is extremely flexible and powerful software (in my biased opinion). :-)

Accounting and billing isn’t a sexy application… but it needs to get done.

The major developments of this release include:

  • New invoice templates using HTML/CSS with Webkit as a rendering engine to produce stunning PDF invoices.
  • Numerous improvements and additions to the customer management page.
  • Credit notes & customer credit balance management.
  • Easier bulk handling of payments with bulk payment interface and (beta) bank statement import function.
  • Support for VoIP billing, including charging customers for all calls made based on a configurable call record database as the source.
  • Service bundling to group multiple services together to form packages.

If you’re running a business, particularly a service or technology orientated company, I invite you to take a look at ABS. Even if you’re using an existing accounting system like Xero, ABS is a great fit for the billing side of things and a great base to build on, rather than writing your own in-house billing platform.

You can read the release announcement details here, or go directly to the open source project page and download the installation guides and source code.