Tag Archives: linux

mailx contains invalid character

Whilst my network is predominantly CentOS 5 hosts, I’ve started moving many of them to CentOS 6, mostly doing so whenever a host needs a newer version of something, since I don’t really want to spend an entire week rebuilding all 30-odd VMs.

One problem I encountered was a number of scripts failing when sending emails, throwing out messages to STDERR:

[example] contains invalid character '['
send-mail: invalid option -- 's'
send-mail: invalid option -- 's'
send-mail: fatal: usage: send-mail [options]

What I found is that on CentOS/RHEL 5, the following would work fine:

# mail root -s "[example] message"
test message content
Cc: 
#

But on CentOS/RHEL 6, it would ignore the subject field (as can be seen by it re-asking for it) and then fail with an annoying “invalid character” error:

# mail root -s "[example] message"
[example] contains invalid character '['
Subject: 
test message content
EOT
#
# send-mail: invalid option -- 's'
send-mail: invalid option -- 's'
send-mail: fatal: usage: send-mail [options]
#

Turns out that somewhere between mailx version 8.1.1 and mailx version 12.4, the mailx binary became a lot fussier about the ordering of command line options.

Viewing the help on both versions shows that options need to come before the destination user; it seems that older versions of mailx were simply more lenient and accepted options placed after it.

Usage: mail -eiIUdEFntBDNHRV~ -T FILE -u USER -h hops -r address \
 -s SUBJECT -a FILE -q FILE -f FILE -A ACCOUNT -b USERS -c USERS \
 -S OPTION users

The correct solution is to always have the target user as the final field, after the command line options, aka:

# mail -s "[example] message" root
test message content
Cc: 
#

This will work happily on all versions, since it’s the correct command line syntax.
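The same ordering applies in scripts, where the message body is usually piped in – a minimal example of the corrected usage (the recipient, subject and body here are just placeholders):

#!/bin/sh
# options and subject first, recipient last - works on both mailx 8.x and 12.x
echo "backup completed successfully" | mail -s "[backup] nightly run" root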

Hopefully everyone else is smart enough to do this the right way the first time, but I figured I’d post this in case some other poor sysadmin is having the same confusion over the invalid character message. :-)

Munin Performance

Munin is a popular open source network resource monitoring tool which polls the hosts on your network for statistics for various services, resources and other attributes.

A typical deployment will see Munin being used to monitor CPU usage, memory usage, amount of traffic across network interface, I/O statistics and more – it’s very handy for seeing long term performance trends and for checking the impact that upgrades or adjustments to the environment have made.

Whilst having some overlap with Nagios, Munin isn’t really a replacement, more an addition – I use Nagios to do critical service and resource monitoring and use Munin to graph things in more detail – something that Nagios doesn’t natively do.

A typical Munin graph - Munin provides daily, weekly, monthly and yearly graphs (RRD powered)

Rather than running as a daemon, the Munin master runs a cronjob every 5 minutes that calls a sequence of scripts to poll the configured servers and generate new graphs.

  1. munin-update to poll configured hosts for new statistics and store the information in RRD databases.
  2. munin-limits to highlight perceived issues in the web interface and optionally to a file for Nagios integration.
  3. munin-graph to generate all the graphs for all the services and hosts.
  4. munin-html to generate the html files for the web interface (which is purely static).
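On RHEL/CentOS packaging, this sequence is normally driven by the munin-cron wrapper from a stock cron entry along these lines (exact paths and file locations may differ between distributions):

# /etc/cron.d/munin - run the Munin master scripts every 5 minutes as the munin user
*/5 * * * *   munin   test -x /usr/bin/munin-cron && /usr/bin/munin-cron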

The problem with this model is that it doesn’t scale particularly well – once you start getting a substantial number of servers, the step-by-step approach can start to run out of resources and time to complete within the 5 minute cron period.

For example, the following are the results for the 3 key scripts that run on my (virtualised) Munin VM monitoring 18 hosts:

sh-3.2$ time /usr/share/munin/munin-update
real    3m22.187s
user    0m5.098s
sys     0m0.712s

sh-3.2$ time /usr/share/munin/munin-graph
real    2m5.349s
user    1m27.713s
sys     0m9.388s

sh-3.2$ time /usr/share/munin/munin-html
real    0m36.931s
user    0m11.541s
sys     0m0.679s

That’s a total of around 6 minutes of running time – long enough that one run is going to start clashing with the next scheduled run.

So why so long?

Firstly, munin-update – munin-update’s time is mostly spent polling the munin-node daemon running on all the monitored systems and then a small amount of I/O time writing the new information to the on-disk RRD files.

The developers appear to have realised the scaling issue with munin-update and added the ability to run it in a forked mode – however this broke horribly for me in a highly virtualised environment, since polling 12+ servers all running on the one physical host would cause a sudden load spike and lead to service poll timeouts, with no values being returned at all. :-(

This occurs because by default Munin allows a maximum of 5 seconds for each service query to complete, and queries all the hosts and services rapidly, ignoring any that fail to respond in time. When querying a large number of guests on one physical host, that host would be too loaded to respond quickly enough.

I ended up boosting the timeouts on some servers to 60 seconds (particularly the KVM hosts themselves, as there would sometimes be 60+ LVM volumes that Munin wanted statistics for), but it still wasn’t a good solution and the load spikes would continue.
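For reference, the per-node plugin timeout is set in munin-node.conf on the monitored host itself – the 60 second value below reflects what I ended up using on the busier KVM hosts:

# /etc/munin/munin-node.conf - give slow plugins (eg 60+ LVM volumes) more time to answer
timeout 60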

There are some tweaks that can be used, such as adjusting the maximum number of forked processes, but it ended up being more reliable and easier to support to just run a single thread and make sure it completed as fast as possible – and taking 3 mins to poll all 18 servers and save to the RRD database is pretty reasonable, particularly for a staggered polling session.

 

After getting munin-update to complete in a reasonable timeframe, I took a look into munin-html and munin-graph – both these processes involve reading the RRD databases off the disk and then writing HTML and RRDTool Graphs (PNG files) to disk for the web interface.

Both processes have the same issue – they chew a solid amount of CPU whilst processing data and then they would get stuck waiting for the disk I/O to catch up when writing the graphs.

The I/O on this server isn’t the fastest at the best of times, considering it’s an AES-256 encrypted RAID 6 volume and the time taken to write around 200MB of changed data each time was a bit too much to do efficiently.

Munin offers some options, including on-demand graph generation using CGIs, however I found this just made the web interface unbearably slow to use – although from chats with the developer, it sounds like version 2.0 will resolve many of these issues.

I needed to fix the performance with the current batch generation model. Just watching the processes in top quickly shows the issue with the scripts, particularly with munin-graph which runs 4 concurrent processes, all of them waiting for I/O. (Linux process crash course: S is sleeping (idle), R is running, D is performing I/O operations – or waiting for them).

Clearly this isn’t ideal – I can’t do much about the underlying performance, other than considering putting the monitoring VM onto a different I/O device without encryption, but then I’d lose all the advantages of having everything on one big LVM pool.

I do however have plenty of CPU and RAM (Quad Phenom, 16GB RAM), so I decided to boost the VM from 256MB to 1024MB RAM and set up a tmpfs filesystem, which is an in-memory filesystem.

Munin has two main data sources – the RRD databases and the HTML & graph outputs:

# du -hs /var/www/html/munin/
227M    /var/www/html/munin/

# du -hs /var/lib/munin/
427M    /var/lib/munin/

I decided that putting the RRD databases in /var/lib/munin/ into tmpfs would be a waste of RAM – remember that munin-update is running single-threaded and waiting for results from network polls, meaning that I/O writes are going to be spread out and not particularly intensive.

The other problem with putting the RRD databases into tmpfs, is that a server crash/power down would lose all the data and that then requires some regular processes to copy it to a safe place, etc, etc – not ideal.

However the HTML & graphs are generated fresh each time, so a loss of their data isn’t an issue. I set up a tmpfs filesystem for it in /etc/fstab with plenty of space:

tmpfs  /var/www/html/munin   tmpfs   rw,mode=755,uid=munin,gid=munin,size=300M   0 0
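Once the fstab entry is in place, the tmpfs can be mounted immediately and checked with df – any existing files in the directory are simply hidden by the mount, which doesn’t matter here since the content is regenerated on every run:

# mount /var/www/html/munin
# df -h /var/www/html/munin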

And ran some performance tests:

sh-3.2$ time /usr/share/munin/munin-graph 
real    1m37.054s
user    2m49.268s
sys     0m11.307s

sh-3.2$ time /usr/share/munin/munin-html 
real    0m11.843s
user    0m10.902s
sys     0m0.288s

That’s a decrease from 161 seconds (2.68 mins) to 108 seconds (1.8 mins). It’s a reasonable improvement, but the real difference is the massive reduction in load for the server.

For a start, we can see from watching the processes with top that the processor gets worked a bit more to complete the process, since there’s not as much waiting for I/O:

With the change, munin-graph spends almost all its time doing CPU processing, rather than creating I/O load – although there’s the occasional period of I/O as above, I suspect from the time spent reading the RRD databases off the slower disk.

Increased bursts of CPU activity are fine – it actually works out to less total CPU load, since there’s no need for the CPU to be doing disk encryption, and hammering one core for a short period is fine: there are plenty of other cores and Linux handles scheduling for resources pretty well.

We can really see the difference with Munin’s own graphs for the monitoring VM after making the change:

In addition, the host server’s load average has dropped significantly, and the load time for the web interface is now insanely fast – no more waiting for my browser to finish pulling all the graphs down for a page, instead it loads in a flash. Munin itself gives you an idea of the difference:

If performance continues to be a problem, there are some other options such as moving RRD databases into memory, patching Munin to do virtualisation-friendly threading for munin-update or looking at better ways to fix CGI on-demand graphing – the tmpfs changes would help a bit to start with.

find-debuginfo.sh invalid predicate

I do a lot of packaging for RHEL/CentOS 5 hosts, often backporting newer software versions. Typically I’ll pull Fedora’s latest package and make various adjustments to it for RHEL 5’s older environment – things like package name changes, downgrading from systemd to init and correcting any missing build dependencies.

Today I came across this rather unhelpful error message:

+ /usr/lib/rpm/find-debuginfo.sh /usr/src/redhat/BUILD/my-package-1.2.3
find: invalid predicate `'

This error is due to newer Fedora spec files often not explicitly setting the value of BuildRoot, which then leaves the package to install into the default location – something which isn’t always defined on RHEL 5 hosts.

The correct fix is to define the build root in the spec file with:

BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)

This will set both %{buildroot} and $RPM_BUILD_ROOT, so no matter whether you’re using either syntax, the files will be installed into the right place.

However, this error is a symptom of a bigger issue – without defining BuildRoot, the package will still compile and complete make install, but instead of the installed files going into /var/tmp/packagename…etc, they will be installed directly into the actual / filesystem, which is generally ReallyBad(tm).

Now if you were building the package as a non-privileged user, this would have failed at the install phase and you would not have gotten as far as the invalid predicate error.

But if you were naughty and building as the root user, the package would have installed into / without complaint and clobbered any existing files installed on the build host. And the first sign of something being wrong is the invalid predicate error when the find debug script gets provided with no files.

This is the key reason why you are highly recommended to build all packages as a non-privileged user, so that if the build incorrectly tries to install anything into /, the install will be denied and the user will quickly realise things aren’t installing into the right place.
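Setting up non-root builds is just a case of pointing rpmbuild’s topdir at a directory owned by the build user – a minimal sketch, with the username and paths being examples only:

$ cat ~/.rpmmacros
%_topdir  %(echo $HOME)/rpmbuild

$ mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
$ rpmbuild -ba ~/rpmbuild/SPECS/my-package.spec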

Building as root can be even worse than just “whoops, I overwrote the installed version of mypackage whilst building a new one” or “blagh annoying invalid predicate error” – consider the following specfile line:

rm -rf $RPM_BUILD_ROOT/%{_includedir}

On a properly defined package, this would execute:

rm -rf /var/tmp/packagename/usr/include/

But on a package lacking a BuildRoot definition it becomes:

rm -rf /usr/include/

Yikes! Not exactly what you want – of course, running as a non-root user would save you, since that rm command would be refused and you’d quickly figure out the issue.

I will leave it as an exercise of the reader to determine why I commented about this specific example… ;-)

IMHO, rpmbuild should be patched to outright refuse to build packages as the root user so this mistake can’t happen – it seems silly to allow such a bad packaging habit when the damage can be so severe.

acpid trickiness

Ran into an issue last night with one of my KVM VMs not registering a shutdown command from the host server.

This typically happens because the guest isn’t listening (or is configured to ignore) ACPI power “button” presses, so the guest doesn’t get told that it should shutdown.

In the case of my CentOS (RHEL) 5 VM, the acpid daemon wasn’t installed/running so the ACPI events were being ignored and the VM would just stay running. :-(

To install, start and configure to run at boot:

# yum install -y acpid
# /etc/init.d/acpid start
# chkconfig --level 345 acpid on

If acpid wasn’t originally running, it appears that the HAL daemon can grab control of the /proc/acpi/event file and you may end up with the following error upon starting acpid:

Starting acpi daemon: acpid: can't open /proc/acpi/event: Device or resource busy

The reason can quickly be established with a ps aux:

[root@basestar ~]# ps aux | grep acpi
root        17  0.0  0.0      0     0 ?        S<   03:16   0:00 [kacpid]
68        2121  0.0  0.3   2108   812 ?        S    03:18   0:00 hald-addon-acpi: listening on acpi kernel interface /proc/acpi/event
root      3916  0.0  0.2   5136   704 pts/0    S+   03:24   0:00 grep acpi

Turns out HAL grabs the proc file for itself if acpid isn’t running, but if acpid is running, it will talk to acpid to get its information. This would self-correct on a reboot, but we can just do:

# /etc/init.d/haldaemon stop
# /etc/init.d/acpid start
# /etc/init.d/haldaemon start

And sorted:

[root@basestar ~]# ps aux | grep acpi
root        17  0.0  0.0      0     0 ?        S<   03:16   0:00 [kacpid]
root      3985  0.0  0.2   1760   544 ?        Ss   03:24   0:00 /usr/sbin/acpid
68        4014  0.0  0.3   2108   808 ?        S    03:24   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root     16500  0.0  0.2   5136   704 pts/0    S+   13:24   0:00 grep acpi
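With acpid running happily in the guest, a host-initiated shutdown should now be honoured – assuming a libvirt/KVM host, an easy way to confirm is to send an ACPI shutdown request with virsh (the guest name here is a placeholder):

# virsh shutdown centos5-guest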

 

A tale of two route controllers

Ever since I built a Linux 3.2.0 kernel for my Debian Stable laptop to take advantage of some of the newer kernel features, I have been experiencing occasional short periods of disconnect/reconnect on the Wi-Fi network.

This wasn’t happening heaps (maybe a couple of times a day), but it was starting to get annoying, so I decided to sort it out properly and do a kernel driver and microcode update for my Intel Centrino Wireless-N 1000 card.

The firmware/microcode update was easy enough, simply a case of downloading the latest code from Intel and installing into /lib/firmware/ – the kernel driver does the rest, finding it and loading it into the Wi-Fi card at boot time.

Next step was building a new kernel for my machine. I went through and tuned the module selection very carefully, tossing out all the hardware my laptop will never use, as I was getting sick of wasting disk space on the billion+ device modules in Linux these days.

After finding that my initial kernel lacked support for my video card (turns out the Lenovo X201i laptops still use AGP-based i915 cards, I was assuming PCIe) I got a working kernel up and running.

Except that my Wi-Fi stability problem was worse than ever – instead of losing connectivity every few hours, it was now doing so every few minutes. :-(

The logs weren’t particularly helpful – NetworkManager likes to give reason numbers but I couldn’t easily find a documented explanation of these (but maybe I’m looking in the wrong place).

19:44:36 NetworkManager[1650]: <info> (wlan0): device state change: 8 -> 9 (reason 5)
19:44:36 NetworkManager[1650]: <warn> Activation (wlan0) failed for access point (b201)
19:44:36 NetworkManager[1650]: <warn> Activation (wlan0) failed.
19:44:36 NetworkManager[1650]: <info> (wlan0): device state change: 9 -> 3 (reason 0)
19:44:36 NetworkManager[1650]: <info> (wlan0): deactivating device (reason: 0).
19:44:36 NetworkManager[1650]: <info> (wlan0): canceled DHCP transaction, DHCP client pid 3354
19:44:36 kernel: [  391.070772] wlan0: deauthenticating from 00:0c:42:67:8b:bc by local choice (reason=3)
19:44:36 kernel: [  391.185461] wlan0: moving STA 00:0c:42:67:8b:bc to state 2
19:44:36 kernel: [  391.185466] wlan0: moving STA 00:0c:42:67:8b:bc to state 1
19:44:36 kernel: [  391.185470] wlan0: moving STA 00:0c:42:67:8b:bc to state 0
19:44:36 wpa_supplicant[1682]: CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
19:44:36 NetworkManager[1650]: <error> [1337240676.376011] [nm-system.c:1229] check_one_route(): (wlan0): \
         error -34 returned from rtnl_route_del(): Netlink Error (errno = Numerical result out of range)
19:44:36 kernel: [  391.233344] cfg80211: Calling CRDA to update world regulatory domain
19:44:36 avahi-daemon[1633]: Withdrawing address record for 192.168.1.11 on wlan0.
19:44:36 avahi-daemon[1633]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.11.
19:44:36 avahi-daemon[1633]: Interface wlan0.IPv4 no longer relevant for mDNS.
19:44:36 avahi-daemon[1633]: Withdrawing address record for 2407:1000:1003:99:226:c7ff:fe66:b822 on wlan0.
19:44:36 avahi-daemon[1633]: Leaving mDNS multicast group on interface wlan0.IPv6 with address 2407:1000:1003:99:226:c7ff:fe66:b822.
19:44:36 NetworkManager[1650]: <info> (wlan0): writing resolv.conf to /sbin/resolvconf
19:44:36 avahi-daemon[1633]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::226:c7ff:fe66:b822.
19:44:36 avahi-daemon[1633]: Registering new address record for fe80::226:c7ff:fe66:b822 on wlan0.*.

So I proceeded to debug:

  1. Cursed and wished my 300m spool of Cat6 ethernet wasn’t in Wellington.
  2. Rolled back the microcode update – my initial thought was that the new code was making the card unstable and the result was the card dropping the connection and NetworkManager doing the clean up.
  3. Did a full power down to make sure that the microcode wasn’t remaining active on the card across reboots (had this problem with a dodgy GPU once).
  4. Verdict: Microcode upgrade was OK, must be something else.
  5. Upgraded NetworkManager from 0.8.1 to 0.8.4 from Debian Backports – 0.8.1 isn’t too recent, was tempted to try 0.9 series but would have required a lot more backporting work.
  6. Verdict: Appears not to be a NetworkManager issue in the 0.8 series – maybe something fixed in 0.9 or later?
  7. Upgraded wpasupplicant from 0.6.10 to 1.0 by manual backport from unstable – the activation error made me consider it might have been a bug with newer kernels & wpasupplicant’s AP negotiation.
  8. Verdict: No change to the issue.
  9. Built a Linux 3.3 kernel with the older less-crashy 3.2 iwlwifi driver to see if it was driver specific, or otherwise-kernel related.
  10. Verdict: Same issue continued to occur – rolling back the driver version in fact made no change, so something about the 3.3 kernel itself was the problem.
  11. Got suspicious about NetworkManager – either it or the kernel had to be at fault, one possibility was some weird API breakage with the age gap between the software versions being used. The kernel is *usually* pretty solid and something like wifi drivers dropping every couple of minutes would be a pretty serious bug to get through, so I looked through the logs to see if I could get anything more useful with NetworkManager’s logs.
  12. Spotted a kernel error “ICMPv6 RA: ndisc_router_discovery() failed to add default route.“. This error tended to occur shortly before any WiFi disconnection occurred, but not immediately so.
  13. Found an entry in Red Hat’s bugzilla.
  14. And then the upstream bug fix from 19th April.

Turns out that the Linux 3.3 kernel and NetworkManager fight over which one is going to control the default route for each router advertised link – the kernel adds one, NetworkManager removes it, and then the kernel gets upset and drops all router advertisements.

In hindsight, I should have spotted it sooner, but I had initially discarded the RA message as being related, since the disconnection often didn’t happen till a minute or two after the log entry occurred – eg:

19:51:40 kernel: [  814.274903] ICMPv6 RA: ndisc_router_discovery() failed to add default route.
19:52:47 NetworkManager[1650]: <info> (wlan0): device state change: 8 -> 9 (reason 5)

What’s interesting about this bug, is that at first reading it explains a loss of IPv6 connectivity perfectly – however it doesn’t explain why IPv4 or the Wi-Fi connection itself was impacted.

The reason this happened is that NetworkManager was set to have IPv6 as a requirement for that connection to be established – in the event of IPv6 not working, NetworkManager would consider the interface to be down, even if IPv4 was up.

There is a good reason for this, that the developers detailed on their (excellently written) blog, explaining that by having NetworkManager check for IPv6, it allows applications to be written smarter to better understand their level of connectivity.

For users of the NetworkManager 0.9 series, there’s a patch already committed which you can grab here and I would expect the next NetworkManager update will have this fix.

If you’re on the NetworkManager 0.8 series, this patch won’t apply cleanly – I might make some time to go and backport it, but you can work around it for now by using the Ignore method so that NetworkManager does nothing and leaves it up to the Linux kernel in the background to negotiate IPv6 addressing.
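If your connections are stored as keyfiles rather than managed through the GUI, the equivalent workaround is setting the IPv6 method to ignore in the connection file – a sketch assuming the keyfile plugin, with the file name being whatever your connection is called:

# /etc/NetworkManager/system-connections/my-wifi
[ipv6]
method=ignore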

Breaking vs Working Network Manager Settings

Of course if you’re not connecting to any IPv6 capable networks, you don’t have anything to worry about (other than the fact you’re still stuck in the 20th century).

 

Initially I was a bit annoyed at NetworkManager for being so silly as to drop the whole interface when just one of the two networking stacks was broken; however after thinking about it for a bit, it does make some sense why it chose that behaviour – most interface issues can be fixed by reconnecting (maybe the AP got rebooted, maybe the laptop just moved between two of them), so a reconnect can solve many of these.

But a smarter approach would be to determine whether network issues are layer 2 or layer 3 – if it’s just a layer 3 issue, then there’s little need to drop the Wi-Fi connection itself; instead attempt to re-establish IPv4 or IPv6 connectivity where appropriate, and if unable to do so, use the notifications to tell the user that “IPv6 connectivity is experiencing a problem, some hosts and services may be unreachable”.

It’s actually something that Windows does semi-OK – it figures out roughly how borked a user’s connection is and then does a balloon popup stating that there’s limited connectivity, an IP conflict, or some other sometimes helpful message.

This may be better in newer versions of NetworkManager, I’ll have to have a play with a more recent release and see.

DAViCal 1.0.2 on RHEL 5 & 6

To follow up on my previous post about DAViCal, I’ve built and published RPMs for DAViCal itself and the php-awl dependency.

These are based off provided spec files from the project and tweaked somewhat to be more suitable for RHEL 5 & 6.

 

RHEL 5 & PostgreSQL 8.1 Note

Whilst DAViCal is intended to (and for normal operation, does) work with PostgreSQL 8.1 or later, 8.1 is too old for the LDAP authentication module to work, as it uses some PostgreSQL 8.4 era queries.

Fortunately RHEL & CentOS 5 now ship with both PostgreSQL 8.1 and PostgreSQL 8.4 available, so you can resolve this by installing with:

# yum install davical postgresql84-server

 

RHEL 5 & 6 Installation Instructions

These instructions assume you have configured the Amberdms RHEL 5 “amberdms-os” repository at minimum – or you can go and pull the specific RPM files you want (php-awl and davical) and add them to your own repository.

Once the repositories are setup, simply install with:

# yum install davical

DAViCal uses PostgreSQL – if this is a new/first PostgreSQL installation, you will need to start and possibly initialise the DB:

# service postgresql start
 /var/lib/pgsql/data is missing. Use "service postgresql initdb" to
 initialize the cluster first.     [FAILED]
# service postgresql initdb
 Initializing database:    [  OK  ]
# service postgresql start
 Starting postgresql service:     [  OK  ]

We need to edit the PostgreSQL user authentication configuration to allow local-only password-less access for the DAViCal application. Optionally you can configure MD5, ident or other desired methods. Add the two lines below to the configuration file, above any existing lines.

# vi /var/lib/pgsql/data/pg_hba.conf

 # trust davical
 local   davical davical_app     trust
 local   davical davical_dba     trust

Restart PostgreSQL for the changes to take effect:

# service postgresql restart

Install the database:

# cd /tmp/
# su postgres -c /usr/share/davical/dba/create-database.sh

 Supported locales updated.
 Updated view: dav_principal.sql applied.
 CalDAV functions updated.
 RRULE functions updated.
 Database permissions updated.
 NOTE
 ====
 *  The password for the 'admin' user has been set to 'EXAMPLE' 

 Thanks for trying DAViCal!  Check in /usr/share/doc/davical/examples/ for
 some configuration examples.  For help, visit #davical on irc.oftc.net.

Adjust the access rules for Apache & restart it:

# vi /etc/httpd/conf.d/davical.conf
# service httpd restart

Test access at http://localhost/davical/ or whatever your appropriate server URL is. Any 403 errors probably suggest a fault with the /etc/httpd/conf.d/davical.conf IP ACL configuration.
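A quick way to check from the server itself is to look at the HTTP response code with curl – a 403 points back at the Allow/Deny rules in davical.conf, whereas a working install should return the DAViCal login page:

# curl -I http://localhost/davical/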

 

RHEL 5 & 6 Upgrade Instructions

Using the packages I have provided, the DAViCal PostgreSQL DB will be updated on any new releases when installing newer RPMs.

This uses the /usr/share/davical/dba/update-davical-database script supplied with DAViCal and shouldn’t require any manual execution or options normally.

 

LDAP Authentication

To configure LDAP authentication, edit the configuration file and define the external authentication settings.

vi /etc/davical/config.php

See the notes in the file about LDAP configuration or consult the quite reliable source of documentation at the DAViCal wiki.

You will also need to have php-ldap installed – it’s not one of the default package dependencies – if it’s missing, you will get this clear message on the login screen:

"drivers_ldap : function ldap_connect not defined, check your php_ldap module"

To install, run:

# yum install php-ldap
# service httpd restart

If authentication still fails to work, try the following:

  1. Check the version of PostgreSQL used – must be 8.4 or later, not 8.1, as per my note at the start of this document.
  2. Check Apache error logs (typically /var/log/httpd/error_log)
  3. Check the LDAP server logs

 

Incur the Wrath of Linux

Linux is a pretty hardy operating system that will take a lot of abuse, but there are ways to make even a Linux system unhappy and vengeful by messing with available resources.

I’ve managed to trigger all of these at least once, sometimes I even do it a few times before I finally learn, so I’ve decided to sit down and make a list for anyone interested.

 

Disk Space

Issue:

Running out of disk. This is a wonderful way to cause weird faults with services like databases, since processes will block (pause) until there is sufficient disk space available again to allow writes to complete.

This leads to some delightful errors such as websites failing to load since the dynamic pages are waiting on the database, which in turn is waiting on disk. Or maybe Apache can’t write any more PHP session files to disk, so no PHP based pages load.

And mail servers love not having disk – thankfully in all the cases I’ve seen, Sendmail & Dovecot just halt and retain messages in memory without causing a loss of data (although a reboot whilst this is occurring could be interesting).

Resolution:

For production systems I always carefully consider the partition layout, creating separate partitions for key data so that an issue such as an out-of-control logging process or overflowing tmp directory can’t impact key services such as databases.

This issue is pretty easy to catch with good monitoring – packages such as Nagios include disk usage checks in the stock versions that can alert at configurable thresholds (eg 80% of disk used).
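For example, the stock check_disk plugin takes warning and critical thresholds per filesystem – the thresholds and path below are purely illustrative, and the plugin directory varies by distribution and architecture:

# /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /var/lib/mysql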

 

Disk Access

Issue:

Don’t unplug a disk whilst Linux is trying to use it. Just don’t. Really. Things get really unhappy and you get to look at nice output from ps aux showing processes blocked for disk.

The typical mistake here is unplugging devices like USB hard drives in the middle of a backup, causing the backup process to halt, and typically the kernel will spew warnings into the system logs about how naughty you’ve been.

Fortunately this is almost always recoverable – the process will eventually time out/terminate and the storage device will work fine on the next connection, although possibly with some filesystem errors or a corrupt file if it was halfway through writing to disk.

Resolution:

Don’t be a muppet. Or at least educate users that they probably shouldn’t unplug the backup drive if it’s flashing away busy still.

 

Networked Storage

Issue:

When using networked storage the kernel still considers the block storage to be just as critical as local storage, so if there’s a disruption accessing data on a network file system, processes will again halt until the storage returns.

This can have mixed blessings – in a server environment where the storage should always be accessible, halting can be the best solution since your programs will wait for the storage to return and hopefully there will be no data loss.

However for a mobile environment this can cause programs to hang indefinitely, waiting for storage that might never be reconnected.

Resolution:

In this case, the soft option can be used when mounting network shares, which causes the kernel to return an error to the process using the storage if it becomes unavailable, so that the application (hopefully) warns the user and terminates gracefully.
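For an NFS mount that just means adding soft (and optionally tightening the timeout/retry counts) to the mount options in /etc/fstab – the server, export and mountpoint below are placeholders:

fileserver:/export/home  /mnt/home  nfs  soft,timeo=30,retrans=3  0 0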

Using a daemon such as autofs to automatically mount and unmount network shares on demand can help reduce this sort of headache.

 

Low Memory

Issue:

Running out of memory. I don’t just mean RAM, but swap space as well (pagefile for you Windows users). When you run out of memory on almost any OS, it won’t be that happy – Linux handles this situation by killing off processes using the OOM killer in order to free up memory again.

This makes sense in theory (out of memory, so let’s kill things that are using it), but the problem is that it doesn’t always kill the ones you want, leading to anything from amusement to unmanageable boxes.

I’ve had some run-ins with the OOM killer before – it’s killed my SSH daemon on overloaded boxes, preventing me from logging into them. :-/
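One mitigation for that particular pain is to exempt critical daemons like sshd from the OOM killer via /proc – a sketch assuming a 2.6-era kernel, where a value of -17 means the process is never selected (newer kernels use oom_score_adj instead):

# echo -17 > /proc/$(pidof -s sshd)/oom_adj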

On the other hand, just giving your system many GB of swap space so that it doesn’t run out of memory isn’t a good fix either – swap is terribly slow and your machine will quickly grind to a near-halt.

The performance of using swap is so bad it’s sometimes difficult to even log in to a heavily swapping system.

 

Resolution:

Buy more RAM. Ideally you shouldn’t be trying to run more on a box than it can handle – of course it’s possible to get by with swap space, but only to a small degree due to the performance pains.

In a virtual environment, I’m leaning towards running without swap and letting the OOM killer just kill processes on guests if they run out of memory – usually it’s better to take the hit of a process being killed than the more painful slowdown from swap.

And with VMs, if the worst case happens, you can easily reboot and console into the systems, unlike physical hosts where losing manageability can be far more costly.

Of course this really depends on your workload and what you’re doing – the best solution is monitoring, so that you don’t end up in this situation in the first place.

Sometimes it just happens due to a one-off process though, and it’s difficult to always foresee memory issues.

 

Incorrect Time

Issue:

Having the incorrect time on your server may appear only a nuisance, but it can lead to many other more devious faults.

Any applications which are time-sensitive can experience weird issues – I’ve seen problems such as Samba clients being unable to see files newer than the system time, and BIND breaking for any lookups. Clock issues are WEIRD.

Resolution:

We have NTP, it works well. Turn it on and make sure the NTP process is included in your process monitoring list.
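On RHEL/CentOS that’s simply a case of installing and enabling ntpd, then checking it’s actually syncing against its peers:

# yum install -y ntp
# chkconfig ntpd on
# service ntpd start
# ntpq -p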

 

Authentication Source Outages

Issue:

In larger deployments it’s often common to have a central source of authentication such as LDAP, Kerberos, Radius or even Active Directory.

Linux actually does a remarkable number of lookups against the configured authentication sources in regular operation. Aside from the lookup whenever a user wishes to log in, Linux will look up the user database every time the attributes of a file are viewed (user/group information), which is pretty often.

There’s some level of inbuilt caching, but unless you’re running a proper authentication caching daemon allowing off-line mode, a prolonged outage of the authentication server will not only make it impossible for users to log in, but also break simple commands such as ls, as the process will be trying to make user/group information lookups.

Resolution:

There’s a reason why we always have two or more sources for key network services such as DNS and LDAP, take advantage of the redundancy built into the design.

However this doesn’t help if the network is down entirely, in which case the best solution is having the system configured to quickly fail over to local authentication or to use the local cache.

Even if failover to a secondary system is working, a lot of the timeout defaults are too high (eg 300 seconds before trying the secondary). Whilst the lookups will still complete eventually, these delays will noticeably impact services, so it’s recommended to look up the authentication methods being used and adjust the timeouts down to a couple of seconds at most.
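For the common nss_ldap/pam_ldap setup on RHEL-type hosts, the relevant knobs live in /etc/ldap.conf – the values below are just an illustration of the sort of aggressive timeouts I mean, not recommendations:

# /etc/ldap.conf - fail over to the next LDAP server quickly instead of hanging
bind_timelimit 3
timelimit 3
bind_policy soft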

 

These are just a few simple yet nasty ways to break Linux systems – ways that cause weird application behaviour, but not necessarily in a form that’s easy to debug.

In most cases, decent monitoring will help you avoid and handle many of these issues better by alerting to low resource situations – if you have nothing currently, Nagios is a good start.

Why I hate DSL

mmm latency, delicious delicious latency

I’m living around 8km from the centre of Auckland, New Zealand’s largest city, with only 8 mbits down and 0.84 mbits upstream, in a very modern building with (presumably) good wiring installed. :'(

The above is what happens to your domestic latency when your server’s cronjobs decide to push 1.8GB of new RPMs up to a public repo, causing connectivity to any other hosts to slow to a grinding halt. :-/

On the plus side it did make me aware of a fault in my server setup – one of my VMs was incorrectly set to use my secondary LDAP server as its primary authentication source, meaning it called back across the VPN over this DSL connection, so when this performance hit occurred, the server started having weird “hangs” due to processes blocking whilst waiting for authentication attempts to complete.

The sooner we can shift off DSL the better – there’s the possibility that my area might be covered with VDSL, but since I don’t think we’ll be here for too long, I’m not going to go to the effort to look at other access methods.

But I will make sure I carefully consider where UFB is getting laid when I come to buy property….

Intel 320 SSD stats & encryption

I recently obtained a 120GB Intel 320 SSD for upgrading my Lenovo Thinkpad X201i from its sluggish hard disk to something with a bit sharper performance.

Whilst not the latest and highest performance SSD from Intel, it’s certainly still very quick compared to the hard disk, and it made more sense than buying a more expensive newer model that would be restricted by the SATA 2 bus on my laptop.

The performance increase is impressive – my sequential reads went from 40,300 KB/s to 132,673 KB/s, giving dramatically faster boot performance and snappy application load times. And the seeks jumped massively from 151.4 per second to at least 10,524 per second.

In fact, the SSD is so fast that it can be difficult to get stats on its true seek performance. With the seeks completing in only a few microseconds, the bonnie++ tests often finished a bit early and the results would vary – it’s possible the true seek rate is even higher than 10k+ per second.
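For reference, the numbers above came from bonnie++ style runs – a typical invocation looks something like the following, with the directory and size being placeholders (the test size should be at least double the machine’s RAM so caching doesn’t skew the results):

# bonnie++ -d /tmp/bench -s 8192 -n 0 -u nobody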

The next major question for me was what the performance would be when running disk encryption on top of the SSD. Due to the private nature of my data, I fully encrypt my laptop using dm-crypt/Linux disk encryption with 256-bit AES, so that if the machine is ever stolen, the data is unreadable.

Of course, this security imposes an overhead – data needs to be decrypted before it can be read, adding extra load, particularly on the CPU. It’s also worth noting that the Linux disk encryption implementation is single threaded, meaning that the maximum encryption/decryption performance is limited by the performance of a single core of your processor.

After installing the OS using an encrypted disk, there was a noticeable performance drop. In particular, the sequential reads dropped from 132,673 KB/s to a much less exciting 69,805 KB/s. Whilst still significantly faster than the conventional hard drive’s 40,300 KB/s, it’s a big drop from the true capability of the SSD.

Fortunately the write performance was impacted far less, I suspect because the OS and the CPU core doing the encryption were able to keep up with the slower performance of writing to the SSD, in comparison to the reads. Based on the stats I obtained, it looks like my laptop tops out around 70,000 KB/s of encrypted throughput, so any additional performance of the SSD above that is wasted.

I’ve uploaded the actual performance statistics generated to a separate page, which you can view if interested.

From a usability point of view, even with encryption, the boot time performance is impressive – the laptop starts in about half the time it did previously, along with massive improvements in application start times.

The improvements are particularly noticeable when loading a number of applications concurrently – with a conventional hard drive, the need to load data across different physical parts of the disk platters causes a lot of delays when multitasking application loads. On the SSD, I can click a number of applications and have them *all* load within a second or two.

Overall I’m pleased with the upgrade, even with the reduced performance from encryption, the SSD still offers some major performance upgrades and was well worth doing.

The only outstanding downside now is the issue of fitting all the data that I actually want to regularly access onto the small size of the SSD…. I’m currently looking into filesystems that provide offline access or caching of networked filesystems from my servers, so that I can have regularly accessed files stored locally, but the full selection just a network transfer away.

linux.conf.au 2012

In a couple days I’ll be flying out to Melbourne, Australia for linux.conf.au 2012, the undisputed greatest week of the year, being held in Ballarat.

I’ve been attending this conference since 2006 in Dunedin and it’s continued to be an amazing eye opener into the world of technology, open source and amazing people – going from being the only person out of 500+ students interested in technology when I first attended straight out of high school, to finding that there are hundreds of geeks even more hardcore than me, was totally amazing.

I’ll be doing a bit of tripping around like I did last year (see category linux.conf.au) – this time I’ll be spending 2 days before the conference doing a road trip with my mate Chris, followed by another 2 days after the conference where I’ll stay in the Melbourne CBD to explore the city in more detail.

Key dates:

  • 14th Jan – Early morning flight from Auckland to Melbourne, Roadtrip with Chris
  • 15th Jan – Roadtrip with Chris, arriving in Ballarat in the afternoon.
  • 16th Jan – Start of linux.conf.au :-D
  • 20th Jan – End of linux.conf.au :'(
  • 21st Jan – Melbourne CBD Adventures
  • 22nd Jan – Melbourne CBD Adventures
  • 23rd Jan – Melbourne CBD Adventures, afternoon flight back to Auckland.

If you’re in Melbourne and want to catch up, let me know via email, twitter or XMPP and I’ll be keen for coffee/beer/seedybar. :-)