Tag Archives: geek

Anything IT related (which is most things I say) :-)

Google Search & Control

I’ve been using Google for search for years, but this is the first time I’ve ever come across a DMCA takedown notice included in the results.

This is possibly not helped by the fact that Google is so good at finding what I want that I don’t tend to scroll past the first few entries 99% of the time, so it’s easy to miss things at the bottom of the page.

Lawyers, fuck yeah!

Turns out that Google has been doing this since around 2002 and there’s a form process you can follow with Google to file a notice to request a search result removal.

Sadly I suspect that we are going to see more and more situations like this as governments introduce tighter internet censorship laws and key internet gatekeepers like Google are going to follow along with whatever they get told to do.

Whilst people working at Google may truly subscribe to the “Don’t be evil” slogan, the fundamental fact is that Google is a US-based company that is legally required to do what’s best for the shareholders – and the best thing for the shareholders is to not try and fight the government over legislation, but to implement as needed and keep selling advertising.

In response to concerns about Google over privacy, I’ve seen a number of people shift to new options, such as the increasingly popular and open-source friendly Duck Duck Go search engine, or even Microsoft’s Bing, which isn’t too bad at getting decent results with a UI looking much more like early Google.

However these alternatives all suffer from the same fundamental problem – they’re centralised gatekeepers who can be censored or controlled – and then there’s the fact that a centralised entity can track so much about your online browsing. Replacing Google with another company will just leave us in the same position in 10 years’ time.

Lately I’ve been seeking to remove all the centralised providers from my online life, moving to self-run and federated services – basic stuff like running my own email and instant messaging (XMPP), but also more complex “cloud” services being delivered by federated or self-run servers for tasks such as browser syncing, avatar handling, contacts sync, avoiding URL shorteners and quitting or replacing social networks.

The next big one on the list is finding an open source and federated search solution – I’m currently running tests with a search engine called YaCy, which is a peer-to-peer decentralised search engine that is made up of thousands of independent servers, sharing information between themselves.

To use YaCy, you download and run your own server, set its search indexing behaviour and let it run and share results with other servers (it’s also possible to run it in a disconnected mode for indexing your internal private networks).

The YaCy homepage has an excellent write-up of their philosophy and design fundamentals for the application.

It’s still a bit rough and I think the search results could be better – but this is something that having more nodes will certainly help with, and the idea is promising. I’m planning to set up a public instance on my server in the near future for adding all my sites to the index and providing a good test of its feasibility.

Mozilla Firefox “Pin as App”

In a moment of madness, I decided to RTFM the latest Mozilla Firefox Feature List and came across this nifty ability called “Pin as App”.

nawww baby tabs!

It’s pretty handy. I’m using it to maintain tabs of commonly accessed websites or web applications that I need many times a day; they’re easy to find since they’re always on the left in the defined order, and much smaller than the full tab size.

Only issue is that you need your remote site/app to have a decent favicon – if they don’t, you’ll just end up with a dashed square placeholder and there’s no way in Firefox to set a custom icon for that pin that I can see.

Incur the Wrath of Linux

Linux is a pretty hardy operating system that will take a lot of abuse, but there are ways to make even a Linux system unhappy and vengeful by messing with available resources.

I’ve managed to trigger all of these at least once, sometimes I even do it a few times before I finally learn, so I’ve decided to sit down and make a list for anyone interested.

 

Disk Space

Issue:

Running out of disk. This is a wonderful way to cause weird faults with services like databases, since processes will block (pause) until there is sufficient disk space available again to allow writes to complete.

This leads to some delightful errors such as websites failing to load since the dynamic pages are waiting on the database, which in turn is waiting on disk. Or maybe Apache can’t write any more PHP session files to disk, so no PHP based pages load.

And mail servers love not having disk. Thankfully, in all the cases I’ve seen, Sendmail & Dovecot just halt and retain messages in memory without causing a loss of data (although a reboot when this is occurring could be interesting).

Resolution:

For production systems I always carefully consider the partition table structure, so that an issue such as out-of-control logging processes or tmp directories can’t impact key services such as databases, by creating separate partitions for their data.

This issue is pretty easy to fix with good monitoring. Packages such as Nagios include disk usage checks in the stock versions that can alert at configurable thresholds (eg 80% of disk used).
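As a rough illustration of what such a threshold check does, here is a minimal Python sketch. It is not how Nagios implements its check; the function names and the 80%/90% thresholds are my own assumptions for the example.

```python
import shutil


def disk_status(used_bytes, total_bytes, warn_pct=80.0, crit_pct=90.0):
    """Classify usage as OK, WARNING or CRITICAL (thresholds are assumptions)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def check_path(path="/"):
    """Check the filesystem that holds the given path."""
    usage = shutil.disk_usage(path)
    return disk_status(usage.used, usage.total)
```

Running check_path("/var/lib/mysql") from cron and mailing anything other than OK would be a crude stand-in until proper monitoring is in place.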

 

Disk Access

Issue:

Don’t unplug a disk whilst Linux is trying to use it. Just don’t. Really. Things get really unhappy and you get to look at nice output from ps aux showing processes blocked for disk.

The typical mistake here is unplugging devices like USB hard drives in the middle of a backup process, causing the backup process to halt, and typically the kernel will start spewing warnings into the system logs about how naughty you’ve been.

Fortunately this is almost always recoverable: the process will eventually time out or terminate and the storage device will work fine on the next connection, although possibly with some filesystem errors or a corrupt file if it was halfway through writing to disk.
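For spotting the blocked processes mentioned above, a small sketch like this can pick them out of ps aux output. It assumes the usual procps column layout, where the STAT column is the eighth field and a leading D means uninterruptible (usually disk) sleep; the function name is my own.

```python
def blocked_processes(ps_aux_output):
    """Return (pid, command) for processes whose STAT starts with D."""
    blocked = []
    # Skip the header line, then split each row into the 11 ps aux columns.
    for line in ps_aux_output.splitlines()[1:]:
        fields = line.split(None, 10)
        if len(fields) != 11:
            continue  # not a normal process row
        pid, stat, command = fields[1], fields[7], fields[10].strip()
        if stat.startswith("D"):
            blocked.append((int(pid), command))
    return blocked
```

Feed it the text from subprocess.check_output(["ps", "aux"], text=True) to get a list of whatever is currently stuck waiting on disk.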

Resolution:

Don’t be a muppet. Or at least educate users that they probably shouldn’t unplug the backup drive if it’s flashing away busy still.

 

Networked Storage

Issue:

When using networked storage the kernel still considers the block storage to be just as critical as local storage, so if there’s a disruption accessing data on a network file system, processes will again halt until the storage returns.

This can have mixed blessings – in a server environment where the storage should always be accessible, halting can be the best solution since your programs will wait for the storage to return and hopefully there will be no data loss.

However for a mobile environment this can cause programs to hang indefinitely waiting for storage that might not be able to be reconnected.

Resolution:

In this case, the soft option can be used when mounting network shares, which will cause the kernel to return an error to the process using the storage if it becomes unavailable so that the application (hopefully) warns the user and terminates gracefully.

Using a daemon such as autofs to automatically mount and unmount network shares on demand can help reduce this sort of headache.
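To find out which network shares on a box would still hang forever, something like the following sketch can scan the mount table. It assumes the /proc/mounts layout (device, mount point, filesystem type, comma-separated options) and only looks at NFS; the function name is hypothetical.

```python
def hard_nfs_mounts(mounts_text):
    """Return mount points of NFS shares that are not mounted with 'soft'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed or empty line
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype in ("nfs", "nfs4") and "soft" not in options.split(","):
            result.append(mountpoint)
    return result
```

On a live system the input would come from open("/proc/mounts").read(); anything it returns is a candidate for the soft option or for autofs.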

 

Low Memory

Issue:

Running out of memory. I don’t just mean RAM, but swap space too (the pagefile, for you Windows users). When you run out of memory on almost any OS, it won’t be that happy – Linux handles this situation by using the OOM (out-of-memory) killer to kill off processes in order to free up memory again.

This makes sense in theory (out of memory, so let’s kill things that are using it), but the problem is that it doesn’t always kill the ones you want, leading to anything from amusement to unmanageable boxes.

I’ve had some run-ins with the OOM before, killing my ssh daemon on overloaded boxes preventing me from logging into them. :-/

On the other hand, just giving your system many GB of swap space so that it doesn’t run out of memory isn’t a good fix either: swap is terribly slow and your machine will quickly grind to a near-halt.

The performance of using swap is so bad it’s sometimes difficult to even log in to a heavily swapping system.

 

Resolution:

Buy more RAM. Ideally you shouldn’t be trying to run more than possible on a box – of course it’s possible to get by with swap space, but only to a small degree due to the performance pains.

In a virtual environment, I’m leaning towards running without swap and letting OOM just kill processes on guests if they run out of memory, usually it’s better to take the hit of a process being killed than the more painful slowdown from swap.

And with VMs, if the worst case happens, you can easily reboot and console into the systems, compared to physical hosts where you can’t afford to lose manageability.

Of course this really depends on your workload and what you’re doing, best solution is monitoring so that you don’t end up in this situation in the first place.
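As a sketch of what that monitoring can look at, the snippet below parses /proc/meminfo-style text and works out how much swap is in use. The helper names are my own and the alerting threshold is left to you.

```python
def parse_meminfo(text):
    """Parse 'Name:   12345 kB' lines into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_percent(meminfo):
    """Percentage of swap in use; 0.0 when the box has no swap at all."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - meminfo.get("SwapFree", 0)) / total
```

Calling swap_used_percent(parse_meminfo(open("/proc/meminfo").read())) on a schedule gives an early warning that a box is starting to lean on swap, well before it becomes too slow to log in to.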

Sometimes it just happens due to a once-off process, and it is difficult to always foresee memory issues.

 

Incorrect Time

Issue:

Having the incorrect time on your server may appear only a nuisance, but it can lead to many other more devious faults.

Any applications which are time-sensitive can experience weird issues. I’ve seen problems such as Samba clients being unable to see files newer than the system time, and BIND breaking for any lookups. Clock issues are WEIRD.

Resolution:

We have NTP, it works well. Turn it on and make sure the NTP process is included in your process monitoring list.

 

Authentication Source Outages

Issue:

In larger deployments it’s often common to have a central source of authentication such as LDAP, Kerberos, Radius or even Active Directory.

Linux actually does a remarkable number of lookups against the configured authentication sources in regular operation. Aside from the need to do a lookup whenever a user wishes to log in, Linux will query the user database every time the attributes of a file are viewed (user/group information), which is pretty often.

There’s some level of inbuilt caching, but unless you’re running a proper authentication caching daemon allowing off-line mode, a prolonged outage to the authentication server will make it impossible for users to login, but also break simple queries such as ls as the process will be trying to make user/group information lookups.

Resolution:

There’s a reason why we always have two or more sources for key network services such as DNS and LDAP, take advantage of the redundancy built into the design.

However this doesn’t help if the network is down entirely, in which case the best solution is having the system configured to quickly failover to local authentication or to use the local cache.

Even if failover to a secondary system is working, a lot of the timeout defaults are too high (eg 300 seconds before trying the secondary). Whilst the lookups will still complete eventually, these delays will noticeably impact services, so it’s recommended to look up the authentication methods being used and adjust the timeouts down to a couple of seconds at most.
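One cheap way to see whether these lookups are getting slow is to time one. The sketch below uses Python’s pwd module, which goes through the system’s configured name services just like ls does; the function name is my own, and a local cache can of course make a broken backend look healthy.

```python
import os
import pwd
import time


def time_user_lookup(uid):
    """Return (username or None, seconds taken) for one user database lookup."""
    start = time.monotonic()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        name = None  # no such user in any configured source
    return name, time.monotonic() - start
```

If time_user_lookup(os.getuid()) regularly takes more than a fraction of a second, it’s a good hint that the primary authentication server is timing out before the secondary gets tried.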

 

These are just a few simple yet nasty ways to break Linux systems in ways that cause weird application behaviour, but not necessarily in a form that’s easy to debug.

In most cases, decent monitoring will help you avoid and handle many of these issues better by alerting to low resource situations – if you have nothing currently, Nagios is a good start.

Mozilla Collusion

This week Mozilla released an add-on called Collusion, an experimental extension which shows and graphs how you are being tracked online.

It’s pretty common knowledge how much you get tracked online these days. If you just watch your status bar when loading many popular sites, you’ll always see a few brief hits to services such as Google Analytics, but there’s also a lot of tracking done by social networking services and advertisers.

The results are pretty amazing, I took these after turning it on for myself for about 1 day of browsing, every day I check in the graph is even bigger and more amazing.

The web actually starting to look like a web....

As expected, Google is one of the largest trackers around, this will be thanks to the popularity of their Google Analytics service, not to mention all the advertising technology they’ve acquired and built over the years including their acquisition of DoubleClick.

I for one, welcome our new Google overlords and would like to remind them that as a trusted internet celebrity I can be useful for rounding up other sites to work in their code mines.

But even more interesting is the results for social networks. I ran this test whilst logged out of my Twitter account, logged out of LinkedIn and I don’t even have Facebook:

Mark Zuckerberg knows what porn you look at.

Combine 69+ tweets a day & this information and I think Twitter would have a massive trove of data about me on their servers.

LinkedIn isn't quite as linked as Facebook or Twitter, but probably has a similar ratio if you consider the userbase size differences.

When you look at this information, you can see why Google+ makes sense for the company to invest in. Google has all the data about your browsing history, but the social networks are one up – they have all your browsing information with the addition of all your daily posts, musings, etc.

With this data advertising can get very, very targeted and it makes sense for Google to want to get in on this to maintain the edge in their business.

It’s yet another reason I’m happy to be off Twitter now, so much less information that can be used by advertisers for me. It’s not that I’m necessarily against targeted advertising, I’d rather see ads for computer parts than for baby clothes, but I’m not that much of a fan of my privacy being so exposed and organisations like Google having a full list of everything I do and visit and being able to profile me so easily.

What will be interesting will be testing how well the tracking holds up once IPv6 becomes popular. On one hand, IPv6 can expose users more if they’re connecting with a MAC-based address, but on the other hand, it could improve privacy if systems are assigned addresses using IPv6 address randomisation.
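To show why a MAC-based address is so trackable, here is a small sketch of the Modified EUI-64 scheme used by stateless autoconfiguration: the MAC is split in half, ff:fe is inserted in the middle and the universal/local bit is flipped. The prefix and MAC below are made-up documentation values and the function name is my own.

```python
import ipaddress


def eui64_address(prefix, mac):
    """Build the Modified EUI-64 IPv6 address for a MAC within a /64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 interface identifiers need a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network.network_address + int.from_bytes(interface_id, "big")


# Hypothetical example values: documentation prefix and an invented MAC.
print(eui64_address("2001:db8::/64", "00:1a:2b:3c:4d:5e"))
```

The last 64 bits stay the same whichever network the machine joins, which is what lets a tracker follow the same device around; randomised addresses avoid this by regenerating those bits.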

Mozilla Sync Server RPMs

A few weeks ago I wrote about the awesomeness that is Mozilla’s Firefox Sync, a built-in feature of Firefox versions 4 & later which allows for synchronization of bookmarks, history, tabs and password information between multiple systems. (historically known as Weave)

I’ve been running this for a few weeks now on my servers using fully packaged components and it’s been working well, excluding a few minor hiccups.

It’s taken a bit longer than I would have liked, but I now have stable RPM packages for RHEL/CentOS 5 and 6 for both i386 and x86_64 available publicly.

I always package all software I use on my servers (and even my desktops most of the time) as it makes re-building, upgrading and supporting systems far easier in the long run. By having everything in packages and repos, I can rebuild a server entirely simply by knowing what list of packages were originally installed and their configuration files.

Packaging one’s software is also great when upgrading distribution, as you can get a list of all non-vendor programs and libraries installed and then use the .src.rpm files to build new packages for the new OS release.
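A rough sketch of how that non-vendor list can be produced: ask rpm for each package’s name and vendor, then filter out the distribution’s own. The vendor strings and the sample package data here are assumptions for illustration, so check what your distribution actually reports.

```python
def non_vendor_packages(rpm_output, vendors=("CentOS", "Red Hat, Inc.")):
    """Return sorted package names whose vendor is not in the given list.

    Expects one "name<TAB>vendor" pair per line, eg from:
        rpm -qa --queryformat '%{NAME}\\t%{VENDOR}\\n'
    """
    packages = []
    for line in rpm_output.splitlines():
        name, sep, vendor = line.partition("\t")
        if sep and vendor.strip() not in vendors:
            packages.append(name)
    return sorted(packages)
```

The resulting list is the set of packages that will need their .src.rpm files rebuilt for the new OS release.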

 

Packaging Headaches

Mozilla Sync Server was much more difficult to package than I would have liked, mostly due to the documentation clarity and the number of dependencies.

The primary source of pain was that I run CentOS 5 for a lot of my production systems, which ships with Python 2.4, whereas to run Mozilla Sync Server, you will need Python 2.6 or later.

This meant that I had to build RPMs for a large number (upwards of 20 IIRC) of python packages to provide python26 versions of existing system packages. Whilst EPEL had a few of the core ones (such as python26 itself), many of the modules I needed either weren’t packaged, or only had EPEL packages for Python 2.4.

The other major headache was due to unclear information and in some cases, incorrect documentation from Mozilla.

Mozilla uses the project source name of server-full in the setup documentation, however this isn’t actually the entire “full” application – rather it provides the WSGI executable and some libraries, however you also need server-core, server-reg and server-storage plus a number of python modules to build a complete solution.

Sadly this isn’t entirely clear to anyone reading the setup instructions, the only setup information relates to checking out server-full and running a build script which will go through and download all the dependencies (in theory, it often broke for me) and build a working system, complete with paster web server.

Whilst this would be a handy resource for anyone doing development, it’s pretty useless for someone wanting to package a proper system for deployment since you need to break all the dependencies into separate packages.

(Note that whilst Mozilla refer to having RPM packages for the software components, these have been written for their own inhouse deployment and are not totally suitable for stock systems, not to mention even when you have SPEC files for some of the Mozilla components, you still lack the SPEC files for dependencies.)

To top it off, some information is just flat out wrong and the corrections can only be found by first subscribing to the developer mailing list – in order to gain a login to browse the list archives – so that you can find such gems as “LDAP doesn’t work and don’t try as it’s being re-written”.

Toss in finding a few bugs that got fixed right around the time I was working on packaging these apps and you can understand if I’m not filled with love for the developers right this moment.

Of course, this is a particularly common open source problem – the team clearly released in a way that made sense to them, and of course everyone would know the difference between server-core/full/reg/storage, etc right?? ;-) I know I’m sometimes guilty of the same thing.

Having said that, the documentation does appear to be getting better and the community is starting to contribute more good documentation resources. I also found a number of people on the mailing list quite helpful and the Mozilla Sync team were really fast and responsive when I opened a bug report, even when it’s a “stupid jethro didn’t hg pull the latest release before testing” issue.

 

Getting My Packages

All the new packages can be found in the Amberdms public package repositories, the instructions on setting up the CentOS 5 or CentOS 6 repos can be found here.

 

RHEL/CentOS 5 Repo Instructions

If you are running RHEL/CentOS 5, you only need to enable amberdms-os, since all the packages will install in parallel to the distribution packages. Nothing in this repo should ever clash with packages released by RedHat, but may clash/be newer than dag or EPEL packages.

 

RHEL/CentOS 6 Repo Instructions

If you are running RHEL/CentOS 6, you will need to enable both amberdms-os and amberdms-updates, as some of the python packages that are required are shipped by RHEL, but are too outdated to be used for Mozilla Sync Server.

Note that amberdms-updates may contain newer versions of other packages, so take care when enabling it, as I will have other unrelated RPMs in there. If you only want my newer python packages for Mozilla Sync Server, set includepkgs=python-* for amberdms-updates.

Also whilst I have tested these packages for Mozilla Sync Server’s requirements, I can’t be sure of their suitability with existing Python applications on your server, so take care when installing these as there’s always a chance they could break something.

 

RHEL/CentOS 5 & 6 Installation Instructions

Prerequisites:

  1. Configured Amberdms Repositories as per above instructions.
  2. Working & configured Apache/httpd server. The packaged programs will work with other web servers, but you’ll have to write your own configuration files for them.

Installation Steps:

  1. Install packages with:
    yum install mozilla-sync-server
  2. Adjust Apache configuration to allow access from desired networks (standard apache IP rules).
    /etc/httpd/conf.d/mozilla-sync-server.conf
  3. Adjust Mozilla Sync Server configuration. If you want to run with the standard SQLite DB (good for initial testing), all you must adjust is line 44 to set the fallback_node value to the correct reachable URL for Firefox clients.
    vi /etc/mozilla-sync-server/mozilla-sync-server.conf
  4. Restart Apache – due to the way mozilla-sync-server uses WSGI, if you make a change to the configuration, there might still be a running process using the existing config. Doing a restart of Apache will always fix this.
    /etc/init.d/httpd restart
  5. Test that you can reach the sync server location and see if anything breaks. These tests will fail if something is wrong such as missing modules or inability to access the database.
    http://host.example.com/mozilla-sync/
    ^ should return 404 if working - anything else indicates an error
    
    http://host.example.com/mozilla-sync/user/1.0/a/
    ^ should return 200 with the page output of only 0
  6. There is also a heartbeat page that can be useful when doing automated checks of the service health, although I found it possible to sometimes break the server in ways that would stop sync for Firefox, but still show OK for heartbeat.
    http://host.example.com/mozilla-sync/__heartbeat__
  7. If you experience any issues with the test URLs, check /var/log/httpd/*error_log*. You may experience problems if you’re using https:// with self-signed certificates that aren’t installed in the browser as trusted too, so import your certs properly so they’re trusted.
  8. Mozilla Sync Server is now ready for you to start using with Firefox clients. My recommendation is to use a clean profile you can delete and re-create for testing purposes and only add sync with your actual profile once you’ve confirmed the server is working.
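
The two URL tests from step 5 are easy to automate. This is a minimal sketch using only the Python standard library; the expected results come from the steps above, whilst the function names are my own and the base URL is whatever your server uses.

```python
import urllib.error
import urllib.request

# (path, expected status, expected body or None), as described in step 5.
CHECKS = [
    ("/", 404, None),
    ("/user/1.0/a/", 200, "0"),
]


def fetch(url, timeout=10.0):
    """Return (status, body); HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def result_ok(status, body, expected_status, expected_body=None):
    """True when the status matches and, if given, the trimmed body matches."""
    if status != expected_status:
        return False
    return expected_body is None or body.strip() == expected_body


def run_checks(base_url, fetcher=fetch):
    """Run every check against base_url and return {path: passed}."""
    results = {}
    for path, expected_status, expected_body in CHECKS:
        status, body = fetcher(base_url.rstrip("/") + path)
        results[path] = result_ok(status, body, expected_status, expected_body)
    return results
```

Calling run_checks("http://host.example.com/mozilla-sync") should return True for both paths on a working install. Connection failures are left to raise, since an unreachable server is a different problem from a misconfigured one.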

 

Using MySQL instead of SQLite:

I tend to standardise on using MySQL where possible for all my web service applications since I have better and more robust monitoring and backup tools for MySQL databases.

If you want to set up Mozilla Sync Server to use MySQL, it’s best to get it working with SQLite first and then try with MySQL to ensure you don’t have any issues with the basic setup before doing more complex bits.

  1. Obviously the first step should be to set up MySQL server. If you haven’t done this yet, the following commands will install it and take you through a secure setup process to password protect the root DB accounts:
    yum install -y mysql-server
    /etc/init.d/mysqld start
    chkconfig --level 345 mysqld on
    /usr/bin/mysql_secure_installation
  2. Once the MySQL server is running, you’ll need to create a database and user for Mozilla Sync Server to use – this can be done with:
    mysql -u root -p
    # or without -p if no MySQL root password
    CREATE DATABASE mozilla_sync;
    GRANT ALL PRIVILEGES ON mozilla_sync.* TO mozilla_sync@localhost IDENTIFIED BY 'examplepassword';
    flush privileges;
    \q
  3. Copy the [storage] and [auth] sections from /etc/mozilla-sync-server/sample-configs/mysql.conf to replace the same sections in /etc/mozilla-sync-server/mozilla-sync-server.conf. The syntax for the sqluri line is:
    sqluri = mysql://mozilla_sync:examplepassword@localhost:3306/mozilla_sync
  4. Restart Apache (very important, failing to do so will not apply config changes):
    /etc/init.d/httpd restart
  5. Complete! Test from a Firefox client and check table structure is created with SHOW TABLES; MySQL query to confirm successful configuration.
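
Since a typo in the sqluri line is an easy way to break things, here is a small sketch that pulls the URI apart so each component can be eyeballed. It just uses Python’s generic URL parser, not whatever the sync server itself uses, and the function name is my own.

```python
from urllib.parse import urlsplit


def parse_sqluri(sqluri):
    """Split a driver://user:password@host:port/database URI into its parts."""
    parts = urlsplit(sqluri)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


print(parse_sqluri("mysql://mozilla_sync:examplepassword@localhost:3306/mozilla_sync"))
```

The user, password and database printed here should match what was used in the CREATE DATABASE and GRANT statements in step 2.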

 

Other Databases

I haven’t done any packaging or testing for it, but Mozilla Sync Server also supports memcached as a storage database. There is a sample configuration file supplied with the RPMs I’ve built, but you may also need to build some python26 modules to support it.

 

Other Platforms?

If you want to package for another platform, the best/most accurate resource on configuring the sync server currently is one by Fabian Wenk about running it on FreeBSD.

I haven’t seen any guides to packaging the application, the TL;DR version is that you’ll essentially need server-full, server-core, server-reg and server-storage, plus all the other python-module dependencies – take a look at the RPM specfiles to get a good idea.

I’ll hopefully do some Debian packages in the near future too, will have to work on improving my deb packaging foo.

 

Warnings, issues, small print, etc.

These packages are still quite beta, they’ve only been tested by me so far and there’s possibly some things in them that are wrong.

I want to go through and clean up some of the Python module RPMs at some stage as I don’t think the SPEC files I have are as portable as they should be, commits back always welcome. ;-)

If you find these packages useful, please let me know in comments or emails, always good to get an idea how people find this stuff and whether it’s worth the late nighters. ;-)

And if you have any problems, feel free to email me or comment on this page and I’ll help out the best I can – I suspect I’ll have to write a Mozilla Sync Server troubleshooting guide at some stage sooner or later.

IBM x3500 M3 Server

I recently got to play with a nice shiny new IBM x3500 M3 server ordered for a customer to replace a previous IBM x3400 M2 that had become a bit too acquainted with a sprinkler system….

These machines offer a good mix of features that makes them suitable for small and medium businesses, with the option for both SAS and SATA drives, dual CPU sockets and up to 192GB RAM in a (large) tower format.

Whilst not for everyone, I love the IBM xseries industrial design.

The only issue is that they sometimes miss certain handy features that competitors like Dell are shipping in their machines – one such feature being eSATA, which I find really handy for small business customers doing backups onto external hard disks.

With the x3500 M3 the server ships with UEFI instead of a legacy BIOS, sadly it doesn’t seem to speed up the server boot time but hopefully as they start to build a better design around UEFI this issue will improve in future releases.

I still have high hopes for what they could accomplish with UEFI, but so far it seems to be mostly a system for booting a BIOS-like mode so I’m not sure what has actually been accomplished other than to add more layers worthy of Inception.

As standard these machines ship with a single power supply, for redundancy you will probably want to order the Redundant Cooling & Power kit to get a second supply, along with several more fans you don’t really want or need.

(Tip: On older models, if you dislodged any fans by accident, the server will think there’s been a fan failure and will run all the other fans at maximum speed, which is incredibly loud. In normal operation, it should be reasonably quiet with the fan speed dynamically slowing.)

Enough fans for a small hurricane.

IBM is moving towards 2.5″ drives being the size of choice, so take care when ordering disks to suit. In the case of the model we purchased, it shipped with 8x 2.5″ SATA/SAS bays as well as a big general bay area and mounts for older existing 3.5″ disks.

I presume this large bay is where additional 2.5″ bays could also be installed if you have particularly large storage requirements.

I do love the tiny new 2.5" drives, pity they can't reduce the size of the rest of the server to suit....

Most likely you’ll be ordering the machine with additional memory to install. Take note that these servers (like many of IBM’s) are particularly explicit about which slots the memory modules must be installed into.

And if you’re ordering a lot of RAM take a careful read of the product manual – what I see with the memory installation instructions hints that certain DIMM slots are only usable with a second CPU.

Memory installation instructions are on the side panel/lid.

The best part of the x3500 M3 is that it ships with an IBM Integrated Management Module as a standard feature. This allows full management of the server remotely via a web browser, including viewing the screen all the way from power on, through UEFI/BIOS and into the OS, eliminating any need for a network connected KVM.

This is particularly great for us, since a customer who is ordering a tower server typically only has a couple machines at the most and isn’t going to want to invest extra money for remote access – having it as a standard feature makes our lives a bit easier without costing extra.

Kernel panicked your box? No worries, a reboot is just a click away!

I was also happy to find that instead of some nasty flash plugin or windows-only application, the IMM browser interface works fine on my Linux machine and even the Java-based KVM functionality works fine under Linux and OpenJDK.

Don't mess with those BIOS settings in that tiny server room, do it from the pub! (or maybe don't, alcohol and BIOS settings sounds like a recipe for disaster....)

The one problem I did have with the IMM is that they made the process of the first login a bit harder than needed, with some obscure default admin user/password details, but then allowing the user to continue to use these insecure credentials for ongoing maintenance of the server.

Naturally you’ll want to change the passwords of the IMM because having randoms login and reboot your server isn’t exactly desirable… You should also setup and force HTTPS as well, to ensure there aren’t any insecure connections established sending keystrokes without encryption.

 

I think the IBM x3500 M3 series servers certainly have room to improve – they’re physically overly large, UEFI still boots slowly, the H/W RAID configuration interface leaves a lot to be desired and the lack of a built-in eSATA port is very annoying.

But when it comes to the manageability and expandability of the platform, they hold their own and for businesses with a single primary server I think they’re a great option without needing massive investment in management infrastructure.

Why I hate URL shorteners

I’ve used Awstats for years as my website statistics/reporting program of choice – it’s trivial to setup, reliable and works with Apache log files and requires no modification to the website or usage of remote tools (like with Google Analytics).

One of the handy features is the “Links from an external page” display, which is a great way of finding out where sudden bursts of hits are coming from, such as news posts mentioning your website or other bloggers linking back.

Sadly over the past couple of years it’s getting less useful thanks to the horrible wonder that is URL shortening.

URL shorteners have always been a controversial service – whilst they can be a useful way of making some of the internet’s more horrible website URLs usable, they cause a number of long term issues:

  • Centralisation – The internet works best when decentralised, but URL shortening makes a large number of links dependent on a few particular organisations who may or may not be around in the future. There have already been a number of link shortening companies who have closed down, killing large numbers of links, and there will undoubtedly be more in the future.
  • Link Hiding – Short URLs are a great way to send someone a link and have them open it without realising what content they’re actually about to open. It could be as innocent as a prank for a friend or as bad as malware or scamming websites.
  • Performance – It takes an extra DNS query (or several) to look up the short URL servers before the actual destination can be looked up. This sounds like a minor issue, but it can add up to a number of seconds on high latency connections (eg mobile) or when connecting to international content on NZ’s wonderful internet.
  • Privacy – a third party can collate large amounts of information about an individual’s browsing history if they have a popular enough URL shortening service.
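
To make the performance point a little more concrete, here is a minimal Python sketch that walks a redirect chain and lists the extra hosts a client has to resolve and contact before it reaches the real content. The redirect table, the example hostnames and the function names are all made up for illustration; a real client would discover each hop from an HTTP 301/302 response rather than from a dictionary.

```python
from urllib.parse import urlsplit

# Hypothetical redirect table standing in for real HTTP 301/302 responses.
REDIRECTS = {
    "http://short.example/abc": "http://other.example/xyz",
    "http://other.example/xyz": "https://www.example.org/some/long/path",
}


def expand(url, redirects, max_hops=10):
    """Follow redirects in a lookup table and return every URL visited, in order."""
    chain = [url]
    while url in redirects:
        if len(chain) > max_hops:
            # Guards against redirect loops between shorteners.
            raise RuntimeError("too many redirects")
        url = redirects[url]
        chain.append(url)
    return chain


def extra_hosts(chain):
    """Hostnames that had to be contacted before the final destination."""
    return [urlsplit(u).hostname for u in chain[:-1]]


if __name__ == "__main__":
    chain = expand("http://short.example/abc", REDIRECTS)
    print(extra_hosts(chain))
```

Each hostname returned by `extra_hosts` stands for at least one DNS lookup and one HTTP round trip that would not have happened with a direct link.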

Of course URL shortening isn’t entirely evil; there are a few valid use cases where it is acceptable, or at least forgivable:

  • Printed materials with URLs on them for manual entry. Nobody likes typing more than they need to, that’s understandable.
  • Quickly sending temporary links to people via IM or email where the full URL breaks due to the client application’s inability to parse the URL correctly.

Anything other than the above is inexcusable. Computers are great at hiding the complexities of large bits of information, so there’s no need for your blog, social network or application to use short URLs where there is no human entry factor involved.

Twitter is particularly guilty of abusing short URLs – part of this was originally historic, but when Twitter had the opportunity to fix it, they chose instead to contribute further towards the problem.

Back in the early days of Twitter, there was no native URL handling, so in order to fit links into the maximum tweet size of 140 characters, users would use a URL shortener such as the classic tinyurl.com or more recent arrivals such as bit.ly to keep the URL lengths as small as possible.

Twitter later decided to implement their own URL shortening service called t.co and now enforce the re-writing of all URLs posted via Twitter to use t.co links, in a semi-transparent fashion where some/all of the original URL will be shown in the tweet, but the actual hyperlink will always go through t.co.

This change offered some advantages to users: they were no longer dependent on external providers who might close down and break all their links, and there are some security advantages in that Twitter maintains lists of bad URLs (URLs they consider to serve malware or other unwanted content) to help stop the spread of dodgy content.

But it also gave Twitter the ability to track click data to figure out which links users were clicking on, and I imagine this information would be highly valuable to advertisers. (Google does a very similar thing with its results pages, where all clicks are first directed through a Google server to track which results users select, before the user is delivered to the requested page.)

The now mandatory use of URL shorteners on Twitter has led to a situation where it’s no longer easy to track which tweets, or even which tweeters, are the source of your hits.

Even more confusingly, the handling of referrer URLs is inconsistent depending on the browser/client following the link. The vast majority will log the short URL as the referrer, but some will be smart enough to provide the URL of the page the link was on *before* the redirect took place.

RFC 2616 doesn’t touch on how shortened URLs should be handled when referring, and leaves the question of how 301 redirects should have their referrers handled up to the implementer’s decision. And there are valid arguments for using either the original page or the short URL as the referrer.

For example, for this tweet I have about 9 visits via http://twitter.com/jethrocarr/status/170112859685126145 and 29 visits via http://t.co/0RJteq3r, which throws out hit-count based ordering of the results:

Got to love Twitter & shortened URLs - most of these relate to tweets, but to which tweets? No easy way to track back.
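
One way to work around the split counts is to fold known short URLs back into the tweet they belong to before sorting by hits. The sketch below is plain Python rather than anything AWStats provides: `merge_referrers` and the `aliases` table are hypothetical, and the table would have to be maintained by hand, since the log files alone don’t say which tweet a t.co link came from. The counts are the ones quoted above.

```python
from collections import Counter

TWEET = "http://twitter.com/jethrocarr/status/170112859685126145"

# Referrer hit counts as quoted in the post.
hits = {
    TWEET: 9,
    "http://t.co/0RJteq3r": 29,
}

# Hand-maintained mapping of short URL -> the tweet it appeared in (an assumption).
aliases = {
    "http://t.co/0RJteq3r": TWEET,
}


def merge_referrers(hits, aliases):
    """Add each referrer's hits to its canonical URL, if one is known."""
    merged = Counter()
    for referrer, count in hits.items():
        merged[aliases.get(referrer, referrer)] += count
    return merged


if __name__ == "__main__":
    for referrer, count in merge_referrers(hits, aliases).most_common():
        print(count, referrer)
```

With the two entries above, the tweet ends up with a single combined count of 38 (9 + 29), so hit-count ordering works again for any short URL you have been able to identify.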

A much better solution would have been for Twitter to display shortened versions of URLs in the tweet text to meet the 140 char limit, but with the actual link href featuring the full URL – for example, a tweet could have “jethrocarr.com/i-like-…..” as the link text to fit within 140 chars, but the actual href would be the full “jethrocarr.com/i-like-cake” URL.

Whilst tweets are known as being 140 chars, there’s actually far more information than that stored about each tweet: location co-ordinates, full URL information, date, time and more, so there is no excuse for Twitter to not be able to retain that URL data – of course, that information has value for them for advertising and tracking purposes, so I wouldn’t expect it to ever go away.
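
As a rough illustration of that idea, the following Python sketch shortens only the visible link text and leaves the href untouched. The `render_link` function and its `max_text` parameter are invented for this example and say nothing about how Twitter actually renders tweets.

```python
from html import escape


def render_link(url, max_text=20):
    """Return an HTML anchor whose text is truncated but whose href is the full URL."""
    # Drop the scheme from the visible text, as in the example above.
    text = url.split("://", 1)[-1]
    if len(text) > max_text:
        # Keep the first characters and mark the cut with an ellipsis.
        text = text[: max_text - 1] + "…"
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))


if __name__ == "__main__":
    print(render_link("http://jethrocarr.com/i-like-cake", max_text=18))
```

The reader sees a short label, but anything following the link (including the referrer logged by the destination site) deals only with the real URL.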

(As a side note, there’s an excellent write up on ReadWriteWeb about the structure of a tweet and associated information)

 

Overall, shortened URLs are just a pain to deal with and it would be far better if people avoided them as much as possible. Essentially, if you’re using a short URL and it’s not because a user will be manually typing it out, then you’re doing it wrong.

Also keep in mind that many sites have their own shortish URL variations. For example, this article can be accessed via both date/name and ID number:

https://www.jethrocarr.com/2012/02/26/why-i-hate-url-shorteners
https://www.jethrocarr.com/?p=1453

Many people also run their own private shorteners. This is quite common with popular sites, such as news websites wanting to retain control of the link process, and it is a much better idea if you plan to have lots of short URLs for your website for a valid reason.

Virtualbox Awesomeness

Work recently upgraded us to the latest MS Office edition for our platform. Most of our staff run MacOS, but we have a handful of Windows users and one dedicated Linux user (guess who?) who received MS Office 2010 for Windows.

I’ve been using MS Office 2007 under Wine for several years. It was never perfect, but about 90% of the functionality worked, with some exceptions such as PDF export and certain UI and performance artifacts.

With the 2010 upgrade I decided to instead switch to using Windows under a VM on my laptop to avoid any headaches and to fix the missing features and performance issues experienced running Office under Wine.

Whilst I’m a fan of Xen and KVM, they aren’t so well suited for desktop virtualisation as they’re designed more for server environments and don’t offer some of the more desktop focused features such as seamless integration, video acceleration and easy point & click management interfaces.

Instead I went with VirtualBox thanks to it being mostly open source (with the exception of a few extensions for USB 2.0 forwarding and network boot) and having a pretty good reputation as a decent VM application.

It also has some of the user-friendly desktop features you’d expect, such as being able to forward USB hardware through to the guest, mount any folder on the host as a network share (without needing to set up Samba) and use 2D/3D video acceleration.

But the real killer feature for me was the seamless windows feature, which allows me to boot the virtual Windows desktop and run Windows applications alongside my Linux applications smoothly and without the nastiness of an RDP window.

Windows & Linux application windows running together concurrently.

Sadly it’s not quite good enough to run the latest Windows games in, as the 3D acceleration is quite basic, but it’s magnificent for just about any other non-multimedia application.

The only glitch I found is that if you have dual screens, you can only run the Windows session on one screen at a time, although VirtualBox does allow moving the session between monitors whilst running so it’s not too big a deal.

The other annoying issue I had with VirtualBox is that it uses image files for storing the guest VMs and it doesn’t appear possible to get it to use an LVM volume instead – so in my case, I waste a bit of space and performance for unnecessary filesystem formatting to store the Windows VM. I guess this is a feature that only a small subset of users would want so it’s not particularly high priority for them to add it.

I’m running Win7 with 2 virtual cores and 1GB of RAM on top of a host with an Intel Core i5 CPU (with hardware virtualisation enabled), 8GB RAM and an Intel 320 series SSD and it’s pretty damn snappy.

As a side note, the seamless window integration also works for Linux-based guests, so you could also do the same on top of a Windows host, or even Linux-on-Linux if desired.

Johnsonville Train

I was in Wellington a week ago for several work projects and ended up on a train out to Johnsonville to help my good friend Tom with his wifi/cable modem issues at his new flat, now that #geekflat is over. :'(

It’s not a secret that I love trains: a good deal of my pre-computing childhood was spent reading train books and visiting the Silverstream Railway in Wellington (I think I was the youngest member at the time), and when I was younger Dad would sometimes take me out on Wellington’s suburban trains for daytrips.

The fact that Wellington’s rolling stock was (and in many cases, still is) positively ancient made it fantastic for a young train fan, since all the locomotives made such great noises, screeching and rattling around the place.

Until the 2011 introduction of the Matangi FP Class trains, most of the Wellington region passenger trains were the NZ EM/ET class dating back to 1982 or, even worse, the NZ DM/D class trains which date all the way back to 1938.

DM/D train running the Johnsonville Line in the foreground. An EM/ET class in the background.

The current Johnsonville Line was laid and the current Johnsonville station opened in 1938, which replaced the original rail line dating back to 1885. If you’ve caught a train on it recently, you might be forgiven for thinking that nothing has changed since.

This will be changing: new Matangi trains have been successfully tested on the Johnsonville Line and will finally be replacing the DM/D class – which, whilst it will make for a smoother trip, will make it slightly less exciting for train fans. ;-)

There’s a great YouTube video of the whole trip at about 23 minutes which gives you an idea of the noise, but if you’re just wanting a quick idea of the route and the number of tunnels, there’s a timelapse version. :-)

Die Flash, Die!

“I hated flash whilst it was still cool!” – Jethro Carr, Internet Hipster

Adobe Flash has to be one of the more polarizing internet technologies out there. People either love it or hate it, but either way, it’s difficult to avoid. It’s used as the default for playing YouTube videos, many online browser games, banner ads, “smart” uploaders and a large number of adult websites.

It’s also used for some important systems – Air New Zealand make heavy use of it for their Airpoints membership page (in fact it’s not possible to log in unless you have Flash), which is extremely poor from a large company that should know better, along with a few too many enterprise web applications I’ve come across.

Whilst Flash has had a reputation for poor performance, CPU eating and battery-life killing, these are all implementation faults – the primary issue with Flash has always been that it’s a proprietary application and a proprietary standard.

If Adobe had simply allowed Flash to become an open standard and open sourced the Flash player, many of the technical issues with it would have been resolved by the developer community, and it would have become more ubiquitous with ports to other platforms that Adobe might consider “too small” to be worth spending developer time on.

Adobe didn’t even release specifications and allow free licensing until 2008, when they kicked off the Open Screen Project – but it’s a big catch-up game for other applications to fully implement the specification needed to support Flash applications. And the Flash player itself is still fully proprietary: if Adobe doesn’t want to support a platform or a browser, you’re effectively screwed.

Open source projects like Gnash are slowly catching up. When I tried it recently it was good enough to allow me to play YouTube videos and use some other Flash features, but it would fail on more complex applications such as Air New Zealand’s abomination of a website, so depending on your needs, you may still be chained to Adobe’s player.

 

Flash on Linux has always had a particularly rocky history – historically Adobe made a plugin available but only supported the i386 platform, which meant years of using 32-bit to 64-bit wrapper libraries in order to run Flash on modern 64-bit Linux systems, leading to all sorts of wonderful performance, memory and audio issues.

A 64-bit alpha plugin emerged relatively recently and Adobe now supports 64-bit Linux as part of their official downloads, but other platforms such as PPC, MIPS and ARM are still unsupported – an issue which becomes more and more apparent as vendors release ARM based smart-phones and tablets and are unable to install flash player on them.

Adobe has now announced that they will be dropping support for Flash on Linux for anything but Google’s Chrome browser, which has its own special built-in Flash binaries – I suspect this will mean that it won’t extend to supporting the open source build of Chrome (called Chromium), which currently excludes the Flash support.

Of course, for users of other browsers like myself (e.g. Firefox), this decision is short-sighted and very frustrating – a textbook example of the problems with relying on proprietary software and standards.

Thankfully Adobe did at least realise that this decision is going to result in a lot of users sticking with the final 11.2 version on Linux and is promising to support 11.2 with security updates for another 5 years, so at least we won’t have thousands of users running around with vulnerable flash players – Flash Player does have a reputation for security holes after all.

 

On the positive side, Flash is dying.

Adobe has already announced plans to stop supporting mobile platforms like Android in favor of Adobe Air, although Adobe Air sounds like they’re making the mistakes of Flash all over again, unless they allow fully HTML5 based Air applications to run without need for a browser plugin in future.

Apple has always refused to support Flash on the iOS platform (iPhone/iPad) and recently stopped shipping Flash with MacOS on the MacBook Air by default. (In a hilariously ironic statement, Apple criticized Flash for being a proprietary locked down platform, whilst happily ruling the iOS platform and App Store with an iron fist.)

HTML5 along with JavaScript is quickly securing its place as the web platform of choice for rich UI web application developers and I expect we’ll see more and more tools and frameworks to make working with these technologies easier.

You can even watch YouTube videos in HTML5 if you have a capable browser (recent versions of Chrome or Firefox will work) under their HTML5 trial.

Hopefully projects like Gnash are able to complete their implementation of Flash to a sufficient level to support legacy websites and applications, although by the time this happens, it may be that we won’t need it any more.

 

If Adobe had just open sourced Flash Player and the standards years ago, maybe this wouldn’t have been the case and we’d all be running stable open Flash implementations already. Adobe only has itself to blame for Flash’s demise.

But they won’t see any tears from me.