Linux Kernel development case study

History

The first release was in 1991. Linus Torvalds announced this release on the Minix Usenet newsgroup. Many of those on that group reviewed his kernel and contributed code towards it in the form of patches. Linux initially used Minix as the platform on which the kernel was compiled, and used various parts of the Minix toolchain (e.g. the shell, the Emacs text editor and a C compiler). By version 0.11, in December 1991, Linux had become "self hosting", i.e. capable of compiling and installing itself. In 1992 Linux development moved from the Minix newsgroup to its own group: comp.os.linux.

The title of Linus's autobiography, Just for Fun, describes some of the motivation, though the book also suggests a learning motive: Linus was unable to learn as much as he wanted about his computer by using existing packaged operating systems.

Andrew Tanenbaum was interested in keeping Minix small as a learning tool to support his computing science classes. Also Minix, while source available, was commercial software. Linux filled a gap because no other free software operating system was available: BSD Unix was engaged in a legal dispute and the GNU Hurd kernel wasn't ready. The consequence was that Linux attracted development interest away from other systems.

Release early, release often

Linux has had thousands of point releases since its initial release. Linus has only been able to manage one of the various branches. When there has been a large gap between stable and development releases, other maintainers have looked after the stable releases, e.g. by backporting fixes and drivers, so that Linus can concentrate on the development version. Stable releases tend to be feature frozen.

Other branches

When is a branch not a fork ? When it is not intended to go in a different direction for a long time. Forking tends to duplicate effort, but branching recognises a differentiated user community who need different versions. Some developers have maintained branches intended for research, e.g. into innovative filesystems or scheduling. Others (e.g. Andrew Morton or Alan Cox) have seen their branches as staging posts for patches to be tested before possible inclusion in the mainstream. Other special purpose branches exist, e.g. for embedded systems. Distributions, e.g. Ubuntu, Debian and Red Hat, have their own patch sets.

Some branching is inefficient. A developer of a product that requires Linux kernel modifications who does not feed patches upstream will find maintaining their own steadily diverging branch increasingly costly.

Distributed Testing

Linus's law, as formulated by Eric Raymond: given enough eyeballs, all bugs are shallow. On one occasion I needed a device driver that wasn't yet in mainstream, so I applied the Alan Cox patch to the last Linus version, and it didn't compile. I sent the compilation error message to Alan, showing him how it broke. Since when were Linus or Alan able to compile all their code on every possible architecture, let alone carry out extensive run time testing on every possible CPU design ?

CPU Portability

Wrist watches, supercomputers and everything in between. ARM, PowerPC, 680X0, I386, AMD, Via, Intel, Sparc, big endian, little endian - you name it, the chances are some version of Linux runs on it. How come ? Answer: source code, and the freedom for anyone with the urge, or with money to earn, to hack on it.

So who contributes ?

See Who wrote 2.6.20?

Linux isn't a volunteer-created system any more. Big business depends upon it rather too much for that. At least 65% of the code that went into 2.6.20 was written by people working for companies - possibly as much as 84% or more.

Stable/Production versioning

Version 0 went through very many point releases, none of which was officially considered stable. After the release of version 1.0 in 1994 (with 196,000 lines of code), the 1.0 series was considered stable and the 1.1 development series was opened. Version 1.2 was released as the next stable series in 1995 (310,950 lines of code). Linux 2.6.25 was released in April 2008 with 9,232,484 lines of code.

Before version 2.6, odd minor numbers (the second number, e.g. 2.5) denoted development series and even minor numbers (e.g. 2.4) denoted production series. This practice was changed with version 2.6, due to problems getting the 2.5 series stabilised and sufficiently well tested, and with the growing demand to backport functionality to the 2.4 series. With the 2.6 series every point release goes through a time limited feature enhancement merge window followed by a stabilisation window. Other maintainers apply bugfixes to the kernel versions in use, e.g. 2.6.24.x where x is a minor patch version.
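As a rough illustration of the numbering convention described above, the following sketch classifies a kernel version string by the parity of its second number (the 5 in 2.5). The function name and the returned labels are hypothetical and chosen only for this example; the kernel project itself defines no such function.

```python
def classify_series(version: str) -> str:
    """Classify a kernel version string under the historical numbering scheme.

    Illustrative only: the labels returned here are made up for this sketch.
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    if major == 0:
        # No version 0 release was officially considered stable.
        return "pre-stable"
    if (major, minor) >= (2, 6):
        # From 2.6 onwards parity no longer matters: every point release
        # has a merge window followed by a stabilisation window.
        return "merge-window"
    # Before 2.6: an odd second number means development, even means stable.
    return "development" if minor % 2 == 1 else "stable"


if __name__ == "__main__":
    for v in ["0.11", "1.0", "1.1", "2.4.36", "2.5.75", "2.6.24.3"]:
        print(v, classify_series(v))
```

Only the first two components are examined, so a bugfix release such as 2.6.24.3 is classified in the same way as 2.6.24.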

Subsystem Development

Given the number of new and changed lines of code in each release, Linus clearly can't review all the patches himself. He delegates this work to trusted subsystem maintainers.

Subsystems of a kernel include various categories of device drivers, process schedulers, memory allocators and pagers, security subsystems and filesystems.

What does this say about future production of artefacts requiring collaboration ?

We're still working this out, but eliminating transaction costs plays a very significant part in achieving mass collaboration in the creation of high quality products. Other examples include Wikipedia. Proprietary software producers are not able to eliminate transaction costs, so they can only obtain subcontracted software from another supplier by dealing with or buying that supplier. As the size of useful systems increases, counted by lines of source code, question marks arise over the ability of proprietary competitors to manage similar productions in-house, given the limits on their ability to manage such large organisations as efficiently as self-organising groups of independent collaborators can.

How can one company trust a competitor to develop a critical resource ?

By employing development coordination services using trusted independent developers who demonstrated commitment to the project by investing their time in it before they got paid to do it. Linus and Andrew Morton are now paid to work for The Linux Foundation, which receives grants from the major commercial beneficiaries.

I've been told Linux infringes various patents

Patents on software as such are not yet legal in the EU. In the US software patents can have a chilling effect. But which patents ? SCO made infringement claims against Linux and went bankrupt trying and failing to prove them. No-one developing Linux wants to spend time finding out which patents might be infringed, because knowing this would increase potential damages. Microsoft has claimed that Linux infringes its patents but won't state which ones. In practice it seems unlikely that any valid patent which could not be rapidly coded around could be litigated against the companies supporting the development of Linux and benefiting financially from it.

Why can't I create my own system and call it Linux ?

Someone tried this by registering the name first, was sued and lost. Doing so now would fall foul of the service mark subsequently registered by the Linux Mark Institute, which also helps protect the rights of owners of derived trademarks, e.g. Red Hat Linux.