You and a similarly skilled, large, set of associates are charged to keep running an even larger set of computers, for an indefinite time. The operating systems for the computers are all installed by others, others that you have no contact with. Each computer must stay up, no downtime allowed, or it will be scrapped and returned from whence it came. During your maintenance, each computer will process terabytes upon terabytes of data, and produce and discard data in similar quantities. You can monitor the data received by each, and you can inject your own data into the stream, but you can’t stop the flow of data.
Careful examination of each computer’s operating system reveals many similarities, both in how the running programs interact and process data, and in the various machine code components of the installed programs and OS. There are also differences. Except in very rare cases, each computer has a slightly different installed OS from its peers, the differences varying significantly in the effort needed to discern them.
Some portion of the computers crash from various kernel panics in a given year, while the remaining machines all appear unaffected. Over time, you and your team discover that certain “patches”, sorts of data sent to, or programs run by, a computer affect the likelihood of a certain sort of crash on that computer. These patches start to be used on some computers that exhibit indications that they may soon crash, some wonderfully successful, many only partially so.
In an effort to understand the crashes, you and your associates make the superhuman effort to determine the complete, exact, machine code used by a representative computer. You are then able, with relatively little effort, to discern whether any other particular computer differs from your representative, for some particular chosen part of the OS.
There’s just one problem. Not one of your associates, or yourself, actually understands any of the computers’ machine code.
You know, so you believe anyway, which parts of the code are operations, which are data and which are simply placeholders. So you use statistical methods to associate particular kernel panics with particular differences in the data received – and then later associating panics to differences in each OS. You even start taking the bold move of changing some of the OS in various other computers that you and your team have arranged access to, simply so you can observe the change in how data is processed by those other machines.
Little progress is made at preventing the crashes in the 10 years since the first computer’s code was determined, at least for efforts directly connected to the project. Observers of your team’s effort start to criticize the work and complain of the cost involved. Your team remains steadfast that the code will eventually lead to a deep understanding of how each computer processes its data – but is humbled by the now clearly seen magnitude of the effort remaining – the effort to understand the “genome”, the kernel used by each of your “patients”, which your team of “doctors” are charged to maintain, patching them with an ever growing supply of “medicines”, all to reduce the number of “deaths” experienced each year.
I wish the project luck.