On Variety in Systems

Variety may be the spice of life, but it can also be the bane or the godsend of a system.

Fair Warning: this is a long post.

One airline configures the digital dashboards in its planes so that each display looks virtually identical to the analogue gauge that occupies the same position in other planes of the same type. Now any pilot can recognize the gauge in any plane. This increases safety while reducing training cost. In this case the lack of variety is the enhancing factor.

Many examples of this could be called up: standard brick sizes, track widths, final exams (if we ever get to that point), outlet voltages, gasoline specifications, and on and on and on. I am sure every reader can add dozens of better examples. The lack of variety increases inter-operability and interchangeability of parts, components, student marks, whatever.

In contrast, a project with a large variety of contributors can be disastrous. Nobody is fully in charge of everything, unless you have an overarching project architect with overarching authority and overarching insight. The latter can happen, but on very large projects it seems unlikely, perhaps not even humanly possible.

Yet there is a system type that definitely benefits from internal variety. This is almost any system of interconnected inter-operating intercommunicating parts. That’s too abstract, so I will use one I actually saw in action as an example.

This occurred long before there was an Internet, so please be patient with the outdated message technology. It was state of the art, once upon a time.

A certain banking system used a message concentrator to trade off line capacity for number of lines. All the branches in an area would have their teller message traffic passing through this concentrator. There were perhaps a dozen or so concentrators, all of which passed their messages, on high speed lines, to and from the central computer. In what follows the central computer is abbreviated CP, and the concentrators RP, for regional processors. The RP assembled messages into large blocks, especially during times of heavy traffic. The CP unravelled the blocks, processed each message on its own merits, and responded back to the RP, again in large blocks if traffic was heavy.

Nothing exciting here. Except when one RP was temporarily unable to reach the CP. Messages from the tellers backed up in it; getting no response, tellers would hit “reset” and key in the customer’s transaction again, perhaps several times. They were frustrated.

Then the RP somehow got reconnected to the CP. Now this unnaturally big block of duplicate and triplicate transactions arrived at the CP. Since transactions for the same terminal, or same account, are not allowed to process simultaneously, the duplicate and triplicate transactions were held in CP memory in a sort of wait state while their twins processed ahead of them.

This overloaded the CP with transactions, the majority of which were just waiting behind those actually processing. Eventually, transactions from other RPs were backed up.
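To make that wait state concrete, here is a minimal Python sketch. It only illustrates the rule described above (one transaction at a time per terminal and per account). The function name, the terminal and account labels, and the sample block are all made up for the illustration; nothing here is taken from the real CP.

```python
def cp_occupancy(block):
    """Count how many transactions in a block can process right away and how
    many must wait because their terminal or account is already busy.

    `block` is a list of (terminal, account) pairs. Hypothetical model only.
    """
    busy_terminals = set()
    busy_accounts = set()
    processing = 0
    waiting = 0
    for terminal, account in block:
        if terminal in busy_terminals or account in busy_accounts:
            # A twin is already in flight, so this one just occupies memory.
            waiting += 1
        else:
            busy_terminals.add(terminal)
            busy_accounts.add(account)
            processing += 1
    return processing, waiting


# Three tellers, each of whom keyed the same transaction three times.
backlog = [("T1", "A100"), ("T2", "A200"), ("T3", "A300")] * 3
print(cp_occupancy(backlog))  # (3, 6): two thirds of the block is just waiting
```

In this toy backlog the re-keyed copies outnumber the real work two to one, which is the shape of the overload in the story.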

You guessed it. In the areas served by those other RPs, frustrated tellers were hitting reset and re-keying their customers’ transactions.

Once the system got itself into this situation, it could not get itself out of it. There was always one area backlogged, waiting to saturate the CP and in turn backlog other areas.

The fact that every RP reacted exactly the same way meant that there was a sort of evil resonance that kept itself going.

That’s a long example; here is a shorter one.

There used to be a message system within a certain large corporation, whose initials are not HAL. This system also preceded the Internet, but it worked more or less like email, and had inter-person scheduling and calendaring features as well.

One facility added was a sign-out message that one set up to inform others of a long absence, such as a vacation. Anyone sending a message to such an absentee would get this “away from the office” message.

Can you guess the catch?

An entire department went somewhere, maybe a critical conference in another city. All of them set up their personal “away” messages.

Then one of them sent a final message to the department, reminding them of something they had to do after the conference.

Every ID in the department received the message and responded, without human intervention, with its personal “away” message. The originator’s ID responded, without human intervention, with his/her personal “away” message, to all of them. All this was automatic and happened very fast, as the system needs no keystrokes to send an automated message, eh?

The system went down. It flooded itself with “away” messages.

I have heard of cases of NT networks that went down because an unrecognized error message caused a broadcast of another unrecognized error message.

Part of the problem in each of these cases was that the similarity of the parts of the network caused each part to overload another part, which in turn overloaded another, and so on.

I am reminded that Ethernet collisions were at one time resolved by waiting for “a random time interval” and trying again. Imagine if the randomizer in each machine were, somehow, generating exactly the same random interval: collisions would be guaranteed to re-collide. Difference is essential here.
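That thought experiment is easy to try in Python. The sketch below is deliberately simplified: each station picks a wait from a fixed window of slots, which is not the actual Ethernet backoff algorithm, and the function name, seeds, and slot count are assumptions chosen for the illustration.

```python
import random


def rounds_until_clear(seed_a, seed_b, max_rounds=1000, slots=8):
    """Return the retry round in which two stations first pick different
    wait slots, or None if they are still colliding after max_rounds."""
    rng_a = random.Random(seed_a)  # station A's randomizer
    rng_b = random.Random(seed_b)  # station B's randomizer
    for attempt in range(1, max_rounds + 1):
        wait_a = rng_a.randrange(slots)
        wait_b = rng_b.randrange(slots)
        if wait_a != wait_b:
            return attempt  # different waits, so the retries no longer collide
    return None


print(rounds_until_clear(42, 42))  # None: identical randomizers re-collide forever
print(rounds_until_clear(42, 43))  # a small round number: differing randomizers clear
```

With identical seeds the two stations choose the same wait on every retry, so the loop never finds a clear round. Give them different seeds and they separate almost immediately.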

So, in some cases, variety is the spice of life. It is the case in genomes, where diversity increases potential selection choices. It can be the case in networks, where failure to “lock in” a bad resonance saves the day. It can be the case in a crowd, where an individual decides to assist another while everyone else feels her- or himself to be “just another nobody” and thus useless.

Conclusions.

In the HAL case, it was arranged so that any ID would only “away” another ID exactly once, on the first instance of message receipt. Each ID made itself slightly different, with different response behaviour.
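A small Python simulation shows how much difference that one rule makes. It is a sketch under stated assumptions, not the corporation's actual system: the function name, the five-person department, and the delivery cap (added so that the runaway case stops) are all hypothetical.

```python
from collections import deque


def simulate(ids, reply_once, cap=10_000):
    """Count messages delivered after ids[0] mails the rest of the department.

    Every ID is away and auto-replies to the sender of anything it receives.
    With reply_once=True, an ID sends its "away" notice to a given ID only
    on the first message from that ID. `cap` stops the runaway case.
    """
    already_told = set()  # (replier, recipient) pairs that were already notified
    queue = deque((ids[0], other) for other in ids[1:])  # (sender, recipient)
    delivered = 0
    while queue and delivered < cap:
        sender, recipient = queue.popleft()
        delivered += 1
        if reply_once:
            if (recipient, sender) in already_told:
                continue  # this sender has already had our "away" notice
            already_told.add((recipient, sender))
        queue.append((recipient, sender))  # the automatic "away" reply
    return delivered


department = ["A", "B", "C", "D", "E"]
print(simulate(department, reply_once=False))  # 10000: stopped only by the cap
print(simulate(department, reply_once=True))   # 12: the exchange dies out
```

Without the rule, every "away" message provokes another and the count stops only because of the cap. With it, the original four messages produce four replies and four counter-replies, and then the system falls silent.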

There is a final joke about the banking system. An unnamed person claims to have made two suggestions regarding the “reset” key. First, put a really big spring under it, so it is not keyed lightly. Second, put a thorn on top of the key. Only the first recommendation was implemented.
