A real scary story

Gather 'round the campfire boys and girls, for I shall tell you a tale to chill your bones! Ok fine it's not really a scary story, but I think it's pretty damn terrifying nonetheless.

In the 1960s, IBM introduced OS/360, an operating system, for their S/360 Mainframe. Both the operating system and the mainframe have had a massive impact on the world as we know it. It would be incorrect to say that we would not have modern operating systems without OS/360; after all, Multics was created around the same time. Multics greatly influenced UNIX, which in turn massively influenced a whole range of modern operating systems like Linux, AIX, BSD, and OS X to name a few. However, the impact the S/360 Mainframe has had on computers is nothing short of revolutionary. Its legacy continues on from the 1960s, the proof being the extent to which it affects our lives to this very day. It is actively developed and improved upon, the most recently announced (at the time of this writeup) being the z13 mainframe and the z/OS V2.2 OS.

A staggering number of the world's financial institutions run their backend transaction applications on the mainframe. How did IBM secure so many top customers? Well the mainframe is one beast of a machine when it comes to reliability as it can achieve "five nines" availability. This is an extremely appealing feature, especially if you are a bank with billions of dollars at stake. Another of its biggest selling points is that it is 100% backward compatible. If you were so inclined, you could take a program running in 24-bit addressing mode written in HLASM back in the 1970s and run it on the latest machine without the need to recompile anything whatsoever (in fact this actually does happen). As a result, it lived up to expectations; the mainframe continues to be used by over 90% of the Fortune 500 companies.
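To put "five nines" in perspective: 99.999% availability leaves a little over five minutes of downtime in an entire year. The sketch below is just that arithmetic. The function name and the 365.25-day year are my own assumptions for illustration, not anything IBM publishes.

```python
def allowed_downtime_minutes(availability_percent, days_per_year=365.25):
    """Return the minutes of downtime per year permitted at a given availability.

    days_per_year=365.25 is an assumed average year length.
    """
    minutes_per_year = days_per_year * 24 * 60
    unavailable_fraction = 1 - availability_percent / 100
    return minutes_per_year * unavailable_fraction


if __name__ == "__main__":
    # Compare three, four, and five nines side by side.
    for nines in ("99.9", "99.99", "99.999"):
        minutes = allowed_downtime_minutes(float(nines))
        print(f"{nines}% -> {minutes:.2f} minutes per year")
```

Each extra nine cuts the permitted downtime by a factor of ten, which is why the last one is so expensive to earn.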

The scary part comes from the people who work with the routines that run on these mainframes. Keep in mind, these guys are charged with writing and maintaining software that affects literally billions of dollars. I sympathize with how extremely conservative they are when it comes to change. That said, some of the things that have happened, and continue to happen, as a result of this conservatism are a little... unconventional, to put it mildly.

Starting in the 1960s, software developers working for banks began to write programs designed to run on mainframes. These could be HLASM routines, standalone COBOL programs, or CICS applications. They were very well tested, as one would imagine a bank would require. Once the behaviour of a program was completely understood, it became untouchable... to this day: there are routines running on mainframes that have not been recompiled since the 1970s! Nor are there any plans to recompile or update them anytime soon. Four decades of change in music, fashion, technology, and politics have gone by, but these programs, fundamental to the world's financial transactions, run just as they did when they were first written.

This raises the question: why don't these mainframe developers recompile and update their code? After all, these routines were compiled for machines that existed decades ago. Even when they are run on the latest mainframe, they cannot make use of any of the new instructions that have since been added to the hardware. Being able to take advantage of these new capabilities would no doubt improve performance many times over. Well, there are three main reasons:

1. The behaviour of the new compiled executable may be different from the original executable
People who develop applications for the mainframe, in general, do not like changes to very well known behaviour. It is a fairly valid concern. Compilers tend to do many optimizations, including Dead Code Elimination and Code Reordering. While these optimizations are functionally correct, they do change the code path. This introduces uncertainties that mainframe developers are not willing to take a risk on. In addition, a compiler is after all just another program. It can and will have bugs. I don't know about you, but if I had billions of dollars at stake, I would much rather pick a slow but stable program over a fast but potentially buggy one (however tiny the risk).

2. The source code is no longer available
Yes, apparently this happens. The source code is no longer there. How can you recompile a program if the source code is no longer available?* How could something like this ever happen you might ask? Well, back then source control wasn't nearly as ubiquitous as it is today. Source code was stored in Data Sets on disk. When migrating machines, it is entirely possible that some of the code might not have been transferred over. There is also the possibility that the source is there but no one has any idea where it is, or, if they do know where it is, they don't know which version of the source is the right one.

3. The executable does not match the source code
That's right. Even if the source is available, the executable's behaviour does not correspond to what is defined in the source code. Now I know what you're thinking: "What the fuck?! How does this happen? Why would anyone even allow this to happen?". Well here's the thing. Mainframe customers are charged by the MIPS. This is similar to how cloud companies who use Amazon's virtual machine deployments get charged. It may be relatively cheap now, but back in the 60s and 70s, it was far too expensive to waste compute resources on a compile to fix a bug. The quick, easy, and cheap thing to do was to directly modify the executable. I know that sounds insane, but if you think about it, it isn't that farfetched. Executables were tiny back then, and the behaviour was very well understood. Compared with trying something similar nowadays with say a Linux kernel module, it was not that difficult to pull off.
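For a feel of what "directly modify the executable" means in reason 3, here is a minimal sketch in Python. Everything in it is hypothetical: the byte values are placeholders rather than real S/360 instructions, and `patch_bytes` is a name I made up, not a mainframe utility. The idea is simply to overwrite a few bytes at a known offset, after checking that the bytes already there are the ones you expect.

```python
def patch_bytes(image, offset, expected, replacement):
    """Return a copy of image with replacement written at offset.

    Refuses to patch unless the bytes currently at offset equal expected,
    and unless the replacement is the same length, so nothing shifts.
    """
    if len(expected) != len(replacement):
        raise ValueError("replacement must be the same length as the original bytes")
    if offset < 0 or offset + len(expected) > len(image):
        raise ValueError("patch does not fit inside the image")

    current = bytes(image[offset:offset + len(expected)])
    if current != expected:
        raise ValueError("bytes at offset do not match what the patch expects")

    patched = bytearray(image)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


if __name__ == "__main__":
    # Placeholder "executable": five arbitrary bytes, not real machine code.
    original = bytes.fromhex("0102030405")
    fixed = patch_bytes(original, 1, b"\x02\x03", b"\xaa\xbb")
    print(original.hex(), "->", fixed.hex())
```

Notice that nothing here touches the source code. Once a patch like this is applied, the running binary and the source on disk quietly disagree, which is exactly the situation described above.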

To summarize: If it ain't broke, don't fix it.

These are some of the lesser known (and more frightening) realities of the tech world. However, the explanations I presented are partially of my own creation, an attempt to reconcile why these realities exist. As concerning as all this may seem, I wouldn't fret too much about it. The various banks have a lot more to lose than we do; the foundations of the financial transactions of the world may be dark and murky, but they are every bit as strong now as they were when they were first built.

Sources:
Chats with compiler developers
http://www-01.ibm.com/software/htp/cics/35/cics_intro/
http://www.longpelaexpertise.com/ezine/zosvUnix.php
http://www-03.ibm.com/systems/z/os/zos
http://www-03.ibm.com/systems/z/hardware/z13.html
http://en.wikipedia.org/wiki/Multics
http://en.wikipedia.org/wiki/Unix
http://en.wikipedia.org/wiki/CICS

* In principle one could decompile the binary into IL, perform the optimizations, and then compile it into a new binary, but this is not as well understood as compiling from source code.
