Computer security is a weird thing. Most people worry
about it, at least a little. A lot of companies make whole
heaping piles of money peddling software and
hardware that purport to finally make all of your data
completely and impregnably secure. The government is
busy producing new laws that provide a legal framework for
the guarantee of security in a networked environment.
But I can't help but feel that somehow everybody is missing
the point. Consumers insist on encryption for the actual
transfer of a credit card number but turn around and hand
their password (with the associated access to the credit
card data for the account) to the first person who instant
messages them claiming to be an AOL employee who needs to
verify their password. Businesses spend a lot of money on expensive
firewalls and sophisticated monitoring software, but then
forget to change default passwords on third-party software
that they install. Software vendors provide products that
are capable of assisting in securing a network, but then make
the default configuration as open to penetration as any '60s
free-love flower child. The left hand hasn't the vaguest
notion of what the left wrist is doing, much less the right hand.
It doesn't matter, though, because all of them are worrying
about entirely the wrong things. I don't care if a box
gets rooted and my company's webpage screams "Fuck Authority" for
a few hours. We have a solid enough backup system that fixing
that would require nothing more than scrubbing the compromised
disks and then running dd to copy a known-good image. It doesn't matter
if you open an email with a nasty virus that blows away all
of your software. Yeah, it's a pain to have to re-install all of
your applications, but you won't even remember it in a year. Who
gives a shit if some high-profile company gets hit with a denial-of-service
attack, even if they're down for a whole day? Let's
be honest: these Internet behemoths have a lot to do to prove that
they have viable businesses, and, in the end, no single 24-hour
period is going to make the decision.
No, what scares me is that the only applications where any thought
is being given to security belong to the class of user-visible
applications. But most of the applications that need to be secured
belong to the class of decision-support applications. These
are the invisible applications that today's corporations, and tomorrow's
individuals, use to provide them with the fundamental information
upon which they make decisions about how to interact with the world.
These applications are tremendously complex, and the vulnerabilities
in them are often exceedingly subtle, but they represent a far greater
threat than any vulnerability seen yet in any email program. Perhaps
an example is in order.
I worked for an Internet company that, internally, took this whole
information revolution thing very seriously. We had a much higher
ratio of certifiable net.fanatics to normal people than any place
I've ever seen in my entire life.[1] Needless to say, we
had a lot of software that had been developed internally that we used
to make all sorts of important business decisions--the kind with
consequences measured in the millions of dollars--every day.
We trusted these systems. We wrote them ourselves, and all of them
were actually extensively reviewed by many engineers at every
phase of the design and implementation. We were careful; we
knew that if these systems went haywire it could cause real
problems. If the wrong set of systems decided to go bad, it
could literally kill the company.
At least one of the systems went haywire anyway. In spite of the
very best efforts of dozens of seriously hard-working programmers,
bugs crept in. They always do; it's inevitable in any real-world
environment. Databases drift--columns are added; tables are added;
as years pass, different applications begin to think that one field
means different things. Source skews--work-arounds are made for
system-specific deficiencies, but aren't documented as such and are
thus preserved through ports to new systems; implicit assumptions
creep in; bizarre dependencies begin to appear.
In our case, it was a system that helped control the quantities of
certain products that we ordered from our suppliers, based on the way
that different customers were using rebates. We called the
system Fred (it was an inside joke among some of the engineers).
We wanted Fred to make sure that we were adequately stocked to
cover whatever response we might see to our various marketing
initiatives. Of course, it fed its data into several other analytic systems
that were used to drive all sorts of marketing activities and
purchasing decisions.
Fred worked fine at first, and actually helped make a significant
difference in the company's financial results the first quarter
it was available. After that, it performed above expectations
for a long time before finally settling down to perform at or
just below the rest of our systems--or at least it did according to
our metrics software, also developed internally.
In reality, Fred worked great for about 18 months. Then it broke when
another application stopped writing to the database the particular piece
of data that Fred depended upon. Apparently that column had only ever
been written as a side effect of the way the other application handled
one part of each customer-record update. When that application was updated,
Fred could no longer find the piece of data it needed.
But Fred did the Right Thing. The system promptly reported
via email to several people that something was wrong and what
it was. The next day the engineer who now owned that part of
Fred (Fred having grown large enough that it took several engineers
to care for him) promptly investigated, quickly ascertained
the cause of the problem, and immediately fixed Fred to simply
look the data up elsewhere. Because he actually cut several dozen
lines of code that had done the previous look-up (there was a more
recent database API available that offered easier access), he felt
good that he had reduced the overall complexity of the system
by just a little bit. And hey, incremental gain is better than no
gain at all.
Fred now appeared to work again. A few months passed, the programmer
who had made the fix decided to retire on his stock-option millions,
a new programmer took over that part of Fred, and life went on. But
Fred was now very quietly doing something very horrible.
A particular counter was being used by both Fred and another customer
analysis program. One of them pushed the counter up while the other
pushed the counter down. Unfortunately, when Fred was updated,
one of the things that changed was the exact timing of the counter
increment.
Another part of Fred assumed that the counter had already been updated
and, in preparing for the next update, scribbled all over several internal
data structures. But under some inputs that assumption was wrong; under
some inputs the counter was never updated at all.
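Stripped of the detail, the class of bug looked something like this (a minimal sketch in Python, with invented names and data--an illustration of the failure mode, not Fred's actual code):

    # One routine is supposed to bump a shared counter for every record;
    # another assumes the bump already happened and uses the counter to
    # decide which internal record it touches.

    records = {i: {"forecast": 100.0} for i in range(10)}
    shared_counter = 0  # also pushed down elsewhere by the other analysis program

    def update_counter(customer):
        global shared_counter
        if customer["has_rebate"]:       # after the "fix", the increment only
            shared_counter += 1          # happens for *some* inputs...

    def prepare_next_update():
        # ...but this code still assumes the counter moved, and uses it to
        # pick the record it scribbles on. With a stale counter it quietly
        # nudges the wrong record instead of failing.
        slot = shared_counter % len(records)
        records[slot]["forecast"] *= 0.99

    customers = [{"has_rebate": n % 3 == 0} for n in range(10)]
    for customer in customers:
        update_counter(customer)
        prepare_next_update()

    # No crash, no error report--just a handful of records that are now a
    # little bit wrong, ready to be fed to every downstream system.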
The result wasn't massive breakage--no, everything looked like it
worked just fine. But every time Fred ran, some essentially random
segment of his data store was corrupted. Not in a major way, but
in a way that made a lot of the statistics and reports that Fred
produced a little wrong.
These statistics and reports, of course, were then used as input to
other decision control systems. The reports produced by these
systems were, in turn, fed into still more systems.
We finally discovered the problem when some of our business people
came to the IT people to say that the computer's purchase recommendations
weren't making any sense. Apparently they hadn't been making sense
for a very long time, but the mid-level marketing people who were using
Fred trusted his data more than they trusted what they read elsewhere in
the media, more than what focus groups said, even more than what they
themselves thought was going to happen in our
industry (more than once I heard someone say, "I thought for sure that
Fred would say to spend more on marketing in FOO than in BAR, but then
he came back and said 'Nope, BAR is better.'" It should have tipped me
off, but it didn't--I trusted Fred, too).
We didn't believe there was a problem, of course, but we were more than
willing to take a look. Over the next day or two we hacked up some more
metrics that fetched their data in a completely different way and performed
their calculations completely differently. That's the only way to debug
software like this: you have to compare it to something that is known to
be good. We didn't have anything comparable, so we rewrote the reporting
(but not the analysis) portions of Fred.
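In spirit, the cross-check looked something like this (a rough sketch in Python; the table and column names are hypothetical stand-ins, not our actual schema):

    import sqlite3

    def forecast_from_fred(db):
        # The existing path: whatever Fred's reporting code had already produced.
        return dict(db.execute(
            "SELECT region, SUM(forecast) FROM fred_reports GROUP BY region"))

    def forecast_rebuilt(db):
        # The independent path: recompute from the raw order and rebate rows,
        # deliberately avoiding Fred's tables and Fred's code.
        return dict(db.execute(
            "SELECT region, SUM(quantity * expected_rate) "
            "FROM raw_orders GROUP BY region"))

    def compare(db, tolerance=0.02):
        # Flag any region where the two independently computed numbers
        # disagree by more than a small relative tolerance.
        a, b = forecast_from_fred(db), forecast_rebuilt(db)
        for region in sorted(set(a) | set(b)):
            x, y = a.get(region, 0.0), b.get(region, 0.0)
            if abs(x - y) > tolerance * max(abs(x), abs(y), 1.0):
                print(f"MISMATCH in {region}: fred={x:.1f} rebuilt={y:.1f}")

    # compare(sqlite3.connect("warehouse.db"))

The point isn't the particular queries; it's that the second set of numbers shares as little code and as little data plumbing with the first as possible.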
It didn't take long for it to become crystal clear that something was
very wrong. We suspended usage of Fred and of every system that depended
upon Fred. This was tremendously painful; people had to be laid off
because we didn't have any software for them to use. We set about trying
to count just how much this had cost us.
In the end it wasn't pretty. We had directly lost nearly a half-billion
dollars as inventory that we'd never be able to sell had to be written off.
We couldn't afford to lose a half-billion dollars; we didn't have it.
And that was only the direct cost; we never bothered to count how much
we incurred in indirect costs from the systems that used Fred's output as input.
We were obviously out of business by then.
The real bitch of it, though, is that I think we may have been attacked. I think
that our company--1,000 people paying their bills, taking care of their
kids, and just trying to live their lives--could have been murdered by our biggest
competitor.
One of the first things we discovered when we began investigating was
that the bug was dependent upon data supplied at our website by our
users. In other words, it was certifiable, grade-A untrustworthy data.
And we did all of the right things with it: we were careful not to overflow
our buffers; we didn't execute anything; we just loaded it into a
database.
But if someone had a copy of our source code, they'd have been able to
exploit the bug. It would have been trivial to put together a little
bot using perl to make an arbitrary number of requests per day into
our systems with bad data.
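Something like the following would have been all it took (sketched here in Python rather than Perl; the URL and form fields are invented for illustration, not our actual site):

    import random
    import time
    import urllib.parse
    import urllib.request

    def submit_bad_rebate_data(n_requests, delay_seconds=60):
        # Perfectly well-formed, perfectly poisonous: values chosen to land
        # in the database and skew whatever reads them downstream.
        for _ in range(n_requests):
            form = urllib.parse.urlencode({
                "customer_id": random.randint(1, 10_000_000),
                "rebate_code": random.choice(["SPRING", "LOYALTY", "BULK"]),
                "claimed_purchases": random.randint(500, 5000),
            }).encode()
            urllib.request.urlopen("http://example.com/rebates/submit", data=form)
            time.sleep(delay_seconds)

No buffer overflows, no shellcode, nothing a firewall or an intrusion-detection system would ever blink at--just ordinary-looking form submissions.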
We had gotten used to our competitors looking at our site continuously,
day in and day out. We even thought it was a good idea, and we looked
at them, too. About a year ago, our biggest competitor had vastly
stepped up the average number of requests per day into our system. This
corresponded to a period during which they rolled out several dozen features
that we had and they didn't yet have. We just assumed that the increased
requests were the result of them reverse-engineering our features. We
played with them for a while, blocking their requests based first on
domain and then on IP block, but we weren't serious about it and they kept
looking at us.
Now I wonder, though. The increase in activity corresponds to when
Fred first started to go out of control. I don't think it would
have been too hard to come up with a copy of our source code--like
most rapidly growing companies, we had a tremendous in-and-out flow of
contractor programmers and DBAs. If our competitor had gotten a copy
of our source they could have masked the attack in the greater volume
of requests that they were making for the reverse engineering.
Anyway, that doesn't matter now. What does matter is that our
security concerns are completely out of whack. We spend too
much time worrying about superficial system security and interface
security and not enough time worrying about the security that matters.
And in the near future we'll begin adopting personal decision-support
devices.
I doubt we'll ever learn.
[1] This is called corporate culture and is actually very
fascinating, but not in an article about computer security.
Endless thanks to sirnonya for proofreading and editing help.