
Or, How Compaq and Netware Nearly Killed Me.

Long ago, in the spring of 1999, my department decided that it was time to move from our distributed file storage system to a centralized file storage system. At the time we had about thirty Netware servers of varying flavors. Each one was an MPR as well as providing file and print services for the local users. Most of these Netware servers were resident on Compaq Proliant 500s. When purchased, these boxes were fairly stout and more than met the requirements to perform their job. That was a long time ago though, and 2 gigs of storage just wasn't cutting the mustard anymore. Most users had more storage space on their desktop PCs but still persisted in the belief that the network drives were virtual bags of holding.

It was decided that what we really needed was one server. One server to rule them all, One client to find them, One NOS to bring them all and in the darkness bind them... Ahem. So, there were a number of emerging technologies that we could take advantage of that would make a central file server available for the entire campus and reliable enough to support several thousand users. Or so we thought.

We went through all the state financing bureaucratic hoops and jumps. We requested money for a honking big server, a honking big tape backup system, and the software to run it. We had regional sales reps from Novell come to our site and pitch their wares. We traveled to HP's home office to review their hardware. Compaq wooed us with stories of 99.99% uptime and limitless terabytes of data storage.

In the end we decided to go with Compaq and Netware 5.1. We were familiar with Netware and had great hopes for the new build. We had great success with Compaq servers in the past, and they seemed a logical choice, especially when they revealed that they had designed a cluster system to work specifically with Netware 5.x. Towards that end we hired MicroAge to design and build a system for us. We wanted a turnkey system. They would put it together, drop it off and it would just work. What we got was a nightmare.

What we specifically got were two Proliant 6400Rs. The two were identical: each had two nine-gig drives, dual PIII 550 Xeon processors and two gigs of RAM. They were each umbilicaled with fibre channel to three RAID arrays, for a massive 550 gigs of storage with the potential to grow by many terabytes as our demands increased. Total price tag: $160,000.

The server was assembled and delivered... to a room with no available power. We had requested it be set up in the campus's main computer room, where a generator was maintained and we would never have to suffer from a power blackout. The only problem was the electricians couldn't be bothered to process our work request and provide the necessary power for the system to use. 160,000 dollars of University hardware sat in the corner, dormant, for six months before the power was activated.

The purchasing department used this time, and then some, to bicker with Novell about the purchase price for an unlimited licensing agreement. The purchasing department insisted on taking the purchase request out to bid, as if somehow, someone else would be able to underbid Novell on a license agreement that only Novell sells.

Time passed. Novell, surprisingly, won the bid for the license. Now all we needed was to get the signatures of practically every person involved with University administration on the purchase agreement. We waited almost two months to get these signatures, only for the agreement to be lost en route. The whole signature process was repeated. Total price tag: $60,000.

Sometime in March of 2000 we had completed the purchase agreement and had received our software. MicroAge, as part of the purchase agreement, sent out a technician to install the software on the servers. When it came time to install the cluster services he requested the appropriate software. A few phone calls later we realized that the Netware clustering service was not included in our unlimited license. Another, separate, bid process was started to acquire the Netware clustering software. A license to connect two servers to one clustered array cost us another $10,000 and 30 more days. Well, it would have been 30 days if they had sent us the right software. We received NCS 1.0 for Netware 5.0, not NCS 1.1 for Netware 5.1. The two are incompatible.

So about May of 2000 the server was finally set up and the clusters were enabled. There was great rejoicing, for a very short amount of time. It didn't quite work right. We called Novell, only to discover that our Unlimited License agreement did not cover support either. 7000 dollars and another round of signatures solved that. Netware Premium Technical Support told us that our problem was definitely hardware related. We contacted Compaq Technical Support. After we assured them that we had installed all of the latest ROMPAQs and that, yes, we had reseated the memory, they insisted that it was a software problem.

So what was the problem? The cluster was designed so that two servers could share the same data volume. If one server went down, then the other came up and took over hosting duties. The switch would take about 20 seconds and as far as the average user could tell, Windows just had a little hemorrhage and went on its merry way. It was a brilliant system, and it worked too. Each time one server went down the other came up, like magic. The only real problem was that it did it about a half dozen times a day. Every hour and a half or so the server would just stop and restart itself, failing over to the other server.
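For the curious, the failover idea described above can be sketched in a few lines of Python. This is only a toy model of an active/passive pair, not how Netware Cluster Services actually works internally; the `Node` and `TwoNodeCluster` names, the `heartbeat` method and the server names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One server in the pair (hypothetical model, not a real NCS object)."""
    name: str
    alive: bool = True


class TwoNodeCluster:
    """Toy active/passive pair sharing one data volume."""

    def __init__(self, primary: Node, standby: Node) -> None:
        self.active = primary
        self.standby = standby
        self.failovers = 0  # how many times hosting duties have switched

    def heartbeat(self) -> str:
        """Check the active node; if it is down, promote the standby.

        Returns the name of the node hosting the volume after the check.
        """
        if not self.active.alive and self.standby.alive:
            self.active, self.standby = self.standby, self.active
            self.failovers += 1
        return self.active.name


a, b = Node("server-a"), Node("server-b")
cluster = TwoNodeCluster(a, b)

a.alive = False                 # server-a stops itself
print(cluster.heartbeat())      # server-b takes over
a.alive, b.alive = True, False  # server-a restarts, then server-b stops
print(cluster.heartbeat())      # back to server-a
print(cluster.failovers)        # 2
```

In this toy model, as in our server room, the failover logic does its job perfectly. The part that needs explaining is why the failover counter keeps climbing.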

After some research we found that a Compaq utility running on the server was reporting a strange error at a specific Netware memory address. Of course, Compaq claimed it was a Netware issue because the utility specifically named Netware. Netware claimed it was a Compaq issue because it was a Compaq utility reporting the error. This was very frustrating.

It was during this time that I learned about Compaq Technical Support's most asinine procedure. Every time I called them I talked to a different technician, even when I called on the same incident number. Every time I called I had to review the entire problem with this new first level technician. They made me go through the same ridiculous procedure every time. They asked if I had updated the ROMPAQs, had I reseated the memory, did I try reseating the processor? Had I talked to Novell Technical Support? One day I had finally had enough. I wanted to talk to the same technician; I didn't want to have to re-explain everything. The operator claimed that I could not directly contact any support technicians. I threatened, yelled and complained. Eventually the operator relented and transferred me to the technician I had previously spoken with. When she answered I explained who I was and she was incredulous. "How did you get this number?" she asked. I explained what I had done and she asked for the name of the operator. I told her I couldn't remember who it was and asked why it mattered. "I have to report this to my supervisor. It's against our policy to take the same call twice."

In July of 2000, fed up with dealing with first tier technical support, I called our Compaq sales agent and told her that we were sick of this server. I told her to take the damn thing back; we were never going to buy another Compaq. The next day I received a call from Compaq Gold Technical Support. Three months after my first call to technical support, my case had been upgraded. We then spent several months going over the details. Technicians came out and investigated our power and our system. Lots of people had ideas; nobody had a solution. By this point the server would crash any time you attempted to copy data down from it. At Compaq's request we boxed up one of the servers and sent it to them whole.

Presumably, while in their custody it was subjected to a battery of tests. They found no problems and sent it back, whereupon it immediately failed again.

On a hunch the Compaq Support ninja asked what type of NIC we were using in our servers. We had installed 3Com 100fx fiber NICs. At the time of purchase Compaq had stopped manufacturing their popular and dependable Netflex series of NICs and had yet to produce their less popular and much more expensive Netelligent NICs. MicroAge had recommended the 3Com NIC.

The technician suggested that perhaps that was the problem. He had no fiber plant in his test lab, so when we shipped the server back to him he had tested it with the EISA UTP card that came with the Proliant. He told me he would attempt to find a Compaq fiber NIC and ship it to me. So, in October we replaced the 3Com NIC with an old Compaq Netflex fiber NIC...

It worked.

Now, before you go assaulting my intelligence, I had replaced virtually every piece of hardware in this machine, including the NICs, with known good, new, out of the box hardware. Of course I had always replaced the NICs with the same brand and model. No one had even suggested, in all the months we co-operated with both Compaq and Netware, that maybe we should try a different brand of NIC. There was a brief amount of celebration, followed by great cursing of Netware and Compaq.

Almost two years after our first request for funds, we are just now starting to implement our great scheme. I am convinced that as we upgrade 7500 client machines all over campus we will encounter many more problems. I am also convinced that by the time we get the system installed and functional, it will be obsolete. Towards that end we have begun the purchase and implementation of hardware routers. Unfortunately, some of us didn't quite learn our lesson last time. It has been decided that instead of using standard Cisco routers, we will use routers made by Enterasys, formerly Cabletron.

I need a new job.
