And, look at that, you're already here too!
On April 29th, around 9:40 AM US CST[1], E2 went offline. In its place was Nate's Word Galaxy as the site was moved off its previous servers to new ones. Around 1 PM US CST[1], the site came back up with some weirdness, but largely functional. We have been on our new servers since then with improved responsiveness, far fewer 5xx errors, and a new level of hamster wrangling.
Our new servers are virtual. Most significantly, this means the risk of hardware failure is dramatically lowered. There should never again be the risk of E2 dying because somebody turned off the AC to a professor's office over a school break. Virtualization also means if load increases or our needs change, we can create new servers with little effort. It also makes it much easier to test larger changes without risking performance problems for the site.
Huge thanks are due to nate, who managed the process of moving off the servers and coordinated the virtualization project. Further appreciation and praise go to Joe, not a noder as far as I'm aware, who did the hard work of setting up the base virtual servers and programming all of the bits to deploy virtual machines which can host E2's odd bits of code. (And no small amount of praise is due to him for tolerating my many emails as he refined the process on the dev server we've been using for the past few months.) And, of course, clampe is to be thanked for getting E2 a place to exist in the first place and keeping us at MSU. If you're not familiar with his role in getting E2 its current hosting, I suggest you read what he wrote back in late 2007 and marvel at how much he put himself on the line to make sure we still have this site. I know some have questioned why MSU gives us hosting, and I can't give a complete answer. But I'm sure part of the reason there is a place for E2 to run is because there is a place in the heart of Dr. Lampe for E2.
I've noted in the past our physical setup of boxes. Since we're in a virtualized environment, that has changed. We are operating with five virtual servers right now: two web boxes (dom01, dom02), a combination SQL server/cron job machine, the one currently spun-up dev box (we can spin up multiple), and a combination web, high availability, & SQL backup server (dom52).[2]
Joe was good enough to put together some helpful and thorough documentation about the new server setup and how to configure, update, and manage the servers. I haven't read through all of it thoroughly, but we're the recipients of some very nice additional capabilities as a result of his work:
- Automated SQL backups. They aren't offsite yet, granted, but they are an improvement over the one-off, manually overseen backups from before.
- Automated monitoring and notification of offline services. We'll be contacted automatically if the site gets flaky, and we'll be able to track down exactly when and how things went wrong in order to fix them.
- Thorough stats on utilization. Only 5 days into May, E2 has already seen over 100,000 unique visitors.
- Multiple development servers. So now if a coder wants to make an ecore change, she can do so with a fully-functioning test environment and without breaking the dev environment for anybody else.
Bugs that were
We had several highly noticeable bugs that popped up as soon as we got on the new servers. Some of those have been squashed already.
- Everybody got logged out.
- This was a side effect of maintaining a configuration setting from the development servers which stored the cookie for logging in in a different place.
Some people reported that they "kept getting logged off". This is actually the default behavior. When you log in, click the "Remember Me" box. If you don't, you'll be logged out the next time you close your browser.
- Nobody's last seen time was updating.
- This was because the stored procedures didn't get imported to the new database, and it's a stored procedure that updates your last seen time. I recreated this from a local copy I had.
- Votes and Chings weren't resetting
- A side effect of the last seen time thing.
- Other Users was initially frozen
- Caused by the issue above with last seen times not updating. Once that was fixed, new people showed up on the list, but people who hadn't been active for a while still weren't getting cleared off.
- Other Users got gigantic
- There is a regularly-running script which cleans out the list every five minutes or so. It wasn't running. This resulted in the amusing events described at "As a side effect of the server move, this list isn't gettin cleared automatically, so many people listed aren't actually online. Sorry for the temporary inconvenience." Yes, I misspelled »getting«. Whoops.
We got the script running again, and the list shrank back to normal. While the list was inflated, several people proposed that it'd be nice to normally have a list of visitors over the last day. in10se put together Recent Users for that purpose.
- Times were sometimes a few minutes off
This is an occasional negative side effect of virtual servers. When they are shut down, there is no physical clock inside of them ticking, so they can get behind. ntp got all of the virtual machines synchronized, and times are now all as expected.
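For the curious, the Other Users bugs above come down to two moving parts: something has to record each user's last seen time, and something has to periodically drop stale entries. Here is a minimal sketch of the second part. It is not E2's actual cleanup script (which is Perl and isn't shown here); the function name, the in-memory dictionary, and the five-minute idle limit are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed idle limit, for illustration only; the real threshold isn't stated.
IDLE_LIMIT = timedelta(minutes=5)


def prune_other_users(last_seen, now, idle_limit=IDLE_LIMIT):
    """Keep only the users whose last seen time is within idle_limit of now.

    last_seen maps a username to the datetime of that user's latest activity.
    """
    return {
        user: seen
        for user, seen in last_seen.items()
        if now - seen <= idle_limit
    }


if __name__ == "__main__":
    now = datetime(2011, 5, 5, 12, 0, 0)
    last_seen = {
        "active_noder": now - timedelta(minutes=1),
        "idle_noder": now - timedelta(hours=3),
    }
    # Only the recently active user survives the cleanup pass.
    print(sorted(prune_other_users(last_seen, now)))
```

If the last seen times never update, nobody new ever makes it onto the list; if the pruning pass never runs, nobody ever leaves it. We managed to hit both.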
Bugs on the hit list
Up until I posted this root log, all pages had a fair warning from nate on them:
We are migrating servers. Still a few warts, but we're going home. Hang in there.
Well, the biggest warts are gone, but there are a few still around that we're aware of. If you find any that aren't here, please drop us a note at E2 Bugs. All of E2 appreciates it when issues with the site are fixed, and we coders can only get them all with the help of your keen eyes.
- Homenode pictures are old and can't be updated.
- I expect this is because the apache user doesn't have permission to write to the files on the nfs share for some reason. It is probably also complicated by the automatic-repository-synching which might be repeatedly overwriting new files with old ones.
- Nuked items aren't going away
- I wager this is because of the warning flag I added to the bin scripts when I was trying to get them all to work again. I should fix the warnings so these scripts run without problems.
- Emails don't work
- This means both Create a new user and What's my password? aren't working right.
The short list
For the last couple of months I've been largely inactive. I apologize for my evanescence. Life hasn't been great, and I've been given reason to reevaluate my involvement with E2. As such, you've seen very little of use from me. DonJaime's and Oolong's root logs are sure to be quite a bit more interesting as far as changes go, as they've been working on some fairly large projects.
You'll probably see more of me in the coming weeks as I help iron out the remaining wrinkles from the server move and implement a few features that have been waiting a while.
With that said, there are basically two things I did in the code in April:
- My Chatterlight
- - Disabled for Guest User. This never should have been available to not-logged-in users as the catbox is not generally supposed to be accessible without an account. There have been discussions in the past about the nature of the catbox as a goldfish bowl. I'd much prefer it were treated more as a private chat room and the XML feed required validation to access. This would require little work on E2's part, but we'd certainly have to gather stats on what 3rd party clients are accessing the catbox and see how broad an impact it might have. It's not really worthwhile to enforce the implied semi-privacy if it makes the place a ghost town. Even if assistance were offered, I'm unsure how many clientdev people would be willing to update to cope with the additional hurdle of authentication.
- - Strip code so that when godhood/codehome goes away, things still look clean. This way, if somebody has code on their homenode and loses the privilege, they aren't left with a gobbledegook homenode. This had a negative side effect on Jet-Poop's homenode, causing some text to disappear. Quoting from the E2 Bugs entry:
Most of the time when we embed code on E2, it's in blocks like this:
... code ...
Or like this:
However, we also support Perl-interpreted strings like this:
... interpreted string ...
When stripping that last type, if there are links which start and end with a quotation mark, they get stripped out of a homenode.
I have yet to fix this one, but will be on it shortly.
changeset 279:a1335d5e5d1b, Tue Apr 19 04:52:58 2011 -0400
Removed debug error logging that I had put in when I had added the "as soon as you log in, you show up in Other Users" code. This was spamming up the logs, and I accidentally deployed it, when it should have only been in dev.
changeset 278:b376df45c4fd, Tue Apr 19 04:13:19 2011 -0400
Added the stripCode function which is used to remove code. See the above item about homenodes to see why it was added.
Also fixed a warning where we were using \1 instead of $1 in a substitution.
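To make the homenode bug above a bit more concrete, here is a rough sketch of how a pattern-based stripper can eat a legitimate link. This is not the real stripCode function (that one is Perl and isn't reproduced here). The delimiters are assumptions chosen purely for illustration: `[% ... %]` for a code block and `[" ... "]` for an interpreted string. The function name strip_code and the sample homenode text are likewise made up.

```python
import re

# Assumed delimiters, for illustration only.
CODE_BLOCK = re.compile(r"\[%.*?%\]", re.DOTALL)      # [% ... code ... %]
INTERPRETED = re.compile(r'\[".*?"\]', re.DOTALL)     # [" ... string ... "]


def strip_code(text):
    """Remove embedded code blocks and interpreted strings from text."""
    text = CODE_BLOCK.sub("", text)
    # This pattern cannot tell an interpreted string apart from a link
    # whose title happens to start and end with a quotation mark.
    return INTERPRETED.sub("", text)


if __name__ == "__main__":
    homenode = 'Hi [% return 1; %]! See ["The Raven"] and [Edgar Allan Poe].'
    print(strip_code(homenode))
```

Under these assumptions the code block is removed as intended and the plain link survives, but the link with a quoted title vanishes along with the code. A real fix would need some way to distinguish the two cases, which a simple pattern like this one doesn't have.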
1: This originally read US EST, which was a timezone off.
2: I didn't realize this machine was also in the web server pool when I initially wrote this.