These days everyone wants a search box on their website. As a web designer, you might not know a lot about server-side technology and find yourself puzzling it out at 2 in the morning. Well, if you want to be a solo web designer, my first piece of advice is, "learn to program, son!" In the woolly world of contract web development, sometimes you gotta write it yourself, and sometimes you gotta find a freebie to do it for you. The only thing you can be sure of is that the client is not going to pony up for a commercial 'solution'.

So if you want to make a site searchable, chances are you want a ready-built search engine. Why? Because MySQL's full-text search sucks, and a good search engine is hard to write. Not only that, but it's a pretty general problem, so you can bet there's a fair amount of science to it. A lot of people solve this with a clever Google link and leave it at that. That's good, but you can do better.

Ht://Dig is an open source website search engine. What sets it apart from other free search engines is that it actually spiders your website the way the big boys do: by fetching pages through HTTP. This may not seem important until you realize that most websites have some dynamic portion supplied by a content management system or a database of some sort. The source files on the server are nothing! Your search engine needs to see the site as god intended: through the eyes of the user. Ht://Dig is like the ADHD neighbor kid: you give it your homepage and it clicks through millions of links in minutes.
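To make "spiders your website" concrete, here is a minimal breadth-first spider in Python. It is not Ht://Dig's code, just the idea behind it: start at the homepage, fetch each page over HTTP, pull out the links, and queue every same-host link it hasn't seen yet. The pluggable `fetch` parameter and the `limit` cap are assumptions of this sketch. They let it be tested without a network and keep it from running forever.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def http_fetch(url):
    """Fetch a page over HTTP, the way a visitor's browser would."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def spider(start_url, fetch=http_fetch, limit=1000):
    """Breadth-first crawl of one host, starting from start_url."""
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#top are one page.
            link, _ = urldefrag(urljoin(url, href))
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Because the spider only sees what an HTTP client sees, pages your CMS builds on the fly get indexed exactly as visitors get them. Off-site links and `mailto:` links are skipped by the host check.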

Installing and configuring Ht://Dig is not hard, but it's the hardest part. The manual guides you through it pretty painlessly. Setting up the search box on your page and the results pages is even simpler. Ht://Dig is sensible about using template files, so you're not stuck with a generic results page.

Getting Good Search Results

Okay, I lied about the hardest part. That was just me trying to sell you on Ht://Dig to satisfy my own twisted sense of right. Sure, getting it working is relatively painless, but good search results are another catfish farm entirely. Ht://Dig employs several search algorithms:

  • exact: The obvious.
  • soundex: Attempts magical phonetic matching.
  • metaphone: Attempts magicaler phonetic matching optimized for English.
  • common word endings (stemming): Gets to the root of the matter.
  • synonyms: Uses a thesaurus to attempt to be smart.
  • accent stripping: Don't let those Euros tell you how to spell naive.
  • substring and prefix: Make the search dog slow by overanalyzing everything.
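For a feel of what "magical phonetic matching" means, here is the classic American Soundex code in Python. It keeps the first letter and encodes the following consonants as digits, so words that sound alike collapse to the same four-character key. This is a standalone sketch of the textbook algorithm. Ht://Dig's own soundex implementation may differ in details.

```python
# Digit classes for American Soundex; vowels (and y) act as separators,
# while h and w are skipped without separating equal codes.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key of word (e.g. 'Robert' -> 'R163')."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # first letter's code also suppresses repeats
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":  # h/w don't reset the previous code
            prev = code
    return result.ljust(4, "0")
```

So `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is how a soundex search for one can turn up the other.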

These algorithms can be combined arbitrarily, with a weight between 0 and 1 for each one. Usually exact is given a weight of 1, and some combination of the lesser algorithms gets smaller weights to sweeten the deal without letting random results creep into the top 10. This kind of flexibility is nice, but it comes at a cost: suddenly you are responsible for the search results! The best configuration clearly depends on your content.
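A toy model helps show why the weights matter. The `name:weight` spec below mirrors the format of Ht://Dig's `search_algorithm` setting (for example `exact:1 synonyms:0.5 endings:0.1`). The matchers and the scoring are simplified stand-ins, not Ht://Dig's code. In particular, the suffix-stripping `endings` is a crude, hypothetical substitute for a real stemmer. Each document word counts once, at the weight of the best algorithm that matches it.

```python
# Toy model of weighted search algorithms; NOT Ht://Dig's actual code.

def exact(query, word):
    return query == word

# Crude, hypothetical suffix list standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suffix in SUFFIXES:
        # Only strip if a reasonable root (3+ letters) is left behind.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def endings(query, word):
    return stem(query) == stem(word)

def prefix(query, word):
    return word.startswith(query)

MATCHERS = {"exact": exact, "endings": endings, "prefix": prefix}

def parse_algorithms(spec):
    """Turn 'exact:1 endings:0.5' into [('exact', 1.0), ('endings', 0.5)]."""
    algorithms = []
    for item in spec.split():
        name, _, weight = item.partition(":")
        algorithms.append((name, float(weight) if weight else 1.0))
    return algorithms

def score(query, doc_words, spec):
    """Sum, over document words, the weight of the best algorithm matching each."""
    algorithms = parse_algorithms(spec)
    total = 0.0
    for word in doc_words:
        total += max(
            (weight for name, weight in algorithms if MATCHERS[name](query, word)),
            default=0.0,
        )
    return total
```

With `exact:1 endings:0.5 prefix:0.1`, a query for "search" scores a page saying "search search" at 2.0, one saying "searching searches" at 1.0, and one saying only "searchable" at 0.1. Drop everything but `exact:1` and the last two pages vanish entirely. Nudging those small weights is exactly the tuning you're now on the hook for.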

Tweaking the Content

So now you're responsible, something that web designers generally try to avoid at all costs. But since your immunity has already been compromised, you've got no choice but to deliver results. The best way is to make your content search-engine friendly. You've always been tempted, but now you can do it without feeling like you're whoring yourself out to the whimsical Google Lords. This kind of search engine optimization has nothing to do with overstuffed meta tags or mirrored URLs. This is the technical equivalent of down-home cooking.

The key here is to define a convention for how you use the following elements: TITLE, H1...6, META KEYWORDS and META DESCRIPTION. Each of these (as well as plain text) can have a multiplier associated with it. If you have a habit of setting really good titles for your pages, then you can give titles a high multiplier to push them to the top of the search results. Personally, I use H1s, H2s, and H3s as the most meaningful headings on a page (I start navigation headings at H4) and weight them accordingly. You should choose an approach that fits the existing state of your content.
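Here is a sketch of how per-element multipliers play out, using Python's standard HTML parser. Every word is credited with the multiplier of the element it appears in: the title, a heading, META KEYWORDS, META DESCRIPTION, or plain text. Ht://Dig has comparable per-element factor settings, but the keys and numbers in `FACTORS` are hypothetical, chosen to follow the H1-H3-meaningful, H4-is-navigation convention above.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical multipliers; illustrative only, not Ht://Dig setting names.
FACTORS = {
    "title": 100.0,
    "h1": 50.0, "h2": 20.0, "h3": 10.0,      # meaningful content headings
    "h4": 1.0, "h5": 1.0, "h6": 1.0,         # navigation headings: no boost
    "keywords": 10.0, "description": 5.0,    # META KEYWORDS / META DESCRIPTION
    "text": 1.0,                             # everything else
}
BOOSTED_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}
WORD = re.compile(r"[a-z0-9]+")


class WeightedIndexer(HTMLParser):
    """Sums a weight per word, multiplied by the factor of its element."""

    def __init__(self):
        super().__init__()
        self.weights = Counter()
        self.open_tags = []  # stack of currently open boosted tags

    def _add(self, text, field):
        for word in WORD.findall(text.lower()):
            self.weights[word] += FACTORS[field]

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self._add(attributes.get("content") or "", name)
        elif tag in BOOSTED_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        # Note: <script>/<style> contents are not filtered out in this sketch.
        field = self.open_tags[-1] if self.open_tags else "text"
        self._add(data, field)


def field_weights(html):
    indexer = WeightedIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.weights
```

On a page titled "Catfish Farm" with an H1 of "Catfish", a "catfish" keyword, and body text mentioning catfish and trout, "catfish" ends up weighted 161 while "trout" gets 1. That gap is what you're buying with a consistent heading convention.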

Measuring Success

A certain number of people coming to a website look for only one thing: the search box. It's sad but true: the 2000 hours you spent on information architecture and usability testing can all fly out the window at the hands of a search-happy user. Chances are they will search for something that is literally right on the front page already. Instead of immediately stabbing your eye out with a hot poker, just breathe in and take heart, my friend. This situation actually makes your job easier! If you can optimize your search engine for the top 50 things your audience searches for, then you are pretty much a shoo-in for the web design hall of fame (not really, but they might at least pay you). Using a good log analyzer and some common sense, you can quickly achieve this through judicious tweaking of headers and weights.
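If your log analyzer won't break out search terms, a few lines of Python will. This sketch assumes an Apache-style common or combined access log, an htsearch CGI living at `/cgi-bin/htsearch`, and the query arriving in a GET request as the `words` parameter. Adjust `search_path` to wherever your installation actually lives.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the quoted request line of a common/combined-format access log,
# e.g. "GET /cgi-bin/htsearch?words=opening+hours HTTP/1.1"
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def top_searches(log_lines, search_path="/cgi-bin/htsearch", n=50):
    """Return the n most frequent normalized queries sent to search_path."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue
        parts = urlsplit(match.group(1))
        if parts.path != search_path:
            continue
        # parse_qs decodes '+' and %XX escapes for us.
        for words in parse_qs(parts.query).get("words", []):
            query = " ".join(words.lower().split())  # fold case, squeeze spaces
            if query:
                counts[query] += 1
    return counts.most_common(n)
```

Pass it an open log file (any iterable of lines works) and you get your top 50 as `(query, hits)` pairs. Run each one against your own search box and tweak titles, headings, and weights until the right page comes up first.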

Pitfalls of Ht://Dig

Ht://Dig is so fun and educational that it's easy to forget how dangerous it is. The number one thing to avoid is infinite navigation. Sure, it has facilities to avoid duplicate pages reached through different links, but they are nowhere near as robust as those of Internet search engines, which have to contend with intentional honey pots and other electronic dangers. In fact, you should be very careful when using Ht://Dig to index content created by anyone you do not have complete control over. Something as simple as a calendar with month-to-month navigation is a veritable event horizon for Ht://Dig. Don't say you haven't been warned.
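Ht://Dig's own defenses are configuration attributes such as `exclude_urls` (URL substrings never to follow) and `max_hop_count` (how many links deep to go). Check the attribute reference for your version. The Python sketch below shows why that kind of guard tames a calendar. It layers a third, hypothetical heuristic on top: cap how many distinct query strings a single path may produce. The class name, parameters, and the cap of 24 are all assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class CrawlGuard:
    """Hypothetical guard against 'infinite navigation' traps."""

    def __init__(self, exclude=(), max_hops=5, max_variants_per_path=24):
        self.exclude = tuple(exclude)          # substrings that disqualify a URL
        self.max_hops = max_hops               # link depth from the start page
        self.max_variants = max_variants_per_path
        self.variants = defaultdict(set)       # path -> distinct query strings seen

    def allow(self, url, hops):
        """Return True if a spider should follow url found `hops` links deep."""
        if hops > self.max_hops:
            return False
        if any(s in url for s in self.exclude):
            return False
        parts = urlsplit(url)
        seen = self.variants[parts.path]
        if parts.query not in seen:
            # calendar.php?month=1, ?month=2, ... all share one path.
            if len(seen) >= self.max_variants:
                return False
            seen.add(parts.query)
        return True
```

With a cap of 12, a spider that meets `calendar.php?month=0` through `month=99` follows only the first 12 and moves on, instead of marching off to the end of time.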
