There is a huge problem in the face of E2: duplicate content and people who can’t spell. Splattered across the nodegel are different versions of the same basic content, just spelled wrong.
We see them all the time, and I have been trying to come up with a solution. Maybe something could be done fairly easily in code to fix this. It struck me the other day while using my friend Google: they have a great system for detecting misspellings and finding the right content. Maybe we could do something similar.
My suggestion is this: create a new nodetype, "e2pointer", that would contain the correct spelling for a common misspelling (such as potatoe => potato). When a user searches, first look for an exact match between the search text and an e2pointer. If there is one, display text such as “Are you looking for…” followed by a link to the node with the correct spelling contained in the e2pointer, and then display the other search results (still allowing them to add the node if they wanted). I think it would slow people down from making mistakes with their write-ups, and keep newbies on the right track.
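A rough sketch of the lookup described above, in plain Perl. Nothing here is real E2 code: the %E2POINTERS hash stands in for the proposed e2pointer nodes, @TITLES stands in for the existing node titles, and search_with_pointer is a made-up name. It only shows the order of operations: check the pointers for an exact match first, then run the ordinary search anyway.

```perl
use strict;
use warnings;

# Hypothetical stand-in for e2pointer nodes: misspelling => correct title.
my %E2POINTERS = (
    potatoe => 'potato',
    mosiac  => 'mosaic',
);

# Hypothetical stand-in for the node titles the normal search would scan.
my @TITLES = ('potato', 'potato chip', 'mosaic');

sub search_with_pointer {
    my ($query) = @_;
    my $key = lc $query;

    # Step 1: exact match against the pointers only.
    my $suggestion = exists $E2POINTERS{$key} ? $E2POINTERS{$key} : undef;

    # Step 2: the ordinary (here: substring) search still runs, so the
    # user sees the other matches and can still add a new node.
    my @matches = grep { index(lc $_, $key) >= 0 } @TITLES;

    return { suggestion => $suggestion, matches => \@matches };
}

my $result = search_with_pointer('potatoe');
print "Are you looking for $result->{suggestion}?\n"
    if defined $result->{suggestion};
print "Other matches: @{ $result->{matches} }\n";
```

The pointer check is a single hash lookup in this sketch; on the real site it would presumably be one extra database query per search, which is the cost asked about further down.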
Whenever an editor / god-type kills a “see also” node, they could add an e2pointer for the misspelling. A good existing candidate for such a thing would be Mohammed (I am not picking on anyone, just using this as a live example). The correct spelling would be (AFAIK) Muhammed, but people noded it, and so if a god doesn’t whack the nodeshell, someone may find and “rescue” it. I did this once with mosiac, which should have been mosaic, and I too made the mistake (of the see also), until dannye set me on the path of light.
The search code and a few other things would need to be modified a bit to first search for an exact match of e2pointer, and then do the display for the rest of the matches. There would also have to be a mechanism to actually add these pointers to e2.
I am looking for comments on the suggestion, and on whether the one extra look-up when following soft-links and when searching would lag the database terribly more. I think that after a few months of adding pointers, we could increase the flow, make the editors’ job easier, and help noders make sure everything gets put where it belongs.
Note: I wish I could do a proof of concept, but I can't add new nodetypes to e2.
Clarification: I would like these pointers/keywords/etc to be editor / god controlled, so that we end up with a different situation from the one we have with nodeshells.
nate sez -- you wouldn't even need to do it through nodetypes (it would be another bazillion nodes, and fields like author_user, createtime, etc don't matter with "misspellings"). Create a setting that would contain (misspelled word => correct spelling) -- then something like this:
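# (we're in the nodeName function)
my ($nodename) = @_;
# Load the (misspelled word => correct spelling) pairs from the setting.
my $MISSPELLINGS = getVars(getNode('misspellings','setting'));
# Swap in the correct spelling if this name is a known misspelling.
$nodename = $$MISSPELLINGS{$nodename} if exists $$MISSPELLINGS{$nodename};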
Note: Perlmonks does this exact same gig with keywords (searching for sprintf will bring you to manpage: sprintf), so it's not a new hack, really. However, there is much potential for problems -- "common misspellings" are kind of a myth, and it wouldn't work at all for misspellings that are part of a title. A better gig might be to run ispell on the name and offer the user a suggestion before creating it.
jay sez back: Running ispell on the thing may not catch things like Chucky Cheese v. Chuck E. Cheese (today's ed logs). Maybe a combination of both? What kind of perf hit is there on the spell check thing?
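One way the "combination of both" might look, as a sketch only. The %MISSPELLINGS hash stands in for the editor-maintained setting, suggest_title is a made-up name, and toy_spellcheck is a deliberately tiny stand-in for a real spell checker such as ispell (it does not call ispell at all). The idea is that the hand-kept table catches things a dictionary never would, and the spell checker is only asked when the table has nothing.

```perl
use strict;
use warnings;

# Hypothetical editor-maintained table (stand-in for the 'misspellings' setting).
my %MISSPELLINGS = (
    'chucky cheese' => 'Chuck E. Cheese',
);

# Toy stand-in for a real spell checker: a two-entry word list of fixes.
my %TOY_SPELLER = (mosiac => 'mosaic', potatoe => 'potato');

# Returns a corrected title, or undef if no word needed fixing.
sub toy_spellcheck {
    my ($title) = @_;
    my $changed = 0;
    my @words;
    for my $word (split ' ', $title) {
        my $fix = $TOY_SPELLER{lc $word};
        $changed = 1 if defined $fix;
        push @words, defined $fix ? $fix : $word;
    }
    return $changed ? join(' ', @words) : undef;
}

# Table first, spell checker second; undef means "no suggestion".
sub suggest_title {
    my ($title, $spellcheck) = @_;
    my $key = lc $title;
    return $MISSPELLINGS{$key} if exists $MISSPELLINGS{$key};
    return $spellcheck->($title);
}

for my $title ('Chucky Cheese', 'mosiac tiles', 'potato') {
    my $suggestion = suggest_title($title, \&toy_spellcheck);
    my $message = defined $suggestion
        ? "did you mean '$suggestion'?"
        : "no suggestion";
    print "$title: $message\n";
}
```

Because the spell checker is passed in as a code reference, the perf question can be tested separately: swap in whatever really calls ispell and time just that step.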
anotherone says: nothing useful here, but I've got a great name for it: "Firmlinks". (softlinks... hardlinks... eh, never mind)