So Glowing Fish sez to me, he sez...

    So I am thinking of e2, and the e2izer. The biggest problem with e2 is
 the server bill, and it may happen that e2 eventually goes down if Hemos
 stops paying the bill.
    But there is really no reason why e2 all has to be on one server. e2 could be split up between hundreds or thousands of servers. The e2 program could turn into a protocol, where certain webpages, all on different servers, kept a header that kept track of reputation, C!s and stuff... Perhaps there could be one main page where the names of new nodes and users were kept, but it wouldn't be accessed all that much, and would take up a lot less bandwidth. I suppose authentication and quality control would be a lot harder, but it could probably still be done.

    What do you think? Is it possible?

So I sez to Glowing Fish, I sez...

Yes, I think it's pretty possible - from the perspective of nodes and users and such. It would fundamentally change the nature of e2, of course. :)

But it could be done using trust metrics and the like. The Proper Way to do it, in my opinion, would be to release an e2.net node package that obeyed certain XML-RPC-based protocols. These packages would be put up by people with servers and bandwidth and would connect into the e2 network, acting much like nodes in a gnutella network do. Catbox messages could be passed by similar means.
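
As a rough illustration, here is what one exchange in such a protocol might look like in Python, using the standard xmlrpc modules. The method name e2.getWriteup, the port, and the returned fields are all made up for the sketch; the real protocol would be whatever edev settles on.

        # A minimal sketch of the kind of XML-RPC exchange an e2.net node package
        # might speak. The method name "e2.getWriteup" and the fields returned are
        # hypothetical, not an existing E2 interface.
        import xmlrpc.client
        from xmlrpc.server import SimpleXMLRPCServer

        def get_writeup(title):
            # A participating server would look this up in its local database;
            # here it is just a stub so the sketch runs on its own.
            return {"title": title, "text": "stub text", "reputation": 12, "cools": 1}

        server = SimpleXMLRPCServer(("0.0.0.0", 8042), allow_none=True)
        server.register_function(get_writeup, "e2.getWriteup")
        # server.serve_forever()  # run this on a server that joins the network

        # Another node in the network would then pull content from a peer like so:
        peer = xmlrpc.client.ServerProxy("http://peer.example.org:8042")
        # writeup = peer.e2.getWriteup("distributed computing")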

Advantages of the distributed model:

  • Nate et al would remain gods. Your high rankingness on an important server would remain useful!
  • Make yourself uber god on your own server. It doesn't matter. If your server is not respected then you get nothing.
  • Trusted content can easily be filtered from the untrusted, backed up, and passed around.
  • Bandwidth et al becomes less of a problem, though updates and such could become Uglier. This might be alleviated by having some client-side logic to make searches and stuff more intelligent. The trick is to make sure the distribution bandwidth used is less costly than the server bandwidth it replaces.
  • You could use whatever language you liked s'long as it obeyed the protocol. PHP, Perl, Brainfuck, C, J2EE, whatever.

Disadvantages of the distributed model:

  • More complex for the user. For consistency, things like chings! and votes would have to be managed per node. On different nodes you might have a different level and different attendant votes and chings. Messages wouldn't be global - they would probably pass between a few nodes, or else there would inherently be more chatterboxes.
  • Fragmentation of E2. This is already the case, but distribution could exacerbate the problem. Nodes not all following the rules could also be a problem. This could be mitigated by using the trust metric (see the sketch after this list), but might still be a problem.
  • More difficult to manage. Gods (unless they're granted permission on a per-site basis by the site hosts) can't just change stuff.
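
To make the trust-metric idea concrete, here is a toy filter that keeps only writeups syndicated from servers above a trust threshold. The server names, scores, and threshold are invented; a real trust metric (Advogato-style certification between servers and users, say) would compute the scores rather than hard-coding them.

        # Toy illustration of filtering syndicated writeups by the trust assigned
        # to the server they came from. All names and numbers here are made up.
        server_trust = {"core.e2.example.org": 1.0, "fringe.example.net": 0.2}

        writeups = [
            {"title": "Perl", "server": "core.e2.example.org", "reputation": 30},
            {"title": "spamvertisement", "server": "fringe.example.net", "reputation": 2},
        ]

        TRUST_THRESHOLD = 0.5

        def trusted(writeup):
            # Unknown servers default to zero trust and are filtered out.
            return server_trust.get(writeup["server"], 0.0) >= TRUST_THRESHOLD

        kept = [w for w in writeups if trusted(w)]
        print([w["title"] for w in kept])  # only content from trusted servers survives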

    So then E2 sez to me, they sez...

    I thought this might be interesting to the edev people, and perhaps others among us. Is it worthwhile to try to make E2 distributed? Is it feasible?

    I don't think it's possible or desirable at this stage to make ecore distributed. But could we make the application - Everything2 itself - live in more than one place at once?

    The benefits are tremendous, especially if we make it clever enough that people can extend it in different ways. Anyone could implement a method of having user data on their own node. Presumably the best method would win. Order could be enforced by the gods, of course... Fringe servers with little content could be used for weird dev stuff.

    The possibilities of this could be endless.

    What do you think?

    ponder says re On Making E2 Distributed: You ask "What do you think?" Where should people post replies? There is a danger of the node becoming a full blown GTKY discussion. BTW I want to reply.

    /msg s_alanet or e-mail me at <bgarney@purdue.edu>. My experience with edev nodes leads me to believe that the gods are not angered by what might be considered GTKY behaviour on them, but let's not tempt fate, k? Bad writeups, here as elsewhere, are sure to be blasted. E-mail/msg me first; then, if it is warranted, add a writeup here.

    Thanks,

    s_alanet

  • Fair enough, indeed a man with a plan. However, I don't think splitting the ecore is a prudent move; it could do more damage than good. I have an idea that might work a bit better. The idea is this: E2 is big, sluggish, and costs a fortune in bandwidth, so we spread the load around. It's quite tricky and has to be done correctly, but this would be the plan:
    ecore stays where it is, but stops running Apache: no web pages, all it does is host the database. In front of that machine is a DNS machine that all DNS requests come to. This machine can be anywhere in the world. Then you have web servers, as many as you need, wherever you need them. I think between 10 and 20 would be a good place to start.

    This set-up would enable E2 to stay centralised: all Gods and Editors would remain the same with the same power, and E2 would look and feel the same, except it would be an awful lot quicker. This is how it would work.

    • The Noder at home types www.everything2.com into the address bar.
    • That request arrives at the DNS machine, which says "Right then, the last person who visited was sent to web2.e2.com, so you are going to web3.e2.com" (this would need testing, as you could also use something like mod_rewrite to load-balance it).
    • The Noder's request is then sent to the IP address of that web server - not the E2 core machine, but another machine hosted by someone else, possibly in another country.
    • The Noder arrives at the E2 site. He types in his username and password or whatever, and web3.e2.com sends an encrypted request to ecore for database info via a VPN.
    • Whatever the Noder does is then written back to the ecore machine so any other web server will pick up the change, keeping all the E2 servers in sync.

    This set-up means you can have 50 web servers around the world serving web pages evenly, while the ecore machine just sends out text from the database. The web servers each carry a small share of the bandwidth, and ecore gets a huge reduction in bandwidth because it serves no web pages.
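
    To make that split concrete, here is a very rough sketch, in Python, of what one of those front-end web servers might do: it holds no content of its own, just pulls node text from the central ecore database and renders the page locally. The host names, the stand-in SQLite database, and the table layout are all invented for illustration; the real thing would be ecore's MySQL schema reached over the VPN.

        # Sketch of a content-less front-end web server (one of web1..webN):
        # every page request becomes a small database query back to ecore,
        # and all the HTML assembly happens out here on the edge machine.
        from http.server import BaseHTTPRequestHandler, HTTPServer
        from urllib.parse import unquote
        import sqlite3  # stand-in for the MySQL connection back to ecore over the VPN

        db = sqlite3.connect("ecore_stub.db")
        db.execute("CREATE TABLE IF NOT EXISTS node_stub (title TEXT, doctext TEXT)")

        class NodeHandler(BaseHTTPRequestHandler):
            def do_GET(self):
                title = unquote(self.path.lstrip("/")) or "Welcome to Everything"
                row = db.execute(
                    "SELECT doctext FROM node_stub WHERE title = ?", (title,)
                ).fetchone()
                body = row[0] if row else "Nothing found."
                html = f"<html><body><h1>{title}</h1><p>{body}</p></body></html>"
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.end_headers()
                self.wfile.write(html.encode("utf-8"))

        # HTTPServer(("0.0.0.0", 8080), NodeHandler).serve_forever()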

    It's a bit more technical than that, but it's just about as straightforward as that. If done correctly, E2 would always be up.

    This would also mean that if the ecore machine had to be hosted elsewhere, it could be, and the move to a new location and IP would only cause a couple of hours of downtime, rather than the two days of DNS propagation such a move would cost now. Well, that's what I would do.

    Sorry if this was a bit rushed, but I just saw this WU and thought "ooh ooh, I know, do it this way". I love E2, and although I wouldn't want to stop its development, I would hate for it to lose its very special way of working. Splitting it down too far would, I think, really hurt it and damage the E2 community. I believe that something needs to be done to spread the workload, but not at the expense of the central administration of the writeups.

    Sorry again, just adding while I think of it.

    You could also get a bit clever with it. Say you have a few machines in the US and a couple in the UK. If the Noder is in the UK, you can tell the system to say "Hey, you're in the UK - well, it just so happens that there is a server there too! I'll send you there instead". Again, this would speed up the connection to the site, as it only has to travel to the US for the MySQL text. If we use the DNS way of balancing the load there is a cunning spin-off too: anyone else going to E2 who connects to the internet through the same ISP as that guy would auto-magic-ally be sent to the same E2 web server.

    I would be more than happy to host the E2 server itself at no cost. However, I live in the UK and I suspect that might be frowned upon; it would be hooked up to an 86Mb line that is peered with BT Telehouse, which is nice, but I suspect the Gods would prefer a US location. Still, the offer is always there.

    Everything2 could easily be distributed to multiple servers at multiple organizations, with a set standard of communication between the servers (which could be based on an XML Web Service protocol such as SOAP). E2 could release all the data for the project in XML format (similar to what OpenDirectory does), and then sites such as Yahoo and Google could use that data to develop their own Everything2 datamines (provided they follow the standards set for using the Everything2 data in the first place). If Yahoo and Google were to use this data, they could easily integrate their large datamines with the database, and if MSN were to integrate it, they could pull in things like Encarta.
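
    For flavour, here is a guess at what a tiny slice of such an XML dump might look like if generated with Python's standard library. The element names, attributes, and sample data are invented for the sketch, not an existing E2 export format.

        # Hypothetical OpenDirectory-style XML dump of a couple of writeups.
        # Element names and fields are made up; a real export would follow
        # whatever schema E2 published alongside the data.
        import xml.etree.ElementTree as ET

        writeups = [
            {"title": "Everything2", "author": "noder_a", "text": "E2 is..."},
            {"title": "soft link", "author": "noder_b", "text": "A soft link is..."},
        ]

        root = ET.Element("e2dump", version="0.1")
        for w in writeups:
            node = ET.SubElement(root, "writeup", title=w["title"], author=w["author"])
            node.text = w["text"]

        print(ET.tostring(root, encoding="unicode"))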

    Many places are involved in active datamining, and adding the Everything2 data to their service would be a very valuable thing to have. The current owners of the E2 database could easily charge those who wish to sell the data along with other products, and then provide the data to all who want to actively participate in the program and allow their users to add to E2.

    Information is a commodity, and like any commodity it can easily be turned into capital if you know how to manage it properly (money that would make this server go a bit faster and possibly get a nice Oracle 9i database cluster running this thing... or at least something more suited for datamining than MySQL).

    I think that splitting Everything2 across distributed servers will not only reduce bandwidth consumption and the size of the bills, but also bring a fresh stream of new valuable information. Anyone can install E2 on his own server, thanks to edevel, but what's the point? He can't make links to original E2 content, so he needs to start the work from scratch. Sometimes that's imposed by the goals of those E2 installations, sometimes not. But once it is possible for anyone to install Distributed Everything and jack in to all of the E2 content... There will be hundreds of those installations, or even more. E2 will spread like a virus. Certainly, we will need some kind of trust management service, but I think it's not too difficult. I'll try to explain.

    There would be Everything Domains. A domain is a namespace for nodes. For example, there would be the E2 Core Domain (you are here now!). It could be hosted on one server or many; it doesn't matter. Gods would retain their powers within the bounds of a single domain, which means that for distributed E2 we need to keep all of a domain's servers under single control. Someone may wish to create another E2 domain. For example, there could be national domains (there are many Internet users who don't speak English or speak it poorly, like myself), thematic domains (programming languages, movies, etc.), and so on. To link a node within the current domain, you would just [wrap it in square brackets], like we do now. For inter-domain linking, we'd need to invent something else, like [fr:Paris]: a link to the Paris node in the French National Everything Domain, where "fr" is a domain namespace prefix. Nice idea, isn't it?
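
    A small sketch of how such links might be parsed, just to show the idea: a bare [Paris] stays in the current domain, while a prefixed [fr:Paris] is routed to another Everything Domain. The domain registry and URL layout below are invented for illustration.

        # Toy parser for the proposed [fr:Paris] inter-domain link syntax.
        # The domain registry and URL scheme are hypothetical.
        import re

        DOMAINS = {
            "e2": "https://everything2.example/",     # the E2 Core Domain
            "fr": "https://fr.everything.example/",   # a national domain
        }
        LINK_RE = re.compile(r"\[(?:([a-z0-9]+):)?([^\]|]+)\]")

        def resolve(match, current_domain="e2"):
            prefix, title = match.group(1), match.group(2)
            base = DOMAINS.get(prefix or current_domain, DOMAINS[current_domain])
            return f'<a href="{base}title/{title}">{title}</a>'

        text = "I was reading about [Paris] and then [fr:Paris] last night."
        print(LINK_RE.sub(resolve, text))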

    Update: Domains may be quite separate from each other, so inter-domain communication remains a problem. If we need the ability to make inter-domain soft links, we won't get rid of the huge synchronization traffic. But that's the only problem. If we don't allow soft links between domains, we won't need any communication between domains at all. It will work just like the e2izer works.

    The central server, however, in any of these distributed content schemes, would need to enforce content integrity, meaning that every time someone changed a node, it would need to send the updated page to every content server. This is a very non-trivial task, especially since the database is changed almost constantly. Imagine sending 10 copies of every change out, instead of sending pages as requested. My page reading vs. page modification ratio is probably about 5 to 1, meaning that a system with 10 servers would roughly double my use of the central server. Normally more servers means less load, because there is very little synchronization needed, but with Everything2 there is so much synchronization needed that it may end up using more bandwidth than the original system (we'd have faster node loads, though).
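
    A quick back-of-envelope check of that "roughly double" claim, under my own assumptions: reads are answered by the content servers, but every write still reaches the central server and is then pushed out to each of the N content servers.

        # Rough arithmetic behind the "roughly double" estimate above.
        reads_per_write = 5        # the 5:1 read/modify ratio mentioned above
        content_servers = 10

        # Centralized: the one server handles every read and every write.
        centralized_load = reads_per_write + 1          # 6 messages per write cycle

        # Distributed: reads go to the edges, but the central server receives
        # the write and then pushes a copy of the change to every content server.
        distributed_load = 1 + content_servers          # 11 messages per write cycle

        print(distributed_load / centralized_load)      # ~1.8, i.e. roughly double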

    Every time I have implemented a multiple-server configuration, it has gotten amazingly complex, even using pre-written interaction tools. Making Everything2 multi-server would be a nightmare. I am not saying that it couldn't be done, and it certainly has advantages, but I think they are insufficient.

    I know these nodes have huge potential for GTKY-ness, but I've actually had this idea bouncin' around for a while and it could be useful.

    Everything2 works pretty damn well. It's a little slow, but it works. Gnutella and other P2P systems don't work, at least not in the sense of the 100% data availability that E2 demands. Distributing updates and maintaining disparate servers seems like a recipe for disaster (as Gartogg points out). The database must remain centralized, and I for one am willing to pay for that luxury. But I think the situation could be improved by providing a barebones interface to the database - a web service, if you will. I know this is not a new idea, but I think it bears repeating in the context of server load.

    How much server capacity could be saved depends entirely on how much processor time is spent actually pulling data from the database and how much is spent formatting the pages. As a full-time Web developer, I gotta imagine that a significant amount of time is spent on each page just stitching together the nodelets. Optimizing performance without decentralizing the database is essentially a caching problem. While I presume E2 has extensive internal caching, why can't the client do the caching? Obviously Web browsers don't provide any kind of sophisticated caching, but a custom client could do it quite handily. If it were well written, it could absorb a huge chunk of the page-processing load as well as reducing page loads overall. Some features are more conducive to this approach than others; fortunately, the focal point of E2 (the writeup) is very cacheable.

    Here's how it could work:

    Create XML tickers for every possible logical unit on the site. XML is a bit text-heavy, and might not be the most efficient format if we're really trying to shave bandwidth, but let's go with what we have for now. The client would build a page by loading several tickers (or a meta-ticker that would initialize the session with the user's choice of page and nodelets). The client would then cache everything. Anything that was cached would not be reloaded until a set amount of time had passed or the user explicitly reloads that piece. The slight inconvenience of reloading individual nodelets would easily be offset by the Everything addiction factor. Having a cached client would be nice because you could easily move between anything you had previously loaded and only reload the exact piece you wanted (catbox anyone?).
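
    As a very rough sketch of that caching logic in Python: each ticker URL is fetched at most once per TTL window unless the user forces a reload. The ticker URL shown is a placeholder, not a real E2 ticker address, and the TTL is an arbitrary choice.

        # Bare-bones client-side cache for ticker fetches, as described above.
        import time
        import urllib.request

        CACHE_TTL = 300   # seconds before a cached piece is considered stale
        _cache = {}       # url -> (fetched_at, body)

        def get_ticker(url, force_reload=False):
            now = time.time()
            if not force_reload and url in _cache:
                fetched_at, body = _cache[url]
                if now - fetched_at < CACHE_TTL:
                    return body               # serve from cache: no hit on the server
            with urllib.request.urlopen(url) as resp:
                body = resp.read()
            _cache[url] = (now, body)
            return body

        # xml = get_ticker("https://everything2.example/ticker/NewWriteups.xml")  # placeholder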

    Add a few slick interactive features (local bookmarks, scratchpad) and suddenly you'd have a tool that sold itself to the experienced Everythingian. Use Mozilla for emulation of the Web interface and cross-platform goodness. This project would have to be supervised by the E2 developers, of course. But if the caching mechanism were robust and clearly defined, this tool would cut bandwidth use and improve the user experience. Whether true distribution could provide better bandwidth/processor savings depends on a lot of factors, such as distributed server availability and the mechanisms of synchronization, but developing a robust client is a sure bet.

    These ideas could also be used to implement a pseudo-distributed E2 server: specifically, a Web application that performed these caching functions for a bunch of users at once rather than one at a time. This could further reduce the central server load, but it would depend on having a sufficient number of these middleware servers, and it would provide none of the cool instant interactivity that the individual client could provide.
