Jung-Markov Collective Dream Constructor

Ever since I first noticed the dream logs on E2, I have been fascinated by them. Here was a vast storehouse of the unconscious, the dreams of perhaps hundreds of people, all in one convenient place and in a convenient format. I've been interested for a while in Jung's theories of the collective unconscious and synchronicity, and I've also been very interested in the fascinating and often very humorous coincidences that occur when you collage two or more texts by two or more different people together. So I had an idea: use the Everything2 Dream Logs as source material for randomly (or is it random?) generated uber-dreams, the dreams of everyone. The collective dreams of all noders. And hence was born the Jung-Markov Collective Dream Constructor.

This is an early version. As it works now, the program selects random dream logs from the list of dream logs and downloads them in XML format. Because of the often horrible E2 lag, this can take a while, so I may modify it to work from local copies of the XML files, and perhaps set up a cron job that downloads the new dream log every day. Also planned for the future is a CGI version so that anyone can operate this from their browser. Anyway, here's the code:

#!/usr/bin/perl
# jung - grab xml dream logs from Everything2
# culling out only the document text of each
# writeup, then "travestize" it.
# steev hise, august 2001. steev at datamassage.com
#  
#   $Id: jung,v 1.1 2001/08/04 19:04:45 steev Exp $
#
#   copyleft steev hise, 2001. licensed under the GNU GPL.
#   see http://fsf.org for details.
#################################################

use strict;
use lib '.';
use E2::DoctextXML;        # my module for parsing the XML
use Text::Travesty;        # my modularization of "travesty"
use Data::Dumper;

# my $xmlfile = $ARGV[0];
# dreamlogs list.
my $baseurl = "http://www.everything2.com/index.pl";
my $starturl = "$baseurl?node_id=828102";  
my $ndream = 20;  			     # number of dreamlogs to grab.
my $pattern = '(node=Dream%20Log.*?)&';   # what we look for in the list
my $inputtext;

# grab and parse the DreamLogs Superdoc - the list of all Dream Logs.
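my $listpage = webget($starturl)
    or die "couldn't fetch the dream log list from $starturl\n";
my @rawlist = split m|</a>|, $listpage;
# keep just the captured node=... query string from each link that matches.
my @urllist = map { /$pattern/ ? $1 : () } @rawlist;
die "no dream log links found in the list page\n" unless @urllist;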
print STDERR "grabbed list of urls.\n";
# print join "\n", @urllist;

# now get a certain number of urls from this list, chosen randomly.
# then grab those pages, concatenating them into one string.
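for (1..$ndream) {
   # rand(@urllist) covers every index; rand($#urllist) never picked the last one.
   my $node = $urllist[int(rand(@urllist))];
   my $url = "$baseurl?$node" . '&displaytype=xml';
   my $page = webget($url) or next;     # skip downloads that failed
   $inputtext .= $page;
   print STDERR "grabbed $url\n";
}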

# a bunch of special character matching. stupid microsoft word (probably).
# expat, the xml parser library, doesn't like special characters.
$inputtext =~ s/\xA3/\$/g;
$inputtext =~ s/\x93|\x94/"/g;
$inputtext =~ s/\x92/'/g;
$inputtext =~ s/[\x80-\xFF]//g;   # destroy any other nonascii characters.
$inputtext =~ tr/\r/\n/;	  # replace carriage returns
$inputtext =~ s/&lt;.*?&gt;//gi;  # remove "fake" html tags.

# save in temp file for debugging
if (open my $fh, '>', '/tmp/temp.xml') {
   print $fh $inputtext;
   close $fh;
} else {
   warn "can't write /tmp/temp.xml: $!\n";
}

# now parse the XML.
my $xmlobj = E2::DoctextXML->new($inputtext);
$xmlobj->parse();

# we may want to have more detailed error-handling...
# such as checking to see if the XML file exists, etc etc.
if($xmlobj->error) {
	print STDERR "ERROR: ", $xmlobj->error, "\n";
	print "--------------------\n\n$inputtext\n\n---------------------\n";
	exit 1;      # nonzero status so callers can tell the parse failed
}

#print Dumper($xmlobj);

my $output = $xmlobj->{doctext};
$output =~ s/<[^>]*>//g;   # get rid of html tags
$output =~ tr/[]|//d;       # get rid of E2 brackets
$output =~ s/\s+/ /gi;   # canonicalize spaces.
my $obj = Text::Travesty->new($output);

my $in = '';
while($in ne "n\n") {
   print $obj->travestize(200), "\n\n";
   print "\nMORE?\n";
   $in = <STDIN>;       # keep going as long as we don't enter 'n'
   last unless defined $in;   # stop at end of input too
}


sub webget {
   use LWP::UserAgent; 
   use HTTP::Request; 
   use HTTP::Response; 
   my ($url) = shift @_; 
   $| = 1;                                      # to flush next line 
   # printf "%s =>\n\t", $url;
   my $ua = LWP::UserAgent->new(); 
   $ua->agent("Jung/v0.1"); 	# give it time, it'll get there
   my $req = HTTP::Request->new(GET => $url); 
   $req->referer("http://datamassage.com"); 
   my $response = $ua->request($req);
   if ($response->is_error()) {
        warn " ", $response->status_line, "\n";   # warn doesn't do printf formats
        return 0;
    } else {
        my $content = $response->content();
        return $content;      
    } 
}

Some example output of this program can be found at Dream Log: August 24, 2001 (with $ndream=20) and at Dream Log: August 7, 2001 (with $ndream=10).
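
The script leans on Text::Travesty for the actual scrambling, and that module isn't shown here. For anyone curious about the general idea, here is a minimal sketch of Markov chain text generation. It is not the Text::Travesty code or its API: the sub name markov_travesty, the word-level (rather than character-level) chain, and the default order of 2 are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the Text::Travesty API: build a word-level
# Markov table of the given order and emit up to $nwords words from it.
sub markov_travesty {
    my ($text, $nwords, $order) = @_;
    $order ||= 2;                       # assumed default, chosen for illustration
    my @words = split ' ', $text;
    return join ' ', @words if @words <= $order;

    # Map each run of $order consecutive words to the words seen after it.
    my %next;
    for my $i (0 .. $#words - $order) {
        my $key = join ' ', @words[$i .. $i + $order - 1];
        push @{ $next{$key} }, $words[$i + $order];
    }

    # Start from a random key, then repeatedly pick a random successor.
    my @keys = keys %next;
    my @out  = split ' ', $keys[rand @keys];
    while (@out < $nwords) {
        my $key = join ' ', @out[-$order .. -1];
        my $choices = $next{$key} or last;   # dead end: the source text ended here
        push @out, $choices->[rand @$choices];
    }
    return join ' ', @out;
}

my $source = 'i was in a house and the house was on a hill and the hill was in a dream';
print markov_travesty($source, 20), "\n";
```

Because successors are stored once per occurrence, common continuations get picked more often, and when many dream logs are mashed into one string the chain can wander from one noder's dream into another's wherever they share a couple of words.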
