
Having fallen quickly into a deep addiction to Everything2, I soon realized that my life could not be complete until I was able to get the New Writeups Nodelet into my RSS reader. I was soon directed to the New Writeups XML Ticker which provided me a machine-readable account of what I was after. Alas, NetNewsWire wasn't interested in digesting that (nor did I expect it to be), and so I decided it was up to me to write something that would eat E2link-formatted XML and poop out RSS for my compulsive newsreading pleasure.

My own personal copy of this (for you to try, if you so desire) is at:

http://notion.gestalt-inc.com/e2new/index.cfm

BUT, if you find this useful enough, and have access to a ColdFusion 6.1 or better web application server, then you can copy the code below and have your very own copy to play with and change to your heart's content.


NOTE: for readability's sake, I have added some extraneous line breaks into the code below. If you use this on your server, feel free to make the code pretty again, or contact me and I'll get you a better copy.


<!--- 
Everything2 New Writeups XML ticker to RSS format converter   Version 1.0  May 26, 2004
Written by Sigil (Andy Waschick) @ Gestalt, Inc.  http://www.gestalt.cx   

This file sits on a conveniently-located ColdFusion 6.1 (or equivalent function) web application server 
and, when a request is made to it, will call the E2 New Writeups XML ticker and convert it on-the-fly 
to an RSS feed, suitable for display in your favorite news reader. 
NOTE:  This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. 
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.0/ or 
send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
--->


<!--- Translator default settings - these are set as Custom Tag defaults in case you want to take this 
file and integrate it into something else.  --->

<cfparam name="attributes.e2url" default="http://www.everything2.com/index.pl?node_id=1291781"> 
<!--- the URL of the XML ticker --->
<cfparam name="attributes.e2timeout" default="120"> 
<!--- the number of seconds you're willing to wait for your fix --->
<cfparam name="attributes.servertimezone" default="CST"> 
<!--- the time zone of this server --->

<cfparam name="attributes.appname" default="E2RSS"> 
<!--- the name of the CF app we're using on your server 
to track what's old and what's new --->
<cfparam name="attributes.targetarray" default="newwriteups"> 
<!--- the name of the XML structure which 
contains the "wu" array we're after --->

<cfparam name="attributes.rss_headline" default="Everything2 New Writeups"> 
<!--- the title of the RSS Feed --->
<cfparam name="attributes.rss_link" default="http://#cgi.server_name##cgi.script_name#"> 
<!--- the link to this feed.  Generally the automatic self-referencing link will do fine, 
unless you want to get all fancy --->
<cfparam name="attributes.rss_description" default="Like you weren't wasting enough time on E2.  
Here's the latest new writeups!  Say, if you like this and have a ColdFusion 6.1 + server, you should 
set up your own copy so you won't have to use up all my server bandwidth.  Enjoy!"> 
<!--- the description of the RSS feed --->



<!--- NO USER SERVICEABLE PARTS BELOW HERE --->

<!--- initialize the web application --->
<cfapplication name="#attributes.appname#" sessionmanagement='no' clientmanagement='no'>

<!--- initialize the ID cache variable and/or pull its existing value out of the application variable --->
<cfset variables.newidcache = structNew()> <!--- the ID cache we'll be writing back to the application --->
<cflock scope="application" type="readonly" timeout="30">
	<cfif isdefined("application.idcache") is "yes">
		<cfset variables.idcache = application.idcache>
	<cfelse>
		<cfset variables.idcache = structNew()>
	</cfif>
</cflock>

<!--- go talk to the E2 site --->
<cfhttp url = "#attributes.e2url#" timeout = "#attributes.e2timeout#">

<!--- make sure there's a value for the returned response --->
<cfparam name="cfhttp.filecontent" default=""> 

<!--- get the response data and turn it into a struct we can talk to --->
<cfset variables.e2response = xmlparse(cfhttp.filecontent)>

<!--- get the target array out of the XML response --->
<cfset variables.wuarray = variables.e2response[attributes.targetarray]>



<!--- RSS Feed Output --->

<!--- HEADER --->
<cfoutput>
<cfcontent type="text/xml" reset="yes"><?xml version="1.0" ?> 
<!-- RSS generated by Sigil's E2 to RSS translator at #attributes.rss_link# on 
#dateformat(now(),'ddd, dd mmm yyyy')# #timeformat(now(),'HH:mm:ss')# local server time -->
<rss version="2.0">
<channel>
	<title>#attributes.rss_headline#</title>
	<link>#attributes.rss_link#</link>
	<description>#attributes.rss_description#</description>
	<language>en-us</language>
	<lastBuildDate>#dateformat(now(),'ddd, dd mmm yyyy')# #timeformat(now(),'HH:mm:ss')# 
	#attributes.servertimezone#</lastBuildDate>

<!--- BODY --->
<cfloop from="1" to="#arraylen(variables.wuarray.wu)#" index="itemindex">
<!--- loop over the entries in the writeup array --->
<cfset thisitem = variables.wuarray.wu[itemindex]> 
<!--- get this writeup info into a variable that's less of a mouthful --->	

<!--- determine the time/date stamp we should apply to this node --->
<cfif structkeyexists(variables.idcache,thisitem.e2link.xmlattributes.node_id)>
	<cfset variables.date = variables.idcache[thisitem.e2link.xmlattributes.node_id]> 
	<!--- this node has been translated by this app before, use that datetime --->
<cfelse>
	<cfset variables.date = now()> 
	<!--- this is a node this app doesn't know about.  Set a new date value. --->
</cfif>

<!--- copy this node ID into the new ID cache variable so it will be remembered for later.  
This is so old nodes no longer needed in the ID cache will be cleared out --->
<cfset variables.newidcache[thisitem.e2link.xmlattributes.node_id] = variables.date>

<item>
<!--- escape the E2 text so titles containing & or < don't break the feed --->
<title>#xmlformat(thisitem.e2link.xmltext)#</title>
<link>http://www.everything2.com/index.pl?node_id=#thisitem.e2link.xmlattributes.node_id#</link>
<description><![CDATA[
<a href="http://www.everything2.com/index.pl?node_id=#thisitem.e2link.xmlattributes.node_id#">
#htmleditformat(thisitem.e2link.xmltext)#</a> 
by 
<a href="http://www.everything2.com/index.pl?node_id=#thisitem.author.e2link.xmlattributes.node_id#">
#htmleditformat(thisitem.author.e2link.xmltext)#</a>
]]></description>
<author>#xmlformat(thisitem.author.e2link.xmltext)#</author>
<pubDate>#dateformat(variables.date,'ddd, dd mmm yyyy')# #timeformat(variables.date,'HH:mm:ss')# 
#attributes.servertimezone#</pubDate>
<guid>http://www.everything2.com/index.pl?node_id=#thisitem.e2link.xmlattributes.node_id#</guid>
</item>

</cfloop>

<!--- FOOTER --->
</channel>
</rss>
</cfoutput>


<!--- remember the ID cache for later --->
<cflock scope="application" type="exclusive" timeout="30">
	<cfset application.idcache = variables.newidcache>
</cflock>
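
The ID cache in the code above exists because the ticker itself carries no timestamps: each node gets the time at which the translator first saw it, and nodes that have dropped off the ticker are forgotten. For readers who don't speak ColdFusion, here is a minimal Python sketch of that idea. The function name `stamp_items` and its arguments are made up for illustration and are not part of the original code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def stamp_items(node_ids, old_cache, now=None):
    """Return (new_cache, pub_dates) for the node IDs currently on the ticker.

    old_cache maps node_id -> datetime at which the node was first seen.
    """
    now = now or datetime.now(timezone.utc)
    new_cache = {}
    for node_id in node_ids:
        # Reuse the remembered time for known nodes; new nodes get "now".
        new_cache[node_id] = old_cache.get(node_id, now)
    # Only IDs on the current ticker were copied, so stale entries are dropped.
    # RSS 2.0 wants RFC 822 style dates in <pubDate>.
    pub_dates = {nid: format_datetime(ts) for nid, ts in new_cache.items()}
    return new_cache, pub_dates


if __name__ == "__main__":
    first = datetime(2004, 5, 26, 12, 0, tzinfo=timezone.utc)
    later = datetime(2004, 5, 26, 12, 5, tzinfo=timezone.utc)
    cache, _ = stamp_items(["100", "101"], {}, now=first)
    cache, dates = stamp_items(["101", "102"], cache, now=later)
    print(dates)
```

In the second call, node 101 keeps its original time, node 102 is stamped with the later time, and node 100 disappears from the cache, which is the same bookkeeping the `newidcache` struct does.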


Having found Sigil's nifty New w/u -> RSS proxy, I soon realized that my life would not be complete without such a script on my own computer ;-). Since I'm not really fond of ColdFusion (actually, this is only the second time I've encountered it in quite a long time), I decided to do something similar in PHP, which I'm more familiar with (OK, OK, I'm sure Perl would be better suited to the task and whatnot, but I have more experience with PHP). I'm aware that the code is an ugly hack and that I should have used an XML parser, but ... it works. I might rewrite it if I'm motivated enough.

Unfortunately, I don't have a host that allows socket functions, so I can't give you a URL :-(. I'm running it on Apache 1.3.24 with PHP 4.3.1, but it should run on just about anything that supports socket functions.

So, here it is - comments and corrections are welcome.

BTW as far as copyright is concerned, I did ask for and did receive Sigil's permission to reverse-engineer and translate his code.

<?php
/*
 * e4rss - Everything2.com New Writeups XML ticker to RSS converter 
 *  
 *  Version 1.0  January 24, 2005
 *  Written by solaraddict
 *  Inspired by Sigil's New Writeups RSS Feed for ColdFusion
 *
 */
 

// edit these if you're behind a proxy
$proxy = '';
$proxyPort = 0;



/* the following items will almost certainly need no tweaking */

// everything2 site stats
$address = 'www.everything2.org';
$port = 80;
$site = "http://$address/index.pl?node_id=";

// this is the node_id of New Writeups XML Ticker
$newWriteupsTicker = '1291781';
// connection timeout
$timeout = 30;


/* nothing needs to be changed below here */

// variable will be set to server time
$date = 0;
// where is this script running from?
$thisScript = 'http://' . $_SERVER['SERVER_NAME'] . $_SERVER['SCRIPT_NAME'];

/* something actually starts happening from now on */
// load the node
list($date,$text) = readWeb($address, $port, $site . $newWriteupsTicker, $proxy, $proxyPort, $timeout);

// define XML to RSS transform regexp
$search = array(	
		"'<newwriteups>'si",
		"'</newwriteups>'si",
		
		"'<wu[^>]+>\s*<e2link node_id=\"(\d+)\">([^<]+)</e2link>\s*<author>\s*" .
			"<e2link node_id=\"(\d+)\">([^<]+)</e2link>\s*</author><parent>\s*" .
			"<e2link node_id=\"(\d+)(.*?)</wu>'si",
		);
$replace = array (
		"<!-- RSS generated by solaraddict's e4rss at $thisScript on $date e2 server time -->
		<rss version=\"2.0\">
		
		<channel>
		<title>Everything2 New Writeups</title>
		<description>All the newest additions to e2 - get them while they're fresh!</description>
		<language>en-us</language>
		<lastBuildDate>$date</lastBuildDate>",

		'</channel></rss>',
		
		"<item><title>\\2</title><description>&lt;a href=\"$site\\1\"&gt;\\2&lt;/a&gt;" .
			" &lt;small&gt;(&lt;a href=\"$site\\5\"&gt;full&lt;/a&gt;)&lt;/small&gt;" .
			" by &lt;a href=\"$site\\3\"&gt;\\4&lt;/a&gt;</description><link>$site\\1" .
			"</link><author>\\4</author></item>"
		);

if ($date !== -1) { // download OK (readWeb() returns -1 as the date on failure)
	
	// regexp transform
	$newText = preg_replace($search, $replace, $text);
	
	// send the header, so the RSS reader knows what type of data to expect
	header("Content-type: text/xml");

	// send the transformed text to the RSS reader
	print $newText;
} else { // couldn't connect - don't send any data

}

function readWeb($address, $port, $url, $proxy, $proxyPort, $timeout) { // get the data
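	$errno = 0;   // filled in by fsockopen() on failure
	$errstr = '';
	$date = '';   // stays empty if the server sends no Date: header
	
	if ($proxy != '' && $proxyPort != 0) { // we're behind a proxy
		$connectAddress = $proxy;
		$connectPort = $proxyPort;
	} else { // directly connected
		$connectAddress = $address;
		$connectPort = $port;
	}
	
	// connect to host ($errno and $errstr are passed by reference automatically;
	// a call-time "&" is a fatal error since PHP 5.4)
	$fh = fsockopen($connectAddress, $connectPort, $errno, $errstr, $timeout);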
	if (!$fh) { // not connected, go away.
	    return array(-1, "$errstr ($errno)<br>\n");
	} else { // connected, send the request
	    fputs ($fh, "GET $url HTTP/1.0\r\nHost: $address:$port\r\n\r\n");
		// we're not really interested in the headers
		while (!feof($fh)) {
			$line = fgets ($fh,128);
			if (chop($line) == '') { // the empty line which separates headers from data
				break;
			} else { // find the e2 server time
				if (substr($line, 0,5) == 'Date:') {
					$date = trim(substr($line, 5)); // drop the header name and the trailing CRLF
				}
			}
		}
		$content = '';
		$buffer = '';
		// read data while there are any
		while ($buffer = fread ($fh, 128)) {
			$content .= $buffer;
		}
		// close connection
		fclose ($fh);
	}
	return array($date,$content);
}
?>
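
As a rough illustration of the XML-parser approach the writeup says it should have used, here is a minimal sketch in Python rather than PHP. It pulls the same five pieces of information that the regexp captures. The element layout of `SAMPLE` is inferred from that regexp, and the `wrtype` attribute, the sample values and the function name `parse_ticker` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SITE = "http://www.everything2.com/index.pl?node_id="

# Hypothetical ticker excerpt; the structure is inferred from the regexp above.
SAMPLE = """<newwriteups>
  <wu wrtype="idea">
    <e2link node_id="111">Cats &amp; dogs (idea)</e2link>
    <author><e2link node_id="222">someuser</e2link></author>
    <parent><e2link node_id="333">Cats &amp; dogs</e2link></parent>
  </wu>
</newwriteups>"""


def parse_ticker(xml_text):
    """Return one dict per <wu> element with title, author and link URLs."""
    root = ET.fromstring(xml_text)
    items = []
    for wu in root.iter("wu"):
        link = wu.find("e2link")            # the writeup itself
        author = wu.find("author/e2link")   # who wrote it
        parent = wu.find("parent/e2link")   # the node it hangs off
        items.append({
            "title": link.text,
            "url": SITE + link.get("node_id"),
            "author": author.text,
            "author_url": SITE + author.get("node_id"),
            "parent_url": SITE + parent.get("node_id"),
        })
    return items


if __name__ == "__main__":
    for item in parse_ticker(SAMPLE):
        print(item)
```

Unlike the regexp, a parser does not care about whitespace or attribute order inside the elements, and it decodes entities such as `&amp;`, so they have to be re-escaped when the RSS is written out.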

I haven't tried the two solutions presented above (which may well be excellent), but I find XSLT to be a cleaner way of doing XML-to-XML transformations such as this one. Here's what works for me...

Save the following code as e2rss.xsl:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <rss version="2.0">
      <channel>
        <title>E2 New Writeups</title>
        <link>http://www.everything2.com/</link>
        <description>Everything2 New Writeups Feed.</description>
        <language>en-us</language>
        <generator>motiz88's e2rss.xsl</generator>
        <ttl>5</ttl>
        <xsl:for-each select="newwriteups/wu">
          <item>
            <title><xsl:value-of select="e2link"/> [<xsl:value-of select="author/e2link"/>]</title>
            <link>http://www.everything2.com/index.pl?node_id=<xsl:value-of select="e2link/@node_id"/></link>
          </item>
        </xsl:for-each>
      </channel>
    </rss>
  </xsl:template>
</xsl:stylesheet>

This stylesheet, when applied to the document at New Writeups XML Ticker, generates a standard RSS 2.0 feed. All that's left now is to apply the stylesheet to the actual data. This necessarily[1] involves some server-side processing; the following PHP 4/5 code[2] should do the trick. Save it as e2rss.php on a web server (Windows/Linux/Mac/Toaster; just make sure it has PHP 5 with the XSL extension installed, or PHP 4 with the XSLT extension), in the same directory as e2rss.xsl, and point your favorite RSS reader at its URL.

<?php
$e2feed='http://www.everything2.com/index.pl?node_id=1291781';
$xslfile='e2rss.xsl';
header('Content-Type: text/xml');
if (extension_loaded('xsl')) {        // PHP >= 5.0.0
  $xml = new DomDocument('1.0','iso-8859-1');
  $xml->load($e2feed);

  $xsl = new DomDocument;
  $xsl->load($xslfile);

  $proc = new XSLTProcessor();
  $proc->importStyleSheet($xsl);
  echo($proc->transformToXML($xml));
}
else if (extension_loaded('xslt')) {   // PHP >= 4.0.3
  $xml=implode('',file($e2feed));
  $xsl=implode('',file($xslfile));
  $arguments = array(
       '/_xml' => $xml,
       '/_xsl' => $xsl
  );
  $xh = xslt_create();
  echo(xslt_process($xh, 'arg:/_xml', 'arg:/_xsl', NULL, $arguments));
  xslt_free($xh);
}
else
  echo('<rss><channel><item><title>Install the XSL (PHP 5) or XSLT (PHP 4) extension. Your PHP version is ' . PHP_VERSION . '.</title></item></channel></rss>');
?>

[1] I first tried simply sticking <?xml-stylesheet type="text/xsl" href="e2rss.xsl"?> on to the existing XML, thinking it should at least work as a Live Bookmark in Firefox, but, well, it didn't.

[2] I wrote the PHP 4 part from examples, so it should work, but I could not test it anywhere. Feel free to send corrections or equivalent code for other languages/platforms.
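
On the subject of other languages: Python's standard library has no XSLT processor, so the sketch below does not apply e2rss.xsl. Instead it rebuilds the same output by hand with `xml.etree.ElementTree`, as a way of showing what the template produces. The function name `ticker_to_rss` is made up for the example, and only part of the channel metadata from the stylesheet is reproduced.

```python
import xml.etree.ElementTree as ET

SITE = "http://www.everything2.com/index.pl?node_id="


def ticker_to_rss(ticker_xml):
    """Build an RSS 2.0 string with one <item> per newwriteups/wu element."""
    src = ET.fromstring(ticker_xml)  # root element is <newwriteups>
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "E2 New Writeups"
    ET.SubElement(channel, "link").text = "http://www.everything2.com/"
    ET.SubElement(channel, "description").text = "Everything2 New Writeups Feed."
    for wu in src.findall("wu"):  # same selection as newwriteups/wu
        name = wu.findtext("e2link", default="")
        author = wu.findtext("author/e2link", default="")
        item = ET.SubElement(channel, "item")
        # Mirrors the template's "title [author]" item title.
        ET.SubElement(item, "title").text = f"{name} [{author}]"
        ET.SubElement(item, "link").text = SITE + wu.find("e2link").get("node_id")
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<newwriteups><wu><e2link node_id="5">Foo</e2link>'
        '<author><e2link node_id="6">bar</e2link></author></wu></newwriteups>'
    )
    print(ticker_to_rss(sample))
```

ElementTree escapes text on output, so a title containing `&` still yields well-formed XML.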
