A screen scraper, as I understand it, is a tool that automatically extracts raw information from an already formatted page. Examples include a number of CPAN Perl modules that fetch a page's HTML and strip the markup to get at the raw data. Several CPAN modules do this for TV program listings, for example, and have gotten in trouble for it.

A screen scraper can also be used to present the extracted data in a different form, that is, to reformat it.

A screen scraper would have a procedure somewhat similar to this:

  1. Get the formatted data and store it in RAM or on hard disk
  2. Analyze the formatted data to determine which deformatting rules to use
  3. Apply the deformatting rules to remove the formatting
  4. Store the resulting raw data in RAM or on hard disk
  5. (Optional) Reformat the raw data into a new format and store or display it
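
As a rough illustration of the steps above, here is a minimal sketch in Python using only the standard library's html.parser module. The sample page, the names TableScraper, scrape and reformat, and the choice of table cells as the thing to extract are all assumptions made for the example. A real scraper would fetch the page over HTTP in step 1 and would need rules matched to the actual page layout in step 2; here the rules are fixed in advance.

```python
from html.parser import HTMLParser

# Step 1: the formatted data. This is a hypothetical sample page held in RAM;
# a real scraper would download it instead.
SAMPLE_PAGE = """
<html><body>
<table>
<tr><td>18:00</td><td>News</td></tr>
<tr><td>18:30</td><td>Weather &amp; Sport</td></tr>
</table>
</body></html>
"""


class TableScraper(HTMLParser):
    """Steps 2 and 3: keep the text inside <td> cells, grouped by <tr> row,
    and throw away everything else (tags, whitespace between tags)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # step 4: the raw data, stored in RAM
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is only collected while inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape(page):
    """Run the deformatting rules over a page and return the raw rows."""
    parser = TableScraper()
    parser.feed(page)
    parser.close()
    return parser.rows


def reformat(rows):
    """Step 5 (optional): present the raw data in a new, plain-text format."""
    return "\n".join(" | ".join(row) for row in rows)


rows = scrape(SAMPLE_PAGE)
print(reformat(rows))
```

The parser does the deformatting and the reformat function does the optional last step, so the raw rows can be stored or re-presented independently of how the original page looked.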
