The type of string used in the Pascal programming language. Like most things related to Pascal, it is not the least bit confusing, but really kind of limiting when you think about it. The concept of Pascal strings is not necessarily limited to Pascal; some C programmers choose to use Pascal strings internally just because they think it's a better way of doing things.

A Pascal string (in any language) is stored in memory as a series of bytes representing characters. The characters begin at the second byte (in C, this would be index 1); the first byte contains the length of the string as a raw number.
That's about it. You just set the first byte to your length, and add whatever characters you need. Like a C string, a four-letter string will actually be five bytes long, but here the extra byte is the first one, and it holds the number 4 (not the character "4").
The general complaint is that if your length has to be expressed as a single byte, your string length is limited to what a byte can express: no more than 255 characters, in a buffer of at most 256 bytes once you count the length byte. I don't know exactly how Pascal programmers get around this, but I imagine they must have found some way, because there's no way you can live with only 255-character strings.
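
To make the layout concrete, here is a minimal sketch in C of building a length-prefixed string by hand. The helper name pstr_from_cstr is made up for this example and is not part of any library; it simply refuses anything that will not fit in a one-byte length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store `text` in `dst` as a length-prefixed
   (Pascal-style) string. `dst` must have room for 256 bytes.
   Returns 0 on success, -1 if `text` is longer than 255 bytes. */
int pstr_from_cstr(unsigned char dst[256], const char *text)
{
    assert(dst != NULL && text != NULL);

    size_t len = strlen(text);
    if (len > 255) {
        return -1;                  /* length does not fit in one byte */
    }
    dst[0] = (unsigned char)len;    /* byte 0: the length as a raw number */
    memcpy(dst + 1, text, len);     /* bytes 1..len: the characters, no terminator */
    return 0;
}
```

Calling it with "abcd" leaves the number 4 in byte 0 and the four letters in bytes 1 to 4; a 300-character input is rejected rather than truncated, which is one possible design choice among several.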

A lot of the classic Macintosh Toolbox functions expect to be passed strings in Pascal string form. This is maybe the worst thing about programming for the Mac at first, because you have to keep track of where you want to use a C string and where you want to use a Pascal string and when to convert between the two. There are, by the way, standard c2pstr() and p2cstr() library functions that will convert between C and Pascal strings.
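
Converting between the two forms can be done in place, because a C string and a Pascal string of the same text take up the same number of bytes; only the position of the extra byte differs. The following is a rough sketch of that idea in C, not the Toolbox implementation. Both function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: turn a null-terminated string into a length-prefixed
   one in the same buffer. Returns 0 on success, -1 if it is too long. */
int cstr_to_pstr_inplace(unsigned char *s)
{
    assert(s != NULL);

    size_t len = strlen((const char *)s);
    if (len > 255) {
        return -1;               /* cannot express the length in one byte */
    }
    memmove(s + 1, s, len);      /* shift the characters up, over the terminator */
    s[0] = (unsigned char)len;   /* the freed first byte becomes the length */
    return 0;
}

/* Hypothetical: the reverse, length-prefixed to null-terminated. */
void pstr_to_cstr_inplace(unsigned char *s)
{
    assert(s != NULL);

    size_t len = s[0];
    memmove(s, s + 1, len);      /* shift the characters down over the length byte */
    s[len] = '\0';               /* the freed last byte becomes the terminator */
}
```

Note that after the first call the buffer is no longer a valid C string, so the caller has to remember which form it currently holds, which is exactly the bookkeeping complained about above.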

See also C string.

Pascal was originally designed for teaching programming, not for real-world software development.

The language had a string type suitable for demonstrating the concepts of text manipulation without having to deal with memory management. As this was in the days before object orientation, the string type had to be built into the language as an atomic type, like integer or char. Like these, it occupied a fixed number of bytes on the stack.

Here is an example (in a Delphi-style dialect rather than strict standard Pascal):

procedure StringDemo;
var
  I: Integer;                  // occupies four bytes with a 32-bit compiler
  C: Char;                     // one byte
  S: string[255];              // 256 bytes: one length byte plus 255 chars
  CA: array[1..1024] of Char;  // 1024 bytes - C-style strings can also be done
  PC: Pointer;
begin
  S := 'Hello';                // give the string a value before copying it

  // copy a Pascal string to a char array the hard way
  for I := 1 to Length(S) do
  begin
    CA[I] := S[I];
  end;
  CA[Length(S) + 1] := #0;     // null terminate

  // get the address - this is now OK to pass as a C string
  PC := @CA;
end;

The layout of a standard Pascal string is quite simple: it occupies a fixed 256 bytes. The first byte stores the length, and the remaining bytes contain the characters. The string is not null terminated. It is likely that this length was thought at the time to be a reasonable compromise between usable length and not wasting too much precious memory. This was fine for teaching purposes, but falls short for real-world usage in two ways:

  • For many uses, it is just too short. 255 characters is not enough to store an HTML page or the contents of a text file, or even a long SQL query.
  • It is not compatible with most OS APIs. C or C++ is the language of choice for systems programming, and the vast majority of operating system libraries (including those of Windows and Linux) expose an API of C functions. The API functions will thus want C strings, that is, the address of an array of chars that ends with a null (zero) byte. The Pascal string, as noted, is not null terminated.

You can try to make Pascal strings work with OS APIs (assuming that your Pascal has a few additions such as pointer arithmetic).

Assuming that the string that you wish to pass to the C API is shorter than 255 chars, you can forcibly put a null character #0 after the last character of the string and then pass the address of the first char (not the length byte) to the API.
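
The same trick can be sketched from the C side, treating the Pascal string as a 256-byte buffer. This is only an illustration under that assumption; the function name pstr_as_cstr is invented here. It shows why the string has to be shorter than 255 characters: a full string leaves no spare byte for the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: null-terminate a 256-byte Pascal string in place and
   return a pointer to its first character, usable as a C string.
   Returns NULL if the string is full and has no room for the terminator. */
const char *pstr_as_cstr(unsigned char s[256])
{
    assert(s != NULL);

    size_t len = s[0];
    if (len >= 255) {
        return NULL;                 /* s[len + 1] would be outside the buffer */
    }
    s[len + 1] = '\0';               /* forcibly put a null after the last character */
    return (const char *)(s + 1);    /* skip the length byte */
}
```

The length byte is left untouched, so the buffer is still a valid Pascal string afterwards.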

There are even worse problems with return values – the API will generally expect the address of a buffer, which it will fill. It will not correctly set the length byte for you.
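
A sketch of the fix-up this requires, again in C and again only under assumptions: fake_api_get_name stands in for whatever OS call fills the buffer (it is not a real API), and the wrapper has to count the characters and set the length byte by hand afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for an OS function that fills a caller-supplied buffer
   with a null-terminated string. Purely hypothetical. */
void fake_api_get_name(char *buf, size_t size)
{
    snprintf(buf, size, "%s", "hello");
}

/* Hypothetical wrapper: let the API write into bytes 1..255 of a
   256-byte Pascal string, then repair the length byte ourselves. */
void pstr_fill_from_api(unsigned char s[256])
{
    assert(s != NULL);

    /* 255 bytes are available after the length byte, so the API can
       return at most 254 characters plus its terminator. */
    fake_api_get_name((char *)(s + 1), 255);

    /* The API knows nothing about byte 0; count the characters it
       wrote and store that number as the Pascal length. */
    s[0] = (unsigned char)strlen((const char *)(s + 1));
}
```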

Alternatives to Pascal strings in Pascal programs

Use char arrays
If you are using a C API from a Pascal program, it is a better idea to go down to the level of a plain C programmer, and use an array of characters. I would think that this can be done in any version of Pascal. You can write routines to pack and unpack these arrays from regular Pascal strings, up to the null char (see the code above).

Use a class
If you are working in Pascal (or a Pascal-derived language) that has only Pascal strings but some object orientation (Delphi version 1 fits this category), there may be a class (e.g. TStringList in Delphi) that is capable of storing longer pieces of text.

Delphi strings
Delphi strings were implemented from Delphi 2 onwards, i.e. from the first 32-bit version of Delphi, released in 1996. They are the default string type: if you are using Delphi 2 or later, then unless you are trying hard, you are actually not using Pascal strings at all.

Delphi strings are a complete replacement for Pascal strings. They have a 4-byte length field (i.e. they can hold 2 GB, or until you run out of memory, whichever comes first), are always null-terminated for easy use with OS functions, and have reference counting with copy-on-write.

The layout of a Delphi string is slightly more complex than that of a Pascal string. The Delphi help explains it well:

A long-string variable is a pointer occupying four bytes of memory. When the variable is empty—that is, when it contains a zero-length string—the pointer is nil and the string uses no additional storage. When the variable is nonempty, it points to a dynamically allocated block of memory that contains the string value, a 32-bit length indicator, and a 32-bit reference count. This memory is allocated on the heap, but its management is entirely automatic and requires no user code.

Because long-string variables are pointers, two or more of them can reference the same value without consuming additional memory. The compiler exploits this to conserve resources and execute assignments faster. Whenever a long-string variable is destroyed or assigned a new value, the reference count of the old string (the variable’s previous value) is decremented and the reference count of the new value (if there is one) is incremented; if the reference count of a string reaches zero, its memory is deallocated. This process is called reference-counting. When indexing is used to change the value of a single character in a string, a copy of the string is made if—but only if—its reference count is greater than one. This is called copy-on-write semantics.

The block of memory is sized appropriately to the content. So it looks to the programmer like a simple variable, but under the hood it is a pointer to a buffer. Casting the string to PChar (pointer to character, i.e. a C string) just gets the address of the first character.
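
The scheme described in the help text can be imitated in C. What follows is a loose sketch of the idea, not Delphi's actual runtime code: the header struct, its field order and every function name are assumptions made for illustration. The variable is a pointer to the first character, the reference count and length sit just before it, and a write through a shared pointer makes a private copy first. Indexes here are zero-based, as usual in C, and the copy step assumes no embedded null characters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the characters. */
typedef struct {
    int32_t refcount;
    int32_t length;
} LStrHeader;

static LStrHeader *lstr_header(char *s) { return (LStrHeader *)s - 1; }

/* Allocate a string with refcount 1. The empty string is a NULL pointer. */
char *lstr_new(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    if (len == 0 || len > INT32_MAX) return NULL;
    LStrHeader *h = malloc(sizeof *h + len + 1);
    if (h == NULL) return NULL;
    h->refcount = 1;
    h->length = (int32_t)len;
    memcpy(h + 1, text, len + 1);    /* characters plus the null terminator */
    return (char *)(h + 1);          /* the variable points at the first char */
}

int32_t lstr_length(char *s) { return s ? lstr_header(s)->length : 0; }

/* Assignment shares the block and bumps the reference count. */
char *lstr_assign(char *s)
{
    if (s) lstr_header(s)->refcount++;
    return s;
}

/* Dropping a reference frees the block when the count reaches zero. */
void lstr_release(char *s)
{
    if (s && --lstr_header(s)->refcount == 0) free(lstr_header(s));
}

/* Copy-on-write: copy the block first if anyone else still shares it. */
int lstr_set_char(char **var, int32_t index, char ch)
{
    char *s = *var;
    if (s == NULL || index < 0 || index >= lstr_length(s)) return -1;
    if (lstr_header(s)->refcount > 1) {
        char *copy = lstr_new(s);
        if (copy == NULL) return -1;
        lstr_release(s);             /* give up our share of the old block */
        *var = s = copy;
    }
    s[index] = ch;
    return 0;
}
```

Because the pointer already addresses a null-terminated run of characters, it can be handed to a C function as it stands, which mirrors the PChar cast described above.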
