UN*X gives you access to files through directories, which attach filenames to inodes (or vnodes, if you've been spoiled by NFS and later weenie file systems). The connection filename ⇒ inode is the "hardlink". A "file" (really, an inode, as we've just seen!) can have filenames in multiple directories.

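You create a hardlink either by creating a file, or by saying "ln a b", which creates "b" as a filename pointing to the same inode that "a" points to. You could now even say "rm a", "safe" in the knowledge that b still holds your file. You could even write "mv" this way, as "ln" followed by "rm" (as long as both names live on the same file system: inode numbers mean nothing outside their own file system, so a hardlink cannot cross into another one).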
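For the sceptical, here is a minimal sketch of the "ln a b", "rm a" sequence using Python's os module on a POSIX system. The function name and the scratch directory are made up for illustration; os.link and os.unlink are the calls that "ln" and "rm" boil down to.

```python
import os
import tempfile

def link_then_remove():
    """Mimic `ln a b` followed by `rm a` inside a scratch directory."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:          # creating the file makes the first hardlink
            f.write("precious data\n")

        os.link(a, b)                    # ln a b
        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        links_before = os.stat(b).st_nlink

        os.unlink(a)                     # rm a
        links_after = os.stat(b).st_nlink
        with open(b) as f:               # the data is still reachable through b
            contents = f.read()
        return same_inode, links_before, links_after, contents

print(link_then_remove())
```

The link count (st_nlink) goes from 2 back to 1, and nothing happens to the data in between.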

Clearly, hardlinks are UN*X's gift to the Klingon programmer.

You cannot have a hard link to a directory in most UNIXen today. This is widely considered a Good Thing: it could badly break directory-traversing programs, except those written by Klingon programmers for non-Klingons. Even if you could, these Klingon-written directory-traversing programs could detect this by examining the inode of .. in every directory they enter, refusing to go on if that inode doesn't match the inode of the directory through which they're entering. Sure, they'd miss some directories (and consequently presumably have to kill themselves), but they wouldn't enter an infinite loop (and consequently have to kill -9 their programmer). Of course, such a shameful program could be fooled, creating directories it can never enter (except when started from inside them), by saying

% mkdir a
% ln a b
% rmdir a
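The ".." check described above can be sketched roughly as follows. This is an illustration under assumptions, not anybody's actual traversal program: the function names are invented, the device number is compared as well as the inode (inode numbers are only unique within one file system), and since a modern system won't let you hardlink a directory, a symlink in another directory stands in for the extra link.

```python
import os
import tempfile

def entered_through_parent(parent, name):
    """True if parent/name/.. leads straight back to parent."""
    up = os.stat(os.path.join(parent, name, ".."))
    here = os.stat(parent)
    # Compare device and inode: inode numbers alone are per file system.
    return (up.st_dev, up.st_ino) == (here.st_dev, here.st_ino)

def demo():
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "real", "sub"))
        os.mkdir(os.path.join(top, "other"))
        # Stand-in for a second link to a directory (assumption: a symlink,
        # because hardlinking directories is refused on most systems).
        os.symlink(os.path.join(top, "real", "sub"),
                   os.path.join(top, "other", "alias"))
        honest = entered_through_parent(os.path.join(top, "real"), "sub")
        sneaky = entered_through_parent(os.path.join(top, "other"), "alias")
        return honest, sneaky

print(demo())
```

Entering "sub" through its real parent passes the check; entering the same directory through "other/alias" does not, because its ".." still points at "real".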

You can always put symbolic links to a directory, though. Symlinks are for weenies. Well, they would be, except that they're far more useful.

The main reason the Unix hard link is confusing is that it suggests a feature in Unix file systems that doesn't actually exist.

In Unix, a file is identified by an inode on a file system. Files have attributes, such as permissions, ownership and various timestamps. None of these attributes is a name for the file.

A directory is a file that contains a mapping of file names to inodes. An inode can appear multiple times, in different directories, or even in the same directory under different names. An occurrence of a file in a directory is called a hard link to the file.
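To make the "mapping of file names to inodes" concrete, here is a small sketch that reads that mapping back out of a directory. The helper names are hypothetical; os.scandir and DirEntry.inode() are standard Python, and scandir leaves out the "." and ".." entries.

```python
import os
import tempfile

def directory_mapping(path):
    """Return the {name: inode number} mapping a directory holds."""
    with os.scandir(path) as entries:    # "." and ".." are not reported
        return {entry.name: entry.inode() for entry in entries}

def demo():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "a"), "w") as f:
            f.write("x")
        os.link(os.path.join(d, "a"), os.path.join(d, "b"))  # same inode, second name
        with open(os.path.join(d, "c"), "w") as f:           # an unrelated file
            f.write("y")
        return directory_mapping(d)

mapping = demo()
print(mapping)
```

In the result, "a" and "b" map to the same inode number while "c" gets its own: one inode appearing twice in the same directory under different names.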

So a filename is not actually a property of a file: it is a property of a hard link to the file, which is an entry in a directory in which the file appears. Creating a hard link is not an operation on a file; it is an edit operation on a directory.

However, the constraint is maintained that every inode has at least one hard link, while every hard link (= directory entry) points to a valid file. To impose these constraints, a file is deleted automatically after the last hard link to it is deleted (and no process has the file open), while a file can only be created together with a hard link to it. (Exceptions are possible, once you know about file descriptors.)
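The file-descriptor exception can be seen with a sketch like the one below, assuming a local POSIX file system (network file systems such as NFS may behave differently). The function name is invented for illustration; the os calls are the standard ones.

```python
import os
import tempfile

def unlink_while_open():
    """Remove a file's only hardlink while a descriptor is still open on it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "scratch")
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            os.write(fd, b"still here")
            os.unlink(path)                      # delete the last hardlink
            name_gone = not os.path.exists(path)
            links = os.fstat(fd).st_nlink        # the inode now has no names
            os.lseek(fd, 0, os.SEEK_SET)
            data = os.read(fd, 100)              # yet the data is still readable
        finally:
            os.close(fd)                         # only now is the file really deleted
        return name_gone, links, data

print(unlink_while_open())
```

Between the unlink and the close, the inode exists with zero hard links, which is exactly the state the constraint otherwise rules out.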

A further constraint is that for every file there is a finite path of hard links to it from the root directory, /. This guarantees that every file has a full pathname, also called an absolute name or path to the file: it is formed by concatenating the names of the hard links along that path, each preceded by /.

A file has a unique full pathname only when all files on the path have no more than one hard link. For directories, this constraint is usually maintained within a single file system, but it isn't across file systems: the same file system can be simultaneously mounted on different directories, and automounters and loopback file systems even allow it to appear in infinitely many different places. All this without using a single symlink.

So now that we know what a hard link is, where does the confusion come from?

It originates from the fact that most operations on files are by filename or full pathname, and practically none of them will increase the number of hard links to a file. Therefore it is natural for users to think of a filename as being a property of the file, indicating its unique point of appearance in the directory tree. Users may work with Unix for years without encountering multiple hard links.

Users who do know about hard links tend to use the term for additional hard links to an already existing file, created, for instance, with the ln command. While this is a natural thought pattern, it leaves the incorrect impression that such additional hard links are in any way different from the first hard link to a file. They are not: once a file has multiple hard links, none of them can be distinguished as "the real filename" or "the original link" in any way. After doing ln a b, I do not have "the real file a" and "the hardlink b"; I have one file with two hardlinks, a and b.
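A quick sketch of that symmetry, assuming a POSIX system; the function name is made up, and the set of stat fields compared is just a convenient selection.

```python
import os
import tempfile

def links_are_symmetric():
    """After `ln a b`, neither name is more 'real' than the other."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        with open(a, "w") as f:
            f.write("one")
        os.link(a, b)                            # ln a b

        with open(b, "a") as f:                  # append through the "second" name
            f.write(" two")
        with open(a) as f:                       # and read through the "first"
            seen_via_a = f.read()

        def identity(path):
            s = os.stat(path)
            return (s.st_dev, s.st_ino, s.st_nlink, s.st_mode,
                    s.st_size, s.st_mtime_ns)

        return seen_via_a, identity(a) == identity(b), os.path.samefile(a, b)

print(links_are_symmetric())
```

Everything stat reports is identical for both names, because it describes the inode and not the directory entry; nothing records which name came first.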

Other systems, such as E2, use the term hard link in different ways.
