Thursday, March 28, 2013

File Management

Purpose of File Management


The file manager handles all files on secondary storage media. To do this, it must:
  • be able to identify the numerous files by giving unique names to them
  • maintain a list telling where exactly each file is stored, how many sectors on the medium it occupies, and in which order those sectors make up the file
  • provide simple and fast algorithms to read and write files in cooperation with the device manager
  • grant and deny access rights to files for users and programs
  • allocate and deallocate files to processes in cooperation with the process manager
  • provide users and programs with simple commands for file handling



File names, naming conventions


So that users, programs and the file manager itself can identify the different files, each file must be given a unique file name.

The relative file name is what a user normally recognises as the file name; it consists of a name and an extension, for instance problem.txt or forloop.cpp. Apart from some exceptions, relative file names look the same in all operating systems.
The name is normally given by the user, whereas the extension (which is separated from the name by a dot) generally indicates what kind of file it is.

The absolute file name is normally much longer than the user thinks it is. Here, the relative file name is preceded by the place where the file is stored on disk, that is, the drive name and the names of the directories in which to find the file.
So the absolute file name consists of:
  1. drive name
  2. directory name(s)
  3. file name
  4. extension
File name and extension are separated by a dot. The directories are separated by slashes (UNIX) or backslashes (Windows, DOS). Because drive names and file organization differ from OS to OS, absolute file names look different depending on which operating system is used.

For instance, a file with the relative name syllabus.doc, saved by the user Peter in the directory data, would have the following absolute file name
in DOS:
c:\data\syllabus.doc
in LINUX:
/usr/home/Peter/data/syllabus.doc
Note that the absolute file name changes when the location is different. The relative file name, however, stays the same. So, after saving that file on a floppy disk, the absolute file name of the backup would be
in DOS:
a:\syllabus.doc
in LINUX:
/mnt/fdd0/syllabus.doc
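
The decomposition of an absolute file name into drive, directories, name and extension can be tried out with Python's standard pathlib module. This is only a sketch: the helper describe() is a hypothetical name introduced here for illustration, and the two paths are the examples from the text.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The two absolute file names from the text, each parsed with the
# path flavour of its operating system.
dos_path = PureWindowsPath(r"c:\data\syllabus.doc")
linux_path = PurePosixPath("/usr/home/Peter/data/syllabus.doc")


def describe(path):
    """Split an absolute file name into the four parts listed above."""
    return {
        "drive": path.drive,                    # empty on UNIX-like systems
        "directories": path.parent.parts[1:],   # skip the root entry
        "name": path.stem,
        "extension": path.suffix.lstrip("."),
    }


dos_parts = describe(dos_path)
linux_parts = describe(linux_path)

# The relative file name is the same in both cases.
same_relative_name = dos_path.name == linux_path.name
```

Here dos_parts carries the drive "c:" and the single directory "data", while linux_parts has an empty drive and four directories; the relative name syllabus.doc is identical in both.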


A relative file name is restricted in length. What exactly this restriction looks like again depends on the OS. DOS has the strictest restrictions, allowing the file name and all directory names to be only 8 characters long, and the extension only 3. This is commonly known as the "8.3" restriction (pronounced: eight-dot-three). Other operating systems allow the relative file name to be at least 14, but most often up to 255 characters long.
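
The length part of the 8.3 rule can be expressed as a small check. This is a minimal sketch under the assumption that only the lengths matter; the function name fits_8_3 is made up for this example, and the additional DOS rules about which characters are permitted are not modelled.

```python
def fits_8_3(relative_name: str) -> bool:
    """Return True if the name part has 1 to 8 characters and the
    extension has at most 3 characters (length check only)."""
    name, dot, extension = relative_name.rpartition(".")
    if not dot:
        # No dot at all: the whole string is the name, no extension.
        name, extension = relative_name, ""
    return 1 <= len(name) <= 8 and len(extension) <= 3


examples = ["problem.txt", "forloop.cpp", "syllabus.doc",
            "coursesyllabus.doc", "notes.html"]
results = {name: fits_8_3(name) for name in examples}
```

The first three names fit the restriction; coursesyllabus.doc fails because the name is too long, and notes.html fails because the extension has four characters.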

File allocation on storage media

On the storage medium a file is saved in blocks (sectors) of equal size. To access these files, device manager and file manager work together: The device manager "knows" where to find each sector on disk, but only the file manager has a list telling in which sectors each file is stored. This list is the File Allocation Table (FAT).

There are different ways of allocating files. The main concern is to find a strategy that keeps the FAT from growing too large, makes it possible to retrieve a specific sector of a file, and does not waste too much storage space.
  • contiguous file allocation
  • non-contiguous file allocation (FAT)
  • chained allocation
  • indexed allocation

Contiguous file allocation

With contiguous file allocation a single set of blocks is allocated to a file at the time of file creation. Each file is stored contiguously, one sector after another.

The advantage is that the FAT only has to have a single entry for each file, indicating the name, the start sector, and the length. Moreover, it is easy to get a single block because its address can simply be calculated: If a file starts at sector c, and the nth block is wanted (counting the first block as block 0), the location on secondary storage is simply c+n.
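
The address calculation can be sketched in a few lines. The class ContiguousEntry and the function block_address are hypothetical names chosen for this example, and blocks are assumed to be counted from 0, so that the first block lies at the start sector itself.

```python
from dataclasses import dataclass


@dataclass
class ContiguousEntry:
    """One table entry per file: name, start sector and length in blocks."""
    name: str
    start: int
    length: int


def block_address(entry: ContiguousEntry, n: int) -> int:
    """Return the sector holding block n of the file (blocks counted from 0)."""
    if not 0 <= n < entry.length:
        raise IndexError(f"{entry.name} has no block {n}")
    return entry.start + n


entry = ContiguousEntry(name="syllabus.doc", start=120, length=5)
first_block = block_address(entry, 0)   # the start sector itself
last_block = block_address(entry, 4)    # start sector plus 4
```

No search through a list is needed: one addition gives the sector, which is why this scheme makes direct access to any block cheap.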

The disadvantage is that it may be difficult (if not impossible) to find a sufficiently large set of contiguous blocks. From time to time it will be necessary to perform compaction, that is, to move the stored files together so that the free blocks form one contiguous area again.
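
The problem can be illustrated with a free-block map. The following is a simplified sketch, not a real allocator: the names find_contiguous_run and compact are invented for this example, the map is just a list of booleans (True means free), and compact only rearranges the map instead of moving any data.

```python
def find_contiguous_run(free_map, needed):
    """Return the first sector of the first run of `needed` free blocks,
    or None if no such run exists."""
    if needed <= 0:
        raise ValueError("needed must be positive")
    run_start = None
    run_length = 0
    for sector, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = sector
            run_length += 1
            if run_length == needed:
                return run_start
        else:
            run_length = 0
    return None


def compact(free_map):
    """Model compaction: all used blocks move to the front of the medium,
    so all free blocks form one run at the end."""
    used = free_map.count(False)
    return [False] * used + [True] * (len(free_map) - used)


# Five blocks are free in total, but never more than two in a row.
free_map = [True, False, True, True, False, True, False, True]
before = find_contiguous_run(free_map, 3)
after = find_contiguous_run(compact(free_map), 3)
```

Before compaction the search for three contiguous blocks fails although five blocks are free; after compaction the same request succeeds.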

Contiguous file allocation is nowadays only used for tapes and recordable CDs. Compaction algorithms are not used there, though, because the data on these media is not supposed to be changed. It is simply overwritten or thrown away when no longer needed.

Non-contiguous file allocation (FAT)

With non-contiguous file allocation the blocks of a file can be distributed all over the storage medium. The File Allocation Table (FAT) not only lists all files, but also has an entry for each sector a file occupies. Because all information is stored in the FAT, and no assumption about the distribution of the file is made, this method of allocation is sometimes simply called FAT.

The advantage is that it is very easy to get a single block, because each block has its entry in the FAT. Additionally, it is a very simple allocation method where not much overhead is produced and no sophisticated search method for free blocks is needed.
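
A table with one entry per occupied sector can be modelled as follows. This is a minimal sketch of the idea as described in the text, not the on-disk format of any real file system; the class SectorTable and its methods are hypothetical names, and free sectors are simply taken from the front of a list.

```python
class SectorTable:
    """Toy allocation table: for every file, the ordered list of its sectors."""

    def __init__(self, total_sectors):
        self.free = list(range(total_sectors))  # free sectors, in any order
        self.files = {}                         # file name -> list of sectors

    def create(self, name, blocks):
        """Allocate `blocks` sectors to a new file, wherever they are free."""
        if name in self.files:
            raise FileExistsError(name)
        if blocks > len(self.free):
            raise OSError("not enough free sectors")
        self.files[name] = [self.free.pop(0) for _ in range(blocks)]

    def delete(self, name):
        """Remove the file and return its sectors to the free list."""
        self.free.extend(self.files.pop(name))

    def sector_of(self, name, n):
        """Look up the sector of block n (counted from 0) directly."""
        return self.files[name][n]


table = SectorTable(total_sectors=8)
table.create("a.txt", 3)      # gets sectors 0, 1, 2
table.create("b.txt", 2)      # gets sectors 3, 4
table.delete("a.txt")         # sectors 0, 1, 2 become free again
table.create("c.txt", 5)      # gets sectors 5, 6, 7, 0, 1
```

The file c.txt ends up scattered around b.txt, yet any of its blocks is still found with a single table lookup, and no run of contiguous free sectors had to be searched for.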






