Long File Names
With the release of Windows® 95, users were freed from the standard eight-character file name and three-character file extension that had been imposed by the earlier MS-DOS® file system. Although Microsoft began moving to long file name conventions with the advent of Windows® for Workgroups, long file names weren’t fully implemented until the development and release of Windows® 95 with its VFAT (Virtual File Allocation Table). If you are a recent computer user, you might well imagine the rather inventive, yet cryptic, file names earlier users had to develop in order to identify their files within the limit of only eight (8) characters. Although an inventive (but cryptic) name might make complete sense at the time you create it, finding that file six months down the road can be a problem. Given Microsoft’s “ease of use” competition with Macintosh, it is quite surprising that they waited until the development of Windows® 95 to solve this.
When Microsoft developed Windows® NT, it also developed NTFS (New Technology File System), which was built from the ground up and included long file name support. Microsoft was determined to bring long file name support to Windows® 95 as well; however, doing so would involve a struggle. Microsoft needed to maintain compatibility with existing disk structures, older versions of DOS, previous versions of Windows, as well as older applications designed around these earlier operating systems. Unfortunately, they couldn’t merely discard everything that had come before, as they had with Windows® NT. Microsoft needed to work within the restriction of “8.3” (standard file name size) file names within the directories if they wanted to maintain compatibility.
Microsoft had a lofty set of goals to meet in order to ensure compatibility:
- They wanted Windows® 95, as well as any applications written for Windows® 95, to use file names considerably longer than an 8 character file name and 3 character extension.
- To ensure compatibility, any new long file name conventions must be able to be stored on existing MS-DOS® volumes using standard directory structures.
- A method would need to be created that would allow previous software versions to access their files that would be stored using these new file names. A conversion routine would need to be developed.
As we mentioned above, during the development of Windows® 95, Microsoft also developed VFAT, a Virtual File Allocation Table file system. For the most part, VFAT accomplishes each of the goals identified by Microsoft’s engineers.
- It supports long file names of up to 255 characters per file, which can be assigned to any file within Windows® 95, or by any program written for Windows® 95. Microsoft recommended, however, that file names under 100 characters be used.
- Support for long file names was also made part of MS-DOS® version 7.x, the underlying operating system of Windows® 95.
- The existing file extensions structure was maintained to preserve the way in which they were used by existing software.
- Long file names may use every character that is valid in standard file names, plus several additional characters: + , ; = [ ].
Although created for Windows® 95, VFAT was retained for all subsequent versions of Microsoft’s 9x operating systems, which included Windows® 98 and Windows® ME. The FAT 32 file system was built around VFAT, and uses the same conventions.
In order to create compatibility and permit access by older software packages, files that use a long file name also have a standard file name alias that is automatically assigned to them. This is often referred to as a short file name in order to distinguish it from a long file name. The alias is created by truncating and modifying the long file name.
- Before the alias name is created, the conversion routine analyzes the first six characters of the long file name, without consideration given to any blank spaces. Characters that are valid in long file name convention but not in standard file name convention, such as + , ; = [ and ], are replaced by underscores, and then all lower-case letters are converted to upper case. These six characters are then stored as the first six characters of the alias.
- The last two characters of the file name are then assigned as “~1”. If the “~1” causes a conflict because there is already a file with this alias in the directory, then the routine tries “~2”. This process continues until the routine finds a unique alias for the file.
- The first three characters of the long file name’s extension are carried over as the extension of the alias. If the extension has only one or two characters, it is used as it stands.
Let’s take a quick look at an example to see how this works. You’re working in Windows® 9x and have just created a Word document that is your proposed budget for the coming year.
You save this document as: Proposed Budget for 2003.DOC
Windows® will save this document exactly as you named it; however, the alias naming convention will also give this document the short name PROPOS~1.DOC.
Taking this a step further, let’s say you create an amended budget, and name that file: Proposed Budget for 2003 As Amended.DOC. Note that, except for the addition of “ As Amended”, the beginning of the file name is identical to your first one.
When the alias routine reads the first six characters, appends the “~1” to the name and then attempts to save it, it will see that a file with that alias already exists. Windows® will then append “~2” instead and attempt to save it again. If successful, your alias will look like: PROPOS~2.DOC.
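The alias steps described above can be sketched in a few lines of Python. This is only an illustration of the procedure as this article describes it, not Microsoft’s actual routine. The function name make_alias and the set of existing aliases are hypothetical, and trimming the stem once the numeric tail grows past one digit is an assumption made solely to keep the alias within eight characters.

```python
def make_alias(long_name, existing_aliases):
    """Return an 8.3 alias for long_name that is not in existing_aliases."""
    # Split the name at its last dot; everything after it is the extension.
    base, dot, ext = long_name.rpartition(".")
    if not dot:                       # no dot at all: the whole name is the base
        base, ext = long_name, ""

    def clean(text):
        text = text.replace(" ", "")  # blank spaces are not considered
        for ch in "+,;=[]":           # valid only in long file names
            text = text.replace(ch, "_")
        return text.upper()           # lower case becomes upper case

    stem = clean(base)
    short_ext = clean(ext)[:3]        # at most three extension characters

    n = 1
    while True:
        tail = f"~{n}"
        # Six characters plus "~1".."~9"; assumption: trim further for "~10" and up.
        alias = stem[:8 - len(tail)] + tail
        if short_ext:
            alias += "." + short_ext
        if alias not in existing_aliases:
            return alias
        n += 1                        # conflict: try the next number


aliases = set()
for name in ("Proposed Budget for 2003.DOC",
             "Proposed Budget for 2003 As Amended.DOC"):
    alias = make_alias(name, aliases)
    aliases.add(alias)
    print(f"{name} -> {alias}")
# Proposed Budget for 2003.DOC -> PROPOS~1.DOC
# Proposed Budget for 2003 As Amended.DOC -> PROPOS~2.DOC
```

The sketch ignores details the article does not cover, such as names containing several dots or names that already fit the 8.3 pattern.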
By doing this, Microsoft solved the “ease of use” problem while keeping the files available to older software, which can still refer to a file using the older 8.3 naming convention. Even spaces are permitted in long file names, and they don’t cause any problems, because applications designed for Windows® 95 are aware of their use and because the short file name alias has the spaces removed.
Now that you have some idea of how Microsoft conquered the “ease of use” and “compatibility” issues, let’s take a look at what they did to be able to store and retrieve these files through various applications. While files using the long file name conventions are stored in regular directories using standard directory entries, there are a few other issues that were dealt with. The Windows® 95 file system creates a standard directory entry for the “long file name” file, into which it puts the short file name alias. It then creates several additional directory entries to hold the rest of the long file name. This is important to understand, as a single long file name can use quite a few directory entries, since each entry is only 32 bytes in length. For this reason, it is recommended that long file names not be placed in the root directory of a VFAT partition, as the total number of directory entries in the root is limited.
The FAT 32 file system removes this root restriction; however, limiting the number of “long file name” files you place in the root of a partition is still good practice!
In order to ensure that pre-Windows® 95 software isn’t confused by this change in directory use, each of the extra directory entries used to hold the long file name information is tagged with an odd combination of file attributes. It is odd in the sense that no entry would normally have carried all four of these attributes at once before the release of Windows® 95 and MS-DOS® 7.x:
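To get a feel for how quickly long file names consume directory entries, here is a rough back-of-the-envelope sketch. It relies on two figures that are not stated in this article and should be treated as assumptions: that each extra entry holds 13 characters of the long name, and that the root directory of a typical FAT16 hard disk partition holds 512 entries. The function name entries_needed is made up for the illustration.

```python
import math

ENTRY_SIZE_BYTES = 32        # size of one directory entry (from the text)
CHARS_PER_EXTRA_ENTRY = 13   # assumption: characters held by each extra entry
ROOT_ENTRY_LIMIT = 512       # assumption: typical FAT16 hard disk root directory


def entries_needed(long_name):
    """One entry for the 8.3 alias plus enough extra entries for the long name."""
    extra = math.ceil(len(long_name) / CHARS_PER_EXTRA_ENTRY)
    return 1 + extra


for name in ("Proposed Budget for 2003.DOC", "x" * 255):
    count = entries_needed(name)
    print(f"{len(name):3d} characters: {count:2d} entries, "
          f"{count * ENTRY_SIZE_BYTES} bytes, "
          f"about {ROOT_ENTRY_LIMIT // count} such files fit in the root")
```

Under these assumptions the 28-character budget file name takes four entries, and a name of the maximum 255 characters takes twenty-one.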
- read-only
- hidden
- system
- volume label
This ensures that older versions of MS-DOS® will not try to do anything with these long file name entries, and in particular will not overwrite them in the belief that they are unused. Because this combination of file attributes had no valid meaning in earlier code, older software simply ignores the extra directory entries used by VFAT.
Microsoft is often criticized for the way in which it improved the usability of Windows® 95 while remaining compatible with old software, and the compromise shows. Their development of VFAT is often referred to as a hack built on top of the standard FAT file system. As with any software change, numerous problems arise, and below are some of those involving long file names that we felt you should be aware of:
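A short sketch shows how a long-file-name-aware program might recognize these extra entries. The bit values and the position of the attribute byte (offset 11 within the 32-byte entry) come from the standard FAT directory layout rather than from this article, so treat them as assumptions here. The function name and the sample entries are invented for the illustration.

```python
ATTR_READ_ONLY = 0x01
ATTR_HIDDEN = 0x02
ATTR_SYSTEM = 0x04
ATTR_VOLUME_LABEL = 0x08

# All four attributes together: the combination used to tag the extra entries.
ATTR_LONG_NAME = ATTR_READ_ONLY | ATTR_HIDDEN | ATTR_SYSTEM | ATTR_VOLUME_LABEL

ATTR_OFFSET = 11   # assumption: attribute byte position in the 32-byte entry


def is_long_name_entry(entry):
    """True if a raw 32-byte directory entry carries the long-name attribute mix."""
    if len(entry) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    return entry[ATTR_OFFSET] == ATTR_LONG_NAME


# Made-up sample entries: an ordinary alias entry and one extra long-name entry.
alias_entry = b"PROPOS~1DOC" + bytes([0x20]) + bytes(20)
extra_entry = bytes([0x41]) + bytes(10) + bytes([ATTR_LONG_NAME]) + bytes(20)

print(hex(ATTR_LONG_NAME))               # 0xf
print(is_long_name_entry(alias_entry))   # False
print(is_long_name_entry(extra_entry))   # True
```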
- Compatibility Problems with Older Utilities:
As noted earlier, Microsoft’s long file name routine in VFAT marks the extra entries it produces with the special attribute combination of read-only, hidden, system and volume label in an effort to prevent normal applications from disturbing them. However, a disk utility program like Norton® Disk Doctor is not fooled by this, nor is it prevented from deciding that something is broken and needs to be fixed. If you happen to be using an early version of Norton® Disk Doctor that is not long file name aware, it will detect these entries as errors on your disk and tidy them all up for you. Unfortunately, that’s the end of your long file names! Utilities run under Windows® 9x and Windows® ME must be aware of long file names in order to perform properly. Given that long file names have been in use for several years now, this isn’t as critical as it once was; just don’t use any old disk utility software!
- No Long File Name Support with Older Software:
Older applications will work with long file names by relying upon their short name alias. By themselves, they have no ability to directly access a long file name. One of the biggest problems facing computer users was their own refusal to update older software. One very common example was backup programs originally written for MS-DOS® only, which would accidentally trash long file names because they weren’t long file name aware. Such a program saved the alias (8.3 type) file name without difficulty, but everything else was history. Unfortunately, it wasn’t until a restoration process started that anyone realized what was happening. Another problem arose with older DOS-based word processing applications. Most users weren’t aware that if you load a file into an older program using the alias, you can only save back to the same file name, or to another 8.3 file name. If you save back to the same name in the same directory, the original long file name is retained.
- Problems with Conflicting Alias File Names:
There are two significant problems with the long file name alias mechanism:
- Alias Names Change:
The alias name assigned to the long file name is not permanently linked to it, and can change. If we were to take the example document we used earlier, “Proposed Budget for 2003.DOC”, and save it to a new empty directory, it would be assigned the alias “PROPOS~1.DOC”. If we were then to copy that same file to a directory that holds the file “Proposed Budget for 2003 As Amended.DOC”, which just happens to be already using the alias “PROPOS~1.DOC”, we would find that the operating system changes the alias of the copied file to “PROPOS~2.DOC”. Obviously, this is confusing to anyone who refers to the file both by its long file name and by its alias.
- Alias Duplicates Can Be Overwritten:
This second scenario is quite a bit more serious than the first. Using the same example as above, we’ll make the presumption that, although you are using Windows® 95, you are also using an MS-DOS® based word processing program. You’re merrily working on your amended budget and decide to save it with a few changes and place it into the same directory with the rest of your budget documents. The older application won’t be aware of long file names; all it will see is “PROPOS~1.DOC”, so it will think it is the same file and overwrite it. If you’re fortunate, it may ask you if you really want to overwrite the older file. Bear in mind though, this is a maybe, and it’s quite possible that your earlier work just became history!
- Copying or Restoring Files Can Change their Alias Names:
In Windows® 95, 98 and ME (and under certain circumstances Windows NT), when copying a file with a long file name from one partition to another, or restoring one from a backup, the associated short file name alias can be changed. As yet there’s no concrete explanation for this; however, it can cause some strange behavior, as hard-coded references to the short file name no longer resolve to the correct file. As an example, many of the Windows registry entries refer to the short file name alias, not the long file name. In addition, in spite of the fact that Windows NT’s NTFS file system was built from the ground up, it too is plagued by this problem because it relies on the short file name alias for limited backward compatibility.
File Names and their Extensions
As you are probably already aware, DOS-based computer files are named using a fixed 8.3 format convention that has been in use since IBM’s release of the first personal computer. This file naming convention is made up of two parts:
- File Name:
The file name is the initial part of the name, and it must be between one and eight (8) characters in length. When a file is deleted (it’s really not gone), the system changes the first character of the name to a special hex byte code of E5h to identify it as a deleted file.
- File Extension:
A file extension usually consists of three characters appearing after the “dot” that follows the file name itself. A file extension need not be used (read that as no extension is required), but when present it can consist of up to three (3) alphanumeric characters. Most frequently you will see this as .com, .exe, .bat, .htm, .dos, .jpg, .psp, .psd, etcetera.
File extensions are often the cause of confusion, and hopefully the following will clear up some of this. Most of today’s operating systems, especially Windows®, rely upon file extensions in order to do two things: associate a specific file or group of files with a given program, if not the operating system itself, and determine what to do with a specific file when a command is executed that calls it. Confused yet?
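The two-part name and the E5h deleted marker described above can be illustrated with a small sketch. It assumes the usual FAT on-disk form, which this article does not spell out: the name occupies an 11-byte field (eight bytes of name, three of extension), padded with spaces and stored without the dot. The function name read_short_name and the sample fields are hypothetical.

```python
DELETED_MARKER = 0xE5   # first byte of the name once the file has been deleted


def read_short_name(field):
    """Decode an 11-byte name field into (display_name, is_deleted)."""
    if len(field) != 11:
        raise ValueError("expected 8 name bytes plus 3 extension bytes")
    deleted = field[0] == DELETED_MARKER
    raw_name = field[:8]
    if deleted:
        # The original first character is lost; show a placeholder instead.
        raw_name = b"?" + raw_name[1:]
    name = raw_name.decode("ascii", errors="replace").rstrip(" ")
    ext = field[8:].decode("ascii", errors="replace").rstrip(" ")
    display = f"{name}.{ext}" if ext else name   # the extension is optional
    return display, deleted


print(read_short_name(b"BUDGET  DOC"))       # ('BUDGET.DOC', False)
print(read_short_name(b"README     "))       # ('README', False)
print(read_short_name(b"\xe5UDGET  DOC"))    # ('?UDGET.DOC', True)
```

The third call shows why undelete tools of the era had to ask for the first letter of a file name: it is the one character the deletion overwrites.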
Normally, a file extension lets you, but more importantly lets the operating system, know in a simplistic way what type of file it is looking at. As an example, most .com, .bat and .exe files are operating system or program related. In all cases, .exe files are executables crafted to perform a specific function. Likewise, .jpg, .bmp, .psd and .psp, for example, are all normally graphic related files. If you were working only in a DOS environment, many of the file extensions used today would be meaningless to that operating system. DOS has no idea what an .htm, .jpg, .bmp, .psd or .psp is. On the other hand, DOS knows exactly how to handle files with extensions that were written as part of its original source code, such as .exe, .com, .bat, etcetera.
Windows®, on the other hand, needs the file extensions both to organize files and to associate them with the programs that require them. You can put this importance to the test by trying to change a file extension. Windows will waste no time in warning you that doing so may render the file useless, or damage the program that requires the file you’re trying to change.
Prior to the release of Windows®, there were but a few file extensions to worry about, and no one gave any thought to standards or consistency. Today, after the release of several versions of Windows®, using consistent file extensions is important. Windows® maintains a list of extensions and their file associations, which guides Windows as to what needs to be done when you double-click on a specific file type. Windows® will automatically launch the program that it associates with the file you select, and then tell the program to open the file you selected. Windows® does not examine the content of the file when it launches what it perceives as the associated program. It merely looks at the file extension!
Although we would like to think otherwise at times, Windows® is not perfect when selecting which program to open when you double-click on a file. Sometimes the association between extension and program is wrong, and on occasion the association is missing entirely. This is why Windows® will permit you to add or change a file association. If the file extension is unknown to Windows (for whatever reason), it will ask you which program you would like to use to open it.
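The lookup described above can be pictured as a simple table keyed by extension. The sketch below is only a model of the idea: the table contents and program names are made-up examples, and it does not show how Windows® actually stores its associations.

```python
import os

# Hypothetical association table: extension -> program to launch.
ASSOCIATIONS = {
    ".doc": "winword.exe",
    ".txt": "notepad.exe",
    ".htm": "iexplore.exe",
}


def program_for(filename):
    """Pick a program from the extension alone; the file content is never read."""
    _, ext = os.path.splitext(filename)
    return ASSOCIATIONS.get(ext.lower())   # None means no association exists


for name in ("Proposed Budget for 2003.DOC", "notes.xyz", "README"):
    program = program_for(name)
    if program is None:
        print(f"{name}: no association, ask the user which program to use")
    else:
        print(f"{name}: open with {program}")
```

Because only the extension is consulted, a plain text file renamed to end in .doc would be handed to the word processor in this model, which mirrors the behavior described above.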
The following characters are permitted as part of the DOS file naming convention: A-Z 0-9 $ % ' - _ @ ~ ` ! ( ) ^ # &. Although a blank space is a valid character, we recommend that you not use it, as many programs, including Windows®, become confused by file names that contain blank spaces. Microsoft specifically recommends that you avoid this practice in Windows® 9x/ME because of the 8.3 alias for long file names.
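A small checker makes these rules concrete. It is a sketch built only from the limits and the character list given in this article, reading the quote and dash in that list as the plain apostrophe and hyphen. The function name is_valid_8_3 is invented, lower-case input is folded to upper case as an assumption, and the blank space is deliberately rejected in line with the recommendation above.

```python
import string

# Characters permitted by the list above (blank space intentionally left out).
ALLOWED = set(string.ascii_uppercase + string.digits + "$%'-_@~`!()^#&")


def is_valid_8_3(filename):
    """Check a name against the 8.3 rules: 1-8 name characters, 0-3 extension."""
    name, dot, ext = filename.upper().partition(".")
    if dot and (not ext or "." in ext):
        return False                  # trailing dot or more than one dot
    if not 1 <= len(name) <= 8 or len(ext) > 3:
        return False                  # a part is too short or too long
    return all(ch in ALLOWED for ch in name + ext)


for candidate in ("AUTOEXEC.BAT", "PROPOS~1.DOC", "README",
                  "Proposed Budget for 2003.DOC", "A+B.TXT"):
    print(f"{candidate!r}: {is_valid_8_3(candidate)}")
```

The first three names pass, while the long budget name and the name containing “+” fail, which is exactly why they need an alias.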
Notice: Windows® 95, Windows® 98, Windows® NT, Windows® 2000, Windows® XP and Microsoft® Office are registered trademarks or trademarks of the Microsoft Corporation.
All other trademarks are the property of their respective owners.