User:Junjie Yuan/正

From Wikipedia, the free encyclopedia

Translated to: 磁盘碎片 (Disk fragmentation); translated from File system fragmentation (English), oldid=832616347.

Visual representation of disk fragmentation and defragmentation

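In computing, file system fragmentation (also called disk fragmentation or file system aging) is the consequence of a file system laying out the contents of files non-contiguously in order to allow in-place modification of their contents. It is a special case of data fragmentation. Fragmentation increases disk head movement, or seek time (where applicable), which lowers disk read and write throughput and, in turn, the performance of the operating system and applications. In addition, file systems cannot sustain unlimited fragmentation. Existing fragmentation is corrected by reorganizing files and free space back into contiguous areas, a process called defragmentation.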

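Solid-state drives (SSDs) used in modern computers are not actually disks and have no spinning platters, so they have no mechanical seek time and suffer far less from fragmentation than hard disk drives. In fact, defragmenting such drives can shorten their service life, because it performs many unnecessary writes.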

Causes

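When a file system is first initialized on a partition, it contains only a few small internal structures and is otherwise one contiguous block of empty space.[a] This means that the file system is able to place newly created files anywhere on the partition. For some time after creation, files can be laid out nearly optimally. When the operating system and applications are installed, or archives are unpacked, separate files usually end up being written one after another, so related files are positioned close to each other.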

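As existing files are deleted or truncated, new regions of free space are created. When existing files are appended to, it is often impossible to resume writing exactly where the file used to end, because another file may already be allocated there; thus, a new fragment has to be allocated. As time goes on, and the same factors are continuously present, free space as well as frequently appended files tend to become increasingly fragmented. Shorter regions of free space also mean that the file system is no longer able to allocate new files contiguously, and has to break them into fragments. This is especially true when the file system becomes full and large contiguous regions of free space are unavailable.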

Example

A simple example of how free space fragmentation and file fragmentation occur

The following example is a simplified illustration:

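Consider a new disk on which five files, named A, B, C, D and E, have been saved contiguously and in that order. Each file uses 10 blocks of space. (Here, the block size is unimportant.) The remainder of the disk is one contiguous region of free blocks, so additional files can be created and saved after file E.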

If file B is deleted, a second region of ten blocks of free space is created, and the disk becomes fragmented. The empty space is simply left there, marked as available for later use, and reused as needed.[b] The file system could defragment the disk immediately after a deletion, but doing so would incur a severe performance penalty at unpredictable times.

Now, a new file called F, which requires seven blocks of space, can be placed into the first seven blocks of the newly freed space formerly holding file B, and the three blocks following it remain available. If another new file called G, which needs only three blocks, is then added, it can occupy the space after F and before C.
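The steps so far can be replayed with a small simulation. This is a minimal sketch, assuming a hypothetical 60-block disk modeled as a Python list (one character per block, "." for free) and a first-fit allocator; real file systems use more elaborate allocation policies.

```python
FREE = "."  # marker for an unallocated block


def first_fit(disk, name, size):
    """Store `size` blocks of file `name` in the first free run that is long enough."""
    run_start, run_len = None, 0
    for i, block in enumerate(disk):
        if block == FREE:
            if run_start is None:
                run_start = i
            run_len += 1
            if run_len == size:
                disk[run_start:run_start + size] = [name] * size
                return run_start
        else:
            run_start, run_len = None, 0  # an allocated block ends the run
    raise OSError(f"no contiguous run of {size} free blocks for {name}")


def delete(disk, name):
    """Mark the blocks of `name` as free; their old contents are just left for reuse."""
    for i, block in enumerate(disk):
        if block == name:
            disk[i] = FREE


disk = [FREE] * 60          # hypothetical 60-block disk
for name in "ABCDE":        # five 10-block files, written in order
    first_fit(disk, name, 10)
delete(disk, "B")           # blocks 10-19 become a second free region
first_fit(disk, "F", 7)     # F takes blocks 10-16
first_fit(disk, "G", 3)     # G takes blocks 17-19, between F and C
print("".join(disk))
# AAAAAAAAAAFFFFFFFGGGCCCCCCCCCCDDDDDDDDDDEEEEEEEEEE..........
```

First fit is only one possible policy; it was chosen here because it reproduces the placement described in the text.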

If F subsequently needs to be expanded, the space immediately following it is already occupied by G, so the file system has three options:

  1. Adding a new block somewhere else and indicating that F has a second extent
  2. Moving files in the way of the expansion elsewhere, to allow F to remain contiguous
  3. Moving file F so it can be one contiguous file of the new, larger size

The second option is probably impractical for performance reasons, as is the third when the file is very large. The third option is impossible when there is no single contiguous free space large enough to hold the new file. Thus the usual practice is simply to create an extent somewhere else and chain the new extent onto the old one.

Further material appended to file F goes into this new extent. But if so much material is added that no room remains after the last extent, yet another extent has to be created, and so on. Eventually the file system has free segments in many places, and some files may be spread over many extents. Access times for those files (or for all files) may then become excessively long.
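The usual practice of chaining a new extent can be sketched by recording each file as a list of [start, length] extents. This is a hypothetical model of the example's layout on an assumed 60-block disk (one character per block, "." for free); the helper name append_blocks and the extra file H are illustrative assumptions, not part of any real file system.

```python
def append_blocks(disk, extents, name, count):
    """Append `count` blocks to file `name`, whose extents are [start, length] lists.

    The last extent grows in place while the next block is free ("."); otherwise
    a new extent is started at the first free block and chained onto the list.
    """
    for _ in range(count):
        start, length = extents[-1]
        nxt = start + length
        if nxt < len(disk) and disk[nxt] == ".":
            extents[-1][1] += 1          # contiguous growth: same extent
            disk[nxt] = name
        else:
            free = disk.index(".")       # ValueError here means the disk is full
            extents.append([free, 1])    # new extent chained onto the old ones
            disk[free] = name


# Layout after F and G were added: F occupies blocks 10-16, G blocks 17-19.
disk = list("A" * 10 + "F" * 7 + "G" * 3 + "C" * 10 + "D" * 10 + "E" * 10 + "." * 10)
f_extents = [[10, 7]]
append_blocks(disk, f_extents, "F", 5)  # G is in the way: second extent at 50
disk[55:58] = ["H"] * 3                 # hypothetical file H written right after it
append_blocks(disk, f_extents, "F", 2)  # H is in the way: third extent at 58
print(f_extents)  # [[10, 7], [50, 5], [58, 2]]
```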

Necessity

Some early file systems were unable to fragment files. One such example was the Acorn DFS file system used on the BBC Micro. Because it could not fragment files, the error message "Can't extend" would at times appear, and the user would often be unable to save a file even though the disk had adequate space for it.

DFS used a very simple disk structure, and files on disk were located only by their length and starting sector. This meant that every file had to exist as a contiguous block of sectors, so fragmentation was not possible. In the example above, the attempt to expand file F would have failed on such a system with the "Can't extend" error message. Regardless of how much free space remained on the disk in total, it was not available to extend the data file.
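A contiguous-only scheme like the one described can be modeled with a catalog that stores nothing but a start sector and a length per file. This sketch is a hypothetical illustration, not the real DFS catalog format or code; the 60-sector disk and the CantExtend exception are assumptions chosen to mirror the example.

```python
class CantExtend(Exception):
    """Raised when a file cannot grow because the sectors after it are in use."""


def extend_contiguous(catalog, total_sectors, name, extra):
    """Grow `name` by `extra` sectors in a catalog of name -> (start, length).

    With only a start and a length per file, growth is possible only if the
    sectors immediately after the file are unused.
    """
    start, length = catalog[name]
    end, new_end = start + length, start + length + extra
    if new_end > total_sectors:
        raise CantExtend(name)
    for other, (s, n) in catalog.items():
        if other != name and s < new_end and end < s + n:
            raise CantExtend(name)     # another file occupies the needed sectors
    catalog[name] = (start, length + extra)


catalog = {"A": (0, 10), "F": (10, 7), "G": (17, 3),
           "C": (20, 10), "D": (30, 10), "E": (40, 10)}
free = 60 - sum(n for _, n in catalog.values())   # hypothetical 60-sector disk
try:
    extend_contiguous(catalog, 60, "F", 2)
except CantExtend:
    print(f"Can't extend F, although {free} sectors are free")
```

The free-space total is correct, yet it cannot be used, which is exactly what a user checking the catalog would not have noticed.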

Standards of error handling at the time were primitive, and in any case programs squeezed into the limited memory of the BBC Micro could rarely afford to spend space on handling errors gracefully. Instead, the user would find themselves dumped back at the command prompt with the "Can't extend" message, and all the data that had yet to be appended to the file would be lost. The frustration was greater if the user had taken the trouble to check the free space on the disk beforehand and found enough. Although free space existed on the disk, the fact that it was not where it was needed was not apparent without analyzing the numbers presented by the disk catalog, and so it would escape the user's notice. In addition, DFS users had almost without exception previously been accustomed to cassette file storage, which does not suffer from this error. The upgrade to a floppy disk system was an expensive performance upgrade, and it was a shock to discover, suddenly and unpleasantly, that the upgrade might cause data loss without warning.[1][2]

Types

File system fragmentation may occur on several levels:

  • Fragmentation within individual files
  • Free space fragmentation
  • The decrease of locality of reference between separate, but related files

File fragmentation

Individual file fragmentation occurs when a single file has been broken into multiple pieces (called extents on extent-based file systems). While disk file systems attempt to keep individual files contiguous, this is not often possible without significant performance penalties. File system check and defragmentation tools typically only account for file fragmentation in their "fragmentation percentage" statistic.
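A "fragmentation percentage" can be computed in several ways, and each tool uses its own formula. As a minimal sketch, assuming the statistic is the share of files stored in more than one extent (the function name and sample data are hypothetical):

```python
def fragmentation_percentage(files):
    """Percentage of files stored in more than one extent.

    `files` maps a file name to its list of (start, length) extents. This is
    only one possible definition; real tools use their own formulas.
    """
    if not files:
        return 0.0
    fragmented = sum(1 for extents in files.values() if len(extents) > 1)
    return 100.0 * fragmented / len(files)


files = {
    "A": [(0, 10)],
    "F": [(10, 7), (50, 5)],   # F was split into two extents
    "G": [(17, 3)],
    "C": [(20, 10)],
}
print(fragmentation_percentage(files))  # 25.0
```

Because only extents per file are counted, this number says nothing about free space fragmentation or file scattering.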

Free space fragmentation

Free (unallocated) space fragmentation occurs when there are several unused areas of the file system where new files or metadata can be written to. Unwanted free space fragmentation is generally caused by deletion or truncation of files, but file systems may also intentionally insert fragments ("bubbles") of free space in order to facilitate extending nearby files (see preventing fragmentation below).
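Free space fragmentation can be made visible by listing the runs of unallocated blocks in an allocation bitmap. In this hypothetical sketch the bitmap is a list of booleans (True meaning allocated); comparing the total free space with the largest run shows why a file may not fit contiguously even when enough space is free overall.

```python
def free_runs(bitmap):
    """Return (start, length) for every run of free blocks; True means allocated."""
    runs, start = [], None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i                       # a free run begins
        elif used and start is not None:
            runs.append((start, i - start)) # the free run ends before block i
            start = None
    if start is not None:                   # a run that reaches the last block
        runs.append((start, len(bitmap) - start))
    return runs


# Hypothetical 20-block volume where deletions have left three small holes.
bitmap = [c == "#" for c in "##..###.####...#####"]
runs = free_runs(bitmap)
print(runs)                                               # [(2, 2), (7, 1), (12, 3)]
print(sum(n for _, n in runs), max(n for _, n in runs))   # 6 3
```

Here 6 blocks are free in total, but a new 4-block file would still have to be split, because the largest free run is only 3 blocks long.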

File scattering

File scattering, also called related-file fragmentation or application-level (file) fragmentation, refers to the lack of locality of reference (within the storage medium) between related files (see file sequence for more detail). Unlike the previous two types of fragmentation, file scattering is a much vaguer concept, as it heavily depends on the access patterns of specific applications. This also makes it very difficult to measure or estimate objectively. However, it is arguably the most critical type of fragmentation, as studies have found that the most frequently accessed files tend to be small compared to the disk's available throughput per second,[3] so the time needed to read them is dominated by moving between them rather than by transferring their data.

To avoid related-file fragmentation and improve locality of reference (in this case called file contiguity), assumptions or active observations about the operation of applications have to be made. A frequent assumption is that it is worthwhile to keep smaller files within a single directory together, laid out in the natural file system order. While this is often reasonable, it does not always hold. For example, an application might read several different files, perhaps in different directories, in exactly the same order in which they were written. For such an application, a file system that simply lays out all writes successively might work faster.
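The effect of layout on locality of reference can be estimated by adding up how far the disk head must travel when an application reads files in a given order. This is a simplified, hypothetical model (distance counted in blocks, head starting at block 0, no caching or rotational effects); the file names and sizes are invented to mirror the case described in the paragraph above.

```python
def seek_distance(layout, read_order):
    """Total blocks the head travels when reading whole files in `read_order`.

    `layout` maps each file to (start, length). The head starts at block 0 and
    ends up just past the last block of each file it reads.
    """
    head, distance = 0, 0
    for name in read_order:
        start, length = layout[name]
        distance += abs(start - head)
        head = start + length
    return distance


# Four 4-block files, written (and later read) in this order:
order = ["a/1", "b/1", "a/2", "b/2"]
by_directory = {"a/1": (0, 4), "a/2": (4, 4), "b/1": (8, 4), "b/2": (12, 4)}
by_write_order = {"a/1": (0, 4), "b/1": (4, 4), "a/2": (8, 4), "b/2": (12, 4)}
print(seek_distance(by_directory, order), seek_distance(by_write_order, order))  # 16 0
```

For an application that reads files in the order they were written, the write-order layout needs no head movement between files, while the directory-grouped layout does.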

Negative consequences

File system fragmentation is more problematic with consumer-grade hard disk drives, on which file systems are usually placed, because of the increasing disparity between their sequential access speed and their rotational latency (and, to a lesser extent, seek time).[4] Thus, fragmentation is an important problem in file system research and design. The containment of fragmentation depends not only on the on-disk format of the file system, but also heavily on its implementation.[5] File system fragmentation has less performance impact on solid-state drives, as there is no mechanical seek time involved.[6] However, every additional fragment requires the file system to store one more piece of metadata for the file, and each piece of metadata occupies space and costs processing time. If the maximum fragmentation limit is reached, write requests fail.[6]
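The penalty of fragmentation on a hard disk drive can be estimated with a simple model: each fragment costs one average seek plus, on average, half a rotation, while the data itself streams at the sequential transfer rate. The drive parameters below (150 MB/s, 9 ms average seek, 7200 rpm) are illustrative assumptions, and the model ignores caching, read-ahead and short track-to-track seeks.

```python
def read_time(size_bytes, fragments, transfer_rate, seek_s, rpm):
    """Rough time in seconds to read one file.

    Each fragment costs one seek plus, on average, half a rotation; the data
    itself is transferred at `transfer_rate` bytes per second.
    """
    half_rotation = 0.5 * 60.0 / rpm
    return size_bytes / transfer_rate + fragments * (seek_s + half_rotation)


# Hypothetical drive: 150 MB/s sequential, 9 ms average seek, 7200 rpm.
for fragments in (1, 50):
    t = read_time(1_000_000, fragments, 150e6, 0.009, 7200)
    print(f"{fragments:>2} fragment(s): {t * 1000:.1f} ms")
#  1 fragment(s): 19.8 ms
# 50 fragment(s): 665.0 ms
```

With these assumptions a 1 MB file split into 50 fragments takes more than 30 times longer to read than a contiguous one; dropping the seek and rotation terms shows why fragmentation matters much less for performance on solid-state drives.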

In simple file system benchmarks, the fragmentation factor is often omitted, as realistic aging and fragmentation are difficult to model. Rather, for simplicity of comparison, file system benchmarks are often run on empty file systems. Thus, the results may differ considerably from performance under real-life access patterns.[7]

Mitigation

Several techniques have been developed to fight fragmentation. They can usually be classified into two categories: preemptive and retroactive. Due to the difficulty of predicting access patterns, these techniques are most often heuristic in nature and may degrade performance under unexpected workloads.

Preventing fragmentation

Preemptive techniques attempt to keep fragmentation at a minimum at the time data is being written on the disk. The simplest is appending data to an existing fragment in place where possible, instead of allocating new blocks to a new fragment.

Many of today's file systems attempt to preallocate longer chunks (or chunks from different free space fragments), called extents, to files that are actively being appended to. This largely avoids file fragmentation when several files are being appended to concurrently, and thus keeps them from becoming excessively intertwined.[5]
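The benefit of preallocating chunks to files that are appended to concurrently can be shown with a toy simulation in which several files each receive one block at a time in round-robin order. This is a hypothetical model, not any particular file system's allocator: with a chunk of one block every append takes the next free block, while with a larger chunk a file reserves that many blocks whenever its reservation is used up.

```python
def count_extents(blocks):
    """Number of contiguous runs in an ascending list of block numbers."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


def append_interleaved(n_files, blocks_each, chunk):
    """Append one block at a time to each of `n_files` files, round-robin.

    A file whose reservation is used up reserves the next `chunk` free blocks
    (chunk=1 means no preallocation). Returns each file's block numbers.
    """
    next_free = 0
    owned = [[] for _ in range(n_files)]
    reserved = [[] for _ in range(n_files)]    # preallocated, not yet written
    for _ in range(blocks_each):
        for f in range(n_files):
            if not reserved[f]:
                reserved[f] = list(range(next_free, next_free + chunk))
                next_free += chunk
            owned[f].append(reserved[f].pop(0))
    return owned


print([count_extents(b) for b in append_interleaved(2, 16, 1)])  # [16, 16]
print([count_extents(b) for b in append_interleaved(2, 16, 8)])  # [2, 2]
```

The model never gives back unused preallocated blocks; a real file system has to release or reuse them eventually.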

If the final size of a file subject to modification is known, storage for the entire file may be preallocated. For example, the Microsoft Windows swap file (page file) can be resized dynamically under normal operation, and therefore can become highly fragmented. This can be prevented by specifying a page file with the same minimum and maximum sizes, effectively preallocating the entire file.

BitTorrent and other peer-to-peer filesharing applications limit fragmentation by preallocating the full space needed for a file when initiating downloads.[8]
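Preallocation can also be requested by applications themselves. As a minimal sketch, assuming a Unix-like system where Python's os.posix_fallocate is available, a downloader could reserve a file's full size before writing any data; this illustrates the idea only and is not taken from any BitTorrent client. Where posix_fallocate is missing or unsupported, the sketch falls back to os.ftruncate, which sets the file length but may leave the file sparse, so nothing is actually reserved. The function name preallocate and the file name download.part are hypothetical, and the Windows page-file setting mentioned above is a system configuration that is not shown.

```python
import os
import tempfile


def preallocate(path, size):
    """Create `path` and ask the file system to reserve `size` bytes for it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            try:
                os.posix_fallocate(fd, 0, size)  # allocate all blocks up front
                return
            except OSError:
                pass  # e.g. the file system does not support it
        os.ftruncate(fd, size)  # fallback: sets the length only; may be sparse
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "download.part")   # hypothetical file name
    preallocate(path, 4 * 1024 * 1024)        # reserve 4 MiB before downloading
    print(os.path.getsize(path))              # 4194304
```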

A relatively recent technique is delayed allocation in XFS, HFS+[9] and ZFS; the same technique is also called allocate-on-flush in reiser4 and ext4. When the file system is being written to, file system blocks are reserved, but the locations of specific files are not laid down yet. Later, when the file system is forced to flush changes as a result of memory pressure or a transaction commit, the allocator will have much better knowledge of the files' characteristics. Most file systems with this approach try to flush files in a single directory contiguously. Assuming that multiple reads from a single directory are common, locality of reference is improved.[10] Reiser4 also orders the layout of files according to the directory hash table, so that when files are being accessed in the natural file system order (as dictated by readdir), they are always read sequentially.[11]
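The core idea of delayed allocation can be sketched as a buffer that records how much space each file needs and only chooses block locations at flush time. This toy model, with the hypothetical class DelayedAllocator, assumes each file is flushed only once and simply sorts files by directory and name when flushing; the real allocators in the file systems named above weigh many more factors, and the Reiser4 hash ordering is not modeled.

```python
import posixpath
from collections import defaultdict


class DelayedAllocator:
    """Toy model of delayed allocation (allocate-on-flush).

    write() only records how many blocks each file needs; block numbers are
    chosen in flush(), which lays out files of the same directory back to back.
    Simplification: each file is assumed to be flushed only once.
    """

    def __init__(self):
        self.next_free = 0
        self.pending = defaultdict(int)   # path -> blocks waiting for allocation
        self.layout = {}                  # path -> (start, length) once flushed

    def write(self, path, blocks):
        self.pending[path] += blocks      # reserve the space, not its location

    def flush(self):
        # Group by directory, then by name, so related files end up adjacent.
        order = sorted(self.pending, key=lambda p: (posixpath.dirname(p), p))
        for path in order:
            n = self.pending[path]
            self.layout[path] = (self.next_free, n)
            self.next_free += n
        self.pending.clear()


alloc = DelayedAllocator()
alloc.write("/a/1", 2)
alloc.write("/b/1", 3)
alloc.write("/a/2", 1)
alloc.write("/a/1", 2)     # /a/1 is appended to after other writes
alloc.flush()              # e.g. triggered by memory pressure or a commit
print(alloc.layout)  # {'/a/1': (0, 4), '/a/2': (4, 1), '/b/1': (5, 3)}
```

Although /a/1 was written in two pieces with other writes in between, it receives a single extent, and the files of /a end up next to each other.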

Defragmentation

Retroactive techniques attempt to reduce fragmentation, or the negative effects of fragmentation, after it has occurred. Many file systems provide defragmentation tools, which attempt to reorder fragments of files, and sometimes also decrease their scattering (i.e. improve their contiguity, or locality of reference) by keeping either smaller files in directories, or directory trees, or even file sequences close to each other on the disk.
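The target layout of a simple defragmentation pass can be computed by giving every file one extent and packing the files back to back in a chosen order, which removes both file fragmentation and the holes of free space between files. This is a hypothetical sketch that only plans the layout; moving the data safely, without overwriting blocks that have not been copied yet, is the hard part of a real tool and is not shown. The input extents are made up for illustration.

```python
def defragment(files, order=None):
    """Return a new layout in which every file is stored as one extent.

    `files` maps a name to its list of (start, length) extents. Files are
    placed back to back from block 0 in `order` (default: sorted names);
    passing directory or file-sequence order also improves locality.
    """
    layout, next_block = {}, 0
    for name in order or sorted(files):
        size = sum(length for _, length in files[name])
        layout[name] = [(next_block, size)]
        next_block += size
    return layout


fragmented = {"A": [(0, 10)], "F": [(10, 7), (50, 5), (58, 2)],
              "G": [(17, 3)], "C": [(20, 10)]}
result = defragment(fragmented, order=["A", "F", "G", "C"])
print(result)
# {'A': [(0, 10)], 'F': [(10, 14)], 'G': [(24, 3)], 'C': [(27, 10)]}
```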

The HFS Plus file system transparently defragments files that are less than 20 MiB in size and are broken into 8 or more fragments, when the file is being opened.[12]
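The rule stated above can be written as a simple predicate. This is a minimal sketch of only the two conditions mentioned here (size under 20 MiB, 8 or more fragments); the function name is hypothetical, and the actual HFS Plus implementation may check further conditions before defragmenting a file.

```python
MIB = 1024 * 1024


def hfs_plus_defrag_on_open(size_bytes, fragment_count):
    """True if a file being opened meets the two conditions described above."""
    return size_bytes < 20 * MIB and fragment_count >= 8


print(hfs_plus_defrag_on_open(5 * MIB, 9))    # True
print(hfs_plus_defrag_on_open(5 * MIB, 3))    # False: too few fragments
print(hfs_plus_defrag_on_open(50 * MIB, 9))   # False: file too large
```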

The now obsolete Commodore Amiga Smart File System (SFS) defragmented itself while the filesystem was in use. The defragmentation process is almost completely stateless (apart from the location it is working on), so that it can be stopped and started instantly. During defragmentation data integrity is ensured for both metadata and normal data.
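The property that makes such online defragmentation easy to interrupt, namely that the only state is the current position, can be illustrated with a step function that performs at most one block swap per call and recomputes its goal from the disk contents each time. This is a hypothetical toy model, not the actual SFS algorithm, and it ignores the data-integrity measures a real file system needs when moving blocks.

```python
def defrag_step(disk, cursor):
    """Perform at most one block swap; return the next cursor, or None when done.

    `disk` holds (file, block_index) tuples, or None for free blocks. The goal
    (every file contiguous, files in name order, free space at the end) is
    recomputed from the disk itself, so the cursor is the only state kept.
    """
    used = sorted(b for b in disk if b is not None)
    if cursor >= len(used):
        return None                         # everything is in place
    want = used[cursor]
    if disk[cursor] != want:
        j = disk.index(want, cursor)        # the wanted block lies further on
        disk[cursor], disk[j] = disk[j], disk[cursor]
    return cursor + 1


disk = [("F", 0), None, ("G", 0), ("F", 1), None, ("G", 1), ("F", 2)]
cursor = 0
while cursor is not None:                   # may stop after any step and resume
    cursor = defrag_step(disk, cursor)
print(disk)
# [('F', 0), ('F', 1), ('F', 2), ('G', 0), ('G', 1), None, None]
```

Because the target layout is derived from the disk itself, the loop can stop after any step and later resume from the saved cursor, or even from 0, and still reach the same result.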

Notes

  1. ^ Some file systems, such as NTFS and ext2+, might preallocate empty contiguous regions for special purposes.
  2. ^ The practice of leaving the space occupied by deleted files largely undisturbed is why undelete programs were able to work; they simply recovered the file whose name had been deleted from the directory, but whose contents were still on disk.

References

  1. ^ http://www.8bs.com/hints/083.txt - Description of the can't extend error
  2. ^ http://8bs.com/mag/1to4/basegd1.txt - Possible data loss caused by the can't extend error
  3. ^ Douceur, John R.; Bolosky, William J. A Large-Scale Study of File-System Contents. ACM SIGMETRICS Performance Evaluation Review (Association for Computing Machinery). June 1999, 27 (1): 59–70. doi:10.1145/301464.301480. 
  4. ^ Kryder, Mark H. Future Storage Technologies: A Look Beyond the Horizon (PDF). Storage Networking World conference. Seagate Technology. 2006-04-03. Archived from the original (PDF) on 17 July 2006.
  5. ^ 5.0 5.1 McVoy, L. W.; Kleiman, S. R. Extent-like Performance from a UNIX File System (PostScript). Proceedings of USENIX winter '91. Dallas, Texas: Sun Microsystems, Inc.: 33–43. Winter 1991 [2006-12-14]. 
  6. ^ 6.0 6.1 Hanselman, Scott. The real and complete story - Does Windows defragment your SSD?. Scott Hanselman's blog. 3 December 2014. 
  7. ^ Smith, Keith Arnold. Workload-Specific File System Benchmarks (PDF). Cambridge, Massachusetts: Harvard University. January 2001 [2006-12-14]. Archived from the original (PDF) on 2004-11-17.
  8. ^ Layton, Jeffrey. From ext3 to ext4: An Interview with Theodore Ts'o. Linux Magazine (QuinStreet). 29 March 2009. 
  9. ^ Singh, Amit. Fragmentation in HFS Plus Volumes. Mac OS X Internals. May 2004. 
  10. ^ Sweeney, Adam; Doucette, Doug; Hu, Wei; Anderson, Curtis; Nishimoto, Mike; Peck, Geoff. Scalability in the XFS File System (PDF). Proceedings of the USENIX 1996 Annual Technical Conference. San Diego, California: Silicon Graphics. January 1996 [2006-12-14]. 
  11. ^ Reiser, Hans. The Reiser4 Filesystem. Google TechTalks. 2006-02-06 [2006-12-14]. Archived from the original on 19 May 2011.
  12. ^ Singh, Amit. 12 The HFS Plus File System. Mac OS X Internals: A Systems Approach. Addison Wesley. 2007. ISBN 0321278542. 
