* vfat file system extreme fragmentation on multiprocessor
@ 2008-09-11 18:01 Harun Scheutzow
2008-09-11 19:10 ` Lennart Sorensen
2008-09-11 23:50 ` Andrew Morton
0 siblings, 2 replies; 11+ messages in thread
From: Harun Scheutzow @ 2008-09-11 18:01 UTC (permalink / raw)
To: linux-kernel
I would like to share the following observation, made with several kernels of the 2.6.x series on a T7100 Core2Duo CPU (effectively 2 processors). I did not find a similar report while searching.
Two applications compress data at the same time and try to avoid fragmenting the file system by writing blocks of 50 MByte to a VFAT (FAT32) partition on a SATA hard disk with a cluster size of 8 KByte. The resulting file size is 200 to 250 MByte. Getting 4 to 5 fragments per file is fine. But at random, for roughly every 4th file, there are from a few hundred up to more than 4500 fragments (most often approx. 1500) for each of the two files written in parallel.
My best guess: in this case both CPU cores were in the cluster allocation function of the FAT file system at (nearly) the same time, each allocating only a few clusters (my guess: 8) for its file before the other core got the next ones. The compression task is CPU bound; the hard disk could probably serve 4 cores. The situation is reversed for decompression.
The files are OK, with no corruption, just heavy fragmentation. I know vfat is not liked very much. Nevertheless, I hope someone with more Linux kernel coding experience than I have fixes this in the future.
vfat still seems to be the reliable way to exchange data across platforms (does anyone know an ext2 driver for Windows up to Vista that does not trash the f.s. every few days, or a reliable NTFS for Linux?). Anyway, it is a general design issue on SMP systems that should not be forgotten.
I tried the same on an ext2 f.s. It showed very little fragmentation; most files were in 1 piece. Well done!
Best Regards, Harun Scheutzow
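
The guess in the report can be checked with a little arithmetic. The sketch below is a toy model, not the kernel's allocator: it assumes two writers that take turns grabbing a fixed batch of clusters from one shared next-free pointer. The names `simulate` and `count_fragments` are made up for illustration; the sizes (8 KByte clusters, 50 MByte writes, 200 MByte files) come from the report.

```python
def count_fragments(clusters):
    """Count runs of consecutive cluster numbers in a file's cluster chain."""
    if not clusters:
        return 0
    return 1 + sum(1 for a, b in zip(clusters, clusters[1:]) if b != a + 1)


def simulate(writers, clusters_per_file, batch):
    """Toy model: each writer takes `batch` clusters from a shared
    next-free pointer, then the next writer gets its turn."""
    chains = [[] for _ in range(writers)]
    next_free = 0
    while any(len(c) < clusters_per_file for c in chains):
        for chain in chains:
            take = min(batch, clusters_per_file - len(chain))
            chain.extend(range(next_free, next_free + take))
            next_free += take
    return [count_fragments(c) for c in chains]


CLUSTER = 8 * 1024            # 8 KByte clusters, as in the report
WRITE = 50 * 1024 * 1024      # 50 MByte per write
FILE = 200 * 1024 * 1024      # 200 MByte result file

per_write = WRITE // CLUSTER  # clusters covered by one 50 MByte write
per_file = FILE // CLUSTER    # clusters in one 200 MByte file

fine = simulate(2, per_file, 8)             # turns change every 8 clusters
coarse = simulate(2, per_file, per_write)   # turns change every 50 MByte

print(per_write, per_file, fine, coarse)
# prints: 6400 25600 [3200, 3200] [4, 4]
```

In this model, strict alternation every 8 clusters gives 3200 fragments per 200 MByte file, which lies inside the reported range of a few hundred to more than 4500. Alternating only once per 50 MByte write gives the 4 fragments the report calls acceptable. Real interleaving is not strictly periodic, so the model only bounds the effect.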
* Re: vfat file system extreme fragmentation on multiprocessor
From: Lennart Sorensen @ 2008-09-11 19:10 UTC (permalink / raw)
To: Harun Scheutzow; +Cc: linux-kernel
On Thu, Sep 11, 2008 at 08:01:16PM +0200, Harun Scheutzow wrote:
> I would like to share the following observation, made with several kernels of the 2.6.x series on a T7100 Core2Duo CPU (effectively 2 processors). I did not find a similar report while searching.
>
> Two applications compress data at the same time and try to avoid fragmenting the file system by writing blocks of 50 MByte to a VFAT (FAT32) partition on a SATA hard disk with a cluster size of 8 KByte. The resulting file size is 200 to 250 MByte. Getting 4 to 5 fragments per file is fine. But at random, for roughly every 4th file, there are from a few hundred up to more than 4500 fragments (most often approx. 1500) for each of the two files written in parallel.
>
> My best guess: in this case both CPU cores were in the cluster allocation function of the FAT file system at (nearly) the same time, each allocating only a few clusters (my guess: 8) for its file before the other core got the next ones. The compression task is CPU bound; the hard disk could probably serve 4 cores. The situation is reversed for decompression.
>
> The files are OK, with no corruption, just heavy fragmentation. I know vfat is not liked very much. Nevertheless, I hope someone with more Linux kernel coding experience than I have fixes this in the future.
>
> vfat still seems to be the reliable way to exchange data across platforms (does anyone know an ext2 driver for Windows up to Vista that does not trash the f.s. every few days, or a reliable NTFS for Linux?). Anyway, it is a general design issue on SMP systems that should not be forgotten.
>
> I tried the same on an ext2 f.s. It showed very little fragmentation; most files were in 1 piece. Well done!
>
> Best Regards, Harun Scheutzow
I don't think fat filesystems have any concept of reserving space for
expanding files. It's a pretty simple filesystem after all designed for
a single cpu machine with a non-multitasking OS (if you can call DOS an
OS). Space tends to be allocated from the start of the disk wherever
free space is found since otherwise you would have to go searching for
the free space, which isn't that efficient.
ext2 of course was designed to avoid fragmentation and has lots of fancy
things like cylinder groups, and reserving space and such as far as I
understand it.
Now what would happen if you used ftruncate to extend the file you open
to a large size, and then started writing it, and then set the size
correctly at the end? Or if you simply used ftruncate to make the file
50MB initially, then wrote data until you hit 50MB, then extended it to
100MB, wrote more data, and so on, and at the end truncate to the
correct length? My guess would be that the ftruncate call would go
allocate all the clusters for you right away and reserve the space,
after which you can go fill in those clusters with real data. That, if
it works, ought to reduce the number of fragments you get.
Of course avoiding fragments on a filesystem that is practically
designed to fragment isn't going to be easy.
--
Len Sorensen
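
The second variant proposed above (extend in 50MB steps, write, trim at the end) could look roughly like this from user space. This is a minimal sketch: `write_with_preextend` and its `step` parameter are names invented here, and whether `ftruncate` really allocates the clusters up front depends on the filesystem driver, which is exactly the open question in the message.

```python
import os
import tempfile

STEP = 50 * 1024 * 1024  # grow the file in 50 MByte steps, as proposed above


def write_with_preextend(path, chunks, step=STEP):
    """Extend the file with ftruncate() ahead of the data, then trim it
    to the number of bytes actually written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        reserved = 0
        for chunk in chunks:
            # Grow the file first whenever the next chunk would pass
            # the size reserved so far.
            while written + len(chunk) > reserved:
                reserved += step
                os.ftruncate(fd, reserved)
            view = memoryview(chunk)
            while len(view) > 0:
                n = os.write(fd, view)   # may write less than requested
                view = view[n:]
                written += n
        os.ftruncate(fd, written)        # set the correct final length
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out.bin")
        total = write_with_preextend(target, [b"a" * 1000, b"b" * 500], step=4096)
        print(total, os.path.getsize(target))
```

The file ends at exactly the number of bytes written, so a reader never sees the padding. Measuring the resulting fragment count on the target filesystem is the only way to tell whether the approach helps there.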
* Re: vfat file system extreme fragmentation on multiprocessor
From: H. Peter Anvin @ 2008-09-11 19:36 UTC (permalink / raw)
To: Lennart Sorensen; +Cc: Harun Scheutzow, linux-kernel
Lennart Sorensen wrote:
>
> I don't think fat filesystems have any concept of reserving space for
> expanding files. It's a pretty simple filesystem after all designed for
> a single cpu machine with a non-multitasking OS (if you can call DOS an
> OS). Space tends to be allocated from the start of the disk wherever
> free space is found since otherwise you would have to go searching for
> the free space, which isn't that efficient.
>
Well, you can always do that in-memory.
-hpa
* Re: vfat file system extreme fragmentation on multiprocessor
From: Lennart Sorensen @ 2008-09-11 21:09 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Harun Scheutzow, linux-kernel
On Thu, Sep 11, 2008 at 12:36:24PM -0700, H. Peter Anvin wrote:
> Lennart Sorensen wrote:
> >
> >I don't think fat filesystems have any concept of reserving space for
> >expanding files. It's a pretty simple filesystem after all designed for
> >a single cpu machine with a non-multitasking OS (if you can call DOS an
> >OS). Space tends to be allocated from the start of the disk wherever
> >free space is found since otherwise you would have to go searching for
> >the free space, which isn't that efficient.
> >
>
> Well, you can always do that in-memory.
It still involves searching no matter where you do it. With FAT-16 that
means up to 2^16 clusters to search through. With FAT-32 even more. It
doesn't have a cluster bitmap to speed things up.
--
Len Sorensen
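
To make the exchange above concrete: FAT has no on-disk bitmap, but a driver can scan the table once and keep the free runs in memory. The sketch below is only an illustration under simple assumptions: the FAT is a plain list of integers, a free entry has the value 0, and the function names are invented here. It is not how the Linux driver is organised.

```python
def free_runs(fat):
    """Yield (start, length) for each run of free entries (value 0)."""
    start = None
    for i, entry in enumerate(fat):
        if entry == 0:
            if start is None:
                start = i            # a new free run begins here
        elif start is not None:
            yield start, i - start   # the run ended at the previous entry
            start = None
    if start is not None:
        yield start, len(fat) - start


def find_run(fat, wanted):
    """Return the start of the first free run of at least `wanted`
    clusters, or None if there is no such run."""
    for start, length in free_runs(fat):
        if length >= wanted:
            return start
    return None


# Toy 12-entry table: entries 0 and 1 are reserved, 2 -> 3 is one file,
# 6 -> 7 is another; the zeros are free clusters.
fat = [0xFFF8, 0xFFFF, 3, 0xFFFF, 0, 0, 7, 0xFFFF, 0, 0, 0, 0]

print(list(free_runs(fat)))   # prints: [(4, 2), (8, 4)]
print(find_run(fat, 3))       # prints: 8
print(find_run(fat, 5))       # prints: None
```

The scan is still linear in the number of clusters, which is Lennart's point. Keeping the result in memory means it only has to be paid once rather than on every allocation, which is hpa's.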
* Re: vfat file system extreme fragmentation on multiprocessor
From: H. Peter Anvin @ 2008-09-11 21:43 UTC (permalink / raw)
To: Lennart Sorensen; +Cc: Harun Scheutzow, linux-kernel
Lennart Sorensen wrote:
>>>
>> Well, you can always do that in-memory.
>
> It still involves searching no matter where you do it. With FAT-16 that
> means up to 2^16 clusters to search through. With FAT-32 even more. It
> doesn't have a cluster bitmap to speed things up.
>
You don't, obviously, and the cluster links are fairly big. Lipstick on
a pig, and all that. But it's prettier with the lipstick.
-hpa
* Re: vfat file system extreme fragmentation on multiprocessor
From: Harun Scheutzow @ 2008-09-11 20:11 UTC (permalink / raw)
To: Lennart Sorensen; +Cc: linux-kernel
At 21:10 11.09.2008, Lennart Sorensen wrote:
Why do you start a flame war against fat fs without the proper knowledge? This is of no help.
ext2 is no magic bullet. Like any other file system that does not rearrange clusters on the fly (that is, defragment), it can only prevent fragmentation by wasting space. That means it seems not to fragment at 25 % full, but does at 90 %, depending on the size of the files created. Initially wasting space and on-the-fly defragmentation can be done on FAT as well, if the driver chooses to do so.
All this is NOT the problem here. It is NOT what I'm asking for.
>Now what would happen if you used ftruncate to extend the file you open
>to a large size, and then started writing it, and then set the size
>correctly at the end?
Because ftruncate has to allocate the clusters the same way fwrite does, it would probably give the same results.
Anyway, I want to keep things simple. I do not know the final file size in advance; it may be 550 MByte, is often 200, but can be 5. It is perfectly OK to get 4 fragments per 200 MByte. All I suggest is that a file system that is told to write 50 MByte in one piece does so.
I do not want to synchronize the two applications, preventing them from writing at the same time. But currently this seems to be the only option.
Harun Scheutzow
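
The workaround mentioned above, keeping the two applications from writing at the same time, needs only an advisory lock that both agree on. A minimal sketch, assuming a Unix system, a lock file path chosen by the applications (the path used in the demo is hypothetical), and a filesystem that allocates clusters during `write()` rather than later at writeback:

```python
import fcntl
import os
import tempfile


def locked_write(lock_path, data_fd, data):
    """Write `data` to `data_fd` while holding an exclusive flock() on a
    lock file shared by all cooperating writers."""
    lock_fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)   # blocks while another writer holds it
        view = memoryview(data)
        total = 0
        while total < len(view):
            total += os.write(data_fd, view[total:])
        fcntl.flock(lock_fd, fcntl.LOCK_UN)   # let the other writer proceed
        return total
    finally:
        os.close(lock_fd)                     # also drops the lock on error


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock = os.path.join(tmp, "writers.lock")   # hypothetical shared path
        out = os.path.join(tmp, "out.bin")
        fd = os.open(out, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            print(locked_write(lock, fd, b"x" * 10000))
        finally:
            os.close(fd)
```

Each 50 MByte block would go through `locked_write`, so compression still runs in parallel on both cores and only the write itself is serialized.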
* Re: vfat file system extreme fragmentation on multiprocessor
From: Lennart Sorensen @ 2008-09-11 21:20 UTC (permalink / raw)
To: Harun Scheutzow; +Cc: linux-kernel
On Thu, Sep 11, 2008 at 10:11:05PM +0200, Harun Scheutzow wrote:
> At 21:10 11.09.2008, Lennart Sorensen wrote:
>
> Why do you start a flame war against fat fs without the proper knowledge? This is of no help.
I am not. FAT was nice and simple and did a nice job on floppies for
Microsoft BASIC. Fragmentation was no big deal.
> ext2 is no magic bullet. Like any other file system that does not rearrange clusters on the fly (that is, defragment), it can only prevent fragmentation by wasting space. That means it seems not to fragment at 25 % full, but does at 90 %, depending on the size of the files created. Initially wasting space and on-the-fly defragmentation can be done on FAT as well, if the driver chooses to do so.
Certainly true. I have had systems get amazingly slow with reiserfs3.6
in the past because I got too close to 100% full. When not full they
tend to behave quite nicely for many workloads. I am sure there are
workloads that will cause bad behaviour though.
> All this is NOT the problem here. It is NOT what I'm asking for.
>
>
> >Now what would happen if you used ftruncate to extend the file you open
> >to a large size, and then started writing it, and then set the size
> >correctly at the end?
>
> Because ftruncate has to allocate the clusters the same way fwrite does, it would probably give the same results.
But would ftruncate be a single operation in the kernel, reserving all
the clusters at once from the filesystem? I doubt fwrite would write
50MB in a single chunk to any filesystem. If it does write a single
chunk in one system call, well then I am quite impressed actually.
> Anyway, I want to keep things simple. I do not know the final file size in advance; it may be 550 MByte, is often 200, but can be 5. It is perfectly OK to get 4 fragments per 200 MByte. All I suggest is that a file system that is told to write 50 MByte in one piece does so.
Well that's why I thought maybe requesting 50MB or 100MB at a time with
ftruncate might work OK, and if the kernel handles it as a single
operation and reserves all the clusters at once when calling ftruncate,
then you would avoid fragmentation. Seems simple enough and worth a
try. At least if it works then you get to control the fragment size,
even if you can't prevent fragments.
Unfortunately if I understand the fat code correctly in the kernel, the
fat driver doesn't permit extending files with ftruncate, only
shortening, so I guess that doesn't help at all.
> I do not want to synchronize the two applications, preventing them from writing at the same time. But currently this seems to be the only option.
That does seem like a pain. Not sure there are any other options in the
case of the fat filesystem under linux though.
--
Len Sorensen
* Re: vfat file system extreme fragmentation on multiprocessor
From: Harun Scheutzow @ 2008-09-11 23:03 UTC (permalink / raw)
To: linux-kernel; +Cc: Lennart Sorensen
Why should the C library's fwrite() split anything? There is no good reason (except in the x86 64 KByte segmented model when trying to emulate a flat model, as in old DOS). The 50 MByte go to write() in a single piece here.
There are only very few "single operation"s in the kernel nowadays. In most cases this is a good thing.
It looks like fat_alloc_clusters() in fat/fatent.c is limited to allocating only 4 clusters under a single lock.
I found another assumption I do not like: cluster size >= 512. There are old FAT systems on SRAM cards with 128 bytes per sector and per cluster. But I don't want long filenames on them. I hope the 512 does not sit elsewhere, too.
fat/inode.c __fat_get_block() /* TODO: multiple cluster allocation would be desirable */
YES, OF COURSE. Only a single cluster is allocated at a time, with no lock here. I can be happy that I still got 8 clusters per fragment; it might have been only 1.
Harun
* Re: vfat file system extreme fragmentation on multiprocessor
From: Andrew Morton @ 2008-09-11 23:50 UTC (permalink / raw)
To: Harun Scheutzow; +Cc: linux-kernel
On Thu, 11 Sep 2008 20:01:16 +0200
Harun Scheutzow <harun04@scheutzow.de> wrote:
> I like to share the following observation made with different kernels of 2.6.x series
Yeah. ext2 and ext3 also showed major, major, major differences in
layout quality (and hence performance) when switching between
uniprocessor and multiprocessor environments, for the same reason. (UP was
far better).
It was fixed in ext3/4 by the in-core reservations code (recently
ported into ext2 as well).
This is why filesystem performance testing is far from complete if it
is performed only on SMP (or UP) machines.
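
The idea behind in-core reservations can be shown with a toy model. This is not the ext3 code: the `Allocator` class, its `window` parameter, and the strict alternation between two files are all assumptions made for illustration. Each file gets a window of clusters set aside in memory and allocates from its own window until it runs out.

```python
def count_fragments(clusters):
    """Count runs of consecutive cluster numbers in a cluster chain."""
    if not clusters:
        return 0
    return 1 + sum(1 for a, b in zip(clusters, clusters[1:]) if b != a + 1)


class Allocator:
    """Toy allocator with a per-file in-memory reservation window."""

    def __init__(self, window):
        self.window = window   # clusters set aside per file at a time
        self.next_free = 0     # shared next-free pointer
        self.reserved = {}     # file id -> [next cluster, end of window]

    def alloc(self, file_id):
        """Return one cluster for `file_id`, taking a new window if needed."""
        res = self.reserved.get(file_id)
        if res is None or res[0] == res[1]:
            res = [self.next_free, self.next_free + self.window]
            self.next_free += self.window
            self.reserved[file_id] = res
        cluster = res[0]
        res[0] += 1
        return cluster


def run(window, clusters_per_file=25600, files=("a", "b")):
    """Two files request single clusters in strict alternation."""
    alloc = Allocator(window)
    chains = {f: [] for f in files}
    for _ in range(clusters_per_file):
        for f in files:
            chains[f].append(alloc.alloc(f))
    return {f: count_fragments(c) for f, c in chains.items()}


print(run(1))      # prints: {'a': 25600, 'b': 25600}
print(run(6400))   # prints: {'a': 4, 'b': 4}
```

With a window of one cluster the two files interleave completely. With a window of 6400 clusters (50 MByte at 8 KByte per cluster) each 200 MByte file ends up in 4 pieces even though the requests still alternate cluster by cluster. A real implementation also has to give unused reserved space back when a file is closed, which the toy omits.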