* ext4 unlink performance
From: Bruce Guenter @ 2008-11-13 18:57 UTC
To: linux-ext4
Hi.
I started running some comparison benchmarks between ext3 and ext4 this
week. During one torture test, I observed that ext4 unlink speed is
much slower than ext3, and using 256-byte inodes makes it worse.
The torture test consists of unpacking a large tarball containing
about 725,000 small files in random order and then recursively unlinking
the extracted directory. The majority of the files in the archive are a
single block; about 15% span multiple blocks.
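In outline, the test does something like the following (the tarball
path, mount point, and directory name here are placeholders):

cd /mnt/test                         # the freshly created filesystem under test
time tar xjf /path/to/files.tar.bz2  # the "extract" timings below
time rm -rf extracted                # the "unlink" timings below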
The results:
ext4 128 extract: 618.991 elapsed 54.903 user 81.789 sys 22.08%
ext4 256 extract: 671.655 elapsed 55.099 user 74.593 sys 19.30%
ext3 128 extract: 950.965 elapsed 55.155 user 77.473 sys 13.94%
ext3 256 extract: 985.687 elapsed 55.591 user 94.698 sys 15.24%
ext4 beats ext3 hands down, with either inode size. I would think that
the 128-byte inode runs are faster than the 256-byte inode runs simply
because of the number of small files involved.
ext4 128 unlink: 913.934 elapsed 0.296 user 33.550 sys 3.70%
ext4 256 unlink: 1507.696 elapsed 0.324 user 37.602 sys 2.51%
ext3 128 unlink: 171.150 elapsed 0.244 user 23.825 sys 14.06%
ext3 256 unlink: 360.073 elapsed 0.328 user 27.954 sys 7.85%
Ouch. Why is ext4 so much slower than ext3 here, and why is there such
a huge discrepancy between the different inode sizes? The filesystems
were created with the stock options except for inode size and ^huge_file
(for historical reasons when I was testing with older kernels).
I tested this with Linus's git sources on x86_64 on an IDE disk. 2.6.28-rc4 and
2.6.27 had similar performance.
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-13 19:10 UTC
To: linux-ext4
On Thu, Nov 13, 2008 at 12:57:12PM -0600, Bruce Guenter wrote:
> Ouch. Why is ext4 so much slower than ext3 here,
I forgot to mention one important detail. I started running 'vmstat 5'
during one of the unlink runs, and noticed that there were intervals of
15 or 20 seconds where no blocks were being read or written, and minimal
CPU was being used. I do not observe the same stalls when using ext3.
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Theodore Tso @ 2008-11-13 19:46 UTC
To: linux-ext4; +Cc: Bruce Guenter
On Thu, Nov 13, 2008 at 12:57:12PM -0600, Bruce Guenter wrote:
>
> Ouch. Why is ext4 so much slower than ext3 here, and why is there such
> a huge discrepancy between the different inode sizes? The filesystems
> were created with the stock options except for inode size and ^huge_file
> (for historical reasons when I was testing with older kernels).
>
I'm assuming the ext3 filesystem was created with htree enabled
(possibly not true with older versions of e2fsprogs), and that if you're
creating ext4 filesystems, you have been using a 1.41.x version of
e2fsprogs.
If this is the case, the most likely explanation is that ext4
defaults to barriers enabled, while ext3 defaults to barriers disabled.
So try mounting ext3 with "-o barrier=1", or ext4 with "-o
barrier=0", so that the comparison is between apples and apples.
Regards,
- Ted
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-13 20:27 UTC
To: linux-ext4
On Thu, Nov 13, 2008 at 02:46:28PM -0500, Theodore Tso wrote:
> I'm assuming the ext3 filesystem was created with htree enabled
> (possibly not true with older versions of e2fsprogs), and that if you're
> creating ext4 filesystems, you have been using a 1.41.x version of
> e2fsprogs.
All filesystems were freshly created with e2fsprogs 1.41.2, all
with the default options except for ^huge_file and the inode size.
> If this is the case, the most likely explanation is that ext4
> defaults to barriers enabled, while ext3 defaults to barriers disabled.
> So try mounting ext3 with "-o barrier=1", or ext4 with "-o
> barrier=0", so that the comparison is between apples and apples.
The indication from dmesg is that ext4 is already running with barriers
disabled due to running over dm, but I will re-run with barriers forced
off to double check.
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Theodore Tso @ 2008-11-13 20:42 UTC
To: linux-ext4
On Thu, Nov 13, 2008 at 01:10:00PM -0600, Bruce Guenter wrote:
> On Thu, Nov 13, 2008 at 12:57:12PM -0600, Bruce Guenter wrote:
> > Ouch. Why is ext4 so much slower than ext3 here,
>
> I forgot to mention one important detail. I started running 'vmstat 5'
> during one of the unlink runs, and noticed that there were intervals of
> 15 or 20 seconds where no blocks were being read or written, and minimal
> CPU was being used. I do not observe the same stalls when using ext3.
Hmm... how very strange. Can you run the command:
ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:16,comm
during one of the quiescent periods, and see what is in the WCHAN
field for the unlink command?
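If it's easier, something like this will sample it once a second while
the rm is running:

while sleep 1; do ps -eo pid,stat,wchan:16,comm | grep ' rm$'; done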
- Ted
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-14 4:11 UTC
To: linux-ext4
On Thu, Nov 13, 2008 at 03:42:40PM -0500, Theodore Tso wrote:
> Hmm... how very strange. Can you run the command:
>
> ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:16,comm
>
> during one of the quiescent periods, and see what is in the WCHAN
> field for the unlink command?
Mostly I see this:
  PID   TID CLS RTPRIO  NI PRI PSR %CPU STAT WCHAN            COMMAND
 5932  5932 TS       -   0  19   0  2.4 D+   sync_buffer      rm
I had been running these benchmarks over dm-crypt (which is the target
environment for which I was testing). So I re-ran the tests both bare
and with dm-crypt to compare. The stalls reported by vmstat did not
show up when running the test bare.
I also sat and watched the drive LED while the unlink test was running.
The drive LED showed the occasional 1/2 second stall, but at the same
time vmstat was showing 5-20 second stalls, so that would seem to
point to some kind of reporting problem.
For completeness, I re-ran the benchmark both with and without dm-crypt
to see if it was the cause of the problem, in all cases mounting with
barrier=0. As expected, dm-crypt did slow down the process, but the
problem remains:
ext4 128 plain unlink: 665.523 elapsed 0.276 user 34.882 sys 5.28%
ext4 128 crypt unlink: 907.934 elapsed 0.356 user 34.698 sys 3.86%
ext4 256 plain unlink: 1435.964 elapsed 0.248 user 40.319 sys 2.82%
ext4 256 crypt unlink: 1504.660 elapsed 0.304 user 35.186 sys 2.35%
ext3 128 plain unlink: 133.863 elapsed 0.248 user 24.618 sys 18.57%
ext3 128 crypt unlink: 133.092 elapsed 0.280 user 23.661 sys 17.98%
ext3 256 plain unlink: 309.635 elapsed 0.296 user 27.362 sys 8.93%
ext3 256 crypt unlink: 319.819 elapsed 0.268 user 23.713 sys 7.49%
Is there anything else I can try to see what's happening?
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Theodore Tso @ 2008-11-14 14:59 UTC
To: linux-ext4
On Thu, Nov 13, 2008 at 10:11:21PM -0600, Bruce Guenter wrote:
>
> I had been running these benchmarks over dm-crypt (which is the target
> environment for which I was testing). So I re-ran the tests both bare
> and with dm-crypt to compare. The stalls reported by vmstat did not
> show up when running the test bare.
>
> I also sat and watched the drive LED while the unlink test was running.
> The drive LED showed the occasional 1/2 second stall, but at the same
> time vmstat was showing 5-20 second stalls, so that would seem to
> point to some kind of reporting problem.
Yeah, I'm guessing that's a red herring caused by how dm-crypt works.
The blocks had already been posted to the block device, but it was taking
a while for them to be written out to disk.
This is beginning to perhaps sound like a layout problem of some kind.
How big is the filesystem that you are testing against? I'm guessing
that if you have 725,000 small files, it can't be much more than
a gig or two, right?
Can you send me the output of
e2image -r /dev/XXX - | bzip2 > ext4.e2i.bz2
for both the ext4 and ext3 filesystems, after you have loaded them
with your small files, and before you delete them? This will show me
where all of the files are located, and in fact I'll be able to
replicate the delete workload on my end and see exactly what's going on.
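(For the curious: a raw e2image contains all of the filesystem metadata,
so it can be examined directly with the standard tools, e.g.:

bzcat ext4.e2i.bz2 > ext4.e2i            # expands to the full fs size, mostly zeros
dumpe2fs ext4.e2i | grep 'free blocks,'  # per-group free block/inode counts

which is the sort of analysis I have in mind.)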
I will see the directory and file names of your workload, but
hopefully that won't be an issue for you. Thanks,
- Ted
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-14 15:48 UTC
To: linux-ext4
On Fri, Nov 14, 2008 at 09:59:14AM -0500, Theodore Tso wrote:
> This is beginning to perhaps sound like a layout problem of some kind.
> How big is the filesystem that you are testing against?
The partition is almost 30GB.
> I'm guessing
> that if you have 725,000 small files, it can't be much more than
> a gig or two, right?
The compressed stream itself expands to 4.4GB, and more on disk due to
block sizes.
> Can you send me the output of
>
> e2image -r /dev/XXX - | bzip2 > ext4.e2i.bz2
>
> for both the ext4 and ext3 filesystems, after you have loaded them
> with your small files, and before you delete them?
Will do. Can you accept LZMA so I can save a bit of bandwidth?
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Theodore Tso @ 2008-11-14 15:54 UTC
To: linux-ext4
On Fri, Nov 14, 2008 at 09:48:48AM -0600, Bruce Guenter wrote:
>
> Will do. Can you accept LZMA so I can save a bit of bandwidth?
>
Yep, no problem, thanks!!
- Ted
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-15 20:44 UTC
To: linux-ext4
On Fri, Nov 14, 2008 at 09:59:14AM -0500, Theodore Tso wrote:
> This is beginning to perhaps sound like a layout problem of some kind.
To test this theory, I ran one test where I populated the filesystem
with ext3 and then mounted it as ext4 to do the unlinking. This produced
unlink times comparable with ext3. That is, the degradation is occurring
when the filesystem is populated, not when it is cleaned.
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Eric Sandeen @ 2008-11-15 23:08 UTC
To: linux-ext4
Bruce Guenter wrote:
> On Fri, Nov 14, 2008 at 09:59:14AM -0500, Theodore Tso wrote:
>> This is beginning to perhaps sound like a layout problem of some kind.
>
> To test this theory, I ran one test where I populated the filesystem
> with ext3 and then mounted it as ext4 to do the unlinking. This produced
> unlink times comparable with ext3. That is, the degradation is occurring
> when the filesystem is populated, not when it is cleaned.
Maybe run the unlinking activity through seekwatcher* in both cases, to
see where the IO is happening.
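Something along these lines should do it (device and path are
placeholders; seekwatcher drives blktrace underneath):

seekwatcher -t unlink.trace -o unlink.png -d /dev/XXX \
  -p 'rm -rf /mnt/test/extracted'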
(also, you're right w.r.t. lvm/dm; barriers don't pass and should get
disabled after the first attempt at a barrier write).
-Eric
*http://oss.oracle.com/~mason/seekwatcher/ (also packaged for fedora,
maybe other distros as well)
* Re: ext4 unlink performance
From: Theodore Tso @ 2008-11-16 0:56 UTC
To: linux-ext4
On Sat, Nov 15, 2008 at 02:44:23PM -0600, Bruce Guenter wrote:
> On Fri, Nov 14, 2008 at 09:59:14AM -0500, Theodore Tso wrote:
> > This is beginning to perhaps sound like a layout problem of some kind.
>
> To test this theory, I ran one test where I populated the filesystem
> with ext3 and then mounted it as ext4 to do the unlinking. This produced
> unlink times comparable with ext3. That is, the degradation is occurring
> when the filesystem is populated, not when it is cleaned.
The problem is definitely in how we choose the directory and file
inode numbers for ext4. A quick look at the free block and free inode
counts from the dumpe2fs of your ext3 and ext4 256-byte inode e2images
tells the tale. Ext4 is using blocks and inodes packed up against the
beginning of the filesystem, and ext3 has the blocks and inodes spread
out for better locality.
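(The per-group listings attached below were pulled out of the images
with something along the lines of:

dumpe2fs ext4.e2i | grep 'free blocks,'

which prints one line per block group, in disk order.)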
We didn't change ext4's inode allocation algorithms, so I'm
guessing that they're interacting very poorly with ext4's delayed block
allocation. Bruce, how much memory did you have in your
system? Do you have a large amount of memory, say 6-8 gigs, by any
chance? When the filesystem creates a new directory, if the block
group is especially full, it will choose a new block group for the
directory, to spread things out. However, if the blocks haven't been
allocated yet, then the directories won't be spread out appropriately,
and then the inodes will be allocated close to the directories, and
then things go downhill from there. This is much more likely to
happen if you have a large number of small files, and a large amount
of memory, and when you are unpacking a tar file and so are writing out
a large number of these small files spaced very closely in time,
before they have a chance to be forced out to disk and thus allocated,
which would let the filesystem take block group fullness into account
when deciding how to allocate inode numbers.
When I have a chance I'll write a program which analyzes how close the
blocks are to inodes, and how close inodes are to their containing
directory, but I'm pretty sure what we'll find will just confirm
what's going on in greater detail.
One thing is clear --- we need to rethink our block and inode
allocation algorithms in light of delayed allocation. Maybe XFS has
some tricks up its sleeve that we can learn from?
- Ted
[-- Attachment #2: ext3-256-inode-usage --]
29566 free blocks, 15551 free inodes, 95 directories
30568 free blocks, 16285 free inodes, 0 directories
31581 free blocks, 16312 free inodes, 0 directories
30484 free blocks, 16266 free inodes, 0 directories
31187 free blocks, 15954 free inodes, 0 directories
30693 free blocks, 16359 free inodes, 0 directories
31282 free blocks, 16276 free inodes, 0 directories
30689 free blocks, 16355 free inodes, 0 directories
31589 free blocks, 16258 free inodes, 0 directories
28653 free blocks, 15401 free inodes, 0 directories
31126 free blocks, 16207 free inodes, 0 directories
25170 free blocks, 16197 free inodes, 0 directories
31535 free blocks, 16320 free inodes, 0 directories
31489 free blocks, 16330 free inodes, 0 directories
30900 free blocks, 16310 free inodes, 0 directories
31617 free blocks, 16296 free inodes, 0 directories
31459 free blocks, 16308 free inodes, 0 directories
31402 free blocks, 16210 free inodes, 0 directories
31501 free blocks, 16203 free inodes, 0 directories
31429 free blocks, 16259 free inodes, 0 directories
30675 free blocks, 15607 free inodes, 0 directories
31690 free blocks, 16332 free inodes, 0 directories
31662 free blocks, 16303 free inodes, 0 directories
31554 free blocks, 16325 free inodes, 0 directories
31487 free blocks, 16173 free inodes, 0 directories
29451 free blocks, 15815 free inodes, 0 directories
31536 free blocks, 16189 free inodes, 0 directories
19668 free blocks, 16335 free inodes, 0 directories
30513 free blocks, 16001 free inodes, 0 directories
31586 free blocks, 16341 free inodes, 0 directories
30155 free blocks, 16167 free inodes, 0 directories
31178 free blocks, 16222 free inodes, 0 directories
31197 free blocks, 16048 free inodes, 0 directories
31364 free blocks, 16185 free inodes, 0 directories
31559 free blocks, 16203 free inodes, 0 directories
31380 free blocks, 16152 free inodes, 0 directories
30137 free blocks, 16053 free inodes, 0 directories
31044 free blocks, 16189 free inodes, 0 directories
31515 free blocks, 16274 free inodes, 0 directories
31676 free blocks, 16332 free inodes, 0 directories
31517 free blocks, 16324 free inodes, 0 directories
31623 free blocks, 16349 free inodes, 0 directories
31285 free blocks, 16178 free inodes, 0 directories
31272 free blocks, 16251 free inodes, 0 directories
31440 free blocks, 16331 free inodes, 0 directories
13310 free blocks, 16176 free inodes, 0 directories
0 free blocks, 16044 free inodes, 0 directories
30405 free blocks, 16202 free inodes, 0 directories
31474 free blocks, 16288 free inodes, 0 directories
30343 free blocks, 16301 free inodes, 0 directories
31159 free blocks, 16008 free inodes, 0 directories
31660 free blocks, 16321 free inodes, 0 directories
17290 free blocks, 16222 free inodes, 0 directories
23962 free blocks, 16335 free inodes, 0 directories
31233 free blocks, 16290 free inodes, 0 directories
30149 free blocks, 16258 free inodes, 0 directories
30357 free blocks, 16161 free inodes, 0 directories
30995 free blocks, 15863 free inodes, 0 directories
23693 free blocks, 16269 free inodes, 0 directories
27104 free blocks, 15766 free inodes, 0 directories
31648 free blocks, 16290 free inodes, 0 directories
31596 free blocks, 16269 free inodes, 0 directories
31403 free blocks, 16208 free inodes, 0 directories
31640 free blocks, 16327 free inodes, 0 directories
31575 free blocks, 16303 free inodes, 0 directories
31071 free blocks, 16103 free inodes, 0 directories
13310 free blocks, 16176 free inodes, 0 directories
31065 free blocks, 16333 free inodes, 0 directories
31588 free blocks, 16329 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
28372 free blocks, 16023 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
29706 free blocks, 16289 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
29130 free blocks, 16317 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
30203 free blocks, 16301 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
29788 free blocks, 16259 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
25710 free blocks, 15173 free inodes, 0 directories
12285 free blocks, 0 free inodes, 0 directories
26642 free blocks, 16275 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
28614 free blocks, 16214 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
29711 free blocks, 16228 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
27163 free blocks, 16240 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
28846 free blocks, 15929 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
26446 free blocks, 16342 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
24677 free blocks, 16304 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
28747 free blocks, 15548 free inodes, 0 directories
30087 free blocks, 16166 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
11233 free blocks, 0 free inodes, 0 directories
11154 free blocks, 0 free inodes, 0 directories
12057 free blocks, 0 free inodes, 0 directories
9413 free blocks, 0 free inodes, 0 directories
11859 free blocks, 0 free inodes, 0 directories
7909 free blocks, 0 free inodes, 0 directories
11103 free blocks, 0 free inodes, 0 directories
13169 free blocks, 8284 free inodes, 0 directories
24380 free blocks, 15681 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
16131 free blocks, 7469 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
0 free blocks, 16384 free inodes, 0 directories
9358 free blocks, 0 free inodes, 0 directories
27525 free blocks, 15333 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
11112 free blocks, 0 free inodes, 0 directories
29474 free blocks, 16254 free inodes, 0 directories
27974 free blocks, 16149 free inodes, 0 directories
25894 free blocks, 15618 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
11755 free blocks, 0 free inodes, 0 directories
30976 free blocks, 16297 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
31148 free blocks, 16349 free inodes, 0 directories
30863 free blocks, 16191 free inodes, 0 directories
30674 free blocks, 16368 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
27776 free blocks, 16240 free inodes, 0 directories
14180 free blocks, 3409 free inodes, 0 directories
31712 free blocks, 16354 free inodes, 0 directories
30933 free blocks, 16069 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
28038 free blocks, 15849 free inodes, 0 directories
13310 free blocks, 0 free inodes, 0 directories
11535 free blocks, 0 free inodes, 0 directories
12145 free blocks, 642 free inodes, 0 directories
26785 free blocks, 16306 free inodes, 0 directories
13310 free blocks, 146 free inodes, 0 directories
12397 free blocks, 269 free inodes, 0 directories
10192 free blocks, 16104 free inodes, 0 directories
20992 free blocks, 16346 free inodes, 0 directories
31697 free blocks, 16348 free inodes, 0 directories
31085 free blocks, 16335 free inodes, 0 directories
30950 free blocks, 16321 free inodes, 0 directories
24539 free blocks, 16322 free inodes, 0 directories
30995 free blocks, 15785 free inodes, 0 directories
31644 free blocks, 16338 free inodes, 0 directories
22704 free blocks, 9156 free inodes, 0 directories
31638 free blocks, 16325 free inodes, 0 directories
31718 free blocks, 16369 free inodes, 0 directories
30880 free blocks, 15661 free inodes, 0 directories
27625 free blocks, 13296 free inodes, 0 directories
31625 free blocks, 16306 free inodes, 0 directories
31591 free blocks, 16293 free inodes, 0 directories
31572 free blocks, 16252 free inodes, 0 directories
31544 free blocks, 16339 free inodes, 0 directories
31455 free blocks, 16315 free inodes, 0 directories
31569 free blocks, 16296 free inodes, 0 directories
31694 free blocks, 16341 free inodes, 0 directories
31603 free blocks, 16282 free inodes, 0 directories
31059 free blocks, 16126 free inodes, 0 directories
31543 free blocks, 16295 free inodes, 0 directories
31616 free blocks, 16346 free inodes, 0 directories
30928 free blocks, 16242 free inodes, 0 directories
31574 free blocks, 16337 free inodes, 0 directories
31464 free blocks, 16335 free inodes, 0 directories
31455 free blocks, 16236 free inodes, 0 directories
31240 free blocks, 16076 free inodes, 0 directories
28782 free blocks, 16307 free inodes, 0 directories
31616 free blocks, 16282 free inodes, 0 directories
31495 free blocks, 16265 free inodes, 0 directories
30114 free blocks, 15817 free inodes, 0 directories
30788 free blocks, 16044 free inodes, 0 directories
30826 free blocks, 16033 free inodes, 0 directories
30691 free blocks, 16347 free inodes, 0 directories
30268 free blocks, 16201 free inodes, 0 directories
31237 free blocks, 16098 free inodes, 0 directories
31632 free blocks, 16310 free inodes, 0 directories
31392 free blocks, 16067 free inodes, 0 directories
31587 free blocks, 16305 free inodes, 0 directories
18075 free blocks, 15746 free inodes, 0 directories
30974 free blocks, 15772 free inodes, 0 directories
31451 free blocks, 16316 free inodes, 0 directories
31633 free blocks, 16331 free inodes, 0 directories
31618 free blocks, 16328 free inodes, 0 directories
30546 free blocks, 16237 free inodes, 0 directories
31537 free blocks, 16318 free inodes, 0 directories
31469 free blocks, 16225 free inodes, 0 directories
31695 free blocks, 16355 free inodes, 0 directories
30358 free blocks, 15864 free inodes, 0 directories
31461 free blocks, 16203 free inodes, 0 directories
30353 free blocks, 16193 free inodes, 0 directories
31151 free blocks, 16131 free inodes, 0 directories
30619 free blocks, 16337 free inodes, 0 directories
31633 free blocks, 16277 free inodes, 0 directories
27108 free blocks, 12675 free inodes, 0 directories
31107 free blocks, 15946 free inodes, 0 directories
15105 free blocks, 2938 free inodes, 0 directories
31434 free blocks, 16243 free inodes, 0 directories
31696 free blocks, 16345 free inodes, 0 directories
28579 free blocks, 15890 free inodes, 0 directories
28891 free blocks, 16116 free inodes, 0 directories
27859 free blocks, 16075 free inodes, 0 directories
31483 free blocks, 16328 free inodes, 0 directories
28222 free blocks, 16300 free inodes, 0 directories
31588 free blocks, 16261 free inodes, 0 directories
31729 free blocks, 16373 free inodes, 0 directories
31546 free blocks, 16305 free inodes, 0 directories
31672 free blocks, 16292 free inodes, 0 directories
31490 free blocks, 16246 free inodes, 0 directories
13310 free blocks, 15882 free inodes, 0 directories
13464 free blocks, 16340 free inodes, 0 directories
31455 free blocks, 16295 free inodes, 0 directories
31431 free blocks, 16221 free inodes, 0 directories
31548 free blocks, 16305 free inodes, 0 directories
13310 free blocks, 14144 free inodes, 0 directories
13556 free blocks, 16218 free inodes, 0 directories
31510 free blocks, 16256 free inodes, 0 directories
13310 free blocks, 0 free inodes, 400 directories
0 free blocks, 9462 free inodes, 671 directories
3870 free blocks, 9683 free inodes, 401 directories
0 free blocks, 10254 free inodes, 1036 directories
0 free blocks, 12830 free inodes, 1040 directories
0 free blocks, 13954 free inodes, 886 directories
5497 free blocks, 15126 free inodes, 536 directories
[-- Attachment #3: ext4-256-inode-usage --]
685 free blocks, 0 free inodes, 1199 directories
4096 free blocks, 0 free inodes, 547 directories
3823 free blocks, 0 free inodes, 362 directories
4604 free blocks, 0 free inodes, 268 directories
2168 free blocks, 0 free inodes, 232 directories
2163 free blocks, 0 free inodes, 180 directories
3577 free blocks, 0 free inodes, 141 directories
3168 free blocks, 0 free inodes, 133 directories
2144 free blocks, 0 free inodes, 113 directories
4122 free blocks, 0 free inodes, 109 directories
3575 free blocks, 0 free inodes, 94 directories
2177 free blocks, 0 free inodes, 90 directories
2746 free blocks, 0 free inodes, 99 directories
2867 free blocks, 0 free inodes, 88 directories
4774 free blocks, 0 free inodes, 76 directories
2300 free blocks, 5408 free inodes, 49 directories, 5408 unused inodes
1196 free blocks, 0 free inodes, 72 directories
3745 free blocks, 0 free inodes, 63 directories
2419 free blocks, 0 free inodes, 61 directories
4606 free blocks, 0 free inodes, 65 directories
1501 free blocks, 0 free inodes, 56 directories
1136 free blocks, 0 free inodes, 61 directories
8875 free blocks, 0 free inodes, 56 directories
7227 free blocks, 0 free inodes, 61 directories
6326 free blocks, 0 free inodes, 47 directories
179 free blocks, 0 free inodes, 39 directories
1295 free blocks, 0 free inodes, 49 directories
6778 free blocks, 0 free inodes, 56 directories
4675 free blocks, 0 free inodes, 31 directories
2203 free blocks, 0 free inodes, 37 directories
1653 free blocks, 0 free inodes, 47 directories
2920 free blocks, 0 free inodes, 45 directories
1911 free blocks, 0 free inodes, 40 directories
2969 free blocks, 0 free inodes, 30 directories
1628 free blocks, 0 free inodes, 41 directories
5938 free blocks, 0 free inodes, 34 directories
2766 free blocks, 0 free inodes, 36 directories
4067 free blocks, 0 free inodes, 34 directories
3141 free blocks, 0 free inodes, 24 directories
3582 free blocks, 0 free inodes, 30 directories
1864 free blocks, 0 free inodes, 35 directories
4266 free blocks, 0 free inodes, 27 directories
2911 free blocks, 0 free inodes, 38 directories
4633 free blocks, 0 free inodes, 36 directories
2899 free blocks, 0 free inodes, 19 directories
6290 free blocks, 5438 free inodes, 15 directories, 5438 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
720 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
863 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
1541 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
717 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
5467 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
12388 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
0 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
31743 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
16352 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
30768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-16 3:38 UTC
To: linux-ext4
On Sat, Nov 15, 2008 at 07:56:10PM -0500, Theodore Tso wrote:
> Bruce, how much memory did you have in your
> system? Do you have a large amount of memory, say 6-8 gigs, by any
> chance?
The test system has 1.5GB RAM.
> One thing is clear --- we need to rethink our block and inode
> allocation algorithms in light of delayed allocation. Maybe XFS has
> some tricks up its sleeve that we can learn from?
Just so I am clear as well, I fully realize this is quite an artificial
benchmark. I asked about it because of how large the regression is. I
deal with many systems that typically have to handle creating and
unlinking large numbers of small files (mail servers). They all are
running ext3 now, but I am considering switching them to ext4 once
2.6.28 is out. As such my big concern is that this regression will
cause performance problems for them.
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
* Re: ext4 unlink performance
From: Andreas Dilger @ 2008-11-17 0:43 UTC
To: Theodore Tso; +Cc: linux-ext4
On Nov 15, 2008 19:56 -0500, Theodore Ts'o wrote:
> The problem is definitely in how we choose the directory and file
> inode numbers for ext4. A quick look at the free block and free inode
> counts from the dumpe2fs of your ext3 and ext4 256-byte inode e2images
> tells the tale. Ext4 is using blocks and inodes packed up against the
> beginning of the filesystem, and ext3 has the blocks and inodes spread
> out for better locality.
>
> We didn't change ext4's inode allocation algorithms,
That isn't true; in ext4 the inode allocation algorithm is different
when FLEX_BG is enabled.
> so I'm guessing that they're interacting very poorly with ext4's delayed
> block allocation. Bruce, how much memory did you have in your
> system? Do you have a large amount of memory, say 6-8 gigs, by any
> chance? When the filesystem creates a new directory, if the block
> group is especially full, it will choose a new block group for the
> directory, to spread things out. However, if the blocks haven't been
> allocated yet, then the directories won't be spread out appropriately,
> and then the inodes will be allocated close to the directories, and
> then things go downhill from there.
It isn't clear this is the root of the problem yet. In fact, packing
the inodes and directories together should improve performance, because
there is no seeking when accessing the file metadata. If the file data
is not "close" to the inode that doesn't really matter, because unlinks
do not need to access the file data.
Even with the old algorithm the data is not right beside the inode so
there will always have to be a seek of some kind to access it, and the
difference in performance between a short seek and a long seek is not
that much.
> This is much more likely to
> happen if you have a large number of small files, and a large amount
> of memory, and when you are unpacking a tar file and so are writing out
> a large number of these small files spaced very closely in time,
> before they have a chance to be forced out to disk and thus allocated,
> which would let the filesystem take block group fullness into account
> when deciding how to allocate inode numbers.
Presumably the listings below are ext3 first, ext4 second?
> 29566 free blocks, 15551 free inodes, 95 directories
> 30568 free blocks, 16285 free inodes, 0 directories
> 31581 free blocks, 16312 free inodes, 0 directories
> 30484 free blocks, 16266 free inodes, 0 directories
> 31187 free blocks, 15954 free inodes, 0 directories
> 30693 free blocks, 16359 free inodes, 0 directories
> 31282 free blocks, 16276 free inodes, 0 directories
> 30689 free blocks, 16355 free inodes, 0 directories
> 31589 free blocks, 16258 free inodes, 0 directories
> 13310 free blocks, 14144 free inodes, 0 directories
:
: [snip]
:
> 13556 free blocks, 16218 free inodes, 0 directories
> 31510 free blocks, 16256 free inodes, 0 directories
> 13310 free blocks, 0 free inodes, 400 directories
> 0 free blocks, 9462 free inodes, 671 directories
> 3870 free blocks, 9683 free inodes, 401 directories
> 0 free blocks, 10254 free inodes, 1036 directories
> 0 free blocks, 12830 free inodes, 1040 directories
> 0 free blocks, 13954 free inodes, 886 directories
> 5497 free blocks, 15126 free inodes, 536 directories
In the ext3 case there are possibly a hundred different groups that
need to be updated, spread all over the disk.
> 685 free blocks, 0 free inodes, 1199 directories
> 4096 free blocks, 0 free inodes, 547 directories
> 3823 free blocks, 0 free inodes, 362 directories
> 4604 free blocks, 0 free inodes, 268 directories
> 2168 free blocks, 0 free inodes, 232 directories
:
: [snip]
:
> 2899 free blocks, 0 free inodes, 19 directories
> 6290 free blocks, 5438 free inodes, 15 directories, 5438 unused inodes
> 32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
> 32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
> 32768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
> 30768 free blocks, 16384 free inodes, 0 directories, 16384 unused inodes
:
In the ext4 case, there are maybe a dozen groups that are filled completely,
and the rest of the groups are untouched. This would suggest that less
seeking is needed to access the metadata for all of these files, instead
of more.
Recall again that we don't really care where the file data is
located in the unlink case, except that we need to update the
block bitmaps when the blocks are freed. Again in the ext4 case, since
there are fewer groups holding the inodes, there are also fewer groups
with blocks, and it _should_ be that fewer block bitmaps need updating.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: ext4 unlink performance
From: Andreas Dilger @ 2008-11-19 18:10 UTC
To: Theodore Tso; +Cc: linux-ext4
On Nov 18, 2008 21:40 -0500, Theodore Ts'o wrote:
> Looking at the blkparse profiles, doing an rm -rf given the
> ext4-produced layout required writing 5130 megabytes. The exact same
> directory hierarchy, as laid out by ext3, required only 1294 megabytes.
> Looking at a few selected block allocation bitmaps, we see that ext4
> will often need to write (and thus journal) the same block allocation
> bitmap block 4 or 5 times:
>
> 254,7 0 352 0.166492349 9376 C R 8216 + 8 [0]
> 254,7 0 348788 212.885545554 0 C W 8216 + 8 [0]
> 254,7 0 461448 309.533613765 0 C W 8216 + 8 [0]
> 254,7 0 827687 558.781690434 0 C W 8216 + 8 [0]
> 254,7 0 1210492 760.738217014 0 C W 8216 + 8 [0]
>
> However, the same block allocation bitmap block is only written once
> or twice:
>
> 254,8 0 3119 9.535331283 0 C R 524288 + 8 [0]
> 254,8 0 24504 45.253431031 0 C W 524288 + 8 [0]
> 254,8 0 85476 144.455205555 23903 C W 524288 + 8 [0]
Looking at the seekwatcher graphs, it is clear that the ext4 layout
is doing fewer seeks, and packing the data into a smaller part of
the filesystem, which is counter-intuitive given the performance result.
Even though the IO bandwidth is ostensibly higher (usually a good thing
on metadata benchmarks), that isn't any good if we are doing more writes.
It isn't immediately clear that _just_ rewriting the same
block multiple times is a culprit in itself, because in the ext3 case
there would be more block bitmaps affected that would _each_ be written
out 1 or 2 times, while the closer packing of ext4 allocations results
in fewer total bitmaps being used.
One would think that more sharing of a block bitmap would
result in a performance _increase_, because there is more chance that
it will be re-used within the same transaction.
> ext4:
> Reads Completed: 59947, 239788KiB
> Writes Completed: 1282K, 5130MiB
>
> ext3:
> Reads Completed: 64856, 259424KiB
> Writes Completed: 323582, 1294MiB
The reads look about the same; writes are 4x higher. What would be
useful to examine is the inode number grouping of files in the same
subdirectory, along with the blocks they are allocating. It seems
like the inodes are being packed more closely together, but the
blocks (and hence block bitmap writes) are spread further apart.
That may be a side-effect of the mballoc per-CPU cache again, where
files being written in the same subdirectory are spread apart because
of the write thread being rescheduled to different cores.
I discussed this in the past with Eric, in the case of a file doing
small writes+fsync and the blocks being fragmented needlessly between
different parts of the filesystem. The proposed solution in that case
(that Aneesh could probably fix quickly) is to attach an inode to the
per-CPU preallocation group on the first write (for small files). If it
doesn't get any more writes that is fine, but if it does then the same
PA would be used for further allocations regardless of what CPU is doing
the IO.
Another solution for that case, and (as I speculate) this case, is to
attach the PA to the parent directory and have all small files in the
same directory use that PA. This would ensure that blocks allocated to
small inodes in the same directory are kept together. The drawback is
that this could hurt performance for multiple threads writing to the
same directory.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: ext4 unlink performance
From: Theodore Tso @ 2008-11-19 21:18 UTC
To: Andreas Dilger; +Cc: linux-ext4
On Wed, Nov 19, 2008 at 12:10:01PM -0600, Andreas Dilger wrote:
>
> That may be a side-effect of the mballoc per-CPU cache again, where
> files being written in the same subdirectory are spread apart because
> of the write thread being rescheduled to different cores.
>
It would be good for us to get confirmation one way or another about
this theory. Bruce, if you have multiple CPUs or cores on your
system (i.e., cat /proc/cpuinfo reports multiple processors), can
you try unpacking your tarball on a test ext4 filesystem using
something like:
taskset 1 tar xjf /path/to/my/tarball.tar.bz2
The "taskset 1" will bind the tar process to only run on a single
processors. If that significantly changes the time to do run rm -rf,
can you save a raw e2image using that workload? That would be very
useful indeed.
Alternatively, if you don't have the taskset command handy, you can
also add maxcpus=1 to the kernel boot command line, which will force
the system to use only one CPU. Using taskset is much more
convenient, though.
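In other words, the whole single-CPU run would look something like this
(paths are placeholders):

taskset 1 tar xjf /path/to/my/tarball.tar.bz2
time rm -rf extracted
e2image -r /dev/XXX - | bzip2 > taskset-layout.e2i.bz2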
Thanks!!
- Ted
* Re: ext4 unlink performance
From: Bruce Guenter @ 2008-11-20 22:49 UTC
To: linux-ext4
On Wed, Nov 19, 2008 at 04:18:40PM -0500, Theodore Tso wrote:
> It would be good for us to get confirmation one way or another about
> this theory. Bruce, if you have multiple CPUs or cores on your
> system (i.e., cat /proc/cpuinfo reports multiple processors),
The test system has only a single-core CPU.
--
Bruce Guenter <bruce@untroubled.org> http://untroubled.org/
Thread overview: 17+ messages
2008-11-13 18:57 ext4 unlink performance Bruce Guenter
2008-11-13 19:10 ` Bruce Guenter
2008-11-13 20:42 ` Theodore Tso
2008-11-14 4:11 ` Bruce Guenter
2008-11-14 14:59 ` Theodore Tso
2008-11-14 15:48 ` Bruce Guenter
2008-11-14 15:54 ` Theodore Tso
2008-11-15 20:44 ` Bruce Guenter
2008-11-15 23:08 ` Eric Sandeen
2008-11-16 0:56 ` Theodore Tso
2008-11-16 3:38 ` Bruce Guenter
2008-11-17 0:43 ` Andreas Dilger
[not found] ` <20081119024021.GA10185@mit.edu>
2008-11-19 18:10 ` Andreas Dilger
2008-11-19 21:18 ` Theodore Tso
2008-11-20 22:49 ` Bruce Guenter
2008-11-13 19:46 ` Theodore Tso
2008-11-13 20:27 ` Bruce Guenter