* Bug in delayed allocation: really bad block layouts!
@ 2008-08-10 17:30 Theodore Ts'o
2008-08-10 17:54 ` Eric Sandeen
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Theodore Ts'o @ 2008-08-10 17:30 UTC (permalink / raw)
To: linux-ext4
Thanks to a comment on a recent blog entry of mine[1], I think I've
uncovered a rather embarrassing bug in mballoc.
[1]http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times
I created a fresh 5 gig ext4 filesystem, and then populated it using a
single-threaded tar command:
(cd /usr ; tar cf - bin lib) | (cd /mnt; tar xfp -)
I then unmounted the filesystem, and ran an instrumented e2fsck looking
for fragmented files, and found a whole series of fragmented files with
the following pattern:
Inode 122: (0):58399, (1-3):43703-43705
Inode 124: (0):58400, (1):43707
Inode 127: (0):58401, (1-7):43709-43715
Inode 128: (0):58402, (1-2):43716-43717
Inode 129: (0):58403, (1-3):43718-43720
Inode 133: (0):58404, (1-5):43722-43726
Inode 135: (0):58405, (1):43728
Inode 136: (0):58406, (1-3):43729-43731
Inode 141: (0-1):58407-58408, (2-6):43734-43738
Inode 143: (0):58409, (1):43740
Inode 144: (0):58410, (1-5):43741-43745
Inode 146: (0):58411, (1):43746
Inode Pathname
122 /bin/smproxy
124 /bin/debconf-updatepo
127 /bin/iostat
128 /bin/xeyes
129 /bin/pbmtog3
133 /bin/join-dctrl
135 /bin/dpkg-name
136 /bin/lockfile
141 /bin/id
143 /bin/ppmcolormask
144 /bin/tty
146 /bin/colrm
If I do this test with -o nodelalloc, I get a slightly different
pattern. Now I get a whole series of discontiguous regions after the
first 15 blocks:
inode last_block pblk lblk len
=============================================
2932: was 47087 actual extent 41894 (15, 3)...
3512: was 47829 actual extent 41908 (15, 1)...
3535: was 47904 actual extent 41912 (15, 37)...
3549: was 47977 actual extent 41949 (15, 4)...
3637: was 48225 actual extent 41959 (15, 6)...
3641: was 48245 actual extent 41965 (15, 13)...
3675: was 48418 actual extent 41978 (15, 1)...
3675: was 41979 actual extent 48640 (16, 15)...
3714: was 41984 actual extent 48656 (1, 2)...
3954: was 49449 actual extent 48660 (15, 16)...
3999: was 49569 actual extent 48679 (15, 2)...
4010: was 49644 actual extent 48681 (15, 1)...
4143: was 49943 actual extent 48687 (15, 10)...
4202: was 50036 actual extent 48699 (15, 6)...
So all of the discontiguities start at logical block #15, and when I
examine the inodes, what we find is one extent for blocks 0-14, ending
at the last_block number, and then the second extent which extends for
the rest of the file, starting somewhere else earlier in the block
group.
So a very similar issue, even without delayed allocation. That leads me
to suspect the problem is somewhere inside mballoc. Aneesh, Andreas,
Alex --- I think you folks are the most familiar with the mballoc code;
does someone have time to take a look? This is clearly a bug, and clearly
something we want to fix. If we can't get an optimal layout with one
single-threaded process writing to the filesystem, what hope do we have
of getting it right on more realistic benchmarks or real-world usage?
- Ted
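[Editorial illustration, not part of the original thread: the pattern Ted
reports above, a lone first block allocated far away from the rest of the
file, can be flagged mechanically from an extent list. A minimal sketch in
Python; the tuple format is an assumption, roughly what one might parse out
of debugfs "stat" or "filefrag -v" output.]

```python
# Flag the layout pattern from the report: a file whose physical block
# chain breaks after the first extent.  Each extent is a
# (logical_start, physical_start, length) tuple.

def is_discontiguous(extents):
    """Return the logical block numbers where the physical chain breaks."""
    breaks = []
    for prev, cur in zip(extents, extents[1:]):
        prev_log, prev_phys, prev_len = prev
        cur_log, cur_phys, _ = cur
        # Contiguous iff the next extent starts right after the previous
        # one both logically and physically.
        if cur_log != prev_log + prev_len or cur_phys != prev_phys + prev_len:
            breaks.append(cur_log)
    return breaks

# Inode 122 from the report: (0):58399, (1-3):43703-43705
print(is_discontiguous([(0, 58399, 1), (1, 43703, 3)]))   # -> [1]
# A two-extent file that is nonetheless physically contiguous
print(is_discontiguous([(0, 100, 4), (4, 104, 2)]))       # -> []
```

Every fragmented inode in the listing above breaks at logical block 1 (or,
in the nodelalloc case below, at block 15), which is what makes the pattern
look systematic rather than random.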
^ permalink raw reply [flat|nested] 15+ messages in thread

* Re: Bug in delayed allocation: really bad block layouts!
From: Eric Sandeen @ 2008-08-10 17:54 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4

Theodore Ts'o wrote:
> Thanks to a comment on a recent blog entry of mine[1], I think I've
> uncovered a rather embarrassing bug in mballoc.
[...]
> I then unmounted the filesystem, and ran an instrumented e2fsck looking
> for fragmented files, and found a whole series of fragmented files with
> the following pattern:

Hm, last year I reported "delalloc fragmenting files?" and Alex sent
some patches; I wonder if they all got integrated.  I guess I should
re-run that simple large-file dd test and see how we're doing now...

Ok, on a 1G single-threaded write it looks sane:

Discontinuity: Block 53248 is at 100352 (was 98303)
Discontinuity: Block 116736 is at 165888 (was 163839)
Discontinuity: Block 180224 is at 231424 (was 229375)
Discontinuity: Block 243712 is at 296960 (was 294911)

so I suppose this is not the same issue...

Hm, and I tried writing out 10 files in order as a simple test, but
umount/remount brought me back many 0-byte files; I need to update my
patchset I guess. :)

-Eric
* Re: Bug in delayed allocation: really bad block layouts!
From: Theodore Tso @ 2008-08-10 18:22 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-ext4

On Sun, Aug 10, 2008 at 12:54:00PM -0500, Eric Sandeen wrote:
> Hm, and I tried writing out 10 files in order as a simple test but
> umount/remount brought me back many 0-byte files, I need to update my
> patchset I guess. :)

One of the questions in my mind is whether this is a regression
triggered by some of our most recent patches... since I only have 2.2%
of files reported as fragmented by e2fsck, and if this problem had
always been there, I would have expected a much higher fragmentation
number.  So if you have some older kernels, you might want to see if
you can replicate the problem.  I've since found that just doing a
copy via "(tar -cf - -C / usr/include) | (tar -C /mnt -xf -)" is
sufficient to see the problem.  Just add a "sync; sleep 5" before the
umount. :-)

- Ted
* Re: Bug in delayed allocation: really bad block layouts!
From: Eric Sandeen @ 2008-08-10 18:54 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

Theodore Tso wrote:
> One of the questions in my mind is whether this is a regression
> triggered by some of our most recent patches...  So if you have some
> older kernels, you might want to see if you can replicate the
> problem.  I've since found that just doing a copy via
> "(tar -cf - -C / usr/include) | (tar -C /mnt -xf -)" is sufficient to
> see the problem.  Just add a "sync; sleep 5" before the umount. :-)

It may be; I tried this and then a quick filefrag run:

# filefrag usr/include/*.h | grep -v extents | awk -F : '{print $2}' |
    sort | uniq -c
    146  1 extent found

so everything came out contiguous.

This was with 2.6.27-0.186.rc0.git15.fc10.x86_64

-Eric
* Re: Bug in delayed allocation: really bad block layouts!
From: Eric Sandeen @ 2008-08-10 20:04 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

Eric Sandeen wrote:
> It may be; I tried this and then a quick filefrag run:
>
> # filefrag usr/include/*.h | grep -v extents | awk -F : '{print $2}' |
>     sort | uniq -c
>     146  1 extent found
>
> so everything came out contiguous.
>
> This was with 2.6.27-0.186.rc0.git15.fc10.x86_64

Or better:

# find . -name \*.h | xargs filefrag | grep -v extents |
    awk -F : '{print $2}' | sort | uniq -c
   2460  1 extent found

so *really* everything.

-Eric
* Re: Bug in delayed allocation: really bad block layouts!
From: Theodore Tso @ 2008-08-11  1:46 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-ext4

On Sun, Aug 10, 2008 at 03:04:00PM -0500, Eric Sandeen wrote:
> # find . -name \*.h | xargs filefrag | grep -v extents |
>     awk -F : '{print $2}' | sort | uniq -c
>    2460  1 extent found
>
> so *really* everything.

Um, no, because you filtered out "extents", and filefrag will print "2
extents found", so that it gets the singular/plural correct.  One of
the things I've been meaning to do is to clean up the output of
filefrag so that it's a bit easier to parse.  It's really stupid for
it to print "file is in extents format" over and over again...

I've done some research on my laptop's filesystem, and I've found
examples of files with the characteristic horrendous layout going back
at least as far as July 7th.  (See below.)  This takes us back to
about the time when we were stabilizing the patches that went into
2.6.26, and before the most recent set of patches to fix the journal
credits and delayed allocation patches.

Interestingly, I'm not seeing the bad layouts for files created June
30th, which is when I switched my laptop's root filesystem to ext4.
So that gives us a pretty good time bound on the patches that might be
involved.  (This is a great demonstration of why it's a real win to
manage the ext4 patch queue using git, since we can see which patches
were added during that week or so.)

In other news, I've added an enhancement to e2fsck which makes it much
easier to spot these sorts of problems.  On the latest e2fsprogs git
branch, you'll be able to do "e2fsck -E fragcheck /dev/thunk/root" and
get a report which makes it really easy to see the problem.

- Ted

debugfs:  stat <4545760>
Inode: 4545760   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 278636620    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 6669
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 16
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x487282ce:50c55b54 -- Mon Jul  7 16:55:42 2008
 atime: 0x487282cd:00000000 -- Mon Jul  7 16:55:41 2008
 mtime: 0x485d754f:00000000 -- Sat Jun 21 17:40:31 2008
crtime: 0x487282ce:50c55b54 -- Mon Jul  7 16:55:42 2008
Size of extra inode fields: 28
BLOCKS:
(0):19472457, (1):19816447
TOTAL: 2

debugfs:  stat <4545768>
Inode: 4545768   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 278636628    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 7655
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 16
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x487282ce:5a3cff0c -- Mon Jul  7 16:55:42 2008
 atime: 0x487282cd:00000000 -- Mon Jul  7 16:55:41 2008
 mtime: 0x485d754f:00000000 -- Sat Jun 21 17:40:31 2008
crtime: 0x487282ce:5a3cff0c -- Mon Jul  7 16:55:42 2008
Size of extra inode fields: 28
BLOCKS:
(0):19472458, (1):20849215
TOTAL: 2

debugfs:  stat <4545769>
Inode: 4545769   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 278636629    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 17452
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 40
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x487282ce:5a3cff0c -- Mon Jul  7 16:55:42 2008
 atime: 0x487282cd:00000000 -- Mon Jul  7 16:55:41 2008
 mtime: 0x485d754f:00000000 -- Sat Jun 21 17:40:31 2008
crtime: 0x487282ce:5a3cff0c -- Mon Jul  7 16:55:42 2008
Size of extra inode fields: 28
BLOCKS:
(0):19472459, (1-4):20849216-20849219
TOTAL: 5

debugfs:  ncheck 4545760 4545768 4545769
Inode	Pathname
4545760	/usr/projects/sid-root/usr/bin/piconv
4545768	/usr/projects/sid-root/usr/bin/shasum
4545769	/usr/projects/sid-root/usr/bin/splain

---------------------- Example of files created June 30th

debugfs:  stat <18246>
Inode: 18246   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 1457985356    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 24448
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 48
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4868cc68:d7f10208 -- Mon Jun 30 08:07:04 2008
 atime: 0x4868cbfa:8ac0ef60 -- Mon Jun 30 08:05:14 2008
 mtime: 0x47f5cddd:00000000 -- Fri Apr  4 02:42:37 2008
crtime: 0x4868cc68:d724eee4 -- Mon Jun 30 08:07:04 2008
Size of extra inode fields: 28
BLOCKS:
(0-5):351687-351692
TOTAL: 6

debugfs:  stat <18247>
Inode: 18247   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 1457985357    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 1297
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4868cc68:d7f10208 -- Mon Jun 30 08:07:04 2008
 atime: 0x4868cbfa:8ac0ef60 -- Mon Jun 30 08:05:14 2008
 mtime: 0x47e38ec1:00000000 -- Fri Mar 21 06:32:33 2008
crtime: 0x4868cc68:d7f10208 -- Mon Jun 30 08:07:04 2008
Size of extra inode fields: 28
BLOCKS:
(0):351693
TOTAL: 1

debugfs:  stat <18248>
Inode: 18248   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 1457985358    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 28492
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 56
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x4868cc68:d7f10208 -- Mon Jun 30 08:07:04 2008
 atime: 0x4868cbfa:8ac0ef60 -- Mon Jun 30 08:05:14 2008
 mtime: 0x47398252:00000000 -- Tue Nov 13 05:54:10 2007
crtime: 0x4868cc68:d7f10208 -- Mon Jun 30 08:07:04 2008
Size of extra inode fields: 28
BLOCKS:
(0-6):351694-351700
TOTAL: 7

debugfs:  ncheck 18246 18247 18248
Inode	Pathname
18246	/bin/uname
18247	/bin/bzmore
18248	/bin/mt-gnu
* Re: Bug in delayed allocation: really bad block layouts!
From: Eric Sandeen @ 2008-08-11  5:36 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

Theodore Tso wrote:
> On Sun, Aug 10, 2008 at 03:04:00PM -0500, Eric Sandeen wrote:
>> # find . -name \*.h | xargs filefrag | grep -v extents |
>>     awk -F : '{print $2}' | sort | uniq -c
>>    2460  1 extent found
>>
>> so *really* everything.
>
> Um, no, because you filtered out "extents", and filefrag will print
> "2 extents found", so that it gets the singular/plural correct.

Oh, heh, whoops! :)  But still:

[root@inode test]# find . -name \*.h | xargs filefrag |
    grep -v "extents format" | awk -F : '{print $2}' | sort | uniq -c
   2461  1 extent found

and to be sure:

[root@inode test]# find . -name \*.h | xargs filefrag | grep "extents found"
[root@inode test]#

> One of the things I've been meaning to do is to clean up the output
> of filefrag so that it's a bit easier to parse.  It's really stupid
> for it to print "file is in extents format" over and over again...

I'm not really even sure why it should print it at all, ever, at least
without -vvv or something.  (That's what I meant to filter the first
time. ;)  If the tool is to report layout, I'm not sure the trivia
about how the layout is tracked internally is all that interesting.
The fiemap patches for filefrag, which print it out more as a table
format, more like xfs_bmap does, should be a lot better.  But anyway...

> Interestingly, I'm not seeing the bad layouts for files created June
> 30th, which is when I switched my laptop's root filesystem to ext4.
[...]

Hm... I wonder why I'm not seeing it?

-Eric
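[Editorial illustration, not part of the original thread: the grep pitfall
discussed above is that "grep -v extents" also discards lines like "2
extents found", hiding exactly the fragmented files being looked for. A
stricter filter keys on the extent count itself; a minimal Python sketch,
with made-up filenames in the line format filefrag prints:]

```python
import re
from collections import Counter

# Sample filefrag output lines (filenames are hypothetical).
sample = """\
usr/include/a.h: 1 extent found
usr/include/b.h: 1 extent found
usr/include/c.h: 2 extents found
"""

# Tally files by extent count instead of text-matching on "extents";
# the regex accepts both the singular and plural forms.
counts = Counter()
for line in sample.splitlines():
    m = re.search(r':\s*(\d+) extents? found', line)
    if m:
        counts[int(m.group(1))] += 1

print(dict(counts))   # -> {1: 2, 2: 1}
```

With this tally the fragmented file (2 extents) shows up rather than being
silently dropped by the filter.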
* Re: Bug in delayed allocation: really bad block layouts!
From: Andreas Dilger @ 2008-08-18 10:50 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Theodore Tso, linux-ext4, Kalpak Shah

On Aug 11, 2008  00:36 -0500, Eric Sandeen wrote:
> Theodore Tso wrote:
> > One of the things I've been meaning to do is to clean up the output
> > of filefrag so that it's a bit easier to parse.  It's really stupid
> > for it to print "file is in extents format" over and over again...
>
> I'm not really even sure why it should print it at all, ever, at
> least without -vvv or something.  If the tool is to report layout,
> I'm not sure the trivia about how the layout is tracked internally is
> all that interesting.  The fiemap patches for filefrag, which print
> it out more as a table format, more like xfs_bmap does, should be a
> lot better.  But anyway...

Actually, the FIEMAP patches for filefrag provide a much nicer format
for the output, more like what xfs_bmap does.  For now it only does
this if requested with the "-e" option (--extents), but if Ted is not
averse to changing the output format of filefrag, we could make it the
default.

The whole Lustre patchset (against 1.40.11) is at
http://downloads.lustre.org/public/tools/e2fsprogs/latest/e2fsprogs-1.40.11-sun1-patches.tar.gz
but Kalpak or I can send just the FIEMAP one again if you are
interested.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: Bug in delayed allocation: really bad block layouts!
From: Theodore Tso @ 2008-08-10 18:28 UTC (permalink / raw)
To: linux-ext4

P.S.  Please excuse the subject line; when I was first writing up the
e-mail, I thought the issue was caused by a bug in how delayed
allocation under memory pressure interacted with mballoc.  Subsequent
tests showed my initial theory was incorrect, and I fixed the text of
my e-mail, but not the subject line...

- Ted
* Re: Bug in delayed allocation: really bad block layouts!
From: Aneesh Kumar K.V @ 2008-08-11  7:55 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4

On Sun, Aug 10, 2008 at 01:30:14PM -0400, Theodore Ts'o wrote:
> Thanks to a comment on a recent blog entry of mine[1], I think I've
> uncovered a rather embarrassing bug in mballoc.
[...]
> So a very similar issue, even without delayed allocation.  That leads
> me to suspect the problem is somewhere inside mballoc.  Aneesh,
> Andreas, Alex --- I think you folks are the most familiar with the
> mballoc code; does someone have time to take a look?

Are these small files?  The locality group preallocation is discarded
with the recent changes; is commit
6be2ded1d7c51b39144b9f07d2c839e1bd8707f1 from Linus's tree included?

-aneesh
* Re: Bug in delayed allocation: really bad block layouts!
From: Aneesh Kumar K.V @ 2008-08-11 14:39 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4

On Sun, Aug 10, 2008 at 01:30:14PM -0400, Theodore Ts'o wrote:
> Thanks to a comment on a recent blog entry of mine[1], I think I've
> uncovered a rather embarrassing bug in mballoc.
[...]
> So a very similar issue, even without delayed allocation.  That leads
> me to suspect the problem is somewhere inside mballoc.  If we can't
> get an optimal layout with one single-threaded process writing to the
> filesystem, what hope do we have of getting it right on more
> realistic benchmarks or real-world usage?

Can you try this patch?  The patch makes group preallocation use the
goal block.

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 865e9dd..25fe375 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3281,6 +3281,29 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
 	mb_debug("use %u/%u from group pa %p\n", pa->pa_lstart-len, len, pa);
 }
 
+static struct ext4_prealloc_space *
+ext4_mb_check_group_pa(ext4_fsblk_t goal_block,
+			struct ext4_prealloc_space *pa,
+			struct ext4_prealloc_space *cpa)
+{
+	ext4_fsblk_t cur_distance, new_distance;
+
+	if (cpa == NULL) {
+		atomic_inc(&pa->pa_count);
+		return pa;
+	}
+	cur_distance = abs(goal_block - cpa->pa_pstart);
+	new_distance = abs(goal_block - pa->pa_pstart);
+
+	if (cur_distance < new_distance)
+		return cpa;
+
+	/* drop the previous reference */
+	atomic_dec(&cpa->pa_count);
+	atomic_inc(&pa->pa_count);
+	return pa;
+}
+
 /*
  * search goal blocks in preallocated space
  */
@@ -3290,7 +3313,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	int order, i;
 	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 	struct ext4_locality_group *lg;
-	struct ext4_prealloc_space *pa;
+	struct ext4_prealloc_space *pa, *cpa = NULL;
+	ext4_fsblk_t goal_block;
 
 	/* only data can be preallocated */
 	if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
@@ -3333,6 +3357,10 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	/* The max size of hash table is PREALLOC_TB_SIZE */
 	order = PREALLOC_TB_SIZE - 1;
 
+	goal_block = ac->ac_g_ex.fe_group * EXT4_BLOCKS_PER_GROUP(ac->ac_sb) +
+			ac->ac_g_ex.fe_start +
+			le32_to_cpu(EXT4_SB(ac->ac_sb)->s_es->s_first_data_block);
+
 	for (i = order; i < PREALLOC_TB_SIZE; i++) {
 		rcu_read_lock();
 		list_for_each_entry_rcu(pa, &lg->lg_prealloc_list[i],
@@ -3340,17 +3368,19 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 			spin_lock(&pa->pa_lock);
 			if (pa->pa_deleted == 0 &&
 					pa->pa_free >= ac->ac_o_ex.fe_len) {
-				atomic_inc(&pa->pa_count);
-				ext4_mb_use_group_pa(ac, pa);
-				spin_unlock(&pa->pa_lock);
-				ac->ac_criteria = 20;
-				rcu_read_unlock();
-				return 1;
+
+				cpa = ext4_mb_check_group_pa(goal_block,
+								pa, cpa);
 			}
 			spin_unlock(&pa->pa_lock);
 		}
 		rcu_read_unlock();
 	}
+	if (cpa) {
+		ext4_mb_use_group_pa(ac, cpa);
+		ac->ac_criteria = 20;
+		return 1;
+	}
 	return 0;
 }
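[Editorial illustration, not part of the original thread: the core idea of
the patch above is that, instead of taking the first usable locality-group
preallocation found, the allocator scans them all and keeps the one whose
physical start is closest to the goal block. A schematic model of that
selection in Python, using plain integers in place of
ext4_prealloc_space structs; the block numbers are hypothetical:]

```python
def pick_closest_pa(goal_block, pa_starts):
    """Return the preallocation start closest to goal_block, or None.

    Mirrors ext4_mb_check_group_pa: keep the current best candidate and
    replace it whenever a nearer one is seen.
    """
    best = None
    for pa in pa_starts:
        if best is None or abs(goal_block - pa) < abs(goal_block - best):
            best = pa
    return best

# With a goal near 632000, a PA starting at 631990 beats one at 411742.
print(pick_closest_pa(632000, [411742, 631990, 638200]))  # -> 631990
```

The kernel version additionally takes and drops `pa_count` references as
the best candidate changes, which the sketch omits.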
* Re: Bug in delayed allocation: really bad block layouts!
From: Aneesh Kumar K.V @ 2008-08-11 18:15 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: linux-ext4

On Mon, Aug 11, 2008 at 08:09:12PM +0530, Aneesh Kumar K.V wrote:
> Can you try this patch?  The patch makes group preallocation use the
> goal block.

Results with and without the patch:

http://www.radian.org/~kvaneesh/ext4/lg-fragmentation/

-aneesh
* Re: Bug in delayed allocation: really bad block layouts!
From: Theodore Tso @ 2008-08-13  2:32 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linux-ext4

On Mon, Aug 11, 2008 at 11:45:24PM +0530, Aneesh Kumar K.V wrote:
> On Mon, Aug 11, 2008 at 08:09:12PM +0530, Aneesh Kumar K.V wrote:
> > Can you try this patch?  The patch makes group preallocation use
> > the goal block.
>
> Results with and without the patch:
>
> http://www.radian.org/~kvaneesh/ext4/lg-fragmentation/

My results match yours; it seems to be a bit better, but it's not
fixing the fundamental problem.  With the patch:

26524: expecting 638190 actual extent phys 631960 log 1 len 1
26527: expecting 638191 actual extent phys 631963 log 1 len 1
26533: expecting 638192 actual extent phys 631976 log 1 len 5
26534: expecting 638193 actual extent phys 631981 log 1 len 2
26536: expecting 638194 actual extent phys 631984 log 1 len 6
26538: expecting 638195 actual extent phys 631991 log 1 len 5
26540: expecting 638196 actual extent phys 631997 log 1 len 2
26545: expecting 638197 actual extent phys 632009 log 1 len 1
26546: expecting 638198 actual extent phys 632010 log 1 len 6
26604: expecting 638199 actual extent phys 632156 log 1 len 1

Using debugfs's stat command to look at the blocks:

26524: (0):638189, (1):631960
26527: (0):638190, (1):631963
26533: (0):638191, (1-5):631976-631980
26534: (0):638192, (1-2):631981-631982
26536: (0):638193, (1-6):631984-631989
26538: (0):638194, (1-5):631991-631995
26540: (0):638195, (1-2):631997-631998
26545: (0):638196, (1):632009
26546: (0):638197, (1-6):632010-632015

Out of curiosity, I also probed the inode numbers that were out of
sequence from above.  They seem to be mostly allocating out of the
ranges used for the second extents, above.

26526: (0):631961
26526: (0):631962
26528: (0):631964
26529: (0):411742
26530: (0):631965
26531: (0-1):631966-631967
26532: (0-7):631968-631975
26535: (0):631983
26537: (0):631990
26541: (0-7):631999-632006
26542: (0):632007
26543: (0):632008
26544: (0):411743
26547: (0):632016

Inode	Pathname
26524	/lib/rhythmbox/plugins/lyrics/LyricsConfigureDialog.py
26525	/lib/rhythmbox/plugins/lyrics/LyrcParser.py
26526	/lib/rhythmbox/plugins/lyrics/LyricsParse.py
26527	/lib/rhythmbox/plugins/lyrics/LyricsConfigureDialog.pyc
26528	/lib/rhythmbox/plugins/lyrics/WinampcnParser.py
26529	/lib/rhythmbox/plugins/magnatune
26530	/lib/rhythmbox/plugins/magnatune/magnatune_logo_color_small.png
26531	/lib/rhythmbox/plugins/magnatune/magnatune.rb-plugin
26532	/lib/rhythmbox/plugins/magnatune/magnatune-prefs.glade
26533	/lib/rhythmbox/plugins/magnatune/MagnatuneSource.pyc
26534	/lib/rhythmbox/plugins/magnatune/__init__.py
26535	/lib/rhythmbox/plugins/magnatune/BuyAlbumHandler.py
26536	/lib/rhythmbox/plugins/magnatune/magnatune-purchase.glade
26537	/lib/rhythmbox/plugins/magnatune/TrackListHandler.py
26538	/lib/rhythmbox/plugins/magnatune/MagnatuneSource.py
26539	/lib/rhythmbox/plugins/magnatune/magnatune_logo_color_tiny.png
26540	/lib/rhythmbox/plugins/magnatune/__init__.pyc
26541	/lib/rhythmbox/plugins/magnatune/magnatune-loading.glade
26542	/lib/rhythmbox/plugins/magnatune/TrackListHandler.pyc
26543	/lib/rhythmbox/plugins/magnatune/BuyAlbumHandler.pyc
26544	/lib/rhythmbox/plugins/audioscrobbler
26546	/lib/rhythmbox/plugins/audioscrobbler/audioscrobbler-prefs.glade
26547	/lib/rhythmbox/plugins/audioscrobbler/audioscrobbler-ui.xml

Looks like we still have some problems with the block allocator...

- Ted
* Re: Bug in delayed allocation: really bad block layouts!
  2008-08-13  2:32 ` Theodore Tso
@ 2008-08-13 10:52 ` Aneesh Kumar K.V
  2008-08-14 21:49 ` Mingming Cao
  0 siblings, 1 reply; 15+ messages in thread
From: Aneesh Kumar K.V @ 2008-08-13 10:52 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

On Tue, Aug 12, 2008 at 10:32:05PM -0400, Theodore Tso wrote:
> On Mon, Aug 11, 2008 at 11:45:24PM +0530, Aneesh Kumar K.V wrote:
> > On Mon, Aug 11, 2008 at 08:09:12PM +0530, Aneesh Kumar K.V wrote:
> > > Can you try this patch?  The patch makes group preallocation use
> > > the goal block.
> >
> > Results with and without the patch:
> >
> > http://www.radian.org/~kvaneesh/ext4/lg-fragmentation/
>
> My results match yours; it seems to be a bit better, but it's not
> fixing the fundamental problem.  With the patch:
>
> 26524: expecting 638190 actual extent phys 631960 log 1 len 1
> 26527: expecting 638191 actual extent phys 631963 log 1 len 1
> 26533: expecting 638192 actual extent phys 631976 log 1 len 5
> 26534: expecting 638193 actual extent phys 631981 log 1 len 2
> 26536: expecting 638194 actual extent phys 631984 log 1 len 6
> 26538: expecting 638195 actual extent phys 631991 log 1 len 5
> 26540: expecting 638196 actual extent phys 631997 log 1 len 2
> 26545: expecting 638197 actual extent phys 632009 log 1 len 1
> 26546: expecting 638198 actual extent phys 632010 log 1 len 6
> 26604: expecting 638199 actual extent phys 632156 log 1 len 1
>
> Using debugfs's stat command to look at the blocks:
>
> 26524: (0):638189, (1):631960
> 26527: (0):638190, (1):631963
> 26533: (0):638191, (1-5):631976-631980
> 26534: (0):638192, (1-2):631981-631982
> 26536: (0):638193, (1-6):631984-631989
> 26538: (0):638194, (1-5):631991-631995
> 26540: (0):638195, (1-2):631997-631998
> 26545: (0):638196, (1):632009
> 26546: (0):638197, (1-6):632010-632015

I am not sure why we are getting single block requests for inodes 26524
etc.  With delayed alloc we should have gotten a two-block request.

> Out of curiosity, I also probed the inode numbers that were out of
> sequence from above.  They seem to be mostly allocating out of the
> numbers used for the second extent, above.
>
> 26526: (0):631961
> 26526: (0):631962
> 26528: (0):631964
> 26529: (0):411742
> 26530: (0):631965
> 26531: (0-1):631966-631967
> 26532: (0-7):631968-631975
> 26535: (0):631983
> 26537: (0):631990
> 26541: (0-7):631999-632006
> 26542: (0):632007
> 26543: (0):632008
> 26544: (0):411743
> 26547: (0):632016
>
> Inode  Pathname
> 26524  /lib/rhythmbox/plugins/lyrics/LyricsConfigureDialog.py
> 26525  /lib/rhythmbox/plugins/lyrics/LyrcParser.py
> 26526  /lib/rhythmbox/plugins/lyrics/LyricsParse.py
> 26527  /lib/rhythmbox/plugins/lyrics/LyricsConfigureDialog.pyc
> 26528  /lib/rhythmbox/plugins/lyrics/WinampcnParser.py
> 26529  /lib/rhythmbox/plugins/magnatune
> 26530  /lib/rhythmbox/plugins/magnatune/magnatune_logo_color_small.png
> 26531  /lib/rhythmbox/plugins/magnatune/magnatune.rb-plugin
> 26532  /lib/rhythmbox/plugins/magnatune/magnatune-prefs.glade
> 26533  /lib/rhythmbox/plugins/magnatune/MagnatuneSource.pyc
> 26534  /lib/rhythmbox/plugins/magnatune/__init__.py
> 26535  /lib/rhythmbox/plugins/magnatune/BuyAlbumHandler.py
> 26536  /lib/rhythmbox/plugins/magnatune/magnatune-purchase.glade
> 26537  /lib/rhythmbox/plugins/magnatune/TrackListHandler.py
> 26538  /lib/rhythmbox/plugins/magnatune/MagnatuneSource.py
> 26539  /lib/rhythmbox/plugins/magnatune/magnatune_logo_color_tiny.png
> 26540  /lib/rhythmbox/plugins/magnatune/__init__.pyc
> 26541  /lib/rhythmbox/plugins/magnatune/magnatune-loading.glade
> 26542  /lib/rhythmbox/plugins/magnatune/TrackListHandler.pyc
> 26543  /lib/rhythmbox/plugins/magnatune/BuyAlbumHandler.pyc
> 26544  /lib/rhythmbox/plugins/audioscrobbler
> 26546  /lib/rhythmbox/plugins/audioscrobbler/audioscrobbler-prefs.glade
> 26547  /lib/rhythmbox/plugins/audioscrobbler/audioscrobbler-ui.xml
>
> Looks like we still have some problems with the block allocator...

The problem is with delalloc and the mballoc locality group.  With
delalloc we use pdflush to write the pages, and small file allocations
use a per-cpu prealloc space.  In my understanding, using a per-CPU
prealloc space is fine without delalloc, because without delalloc
get_block happens in the process context at write_begin, and the OS
scheduler will not move the task to another CPU unless needed.

With delalloc, pdflush does the block allocation, so a per-cpu prealloc
space may not really help here.  So I tried a small patch, below, but it
didn't help much.  The patch would also increase contention on the
locality group mutex, so I guess the change is not worth it.

But with delalloc we should have gotten multiple block requests
together; that implies we should get a single get_block request for the
whole file.  I will have to instrument the kernel to understand why that
is not happening.  Even though the files are fragmented, I guess we have
those blocks close together on disk, right?

diff --git a/fs/ext4/ext4_i.h b/fs/ext4/ext4_i.h
index ef7409f..734b6ef 100644
--- a/fs/ext4/ext4_i.h
+++ b/fs/ext4/ext4_i.h
@@ -163,6 +163,8 @@ struct ext4_inode_info {
 	/* mballoc */
 	struct list_head i_prealloc_list;
 	spinlock_t i_prealloc_lock;
+	/* locality group used for block allocation */
+	struct ext4_locality_group *lg;
 
 	/* allocation reservation info for delalloc */
 	unsigned long i_reserved_data_blocks;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 25fe375..293f048 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4061,9 +4061,10 @@ static inline void ext4_mb_show_ac(struct ext4_allocation_context *ac)
  */
 static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
-	int bsbits = ac->ac_sb->s_blocksize_bits;
 	loff_t size, isize;
+	int bsbits = ac->ac_sb->s_blocksize_bits;
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
 
 	if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
 		return;
@@ -4085,13 +4086,23 @@ static void ext4_mb_group_or_file(struct ext4_allocation_context *ac)
 	 * per cpu locality group is to reduce the contention between block
 	 * request from multiple CPUs.
 	 */
-	ac->ac_lg = &sbi->s_locality_groups[get_cpu()];
-	put_cpu();
+	if (ei->lg)
+		ac->ac_lg = ei->lg;
+	else {
+		ac->ac_lg = &sbi->s_locality_groups[get_cpu()];
+		ei->lg = ac->ac_lg;
+		put_cpu();
+	}
 
 	/* we're going to use group allocation */
 	ac->ac_flags |= EXT4_MB_HINT_GROUP_ALLOC;
 
-	/* serialize all allocations in the group */
+	/*
+	 * serialize all allocations in the group
+	 * If we find lot of contention we may want
+	 * to add waiters count and use other lg if
+	 * we have large number of waiters
+	 */
 	mutex_lock(&ac->ac_lg->lg_mutex);
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 09e3c56..08bdbf9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -576,6 +576,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	ei->i_reserved_meta_blocks = 0;
 	ei->i_allocated_meta_blocks = 0;
 	ei->i_delalloc_reserved_flag = 0;
+	ei->lg = NULL;
 	spin_lock_init(&(ei->i_block_reservation_lock));
 	return &ei->vfs_inode;
 }
* Re: Bug in delayed allocation: really bad block layouts!
  2008-08-13 10:52 ` Aneesh Kumar K.V
@ 2008-08-14 21:49 ` Mingming Cao
  0 siblings, 0 replies; 15+ messages in thread
From: Mingming Cao @ 2008-08-14 21:49 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Theodore Tso, linux-ext4

On Wed, 2008-08-13 at 16:22 +0530, Aneesh Kumar K.V wrote:
> On Tue, Aug 12, 2008 at 10:32:05PM -0400, Theodore Tso wrote:
> > On Mon, Aug 11, 2008 at 11:45:24PM +0530, Aneesh Kumar K.V wrote:
> > > On Mon, Aug 11, 2008 at 08:09:12PM +0530, Aneesh Kumar K.V wrote:
> > > > Can you try this patch?  The patch makes group preallocation use
> > > > the goal block.
> > >
> > > Results with and without the patch:
> > >
> > > http://www.radian.org/~kvaneesh/ext4/lg-fragmentation/
> >
> > My results match yours; it seems to be a bit better, but it's not
> > fixing the fundamental problem.  With the patch:
> >
> > 26524: expecting 638190 actual extent phys 631960 log 1 len 1
> > 26527: expecting 638191 actual extent phys 631963 log 1 len 1
> > 26533: expecting 638192 actual extent phys 631976 log 1 len 5
> > 26534: expecting 638193 actual extent phys 631981 log 1 len 2
> > 26536: expecting 638194 actual extent phys 631984 log 1 len 6
> > 26538: expecting 638195 actual extent phys 631991 log 1 len 5
> > 26540: expecting 638196 actual extent phys 631997 log 1 len 2
> > 26545: expecting 638197 actual extent phys 632009 log 1 len 1
> > 26546: expecting 638198 actual extent phys 632010 log 1 len 6
> > 26604: expecting 638199 actual extent phys 632156 log 1 len 1
> >
> > Using debugfs's stat command to look at the blocks:
> >
> > 26524: (0):638189, (1):631960
> > 26527: (0):638190, (1):631963
> > 26533: (0):638191, (1-5):631976-631980
> > 26534: (0):638192, (1-2):631981-631982
> > 26536: (0):638193, (1-6):631984-631989
> > 26538: (0):638194, (1-5):631991-631995
> > 26540: (0):638195, (1-2):631997-631998
> > 26545: (0):638196, (1):632009
> > 26546: (0):638197, (1-6):632010-632015
>
> I am not sure why we are getting single block requests for inodes
> 26524 etc.  With delayed alloc we should have gotten a two-block
> request.
>
> > Out of curiosity, I also probed the inode numbers that were out of
> > sequence from above.  They seem to be mostly allocating out of the
> > numbers used for the second extent, above.
> >
> > 26526: (0):631961
> > 26526: (0):631962
> > 26528: (0):631964
> > 26529: (0):411742
> > 26530: (0):631965
> > 26531: (0-1):631966-631967
> > 26532: (0-7):631968-631975
> > 26535: (0):631983
> > 26537: (0):631990
> > 26541: (0-7):631999-632006
> > 26542: (0):632007
> > 26543: (0):632008
> > 26544: (0):411743
> > 26547: (0):632016
> >
> > Inode  Pathname
> > 26524  /lib/rhythmbox/plugins/lyrics/LyricsConfigureDialog.py
> > 26525  /lib/rhythmbox/plugins/lyrics/LyrcParser.py
> > 26526  /lib/rhythmbox/plugins/lyrics/LyricsParse.py
> > 26527  /lib/rhythmbox/plugins/lyrics/LyricsConfigureDialog.pyc
> > 26528  /lib/rhythmbox/plugins/lyrics/WinampcnParser.py
> > 26529  /lib/rhythmbox/plugins/magnatune
> > 26530  /lib/rhythmbox/plugins/magnatune/magnatune_logo_color_small.png
> > 26531  /lib/rhythmbox/plugins/magnatune/magnatune.rb-plugin
> > 26532  /lib/rhythmbox/plugins/magnatune/magnatune-prefs.glade
> > 26533  /lib/rhythmbox/plugins/magnatune/MagnatuneSource.pyc
> > 26534  /lib/rhythmbox/plugins/magnatune/__init__.py
> > 26535  /lib/rhythmbox/plugins/magnatune/BuyAlbumHandler.py
> > 26536  /lib/rhythmbox/plugins/magnatune/magnatune-purchase.glade
> > 26537  /lib/rhythmbox/plugins/magnatune/TrackListHandler.py
> > 26538  /lib/rhythmbox/plugins/magnatune/MagnatuneSource.py
> > 26539  /lib/rhythmbox/plugins/magnatune/magnatune_logo_color_tiny.png
> > 26540  /lib/rhythmbox/plugins/magnatune/__init__.pyc
> > 26541  /lib/rhythmbox/plugins/magnatune/magnatune-loading.glade
> > 26542  /lib/rhythmbox/plugins/magnatune/TrackListHandler.pyc
> > 26543  /lib/rhythmbox/plugins/magnatune/BuyAlbumHandler.pyc
> > 26544  /lib/rhythmbox/plugins/audioscrobbler
> > 26546  /lib/rhythmbox/plugins/audioscrobbler/audioscrobbler-prefs.glade
> > 26547  /lib/rhythmbox/plugins/audioscrobbler/audioscrobbler-ui.xml
> >
> > Looks like we still have some problems with the block allocator...
>
> The problem is with delalloc and the mballoc locality group.  With
> delalloc we use pdflush to write the pages, and small file allocations
> use a per-cpu prealloc space.  In my understanding, using a per-CPU
> prealloc space is fine without delalloc, because without delalloc
> get_block happens in the process context at write_begin, and the OS
> scheduler will not move the task to another CPU unless needed.
>
> With delalloc, pdflush does the block allocation, so a per-cpu
> prealloc space may not really help here.

I wonder if it still makes sense to use per-cpu group locality
allocation with delalloc, given that all block allocation is done via
pdflush?

> So I tried a small patch, below, but it didn't help much.  The patch
> would also increase contention on the locality group mutex, so I
> guess the change is not worth it.
>
> But with delalloc we should have gotten multiple block requests
> together; that implies we should get a single get_block request for
> the whole file.  I will have to instrument the kernel to understand
> why that is not happening.

I am curious to know this too.  Why do we get single block allocation
requests with delalloc?

Mingming

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
end of thread, other threads: [~2008-08-18 10:50 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-10 17:30 Bug in delayed allocation: really bad block layouts! Theodore Ts'o
2008-08-10 17:54 ` Eric Sandeen
2008-08-10 18:22 ` Theodore Tso
2008-08-10 18:54 ` Eric Sandeen
2008-08-10 20:04 ` Eric Sandeen
2008-08-11  1:46 ` Theodore Tso
2008-08-11  5:36 ` Eric Sandeen
2008-08-18 10:50 ` Andreas Dilger
2008-08-10 18:28 ` Theodore Tso
2008-08-11  7:55 ` Aneesh Kumar K.V
2008-08-11 14:39 ` Aneesh Kumar K.V
2008-08-11 18:15 ` Aneesh Kumar K.V
2008-08-13  2:32 ` Theodore Tso
2008-08-13 10:52 ` Aneesh Kumar K.V
2008-08-14 21:49 ` Mingming Cao