* file allocation problem
@ 2009-07-16 11:31 Stephan Kulow
  2009-07-16 15:58 ` Theodore Tso
  0 siblings, 1 reply; 13+ messages in thread

From: Stephan Kulow @ 2009-07-16 11:31 UTC (permalink / raw)
To: linux-ext4

Hi,

I played around with ext4 online defrag on 2.6.31-rc3 and noticed a
problem. The core of it is this:

# filefrag -v /usr/bin/gimp-2.6
File size of /usr/bin/gimp-2.6 is 4677400 (1142 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0   2884963                29
   1       29   2890819   2884991     29
   2       58   2906960   2890847     62
   3      120   2893864   2907021     29
   4      149   2898531   2893892     29
   5      178   2887012   2898559     28
   6      206   2887261   2887039     27
   7      233   2888229   2887287     27
   8      260   2907727   2888255     49
   9      309   2907811   2907775     90
  10      399   2889078   2907900     26
  11      425   2890641   2889103     26
  12      451   2908065   2890666     31
  13      482   2908136   2908095     33
  14      515   2908170   2908168     54
  15      569   2908257   2908223     31
  16      600   2908378   2908287     38
  17      638   2886399   2908415     25
  18      663   2908646   2886423     26
  19      689   2909129   2908671     56
  20      745   2909186   2909184     62
  21      807   2909281   2909247     31
  22      838   2902503   2909311     25
  23      863    103690   2902527    161
  24     1024    109621    103850    118  eof
/usr/bin/gimp-2.6: 25 extents found

ext4 defragmentation for /usr/bin/gimp-2.6
[1/1]/usr/bin/gimp-2.6: 100%  extents: 25 -> 25  [ OK ]
 Success: [1/1]

(filefrag now outputs very much the same.)

But now the really interesting part starts: when I copy that file away
(as far as I understand the code, e4defrag allocates space in /usr/bin
too), I get:

cp -a /usr/bin/gimp-2.6{,.defrag}

(I have 50% free, so I expect it to find room):

filefrag -v /usr/bin/gimp-2.6.defrag
File size of /usr/bin/gimp-2.6.defrag is 4677400 (1142 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0    452952                40
   1       40    439168    452991     32
   2       72    442912    439199     32
   3      104    448544    442943     32
   4      136    449472    448575     32
   5      168    453920    449503     32
   6      200    429625    453951     31
   7      231    430714    429655     31
   8      262    435296    430744     31
   9      293    454842    435326     31
  10      324    436410    454872     29
  11      353    426832    436438     28
  12      381    453651    426859     27
  13      408    447705    453677     25
  14      433    436510    447729     23
  15      456    442421    436532     23
  16      479    451098    442443     23
  17      502    447082    451120     22
  18      524    451647    447103     22
  19      546    437950    451668     21
  20      567    439293    437970     21
  21      588    454464    439313     21
  22      609    455776    454484     21
  23      630    454624    455796     20
  24      650    450592    454643     18
  25      668    451136    450609     18
  26      686    452305    451153     18
  27      704    427088    452322     16
  28      720    427568    427103     16
  29      736    427952    427583     16
  30      752    427984    427967     16
  31      768    650240    427999    256
  32     1024    634851    650495     69
  33     1093    633344    634919     49  eof
/usr/bin/gimp-2.6.defrag: 34 extents found

Now that I call fragmented! Calling e4defrag again gives me 34 -> 28,
and now it moved _parts_:

..
  24      781    478136    480191     56
  25      837    475850    478191     54
  26      891   1836751    475903    133
  27     1024   1875978   1836883    118  eof
/usr/bin/gimp-2.6.defrag: 28 extents found

This looks really strange to me; is this a problem with my particular
file system, or a bug?

Greetings, Stephan

^ permalink raw reply [flat|nested] 13+ messages in thread
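A `filefrag -v` listing like the ones above can be summarized mechanically. The sketch below is illustrative only: it assumes the column layout shown above (the first extent row has no "expected" column), and `summarize` is not part of e2fsprogs.

```python
# Summarize a `filefrag -v` listing: extent count and average extent
# length in blocks. Assumes the column layout shown above; the first
# extent row has no "expected" column.

def summarize(filefrag_lines):
    extents = []
    for line in filefrag_lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue                      # skip header / summary lines
        nums = [int(f) for f in fields if f.isdigit()]
        if len(nums) == 4:                # first extent: no "expected"
            _ext, logical, physical, length = nums
        else:
            _ext, logical, physical, _expected, length = nums[:5]
        extents.append((logical, physical, length))
    count = len(extents)
    avg = sum(l for _, _, l in extents) / count if count else 0.0
    return count, avg
```

For the 25-extent listing above this reports an average extent of about 46 blocks (roughly 183 KiB), which is why even e4defrag's "OK" result still reads as fragmented.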
* Re: file allocation problem
From: Theodore Tso @ 2009-07-16 15:58 UTC (permalink / raw)
To: Stephan Kulow; +Cc: linux-ext4

On Thu, Jul 16, 2009 at 01:31:17PM +0200, Stephan Kulow wrote:
> Hi,
>
> I played around with ext4 online defrag on 2.6.31-rc3 and noticed a problem.
> The core is this:

Was your filesystem originally an ext3 filesystem which was converted
over to ext4? What features are currently enabled? (Sending a copy of
the output of "dumpe2fs -h /dev/XXX" would be helpful.)

If it is the case that this was originally an ext3 filesystem,
e4defrag does have some definite limitations that will prevent it from
doing a great job in such a case. I'm guessing that's what's going on
here.

> Now that I call fragmented! Calling e4defrag again gives me
> 34->28 and now it moved _parts_

I'm not sure what you mean by moving _parts_?

					- Ted
* Re: file allocation problem
From: Stephan Kulow @ 2009-07-16 17:43 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

On Thursday 16 July 2009 17:58:32 Theodore Tso wrote:
> Was your filesystem originally an ext3 filesystem which was converted
> over to ext4? What features are currently enabled (sending a copy of

Yes, it was converted quite some time ago.

> the output of "dumpe2fs -h /dev/XXX" would be helpful.)

Filesystem volume name:   <none>
Last mounted on:          /root
Filesystem UUID:          ec4454af-a8db-42ad-9627-19c9c17a0220
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              853440
Block count:              3409788
Reserved block count:     170489
Free blocks:              1156411
Free inodes:              615319
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      832
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8128
Inode blocks per group:   508
Filesystem created:       Fri Dec 12 17:01:57 2008
Last mount time:          Thu Jul 16 19:30:26 2009
Last write time:          Thu Jul 16 19:30:26 2009
Mount count:              718
Maximum mount count:      -1
Last checked:             Thu Jan 29 15:01:57 2009
Check interval:           0 (<none>)
Lifetime writes:          5211 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       650850
Default directory hash:   half_md4
Directory Hash Seed:      a262693d-9659-4212-8e5b-5901140edff8
Journal backup:           inode blocks
Journal size:             128M

> If it is the case that this was originally an ext3 filesystem,
> e4defrag does have some definite limitations that will prevent it from
> doing a great job in such a case. I'm guessing that's what's going on
> here.

My problem is not so much with what e4defrag does, but the fact that a
new file I create with cp(1) contains 34 extents.

> > Now that I call fragmented! Calling e4defrag again gives me
> > 34->28 and now it moved _parts_
>
> I'm not sure what you mean by moving _parts_?

It moved a couple of blocks from 6XXX to 10XXX, and most extents stayed
in the area where they were (I guess close to the rest of /usr/bin?)

Greetings, Stephan
* Re: file allocation problem
From: Theodore Tso @ 2009-07-17 1:12 UTC (permalink / raw)
To: Stephan Kulow; +Cc: linux-ext4

On Thu, Jul 16, 2009 at 07:43:21PM +0200, Stephan Kulow wrote:
> > If it is the case that this was originally an ext3 filesystem,
> > e4defrag does have some definite limitations that will prevent it from
> > doing a great job in such a case. I'm guessing that's what's going on
> > here.
>
> My problem is not so much with what e4defrag does, but the fact that
> a new file I create with cp(1) contains 34 extents.

Well, that's because your filesystem is still fragmented; you asked
e4defrag to defragment a single file. In fact, it wasn't able to do
much -- the file previously had 25 extents, and the new file had 25
extents. E4defrag is quite new and still needs a lot of polishing; I'm
not sure it should have tried to swap files when the newly allocated
file has the same number of extents. This might be a case of changing
a ">=" to a ">" in the code.

The reason why "cp" still created a file with 34 extents is that the
free space was still fragmented. As I said, e4defrag is quite
primitive; it doesn't know how to defragment free space; it simply
tries to reduce the number of extents for each file, on a file-by-file
basis.

The other problem is that an ext3 filesystem that has been converted
to ext4 does not have the flex_bg feature. This is a feature that,
when set when the filesystem is formatted, combines several block
groups into a bigger allocation group, a flex_bg. This helps avoid
fragmentation, especially for directories like /usr/bin which
typically hold more than 128 megs (a single block group) worth of
files.

Using the ext3 filesystem format, the filesystem driver will first try
to find space in the home block group of the directory, and if there
is no space there, it will look in other block groups. With a freshly
formatted ext4 filesystem, the allocation group is the flex_bg, which
is much larger, and which gives us a better opportunity for allocating
contiguous blocks.

I suspect we could do better with our allocator in this case; maybe we
should use a flex_bg to give the block group allocator a bigger set of
block groups to search. The inode tables will still not be optimally
laid out for flex_bg, but we might still be better off. Or, if the
block group is terribly fragmented, maybe we should have the allocator
find some other block group, even if it isn't the ideal one close to
the directory. According to the dumpe2fs output, the filesystem is
only 66% or so full, so there are probably some completely unused
block groups we should be using instead. One of the things that we
have _not_ had time to do is optimize the block allocator for heavily
fragmented filesystems, especially ones that had been converted from
ext3.

In any case, I don't think anything went _wrong_ per se, just that
both e4defrag and our block allocator are insufficiently smart to help
improve things for you given your current filesystem. A backup,
reformat, and restore will result in a filesystem that works far
better.

Out of curiosity, what sort of workload had the filesystem received?
It looks like the filesystem hadn't been created that long ago, so
it's a bit surprising that it was so fragmented. Were you perhaps
updating your system (by doing a yum update or apt-get update) very
frequently?

					- Ted
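The flex_bg sizing described above can be pictured with a toy model. This is an illustration only, not kernel code; the function names are mine:

```python
# Toy model of flex_bg grouping: with log_groups_per_flex = k, block
# group g belongs to flex group g >> k, and the allocator's search
# window grows by a factor of 2**k over a single block group.

def flex_group(block_group, log_groups_per_flex):
    """Flex group that a given block group falls into."""
    return block_group >> log_groups_per_flex

def flex_bytes(log_groups_per_flex, blocks_per_group=32768, block_size=4096):
    """Size of one flex allocation group in bytes (defaults match the
    4k-block filesystem discussed in this thread)."""
    return (1 << log_groups_per_flex) * blocks_per_group * block_size
```

With log_groups_per_flex = 4, the value suggested later in the thread, the 128 MiB block group becomes a 2 GiB allocation group.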
* Re: file allocation problem
From: Andreas Dilger @ 2009-07-17 4:32 UTC (permalink / raw)
To: Theodore Tso; +Cc: Stephan Kulow, linux-ext4

On Jul 16, 2009 21:12 -0400, Theodore Ts'o wrote:
> On Thu, Jul 16, 2009 at 07:43:21PM +0200, Stephan Kulow wrote:
> > My problem is not so much with what e4defrag does, but the fact that
> > a new file I create with cp(1) contains 34 extents.
>
> The other problem is that an ext3 filesystem that has been converted
> to ext4 does not have the flex_bg feature. This is a feature that,
> when set when the filesystem is formatted, combines several block
> groups into a bigger allocation group, a flex_bg. This helps avoid
> fragmentation, especially for directories like /usr/bin which
> typically hold more than 128 megs (a single block group) worth of
> files.

It seems quite odd to me that mballoc didn't find enough contiguous
free space for this relatively small file. It might be worthwhile to
look at (though not necessarily post) the output of the file
/sys/fs/ext4/{dev}/mb_groups (or "dumpe2fs", which has equivalent
data) and see if there are groups with a lot of contiguous free space.
In the mb_groups file this would be numbers in the 2^{high} columns.

I don't agree that flex_bg is necessary for good block allocation,
since we do get about 125MB per group. Maybe mballoc is being
constrained to look at too few block groups in this case? Looking at
/sys/fs/ext4/{dev}/mb_history under the "groups" column will tell how
many groups were scanned to find that allocation, and the "original"
and "result" columns will show group/grpblock/count@logblock for
recent writes:

$ dd if=/dev/zero of=/myth/tmp/foo bs=1M count=1

pid    inode   original          goal              result
4423   110359  3448/14336/256@0  1646/18944/256@0  1646/19456/256@0

You might also try to create a new temp directory elsewhere on the
filesystem, copy the file over to the temp directory, and then see if
it is less fragmented in the new directory.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
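The group/grpblock/count@logblock fields in the mb_history sample above can be unpacked mechanically. A small sketch, with the field format assumed from the example output; `parse_alloc` is not part of any ext4 tooling:

```python
# Split one mb_history allocation field, e.g. "1646/19456/256@0",
# into its parts (format assumed from the sample output above).

def parse_alloc(field):
    pos, logical = field.split("@")
    group, grpblock, count = (int(x) for x in pos.split("/"))
    return {"group": group, "grpblock": grpblock,
            "count": count, "logical": int(logical)}
```

Comparing the "goal" and "result" fields record by record shows how far the allocator had to stray from its goal group and offset.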
* Re: file allocation problem
From: Stephan Kulow @ 2009-07-17 5:31 UTC (permalink / raw)
To: Andreas Dilger, linux-ext4

On Friday 17 July 2009 06:32:42 Andreas Dilger wrote:

Hi,

> It seems quite odd to me that mballoc didn't find enough contiguous
> free space for this relatively small file. It might be worthwhile
> to look at (though not necessarily post) the output from the file
> /sys/fs/ext4/{dev}/mb_groups (or "dumpe2fs" has equivalent data)
> and see if there are groups with a lot of contiguous free space.
> In the mb_groups file this would be numbers in the 2^{high} column.

I'm not sure what you expect with "a lot", so I pasted the full file
(which happens to be in /proc/fs here):
http://ktown.kde.org/~coolo/sda6

> I don't agree that flex_bg is necessary to have good block allocation,
> since we do get about 125MB per group. Maybe mballoc is being
> constrained to look at too few block groups in this case? Looking at
> /sys/fs/ext4/{dev}/mb_history under the "groups" column will tell how
> many groups were scanned to find that allocation, and the "original"
> and "result" will show group/grpblock/count@logblock for recent writes.
>
> You might also try to create a new temp directory elsewhere on the
> filesystem, copy the file over to the temp directory, and then see
> if it is less fragmented in the new directory.

cp /usr/bin/gimp-2.6{,.defrag}:

31548 106916 13/0/1142@0 13/0/1024@0 13/24152/59@0 201 1 2 1056 0 0
31548 106916 13/24211/1083@59 13/24211/965@59 13/26192/41@59 201 1 2 1568 0 0
31548 106916 13/26233/1042@100 13/26233/924@100 13/21777/34@100 201 1 2 1568 0 0
31548 106916 13/21811/1008@134 13/21811/890@134 13/6688/32@134 201 1 2 1568 0 0
31548 106916 13/6720/976@166 13/6720/858@166 13/10944/32@166 201 1 2 1568 0 0
31548 106916 13/6720/1@0 13/6720/1@0 13/513/1@0 1 1 1 1024 0 0
31548 106916 13/10976/944@198 13/10976/826@198 13/16896/32@198 201 1 2 1568 0 0
31548 106916 13/16928/912@230 13/16928/794@230 13/12564/31@230 201 1 2 1568 0 0
31548 106916 13/12595/881@261 13/12595/763@261 13/12724/31@261 201 1 2 1568 0 0
31548 106916 13/12755/850@292 13/12755/732@292 13/31700/31@292 201 1 2 1568 0 0
31548 106916 13/31731/819@323 13/31731/701@323 13/18103/30@323 201 1 2 1568 0 0
31548 106916 13/18133/789@353 13/18133/671@353 13/21691/30@353 201 1 2 1568 0 0
31548 106916 13/21721/759@383 13/21721/641@383 13/25881/30@383 201 1 2 1568 0 0
31548 106916 13/25911/729@413 13/25911/611@413 13/22196/29@413 201 1 2 1568 0 0
31548 106916 13/22225/700@442 13/22225/582@442 13/31380/29@442 201 1 2 1568 0 0
31548 106916 13/31409/671@471 13/31409/553@471 13/12954/27@471 201 2 2 1568 0 0
31548 106916 13/12981/644@498 13/12981/526@498 13/18176/27@498 201 2 2 1568 0 0
31548 106916 13/18203/617@525 13/18203/499@525 13/15161/26@525 201 2 2 1568 0 0
31548 106916 13/15187/591@551 13/15187/473@551 13/17625/26@551 201 2 2 1568 0 0
31548 106916 13/17651/565@577 13/17651/447@577 13/19936/26@577 201 2 2 1568 0 0
31548 106916 13/19962/539@603 13/19962/421@603 13/20247/26@603 201 2 2 1568 0 0
31548 106916 13/20273/513@629 13/20273/395@629 13/23515/26@629 201 2 2 1568 0 0
31548 106916 13/23541/487@655 13/23541/369@655 13/9949/25@655 201 2 2 1568 0 0
31548 106916 13/9974/462@680 13/9974/344@680 13/19832/25@680 201 2 2 1568 0 0
31548 106916 13/19857/437@705 13/19857/319@705 13/29244/25@705 201 2 2 1568 0 0
31548 106916 13/29269/412@730 13/29269/294@730 13/1344/24@730 201 2 2 1568 0 0
31548 106916 13/1368/388@754 13/1368/270@754 13/11776/23@754 201 2 2 1568 0 0
31548 106916 13/11799/365@777 13/11799/247@777 14/3104/26@777 201 2 2 1568 0 0
31548 106916 14/3130/339@803 14/3130/221@803 14/9984/50@803 201 1 2 1568 0 0
31548 106916 14/10034/289@853 14/10034/171@853 14/11264/46@853 201 1 2 1568 0 0
31548 106916 14/11310/243@899 14/11310/125@899 58/1024/125@899 11 1 1 1568 125 128
31548 106916 58/1149/118@1024 58/1149/1024@1024 58/17408/445@1024 201 2 2 1568 0 0

filefrag: 59 extents.

cp /usr/bin/gimp-2.6 /tmp/nd/

25449 650578 80/0/1@0 80/0/1@0 80/589/1@0 4 1 1 0 0 0

Filesystem type is: ef53
File size of /tmp/nd/gimp-2.6 is 4677400 (1142 blocks, blocksize 4096)
 ext  logical  physical  expected  length  flags
   0        0   2638592               588
   1      588   2628896   2639179    436
   2     1024   2637846   2629331    118  eof
/tmp/nd/gimp-2.6: 3 extents found

Greetings, Stephan
* Re: file allocation problem
From: Stephan Kulow @ 2009-07-17 5:17 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

On Friday 17 July 2009 03:12:19 you wrote:
> On Thu, Jul 16, 2009 at 07:43:21PM +0200, Stephan Kulow wrote:
> > > If it is the case that this was originally an ext3 filesystem,
> > > e4defrag does have some definite limitations that will prevent it from
> > > doing a great job in such a case. I'm guessing that's what's going on
> > > here.
> >
> > My problem is not so much with what e4defrag does, but the fact that
> > a new file I create with cp(1) contains 34 extents.

Hi,

> The reason why "cp" still created a file with 34 extents is because
> the free space was still fragmented. As I said, e4defrag is quite
> primitive; it doesn't know how to defrag free space; it simply tries
> to reduce the number of extents for each file, on a file-by-file
> basis.

Well, is there a tool to check the overall state of the file system? I
can't really believe it's 1010101010, but it's hard to say without a
picture :)

> The other problem is that an ext3 filesystem that has been converted
> to ext4 does not have the flex_bg feature. This is a feature that,
> when set when the filesystem is formatted, combines several block
> groups into a bigger allocation group, a flex_bg. This helps avoid
> fragmentation, especially for directories like /usr/bin which
> typically hold more than 128 megs (a single block group) worth of
> files.

Oh, I enabled flex_bg after you asked, rebooted to get an e2fsck - and
I still get 34 extents for my gimp-2.6.defrag. From what I understand,
this doesn't help after the fact, but then again, how am I supposed to
fix my file system if even new files are created fragmented?

> In any case, I don't think anything went _wrong_ per se, just that
> both e4defrag and our block allocator are insufficiently smart to
> help improve things for you given your current filesystem. A backup,
> reformat, and restore will result in a filesystem that works far
> better.

I believe that, but my hope for online defrag was not having to rely
on this '80s defrag method :)

> Out of curiosity, what sort of workload had the file system received?
> It looks like the filesystem hadn't been created that long ago, so
> it's a bit surprising it was so fragmented. Were you perhaps updating
> your system (by doing a yum update or apt-get update) very frequently?

Yes, that's what I'm doing. I'm updating about every file in this file
system every second day by means of rpm packages (openSUSE calls it
Factory; you will know it as Rawhide).

Greetings, Stephan
* Re: file allocation problem
From: Theodore Tso @ 2009-07-17 14:26 UTC (permalink / raw)
To: Stephan Kulow; +Cc: linux-ext4

On Fri, Jul 17, 2009 at 07:17:12AM +0200, Stephan Kulow wrote:
> Well, is there a tool to check the overall state of the file system?
> I can't really believe it's 1010101010, but it's hard to say without
> a picture :)

Well, you can check the fragmentation of the free space by using
dumpe2fs and looking at the free blocks in each block group.

> Oh, I enabled flex_bg after you asked, rebooted to get an e2fsck -
> and I still get 34 extents for my gimp-2.6.defrag. From what I
> understand, this doesn't help after the fact, but then again how
> am I supposed to fix my file system if even new files are created
> fragmented.

Well, it's actually not enough to enable the flex_bg filesystem
feature; you also need to set the flex_bg size, like this:

debugfs -w /dev/XXX
debugfs: ssv log_groups_per_flex 4
debugfs: quit

(And no, this isn't something which we've done a lot of testing on.)

And this isn't necessarily going to help; if the 16 block groups
(2**4) forming the flex_bg for the /usr/bin directory are all badly
fragmented, then when you create new files in /usr/bin, they will
still be fragmented.

> I believe that, but my hope for online defrag was not having to rely
> on this '80s defrag method :)

Yeah, sorry, online defrag is a very new feature. It will hopefully
get better, but it's a matter of resources. Ultimately, though, the
problem is that the ext3 allocation algorithms are very different from
(and far more primitive than) the ext4 allocation algorithms. So
undoing the ext3 allocation decisions is going to be non-trivial, and
even if we can eventually get e4defrag to the point where it can do
this on the whole filesystem, I suspect backup/reformat/restore will
almost always be faster.

> Yes, that's what I'm doing. I'm updating about every file in this
> file system every second day by means of rpm packages (openSUSE
> calls it Factory; you will know it as Rawhide).

Unfortunately, constantly updating every single file on a daily basis
is a very effective way of seriously aging a filesystem. The ext4
allocator tries to keep files aligned on power-of-two boundaries,
which tends to help this a lot (although this means that dumpe2fs will
show a bunch of holes that make the free space look more fragmented
than it really is), but the ext3 allocator doesn't have any such
smarts in it.

					- Ted
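The free-space check suggested above can be scripted. A rough sketch that assumes dumpe2fs prints each group's free blocks as comma-separated "start-end" ranges, with single free blocks appearing without a dash; this is not an e2fsprogs tool:

```python
# Largest contiguous free run in a dumpe2fs "Free blocks:" field,
# e.g. "4609-4612, 5000, 6144-8191" (format assumed).

def largest_free_run(free_blocks_field):
    best = 0
    for part in free_blocks_field.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            best = max(best, end - start + 1)
        else:
            best = max(best, 1)          # a lone free block
    return best
```

Running this over every group and comparing the result against the file size being allocated gives a quick picture of whether the free space could have held the file contiguously at all.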
* Re: file allocation problem
From: Stephan Kulow @ 2009-07-17 18:02 UTC (permalink / raw)
To: Theodore Tso; +Cc: linux-ext4

On Friday 17 July 2009 16:26:28 Theodore Tso wrote:
> And this isn't necessarily going to help; if the 16 block groups
> (2**4) forming the flex_bg for the /usr/bin directory are all badly
> fragmented, then when you create new files in /usr/bin, they will
> still be fragmented.

Yeah, but even the file in /tmp/nd got 3 extents. My file is 1142
blocks, and my mb_groups says 2**9 is the highest possible value. So I
guess I will indeed try to create the file system from scratch to test
the allocator for real.

> In any case, I don't think anything went _wrong_ per se, just that
> both e4defrag and our block allocator are insufficiently smart to
> help improve things for you given your current filesystem. A backup,
> reformat, and restore will result in a filesystem that works far
> better.

I don't have any kind of experience in that field, but would it be
possible to allocate a big file that would get all the free blocks and
then move the extents of one group into it, basically freeing all
blocks of one group so it can be used purely by ext4 allocation? Or
even go as far as packing the blocks of every group. As far as I can
see, there is no way with the current ioctl interface to achieve that
once your file system is fragmented enough, because the allocator will
always create new files fragmented, and the ioctl can only move
extents from one fragmented file to another fragmented one.

And yes, backup/restore might be faster, but it's also a far more
disruptive action than leaving defrag running over night.

> Unfortunately, constantly updating every single file on a daily basis
> is a very effective way of seriously aging a filesystem. The ext4

Of course it is - guess why I'm so interested in having it :)

> allocator tries to keep files aligned on power-of-two boundaries,
> which tends to help this a lot (although this means that dumpe2fs
> will show a bunch of holes that make the free space look more
> fragmented than it really is), but the ext3 allocator doesn't have
> any such smarts in it.

But there is nothing packing the blocks if the groups get full, so
these holes will always cause fragmentation once the file system gets
full, right? So I guess online defragmentation first needs to pretend
to do an online resize so it can use the gained free space. Now I have
something to test.. :)

Greetings, Stephan
* Re: file allocation problem
From: Andreas Dilger @ 2009-07-17 21:14 UTC (permalink / raw)
To: Stephan Kulow; +Cc: Theodore Tso, linux-ext4

On Jul 17, 2009 20:02 +0200, Stephan Kulow wrote:
> Yeah, but even the file in /tmp/nd got 3 extents. My file is 1142
> blocks, and my mb_groups says 2**9 is the highest possible value. So
> I guess I will indeed try to create the file system from scratch to
> test the allocator for real.

The defrag code needs to become smarter, so that it finds small files
in the middle of free space and migrates those to fit into a small
gap. That will allow larger files to be defragged once there are large
chunks of free space.

> But there is nothing packing the blocks if the groups get full, so
> these holes will always cause fragmentation once the file system
> gets full, right?

Well, this isn't quite correct. The mballoc code only tries to
allocate "large" files on power-of-two boundaries, where "large" is
64kB by default but is tunable in /proc. For smaller files it tries to
pack them together into the same block, or into gaps that are exactly
the size of the file.

> So I guess online defragmentation first needs to pretend to do an
> online resize so it can use the gained free space. Now I have
> something to test.. :)

Yes, that would give you some good free space at the end of the
filesystem. Then find the largest files in the filesystem, migrate
them there, then defrag the smaller files.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
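The strategy sketched above - move small files out of the way so that large chunks of contiguous free space appear - can be modelled as a best-fit assignment. This is only an illustration of the idea, not e4defrag's actual algorithm:

```python
# Best-fit each file (largest first) into the smallest gap that can
# hold it; returns (file_size, chosen_gap_size) pairs. Illustrative
# model of the free-space defrag idea, not real e4defrag logic.

def plan_moves(file_sizes, gap_sizes):
    gaps = sorted(gap_sizes)
    moves = []
    for size in sorted(file_sizes, reverse=True):
        for i, gap in enumerate(gaps):
            if gap >= size:
                moves.append((size, gap))
                gaps[i] = gap - size      # shrink the chosen gap
                gaps.sort()
                break
    return moves
```

Files that fit no gap are simply left in place, which mirrors the constraint discussed in the thread: the move ioctl can only relocate extents into space the allocator can actually provide.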
* Re: file allocation problem
From: Stephan Kulow @ 2009-07-18 21:16 UTC (permalink / raw)
To: Andreas Dilger, linux-ext4

On Friday 17 July 2009 23:14:44 Andreas Dilger wrote:
> > Yeah, but even the file in /tmp/nd got 3 extents. My file is 1142
> > blocks, and my mb_groups says 2**9 is the highest possible value.
> > So I guess I will indeed try to create the file system from scratch
> > to test the allocator for real.
>
> The defrag code needs to become smarter, so that it finds small files
> in the middle of free space and migrates those to fit into a small
> gap. That will allow larger files to be defragged once there are
> large chunks of free space.

Is there a way for user space to hint the allocator to fill these
gaps? I don't see any obvious one. Relying on the allocator not to
make matters worse might be enough, but it doesn't sound ideal. Unless
something urgent comes up, I might actually continue experimenting
next week :)

My resize2fs defrag worked pretty well actually, but then again I did
it on an offline copy, and it won't work that way online.

Greetings, Stephan
* Re: file allocation problem
From: Ron Johnson @ 2009-07-19 22:45 UTC (permalink / raw)
To: linux-ext4

On 2009-07-17 16:14, Andreas Dilger wrote:
[snip]
> Well, this isn't quite correct. The mballoc code only tries to
> allocate "large" files on power-of-two boundaries, where "large" is
> 64kB by default but is tunable in /proc. For smaller files it tries
> to pack them together into the same block, or into gaps that are
> exactly the size of the file.

How does ext4 act on growing files? I.e., creating a tarball that,
obviously, starts at 0 bytes and then grows to multi-GB?

--
Scooty Puff, Sr
The Doom-Bringer
* Re: file allocation problem
From: Andreas Dilger @ 2009-07-20 21:18 UTC (permalink / raw)
To: Ron Johnson; +Cc: linux-ext4

On Jul 19, 2009 17:45 -0500, Ron Johnson wrote:
> How does ext4 act on growing files? I.e., creating a tarball that,
> obviously, starts at 0 bytes and then grows to multi-GB?

ext4 has "delayed allocation" (delalloc), so no blocks are allocated
during the initial file writes; they are allocated only when RAM is
running short or when the data has been sitting around for a while.
Normally, if you are writing to a file with _most_ applications, the
IO rate is high enough that within the 5-30s memory flush interval the
file has grown large enough to give the allocator an idea of whether
the file will be small or large.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
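The delayed-allocation behaviour described above can be pictured with a toy model (mine, not kernel code): buffered writes to a growing file merge into contiguous logical ranges, so at flush time the allocator sees one large request instead of one request per write.

```python
# Merge buffered writes (offset, length) into contiguous ranges; each
# resulting range would correspond to a single allocation request at
# flush time. Toy model of the delalloc idea, not kernel behaviour.

def coalesce(writes):
    merged = []
    for off, length in sorted(writes):
        end = off + length
        if merged and off <= merged[-1][1]:
            # overlaps or abuts the previous range: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((off, end))
    return merged
```

A tar-style sequential writer produces writes that all abut, so the whole flush collapses to a single range - which is why the growing-tarball case above works out reasonably well.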
end of thread, other threads: [~2009-07-20 21:19 UTC | newest]

Thread overview: 13+ messages:
2009-07-16 11:31 file allocation problem Stephan Kulow
2009-07-16 15:58 ` Theodore Tso
2009-07-16 17:43   ` Stephan Kulow
2009-07-17  1:12     ` Theodore Tso
2009-07-17  4:32       ` Andreas Dilger
2009-07-17  5:31         ` Stephan Kulow
2009-07-17  5:17       ` Stephan Kulow
2009-07-17 14:26         ` Theodore Tso
2009-07-17 18:02           ` Stephan Kulow
2009-07-17 21:14             ` Andreas Dilger
2009-07-18 21:16               ` Stephan Kulow
2009-07-19 22:45               ` Ron Johnson
2009-07-20 21:18                 ` Andreas Dilger