* [PATCH] reiserfs: eliminate minimum window size for bitmap searching
@ 2006-08-22 14:28 Jeff Mahoney
2006-08-22 15:33 ` David Masover
0 siblings, 1 reply; 10+ messages in thread
From: Jeff Mahoney @ 2006-08-22 14:28 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List,
ReiserFS List
Cc: Mike Benoit
When a file system becomes fragmented (using MythTV, for example), the
bigalloc window searching ends up causing huge performance problems. In
a file system presented by a user experiencing this bug, the file system
was 90% free, but no 32-block free windows existed on the entire file system.
This causes the allocator to scan the entire file system for each 128k write
before backing down to searching for individual blocks.
In the end, finding a contiguous window for all the blocks in a write is
an advantageous special case, but one that can be found naturally when
such a window exists anyway.
This patch removes the bigalloc window searching, and has been proven to fix
the test case described above.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
diff -ruNp linux-2.6.18-rc4.orig/fs/reiserfs/bitmap.c linux-2.6.18-rc4.orig.devel/fs/reiserfs/bitmap.c
--- linux-2.6.18-rc4.orig/fs/reiserfs/bitmap.c 2006-08-22 09:49:45.000000000 -0400
+++ linux-2.6.18-rc4.orig.devel/fs/reiserfs/bitmap.c 2006-08-22 10:19:35.000000000 -0400
@@ -1019,7 +1019,6 @@ static inline int blocknrs_and_prealloc_
b_blocknr_t finish = SB_BLOCK_COUNT(s) - 1;
int passno = 0;
int nr_allocated = 0;
- int bigalloc = 0;
determine_prealloc_size(hint);
if (!hint->formatted_node) {
@@ -1046,28 +1045,9 @@ static inline int blocknrs_and_prealloc_
hint->preallocate = hint->prealloc_size = 0;
}
/* for unformatted nodes, force large allocations */
- bigalloc = amount_needed;
}
do {
- /* in bigalloc mode, nr_allocated should stay zero until
- * the entire allocation is filled
- */
- if (unlikely(bigalloc && nr_allocated)) {
- reiserfs_warning(s, "bigalloc is %d, nr_allocated %d\n",
- bigalloc, nr_allocated);
- /* reset things to a sane value */
- bigalloc = amount_needed - nr_allocated;
- }
- /*
- * try pass 0 and pass 1 looking for a nice big
- * contiguous allocation. Then reset and look
- * for anything you can find.
- */
- if (passno == 2 && bigalloc) {
- passno = 0;
- bigalloc = 0;
- }
switch (passno++) {
case 0: /* Search from hint->search_start to end of disk */
start = hint->search_start;
@@ -1105,8 +1085,7 @@ static inline int blocknrs_and_prealloc_
new_blocknrs +
nr_allocated,
start, finish,
- bigalloc ?
- bigalloc : 1,
+ 1,
amount_needed -
nr_allocated,
hint->
--
Jeff Mahoney
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-22 14:28 [PATCH] reiserfs: eliminate minimum window size for bitmap searching Jeff Mahoney @ 2006-08-22 15:33 ` David Masover 2006-08-22 15:55 ` Jeff Mahoney 0 siblings, 1 reply; 10+ messages in thread From: David Masover @ 2006-08-22 15:33 UTC (permalink / raw) To: Jeff Mahoney Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit Jeff Mahoney wrote: > When a file system becomes fragmented (using MythTV, for example), the > bigalloc window searching ends up causing huge performance problems. In > a file system presented by a user experiencing this bug, the file system > was 90% free, but no 32-block free windows existed on the entire file system. > This causes the allocator to scan the entire file system for each 128k write > before backing down to searching for individual blocks. Question: Would it be better to take that performance hit once, then cache the result for awhile? If we can't find enough consecutive space, such space isn't likely to appear until a lot of space is freed or a repacker is run. > In the end, finding a contiguous window for all the blocks in a write is > an advantageous special case, but one that can be found naturally when > such a window exists anyway. Hmm. Ok, I don't understand how this works, so I'll shut up. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-22 15:33 ` David Masover @ 2006-08-22 15:55 ` Jeff Mahoney 2006-08-22 20:25 ` David Masover 2006-08-23 0:07 ` Hans Reiser 0 siblings, 2 replies; 10+ messages in thread From: Jeff Mahoney @ 2006-08-22 15:55 UTC (permalink / raw) To: David Masover Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 David Masover wrote: > Jeff Mahoney wrote: >> When a file system becomes fragmented (using MythTV, for example), the >> bigalloc window searching ends up causing huge performance problems. In >> a file system presented by a user experiencing this bug, the file system >> was 90% free, but no 32-block free windows existed on the entire file >> system. >> This causes the allocator to scan the entire file system for each >> 128k write >> before backing down to searching for individual blocks. > > Question: Would it be better to take that performance hit once, then > cache the result for awhile? If we can't find enough consecutive space, > such space isn't likely to appear until a lot of space is freed or a > repacker is run. The problem is that finding the window isn't really a direct function of free space, it's a function of fragmentation. You could have a 50% full file system that still can't find a 32 block window by having every other block used. I know it's an extremely unlikely case, but it demonstrates the point perfectly. >> In the end, finding a contiguous window for all the blocks in a write is >> an advantageous special case, but one that can be found naturally when >> such a window exists anyway. > > Hmm. Ok, I don't understand how this works, so I'll shut up. If the space after the end of the file has 32 or more blocks free, even without the bigalloc behavior, those blocks will be used. Also, I think the bigalloc behavior just ultimately ends up introducing even more fragmentation on an already fragmented file system. It'll keep contiguous chunks together, but those chunks can end up being spread all over the disk. - -Jeff - -- Jeff Mahoney SUSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFE6yjsLPWxlyuTD7IRAuT0AJ9ssQafYPW+Gy/E/xN+LKCxamjycwCgqL6P aUbgXdn+0+K3sJhWGBWtrno= =NDyT -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-22 15:55 ` Jeff Mahoney @ 2006-08-22 20:25 ` David Masover 2006-08-22 21:20 ` Jeff Mahoney 2006-08-23 0:07 ` Hans Reiser 1 sibling, 1 reply; 10+ messages in thread From: David Masover @ 2006-08-22 20:25 UTC (permalink / raw) To: Jeff Mahoney Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit Jeff Mahoney wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > David Masover wrote: >> Jeff Mahoney wrote: >>> When a file system becomes fragmented (using MythTV, for example), the >>> bigalloc window searching ends up causing huge performance problems. In >>> a file system presented by a user experiencing this bug, the file system >>> was 90% free, but no 32-block free windows existed on the entire file >>> system. >>> This causes the allocator to scan the entire file system for each >>> 128k write >>> before backing down to searching for individual blocks. >> Question: Would it be better to take that performance hit once, then >> cache the result for awhile? If we can't find enough consecutive space, >> such space isn't likely to appear until a lot of space is freed or a >> repacker is run. > > The problem is that finding the window isn't really a direct function of > free space, it's a function of fragmentation. You could have a 50% full > file system that still can't find a 32 block window by having every > other block used. I know it's an extremely unlikely case, but it > demonstrates the point perfectly. Maybe, but it's still not a counterpoint. No matter how fragmented a filesystem is, freeing space can open up contiguous space, whereas if space is not freed, you won't open up contiguous space. Thus, if your FS is 50% full and 100% fragmented, then you wait till space is freed, because if nothing happens, or if more space is filled in, you'll have the same problem at 60% than you did at 50%. If, however, you're at 60% full, and 10% of the space is freed, then it's fairly unlikely that you still don't have contiguous space, and it's worth it to scan once more at 50%, and again if it then drops to 40%. So, if your FS is 90% full and space is being freed, I'd think it would be worth it to scan again at 80%, 70%, and so on. I'd also imagine it would do little or nothing to constantly monitor an FS that stays mostly full -- maybe give it a certain amount of time, but if we're repacking anyway, just wait for a repacker run. It seems very unlikely that between repacker runs, activity between 86% and 94% would open up contiguous space. It's still not a direct function of freed space (as opposed to free space), but it starts to look better. I'm not endorsing one way or the other without benchmarks, though. >>> In the end, finding a contiguous window for all the blocks in a write is >>> an advantageous special case, but one that can be found naturally when >>> such a window exists anyway. >> Hmm. Ok, I don't understand how this works, so I'll shut up. > > If the space after the end of the file has 32 or more blocks free, even > without the bigalloc behavior, those blocks will be used. For what behavior -- appending? > Also, I think the bigalloc behavior just ultimately ends up introducing > even more fragmentation on an already fragmented file system. It'll keep > contiguous chunks together, but those chunks can end up being spread all > over the disk. This sounds like the NTFS strategy, which was basically to allow all hell to break loose -- above a certain chunk size. Keep chunks of a certain size contiguous, and you limit the number of seeks by quite a lot. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-22 20:25 ` David Masover @ 2006-08-22 21:20 ` Jeff Mahoney 2006-08-23 0:11 ` Andrew Morton 0 siblings, 1 reply; 10+ messages in thread From: Jeff Mahoney @ 2006-08-22 21:20 UTC (permalink / raw) To: David Masover Cc: Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 David Masover wrote: > Jeff Mahoney wrote: >> The problem is that finding the window isn't really a direct function of >> free space, it's a function of fragmentation. You could have a 50% full >> file system that still can't find a 32 block window by having every >> other block used. I know it's an extremely unlikely case, but it >> demonstrates the point perfectly. > > Maybe, but it's still not a counterpoint. No matter how fragmented a > filesystem is, freeing space can open up contiguous space, whereas if > space is not freed, you won't open up contiguous space. > > Thus, if your FS is 50% full and 100% fragmented, then you wait till > space is freed, because if nothing happens, or if more space is filled > in, you'll have the same problem at 60% than you did at 50%. If, > however, you're at 60% full, and 10% of the space is freed, then it's > fairly unlikely that you still don't have contiguous space, and it's > worth it to scan once more at 50%, and again if it then drops to 40%. > > So, if your FS is 90% full and space is being freed, I'd think it would > be worth it to scan again at 80%, 70%, and so on. I'd also imagine it > would do little or nothing to constantly monitor an FS that stays mostly > full -- maybe give it a certain amount of time, but if we're repacking > anyway, just wait for a repacker run. It seems very unlikely that > between repacker runs, activity between 86% and 94% would open up > contiguous space. > > It's still not a direct function of freed space (as opposed to free > space), but it starts to look better. > > I'm not endorsing one way or the other without benchmarks, though. I'd like to see benchmarks too. The goal is obviously to minimize seeks, but my feeling is that blocks that aren't entirely contiguous but are located in close enough proximity to each other so that they are all in the drive's cache anyway will perform better than 128k chunks spread all over the disk. Your solution is one possible approach, but I'd rather kill off bigalloc for reasons described below. Also, for clarification, the 128k I keep quoting is just what reiserfs_file_write() breaks larger writes into. It seems MythTV writes in large chunks (go figure, it's a streaming media application ;), so they get split up. For smaller writes, they'll go to the allocator with a request of that many blocks. reiserfs_{writepage,prepare_write,commit_write} all operate on one page (and so one block, usually) at a time. >>>> In the end, finding a contiguous window for all the blocks in a >>>> write is >>>> an advantageous special case, but one that can be found naturally when >>>> such a window exists anyway. >>> Hmm. Ok, I don't understand how this works, so I'll shut up. >> >> If the space after the end of the file has 32 or more blocks free, even >> without the bigalloc behavior, those blocks will be used. > > For what behavior -- appending? For any allocation after the first one. The allocator chooses a starting position based on the last block it knows about before the position of the write. This applies for both appends and sparse files. >> Also, I think the bigalloc behavior just ultimately ends up introducing >> even more fragmentation on an already fragmented file system. It'll keep >> contiguous chunks together, but those chunks can end up being spread all >> over the disk. > > This sounds like the NTFS strategy, which was basically to allow all > hell to break loose -- above a certain chunk size. Keep chunks of a > certain size contiguous, and you limit the number of seeks by quite a lot. The bigalloc behavior ends up reducing local fragmentation at the expense of global fragmentation. The free space of the test file system that prompted this patch was *loaded* with 31 block chunks. All of these were skipped until we backed off and searched for single block chunks - or worse, ignored the close chunks in favor of a contiguous chunk elsewhere. I don't think this is ideal behavior at all. Certainly it's better to have a contiguous chunk of 63 blocks and one block elsewhere. That lone block might only be a few blocks away and in the disk's cache already, but bigalloc doesn't take that into account either. The start of the allocation could be at the end of a bitmap group, leaving empty space where we naturally should have just grown the file. Without bigalloc, we still end up getting as many blocks together as we can in a particular bitmap before moving on to another one. It will group as many free blocks together as it can, and then try to find the next window. Bigalloc just meant that two windows of 16 blocks, a block apart, wasn't good enough. Once it's time to move on to another bitmap, the skip_busy behavior (enabled by default), will search for bitmap groups that are at least 10% free until the file system is 95% full[1]. We're already seeking anyway so this gives us the best chance of finding a group with room to grow. It also leaves room in bitmaps for existing files to grow, avoiding fragmentation there as well. It could stand to be a bit smarter though, perhaps taking into account its proximity to a neighboring bitmap group in making that determination. - -Jeff [1]: Although, the comment says 80%. One or the other is a bug. Mea culpa. - -- Jeff Mahoney SUSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFE63UXLPWxlyuTD7IRAg8yAJ4/sFePRtuV8b2TDA/49pMNSeyp8QCeMymb n3AnyFC2jyPe28Q16B7WhAQ= =gNSt -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-22 21:20 ` Jeff Mahoney @ 2006-08-23 0:11 ` Andrew Morton 2006-08-23 0:38 ` Jeff Mahoney 2006-08-23 2:46 ` Clay Barnes 0 siblings, 2 replies; 10+ messages in thread From: Andrew Morton @ 2006-08-23 0:11 UTC (permalink / raw) To: Jeff Mahoney Cc: David Masover, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit I can see that the bigalloc code as-is is pretty sad, but this is a scary patch. It has the potential to cause people significant performance problems way, way ahead in the future. For example, suppose userspace is growing two files concurrently. It could be that the bigalloc code would cause one file's allocation cursor to repeatedly jump far away from the second. ie: a beneficial side-effect. Without bigalloc that side-effect is lost and the two files blocks end up all intermingled. I don't know if that scenario is realistic, but I bet there are similar accidental oddities which can occur as a result of this change. But what are they? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-23 0:11 ` Andrew Morton @ 2006-08-23 0:38 ` Jeff Mahoney 2006-08-23 2:46 ` Clay Barnes 1 sibling, 0 replies; 10+ messages in thread From: Jeff Mahoney @ 2006-08-23 0:38 UTC (permalink / raw) To: Andrew Morton Cc: David Masover, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andrew Morton wrote: > I can see that the bigalloc code as-is is pretty sad, but this is a scary > patch. It has the potential to cause people significant performance > problems way, way ahead in the future. > > For example, suppose userspace is growing two files concurrently. It could > be that the bigalloc code would cause one file's allocation cursor to > repeatedly jump far away from the second. ie: a beneficial side-effect. > Without bigalloc that side-effect is lost and the two files blocks end up > all intermingled. > > I don't know if that scenario is realistic, but I bet there are similar > accidental oddities which can occur as a result of this change. > > But what are they? Bigalloc doesn't cause that effect one way or the other. You'll end up with blocks still intermingled, just in 32 block[1] chunks. It doesn't throw the cursor way out, just until the next 32 block free window. Another thread writing will do the same thing, and the blocks can end up getting intermingled in the same manner on a different part of the disk. The behavior you're describing can only be caused by bad hinting: Two files that are placed too close to each other. This patch changes the part of the allocator that is *only* responsible for finding the free bits. Where it should start looking for them is a decision made earlier in determine_search_start(). This patch just reverts the change that Chris and I submitted ages ago as part of a number of block allocator enhancements, not as a bug fix. I think I traced it to the 2.5 days, but I can't find that particular email. Neither of us anticipated the problem that MythTV users are hitting with it. Reverting it just makes that part of the allocator behave similarly to the ext[23] allocator where it just collects available blocks from a starting point. For every day use, I don't think performance should be terribly affected, and it definitely fixes the pathological case that the MythTV users were seeing. - -Jeff [1]: For simplicity, I'll continue to reference 32 blocks as the chunk size. In reality, it can be anything up to 32 blocks. - -- Jeff Mahoney SUSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFE66OXLPWxlyuTD7IRApXFAJ9bUsrtfmRC2kOMMWcCel4BZq6/SgCfTcsV rS6dvKc6MowiAY+r/0Jhp5A= =MwOb -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-23 0:11 ` Andrew Morton 2006-08-23 0:38 ` Jeff Mahoney @ 2006-08-23 2:46 ` Clay Barnes 2006-08-23 2:49 ` Andrew Morton 1 sibling, 1 reply; 10+ messages in thread From: Clay Barnes @ 2006-08-23 2:46 UTC (permalink / raw) To: Andrew Morton Cc: Jeff Mahoney, David Masover, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit On 17:11 Tue 22 Aug , Andrew Morton wrote: > > I can see that the bigalloc code as-is is pretty sad, but this is a scary > patch. It has the potential to cause people significant performance > problems way, way ahead in the future. > > For example, suppose userspace is growing two files concurrently. It could > be that the bigalloc code would cause one file's allocation cursor to > repeatedly jump far away from the second. ie: a beneficial side-effect. > Without bigalloc that side-effect is lost and the two files blocks end up > all intermingled. Perhaps I mis-recall, but shouldn't delayed writes (or something along those lines) cause a case where two files are being incrementally written rare? It seems that this case would only occur if two processes were writing two files in small chunks and calling fsync constantly (*cough* evolution column resizing bug *cough*), PLUS the two would have to have the same offset (or close) for the file writes. It seems that the risk of fragmentation is a lesser danger than the full system near lock-up caused by the old behavour. --Clay > > I don't know if that scenario is realistic, but I bet there are similar > accidental oddities which can occur as a result of this change. > > But what are they? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-23 2:46 ` Clay Barnes @ 2006-08-23 2:49 ` Andrew Morton 0 siblings, 0 replies; 10+ messages in thread From: Andrew Morton @ 2006-08-23 2:49 UTC (permalink / raw) To: Clay Barnes Cc: Jeff Mahoney, David Masover, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit On Tue, 22 Aug 2006 19:46:34 -0700 Clay Barnes <clay.barnes@gmail.com> wrote: > Perhaps I mis-recall, but shouldn't delayed writes (or something along > those lines) cause a case where two files are being incrementally > written rare? If we did delayed allocation, yes. But we generally don't. (Exceptions: XFS, reiser4, ext4, ext2 prototype circa 2001). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] reiserfs: eliminate minimum window size for bitmap searching 2006-08-22 15:55 ` Jeff Mahoney 2006-08-22 20:25 ` David Masover @ 2006-08-23 0:07 ` Hans Reiser 1 sibling, 0 replies; 10+ messages in thread From: Hans Reiser @ 2006-08-23 0:07 UTC (permalink / raw) To: Jeff Mahoney Cc: David Masover, Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, ReiserFS List, Mike Benoit Jeff Mahoney wrote: > > > Also, I think the bigalloc behavior just ultimately ends up introducing > even more fragmentation on an already fragmented file system. It'll keep > contiguous chunks together, but those chunks can end up being spread all > over the disk. > > -Jeff > Yes, and almost as important, it makes it difficult to understand and predict the allocator, which means other optimizations become harder to do. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-08-23 2:49 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-08-22 14:28 [PATCH] reiserfs: eliminate minimum window size for bitmap searching Jeff Mahoney 2006-08-22 15:33 ` David Masover 2006-08-22 15:55 ` Jeff Mahoney 2006-08-22 20:25 ` David Masover 2006-08-22 21:20 ` Jeff Mahoney 2006-08-23 0:11 ` Andrew Morton 2006-08-23 0:38 ` Jeff Mahoney 2006-08-23 2:46 ` Clay Barnes 2006-08-23 2:49 ` Andrew Morton 2006-08-23 0:07 ` Hans Reiser
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox