* Re: MD write performance issue - found Catalyst patches @ 2009-10-18 10:00 mark delfman 2009-10-18 22:39 ` NeilBrown 2009-10-29 6:41 ` Neil Brown 0 siblings, 2 replies; 18+ messages in thread
From: mark delfman @ 2009-10-18 10:00 UTC (permalink / raw)
To: Mattias Hellström, Linux RAID Mailing List, NeilBrown

[-- Attachment #1: Type: text/plain, Size: 2020 bytes --]

We have tracked the performance drop to the attached two commits in 2.6.28.6. The performance never fully recovers in later kernels, so I presume that the change in the write cache is still affecting MD today.

The problem for us is that although we have slowly tracked it down, we have no understanding of Linux at this level and simply wouldn't know where to go from this point.

Considering this seems to affect only MD and not hardware-based RAID (in our tests), I thought that this would be an appropriate place to post these patches and findings.

There are 2 patches which impact MD performance via a filesystem:

a) commit 66c85494570396661479ba51e17964b2c82b6f39 - write-back: fix nr_to_write counter
b) commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 - Fix page writeback thinko, causing Berkeley DB slowdown

1) no patches applied to the 2.6.28.5 kernel: write speed is 1.1 GB/s via XFS
2) both patches applied to the 2.6.28.5 kernel: XFS drops to circa 680 MB/s (as in kernel 2.6.28.6 and later)
3) only the first patch applied: 66c85494570396661479ba51e17964b2c82b6f39 (write-back: fix nr_to_write counter) - performance goes down to circa 780 MB/s
4) only the second patch applied: fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 (Fix page writeback thinko) - the performance is good: 1.1 GB/s (on XFS)

change log for 28.6: ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.28.6

Hopefully this helps to resolve this....

Mark

2009/10/17 Mattias Hellström <hellstrom.mattias@gmail.com>:
> (If I were you) I would further test the revisions between the
> following and then look at the changelog for the culprit. Looks like
> versions after this are just trying to regain the missing speed.
>
> Linux linux-tlfp 2.6.27.14-vanilla #1 SMP Fri Oct 16 00:56:25 BST 2009
> x86_64 x86_64 x86_64 GNU/Linux
>
> RAW: 1.1
> XFS 1.1
>
> Linux linux-tlfp 2.6.27.20-vanilla #1 SMP Thu Oct 15 23:59:32 BST 2009
> x86_64 x86_64 x86_64 GNU/Linux
>
> RAw 1.1 GB/s
> XFS: 487 MB/s
>

[-- Attachment #2: fa76ac6cbeb58256cf7de97a75d5d7f838a80b32.patch --]
[-- Type: application/octet-stream, Size: 2047 bytes --]

commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32
Author: Nick Piggin <npiggin@suse.de>
Date: Thu Feb 12 04:34:23 2009 +0100

Fix page writeback thinko, causing Berkeley DB slowdown

commit 3a4c6800f31ea8395628af5e7e490270ee5d0585 upstream.

A bug was introduced into write_cache_pages cyclic writeout by commit 31a12666d8f0c22235297e1c1575f82061480029 ("mm: write_cache_pages cyclic fix"). The intention (and comments) is that we should cycle back and look for more dirty pages at the beginning of the file if there is no more work to be done.

But the !done condition was dropped from the test. This means that any time the page writeout loop breaks (eg. due to nr_to_write == 0), we will set index to 0, then goto again. This will set done_index to index, then find done is set, so will proceed to the end of the function. When updating mapping->writeback_index for cyclic writeout, we now use done_index == 0, so we're always cycling back to 0.
This seemed to be causing random mmap writes (slapadd and iozone) to start writing more pages from the LRU and writeout would slowdown, and caused bugzilla entry http://bugzilla.kernel.org/show_bug.cgi?id=12604 about Berkeley DB slowing down dramatically. With this patch, iozone random write performance is increased nearly 5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2). Signed-off-by: Nick Piggin <npiggin@suse.de> Reported-and-tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 08d2b96..11400ed 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -997,7 +997,7 @@ continue_unlock: pagevec_release(&pvec); cond_resched(); } - if (!cycled) { + if (!cycled && !done) { /* * range_cyclic: * We hit the last page and there is more work to be done: wrap [-- Attachment #3: 66c85494570396661479ba51e17964b2c82b6f39.patch --] [-- Type: application/octet-stream, Size: 2347 bytes --] commit 66c85494570396661479ba51e17964b2c82b6f39 Author: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Date: Mon Feb 2 18:33:49 2009 +0200 write-back: fix nr_to_write counter commit dcf6a79dda5cc2a2bec183e50d829030c0972aaa upstream. Commit 05fe478dd04e02fa230c305ab9b5616669821dd3 introduced some @wbc->nr_to_write breakage. It made the following changes: 1. Decrement wbc->nr_to_write instead of nr_to_write 2. Decrement wbc->nr_to_write _only_ if wbc->sync_mode == WB_SYNC_NONE 3. If synced nr_to_write pages, stop only if if wbc->sync_mode == WB_SYNC_NONE, otherwise keep going. However, according to the commit message, the intention was to only make change 3. Change 1 is a bug. Change 2 does not seem to be necessary, and it breaks UBIFS expectations, so if needed, it should be done separately later. And change 2 does not seem to be documented in the commit message. This patch does the following: 1. Undo changes 1 and 2 2. Add a comment explaining change 3 (it very useful to have comments in _code_, not only in the commit). Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 11400ed..0c4100e 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -981,13 +981,22 @@ continue_unlock: } } - if (wbc->sync_mode == WB_SYNC_NONE) { - wbc->nr_to_write--; - if (wbc->nr_to_write <= 0) { - done = 1; - break; - } + if (nr_to_write > 0) + nr_to_write--; + else if (wbc->sync_mode == WB_SYNC_NONE) { + /* + * We stop writing back only if we are not + * doing integrity sync. In case of integrity + * sync we have to keep going because someone + * may be concurrently dirtying pages, and we + * might have synced a lot of newly appeared + * dirty pages, but have not synced all of the + * old dirty pages. + */ + done = 1; + break; } + if (wbc->nonblocking && bdi_write_congested(bdi)) { wbc->encountered_congestion = 1; done = 1; ^ permalink raw reply related [flat|nested] 18+ messages in thread
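The thread does not say which tool produced the 1.1 GB/s and 680 MB/s figures above, so the following is only a sketch of the kind of streaming-write test that exercises the same writeback path: a large buffered sequential write through the filesystem on top of the md array, timed across a final fsync(). The mount point, file size and 1 MiB block size are assumptions to be adjusted for the array under test.

/* streamwrite.c - minimal buffered streaming-write timer (sketch only).
 * Assumed usage: ./streamwrite /mnt/md0/testfile 8192   (size in MiB)
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK (1024 * 1024)             /* 1 MiB per write() call */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <size-MiB>\n", argv[0]);
        return 1;
    }
    long mib = atol(argv[2]);
    char *buf = malloc(BLOCK);
    if (!buf)
        return 1;
    memset(buf, 0xab, BLOCK);

    int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (long i = 0; i < mib; i++)
        if (write(fd, buf, BLOCK) != BLOCK) {
            perror("write");
            return 1;
        }

    fsync(fd);                          /* include queued writeback in the timing */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld MiB in %.2f s -> %.1f MiB/s\n", mib, secs, mib / secs);
    return 0;
}

A file several times larger than RAM keeps the page cache from hiding the writeback behaviour, which is presumably what makes the before/after difference visible at these speeds.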
* Re: MD write performance issue - found Catalyst patches 2009-10-18 10:00 MD write performance issue - found Catalyst patches mark delfman @ 2009-10-18 22:39 ` NeilBrown 2009-10-29 6:41 ` Neil Brown 1 sibling, 0 replies; 18+ messages in thread From: NeilBrown @ 2009-10-18 22:39 UTC (permalink / raw) To: mark delfman; +Cc: Mattias Hellström, Linux RAID Mailing List On Sun, October 18, 2009 9:00 pm, mark delfman wrote: > We have tracked the performance drop to the attached two commits in > 2.6.28.6. The performance never fully recovers in later kernels so > I presuming that the change in the write cache is still affecting MD > today. > > The problem for us is that although we have slowly tracked it down, we > have no understanding of linux at this level and simply wouldnt know > where go from this point. > > Considering this seems to only effect MD and not hardware based RAID > (in our tests) I thought that this would be an appropriate place to > post these patches and findings. > > There are 2 patches which impact MD performance via a filesystem: > > a) commit 66c85494570396661479ba51e17964b2c82b6f39 - write-back: fix > nr_to_write counter > b) commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 - Fix page > writeback thinko, causing Berkeley DB slowdown > > > 1) no patches applied into 2.6.28.5 kernel: write speed is 1.1 GB/s via > xfs > 2) both patches are applied into 2.6.28.5 kernel: xfs drops to circa: > 680 MB/s (like in kernel 2.6.28.6 and later) > 3) put only one patch: 66c85494570396661479ba51e17964b2c82b6f39 > (write-back: fix nr_to_write counter) - performance goes down to circa > 780 MB/s > 4) put only one patch: fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 (Fix > page writeback thinko) - the performance is good: 1.1 GB/s (on XFS) > > change log for 28.6 > ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.28.6 > > > Hopefully this helps to resolve this.... Hopefully it will... Thanks for tracking this down. It is certainly easier to work out what is happening when you have a small patch that makes the difference. I'll see what I (or others) can discover. Thanks, NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-10-18 10:00 MD write performance issue - found Catalyst patches mark delfman 2009-10-18 22:39 ` NeilBrown @ 2009-10-29 6:41 ` Neil Brown 2009-10-29 6:48 ` Thomas Fjellstrom 2009-10-29 8:08 ` Asdo 1 sibling, 2 replies; 18+ messages in thread
From: Neil Brown @ 2009-10-29 6:41 UTC (permalink / raw)
To: mark delfman; +Cc: Mattias Hellström, Linux RAID Mailing List, npiggin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown, Size: 2293 bytes --]

On Sunday October 18, markdelfman@googlemail.com wrote:
> We have tracked the performance drop to the attached two commits in
> 2.6.28.6. The performance never fully recovers in later kernels so
> I presuming that the change in the write cache is still affecting MD
> today.
>
> The problem for us is that although we have slowly tracked it down, we
> have no understanding of linux at this level and simply wouldnt know
> where go from this point.
>
> Considering this seems to only effect MD and not hardware based RAID
> (in our tests) I thought that this would be an appropriate place to
> post these patches and findings.
>
> There are 2 patches which impact MD performance via a filesystem:
>
> a) commit 66c85494570396661479ba51e17964b2c82b6f39 - write-back: fix
> nr_to_write counter
> b) commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 - Fix page
> writeback thinko, causing Berkeley DB slowdown
>
>
> 1) no patches applied into 2.6.28.5 kernel: write speed is 1.1 GB/s via
> xfs
> 2) both patches are applied into 2.6.28.5 kernel: xfs drops to circa:
> 680 MB/s (like in kernel 2.6.28.6 and later)
> 3) put only one patch: 66c85494570396661479ba51e17964b2c82b6f39
> (write-back: fix nr_to_write counter) - performance goes down to circa
> 780 MB/s
> 4) put only one patch: fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 (Fix
> page writeback thinko) - the performance is good: 1.1 GB/s (on XFS)
>
> change log for 28.6
> ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.28.6
>
>
> Hopefully this helps to resolve this....

I've had a look at this and asked around and I'm afraid there doesn't seem to be an easy answer.

The most likely difference between 'before' and 'after' those patches is that more pages are being written per call to generic_writepages in the 'before' case. This would generally improve throughput, particularly with RAID5 which would get more full stripes.

However that is largely a guess, as the bugs which were fixed by the patch could interact in interesting ways with XFS (which decrements ->nr_to_write itself), and it isn't immediately clear to me that more pages would be written...

In any case, the 'after' code is clearly correct, so if throughput can really be increased, the change should be somewhere else.

What might be useful would be to instrument write_cache_pages to count how many pages were written each time it is called. You could either print this number out every time or, if that creates too much noise, print out an average every 512 calls or similar.

Seeing how this differs with and without the patches in question could help understand what is going on and provide hints for how to fix it.

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 18+ messages in thread
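Neil's point about full stripes can be made concrete: a RAID5/6 write avoids a read-modify-write cycle only when the writeback batch covers a whole stripe, i.e. the number of data disks times the chunk size, so fewer pages per writeback pass means fewer full stripes arriving at md. The sketch below is only an illustration of that arithmetic; it reads the geometry from the standard md sysfs attributes, and the md0 default is an assumption.

/* fullstripe.c - print the full-stripe write size of an md array (sketch).
 * Assumed usage: ./fullstripe md0
 */
#include <stdio.h>
#include <string.h>

static long read_long(const char *dev, const char *attr)
{
    char path[128];
    long v = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/md/%s", dev, attr);
    f = fopen(path, "r");
    if (f) {
        fscanf(f, "%ld", &v);
        fclose(f);
    }
    return v;
}

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "md0";
    char path[128], level[32] = "";
    FILE *f;

    snprintf(path, sizeof(path), "/sys/block/%s/md/level", dev);
    f = fopen(path, "r");
    if (f) {
        fscanf(f, "%31s", level);
        fclose(f);
    }

    long chunk = read_long(dev, "chunk_size");   /* bytes */
    long disks = read_long(dev, "raid_disks");
    if (chunk <= 0 || disks <= 0) {
        fprintf(stderr, "could not read geometry for %s\n", dev);
        return 1;
    }

    /* parity members: one for raid5, two for raid6 */
    int parity = !strcmp(level, "raid6") ? 2 : !strcmp(level, "raid5") ? 1 : 0;
    long stripe = (disks - parity) * chunk;

    printf("%s: level=%s chunk=%ldK data_disks=%ld full_stripe=%ldK\n",
           dev, level, chunk / 1024, disks - parity, stripe / 1024);
    return 0;
}

Aligned writes in multiples of that full-stripe size are the ones RAID5/6 can absorb without first reading back old data and parity, which is why a writeback path that hands md larger contiguous batches tends to look so much faster.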
* Re: MD write performance issue - found Catalyst patches 2009-10-29 6:41 ` Neil Brown @ 2009-10-29 6:48 ` Thomas Fjellstrom 2009-10-29 7:32 ` Thomas Fjellstrom 2009-10-29 8:08 ` Asdo 1 sibling, 1 reply; 18+ messages in thread From: Thomas Fjellstrom @ 2009-10-29 6:48 UTC (permalink / raw) To: Neil Brown Cc: mark delfman, Mattias Hellström, Linux RAID Mailing List, npiggin On Thu October 29 2009, Neil Brown wrote: > On Sunday October 18, markdelfman@googlemail.com wrote: > > We have tracked the performance drop to the attached two commits in > > 2.6.28.6. The performance never fully recovers in later kernels so > > I presuming that the change in the write cache is still affecting MD > > today. > > > > The problem for us is that although we have slowly tracked it down, we > > have no understanding of linux at this level and simply wouldnt know > > where go from this point. > > > > Considering this seems to only effect MD and not hardware based RAID > > (in our tests) I thought that this would be an appropriate place to > > post these patches and findings. > > > > There are 2 patches which impact MD performance via a filesystem: > > > > a) commit 66c85494570396661479ba51e17964b2c82b6f39 - write-back: fix > > nr_to_write counter > > b) commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 - Fix page > > writeback thinko, causing Berkeley DB slowdown > > I've had a look at this and asked around and I'm afraid there doesn't > seem to be an easy answer. > > The most likely difference between 'before' and 'after' those patches > is that more pages are being written per call to generic_writepages in > the 'before' case. This would generally improve throughput, > particularly with RAID5 which would get more full stripes. > > However that is largely a guess as the bugs which were fixed by the > patch could interact in interesting ways with XFS (which decrements > ->nr_to_write itself) and it isn't immediately clear to me that more > pages would be written... > > In any case, the 'after' code is clearly correct, so if throughput can > really be increased, the change should be somewhere else. > > What might be useful would be to instrument write_cache_pages to count > how many pages were written each time it calls. You could either > print this number out every time or, if that creates too much noise, > print out an average ever 512 calls or similar. > > Seeing how this differs with and without the patches in question could > help understand what is going one and provide hints for how to fix it. > I don't suppose this causes "bursty" writeout like I've been seeing lately? For some reason writes go full speed for a short while and then just stop for a short time, which averages out to 2-4x slower than what the array should be capable of. > NeilBrown > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Thomas Fjellstrom tfjellstrom@shaw.ca -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-10-29 6:48 ` Thomas Fjellstrom @ 2009-10-29 7:32 ` Thomas Fjellstrom 0 siblings, 0 replies; 18+ messages in thread From: Thomas Fjellstrom @ 2009-10-29 7:32 UTC (permalink / raw) To: Neil Brown Cc: mark delfman, Mattias Hellström, Linux RAID Mailing List, npiggin On Thu October 29 2009, Thomas Fjellstrom wrote: > On Thu October 29 2009, Neil Brown wrote: > > On Sunday October 18, markdelfman@googlemail.com wrote: > > > We have tracked the performance drop to the attached two commits in > > > 2.6.28.6. The performance never fully recovers in later kernels so > > > I presuming that the change in the write cache is still affecting MD > > > today. > > > > > > The problem for us is that although we have slowly tracked it down, > > > we have no understanding of linux at this level and simply wouldnt > > > know where go from this point. > > > > > > Considering this seems to only effect MD and not hardware based RAID > > > (in our tests) I thought that this would be an appropriate place to > > > post these patches and findings. > > > > > > There are 2 patches which impact MD performance via a filesystem: > > > > > > a) commit 66c85494570396661479ba51e17964b2c82b6f39 - write-back: fix > > > nr_to_write counter > > > b) commit fa76ac6cbeb58256cf7de97a75d5d7f838a80b32 - Fix page > > > writeback thinko, causing Berkeley DB slowdown > > > > I've had a look at this and asked around and I'm afraid there doesn't > > seem to be an easy answer. > > > > The most likely difference between 'before' and 'after' those patches > > is that more pages are being written per call to generic_writepages in > > the 'before' case. This would generally improve throughput, > > particularly with RAID5 which would get more full stripes. > > > > However that is largely a guess as the bugs which were fixed by the > > patch could interact in interesting ways with XFS (which decrements > > ->nr_to_write itself) and it isn't immediately clear to me that more > > pages would be written... > > > > In any case, the 'after' code is clearly correct, so if throughput can > > really be increased, the change should be somewhere else. > > > > What might be useful would be to instrument write_cache_pages to count > > how many pages were written each time it calls. You could either > > print this number out every time or, if that creates too much noise, > > print out an average ever 512 calls or similar. > > > > Seeing how this differs with and without the patches in question could > > help understand what is going one and provide hints for how to fix it. > > I don't suppose this causes "bursty" writeout like I've been seeing > lately? For some reason writes go full speed for a short while and then > just stop for a short time, which averages out to 2-4x slower than what > the array should be capable of. At the very least, 2.6.26 doesn't have this issue. Speeds are lower than I was expecting (350MB/s write, 450MB/s read), but no where near as bad as later kernels. and there is no "bursty" behaviour. speeds are fairly constant throughout testing. 
> > NeilBrown > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-raid" > > in the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Thomas Fjellstrom tfjellstrom@shaw.ca -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-10-29 6:41 ` Neil Brown 2009-10-29 6:48 ` Thomas Fjellstrom @ 2009-10-29 8:08 ` Asdo 2009-10-31 10:51 ` mark delfman 1 sibling, 1 reply; 18+ messages in thread
From: Asdo @ 2009-10-29 8:08 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:
> I've had a look at this and asked around and I'm afraid there doesn't
> seem to be an easy answer.
>
> The most likely difference between 'before' and 'after' those patches
> is that more pages are being written per call to generic_writepages in
> the 'before' case. This would generally improve throughput,
> particularly with RAID5 which would get more full stripes.
>
> However that is largely a guess as the bugs which were fixed by the
> patch could interact in interesting ways with XFS (which decrements
> ->nr_to_write itself) and it isn't immediately clear to me that more
> pages would be written...
>
> In any case, the 'after' code is clearly correct, so if throughput can
> really be increased, the change should be somewhere else.
>

Thank you Neil for looking into this.

How can "writing fewer pages" be more correct than "writing more pages"? I can see the first as an optimization of the second; however, if this reduces throughput then the optimization doesn't work... Isn't it possible to "fix" it so that it writes more pages and is still semantically correct?

Thomas Fjellstrom wrote:
> I don't suppose this causes "bursty" writeout like I've been seeing lately?
> For some reason writes go full speed for a short while and then just stop
> for a short time, which averages out to 2-4x slower than what the array
> should be capable of.
>

I have definitely seen this bursty behaviour on 2.6.31.

It would be interesting to know what the CPUs are doing or waiting for during the pauses. But I am not a kernel expert :-( so how could one check this?

Thank you
^ permalink raw reply [flat|nested] 18+ messages in thread
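One low-tech way to answer "how could one check this" is to sample the kernel's own writeback counters once a second and watch dirty pages build up and then drain in bursts. The sketch below only reads the standard Dirty and Writeback fields from /proc/meminfo; the one-second interval is an arbitrary choice.

/* wbwatch.c - print the Dirty/Writeback counters once a second (sketch).
 * Dirty grows while the application writes into the page cache;
 * Writeback spikes while the array is actually flushing.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) {
            perror("/proc/meminfo");
            return 1;
        }

        long dirty = -1, writeback = -1;
        while (fgets(line, sizeof(line), f)) {
            sscanf(line, "Dirty: %ld kB", &dirty);
            sscanf(line, "Writeback: %ld kB", &writeback);
        }
        fclose(f);

        printf("Dirty: %8ld kB   Writeback: %8ld kB\n", dirty, writeback);
        fflush(stdout);
        sleep(1);
    }
    return 0;
}

Running iostat -x 1 against the member disks alongside it gives a rough idea of whether the pauses come from the filesystem holding pages back or from the devices being saturated; blktrace on the md device would show the same thing in more detail.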
* Re: MD write performance issue - found Catalyst patches 2009-10-29 8:08 ` Asdo @ 2009-10-31 10:51 ` mark delfman 2009-11-03 4:58 ` Neil Brown 0 siblings, 1 reply; 18+ messages in thread
From: mark delfman @ 2009-10-31 10:51 UTC (permalink / raw)
To: Asdo; +Cc: Neil Brown, linux-raid

Thank you Neil... if the commits improve the overall stability of Linux then they are obviously important, and hopefully there is another way to achieve the same results... As you say, if we can see that there is an opportunity for a significant performance gain (I think 600 MB/s extra is significant), it's maybe worth some thought.

I would very much like to contribute to this and indeed to ongoing developments, but the only hurdle I have is that I am storage focused, which means I work solely in storage environments (which is good) but also means I know very little about Linux and dramatically less about programming (I have never even touched C, for example). I come from hardware-based storage into the world of Linux, so I lag behind greatly in many ways.

I do have current access to equipment on which we can reach the performance needed to see the effect of the commits (up to a 600 MB/s difference), but I lack the ability to implement the ideas which you have suggested.

I am hopeful that you or another member of this group could offer some advice / a patch to implement the print options you suggested... if so I would happily allocate resources and time to do what I can to help with this.

I appreciate that this group is generally aimed at those with Linux experience, but hopefully I can still add some value, whether simply with test equipment, comparisons or real-life feedback, etc.

The print options you suggested... are these simple to introduce? Could someone maybe offer an ABC of how to add this?

On Thu, Oct 29, 2009 at 9:08 AM, Asdo <asdo@shiftmail.org> wrote:
> Neil Brown wrote:
>>
>> I've had a look at this and asked around and I'm afraid there doesn't
>> seem to be an easy answer.
>>
>> The most likely difference between 'before' and 'after' those patches
>> is that more pages are being written per call to generic_writepages in
>> the 'before' case. This would generally improve throughput,
>> particularly with RAID5 which would get more full stripes.
>>
>> However that is largely a guess as the bugs which were fixed by the
>> patch could interact in interesting ways with XFS (which decrements
>> ->nr_to_write itself) and it isn't immediately clear to me that more
>> pages would be written...
>> In any case, the 'after' code is clearly correct, so if throughput can
>> really be increased, the change should be somewhere else.
>>
>
> Thank you Neil for looking into this
>
> How can "writing less pages" be more correct than "writing more pages"?
> I can see the first as an optimization to the second, however if this
> reduces throughput then the optimization doesn't work...
> Isn't it possible to "fix" it so to write more pages and still be
> semantically correct?
>
>
> Thomas Fjellstrom wrote:
>>
>> I don't suppose this causes "bursty" writeout like I've been seeing
>> lately? For some reason writes go full speed for a short while and then just
>> stop for a short time, which averages out to 2-4x slower than what the array
>> should be capable of.
>>
>
> I have definitely seen this bursty behaviour on 2.6.31.
>
> It would be interesting to know what are the CPUs doing or waiting for in
> the pause times. But I am not a kernel expert :-( how could one check this?
> > Thank you > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-10-31 10:51 ` mark delfman @ 2009-11-03 4:58 ` Neil Brown 2009-11-03 12:11 ` mark delfman 0 siblings, 1 reply; 18+ messages in thread From: Neil Brown @ 2009-11-03 4:58 UTC (permalink / raw) To: mark delfman; +Cc: Asdo, linux-raid On Saturday October 31, markdelfman@googlemail.com wrote: > > I am hopeful that you or another member of this group could offer some > advice / patch to implement the print options you suggested... if so i > would happily allocated resource and time to do what i can to help > with this. I've spent a little while exploring this. It appears to very definitely be an XFS problem, interacting in interesting ways with the VM. I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and 2.6.28.6 using each of xfs and ext2. ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6 xfs gives 86MB/sec on .5 and only 51MB/sec on .6 When write_cache_pages is called it calls 'writepage' some number of times. On ext2, writepage will write at most one page. On xfs writepage will sometimes write multiple pages. I created a patch as below that prints (in a fairly cryptic way) the number of 'writepage' calls and the number of pages that XFS actually wrote. For ext2, the number of writepage calls is at most 1536 and averages around 140 For xfs with .5, there is usually only one call to writepage and it writes around 800 pages. For .6 there are about 200 calls to writepages but the achieve an average of about 700 pages together. So as you can see, there is very different behaviour. I notice a more recent patch in XFS in mainline which looks like a dirty hack to try to address this problem. I suggest you try that patch and/or take this to the XFS developers. NeilBrown diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 08d2b96..aa4bccc 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping, int cycled; int range_whole = 0; long nr_to_write = wbc->nr_to_write; + long hidden_writes = 0; + long clear_writes = 0; if (wbc->nonblocking && bdi_write_congested(bdi)) { wbc->encountered_congestion = 1; @@ -961,7 +963,11 @@ continue_unlock: if (!clear_page_dirty_for_io(page)) goto continue_unlock; + { int orig_nr_to_write = wbc->nr_to_write; ret = (*writepage)(page, wbc, data); + hidden_writes += orig_nr_to_write - wbc->nr_to_write; + clear_writes ++; + } if (unlikely(ret)) { if (ret == AOP_WRITEPAGE_ACTIVATE) { unlock_page(page); @@ -1008,12 +1014,37 @@ continue_unlock: end = writeback_index - 1; goto retry; } + if (!wbc->no_nrwrite_index_update) { if (wbc->range_cyclic || (range_whole && nr_to_write > 0)) mapping->writeback_index = done_index; wbc->nr_to_write = nr_to_write; } + { static int sum, cnt, max; + static unsigned long previous; + static int sum2, max2; + + sum += clear_writes; + cnt += 1; + + if (max < clear_writes) max = clear_writes; + + sum2 += hidden_writes; + if (max2 < hidden_writes) max2 = hidden_writes; + + if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) { + printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n", + sum, cnt, max, sum/cnt, + sum2, max2, sum2/cnt); + sum = 0; + cnt = 0; + max = 0; + max2 = 0; + sum2 = 0; + previous = jiffies; + } + } return ret; } EXPORT_SYMBOL(write_cache_pages); ------------------------------------------------------ From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001 From: Eric Sandeen <sandeen@sandeen.net> Date: Fri, 31 Jul 
2009 00:02:17 -0500 Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage VM calculation for nr_to_write seems off. Bump it way up, this gets simple streaming writes zippy again. To be reviewed again after Jens' writeback changes. Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Cc: Chris Mason <chris.mason@oracle.com> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com> --- fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c index 7ec89fc..aecf251 100644 --- a/fs/xfs/linux-2.6/xfs_aops.c +++ b/fs/xfs/linux-2.6/xfs_aops.c @@ -1268,6 +1268,14 @@ xfs_vm_writepage( if (!page_has_buffers(page)) create_empty_buffers(page, 1 << inode->i_blkbits, 0); + + /* + * VM calculation for nr_to_write seems off. Bump it way + * up, this gets simple streaming writes zippy again. + * To be reviewed again after Jens' writeback changes. + */ + wbc->nr_to_write *= 4; + /* * Convert delayed allocate, unwritten or unmapped space * to real space and flush out to disk. -- 1.6.4.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
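For anyone repeating the measurement, the instrumentation patch above reports through printk, so its output lands in the kernel log. The helper below is not part of Neil's patch; it is a sketch that parses the exact format string used above (fed from something like dmesg) and totals the two counters: sum/cnt count writepage invocations per write_cache_pages call, and sum2 counts the pages the filesystem accounted against wbc->nr_to_write (XFS's clustered writes).

/* wpc_summary.c - summarise the "write_page_cache:" lines printed by the
 * instrumentation patch above.  Assumed usage: dmesg | ./wpc_summary
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[512];
    long calls = 0, samples = 0, pages = 0;

    while (fgets(line, sizeof(line), stdin)) {
        char *p = strstr(line, "write_page_cache:");
        int sum, cnt, max, mean, sum2, max2, mean2;

        if (!p)
            continue;
        if (sscanf(p, "write_page_cache: sum=%d cnt=%d max=%d mean=%d "
                      "sum2=%d max2=%d mean2=%d",
                   &sum, &cnt, &max, &mean, &sum2, &max2, &mean2) != 7)
            continue;

        calls   += sum;     /* writepage invocations */
        samples += cnt;     /* write_cache_pages invocations */
        pages   += sum2;    /* pages accounted via wbc->nr_to_write */
    }

    if (!samples) {
        fprintf(stderr, "no write_page_cache lines found\n");
        return 1;
    }
    printf("write_cache_pages calls: %ld\n", samples);
    printf("mean writepage calls per write_cache_pages: %.1f\n",
           (double)calls / samples);
    printf("mean nr_to_write pages per write_cache_pages: %.1f\n",
           (double)pages / samples);
    return 0;
}

Comparing those two means on 2.6.28.5 and 2.6.28.6 is essentially Neil's ext2-versus-XFS observation expressed as totals over a whole run rather than individual printk samples.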
* Re: MD write performance issue - found Catalyst patches 2009-11-03 4:58 ` Neil Brown @ 2009-11-03 12:11 ` mark delfman 2009-11-04 17:15 ` mark delfman 0 siblings, 1 reply; 18+ messages in thread From: mark delfman @ 2009-11-03 12:11 UTC (permalink / raw) To: Neil Brown; +Cc: Asdo, linux-raid Thanks Neil, I seem to recall that I tried this on EXT3 and saw the same results as XFS, but with your code and suggestions I think it is well worth me trying some more tests and reporting back.... Mark On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@suse.de> wrote: > On Saturday October 31, markdelfman@googlemail.com wrote: >> >> I am hopeful that you or another member of this group could offer some >> advice / patch to implement the print options you suggested... if so i >> would happily allocated resource and time to do what i can to help >> with this. > > > I've spent a little while exploring this. > It appears to very definitely be an XFS problem, interacting in > interesting ways with the VM. > > I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and > 2.6.28.6 using each of xfs and ext2. > > ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6 > xfs gives 86MB/sec on .5 and only 51MB/sec on .6 > > > When write_cache_pages is called it calls 'writepage' some number of > times. On ext2, writepage will write at most one page. > On xfs writepage will sometimes write multiple pages. > > I created a patch as below that prints (in a fairly cryptic way) > the number of 'writepage' calls and the number of pages that XFS > actually wrote. > > For ext2, the number of writepage calls is at most 1536 and averages > around 140 > > For xfs with .5, there is usually only one call to writepage and it > writes around 800 pages. > For .6 there are about 200 calls to writepages but the achieve > an average of about 700 pages together. > > So as you can see, there is very different behaviour. > > I notice a more recent patch in XFS in mainline which looks like a > dirty hack to try to address this problem. > > I suggest you try that patch and/or take this to the XFS developers. 
> > NeilBrown > > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 08d2b96..aa4bccc 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping, > int cycled; > int range_whole = 0; > long nr_to_write = wbc->nr_to_write; > + long hidden_writes = 0; > + long clear_writes = 0; > > if (wbc->nonblocking && bdi_write_congested(bdi)) { > wbc->encountered_congestion = 1; > @@ -961,7 +963,11 @@ continue_unlock: > if (!clear_page_dirty_for_io(page)) > goto continue_unlock; > > + { int orig_nr_to_write = wbc->nr_to_write; > ret = (*writepage)(page, wbc, data); > + hidden_writes += orig_nr_to_write - wbc->nr_to_write; > + clear_writes ++; > + } > if (unlikely(ret)) { > if (ret == AOP_WRITEPAGE_ACTIVATE) { > unlock_page(page); > @@ -1008,12 +1014,37 @@ continue_unlock: > end = writeback_index - 1; > goto retry; > } > + > if (!wbc->no_nrwrite_index_update) { > if (wbc->range_cyclic || (range_whole && nr_to_write > 0)) > mapping->writeback_index = done_index; > wbc->nr_to_write = nr_to_write; > } > > + { static int sum, cnt, max; > + static unsigned long previous; > + static int sum2, max2; > + > + sum += clear_writes; > + cnt += 1; > + > + if (max < clear_writes) max = clear_writes; > + > + sum2 += hidden_writes; > + if (max2 < hidden_writes) max2 = hidden_writes; > + > + if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) { > + printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n", > + sum, cnt, max, sum/cnt, > + sum2, max2, sum2/cnt); > + sum = 0; > + cnt = 0; > + max = 0; > + max2 = 0; > + sum2 = 0; > + previous = jiffies; > + } > + } > return ret; > } > EXPORT_SYMBOL(write_cache_pages); > > > ------------------------------------------------------ > From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001 > From: Eric Sandeen <sandeen@sandeen.net> > Date: Fri, 31 Jul 2009 00:02:17 -0500 > Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage > > VM calculation for nr_to_write seems off. Bump it way > up, this gets simple streaming writes zippy again. > To be reviewed again after Jens' writeback changes. > > Signed-off-by: Christoph Hellwig <hch@infradead.org> > Signed-off-by: Eric Sandeen <sandeen@sandeen.net> > Cc: Chris Mason <chris.mason@oracle.com> > Reviewed-by: Felix Blyakher <felixb@sgi.com> > Signed-off-by: Felix Blyakher <felixb@sgi.com> > --- > fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++ > 1 files changed, 8 insertions(+), 0 deletions(-) > > diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c > index 7ec89fc..aecf251 100644 > --- a/fs/xfs/linux-2.6/xfs_aops.c > +++ b/fs/xfs/linux-2.6/xfs_aops.c > @@ -1268,6 +1268,14 @@ xfs_vm_writepage( > if (!page_has_buffers(page)) > create_empty_buffers(page, 1 << inode->i_blkbits, 0); > > + > + /* > + * VM calculation for nr_to_write seems off. Bump it way > + * up, this gets simple streaming writes zippy again. > + * To be reviewed again after Jens' writeback changes. > + */ > + wbc->nr_to_write *= 4; > + > /* > * Convert delayed allocate, unwritten or unmapped space > * to real space and flush out to disk. > -- > 1.6.4.3 > > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-11-03 12:11 ` mark delfman @ 2009-11-04 17:15 ` mark delfman 2009-11-04 17:25 ` Asdo 2009-11-04 19:05 ` Steve Cousins 0 siblings, 2 replies; 18+ messages in thread From: mark delfman @ 2009-11-04 17:15 UTC (permalink / raw) To: Neil Brown; +Cc: Asdo, linux-raid [-- Attachment #1: Type: text/plain, Size: 6258 bytes --] Some FS comparisons attached in pdf not sure what to make of them as yet, but worth posting On Tue, Nov 3, 2009 at 12:11 PM, mark delfman <markdelfman@googlemail.com> wrote: > Thanks Neil, > > I seem to recall that I tried this on EXT3 and saw the same results as > XFS, but with your code and suggestions I think it is well worth me > trying some more tests and reporting back.... > > > Mark > > On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@suse.de> wrote: >> On Saturday October 31, markdelfman@googlemail.com wrote: >>> >>> I am hopeful that you or another member of this group could offer some >>> advice / patch to implement the print options you suggested... if so i >>> would happily allocated resource and time to do what i can to help >>> with this. >> >> >> I've spent a little while exploring this. >> It appears to very definitely be an XFS problem, interacting in >> interesting ways with the VM. >> >> I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and >> 2.6.28.6 using each of xfs and ext2. >> >> ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6 >> xfs gives 86MB/sec on .5 and only 51MB/sec on .6 >> >> >> When write_cache_pages is called it calls 'writepage' some number of >> times. On ext2, writepage will write at most one page. >> On xfs writepage will sometimes write multiple pages. >> >> I created a patch as below that prints (in a fairly cryptic way) >> the number of 'writepage' calls and the number of pages that XFS >> actually wrote. >> >> For ext2, the number of writepage calls is at most 1536 and averages >> around 140 >> >> For xfs with .5, there is usually only one call to writepage and it >> writes around 800 pages. >> For .6 there are about 200 calls to writepages but the achieve >> an average of about 700 pages together. >> >> So as you can see, there is very different behaviour. >> >> I notice a more recent patch in XFS in mainline which looks like a >> dirty hack to try to address this problem. >> >> I suggest you try that patch and/or take this to the XFS developers. 
>> >> NeilBrown >> >> >> >> diff --git a/mm/page-writeback.c b/mm/page-writeback.c >> index 08d2b96..aa4bccc 100644 >> --- a/mm/page-writeback.c >> +++ b/mm/page-writeback.c >> @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping, >> int cycled; >> int range_whole = 0; >> long nr_to_write = wbc->nr_to_write; >> + long hidden_writes = 0; >> + long clear_writes = 0; >> >> if (wbc->nonblocking && bdi_write_congested(bdi)) { >> wbc->encountered_congestion = 1; >> @@ -961,7 +963,11 @@ continue_unlock: >> if (!clear_page_dirty_for_io(page)) >> goto continue_unlock; >> >> + { int orig_nr_to_write = wbc->nr_to_write; >> ret = (*writepage)(page, wbc, data); >> + hidden_writes += orig_nr_to_write - wbc->nr_to_write; >> + clear_writes ++; >> + } >> if (unlikely(ret)) { >> if (ret == AOP_WRITEPAGE_ACTIVATE) { >> unlock_page(page); >> @@ -1008,12 +1014,37 @@ continue_unlock: >> end = writeback_index - 1; >> goto retry; >> } >> + >> if (!wbc->no_nrwrite_index_update) { >> if (wbc->range_cyclic || (range_whole && nr_to_write > 0)) >> mapping->writeback_index = done_index; >> wbc->nr_to_write = nr_to_write; >> } >> >> + { static int sum, cnt, max; >> + static unsigned long previous; >> + static int sum2, max2; >> + >> + sum += clear_writes; >> + cnt += 1; >> + >> + if (max < clear_writes) max = clear_writes; >> + >> + sum2 += hidden_writes; >> + if (max2 < hidden_writes) max2 = hidden_writes; >> + >> + if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) { >> + printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n", >> + sum, cnt, max, sum/cnt, >> + sum2, max2, sum2/cnt); >> + sum = 0; >> + cnt = 0; >> + max = 0; >> + max2 = 0; >> + sum2 = 0; >> + previous = jiffies; >> + } >> + } >> return ret; >> } >> EXPORT_SYMBOL(write_cache_pages); >> >> >> ------------------------------------------------------ >> From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001 >> From: Eric Sandeen <sandeen@sandeen.net> >> Date: Fri, 31 Jul 2009 00:02:17 -0500 >> Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage >> >> VM calculation for nr_to_write seems off. Bump it way >> up, this gets simple streaming writes zippy again. >> To be reviewed again after Jens' writeback changes. >> >> Signed-off-by: Christoph Hellwig <hch@infradead.org> >> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> >> Cc: Chris Mason <chris.mason@oracle.com> >> Reviewed-by: Felix Blyakher <felixb@sgi.com> >> Signed-off-by: Felix Blyakher <felixb@sgi.com> >> --- >> fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++ >> 1 files changed, 8 insertions(+), 0 deletions(-) >> >> diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c >> index 7ec89fc..aecf251 100644 >> --- a/fs/xfs/linux-2.6/xfs_aops.c >> +++ b/fs/xfs/linux-2.6/xfs_aops.c >> @@ -1268,6 +1268,14 @@ xfs_vm_writepage( >> if (!page_has_buffers(page)) >> create_empty_buffers(page, 1 << inode->i_blkbits, 0); >> >> + >> + /* >> + * VM calculation for nr_to_write seems off. Bump it way >> + * up, this gets simple streaming writes zippy again. >> + * To be reviewed again after Jens' writeback changes. >> + */ >> + wbc->nr_to_write *= 4; >> + >> /* >> * Convert delayed allocate, unwritten or unmapped space >> * to real space and flush out to disk. >> -- >> 1.6.4.3 >> >> > [-- Attachment #2: FS test.pdf --] [-- Type: application/pdf, Size: 53707 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-11-04 17:15 ` mark delfman @ 2009-11-04 17:25 ` Asdo [not found] ` <66781b10911050904m407d14d6t7d3bec12578d6500@mail.gmail.com> 2009-11-04 19:05 ` Steve Cousins 1 sibling, 1 reply; 18+ messages in thread From: Asdo @ 2009-11-04 17:25 UTC (permalink / raw) To: mark delfman; +Cc: Neil Brown, linux-raid Hey great job Neil and Mark Mark, your benchmarks seems to confirm Neil's analysis: ext2 and ext3 are not slowed down from 2.6.28.5 and 2.6.28.6 Mark why don't you try to apply the patch below here by Eric Sandeen found by Neil to the 2.6.28.6 to see if the xfs write performance comes back? Thank you for your efforts Asdo mark delfman wrote: > Some FS comparisons attached in pdf > > not sure what to make of them as yet, but worth posting > > > On Tue, Nov 3, 2009 at 12:11 PM, mark delfman > <markdelfman@googlemail.com> wrote: > >> Thanks Neil, >> >> I seem to recall that I tried this on EXT3 and saw the same results as >> XFS, but with your code and suggestions I think it is well worth me >> trying some more tests and reporting back.... >> >> >> Mark >> >> On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@suse.de> wrote: >> >>> On Saturday October 31, markdelfman@googlemail.com wrote: >>> >>>> I am hopeful that you or another member of this group could offer some >>>> advice / patch to implement the print options you suggested... if so i >>>> would happily allocated resource and time to do what i can to help >>>> with this. >>>> >>> I've spent a little while exploring this. >>> It appears to very definitely be an XFS problem, interacting in >>> interesting ways with the VM. >>> >>> I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and >>> 2.6.28.6 using each of xfs and ext2. >>> >>> ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6 >>> xfs gives 86MB/sec on .5 and only 51MB/sec on .6 >>> >>> >>> When write_cache_pages is called it calls 'writepage' some number of >>> times. On ext2, writepage will write at most one page. >>> On xfs writepage will sometimes write multiple pages. >>> >>> I created a patch as below that prints (in a fairly cryptic way) >>> the number of 'writepage' calls and the number of pages that XFS >>> actually wrote. >>> >>> For ext2, the number of writepage calls is at most 1536 and averages >>> around 140 >>> >>> For xfs with .5, there is usually only one call to writepage and it >>> writes around 800 pages. >>> For .6 there are about 200 calls to writepages but the achieve >>> an average of about 700 pages together. >>> >>> So as you can see, there is very different behaviour. >>> >>> I notice a more recent patch in XFS in mainline which looks like a >>> dirty hack to try to address this problem. >>> >>> I suggest you try that patch and/or take this to the XFS developers. 
>>> >>> NeilBrown >>> >>> >>> >>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c >>> index 08d2b96..aa4bccc 100644 >>> --- a/mm/page-writeback.c >>> +++ b/mm/page-writeback.c >>> @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping, >>> int cycled; >>> int range_whole = 0; >>> long nr_to_write = wbc->nr_to_write; >>> + long hidden_writes = 0; >>> + long clear_writes = 0; >>> >>> if (wbc->nonblocking && bdi_write_congested(bdi)) { >>> wbc->encountered_congestion = 1; >>> @@ -961,7 +963,11 @@ continue_unlock: >>> if (!clear_page_dirty_for_io(page)) >>> goto continue_unlock; >>> >>> + { int orig_nr_to_write = wbc->nr_to_write; >>> ret = (*writepage)(page, wbc, data); >>> + hidden_writes += orig_nr_to_write - wbc->nr_to_write; >>> + clear_writes ++; >>> + } >>> if (unlikely(ret)) { >>> if (ret == AOP_WRITEPAGE_ACTIVATE) { >>> unlock_page(page); >>> @@ -1008,12 +1014,37 @@ continue_unlock: >>> end = writeback_index - 1; >>> goto retry; >>> } >>> + >>> if (!wbc->no_nrwrite_index_update) { >>> if (wbc->range_cyclic || (range_whole && nr_to_write > 0)) >>> mapping->writeback_index = done_index; >>> wbc->nr_to_write = nr_to_write; >>> } >>> >>> + { static int sum, cnt, max; >>> + static unsigned long previous; >>> + static int sum2, max2; >>> + >>> + sum += clear_writes; >>> + cnt += 1; >>> + >>> + if (max < clear_writes) max = clear_writes; >>> + >>> + sum2 += hidden_writes; >>> + if (max2 < hidden_writes) max2 = hidden_writes; >>> + >>> + if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) { >>> + printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d sum2=%d max2=%d mean2=%d\n", >>> + sum, cnt, max, sum/cnt, >>> + sum2, max2, sum2/cnt); >>> + sum = 0; >>> + cnt = 0; >>> + max = 0; >>> + max2 = 0; >>> + sum2 = 0; >>> + previous = jiffies; >>> + } >>> + } >>> return ret; >>> } >>> EXPORT_SYMBOL(write_cache_pages); >>> >>> >>> ------------------------------------------------------ >>> From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001 >>> From: Eric Sandeen <sandeen@sandeen.net> >>> Date: Fri, 31 Jul 2009 00:02:17 -0500 >>> Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage >>> >>> VM calculation for nr_to_write seems off. Bump it way >>> up, this gets simple streaming writes zippy again. >>> To be reviewed again after Jens' writeback changes. >>> >>> Signed-off-by: Christoph Hellwig <hch@infradead.org> >>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> >>> Cc: Chris Mason <chris.mason@oracle.com> >>> Reviewed-by: Felix Blyakher <felixb@sgi.com> >>> Signed-off-by: Felix Blyakher <felixb@sgi.com> >>> --- >>> fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++ >>> 1 files changed, 8 insertions(+), 0 deletions(-) >>> >>> diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c >>> index 7ec89fc..aecf251 100644 >>> --- a/fs/xfs/linux-2.6/xfs_aops.c >>> +++ b/fs/xfs/linux-2.6/xfs_aops.c >>> @@ -1268,6 +1268,14 @@ xfs_vm_writepage( >>> if (!page_has_buffers(page)) >>> create_empty_buffers(page, 1 << inode->i_blkbits, 0); >>> >>> + >>> + /* >>> + * VM calculation for nr_to_write seems off. Bump it way >>> + * up, this gets simple streaming writes zippy again. >>> + * To be reviewed again after Jens' writeback changes. >>> + */ >>> + wbc->nr_to_write *= 4; >>> + >>> /* >>> * Convert delayed allocate, unwritten or unmapped space >>> * to real space and flush out to disk. >>> -- >>> 1.6.4.3 >>> >>> >>> ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <66781b10911050904m407d14d6t7d3bec12578d6500@mail.gmail.com>]
* Re: MD write performance issue - found Catalyst patches [not found] ` <66781b10911050904m407d14d6t7d3bec12578d6500@mail.gmail.com> @ 2009-11-05 19:09 ` Asdo 2009-11-06 4:52 ` Neil Brown 2009-11-06 15:51 ` mark delfman 0 siblings, 2 replies; 18+ messages in thread From: Asdo @ 2009-11-05 19:09 UTC (permalink / raw) To: mark delfman; +Cc: Neil Brown, linux-raid Great! So the dirty hack pumped at x16 does really work! (while we wait for Jens, as written in the patch: "To be reviewed again after Jens' writeback changes.") Thanks for having tried up to x32. Still Raid-6 xfs write is not yet up to the old speed... maybe the old code was better at filling RAID stripes exactly, who knows. Mark, yep, personally I would be very interested in seeing how does 2.6.31 perform on your hardware so I can e.g. see exactly how much my 3ware 9650 controllers suck... (so also pls try vanilla 3.6.31 which I think has an integrated x4 hack, do not just try with x16 please) We might also be interested in 2.6.32 performances if you have time, also because 2.6.32 includes the fixes for the CPU lockups in big arrays during resyncs which was reported on this list, and this is a good incentive for upgrading (Neil, btw, is there any chance those lockups fixes get backported to mainstream 2.6.31.x?). Thank you! Asdo mark delfman wrote: > Hi Gents, > > Attached is the result of some testing with the XFS patch... as we can > see it does make a reasonable difference! Changing the value from > 4,16,32 shows 16 is a good level... > > Is this a 'safe' patch at 16? > > I think that maybe there is still some performance to be gained, > especially in the R6 configs which is where most would be interested i > suspect.. but its a great start! > > > I think that i should jump up to maybe .31 and see how this reacts..... > > Neil, i applied your writepage patch and have outputs if these are of > interest... > > Thank you for the help with the pacthing and linux!!!! > > > mark > > > > On Wed, Nov 4, 2009 at 5:25 PM, Asdo <asdo@shiftmail.org> wrote: > >> Hey great job Neil and Mark >> Mark, your benchmarks seems to confirm Neil's analysis: ext2 and ext3 are >> not slowed down from 2.6.28.5 and 2.6.28.6 >> Mark why don't you try to apply the patch below here by Eric Sandeen found >> by Neil to the 2.6.28.6 to see if the xfs write performance comes back? >> Thank you for your efforts >> Asdo >> >> mark delfman wrote: >> >>> Some FS comparisons attached in pdf >>> >>> not sure what to make of them as yet, but worth posting >>> >>> >>> On Tue, Nov 3, 2009 at 12:11 PM, mark delfman >>> <markdelfman@googlemail.com> wrote: >>> >>> >>>> Thanks Neil, >>>> >>>> I seem to recall that I tried this on EXT3 and saw the same results as >>>> XFS, but with your code and suggestions I think it is well worth me >>>> trying some more tests and reporting back.... >>>> >>>> >>>> Mark >>>> >>>> On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@suse.de> wrote: >>>> >>>> >>>>> On Saturday October 31, markdelfman@googlemail.com wrote: >>>>> >>>>> >>>>>> I am hopeful that you or another member of this group could offer some >>>>>> advice / patch to implement the print options you suggested... if so i >>>>>> would happily allocated resource and time to do what i can to help >>>>>> with this. >>>>>> >>>>>> >>>>> I've spent a little while exploring this. >>>>> It appears to very definitely be an XFS problem, interacting in >>>>> interesting ways with the VM. 
>>>>> >>>>> I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and >>>>> 2.6.28.6 using each of xfs and ext2. >>>>> >>>>> ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6 >>>>> xfs gives 86MB/sec on .5 and only 51MB/sec on .6 >>>>> >>>>> >>>>> When write_cache_pages is called it calls 'writepage' some number of >>>>> times. On ext2, writepage will write at most one page. >>>>> On xfs writepage will sometimes write multiple pages. >>>>> >>>>> I created a patch as below that prints (in a fairly cryptic way) >>>>> the number of 'writepage' calls and the number of pages that XFS >>>>> actually wrote. >>>>> >>>>> For ext2, the number of writepage calls is at most 1536 and averages >>>>> around 140 >>>>> >>>>> For xfs with .5, there is usually only one call to writepage and it >>>>> writes around 800 pages. >>>>> For .6 there are about 200 calls to writepages but the achieve >>>>> an average of about 700 pages together. >>>>> >>>>> So as you can see, there is very different behaviour. >>>>> >>>>> I notice a more recent patch in XFS in mainline which looks like a >>>>> dirty hack to try to address this problem. >>>>> >>>>> I suggest you try that patch and/or take this to the XFS developers. >>>>> >>>>> NeilBrown >>>>> >>>>> >>>>> >>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c >>>>> index 08d2b96..aa4bccc 100644 >>>>> --- a/mm/page-writeback.c >>>>> +++ b/mm/page-writeback.c >>>>> @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space *mapping, >>>>> int cycled; >>>>> int range_whole = 0; >>>>> long nr_to_write = wbc->nr_to_write; >>>>> + long hidden_writes = 0; >>>>> + long clear_writes = 0; >>>>> >>>>> if (wbc->nonblocking && bdi_write_congested(bdi)) { >>>>> wbc->encountered_congestion = 1; >>>>> @@ -961,7 +963,11 @@ continue_unlock: >>>>> if (!clear_page_dirty_for_io(page)) >>>>> goto continue_unlock; >>>>> >>>>> + { int orig_nr_to_write = wbc->nr_to_write; >>>>> ret = (*writepage)(page, wbc, data); >>>>> + hidden_writes += orig_nr_to_write - >>>>> wbc->nr_to_write; >>>>> + clear_writes ++; >>>>> + } >>>>> if (unlikely(ret)) { >>>>> if (ret == AOP_WRITEPAGE_ACTIVATE) { >>>>> unlock_page(page); >>>>> @@ -1008,12 +1014,37 @@ continue_unlock: >>>>> end = writeback_index - 1; >>>>> goto retry; >>>>> } >>>>> + >>>>> if (!wbc->no_nrwrite_index_update) { >>>>> if (wbc->range_cyclic || (range_whole && nr_to_write > 0)) >>>>> mapping->writeback_index = done_index; >>>>> wbc->nr_to_write = nr_to_write; >>>>> } >>>>> >>>>> + { static int sum, cnt, max; >>>>> + static unsigned long previous; >>>>> + static int sum2, max2; >>>>> + >>>>> + sum += clear_writes; >>>>> + cnt += 1; >>>>> + >>>>> + if (max < clear_writes) max = clear_writes; >>>>> + >>>>> + sum2 += hidden_writes; >>>>> + if (max2 < hidden_writes) max2 = hidden_writes; >>>>> + >>>>> + if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) { >>>>> + printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d >>>>> sum2=%d max2=%d mean2=%d\n", >>>>> + sum, cnt, max, sum/cnt, >>>>> + sum2, max2, sum2/cnt); >>>>> + sum = 0; >>>>> + cnt = 0; >>>>> + max = 0; >>>>> + max2 = 0; >>>>> + sum2 = 0; >>>>> + previous = jiffies; >>>>> + } >>>>> + } >>>>> return ret; >>>>> } >>>>> EXPORT_SYMBOL(write_cache_pages); >>>>> >>>>> >>>>> ------------------------------------------------------ >>>>> From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001 >>>>> From: Eric Sandeen <sandeen@sandeen.net> >>>>> Date: Fri, 31 Jul 2009 00:02:17 -0500 >>>>> Subject: [PATCH] xfs: bump up nr_to_write in 
xfs_vm_writepage >>>>> >>>>> VM calculation for nr_to_write seems off. Bump it way >>>>> up, this gets simple streaming writes zippy again. >>>>> To be reviewed again after Jens' writeback changes. >>>>> >>>>> Signed-off-by: Christoph Hellwig <hch@infradead.org> >>>>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> >>>>> Cc: Chris Mason <chris.mason@oracle.com> >>>>> Reviewed-by: Felix Blyakher <felixb@sgi.com> >>>>> Signed-off-by: Felix Blyakher <felixb@sgi.com> >>>>> --- >>>>> fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++ >>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>> >>>>> diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c >>>>> index 7ec89fc..aecf251 100644 >>>>> --- a/fs/xfs/linux-2.6/xfs_aops.c >>>>> +++ b/fs/xfs/linux-2.6/xfs_aops.c >>>>> @@ -1268,6 +1268,14 @@ xfs_vm_writepage( >>>>> if (!page_has_buffers(page)) >>>>> create_empty_buffers(page, 1 << inode->i_blkbits, 0); >>>>> >>>>> + >>>>> + /* >>>>> + * VM calculation for nr_to_write seems off. Bump it way >>>>> + * up, this gets simple streaming writes zippy again. >>>>> + * To be reviewed again after Jens' writeback changes. >>>>> + */ >>>>> + wbc->nr_to_write *= 4; >>>>> + >>>>> /* >>>>> * Convert delayed allocate, unwritten or unmapped space >>>>> * to real space and flush out to disk. >>>>> -- >>>>> 1.6.4.3 >>>>> >>>>> >>>>> >>>>> >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-11-05 19:09 ` Asdo @ 2009-11-06 4:52 ` Neil Brown 2009-11-06 10:28 ` Asdo 2009-11-06 15:51 ` mark delfman 1 sibling, 1 reply; 18+ messages in thread From: Neil Brown @ 2009-11-06 4:52 UTC (permalink / raw) To: Asdo; +Cc: mark delfman, linux-raid On Thursday November 5, asdo@shiftmail.org wrote: > incentive for upgrading (Neil, btw, is there any chance those lockups > fixes get backported to mainstream 2.6.31.x?). That would be up to the XFS developers. I suggest you consider asking them. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-11-06 4:52 ` Neil Brown @ 2009-11-06 10:28 ` Asdo 2009-11-06 10:51 ` Neil F Brown 0 siblings, 1 reply; 18+ messages in thread
From: Asdo @ 2009-11-06 10:28 UTC (permalink / raw)
To: Neil Brown; +Cc: mark delfman, linux-raid

Neil Brown wrote:
> On Thursday November 5, asdo@shiftmail.org wrote:
>
>> incentive for upgrading (Neil, btw, is there any chance those lockups
>> fixes get backported to mainstream 2.6.31.x?).
>
> That would be up to the XFS developers. I suggest you consider asking
> them.
>

Hi Neil, no, sorry, I meant the patches for md raid lockups, like this one:
http://neil.brown.name/git?p=md;a=commitdiff;h=1d9d52416c0445019ccc1f0fddb9a227456eb61b
and those for raid 5/6, for which I don't know the link...

Hmm, actually I don't see them applied even to mainline 2.6.32 yet :-(
http://git.kernel.org/?p=linux/kernel/git/djbw/md.git;a=blob_plain;f=drivers/md/raid1.c;hb=2fdc246aaf9a7fa088451ad2a72e9119b5f7f029
Am I correct?

The bug can be serious IMHO, depending on the hardware: when I saw it on my hardware, all disk accesses were completely starved and it was even impossible to log in until the resync finished. It can actually be worked around by reducing the maximum resync speed, but only if the user knows the trick...

Thank you
^ permalink raw reply [flat|nested] 18+ messages in thread
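For completeness, the resync throttle Asdo mentions is a runtime knob: system-wide it is /proc/sys/dev/raid/speed_limit_max (in KB/s, with speed_limit_min alongside it), and per array it is /sys/block/mdX/md/sync_speed_max. The sketch below just writes a new ceiling to both; the md0 name and any particular KB/s value are examples rather than values from this thread, and a plain echo into the same files does the same job.

/* cap_resync.c - lower the md resync speed ceiling (sketch).
 * Assumed usage, as root: ./cap_resync md0 20000     (value in KB/s)
 */
#include <stdio.h>

static int write_value(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");

    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%s\n", value);
    return fclose(f) ? -1 : 0;
}

int main(int argc, char **argv)
{
    char sysfs[128];

    if (argc != 3) {
        fprintf(stderr, "usage: %s <mdX> <KB-per-sec>\n", argv[0]);
        return 1;
    }
    snprintf(sysfs, sizeof(sysfs), "/sys/block/%s/md/sync_speed_max", argv[1]);

    int rc = 0;
    rc |= write_value("/proc/sys/dev/raid/speed_limit_max", argv[2]);
    rc |= write_value(sysfs, argv[2]);
    return rc ? 1 : 0;
}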
* Re: MD write performance issue - found Catalyst patches 2009-11-06 10:28 ` Asdo @ 2009-11-06 10:51 ` Neil F Brown 0 siblings, 0 replies; 18+ messages in thread From: Neil F Brown @ 2009-11-06 10:51 UTC (permalink / raw) To: Asdo; +Cc: mark delfman, linux-raid On Fri, November 6, 2009 9:28 pm, Asdo wrote: > Neil Brown wrote: >> On Thursday November 5, asdo@shiftmail.org wrote: >> >>> incentive for upgrading (Neil, btw, is there any chance those lockups >>> fixes get backported to mainstream 2.6.31.x?). >> >> That would be up to the XFS developers. I suggest you consider asking >> them. >> > Hi Neil, no sorry I meant the patches for md raid lockups like this one: > http://neil.brown.name/git?p=md;a=commitdiff;h=1d9d52416c0445019ccc1f0fddb9a227456eb61b > and those for raid 5,6 for which i don't know the link... > Hm actually I don't see them applied to even mainstream 2.6.32 yet :-( > http://git.kernel.org/?p=linux/kernel/git/djbw/md.git;a=blob_plain;f=drivers/md/raid1.c;hb=2fdc246aaf9a7fa088451ad2a72e9119b5f7f029 > am I correct? > The bug can be serious imho depending on the hardware: when I saw it on > my hardware all disk accesses were completely starved forever and it was > even impossible to log-in until the resync finished. It can actually be > worked around by reducing the maximum resync speed, but this is only if > the user knows the trick... > Thank you Those patches are in 2.6.32-rc: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1d9d52416c0445019ccc1f0fddb9a227456eb61b however I haven't submitted them for -stable. Maybe I should... Thanks. NeilBrown > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-11-05 19:09 ` Asdo 2009-11-06 4:52 ` Neil Brown @ 2009-11-06 15:51 ` mark delfman 1 sibling, 0 replies; 18+ messages in thread From: mark delfman @ 2009-11-06 15:51 UTC (permalink / raw) To: Asdo; +Cc: Neil Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 9944 bytes --] Attached are the later kernel results... Not an awful lot of difference (apart from native 2.6.28.6, because it doesn't have the patch included). 2.6.32-rc6 is certainly up to 10% faster on RAID-6. Note that we ran around 10 tests on each; that is a low number for averages and the results move around by 100 MB/s or more, but in this case we did not need to be over-accurate. They show maybe a 20% reduction writing to the FS as opposed to direct to MD, whilst reads on XFS are now 20% 'faster' than from the raw device (reaching 2 GB/s)... 2.6.3x seems better at read caching on XFS. I have only graphed the writes... Mark On Thu, Nov 5, 2009 at 7:09 PM, Asdo <asdo@shiftmail.org> wrote: > Great! > So the dirty hack pumped at x16 does really work! (while we wait for Jens, > as written in the patch: "To be reviewed again after Jens' writeback > changes.") Thanks for having tried up to x32. > Still Raid-6 xfs write is not yet up to the old speed... maybe the old code > was better at filling RAID stripes exactly, who knows. > Mark, yep, personally I would be very interested in seeing how does 2.6.31 > perform on your hardware so I can e.g. see exactly how much my 3ware 9650 > controllers suck... (so also pls try vanilla 3.6.31 which I think has an > integrated x4 hack, do not just try with x16 please) > We might also be interested in 2.6.32 performances if you have time, also > because 2.6.32 includes the fixes for the CPU lockups in big arrays during > resyncs which was reported on this list, and this is a good incentive for > upgrading (Neil, btw, is there any chance those lockups fixes get backported > to mainstream 2.6.31.x?). > Thank you! > Asdo > > > mark delfman wrote: >> >> Hi Gents, >> >> Attached is the result of some testing with the XFS patch... as we can >> see it does make a reasonable difference! Changing the value from >> 4,16,32 shows 16 is a good level... >> >> Is this a 'safe' patch at 16? >> >> I think that maybe there is still some performance to be gained, >> especially in the R6 configs which is where most would be interested i >> suspect.. but its a great start! >> >> >> I think that i should jump up to maybe .31 and see how this reacts..... >> >> Neil, i applied your writepage patch and have outputs if these are of >> interest... >> >> Thank you for the help with the pacthing and linux!!!! >> >> >> mark >> >> >> >> On Wed, Nov 4, 2009 at 5:25 PM, Asdo <asdo@shiftmail.org> wrote: >> >>> >>> Hey great job Neil and Mark >>> Mark, your benchmarks seems to confirm Neil's analysis: ext2 and ext3 are >>> not slowed down from 2.6.28.5 and 2.6.28.6 >>> Mark why don't you try to apply the patch below here by Eric Sandeen >>> found >>> by Neil to the 2.6.28.6 to see if the xfs write performance comes back?
>>> Thank you for your efforts >>> Asdo >>> >>> mark delfman wrote: >>> >>>> >>>> Some FS comparisons attached in pdf >>>> >>>> not sure what to make of them as yet, but worth posting >>>> >>>> >>>> On Tue, Nov 3, 2009 at 12:11 PM, mark delfman >>>> <markdelfman@googlemail.com> wrote: >>>> >>>> >>>>> >>>>> Thanks Neil, >>>>> >>>>> I seem to recall that I tried this on EXT3 and saw the same results as >>>>> XFS, but with your code and suggestions I think it is well worth me >>>>> trying some more tests and reporting back.... >>>>> >>>>> >>>>> Mark >>>>> >>>>> On Tue, Nov 3, 2009 at 4:58 AM, Neil Brown <neilb@suse.de> wrote: >>>>> >>>>> >>>>>> >>>>>> On Saturday October 31, markdelfman@googlemail.com wrote: >>>>>> >>>>>> >>>>>>> >>>>>>> I am hopeful that you or another member of this group could offer >>>>>>> some >>>>>>> advice / patch to implement the print options you suggested... if so >>>>>>> i >>>>>>> would happily allocated resource and time to do what i can to help >>>>>>> with this. >>>>>>> >>>>>>> >>>>>> >>>>>> I've spent a little while exploring this. >>>>>> It appears to very definitely be an XFS problem, interacting in >>>>>> interesting ways with the VM. >>>>>> >>>>>> I built a 4-drive raid6 and did some simple testing on 2.6.28.5 and >>>>>> 2.6.28.6 using each of xfs and ext2. >>>>>> >>>>>> ext2 gives write throughput of 65MB/sec on .5 and 66MB/sec on .6 >>>>>> xfs gives 86MB/sec on .5 and only 51MB/sec on .6 >>>>>> >>>>>> >>>>>> When write_cache_pages is called it calls 'writepage' some number of >>>>>> times. On ext2, writepage will write at most one page. >>>>>> On xfs writepage will sometimes write multiple pages. >>>>>> >>>>>> I created a patch as below that prints (in a fairly cryptic way) >>>>>> the number of 'writepage' calls and the number of pages that XFS >>>>>> actually wrote. >>>>>> >>>>>> For ext2, the number of writepage calls is at most 1536 and averages >>>>>> around 140 >>>>>> >>>>>> For xfs with .5, there is usually only one call to writepage and it >>>>>> writes around 800 pages. >>>>>> For .6 there are about 200 calls to writepages but the achieve >>>>>> an average of about 700 pages together. >>>>>> >>>>>> So as you can see, there is very different behaviour. >>>>>> >>>>>> I notice a more recent patch in XFS in mainline which looks like a >>>>>> dirty hack to try to address this problem. >>>>>> >>>>>> I suggest you try that patch and/or take this to the XFS developers. 
>>>>>> >>>>>> NeilBrown >>>>>> >>>>>> >>>>>> >>>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c >>>>>> index 08d2b96..aa4bccc 100644 >>>>>> --- a/mm/page-writeback.c >>>>>> +++ b/mm/page-writeback.c >>>>>> @@ -875,6 +875,8 @@ int write_cache_pages(struct address_space >>>>>> *mapping, >>>>>> int cycled; >>>>>> int range_whole = 0; >>>>>> long nr_to_write = wbc->nr_to_write; >>>>>> + long hidden_writes = 0; >>>>>> + long clear_writes = 0; >>>>>> >>>>>> if (wbc->nonblocking && bdi_write_congested(bdi)) { >>>>>> wbc->encountered_congestion = 1; >>>>>> @@ -961,7 +963,11 @@ continue_unlock: >>>>>> if (!clear_page_dirty_for_io(page)) >>>>>> goto continue_unlock; >>>>>> >>>>>> + { int orig_nr_to_write = wbc->nr_to_write; >>>>>> ret = (*writepage)(page, wbc, data); >>>>>> + hidden_writes += orig_nr_to_write - >>>>>> wbc->nr_to_write; >>>>>> + clear_writes ++; >>>>>> + } >>>>>> if (unlikely(ret)) { >>>>>> if (ret == AOP_WRITEPAGE_ACTIVATE) { >>>>>> unlock_page(page); >>>>>> @@ -1008,12 +1014,37 @@ continue_unlock: >>>>>> end = writeback_index - 1; >>>>>> goto retry; >>>>>> } >>>>>> + >>>>>> if (!wbc->no_nrwrite_index_update) { >>>>>> if (wbc->range_cyclic || (range_whole && nr_to_write > >>>>>> 0)) >>>>>> mapping->writeback_index = done_index; >>>>>> wbc->nr_to_write = nr_to_write; >>>>>> } >>>>>> >>>>>> + { static int sum, cnt, max; >>>>>> + static unsigned long previous; >>>>>> + static int sum2, max2; >>>>>> + >>>>>> + sum += clear_writes; >>>>>> + cnt += 1; >>>>>> + >>>>>> + if (max < clear_writes) max = clear_writes; >>>>>> + >>>>>> + sum2 += hidden_writes; >>>>>> + if (max2 < hidden_writes) max2 = hidden_writes; >>>>>> + >>>>>> + if (cnt > 100 && time_after(jiffies, previous + 10*HZ)) { >>>>>> + printk("write_page_cache: sum=%d cnt=%d max=%d mean=%d >>>>>> sum2=%d max2=%d mean2=%d\n", >>>>>> + sum, cnt, max, sum/cnt, >>>>>> + sum2, max2, sum2/cnt); >>>>>> + sum = 0; >>>>>> + cnt = 0; >>>>>> + max = 0; >>>>>> + max2 = 0; >>>>>> + sum2 = 0; >>>>>> + previous = jiffies; >>>>>> + } >>>>>> + } >>>>>> return ret; >>>>>> } >>>>>> EXPORT_SYMBOL(write_cache_pages); >>>>>> >>>>>> >>>>>> ------------------------------------------------------ >>>>>> From c8a4051c3731b6db224482218cfd535ab9393ff8 Mon Sep 17 00:00:00 2001 >>>>>> From: Eric Sandeen <sandeen@sandeen.net> >>>>>> Date: Fri, 31 Jul 2009 00:02:17 -0500 >>>>>> Subject: [PATCH] xfs: bump up nr_to_write in xfs_vm_writepage >>>>>> >>>>>> VM calculation for nr_to_write seems off. Bump it way >>>>>> up, this gets simple streaming writes zippy again. >>>>>> To be reviewed again after Jens' writeback changes. >>>>>> >>>>>> Signed-off-by: Christoph Hellwig <hch@infradead.org> >>>>>> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> >>>>>> Cc: Chris Mason <chris.mason@oracle.com> >>>>>> Reviewed-by: Felix Blyakher <felixb@sgi.com> >>>>>> Signed-off-by: Felix Blyakher <felixb@sgi.com> >>>>>> --- >>>>>> fs/xfs/linux-2.6/xfs_aops.c | 8 ++++++++ >>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>> >>>>>> diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c >>>>>> index 7ec89fc..aecf251 100644 >>>>>> --- a/fs/xfs/linux-2.6/xfs_aops.c >>>>>> +++ b/fs/xfs/linux-2.6/xfs_aops.c >>>>>> @@ -1268,6 +1268,14 @@ xfs_vm_writepage( >>>>>> if (!page_has_buffers(page)) >>>>>> create_empty_buffers(page, 1 << inode->i_blkbits, 0); >>>>>> >>>>>> + >>>>>> + /* >>>>>> + * VM calculation for nr_to_write seems off. Bump it way >>>>>> + * up, this gets simple streaming writes zippy again. 
>>>>>> + * To be reviewed again after Jens' writeback changes. >>>>>> + */ >>>>>> + wbc->nr_to_write *= 4; >>>>>> + >>>>>> /* >>>>>> * Convert delayed allocate, unwritten or unmapped space >>>>>> * to real space and flush out to disk. >>>>>> -- >>>>>> 1.6.4.3 >>>>>> >>>>>> >>>>>> >>>>>> >>> >>> > > [-- Attachment #2: XFSvMD_2.pdf --] [-- Type: application/pdf, Size: 34619 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
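The thread never says which tool produced the GB/s figures above, so purely as an illustration of the kind of streaming-write timing these graphs imply, a minimal test could look like the sketch below: buffered 1 MiB writes followed by one fdatasync() so data still sitting in the page cache is counted before the clock stops. The path and total size are placeholders, not values from the thread:

/*
 * Illustrative streaming-write timer (not the tool used in this
 * thread).  Buffered 1 MiB writes, then fdatasync() so cached data
 * is flushed before throughput is reported.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK (1024 * 1024)	/* 1 MiB per write() */
#define COUNT (16 * 1024)	/* 16 GiB total - placeholder size */

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/mnt/test/streamfile";
	char *buf = malloc(BLOCK);
	struct timespec t0, t1;
	double secs;
	int fd, i;

	if (!buf)
		return 1;
	memset(buf, 0xab, BLOCK);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < COUNT; i++) {
		if (write(fd, buf, BLOCK) != BLOCK) {
			perror("write");
			return 1;
		}
	}
	fdatasync(fd);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%.0f MB/s\n", (double)COUNT * BLOCK / secs / 1e6);
	return 0;
}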
* Re: MD write performance issue - found Catalyst patches 2009-11-04 17:15 ` mark delfman 2009-11-04 17:25 ` Asdo @ 2009-11-04 19:05 ` Steve Cousins 2009-11-04 22:08 ` mark delfman 1 sibling, 1 reply; 18+ messages in thread From: Steve Cousins @ 2009-11-04 19:05 UTC (permalink / raw) To: mark delfman; +Cc: linux-raid mark delfman wrote: > Some FS comparisons attached in pdf > > not sure what to make of them as yet, but worth posting > I'm not sure either. Two things jump out. 1. Why is raw RAID0 read performance slower than write performance 2. Why is read performance with some file systems at or above raw read performance? For number one, does this indicate that write caching is actually On on the drives? Are all tests truly apples-apples comparisons or were there other factors in there that aren't listed in the charts? I guess these issues might not have a lot to do with your main question but you might want to double-check the tests and numbers. Steve ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: MD write performance issue - found Catalyst patches 2009-11-04 19:05 ` Steve Cousins @ 2009-11-04 22:08 ` mark delfman 0 siblings, 0 replies; 18+ messages in thread From: mark delfman @ 2009-11-04 22:08 UTC (permalink / raw) To: Steve Cousins; +Cc: linux-raid Yes, write cache is enabled on the drives, and the comparisons are all with the same hardware... apples for apples and no pears ;) Writes are often faster (in my mind) simply because you can use a lot of write cache (in system memory), whilst for reads you are limited to what the drives can pull off. I also guess that the filesystems - mainly ext2, it seems - are more efficient at implementing a read cache than a raw device, hence the slight performance increase... but I'm just guessing, to be honest. On Wed, Nov 4, 2009 at 7:05 PM, Steve Cousins <steve.cousins@maine.edu> wrote: > mark delfman wrote: >> >> Some FS comparisons attached in pdf >> >> not sure what to make of them as yet, but worth posting >> > > I'm not sure either. Two things jump out. > > 1. Why is raw RAID0 read performance slower than write performance > 2. Why is read performance with some file systems at or above raw read > performance? > > For number one, does this indicate that write caching is actually On on the > drives? > > Are all tests truly apples-apples comparisons or were there other factors in > there that aren't listed in the charts? > > I guess these issues might not have a lot to do with your main question but > you might want to double-check the tests and numbers. > > Steve > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
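On the question of filesystem reads appearing to beat the raw device: one way to take the page cache out of that comparison (again an illustration, not a tool from this thread) is to read with O_DIRECT, or to drop caches between runs via /proc/sys/vm/drop_caches; dd's iflag=direct does the same job from the command line. With the cache bypassed, a filesystem should no longer be able to beat the device underneath it by serving reads from memory. A sketch:

/*
 * Sketch: cache-independent sequential read using O_DIRECT, to help
 * separate page-cache effects from what the drives really deliver.
 * Path and size are placeholders; O_DIRECT requires aligned buffers.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLOCK (1024 * 1024)	/* 1 MiB aligned reads */

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/dev/md0";
	struct timespec t0, t1;
	long long total = 0;
	double secs;
	ssize_t n;
	void *buf;
	int fd;

	if (posix_memalign(&buf, 4096, BLOCK))
		return 1;

	fd = open(path, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while (total < 8LL * 1024 * 1024 * 1024 &&	/* read up to 8 GiB */
	       (n = read(fd, buf, BLOCK)) > 0)
		total += n;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%lld bytes in %.1f s = %.0f MB/s\n",
	       total, secs, total / secs / 1e6);
	return 0;
}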
Thread overview: 18+ messages
2009-10-18 10:00 MD write performance issue - found Catalyst patches mark delfman
2009-10-18 22:39 ` NeilBrown
2009-10-29 6:41 ` Neil Brown
2009-10-29 6:48 ` Thomas Fjellstrom
2009-10-29 7:32 ` Thomas Fjellstrom
2009-10-29 8:08 ` Asdo
2009-10-31 10:51 ` mark delfman
2009-11-03 4:58 ` Neil Brown
2009-11-03 12:11 ` mark delfman
2009-11-04 17:15 ` mark delfman
2009-11-04 17:25 ` Asdo
[not found] ` <66781b10911050904m407d14d6t7d3bec12578d6500@mail.gmail.com>
2009-11-05 19:09 ` Asdo
2009-11-06 4:52 ` Neil Brown
2009-11-06 10:28 ` Asdo
2009-11-06 10:51 ` Neil F Brown
2009-11-06 15:51 ` mark delfman
2009-11-04 19:05 ` Steve Cousins
2009-11-04 22:08 ` mark delfman