* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
From: Theodore Tso @ 2009-01-04 22:43 UTC
To: Andrew Morton; +Cc: linux-mm, linux-ext4, Arjan van de Ven, Jens Axboe

On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > Following up with an e-mail thread started by Arjan two months ago,
> > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > a patch, just sent to linux-ext4@vger.kernel.org, which fixes the jbd2
> > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > Hopefully this might be enough of a priority boost so we don't have to
> > force a higher I/O priority level via a buffer_head flag. However,
> > while looking through the code paths, in ordered data mode, we end up
> > flushing data pages via the page writeback paths on a per-inode basis,
> > and I noticed that even though we are passing in
> > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
>
> But this is all the wrong way to fix the problem, isn't it?
>
> The problem is that at one particular point, the current transaction
> blocks callers behind the committing transaction's IO completion.
>
> Did anyone look at fixing that? ISTR concluding that a data copy and
> shadow-bh arrangement might be needed.

I haven't had time to really drill down into the jbd code yet, and
yes, eventually we probably want to do this. Still, if we are
submitting I/O which we are going to end up waiting on, we really
should submit it with WRITE_SYNC, and this patch should optimize
writes in other situations; for example, if we fsync() a file, we will
also end up calling block_write_full_page(), and so supplying the
WRITE_SYNC hint to the block layer would be a Good Thing.

- Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
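The choice Ted describes can be sketched in userspace as follows. This is only an illustration under stated assumptions: every toy_* and TOY_* name is a hypothetical stand-in, not a kernel definition, and the real __block_write_full_page() does far more than pick a write operation.

```c
/* Hypothetical stand-ins for the writeback sync mode and the write
 * operation passed to submit_bh(); not the kernel's definitions. */
enum toy_sync_mode { TOY_SYNC_NONE, TOY_SYNC_ALL };
enum toy_write_op  { TOY_WRITE, TOY_WRITE_SYNC };

/* Hypothetical, heavily reduced writeback control structure. */
struct toy_wbc {
    enum toy_sync_mode sync_mode;
};

/* If the caller is going to wait on this I/O (sync_mode is "all"),
 * pass the sync hint down; otherwise use a plain write. */
static enum toy_write_op toy_pick_write_op(const struct toy_wbc *wbc)
{
    return wbc->sync_mode == TOY_SYNC_ALL ? TOY_WRITE_SYNC : TOY_WRITE;
}
```

In this sketch an fsync()-style caller would set sync_mode to TOY_SYNC_ALL and get TOY_WRITE_SYNC back, while background writeback would keep getting TOY_WRITE.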
* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
From: Andrew Morton @ 2009-01-04 23:19 UTC
To: Theodore Tso; +Cc: linux-mm, linux-ext4, Arjan van de Ven, Jens Axboe

On Sun, 4 Jan 2009 17:43:51 -0500 Theodore Tso <tytso@mit.edu> wrote:

> On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > > Following up with an e-mail thread started by Arjan two months ago,
> > > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > > a patch, just sent to linux-ext4@vger.kernel.org, which fixes the jbd2
> > > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > > Hopefully this might be enough of a priority boost so we don't have to
> > > force a higher I/O priority level via a buffer_head flag. However,
> > > while looking through the code paths, in ordered data mode, we end up
> > > flushing data pages via the page writeback paths on a per-inode basis,
> > > and I noticed that even though we are passing in
> > > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
> >
> > But this is all the wrong way to fix the problem, isn't it?
> >
> > The problem is that at one particular point, the current transaction
> > blocks callers behind the committing transaction's IO completion.
> >
> > Did anyone look at fixing that? ISTR concluding that a data copy and
> > shadow-bh arrangement might be needed.
>
> I haven't had time to really drill down into the jbd code yet, and
> yes, eventually we probably want to do this.

We do.

> Still, if we are
> submitting I/O which we are going to end up waiting on, we really
> should submit it with WRITE_SYNC, and this patch should optimize
> writes in other situations; for example, if we fsync() a file, we will
> also end up calling block_write_full_page(), and so supplying the
> WRITE_SYNC hint to the block layer would be a Good Thing.

Is it? WRITE_SYNC means "unplug the queue after this bh/BIO". By setting
it against every bh, don't we risk the generation of more BIOs and
the loss of merging opportunities?
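Andrew's concern about lost merging can be illustrated with a toy model. This is a sketch under strong assumptions: the queue holds at most one pending request, merges only a block that directly follows it, and dispatches whatever is pending on unplug. None of the toy_* names exist in the kernel, and real block-layer merging is considerably more involved.

```c
#include <stddef.h>

/* Toy plugged queue: one pending request covering [start, start + len). */
struct toy_queue {
    long start;      /* first sector of the pending request */
    long len;        /* length in sectors; 0 means nothing pending */
    int  dispatched; /* number of requests sent to the device */
};

/* Unplug: send the pending request (if any) to the device. */
static void toy_unplug(struct toy_queue *q)
{
    if (q->len > 0) {
        q->dispatched++;
        q->len = 0;
    }
}

/* Submit one single-sector block; merge it if it is contiguous with the
 * pending request, and optionally unplug right after it. */
static void toy_submit(struct toy_queue *q, long sector, int unplug_after)
{
    if (q->len > 0 && q->start + q->len == sector) {
        q->len++;                 /* back-merge into the pending request */
    } else {
        toy_unplug(q);            /* cannot merge: flush what is pending */
        q->start = sector;
        q->len = 1;
    }
    if (unplug_after)
        toy_unplug(q);
}

/* Count device requests for a run of blocks, unplugging either after
 * every block or only once at the end. */
static int toy_count_dispatches(const long *sectors, size_t n, int unplug_each)
{
    struct toy_queue q = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++)
        toy_submit(&q, sectors[i], unplug_each);
    toy_unplug(&q);
    return q.dispatched;
}
```

For four contiguous sectors, the model dispatches four requests when every block unplugs the queue and a single merged request when the unplug happens once at the end.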
* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
From: Theodore Tso @ 2009-01-05 0:21 UTC
To: Andrew Morton; +Cc: linux-mm, linux-ext4, Arjan van de Ven, Jens Axboe

On Sun, Jan 04, 2009 at 03:19:27PM -0800, Andrew Morton wrote:
> > Still, if we are
> > submitting I/O which we are going to end up waiting on, we really
> > should submit it with WRITE_SYNC, and this patch should optimize
> > writes in other situations; for example, if we fsync() a file, we will
> > also end up calling block_write_full_page(), and so supplying the
> > WRITE_SYNC hint to the block layer would be a Good Thing.
>
> Is it? WRITE_SYNC means "unplug the queue after this bh/BIO". By setting
> it against every bh, don't we risk the generation of more BIOs and
> the loss of merging opportunities?

Good point, yeah, that's a problem. Some of the IO schedulers also use
REQ_RW_SYNC to prioritize the I/O's above non-sync I/O's. That's an
orthogonal issue to unplugging the queue; it would be useful to be able
to mark an I/O as "this bio is one that we will eventually end up
waiting to complete", separately from "please unplug the queue after
this bio is submitted".

BTW, I notice that the CFQ io scheduler prioritizes REQ_RW_META bio's
behind REQ_RW_SYNC bio's, but ahead of normal bio requests. But as far
as I can tell nothing is actually marking requests REQ_RW_META. What
is the intended use for this, and are there plans to make other I/O
schedulers honor REQ_RW_META?

- Ted
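The ordering Ted describes for CFQ (sync first, then meta, then everything else) can be written down as a toy comparator. This sketch is based only on the description in the message above, not on CFQ's source; the TOY_RW_* bits and toy_* functions are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical flag bits standing in for REQ_RW_SYNC and REQ_RW_META. */
#define TOY_RW_SYNC (1u << 0)
#define TOY_RW_META (1u << 1)

/* Lower rank is served first: sync, then meta, then normal requests. */
static int toy_rank(unsigned int flags)
{
    if (flags & TOY_RW_SYNC)
        return 0;
    if (flags & TOY_RW_META)
        return 1;
    return 2;
}

/* qsort() comparator over request flag words. */
static int toy_cmp(const void *a, const void *b)
{
    unsigned int fa = *(const unsigned int *)a;
    unsigned int fb = *(const unsigned int *)b;
    return toy_rank(fa) - toy_rank(fb);
}

/* Order an array of request flag words by the toy priority. */
static void toy_sort_by_priority(unsigned int *flags, size_t n)
{
    qsort(flags, n, sizeof(*flags), toy_cmp);
}
```

A request carrying both bits ranks as sync here; whether a real scheduler would treat that combination the same way is not something the thread states.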
* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
From: Jens Axboe @ 2009-01-05 8:02 UTC
To: Andrew Morton; +Cc: Theodore Tso, linux-mm, linux-ext4, Arjan van de Ven

On Sun, Jan 04 2009, Andrew Morton wrote:
> On Sun, 4 Jan 2009 17:43:51 -0500 Theodore Tso <tytso@mit.edu> wrote:
>
> > On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > > > Following up with an e-mail thread started by Arjan two months ago,
> > > > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > > > a patch, just sent to linux-ext4@vger.kernel.org, which fixes the jbd2
> > > > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > > > Hopefully this might be enough of a priority boost so we don't have to
> > > > force a higher I/O priority level via a buffer_head flag. However,
> > > > while looking through the code paths, in ordered data mode, we end up
> > > > flushing data pages via the page writeback paths on a per-inode basis,
> > > > and I noticed that even though we are passing in
> > > > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > > > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
> > >
> > > But this is all the wrong way to fix the problem, isn't it?
> > >
> > > The problem is that at one particular point, the current transaction
> > > blocks callers behind the committing transaction's IO completion.
> > >
> > > Did anyone look at fixing that? ISTR concluding that a data copy and
> > > shadow-bh arrangement might be needed.
> >
> > I haven't had time to really drill down into the jbd code yet, and
> > yes, eventually we probably want to do this.
>
> We do.
>
> > Still, if we are
> > submitting I/O which we are going to end up waiting on, we really
> > should submit it with WRITE_SYNC, and this patch should optimize
> > writes in other situations; for example, if we fsync() a file, we will
> > also end up calling block_write_full_page(), and so supplying the
> > WRITE_SYNC hint to the block layer would be a Good Thing.
>
> Is it? WRITE_SYNC means "unplug the queue after this bh/BIO". By setting
> it against every bh, don't we risk the generation of more BIOs and
> the loss of merging opportunities?

But it also implies that the io scheduler will treat the IO as sync even
if it is a write, which seems to be the very effect that Ted is looking
for as well. Perhaps we should separate it into two behavioural flags
instead and make the unplugging explicit.

--
Jens Axboe
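One way to picture the split Jens suggests is two independent flag bits, with the unplug requested only where the submitter wants it. This is a hypothetical sketch: the TOY_REQ_* names and toy_flags_for_bh() are invented for illustration, and the policy of unplugging on the last buffer of a page is an assumption, not something proposed in the thread.

```c
/* Hypothetical, independent behavioural bits. */
#define TOY_REQ_SYNC   (1u << 0) /* scheduler should treat the write as sync */
#define TOY_REQ_UNPLUG (1u << 1) /* unplug the queue after this submission */

/* Choose flags for one buffer of a page being written out.
 * sync_all:   nonzero if the caller will wait on the I/O.
 * is_last_bh: nonzero for the final buffer submitted for the page. */
static unsigned int toy_flags_for_bh(int sync_all, int is_last_bh)
{
    unsigned int flags = 0;

    if (sync_all) {
        flags |= TOY_REQ_SYNC;       /* priority hint on every buffer */
        if (is_last_bh)
            flags |= TOY_REQ_UNPLUG; /* explicit unplug, once per page */
    }
    return flags;
}
```

With the bits separated, every buffer of a synchronous write keeps the scheduler hint while the queue stays plugged until the final buffer, which leaves room for the earlier buffers to be merged.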
Thread overview: 4+ messages
     [not found] <E1LJatq-00061O-0e@closure.thunk.org>
     [not found] ` <20090104142303.98762f81.akpm@linux-foundation.org>
2009-01-04 22:43   ` [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL Theodore Tso
2009-01-04 23:19     ` Andrew Morton
2009-01-05  0:21       ` Theodore Tso
2009-01-05  8:02       ` Jens Axboe