linux-mm.kvack.org archive mirror
* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
       [not found] ` <20090104142303.98762f81.akpm@linux-foundation.org>
@ 2009-01-04 22:43   ` Theodore Tso
  2009-01-04 23:19     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Tso @ 2009-01-04 22:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-ext4, Arjan van de Ven, Jens Axboe

On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > Following up with an e-mail thread started by Arjan two months ago,
> > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > a patch, just sent to linux-ext4@vger.kernel.org, which fixes the jbd2
> > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > Hopefully this might be enough of a priority boost so we don't have to
> > force a higher I/O priority level via a buffer_head flag.  However,
> > while looking through the code paths, in ordered data mode, we end up
> > flushing data pages via the page writeback paths on a per-inode basis,
> > and I noticed that even though we are passing in
> > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
> 
> But this is all the wrong way to fix the problem, isn't it?
> 
> The problem is that at one particular point, the current transaction
> blocks callers behind the committing transaction's IO completion.
> 
> Did anyone look at fixing that?  ISTR concluding that a data copy and
> shadow-bh arrangement might be needed.

I haven't had time to really drill down into the jbd code yet, and
yes, eventually we probably want to do this.  Still, if we are
submitting I/O which we are going to end up waiting on, we really
should submit it with WRITE_SYNC, and this patch should optimize
writes in other situations; for example, if we fsync() a file, we will
also end up calling block_write_full_page(), and so supplying the
WRITE_SYNC hint to the block layer would be a Good Thing.

	   	       	     	   	      - Ted
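
The change argued for above can be sketched in a few lines. This is a minimal userspace model, not the actual patch: the struct, the enum values, and the helper name write_op_for() are hypothetical stand-ins for the kernel definitions. The thread writes the mode as WBC_SYNC_ALL; the sketch uses the kernel's enum name WB_SYNC_ALL.

```c
/* Hypothetical stand-ins for kernel definitions; values are illustrative only. */
enum writeback_sync_modes { WB_SYNC_NONE, WB_SYNC_ALL };
enum toy_write_op { WRITE = 1, WRITE_SYNC = 3 };

struct writeback_control {
    enum writeback_sync_modes sync_mode;
};

/*
 * Pick the op that would be passed to submit_bh(): if the caller is doing
 * data-integrity writeback (it will wait on the I/O), hint that to the
 * block layer with WRITE_SYNC; otherwise use a plain WRITE.
 */
static int write_op_for(const struct writeback_control *wbc)
{
    return wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE;
}
```

In this model a caller such as an fsync() path would set sync_mode to WB_SYNC_ALL and get WRITE_SYNC back, while background writeback would keep using WRITE.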

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
  2009-01-04 22:43   ` [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL Theodore Tso
@ 2009-01-04 23:19     ` Andrew Morton
  2009-01-05  0:21       ` Theodore Tso
  2009-01-05  8:02       ` Jens Axboe
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2009-01-04 23:19 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-mm, linux-ext4, Arjan van de Ven, Jens Axboe

On Sun, 4 Jan 2009 17:43:51 -0500 Theodore Tso <tytso@mit.edu> wrote:

> On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > > Following up with an e-mail thread started by Arjan two months ago,
> > > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > > a patch, just sent to linux-ext4@vger.kernel.org, which fixes the jbd2
> > > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > > Hopefully this might be enough of a priority boost so we don't have to
> > > force a higher I/O priority level via a buffer_head flag.  However,
> > > while looking through the code paths, in ordered data mode, we end up
> > > flushing data pages via the page writeback paths on a per-inode basis,
> > > and I noticed that even though we are passing in
> > > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
> > 
> > But this is all the wrong way to fix the problem, isn't it?
> > 
> > The problem is that at one particular point, the current transaction
> > blocks callers behind the committing transaction's IO completion.
> > 
> > Did anyone look at fixing that?  ISTR concluding that a data copy and
> > shadow-bh arrangement might be needed.
> 
> I haven't had time to really drill down into the jbd code yet, and
> yes, eventually we probably want to do this.

We do.

>  Still, if we are
> submitting I/O which we are going to end up waiting on, we really
> should submit it with WRITE_SYNC, and this patch should optimize
> writes in other situations; for example, if we fsync() a file, we will
> also end up calling block_write_full_page(), and so supplying the
> WRITE_SYNC hint to the block layer would be a Good Thing.

Is it?  WRITE_SYNC means "unplug the queue after this bh/BIO".  By setting
it against every bh, don't we risk the generation of more BIOs and
the loss of merging opportunities?
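
The merging concern can be illustrated with a toy model. This is a sketch under strong assumptions, not how the block layer is implemented: a "request" here is just a run of contiguous sectors, and all toy_* names are hypothetical. It counts how many requests reach the device when the queue is unplugged after every submission versus once at the end.

```c
#include <stddef.h>

/* Hypothetical model: at most one pending request, a run of contiguous sectors. */
struct toy_queue {
    long last_end;   /* sector just past the pending request, or -1 if none */
    int dispatched;  /* requests sent to the device so far */
};

/* Send the pending request (if any) to the device. */
static void toy_unplug(struct toy_queue *q)
{
    if (q->last_end >= 0) {
        q->dispatched++;
        q->last_end = -1;
    }
}

/* Submit one block; it merges only if it directly follows the pending request. */
static void toy_submit(struct toy_queue *q, long sector, int unplug_after)
{
    if (q->last_end >= 0 && q->last_end != sector)
        toy_unplug(q);            /* not contiguous: pending request goes out */
    q->last_end = sector + 1;     /* start or extend the pending request */
    if (unplug_after)
        toy_unplug(q);            /* models "unplug the queue after this bh" */
}

/* Count device requests for a list of single-sector writes. */
static int toy_count_requests(const long *sectors, size_t n, int unplug_each)
{
    struct toy_queue q = { -1, 0 };
    for (size_t i = 0; i < n; i++)
        toy_submit(&q, sectors[i], unplug_each);
    toy_unplug(&q);               /* final unplug flushes whatever is left */
    return q.dispatched;
}
```

With eight contiguous sectors, the model dispatches one merged request when unplugging only at the end, and eight separate requests when every submission unplugs.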


* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
  2009-01-04 23:19     ` Andrew Morton
@ 2009-01-05  0:21       ` Theodore Tso
  2009-01-05  8:02       ` Jens Axboe
  1 sibling, 0 replies; 4+ messages in thread
From: Theodore Tso @ 2009-01-05  0:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-ext4, Arjan van de Ven, Jens Axboe

On Sun, Jan 04, 2009 at 03:19:27PM -0800, Andrew Morton wrote:
> >  Still, if we are
> > submitting I/O which we are going to end up waiting on, we really
> > should submit it with WRITE_SYNC, and this patch should optimize
> > writes in other situations; for example, if we fsync() a file, we will
> > also end up calling block_write_full_page(), and so supplying the
> > WRITE_SYNC hint to the block layer would be a Good Thing.
> 
> Is it?  WRITE_SYNC means "unplug the queue after this bh/BIO".  By setting
> it against every bh, don't we risk the generation of more BIOs and
> the loss of merging opportunities?

Good point, yeah, that's a problem.  Some of the IO schedulers also use
REQ_RW_SYNC to prioritize the I/O's above non-sync I/O's.  That's an
orthogonal issue to unplugging the queue; it would be useful to be able
to mark an I/O as "this bio is one that we will eventually end up
waiting to complete", separately from "please unplug the queue
after this bio is submitted".

BTW, I notice that the CFQ io scheduler prioritizes REQ_RW_META bio's
behind REQ_RW_SYNC bio's, but ahead of normal bio requests.  But as
far as I can tell nothing is actually marking requests REQ_RW_META.
What is the intended use for this, and are there plans to make other
I/O schedulers honor REQ_RW_META?

							- Ted
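
The ordering described above (sync first, then metadata, then everything else) can be sketched as a comparator. This is not CFQ's code: the TOY_* flag bits, struct toy_req, and the tie-break by id are assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical flag bits standing in for REQ_RW_SYNC and REQ_RW_META. */
#define TOY_RW_SYNC (1u << 0)
#define TOY_RW_META (1u << 1)

struct toy_req {
    unsigned flags;
    int id;          /* submission order, used only to break ties */
};

/* Lower rank is served first: sync, then metadata, then normal requests. */
static int toy_rank(const struct toy_req *r)
{
    if (r->flags & TOY_RW_SYNC)
        return 0;
    if (r->flags & TOY_RW_META)
        return 1;
    return 2;
}

/* qsort() comparator: order by rank, then by submission order. */
static int toy_cmp(const void *a, const void *b)
{
    const struct toy_req *ra = a, *rb = b;
    int d = toy_rank(ra) - toy_rank(rb);
    return d ? d : ra->id - rb->id;
}

static void toy_sort(struct toy_req *reqs, size_t n)
{
    qsort(reqs, n, sizeof *reqs, toy_cmp);
}
```

A scheduler that ignored the metadata flag would correspond to dropping the second test in toy_rank(), which collapses metadata and normal requests into one class.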


* Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
  2009-01-04 23:19     ` Andrew Morton
  2009-01-05  0:21       ` Theodore Tso
@ 2009-01-05  8:02       ` Jens Axboe
  1 sibling, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2009-01-05  8:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Theodore Tso, linux-mm, linux-ext4, Arjan van de Ven

On Sun, Jan 04 2009, Andrew Morton wrote:
> On Sun, 4 Jan 2009 17:43:51 -0500 Theodore Tso <tytso@mit.edu> wrote:
> 
> > On Sun, Jan 04, 2009 at 02:23:03PM -0800, Andrew Morton wrote:
> > > > Following up with an e-mail thread started by Arjan two months ago,
> > > > (subject: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority), I have
> > > > a patch, just sent to linux-ext4@vger.kernel.org, which fixes the jbd2
> > > > layer to submit journal writes via submit_bh() with WRITE_SYNC.
> > > > Hopefully this might be enough of a priority boost so we don't have to
> > > > force a higher I/O priority level via a buffer_head flag.  However,
> > > > while looking through the code paths, in ordered data mode, we end up
> > > > flushing data pages via the page writeback paths on a per-inode basis,
> > > > and I noticed that even though we are passing in
> > > > wbc.sync_mode=WBC_SYNC_ALL, __block_write_full_page() is using
> > > > submit_bh(WRITE, bh) instead of submit_bh(WRITE_SYNC).
> > > 
> > > But this is all the wrong way to fix the problem, isn't it?
> > > 
> > > The problem is that at one particular point, the current transaction
> > > blocks callers behind the committing transaction's IO completion.
> > > 
> > > Did anyone look at fixing that?  ISTR concluding that a data copy and
> > > shadow-bh arrangement might be needed.
> > 
> > I haven't had time to really drill down into the jbd code yet, and
> > yes, eventually we probably want to do this.
> 
> We do.
> 
> >  Still, if we are
> > submitting I/O which we are going to end up waiting on, we really
> > should submit it with WRITE_SYNC, and this patch should optimize
> > writes in other situations; for example, if we fsync() a file, we will
> > also end up calling block_write_full_page(), and so supplying the
> > WRITE_SYNC hint to the block layer would be a Good Thing.
> 
> Is it?  WRITE_SYNC means "unplug the queue after this bh/BIO".  By setting
> it against every bh, don't we risk the generation of more BIOs and
> the loss of merging opportunities?

But it also implies that the io scheduler will treat the IO as sync even
if it is a write, which seems to be the very effect that Ted is looking
for as well.

Perhaps we should separate it into two behavioural flags instead and
make the unplugging explicit.

-- 
Jens Axboe
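
One way to picture the split suggested above is two independent bits, one for "the submitter will wait on this I/O" and one for "unplug after this submission". The flag names and the helper toy_flags() are hypothetical; the thread does not say what the eventual flags would be called or how callers would set them.

```c
/* Hypothetical flags: the two behaviours that WRITE_SYNC bundles together. */
#define TOY_BIO_SYNC   (1u << 0)  /* scheduler hint: someone will wait on this I/O */
#define TOY_BIO_UNPLUG (1u << 1)  /* kick the queue once this one is queued */

/*
 * Flags for one buffer in a batch. Every buffer of a waited-on batch gets
 * the sync hint, but only the last one asks for an unplug, so earlier
 * buffers still have a chance to merge.
 */
static unsigned toy_flags(int will_wait, int last_in_batch)
{
    unsigned flags = 0;

    if (will_wait)
        flags |= TOY_BIO_SYNC;
    if (will_wait && last_in_batch)
        flags |= TOY_BIO_UNPLUG;
    return flags;
}
```

Under this assumed policy, a batch of buffers written for fsync() would all carry the sync hint while the queue is unplugged once, which addresses both the priority and the merging concerns raised in the thread.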

