linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-ext4@vger.kernel.org,
	Arjan van de Ven <arjan@infradead.org>
Subject: Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
Date: Mon, 5 Jan 2009 09:47:40 -0500	[thread overview]
Message-ID: <20090105144740.GA4116@mit.edu> (raw)
In-Reply-To: <20090105080241.GX32491@kernel.dk>

[ I've removed linux-mm from the cc list since we seem to have
  wandered away from any details of page writeback.  -- Ted]

On Mon, Jan 05, 2009 at 09:02:43AM +0100, Jens Axboe wrote:
> > Is it?  WRITE_SYNC means "unplug the queue after this bh/BIO".  By setting
> > it against every bh, don't we risk the generation of more BIOs and
> > the loss of merging opportunities?
> 
> But it also implies that the io scheduler will treat the IO as sync even
> if it is a write, which seems to be the very effect that Ted is looking
> for as well.

Yeah, but I suspect the downsides caused by the lack of merging will
far outweigh wins from giving the hints to the I/O scheduler.
Separting things into two behavioural flags sounds like the the right
thing to me.  Until that happens, I've dropped my proposed patch and
substituted a patch which allows the user to diddle the I/O priorities
of kjournald2 via a mount option so we can experiment a bit.

I agree that Andrew's long-term solution is probably the right one,
but there will be times (i.e., when we are doing a checkpoint as
opposed to a commit, or in a fsync-heavy workload), where we will end
up getting blocked behind kjournald, so upping the I/O priority really
does make sense.  So a tunable mount option seems to make sense even
in the long run, since for some workloads we'll want to adjust
kjournald's I/O priority even after we stop normal (non-fsync) I/O
from blocking against the commit operation done by kjournald.

Jens, one question....  I came across the folllowing in blkdev.h:

      __REQ_RW_SYNC,		/* request is sync (O_DIRECT) */

Is the comment about O_DIRECT still relevant?  I'm not sure it is.

Also, there's another confusing comment in bio.h:

#define BIO_RW		0	/* Must match RW in req flags (blkdev.h) */
#define BIO_RW_AHEAD	1	/* Must match FAILFAST in req flags */
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
#define BIO_RW_FAILFAST_DEV		6
#define BIO_RW_FAILFAST_TRANSPORT	7
#define BIO_RW_FAILFAST_DRIVER		8


In fact, submit_bh() depends on BIO_RW_AHEAD matching with the
definition of READA in fs.h.  I'm a bit confused about the fact that
we have both BIO_RW_AHEAD and BIO_RW_FAILFAST_DEV, and then in the req
flags in blkdev.h:

/*
 * request type modified bits. first two bits match BIO_RW* bits, important
 */
enum rq_flag_bits {
     __REQ_RW,		/* not set, read. set, write */
     __REQ_FAILFAST_DEV,   /* no driver retries of device errors */
     __REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
     __REQ_FAILFAST_DRIVER,    /* no driver retries of driver errors */

I assume when doing readhaead, we don't want to retry in the face of
device errors, which is why it's desirable for __REQ_FAILFAST_DEV
overlays with BIO_RW_AHEAD.  But if that's the case, why are
BIO_RW_READA and BIO_RW_FAILFAST_DEV not mapped to the same BIO_RW_*
flag?

Am I missing something here?  As far as I can tell nothing seems to be
using BIO_RW_FAILFAST_DEV.  Is there a reason why it's not just
#define'd to be the same value as BIO_RW_READA?

Thanks, regards,

						- Ted

  reply	other threads:[~2009-01-05 14:47 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-04 21:52 [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL Theodore Ts'o
2009-01-04 22:23 ` Andrew Morton
2009-01-04 22:43   ` Theodore Tso
2009-01-04 23:19     ` Andrew Morton
2009-01-05  0:21       ` Theodore Tso
2009-01-05  8:02       ` Jens Axboe
2009-01-05 14:47         ` Theodore Tso [this message]
2009-01-05 15:58           ` Jens Axboe
2009-01-05 18:47           ` Andrew Morton
2009-01-05 19:35             ` Theodore Tso
2009-01-05 19:38               ` Jens Axboe
2009-01-05 21:16                 ` Theodore Tso
2009-01-06  7:34                   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090105144740.GA4116@mit.edu \
    --to=tytso@mit.edu \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).