All of lore.kernel.org
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-ext4@vger.kernel.org,
	Arjan van de Ven <arjan@infradead.org>
Subject: Re: [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL
Date: Mon, 5 Jan 2009 09:47:40 -0500	[thread overview]
Message-ID: <20090105144740.GA4116@mit.edu> (raw)
In-Reply-To: <20090105080241.GX32491@kernel.dk>

[ I've removed linux-mm from the cc list since we seem to have
  wandered away from any details of page writeback.  -- Ted]

On Mon, Jan 05, 2009 at 09:02:43AM +0100, Jens Axboe wrote:
> > Is it?  WRITE_SYNC means "unplug the queue after this bh/BIO".  By setting
> > it against every bh, don't we risk the generation of more BIOs and
> > the loss of merging opportunities?
> 
> But it also implies that the io scheduler will treat the IO as sync even
> if it is a write, which seems to be the very effect that Ted is looking
> for as well.

Yeah, but I suspect the downsides caused by the lack of merging will
far outweigh wins from giving the hints to the I/O scheduler.
Separting things into two behavioural flags sounds like the the right
thing to me.  Until that happens, I've dropped my proposed patch and
substituted a patch which allows the user to diddle the I/O priorities
of kjournald2 via a mount option so we can experiment a bit.

I agree that Andrew's long-term solution is probably the right one,
but there will be times (i.e., when we are doing a checkpoint as
opposed to a commit, or in a fsync-heavy workload), where we will end
up getting blocked behind kjournald, so upping the I/O priority really
does make sense.  So a tunable mount option seems to make sense even
in the long run, since for some workloads we'll want to adjust
kjournald's I/O priority even after we stop normal (non-fsync) I/O
from blocking against the commit operation done by kjournald.

Jens, one question....  I came across the folllowing in blkdev.h:

      __REQ_RW_SYNC,		/* request is sync (O_DIRECT) */

Is the comment about O_DIRECT still relevant?  I'm not sure it is.

Also, there's another confusing comment in bio.h:

#define BIO_RW		0	/* Must match RW in req flags (blkdev.h) */
#define BIO_RW_AHEAD	1	/* Must match FAILFAST in req flags */
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
#define BIO_RW_FAILFAST_DEV		6
#define BIO_RW_FAILFAST_TRANSPORT	7
#define BIO_RW_FAILFAST_DRIVER		8


In fact, submit_bh() depends on BIO_RW_AHEAD matching with the
definition of READA in fs.h.  I'm a bit confused about the fact that
we have both BIO_RW_AHEAD and BIO_RW_FAILFAST_DEV, and then in the req
flags in blkdev.h:

/*
 * request type modified bits. first two bits match BIO_RW* bits, important
 */
enum rq_flag_bits {
     __REQ_RW,		/* not set, read. set, write */
     __REQ_FAILFAST_DEV,   /* no driver retries of device errors */
     __REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
     __REQ_FAILFAST_DRIVER,    /* no driver retries of driver errors */

I assume when doing readhaead, we don't want to retry in the face of
device errors, which is why it's desirable for __REQ_FAILFAST_DEV
overlays with BIO_RW_AHEAD.  But if that's the case, why are
BIO_RW_READA and BIO_RW_FAILFAST_DEV not mapped to the same BIO_RW_*
flag?

Am I missing something here?  As far as I can tell nothing seems to be
using BIO_RW_FAILFAST_DEV.  Is there a reason why it's not just
#define'd to be the same value as BIO_RW_READA?

Thanks, regards,

						- Ted

  reply	other threads:[~2009-01-05 14:47 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-04 21:52 [PATCH, RFC] Use WRITE_SYNC in __block_write_full_page() if WBC_SYNC_ALL Theodore Ts'o
2009-01-04 22:23 ` Andrew Morton
2009-01-04 22:43   ` Theodore Tso
2009-01-04 22:43     ` Theodore Tso
2009-01-04 23:19     ` Andrew Morton
2009-01-05  0:21       ` Theodore Tso
2009-01-05  8:02       ` Jens Axboe
2009-01-05 14:47         ` Theodore Tso [this message]
2009-01-05 15:58           ` Jens Axboe
2009-01-05 18:47           ` Andrew Morton
2009-01-05 19:35             ` Theodore Tso
2009-01-05 19:38               ` Jens Axboe
2009-01-05 21:16                 ` Theodore Tso
2009-01-06  7:34                   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090105144740.GA4116@mit.edu \
    --to=tytso@mit.edu \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.