All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: jaxboe@fusionio.com, linux-fsdevel@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	hch@lst.de, James.Bottomley@suse.de, tytso@mit.edu,
	chris.mason@oracle.com, swhiteho@redhat.com,
	konishi.ryusuke@lab.ntt.co.jp, dm-devel@redhat.com, vst@vlnb.net,
	jack@suse.cz, rwheeler@redhat.com, hare@suse.de, neilb@suse.de,
	rusty@rustcorp.com.au, mst@redhat.com,
	Mikulas Patocka <mpatocka@redhat.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
	Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Subject: Re: [PATCH 5/5] dm: implement REQ_FLUSH/FUA support
Date: Tue, 17 Aug 2010 10:07:35 -0400	[thread overview]
Message-ID: <20100817140734.GA30768@redhat.com> (raw)
In-Reply-To: <4C6A5780.2090100@kernel.org>

On Tue, Aug 17 2010 at  5:33am -0400,
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On 08/16/2010 09:02 PM, Mike Snitzer wrote:
> > On Mon, Aug 16 2010 at 12:52pm -0400,
> > Tejun Heo <tj@kernel.org> wrote:
> > 
> >> From: Tejun Heo <tj@kernle.org>
> >>
> >> This patch converts dm to support REQ_FLUSH/FUA instead of now
> >> deprecated REQ_HARDBARRIER.
> > 
> > What tree does this patch apply to?  I know it doesn't apply to
> > v2.6.36-rc1, e.g.: http://git.kernel.org/linus/708e929513502fb0
> 
> (from the head message)
> These patches are on top of
> 
>   block#for-2.6.36-post (c047ab2dddeeafbd6f7c00e45a13a5c4da53ea0b)
> + block-replace-barrier-with-sequenced-flush patchset[1]
> + block-fix-incorrect-bio-request-flag-conversion-in-md patch[2]
> 
> and available in the following git tree.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua
> 
> [1] http://thread.gmane.org/gmane.linux.kernel/1022363
> [2] http://thread.gmane.org/gmane.linux.kernel/1023435
> 
> Probably fetching the git tree is the easist way to review?

OK, I missed this info because I just looked at the DM patch.
 
> >> For bio-based dm,
> >> * -EOPNOTSUPP retry logic dropped.
> >
> > That logic wasn't just about retries (at least not in the latest
> > kernel).  With commit 708e929513502fb0 the -EOPNOTSUPP checking also
> > serves to optimize the barrier+discard case (when discards aren't
> > supported).
> 
> With the patch applied, there's no second flush.  Those requests would
> now be REQ_FLUSH + REQ_DISCARD.  The first can't be avoided anyway and
> there won't be the second flush to begin with, so I don't think this
> worsens anything.

Makes sense, but your patches still need to be refreshed against the
latest (2.6.36-rc1) upstream code.  Numerous changes went in to DM
recently.

> >> * Nothing much changes.  It just needs to handle FLUSH requests as
> >>   before.  It would be beneficial to advertise FUA capability so that
> >>   it can propagate FUA flags down to member request_queues instead of
> >>   sequencing it as WRITE + FLUSH at the top queue.
> > 
> > Can you expand on that TODO a bit?  What is the mechanism to propagate
> > FUA down to a DM device's members?  I'm only aware of propagating member
> > devices' features up to the top-level DM device's request-queue (not the
> > opposite).
> > 
> > Are you saying that establishing the FUA capability on the top-level DM
> > device's request_queue is sufficient?  If so then why not make the
> > change?
> 
> Yeah, I think it would be enough to always advertise FLUSH|FUA if the
> member devices support FLUSH (regardless of FUA support).  The reason
> why I didn't do it was, umm, laziness, I suppose.

I don't buy it.. you're far from lazy! ;)

> >> Lightly tested linear, stripe, raid1, snap and crypt targets.  Please
> >> proceed with caution as I'm not familiar with the code base.
> > 
> > This is concerning...
> 
> Yeap, I want you to be concerned. :-) This was the first time I looked
> at the dm code and there are many different disjoint code paths and I
> couldn't fully follow or test all of them, so it definitely needs a
> careful review from someone who understands the whole thing.

You'll need Mikulas (bio-based) and NEC (request-based, Kiyoshi and
Jun'ichi) to give it serious review.

NOTE: NEC has already given some preliminary feedback to hch in the
"[PATCH, RFC 2/2] dm: support REQ_FLUSH directly" thread:
https://www.redhat.com/archives/dm-devel/2010-August/msg00026.html
https://www.redhat.com/archives/dm-devel/2010-August/msg00033.html

> > if we're to offer more comprehensive review I think we need more
> > detail on what guided your changes rather than details of what the
> > resulting changes are.
> 
> I'll try to explain it.  If you have any further questions, please let
> me know.

Thanks for the additional details.

> * For bio based dm:
> 
>   * Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA doesn't have any ordering
>     requirements.  Remove assumptions of ordering and/or draining.
> 
>     A related question: Is dm_wait_for_completion() used in
>     process_flush() safe against starvation under continuous influx of
>     other commands?

OK, so you folded dm_flush() directly into process_flush() -- the code
that was dm_flush() only needs to be called once now.

As for your specific dm_wait_for_completion() concern -- I'll defer to
Mikulas.  But I'll add: we haven't had any reported starvation issues
with DM's existing barrier support.  DM uses a mempool for its clones,
so it should naturally throttle (without starvation) when memory gets
low.

>   * As REQ_FLUSH/FUA doesn't require any ordering of requests before
>     or after it, on array devices, the latter part - REQ_FUA - can be
>     handled like other writes.  ie. REQ_FLUSH needs to be broadcasted
>     to all devices but once that is complete the data/REQ_FUA bio can
>     be sent to only the affected devices.  This needs some care as
>     there are bio cloning/splitting code paths where REQ_FUA bit isn't
>     preserved.
> 
>   * Guarantee that REQ_FLUSH w/ data never reaches targets (this in
>     part is to put it in alignment with request based dm).

bio-based DM already split the barrier out from the data (in
process_barrier).  You've renamed process_barrier to process_flush and
added the REQ_FLUSH logic like I'd expect.

> * For request based dm:
> 
>   * The sequencing is done by the block layer for the top level
>     request_queue, so the only things request based dm needs to make
>     sure is 1. handling empty REQ_FLUSH correctly (block layer will
>     only send down empty REQ_FLUSHes) and 2. propagate REQ_FUA bit to
>     member devices.

OK, so seems 1 is done, 2 is still TODO.  Looking at your tree it seems
2 would be as simple as using the following in
dm_init_request_based_queue (on the most current upstream dm.c):

blk_queue_flush(q, REQ_FLUSH | REQ_FUA);

(your current patch only sets REQ_FLUSH in alloc_dev).

  parent reply	other threads:[~2010-08-17 14:07 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-16 16:51 [RFC PATCHSET block#for-2.6.36-post] block: convert to REQ_FLUSH/FUA Tejun Heo
2010-08-16 16:51 ` Tejun Heo
2010-08-16 16:51 ` [PATCH 1/5] block/loop: implement REQ_FLUSH/FUA support Tejun Heo
2010-08-16 16:51 ` Tejun Heo
2010-08-16 16:51   ` Tejun Heo
2010-08-16 16:52 ` [PATCH 2/5] virtio_blk: " Tejun Heo
2010-08-16 16:52 ` Tejun Heo
2010-08-16 16:52   ` Tejun Heo
2010-08-16 18:33   ` Christoph Hellwig
2010-08-17  8:17     ` Tejun Heo
2010-08-17 13:23       ` Christoph Hellwig
2010-08-17 16:22         ` Tejun Heo
2010-08-18 10:22         ` Rusty Russell
2010-08-17  1:16   ` Rusty Russell
2010-08-17  8:18     ` Tejun Heo
2010-08-19 15:14   ` [PATCH 2/5 UPDATED] virtio_blk: drop REQ_HARDBARRIER support Tejun Heo
2010-08-16 16:52 ` [PATCH 3/5] lguest: replace VIRTIO_F_BARRIER support with VIRTIO_F_FLUSH/FUA support Tejun Heo
2010-08-16 16:52   ` Tejun Heo
2010-08-19 15:15   ` [PATCH 3/5] lguest: replace VIRTIO_F_BARRIER support with VIRTIO_F_FLUSH support Tejun Heo
2010-08-16 16:52 ` [PATCH 3/5] lguest: replace VIRTIO_F_BARRIER support with VIRTIO_F_FLUSH/FUA support Tejun Heo
2010-08-16 16:52 ` [PATCH 4/5] md: implment REQ_FLUSH/FUA support Tejun Heo
2010-08-16 16:52 ` Tejun Heo
2010-08-16 16:52   ` Tejun Heo
2010-08-24  5:41   ` Neil Brown
2010-08-25 11:22     ` [PATCH UPDATED " Tejun Heo
2010-08-25 11:42       ` Neil Brown
2010-08-16 16:52 ` [PATCH 5/5] dm: implement " Tejun Heo
2010-08-16 16:52   ` Tejun Heo
2010-08-16 19:02   ` Mike Snitzer
2010-08-17  9:33     ` Tejun Heo
2010-08-17 13:13       ` Christoph Hellwig
2010-08-17 14:07       ` Mike Snitzer [this message]
2010-08-17 16:51         ` Tejun Heo
2010-08-17 18:21           ` Mike Snitzer
2010-08-17 18:21             ` Mike Snitzer
2010-08-18  6:32             ` Tejun Heo
2010-08-19 10:32           ` Kiyoshi Ueda
2010-08-19 15:45             ` Tejun Heo
2010-08-16 16:52 ` Tejun Heo
2010-08-18  9:53 ` [RFC PATCHSET block#for-2.6.36-post] block: convert to REQ_FLUSH/FUA Christoph Hellwig
2010-08-18 14:26   ` James Bottomley
2010-08-18 14:33     ` Christoph Hellwig
2010-08-19 15:37     ` FUJITA Tomonori
2010-08-19 15:41       ` Christoph Hellwig
2010-08-19 15:56         ` FUJITA Tomonori
2010-08-23 16:47 ` Christoph Hellwig
2010-08-24  9:51   ` Lars Ellenberg
2010-08-24 15:45   ` Philipp Reisner
     [not found]     ` <20101022083511.GA7853@lst.de>
2010-10-23 11:18       ` [GIT PULL] convert DRBD " Philipp Reisner
2010-10-23 11:21         ` Christoph Hellwig
2010-10-23 16:48         ` Jens Axboe
2010-10-23 16:59           ` [PATCH] block: remove REQ_HARDBARRIER Christoph Hellwig
2010-10-23 17:17             ` Matthew Wilcox
2010-10-23 19:07             ` Jens Axboe
2010-10-24 11:58               ` Christoph Hellwig
2010-11-10 13:42               ` Christoph Hellwig
2010-11-10 13:59                 ` Jens Axboe
2010-10-23 19:08             ` Jens Axboe
2010-10-25  8:54             ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100817140734.GA30768@redhat.com \
    --to=snitzer@redhat.com \
    --cc=James.Bottomley@suse.de \
    --cc=chris.mason@oracle.com \
    --cc=dm-devel@redhat.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=jack@suse.cz \
    --cc=jaxboe@fusionio.com \
    --cc=k-ueda@ct.jp.nec.com \
    --cc=konishi.ryusuke@lab.ntt.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=mst@redhat.com \
    --cc=neilb@suse.de \
    --cc=rusty@rustcorp.com.au \
    --cc=rwheeler@redhat.com \
    --cc=swhiteho@redhat.com \
    --cc=tj@kernel.org \
    --cc=tytso@mit.edu \
    --cc=vst@vlnb.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.