Linux block layer
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Matias Bjorling <Matias.Bjorling@wdc.com>
Subject: Re: [PATCH] block: Disable write plugging for zoned block devices
Date: Wed, 10 Jul 2019 11:10:05 +0800	[thread overview]
Message-ID: <20190710031003.GB2621@ming.t460p> (raw)
In-Reply-To: <BYAPR04MB58162A2087D53F27BAF8176BE7F10@BYAPR04MB5816.namprd04.prod.outlook.com>

On Tue, Jul 09, 2019 at 02:47:12PM +0000, Damien Le Moal wrote:
> Hi Ming,
> 
> On 2019/07/09 23:29, Ming Lei wrote:
> > On Tue, Jul 09, 2019 at 06:02:19PM +0900, Damien Le Moal wrote:
> >> Simultaneously writing to a sequential zone of a zoned block device
> >> from multiple contexts requires mutual exclusion for BIO issuing to
> >> ensure that writes happen sequentially. However, even for a well
> >> behaved user correctly implementing such synchronization, BIO plugging
> >> may interfere and result in BIOs from the different contextx to be
> >> reordered if plugging is done outside of the mutual exclusion section,
> >> e.g. the plug was started by a function higher in the call chain than
> >> the function issuing BIOs.
> >>
> >>       Context A                           Context B
> >>
> >>    | blk_start_plug()
> >>    | ...
> >>    | seq_write_zone()
> >>      | mutex_lock(zone)
> >>      | submit_bio(bio-0)
> >>      | submit_bio(bio-1)
> >>      | mutex_unlock(zone)
> >>      | return
> >>    | ------------------------------> | seq_write_zone()
> >>   				       | mutex_lock(zone)
> >> 				       | submit_bio(bio-2)
> >> 				       | mutex_unlock(zone)
> >>    | <------------------------------ |
> >>    | blk_finish_plug()
> >>
> >> In the above example, despite the mutex synchronization resulting in the
> >> correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
> >> issued after BIO 2 when the plug is released with blk_finish_plug().
> > 
> > I am wondering how you guarantee that context B is always run after
> > context A.
> 
> My example was a little too oversimplified. Think of a file system allocating
> blocks sequentially and issuing page I/Os for the allocated blocks sequentially.
> The typical sequence is:
> 
> mutex_lock(zone)
> alloc_block_extent(zone)
> for all blocks in the extent
> 	submit_bio()
> mutex_unlock(zone)
> 
> This way, it does not matter which context gets the lock first, all write BIOs
> for the zone remain sequential. The problem with plugs as explained above is

But wrt. the example in the commit log, it does matter which context gets the lock
first, and it implies that context A has to run seq_write_zone() first,
because you mentioned bio-2 has to be issued after bio-0 and bio-1.

If there is 3rd context which is holding the lock, then either context A or
context B can win in getting the lock first. So looks the zone lock itself
isn't enough for maintaining the IO order. But that may not be related
with this patch.

Also seems there is issue with REQ_NOWAIT for zone support, for example,
context A may see out-of-request and return earlier, however context B
may get request and move on.

> that if the plug start/finish is not within the zone lock, reordering can happen
> for the 2 sequences of BIOs issued by the 2 contexts.
> 
> We hit this problem with btrfs writepages (page writeback) where plugging is
> done before the above sequence execution, in the caller function of the page
> writeback processing, resulting in unaligned write errors.
> 
> >> To fix this problem, introduce the internal helper function
> >> blk_mq_plug() to access the current context plug, return the current
> >> plug only if the target device is not a zoned block device or if the
> >> BIO to be plugged not a write operation. Otherwise, ignore the plug and
> >> return NULL, resulting is all writes to zoned block device to never be
> >> plugged.
> > 
> > Another candidate approach is to run the following code before
> > releasing 'zone' lock:
> > 
> > 	if (current->plug)
> > 		blk_finish_plug(context->plug)
> > 
> > Then we can fix zone specific issue in zone code only, and avoid generic
> > blk-core change for zone issue.
> 
> Yes indeed, that would work too. But this patch is precisely to avoid having to
> add such code and simplify implementing support for zoned block device in
> existing code. Furthermore, plugging for writes to sequential zones has no real
> value because mq-deadline will dispatch at most one write per zone. So writes
> for a single zone tend to accumulate in the scheduler queue, and that creates
> plenty of opportunities for merging small sequential writes (e.g. file system
> page BIOs).
> 
> If you think this patch is really not appropriate, we can still address the
> problem case by case in the support we add for zoned devices. But again, a
> generic solution makes things simpler I think.

OK, then I am fine with this simple generic approach.

Thanks,
Ming

  reply	other threads:[~2019-07-10  3:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-09  9:02 [PATCH] block: Disable write plugging for zoned block devices Damien Le Moal
2019-07-09 11:15 ` Johannes Thumshirn
2019-07-09 13:51 ` Bart Van Assche
2019-07-09 14:59   ` Damien Le Moal
2019-07-09 14:29 ` Ming Lei
2019-07-09 14:47   ` Damien Le Moal
2019-07-10  3:10     ` Ming Lei [this message]
2019-07-10  4:14       ` Damien Le Moal
2019-07-10  2:55   ` Jens Axboe
2019-07-10  2:59     ` Damien Le Moal
2019-07-10 13:59       ` Jens Axboe
2019-07-10  2:32 ` Bart Van Assche
2019-07-10  2:36   ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190710031003.GB2621@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Matias.Bjorling@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox