From: Junichi Nomura <j-nomura@ce.jp.nec.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>,
	device-mapper development <dm-devel@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: Re: [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones
Date: Thu, 28 May 2015 06:38:29 +0000
Message-ID: <5566B7E5.1070101@ce.jp.nec.com>
In-Reply-To: <5565935A.4060408@ce.jp.nec.com>

[-- Attachment #1: Type: text/plain, Size: 3053 bytes --]

On 05/27/15 18:50, Junichi Nomura wrote:
> On 05/27/15 17:21, Christoph Hellwig wrote:
>> On Tue, May 26, 2015 at 06:20:43AM +0000, Junichi Nomura wrote:
>>> Not completing bios is not sufficient.
>>> If you advance the bi_iter to the end, you need to somehow rewind it,
>>> or the re-submission will be incomplete, which would end up as data
>>> corruption...

Although it is less critical than the data corruption issue,
I'm also worried about the partial completion case.
On a successful partial completion, the current code completes
the bios covered by that portion before the whole request is
completed. With your patch, no bio is completed until the
request is fully completed.

A related concern is partial failure. In the case of a bad sector,
for example, the current code fails I/O only for that sector while
the other sectors in the request succeed.
If you make request completion an all-or-nothing model,
that will be a regression for such a case.
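To make the difference between the two completion models concrete,
here is a minimal C sketch. It is a toy model, not kernel code: the
request layout (NR_BIOS bios of SECTORS_PER_BIO sectors each) and all
function names are hypothetical and only illustrate when bios get
ended and which of them fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy request: NR_BIOS bios, each covering
 * SECTORS_PER_BIO consecutive sectors (not kernel structures). */
#define NR_BIOS 4
#define SECTORS_PER_BIO 8
#define NR_SECTORS (NR_BIOS * SECTORS_PER_BIO)

enum bio_status { BIO_OK, BIO_ERROR };

/* Partial success: bios that can already be ended once the device
 * has reported the first 'sectors_done' sectors as finished. */
static int bios_ended_partial(int sectors_done)
{
	assert(sectors_done >= 0 && sectors_done <= NR_SECTORS);
	return sectors_done / SECTORS_PER_BIO;
}

/* All-or-nothing: no bio is ended until the whole request is done. */
static int bios_ended_all_or_nothing(int sectors_done)
{
	assert(sectors_done >= 0 && sectors_done <= NR_SECTORS);
	return sectors_done == NR_SECTORS ? NR_BIOS : 0;
}

/* Partial failure: only the bio containing 'bad_sector' fails.
 * Pass bad_sector = -1 for "no bad sector". */
static void fail_partial(enum bio_status st[NR_BIOS], int bad_sector)
{
	for (int i = 0; i < NR_BIOS; i++)
		st[i] = (bad_sector >= 0 && bad_sector / SECTORS_PER_BIO == i)
			? BIO_ERROR : BIO_OK;
}

/* All-or-nothing failure: a single bad sector fails every bio. */
static void fail_all_or_nothing(enum bio_status st[NR_BIOS], int bad_sector)
{
	for (int i = 0; i < NR_BIOS; i++)
		st[i] = bad_sector >= 0 ? BIO_ERROR : BIO_OK;
}

/* Count the bios whose data was delivered successfully. */
static int count_ok(const enum bio_status st[NR_BIOS])
{
	int n = 0;

	for (int i = 0; i < NR_BIOS; i++)
		n += st[i] == BIO_OK;
	return n;
}
```

For a 32-sector request with sector 10 bad, the partial model still
delivers three of the four bios, while the all-or-nothing model fails
all four; after 20 sectors, the partial model has already ended two
bios and the all-or-nothing model none.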

I'm not sure how much real-world impact removing partial
completion would have.
If partial completion is negligible, I think it should be
handled that way in all cases, instead of special-casing
REQ_CLONE.

>> Can you explain which particular case you're worried about?
> 
> General path failure case.
> 
> On retrying, another clone is created, but with your patch the bios
> it points to have already been advanced to the end.
> So they look like bios with no remaining segments.
> The lower driver may then successfully complete such a resubmitted
> clone *without doing actual I/O*.
> As a result, written data will be lost / read data will be bogus.
> 
> Can you test this scenario with your patch?
>   1. Set up a multipath device with fail-over mode
>   2. Write something to the multipath device.
>      After the clone request is sent to the primary path
>      and before the data goes to the disk,
>      take down the primary path
>      (e.g. echo offline > /sys/block/sdXX/device/state)
>   3. (dm-mpath will retry from the secondary path and
>       the write will eventually succeed)
>   4. Verify if the written data is really on the disk
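The failure mode quoted above can be reproduced in a toy C model.
This is a hypothetical sketch, not kernel code: sim_iter only mimics
the current-sector and remaining-size fields of the kernel's
struct bvec_iter, the "driver" writes into an in-memory array, and the
failed primary-path attempt is assumed to leave the shared iterator
advanced to the end, as described above. Snapshotting the iterator
before submission and restoring it before the retry is just one
possible way to rewind; it is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIM_SECTOR 512u
#define SIM_NR_SECTORS 8u

/* Toy disk backing the simulated lower device. */
static unsigned char disk[SIM_NR_SECTORS * SIM_SECTOR];

/* Hypothetical stand-in for struct bvec_iter: only the current
 * sector and the number of bytes still to transfer. */
struct sim_iter {
	unsigned long long sector;
	unsigned int size;
};

/* Hypothetical simplified bio: one contiguous payload. */
struct sim_bio {
	unsigned long long start_sector; /* first sector of the payload */
	const unsigned char *data;       /* payload buffer */
	struct sim_iter iter;            /* what is left to transfer */
};

/* Move the iterator forward by 'bytes' (a multiple of SIM_SECTOR). */
static void sim_advance(struct sim_iter *it, unsigned int bytes)
{
	assert(bytes <= it->size && bytes % SIM_SECTOR == 0);
	it->size -= bytes;
	it->sector += bytes / SIM_SECTOR;
}

/* Toy lower driver: transfers exactly what the iterator says remains. */
static bool sim_driver_write(struct sim_bio *bio, bool path_up)
{
	if (!path_up) {
		/* Assumed behavior from the report above: the failed
		 * attempt leaves the shared iterator at the end. */
		sim_advance(&bio->iter, bio->iter.size);
		return false;
	}
	if (bio->iter.size == 0)
		return true; /* nothing left: "succeeds" without any I/O */

	assert(bio->iter.sector * SIM_SECTOR + bio->iter.size <= sizeof(disk));
	size_t off = (size_t)(bio->iter.sector - bio->start_sector) * SIM_SECTOR;
	memcpy(disk + bio->iter.sector * SIM_SECTOR, bio->data + off,
	       bio->iter.size);
	sim_advance(&bio->iter, bio->iter.size);
	return true;
}

/* Fail-over in the spirit of dm-mpath: the primary path is down, so
 * retry on the secondary. With 'rewind', restore the iterator that
 * was saved before the first attempt. */
static bool sim_mpath_write(struct sim_bio *bio, bool rewind)
{
	struct sim_iter saved = bio->iter;

	if (sim_driver_write(bio, false))
		return true;
	if (rewind)
		bio->iter = saved;
	return sim_driver_write(bio, true);
}
```

With rewind == false the retry reports success although nothing
reaches the disk; with rewind == true the payload is written.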

I made a small script that people can play with.
The script sets up a tcm_loop multipath device and runs an fio
verification test while repeatedly taking paths up and down.

With your patch applied, fio reports a verification failure within
a minute, like this:

# ./stress-mp.sh
..
test1: (g=0): rw=randwrite, bs=512K-512K/512K-512K/512K-512K, ioengine=libaio, iodepth=2
fio-2.2.8-16-g68d9
Starting 1 process
meta: verify failed at file /dev/mapper/mp offset 477626368, length 524288
       received data dumped as mp.477626368.received
       expected data dumped as mp.477626368.expected
fio: pid=13560, err=84/file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character

test1: (groupid=0, jobs=1): err=84 (file:io_u.c:1866, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=13560: Thu May 28 01:54:56 2015
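An offset-stamped check like the one reported above ("meta: verify
failed") catches this because each block's expected contents depend
on where and in which generation it was written, so a write that is
acknowledged without doing I/O leaves stale data behind that no longer
matches. A minimal C sketch of the idea follows; the 12-byte header
(offset plus a write sequence number), the byte pattern and the
function names are assumptions for illustration, not fio's actual
verify format or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLK 512u
#define NR_BLKS 16u
#define HDR_LEN 12u /* 8-byte offset + 4-byte sequence number */

/* Toy device contents. */
static unsigned char dev[NR_BLKS * BLK];

/* Build the expected contents of one block: a small header recording
 * where and in which generation it was written, then a derived pattern. */
static void fill_block(unsigned char *buf, uint64_t offset, uint32_t seq)
{
	memcpy(buf, &offset, sizeof(offset));
	memcpy(buf + sizeof(offset), &seq, sizeof(seq));
	for (size_t i = HDR_LEN; i < BLK; i++)
		buf[i] = (unsigned char)(offset + seq + i);
}

/* Write one block; 'dropped' models a write that is acknowledged
 * as successful without any data reaching the device. */
static void write_block(uint64_t offset, uint32_t seq, bool dropped)
{
	assert(offset % BLK == 0 && offset + BLK <= sizeof(dev));
	if (!dropped)
		fill_block(dev + offset, offset, seq);
}

/* Read back and compare against what generation 'seq' should have left. */
static bool verify_block(uint64_t offset, uint32_t seq)
{
	unsigned char expected[BLK];

	assert(offset % BLK == 0 && offset + BLK <= sizeof(dev));
	fill_block(expected, offset, seq);
	return memcmp(dev + offset, expected, BLK) == 0;
}
```

A dropped second-generation write leaves the first-generation block
in place, so verify_block(BLK, 2) fails while verify_block(BLK, 1)
still passes.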

-- 
Jun'ichi Nomura, NEC Corporation


[-- Attachment #2: stress-mp.sh --]
[-- Type: application/x-sh, Size: 1677 bytes --]

Thread overview: 14+ messages
2015-05-22 13:14 [PATCH for-4.2 1/3] block: remove management of bi_remaining when restoring original bi_end_io Mike Snitzer
2015-05-22 13:14 ` [PATCH for-4.2 2/3] block, dm: don't copy bios for request clones Mike Snitzer
2015-05-26  6:20   ` Junichi Nomura
2015-05-27  8:21     ` Christoph Hellwig
2015-05-27  9:50       ` Junichi Nomura
2015-05-28  6:38         ` Junichi Nomura [this message]
2015-05-29 16:54           ` Christoph Hellwig
2015-06-01  1:14             ` Junichi Nomura
2015-06-03  7:37               ` Christoph Hellwig
2015-06-03 23:13                 ` Junichi Nomura
2015-05-29 16:55         ` Christoph Hellwig
2015-06-01  1:19           ` Junichi Nomura
2015-06-03  7:39             ` Christoph Hellwig
2015-05-22 13:14 ` [PATCH for-4.2 3/3] block: remove export for blk_queue_bio Mike Snitzer