linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	axboe@kernel.dk, linux-block@vger.kernel.org,
	xfs <linux-xfs@vger.kernel.org>
Subject: Block device direct read EIO handling broken?
Date: Mon, 5 Aug 2019 11:15:24 -0700	[thread overview]
Message-ID: <20190805181524.GE7129@magnolia> (raw)

Hi Damien,

I noticed a regression in xfs/747 (an unreleased xfstest for the
xfs_scrub media scanning feature) on 5.3-rc3.  I'll condense that down
to a simpler reproducer:

# dmsetup table
error-test: 0 209 linear 8:48 0
error-test: 209 1 error 
error-test: 210 6446894 linear 8:48 210

Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO
for sector 209 and to pass the io to the scsi device everywhere else.

On 5.3-rc3, performing a directio pread of this range with a < 1M buffer
(in other words, a request for fewer than MAX_BIO_PAGES bytes) yields
EIO like you'd expect:

# strace -e pread64 xfs_io -d -c 'pread -b 1024k 0k 1120k' /dev/mapper/error-test
pread64(3, 0x7f880e1c7000, 1048576, 0)  = -1 EIO (Input/output error)
pread: Input/output error
+++ exited with 0 +++

But doing it with a larger buffer succeeds(!):

# strace -e pread64 xfs_io -d -c 'pread -b 2048k 0k 1120k' /dev/mapper/error-test
pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880
read 1146880/1146880 bytes at offset 0
1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec)
+++ exited with 0 +++

(Note that the part of the buffer corresponding to the dm-error area is
uninitialized)

On 5.3-rc2, both commands would fail with EIO like you'd expect.  The
only change between rc2 and rc3 is commit 0eb6ddfb865c ("block: Fix
__blkdev_direct_IO() for bio fragments").

AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets
split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB)
and a second one for the 96k after that.

I think the problem is that every time we submit a bio, we increase ret
by the size of that bio, but at the time we do that we have no idea if
the bio is going to succeed or not.  At the end of the function we do:

	if (!ret)
		ret = blk_status_to_errno(dio->bio.bi_status);

Which means that we only pick up the IO error if we haven't already set
ret.  I suppose that was useful for being able to return a short read,
but now that we always increment ret by the size of the bio, we act like
the whole buffer was read.  I tried a -rc2 kernel and found that 40% of
the time I'd get an EIO and the rest of the time I got a short read.

Not sure where to go from here, but something's not right...

--D

             reply	other threads:[~2019-08-05 18:15 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-05 18:15 Darrick J. Wong [this message]
2019-08-05 18:31 ` Block device direct read EIO handling broken? Jens Axboe
2019-08-05 20:31   ` Jens Axboe
2019-08-05 20:54     ` Jens Axboe
2019-08-05 21:08     ` Damien Le Moal
2019-08-05 21:25       ` Jens Axboe
2019-08-05 21:27         ` Damien Le Moal
2019-08-05 21:28           ` Jens Axboe
2019-08-05 21:59             ` Damien Le Moal
2019-08-05 22:05               ` Damien Le Moal
2019-08-06  0:05                 ` Damien Le Moal
2019-08-06  0:23                   ` Dave Chinner
2019-08-06 11:32                     ` Damien Le Moal
2019-08-06  4:09                   ` Jens Axboe
2019-08-06  7:05                     ` Damien Le Moal
2019-08-06 13:34                       ` Jens Axboe
2019-08-07  9:42                         ` Damien Le Moal
2019-08-06 13:23                     ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190805181524.GE7129@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).