All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	linux-block@vger.kernel.org, xfs <linux-xfs@vger.kernel.org>
Subject: Re: Block device direct read EIO handling broken?
Date: Mon, 5 Aug 2019 13:54:46 -0700	[thread overview]
Message-ID: <1de52140-bc7b-2f1f-0c12-a2f453defdcd@kernel.dk> (raw)
In-Reply-To: <36973a52-e876-fc09-7a63-2fc16b855f8d@kernel.dk>

On 8/5/19 1:31 PM, Jens Axboe wrote:
> On 8/5/19 11:31 AM, Jens Axboe wrote:
>> On 8/5/19 11:15 AM, Darrick J. Wong wrote:
>>> Hi Damien,
>>>
>>> I noticed a regression in xfs/747 (an unreleased xfstest for the
>>> xfs_scrub media scanning feature) on 5.3-rc3.  I'll condense that down
>>> to a simpler reproducer:
>>>
>>> # dmsetup table
>>> error-test: 0 209 linear 8:48 0
>>> error-test: 209 1 error
>>> error-test: 210 6446894 linear 8:48 210
>>>
>>> Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO
>>> for sector 209 and to pass the io to the scsi device everywhere else.
>>>
>>> On 5.3-rc3, performing a directio pread of this range with a < 1M buffer
>>> (in other words, a request for fewer than MAX_BIO_PAGES bytes) yields
>>> EIO like you'd expect:
>>>
>>> # strace -e pread64 xfs_io -d -c 'pread -b 1024k 0k 1120k' /dev/mapper/error-test
>>> pread64(3, 0x7f880e1c7000, 1048576, 0)  = -1 EIO (Input/output error)
>>> pread: Input/output error
>>> +++ exited with 0 +++
>>>
>>> But doing it with a larger buffer succeeds(!):
>>>
>>> # strace -e pread64 xfs_io -d -c 'pread -b 2048k 0k 1120k' /dev/mapper/error-test
>>> pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880
>>> read 1146880/1146880 bytes at offset 0
>>> 1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec)
>>> +++ exited with 0 +++
>>>
>>> (Note that the part of the buffer corresponding to the dm-error area is
>>> uninitialized)
>>>
>>> On 5.3-rc2, both commands would fail with EIO like you'd expect.  The
>>> only change between rc2 and rc3 is commit 0eb6ddfb865c ("block: Fix
>>> __blkdev_direct_IO() for bio fragments").
>>>
>>> AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets
>>> split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB)
>>> and a second one for the 96k after that.
>>>
>>> I think the problem is that every time we submit a bio, we increase ret
>>> by the size of that bio, but at the time we do that we have no idea if
>>> the bio is going to succeed or not.  At the end of the function we do:
>>>
>>> 	if (!ret)
>>> 		ret = blk_status_to_errno(dio->bio.bi_status);
>>>
>>> Which means that we only pick up the IO error if we haven't already set
>>> ret.  I suppose that was useful for being able to return a short read,
>>> but now that we always increment ret by the size of the bio, we act like
>>> the whole buffer was read.  I tried a -rc2 kernel and found that 40% of
>>> the time I'd get an EIO and the rest of the time I got a short read.
>>>
>>> Not sure where to go from here, but something's not right...
>>
>> I'll take a look.
> 
> How about this? The old code did:
> 
> 	if (!ret)
> 		ret = blk_status_to_errno(dio->bio.bi_status);
> 	if (likely(!ret))
> 		ret = dio->size;
> 
> where 'ret' was just tracking the error. With 'ret' now being the
> positive IO size, we should overwrite it if ret is >= 0, not just if
> it's zero.
> 
> Also kill a use-after-free.

This should be better, we don't want to override 'ret' is bio->bi_status
doesn't indicate an error.


diff --git a/fs/block_dev.c b/fs/block_dev.c
index a6f7c892cb4a..1ac89f4fcbcc 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -386,6 +386,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 
 	ret = 0;
 	for (;;) {
+		ssize_t this_size;
 		int err;
 
 		bio_set_dev(bio, bdev);
@@ -433,13 +434,14 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 				polled = true;
 			}
 
+			this_size = bio->bi_iter.bi_size;
 			qc = submit_bio(bio);
 			if (qc == BLK_QC_T_EAGAIN) {
 				if (!ret)
 					ret = -EAGAIN;
 				goto error;
 			}
-			ret = dio->size;
+			ret += this_size;
 
 			if (polled)
 				WRITE_ONCE(iocb->ki_cookie, qc);
@@ -460,13 +462,14 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 			atomic_inc(&dio->ref);
 		}
 
+		this_size = bio->bi_iter.bi_size;
 		qc = submit_bio(bio);
 		if (qc == BLK_QC_T_EAGAIN) {
 			if (!ret)
 				ret = -EAGAIN;
 			goto error;
 		}
-		ret = dio->size;
+		ret += this_size;
 
 		bio = bio_alloc(gfp, nr_pages);
 		if (!bio) {
@@ -494,7 +497,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 	__set_current_state(TASK_RUNNING);
 
 out:
-	if (!ret)
+	if (ret >= 0 && dio->bio.bi_status)
 		ret = blk_status_to_errno(dio->bio.bi_status);
 
 	bio_put(&dio->bio);

-- 
Jens Axboe


  reply	other threads:[~2019-08-05 20:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-05 18:15 Block device direct read EIO handling broken? Darrick J. Wong
2019-08-05 18:31 ` Jens Axboe
2019-08-05 20:31   ` Jens Axboe
2019-08-05 20:54     ` Jens Axboe [this message]
2019-08-05 21:08     ` Damien Le Moal
2019-08-05 21:25       ` Jens Axboe
2019-08-05 21:27         ` Damien Le Moal
2019-08-05 21:28           ` Jens Axboe
2019-08-05 21:59             ` Damien Le Moal
2019-08-05 22:05               ` Damien Le Moal
2019-08-06  0:05                 ` Damien Le Moal
2019-08-06  0:23                   ` Dave Chinner
2019-08-06 11:32                     ` Damien Le Moal
2019-08-06  4:09                   ` Jens Axboe
2019-08-06  7:05                     ` Damien Le Moal
2019-08-06 13:34                       ` Jens Axboe
2019-08-07  9:42                         ` Damien Le Moal
2019-08-06 13:23                     ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1de52140-bc7b-2f1f-0c12-a2f453defdcd@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=Damien.LeMoal@wdc.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.