From: Ming Lei <ming.lei@redhat.com>
To: Martin Wilck <mwilck@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.com>,
Hannes Reinecke <hare@suse.de>,
Johannes Thumshirn <jthumshirn@suse.de>,
Kent Overstreet <kent.overstreet@gmail.com>,
Christoph Hellwig <hch@lst.de>,
linux-block@vger.kernel.org
Subject: Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio
Date: Thu, 19 Jul 2018 19:04:46 +0800
Message-ID: <20180719110444.GC20700@ming.t460p>
In-Reply-To: <20180719093918.28876-3-mwilck@suse.com>
On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote:
> bio_iov_iter_get_pages() returns only pages for a single non-empty
> segment of the input iov_iter's iovec. This may be much less than the number
> of pages __blkdev_direct_IO_simple() is supposed to process. Call
> bio_iov_iter_get_pages() repeatedly until either the requested number
> of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done,
> short writes or reads may occur for direct synchronous IOs with multiple
> iovec slots (such as generated by writev()). In that case,
> __generic_file_write_iter() falls back to buffered writes, which
> has been observed to cause data corruption in certain workloads.
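To make the failure mode concrete, here is a minimal userspace model in C. It is not kernel code. `struct seg`, `struct iter_model`, `struct bio_model`, `PAGE_SZ`, `get_pages_one_seg()` and `fill_bio()` are hypothetical stand-ins, and the sketch assumes page-aligned segments and that one call maps pages from a single non-empty segment only, as described above. The loop in `fill_bio()` has the same shape as the one the patch adds.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u /* assumed page size */

/* Hypothetical stand-ins for an iovec segment, an iov_iter and a bio. */
struct seg {
	size_t len;
};

struct iter_model {
	const struct seg *segs;
	size_t nr_segs;
	size_t idx; /* current segment */
	size_t off; /* bytes already consumed from segs[idx] */
};

struct bio_model {
	size_t max_vecs; /* capacity of the bvec array */
	size_t vcnt;     /* slots in use */
	size_t size;     /* bytes mapped so far */
};

/* Bytes left in the iterator, like iov_iter_count(). */
static size_t iter_count(const struct iter_model *it)
{
	size_t n = 0;

	if (it->idx >= it->nr_segs)
		return 0;
	for (size_t i = it->idx; i < it->nr_segs; i++)
		n += it->segs[i].len;
	return n - it->off;
}

/*
 * One call maps pages from a single non-empty segment only (assumes
 * page-aligned segments). Returns 0 on progress, -1 if nothing was mapped.
 */
static int get_pages_one_seg(struct bio_model *bio, struct iter_model *it)
{
	size_t left, room, bytes;

	/* Skip exhausted or empty segments. */
	while (it->idx < it->nr_segs && it->off == it->segs[it->idx].len) {
		it->idx++;
		it->off = 0;
	}
	if (it->idx == it->nr_segs || bio->vcnt == bio->max_vecs)
		return -1;

	left = it->segs[it->idx].len - it->off;
	room = (bio->max_vecs - bio->vcnt) * PAGE_SZ;
	bytes = left < room ? left : room;

	bio->vcnt += (bytes + PAGE_SZ - 1) / PAGE_SZ; /* one slot per page */
	bio->size += bytes;
	it->off += bytes;
	assert(bio->vcnt <= bio->max_vecs);
	return 0;
}

/* Keep calling until an error, a full bvec array or an empty iterator. */
static size_t fill_bio(struct bio_model *bio, struct iter_model *it)
{
	int ret = get_pages_one_seg(bio, it);

	while (ret == 0 && bio->vcnt < bio->max_vecs && iter_count(it) > 0)
		ret = get_pages_one_seg(bio, it);
	return bio->size;
}
```

With three 8 KiB segments and a 16-slot bio, a single `get_pages_one_seg()` call maps only 8192 bytes (the short transfer), whereas `fill_bio()` maps all 24576. With only 4 slots, `fill_bio()` stops at 16384 bytes because the bvec array is full, leaving 8192 bytes in the iterator.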
>
> Note: if segments aren't page-aligned in the input iovec, this patch may
> result in multiple adjacent slots of the bi_io_vec array referencing the same
> page (the byte ranges are guaranteed to be disjoint if the preceding patch is
> applied). We haven't seen problems with that in our own or the customer's
> tests. It'd be possible to detect this situation and merge bi_io_vec slots
> that refer to the same page, but I prefer to keep it simple for now.
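The slot merging mentioned in the note could look roughly like the following sketch. `struct bvec_model` and `merge_same_page()` are hypothetical, and pages are modelled as opaque identifiers. Only slots whose byte ranges are contiguous within the same page are merged, since disjoint but non-adjacent ranges cannot be described by a single bvec. The sketch also ignores reference counting: in the kernel each slot presumably holds its own page reference (released with put_page() per bvec), so a real merge would have to drop the duplicate reference as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec; the page is an opaque id. */
struct bvec_model {
	unsigned long page;  /* page identity */
	unsigned int offset; /* byte offset within the page */
	unsigned int len;    /* byte count */
};

/*
 * Fold each slot into its predecessor when both reference the same page
 * and the byte ranges are contiguous. Returns the new slot count.
 * NOTE: page reference counting is deliberately not modelled.
 */
static size_t merge_same_page(struct bvec_model *bv, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0) {
			struct bvec_model *prev = &bv[out - 1];

			if (prev->page == bv[i].page &&
			    prev->offset + prev->len == bv[i].offset) {
				prev->len += bv[i].len;
				continue;
			}
		}
		bv[out++] = bv[i];
	}
	assert(out <= n);
	return out;
}
```

Given the slots `{1, 0, 512}, {1, 512, 512}, {1, 2048, 512}, {2, 0, 4096}`, the first two collapse into `{1, 0, 1024}`; the third slot stays separate because it is on the same page but not adjacent, so three slots remain.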
>
> Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io")
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> fs/block_dev.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 0dd87aa..41643c4 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>
>  	ret = bio_iov_iter_get_pages(&bio, iter);
>  	if (unlikely(ret))
> -		return ret;
> +		goto out;
> +
> +	while (ret == 0 &&
> +	       bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0)
> +		ret = bio_iov_iter_get_pages(&bio, iter);
> +
>  	ret = bio.bi_iter.bi_size;
>
>  	if (iov_iter_rw(iter) == READ) {
> @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		put_page(bvec->bv_page);
>  	}
>
> +out:
>  	if (vecs != inline_vecs)
>  		kfree(vecs);
>
You might put the 'vecs' leak fix into a separate patch, and reuse the
current code block for that.
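For reference, the leak in question is the early `return ret;` on the error path, which skips the `kfree(vecs)` at the end of __blkdev_direct_IO_simple() when the bvec array was heap-allocated. Below is a userspace sketch of the goto-based cleanup the patch switches to, with `malloc()`/`free()` standing in for the kernel allocator and hypothetical names (`INLINE_VECS`, `fake_get_pages()`, `alloc_count`, `direct_io_simple()`) that are not part of the kernel.

```c
#include <assert.h>
#include <stdlib.h>

#define INLINE_VECS 4 /* hypothetical stand-in for the inline bvec count */
#define EFAULT_ERR (-14)
#define ENOMEM_ERR (-12)

struct vec {
	void *page;
	unsigned int len;
};

static int alloc_count; /* live heap allocations, for the demo only */

/* Hypothetical stand-in for bio_iov_iter_get_pages() that may fail. */
static int fake_get_pages(struct vec *vecs, int fail)
{
	(void)vecs;
	return fail ? EFAULT_ERR : 0;
}

static int direct_io_simple(int nr_pages, int fail)
{
	struct vec inline_vecs[INLINE_VECS], *vecs;
	int ret;

	assert(nr_pages > 0);
	if (nr_pages <= INLINE_VECS) {
		vecs = inline_vecs;
	} else {
		vecs = malloc((size_t)nr_pages * sizeof(*vecs));
		if (!vecs)
			return ENOMEM_ERR;
		alloc_count++;
	}

	ret = fake_get_pages(vecs, fail);
	if (ret)
		goto out; /* a bare "return ret;" here would leak vecs */

	/* ... submit the bio and wait for completion (placeholder) ... */
	ret = nr_pages;

out:
	/* Single exit path: free only what was heap-allocated. */
	if (vecs != inline_vecs) {
		free(vecs);
		alloc_count--;
	}
	return ret;
}
```

Both the failing and the successful 16-page call leave `alloc_count` at zero; with a bare `return ret;` in place of `goto out;`, the failing call would leave one allocation behind.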
It looks like all users of bio_iov_iter_get_pages() need this kind of fix,
so what do you think about the following approach?
diff --git a/block/bio.c b/block/bio.c
index f3536bfc8298..23dd4c163dfc 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -904,15 +904,7 @@ int bio_add_page(struct bio *bio, struct page *page,
 }
 EXPORT_SYMBOL(bio_add_page);
 
-/**
- * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
- * @bio: bio to add pages to
- * @iter: iov iterator describing the region to be mapped
- *
- * Pins as many pages from *iter and appends them to @bio's bvec array. The
- * pages will have to be released using put_page() when done.
- */
-int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
@@ -951,6 +943,28 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	iov_iter_advance(iter, size);
 	return 0;
 }
+
+/**
+ * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be mapped
+ *
+ * Pins as many pages from *iter and appends them to @bio's bvec array. The
+ * pages will have to be released using put_page() when done.
+ */
+int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	int ret;
+	unsigned int size;
+
+	do {
+		size = bio->bi_iter.bi_size;
+		ret = __bio_iov_iter_get_pages(bio, iter);
+	} while (!bio_full(bio) && iov_iter_count(iter) > 0 &&
+			bio->bi_iter.bi_size > size);
+
+	return bio->bi_iter.bi_size > 0 ? 0 : ret;
+}
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
 static void submit_bio_wait_endio(struct bio *bio)
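The termination logic of the proposed wrapper can be exercised in userspace with a scripted stand-in for __bio_iov_iter_get_pages(). In this sketch `struct step`, `struct model`, `inner_get_pages()`, `outer_get_pages()` and `model_full()` are hypothetical, and `model_full()` assumes bio_full() means that all bi_max_vecs slots are in use. The loop stops when the bio is full, the iterator is drained, or a call makes no progress (which includes an error), and it returns 0 whenever some bytes were mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One scripted result of the inner call (hypothetical). */
struct step {
	int ret;            /* error code, or 0 on success */
	unsigned int bytes; /* bytes mapped on success */
	unsigned int vecs;  /* bvec slots consumed on success */
};

struct model {
	const struct step *steps;
	size_t nr_steps, next;
	unsigned int bi_size;   /* like bio->bi_iter.bi_size */
	unsigned int vcnt, max_vecs;
	unsigned int iter_left; /* like iov_iter_count(iter) */
	unsigned int calls;     /* how often the inner call ran */
};

/* Stand-in for __bio_iov_iter_get_pages(): replays the next step. */
static int inner_get_pages(struct model *m)
{
	const struct step *s;

	assert(m->next < m->nr_steps);
	s = &m->steps[m->next++];
	m->calls++;
	if (s->ret)
		return s->ret;
	m->bi_size += s->bytes;
	m->vcnt += s->vecs;
	m->iter_left -= s->bytes;
	return 0;
}

/* Assumed bio_full() semantics: every bvec slot is in use. */
static bool model_full(const struct model *m)
{
	return m->vcnt >= m->max_vecs;
}

/* Same loop structure as the proposed bio_iov_iter_get_pages(). */
static int outer_get_pages(struct model *m)
{
	int ret;
	unsigned int size;

	do {
		size = m->bi_size;
		ret = inner_get_pages(m);
	} while (!model_full(m) && m->iter_left > 0 && m->bi_size > size);

	return m->bi_size > 0 ? 0 : ret;
}
```

Because a partially filled bio is reported as success, callers have to take the mapped length from `bi_iter.bi_size` rather than from the return value, which is what __blkdev_direct_IO_simple() already does with `ret = bio.bi_iter.bi_size;`. The `bi_size > size` check is what keeps the loop from spinning on a call that makes no progress.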
Thanks,
Ming