public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Martin Wilck <mwilck@suse.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>, Jan Kara <jack@suse.com>,
	Hannes Reinecke <hare@suse.de>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Christoph Hellwig <hch@lst.de>, Al Viro <viro@ZenIV.linux.org.uk>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	linux-block@vger.kernel.org
Subject: Re: [PATCH v3 3/3] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio
Date: Fri, 20 Jul 2018 09:18:17 +0200	[thread overview]
Message-ID: <46f0697a7d2175c340d804616a99c6186a56120d.camel@suse.com> (raw)
In-Reply-To: <20180720024130.GA19701@ming.t460p>

On Fri, 2018-07-20 at 10:41 +0800, Ming Lei wrote:
> On Thu, Jul 19, 2018 at 11:01:58PM +0200, Martin Wilck wrote:
> > When bio_iov_iter_get_pages() is called
> > from  __blkdev_direct_IO_simple(),
> > we already know that the content of the input iov_iter fits into a
> > single
> > bio, so we expect iov_iter_count(iter) to drop to 0. But in a
> > single invocation,
> > bio_iov_iter_get_pages() may add less bytes then we expect. For
> > iov_iters with
> > multiple segments (generated e.g. by writev()), it behaves like an
> > iterator's
> > next() function, taking only one step (segment) at a time.
> > Furthermore, MM may
> > fail or refuse to pin all requested pages. The latter may signify
> > an error condition
> > (in which case eventually an error code will be returned), the
> > former does not. 
> > 
> > Call bio_iov_iter_get_pages() repeatedly to avoid short reads or
> > writes.
> > Otherwise, __generic_file_write_iter() falls back to buffered
> > writes, which
> > has been observed to cause data corruption in certain workloads.
> > 
> > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
> > simplified bdev direct-io")
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  fs/block_dev.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index aba2541..561c34e 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -222,6 +222,24 @@ __blkdev_direct_IO_simple(struct kiocb *iocb,
> > struct iov_iter *iter,
> >  	ret = bio_iov_iter_get_pages(&bio, iter);
> >  	if (unlikely(ret))
> >  		goto out;
> > +
> > +	/*
> > +	 * bio_iov_iter_get_pages() may add less bytes than we
> > expect:
> > +	 * - for multi-segment iov_iters, as it only adds one
> > segment at a time
> > +	 * - if MM refuses or fails to pin all requested pages. In
> > this case,
> > +	 *   an error is returned eventually if no progress can be
> > made.
> > +	 */
> > +	while (iov_iter_count(iter) > 0 && bio.bi_vcnt <
> > bio.bi_max_vecs) {
> 
> Please use the helper bio_full().
> 
> > +		ret = bio_iov_iter_get_pages(&bio, iter);
> > +		if (unlikely(ret))
> > +			goto out;
> > +	}
> 
> When ret returns, pages pinned already need to be put.

Oops, sorry for missing that.

> Also I think the way suggested in the following link may be more
> generic:
> 
> 	https://marc.info/?l=linux-block&m=153199830130209&w=2
> 
> in which it is retried until no progress is made, and there should
> have
> performance benefit for other users too, not only fix this partial
> dio
> issue.

As there was no unequivocal support for that proposal, I propose to add
a new helper without renaming the current one. Callers could then be
migrated one-by-one. I will do so in v4.

Thanks,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

      reply	other threads:[~2018-07-20  7:18 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-19 21:01 [PATCH v3 0/3] Fix silent data corruption in blkdev_direct_IO() Martin Wilck
2018-07-19 21:01 ` [PATCH v3 1/3] block: bio_iov_iter_get_pages: fix size of last iovec Martin Wilck
2018-07-19 21:01 ` [PATCH v3 2/3] blkdev: __blkdev_direct_IO_simple: fix leak in error case Martin Wilck
2018-07-20  2:22   ` Ming Lei
2018-07-19 21:01 ` [PATCH v3 3/3] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio Martin Wilck
2018-07-20  2:41   ` Ming Lei
2018-07-20  7:18     ` Martin Wilck [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46f0697a7d2175c340d804616a99c6186a56120d.camel@suse.com \
    --to=mwilck@suse.com \
    --cc=axboe@kernel.dk \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=jack@suse.com \
    --cc=jthumshirn@suse.de \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox