From: Maxim Levitsky <mlevitsk@redhat.com>
To: Jan Kara <jack@suse.cz>, Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org,
"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [PATCH] bdev: Do not return EBUSY if bdev discard races with write
Date: Thu, 07 Jan 2021 17:52:14 +0200
Message-ID: <54685e0e1c078ceb65052adf3c24ee7fd78cc565.camel@redhat.com>
In-Reply-To: <382d2087bb8652861bf30dec1b9096c44d093e00.camel@redhat.com>

On Thu, 2021-01-07 at 17:48 +0200, Maxim Levitsky wrote:
> On Thu, 2021-01-07 at 16:40 +0100, Jan Kara wrote:
> > blkdev_fallocate() tries to detect whether a discard raced with an
> > overlapping write by calling invalidate_inode_pages2_range(). However
> > this check can give both false negatives (when writing using direct IO
> > or when writeback already writes out the written pagecache range) and
> > false positives (when write is not actually overlapping but ends in the
> > same page when blocksize < pagesize). This actually causes issues for
> > qemu which is getting confused by EBUSY errors.
> >
> > Fix the problem by removing this conflicting write detection since it is
> > inherently racy and thus of little use anyway.
> >
> > Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
> > CC: "Darrick J. Wong" <darrick.wong@oracle.com>
> > Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.com
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/block_dev.c | 10 ++++------
> > 1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 3e5b02f6606c..a97f43b49839 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -1797,13 +1797,11 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> >  		return error;
> >  
> >  	/*
> > -	 * Invalidate again; if someone wandered in and dirtied a page,
> > -	 * the caller will be given -EBUSY. The third argument is
> > -	 * inclusive, so the rounding here is safe.
> > +	 * Invalidate the page cache again; if someone wandered in and dirtied
> > +	 * a page, we just discard it - userspace has no way of knowing whether
> > +	 * the write happened before or after discard completing...
> >  	 */
> > -	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
> > -					     start >> PAGE_SHIFT,
> > -					     end >> PAGE_SHIFT);
> > +	return truncate_bdev_range(bdev, file->f_mode, start, end);
> >  }
>
> But what happens if write and discard don't overlap? Won't we
> discard the written data in this case?

Ah, I see: truncate_bdev_range() preserves the partial areas that are
not included in the range, so a non-overlapping write is not discarded.
In that case this indeed looks right.
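
For illustration only, here is a rough userspace sketch of the false
positive described in the commit message. It is not kernel code: the
helper names and SKETCH_PAGE_SHIFT (4 KiB pages) are assumptions made
for the example, and it models only the page-index arithmetic of the
removed check, not the dirty-page state that actually produces -EBUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Page-granular view, mirroring the "start >> PAGE_SHIFT" and
 * "end >> PAGE_SHIFT" arithmetic of the removed call: two inclusive
 * byte ranges "conflict" as soon as they touch a common page.
 */
static inline bool conflict_by_page(uint64_t d_start, uint64_t d_end,
                                    uint64_t w_start, uint64_t w_end)
{
    assert(d_start <= d_end && w_start <= w_end);
    return (w_start >> SKETCH_PAGE_SHIFT) <= (d_end >> SKETCH_PAGE_SHIFT) &&
           (d_start >> SKETCH_PAGE_SHIFT) <= (w_end >> SKETCH_PAGE_SHIFT);
}

/* Byte-granular view: the inclusive ranges really share a byte. */
static inline bool conflict_by_byte(uint64_t d_start, uint64_t d_end,
                                    uint64_t w_start, uint64_t w_end)
{
    assert(d_start <= d_end && w_start <= w_end);
    return w_start <= d_end && d_start <= w_end;
}
```

With 512-byte blocks, a discard of bytes 0..511 and a write to bytes
512..1023 do not overlap, yet both land in page 0, so the page-granular
view reports a conflict while the byte-granular one does not.
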

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky

>
>
> Best regards,
> Maxim Levitsky
>
>
> >
> > const struct file_operations def_blk_fops = {