public inbox for linux-block@vger.kernel.org
From: Jan Kara <jack@suse.cz>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: Jan Kara <jack@suse.cz>, Keith Busch <kbusch@kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH] Revert "block: Do not discard buffers under a mounted filesystem"
Date: Fri, 19 Feb 2021 11:27:11 +0100
Message-ID: <20210219102711.GC6086@quack2.suse.cz>
In-Reply-To: <BL0PR04MB651418C1C31967E5AF7743A2E7859@BL0PR04MB6514.namprd04.prod.outlook.com>

On Thu 18-02-21 22:35:41, Damien Le Moal wrote:
> On 2021/02/18 23:07, Jan Kara wrote:
> > On Tue 16-02-21 23:05:57, Damien Le Moal wrote:
> >> On 2021/02/17 2:51, Keith Busch wrote:
> >>> On Tue, Feb 16, 2021 at 04:36:06PM +0000, Christoph Hellwig wrote:
> >>>> On Tue, Feb 16, 2021 at 02:38:49PM +0100, Jan Kara wrote:
> >>>>> Apparently there are several userspace programs that depend on being
> >>>>> able to call BLKDISCARD ioctl without the ability to grab bdev
> >>>>> exclusively - namely FUSE filesystems have the device open without
> >>>>> O_EXCL (the kernel has the bdev open with O_EXCL) so the commit breaks
> >>>>> fstrim(8) for such filesystems. Also LVM when shrinking LV opens PV and
> >>>>> discards ranges released from LV but that PV may be already open
> >>>>> exclusively by someone else (see bugzilla link below for more details).
> >>>>>
> >>>>> This reverts commit 384d87ef2c954fc58e6c5fd8253e4a1984f5fe02.
> >>>>
> >>>> I think that is a bad idea. We fixed the problem for a reason.
> >>>> I think the right fix is to just do nothing if the device hasn't been
> >>>> opened with O_EXCL and can't be reopened with it, just don't do anything
> >>>> but also don't return an error.  After all discard and thus
> >>>> BLKDISCARD is purely advisory.
> >>>
> >>> A discard is advisory, but BLKZEROOUT is not, so something different
> >>> should happen there. We were also planning to send a patch using this
> >>> same pattern for Zone Reset to fix stale page cache issues after the
> >>> reset, but we'll wait to see how this settles before sending that.
> >>
> >> There is also another problem: the truncate_bdev & operation following it
> >> (discard, zeroout or zone reset) are not atomic vs read/write operations to the
> >> bdev. Without mutual exclusion, that page invalidation is best effort only since
> >> reads can sneak in between the truncate and discard (or zeroout or zone reset).
> >> With our zone reset stale page problem case, it is reads from udevd that we see
> >> sneaking in between the truncate bdev and zone reset and so we still get stale
> >> pages after the zone reset is finished. No solution to propose for solving that,
> >> yet...
> > 
> > Well, at least blkdev_fallocate() does:
> > 
> > 	truncate_bdev_range();
> > 	blkdev_issue_zeroout();
> > 	invalidate_inode_pages2_range();
> > 
> > so racing reads should not result in stale page cache contents AFAICT.
> 
> Yes, but concurrent writes can then get in between the blkdev_issue_zeroout()
> and invalidate_inode_pages2_range() and have their data discarded before it
> hits the drive... Not very nice either. Granted, that would mean that userland
> has 2 concurrent writers that are not synchronized, so weird results are to be
> expected. I guess it is probably safe to ignore that case?

Yes. IMHO any result that doesn't crash the kernel (or burn the HW) is fine
in that case.
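
To illustrate why the trailing invalidate_inode_pages2_range() in the
blkdev_fallocate() sequence quoted above matters for racing reads, here is a
toy userspace model. It is only a sketch, not kernel code: struct toy_bdev and
all toy_*() helpers are hypothetical names made up for this illustration, each
"page" is a single byte, and the racing reader (udevd in Damien's case) is
modeled as a plain function call placed between the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 4

/* Toy model (hypothetical): one byte of data per page plus a cached copy. */
struct toy_bdev {
	unsigned char disk[NPAGES];	/* contents on the device */
	unsigned char cache[NPAGES];	/* page cache copy of each page */
	bool cached[NPAGES];		/* is the cached copy present? */
};

static void toy_init(struct toy_bdev *b, unsigned char fill)
{
	memset(b->disk, fill, sizeof(b->disk));
	memset(b->cache, 0, sizeof(b->cache));
	memset(b->cached, 0, sizeof(b->cached));
}

/* Buffered read: populate the cache from the device on a miss. */
static unsigned char toy_read(struct toy_bdev *b, int page)
{
	assert(page >= 0 && page < NPAGES);
	if (!b->cached[page]) {
		b->cache[page] = b->disk[page];
		b->cached[page] = true;
	}
	return b->cache[page];
}

/*
 * Stand-in for both truncate_bdev_range() and
 * invalidate_inode_pages2_range(): drop every cached page.
 */
static void toy_drop_cache(struct toy_bdev *b)
{
	memset(b->cached, 0, sizeof(b->cached));
}

/* Stand-in for blkdev_issue_zeroout(): zero the device, bypassing the cache. */
static void toy_zeroout(struct toy_bdev *b)
{
	memset(b->disk, 0, sizeof(b->disk));
}

/*
 * Run the sequence with a read of page 0 racing in between the truncate
 * and the zeroout. Returns what a later buffered read of page 0 sees.
 */
static unsigned char toy_zero_range(struct toy_bdev *b, bool invalidate_after)
{
	toy_drop_cache(b);		/* truncate_bdev_range() */
	(void)toy_read(b, 0);		/* racing reader re-populates the cache */
	toy_zeroout(b);			/* blkdev_issue_zeroout() */
	if (invalidate_after)
		toy_drop_cache(b);	/* invalidate_inode_pages2_range() */
	return toy_read(b, 0);
}
```

In this model, toy_zero_range(&b, false) on a device initialized with
toy_init(&b, 0xAA) still returns the old 0xAA from the stale cached page, while
toy_zero_range(&b, true) returns 0. The model deliberately says nothing about
the concurrent-writer case discussed above.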

> I guess the same pattern as above for zeroout would work for reset zone too.
> Will try to see if that solves our test problem.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

