Re: [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
	hch@infradead.org, djwong@kernel.org, dchinner@redhat.com,
	kbusch@kernel.org, sagi@grimberg.me, axboe@fb.com,
	brauner@kernel.org, hare@suse.de, ritesh.list@gmail.com,
	rgoldwyn@suse.com, jack@suse.cz, ziy@nvidia.com,
	ryan.roberts@arm.com, patches@lists.linux.dev,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, p.raghav@samsung.com,
	da.gomez@samsung.com, dan.helmick@samsung.com
Subject: Re: [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads
Date: Mon, 18 Sep 2023 10:59:07 +1000	[thread overview]
Message-ID: <ZQeg2+0X6yzGL1Mx@dread.disaster.area> (raw)
In-Reply-To: <ZQeIaN2WC+whc/OP@casper.infradead.org>

On Mon, Sep 18, 2023 at 12:14:48AM +0100, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 08:38:37AM +1000, Dave Chinner wrote:
> > On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote:
> > > LBS devices. This in turn allows filesystems which support bs > 4k to be
> > > enabled on a 4k PAGE_SIZE world on LBS block devices. This alows LBS
> > > device then to take advantage of the recenlty posted work today to enable
> > > LBS support for filesystems [0].
> > 
> > Why do we need LBS devices to support bs > ps in XFS?
> 
> It's the other way round -- we need the support in the page cache to
> reject sub-block-size folios (which is in the other patches) before we
> can sensibly talk about enabling any filesystems on top of LBS devices.
>
> Even XFS, or for that matter ext2 which support 16k block sizes on
> CONFIG_PAGE_SIZE_16K (or 64K) kernels need that support first.

Well, yes, I know that. But the statement above implies that we
can't use bs > ps filesytems without LBS support on 4kB PAGE_SIZE
systems. If it's meant to mean the exact opposite, then it is
extremely poorly worded....

> > > There might be a better way to do this than do deal with the switching
> > > of the aops dynamically, ideas welcomed!
> > 
> > Is it even safe to switch aops dynamically? We know there are
> > inherent race conditions in doing this w.r.t. mmap and page faults,
> > as the write fault part of the processing is directly dependent
> > on the page being correctly initialised during the initial
> > population of the page data (the "read fault" side of the write
> > fault).
> > 
> > Hence it's not generally considered safe to change aops from one
> > mechanism to another dynamically. Block devices can be mmap()d, but
> > I don't see anything in this patch set that ensures there are no
> > other users of the block device when the swaps are done. What am I
> > missing?
> 
> We need to evict all pages from the page cache before switching aops to
> prevent misinterpretation of folio->private. 

Yes, but if the device is mapped, even after an invalidation, we can
still race with a new fault instantiating a page whilst the aops are
being swapped, right? That was the problem that sunk dynamic
swapping of the aops when turning DAX on and off on an inode, right?

> If switching aops is even
> the right thing to do.  I don't see the problem with allowing buffer heads
> on block devices, but I haven't been involved with the discussion here.

iomap supports bufferheads as a transitional thing (e.g. for gfs2).

Hence I suspect that a better solution is to always use iomap and
the same aops, but just switch from iomap page state to buffer heads
in the bdev mapping interface via a synchronised invalidation +
setting/clearing IOMAP_F_BUFFER_HEAD in all new mapping requests...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2023-09-18  1:00 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-15 21:32 [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 01/10] bdev: rename iomap aops Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 02/10] bdev: dynamically set aops to enable LBS support Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 03/10] bdev: increase bdev max blocksize depending on the aops used Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 04/10] filesystems: add filesytem buffer-head flag Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 05/10] bdev: allow to switch between bdev aops Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 06/10] bdev: simplify coexistance Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 07/10] nvme: enhance max supported LBA format check Luis Chamberlain
2023-09-15 22:20   ` Matthew Wilcox
2023-09-15 22:27     ` Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 08/10] nvme: add awun / nawun sanity check Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 09/10] nvme: add nvme_core.debug_large_atomics to force high awun as phys_bs Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 10/10] nvme: enable LBS support Luis Chamberlain
2023-09-15 21:51 ` [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads Matthew Wilcox
2023-09-15 22:26   ` Luis Chamberlain
2023-09-17 11:50   ` Hannes Reinecke
2023-09-18 17:12   ` Luis Chamberlain
2023-09-18 18:15     ` Matthew Wilcox
2023-09-18 18:42       ` Hannes Reinecke
2023-09-17 22:38 ` Dave Chinner
2023-09-17 23:14   ` Matthew Wilcox
2023-09-18  0:59     ` Dave Chinner [this message]
2023-09-18  1:13       ` Luis Chamberlain
2023-09-18  2:49         ` Dave Chinner
2023-09-18 17:51           ` Luis Chamberlain
2023-09-18 11:34     ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQeg2+0X6yzGL1Mx@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@fb.com \
    --cc=brauner@kernel.org \
    --cc=da.gomez@samsung.com \
    --cc=dan.helmick@samsung.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=patches@lists.linux.dev \
    --cc=rgoldwyn@suse.com \
    --cc=ritesh.list@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=sagi@grimberg.me \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox