public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: hch@infradead.org, djwong@kernel.org, dchinner@redhat.com,
	kbusch@kernel.org, sagi@grimberg.me, axboe@fb.com,
	willy@infradead.org, brauner@kernel.org, hare@suse.de,
	ritesh.list@gmail.com, rgoldwyn@suse.com, jack@suse.cz,
	ziy@nvidia.com, ryan.roberts@arm.com, patches@lists.linux.dev,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, p.raghav@samsung.com,
	da.gomez@samsung.com, dan.helmick@samsung.com
Subject: Re: [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads
Date: Mon, 18 Sep 2023 08:38:37 +1000	[thread overview]
Message-ID: <ZQd/7RYfDZgvR0n2@dread.disaster.area> (raw)
In-Reply-To: <20230915213254.2724586-1-mcgrof@kernel.org>

On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote:
> Christoph added CONFIG_BUFFER_HEAD on v6.1 enabling a world where we can
> live without buffer-heads. When we opt into that world we end up also
> using the address space operations of the block device cache using
> iomap. Since iomap supports higher order folios it means then that block
> devices which do use the iomap aops can end up having a logical block
> size or physical block size greater than PAGE_SIZE. We refer to these as
> LBS devices. This in turn allows filesystems which support bs > 4k to be
> enabled on a 4k PAGE_SIZE world on LBS block devices. This alows LBS
> device then to take advantage of the recenlty posted work today to enable
> LBS support for filesystems [0].

Why do we need LBS devices to support bs > ps in XFS?

As long as the filesystem block size is >= logical sector size of
the device, XFS just doesn't care what the "block size" of the
underlying block device is. i.e. XFS will allow minimum IO sizes of
the logical sector size of the device for select metadata and
sub-block direct IO, but otherwise all other IO will be aligned to
filesystem block size and so the underlying device block sizes are
completely irrelevant...

> To experiment with larger LBA formtas you can also use kdevops and enable
> CONFIG_QEMU_ENABLE_EXTRA_DRIVE_LARGEIO. That enables a ton of drives with
> logical and physical block sizes >= 4k up to a desriable max target for
> experimentation. Since filesystems today only support up to 32k sector sizes,
> in practice you may only want to experiment up to 32k physical / logical.
> 
> Support for 64k sector sizes requires an XFS format change, which is something
> Daniel Gomez has experimental patches for, in case folks are interested in
> messing with.

Please don't do this without first talking to the upstream XFS
developers about the intended use cases and design of the new
format. Especially when the problem involves requiring a whole new
journal header and indexing format to be implemented...

As a general rule, nobody should *ever* be writing patches to change
the on-disk format of a -any- filesystem without first engaging the
people who maintain that filesystem. Architecture and design come
first, not implementation. The last thing we want is to have someone
spend weeks or months on something that takes the experts half a
minute to NACK because it is so obviously flawed....

> Patch 6 could probably be squashed with patch 5, but I wanted to be
> explicit about this, as this should be decided with the community.
> 
> There might be a better way to do this than do deal with the switching
> of the aops dynamically, ideas welcomed!

Is it even safe to switch aops dynamically? We know there are
inherent race conditions in doing this w.r.t. mmap and page faults,
as the write fault part of the processing is directly dependent
on the page being correctly initialised during the initial
population of the page data (the "read fault" side of the write
fault).

Hence it's not generally considered safe to change aops from one
mechanism to another dynamically. Block devices can be mmap()d, but
I don't see anything in this patch set that ensures there are no
other users of the block device when the swaps are done. What am I
missing?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2023-09-17 22:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-15 21:32 [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 01/10] bdev: rename iomap aops Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 02/10] bdev: dynamically set aops to enable LBS support Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 03/10] bdev: increase bdev max blocksize depending on the aops used Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 04/10] filesystems: add filesytem buffer-head flag Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 05/10] bdev: allow to switch between bdev aops Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 06/10] bdev: simplify coexistance Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 07/10] nvme: enhance max supported LBA format check Luis Chamberlain
2023-09-15 22:20   ` Matthew Wilcox
2023-09-15 22:27     ` Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 08/10] nvme: add awun / nawun sanity check Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 09/10] nvme: add nvme_core.debug_large_atomics to force high awun as phys_bs Luis Chamberlain
2023-09-15 21:32 ` [RFC v2 10/10] nvme: enable LBS support Luis Chamberlain
2023-09-15 21:51 ` [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads Matthew Wilcox
2023-09-15 22:26   ` Luis Chamberlain
2023-09-17 11:50   ` Hannes Reinecke
2023-09-18 17:12   ` Luis Chamberlain
2023-09-18 18:15     ` Matthew Wilcox
2023-09-18 18:42       ` Hannes Reinecke
2023-09-17 22:38 ` Dave Chinner [this message]
2023-09-17 23:14   ` Matthew Wilcox
2023-09-18  0:59     ` Dave Chinner
2023-09-18  1:13       ` Luis Chamberlain
2023-09-18  2:49         ` Dave Chinner
2023-09-18 17:51           ` Luis Chamberlain
2023-09-18 11:34     ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQd/7RYfDZgvR0n2@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=axboe@fb.com \
    --cc=brauner@kernel.org \
    --cc=da.gomez@samsung.com \
    --cc=dan.helmick@samsung.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hare@suse.de \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=patches@lists.linux.dev \
    --cc=rgoldwyn@suse.com \
    --cc=ritesh.list@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=sagi@grimberg.me \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox