linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: John Garry <john.g.garry@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>,
	chandan.babu@oracle.com, dchinner@redhat.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, catherine.hoang@oracle.com,
	martin.petersen@oracle.com, Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
Date: Tue, 23 Jul 2024 15:26:03 -0700	[thread overview]
Message-ID: <20240723222603.GS612460@frogsfrogsfrogs> (raw)
In-Reply-To: <2fd991cc-8f29-47ce-a78a-c11c79d74e07@oracle.com>

On Tue, Jul 23, 2024 at 04:01:41PM +0100, John Garry wrote:
> On 23/07/2024 15:42, Christoph Hellwig wrote:
> > On Tue, Jul 23, 2024 at 11:11:28AM +0100, John Garry wrote:
> > > I am looking at something like this to implement read-only for those inodes:
> > 
> > Yikes.  Treating individual inodes in a file systems as read-only
> > is about the most confusing and harmful behavior we could do.
> 
> That was the suggestion which I was given earlier in this thread.

Well, Christoph and I suggested failing the mount /earlier/ in this
thread. ;)

> > 
> > Just treat it as any other rocompat feature please an mount the entire
> > file system read-only if not supported.
> > 
> > Or even better let this wait a little, and work with Darrick to work
> > on the rextsize > 1 reflіnk patches and just make the thing work.
> 
> I'll let Darrick comment on this.

COW with alloc_unit > fsblock is not currently possible, whether it's
forcealign or rtreflink because COW must happen at allocation unit
granularity.  Pure overwrites don't need all these twists and turns.

1. For COW to work, each write/page_mkwrite must mark dirty every
fsblock in the entire alloc unit.  Those fsblocks could be cached by
multiple folios, which means (in iomap terms) dirtying each block in
potentially multiple iomap_folio_state structures, as well as their
folios.

2. Similarly, writeback must then be able to issue IO in quantities that
are aligned to allocation units.  IOWs, for every dirty region in the
file, we'd have to find the folios for a given allocation unit, mark
them all for writeback, and issue bios for however much we managed to
do.  If it's not possible to grab a folio, then the entire allocation
unit can't be written out, which implies that writeback can fail to
fully clean folios.

3. Alternately I suppose we could track the number of folios undergoing
writeback for each allocation unit, issue the writeback ios whenever
we're ready, and only remap the allocation unit when the number of
folios undergoing writeback for that allocation unit reaches zero.

If we could get the mapping_set_folio_order patch merged, then we could
at least get partial support for power-of-two alloc_unit > fsblock
configurations by setting the minimum folio order to log2(alloc_unit).
For atomic writes this is probably a hard requirement because we must be
able to submit one bio with one memory region.

For everyone else this sucks because cranking up the min folio order
reduces the flexibility that the page cache can have in finding cache
memory... but until someone figures out how to make the batching work,
there's not much progress to be made.

For non power-of-two alloc_unit we can't just crank up the min folio
order because there will always be misalignments somewhere; we need a
full writeback batching implementation that can handle multiple folios
per alloc unit and partial folio writeback.

djwong-dev implements 1.  It partially handles 2 by enlarging the wbc
range to be aligned to allocation units, but it doesn't guarantee that
all the folios actually got tagged for the batch.  It can't do 3, which
means that it's probably broken if you press it hard enough.

Alternately we could disallow non power-of-two everywhere, which would
make the accounting simpler but that's a regression against ye olde xfs
which supports non power-of-two allocation units.

rtreflink is nowhere near ready to go -- it's still in djwong-wtf behind
metadata directories, rtgroups, realtime rmap, and (probably) hch's
zns patches.

> > > > So what about forcealign and RT?
> > > 
> > > Any opinion on this?
> > 
> > What about forcealign and RT?
> 
> In this series version I was mounting the whole FS as RO if
> XFS_FEAT_FORCEALIGN and XFS_FEAT_REFLINK was found in the SB. And so very
> different to how I was going to individual treat inodes which happen to be
> forcealign and reflink, above.
> 
> So I was asking guidance when whether that approach (for RT and forcealign)
> is sound.

I reiterate: don't allow mounting of (forcealign && reflink) or
(forcealign && rtextsize > 1) filesystems, and then you and I can work
on figuring out the rest.

--D

  reply	other threads:[~2024-07-23 22:26 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
2024-07-05 16:24 ` [PATCH v2 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-07-05 16:24 ` [PATCH v2 02/13] xfs: always tail align maxlen allocations John Garry
2024-07-05 16:24 ` [PATCH v2 03/13] xfs: simplify extent allocation alignment John Garry
2024-07-05 16:24 ` [PATCH v2 04/13] xfs: make EOF allocation simpler John Garry
2024-08-06 18:58   ` Darrick J. Wong
2024-07-05 16:24 ` [PATCH v2 05/13] xfs: introduce forced allocation alignment John Garry
2024-07-05 16:24 ` [PATCH v2 06/13] xfs: align args->minlen for " John Garry
2024-07-05 16:24 ` [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
2024-07-11  2:59   ` Darrick J. Wong
2024-07-11  3:59     ` Christoph Hellwig
2024-07-11  7:17     ` John Garry
2024-07-11 23:33       ` Dave Chinner
2024-07-11 23:20     ` Dave Chinner
2024-07-12  4:56       ` Christoph Hellwig
2024-07-18  8:53       ` John Garry
2024-07-23 10:11         ` John Garry
2024-07-23 14:42           ` Christoph Hellwig
2024-07-23 15:01             ` John Garry
2024-07-23 22:26               ` Darrick J. Wong [this message]
2024-07-26 14:14                 ` John Garry
2024-07-23 23:38         ` Dave Chinner
2024-07-24  0:04           ` Darrick J. Wong
2024-07-24 18:50             ` John Garry
2024-07-24  7:39           ` John Garry
2024-07-05 16:24 ` [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign John Garry
2024-07-06  7:56   ` Christoph Hellwig
2024-07-08  1:44     ` Dave Chinner
2024-07-08  7:36       ` John Garry
2024-07-08 11:12         ` Dave Chinner
2024-07-08 14:41           ` John Garry
2024-07-09  7:41       ` Christoph Hellwig
2024-07-05 16:24 ` [PATCH v2 09/13] xfs: Update xfs_inode_alloc_unitsize() " John Garry
2024-07-05 16:24 ` [PATCH v2 10/13] xfs: Unmap blocks according to forcealign John Garry
2024-07-06  7:58   ` Christoph Hellwig
2024-07-08 14:48     ` John Garry
2024-07-09  7:46       ` Christoph Hellwig
2024-07-17 15:24         ` John Garry
2024-07-17 16:42           ` Christoph Hellwig
2024-07-09  9:57     ` Dave Chinner
2024-07-09 11:19       ` Christoph Hellwig
2024-07-05 16:24 ` [PATCH v2 11/13] xfs: Only free full extents for forcealign John Garry
2024-07-06  7:59   ` Christoph Hellwig
2024-07-05 16:24 ` [PATCH v2 12/13] xfs: Don't revert allocated offset " John Garry
2024-07-05 16:24 ` [PATCH v2 13/13] xfs: Enable file data forcealign feature John Garry
2024-07-06  7:53 ` [PATCH v2 00/13] forcealign for xfs Christoph Hellwig
2024-07-08  7:48   ` John Garry
2024-07-09  7:48     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240723222603.GS612460@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=brauner@kernel.org \
    --cc=catherine.hoang@oracle.com \
    --cc=chandan.babu@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=john.g.garry@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).