[PATCH v2 00/13] forcealign for xfs

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 00/13] forcealign for xfs
@ 2024-07-05 16:24 John Garry
  2024-07-05 16:24 ` [PATCH v2 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
                   ` (13 more replies)
  0 siblings, 14 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

This series is being spun off the block atomic writes for xfs series at
[0].

That series has got too big and also has a dependency on the core block
atomic writes support, which has now been queued for 6.11 in Jens' block
tree [1].

The actual forcealign patches are the same in this series, modulo an
attempt for a fix in xfs_bunmapi_align().

Why forcealign?
In some scenarios to may be required to guarantee extent alignment and
granularity.

For example, for atomic writes, the maximum atomic write unit size would
be limited at the extent alignment and granularity, guaranteeing that an
atomic write would not span data present in multiple extents.

forcealign may be useful as a performance tuning optimization in other
scenarios.

Early development xfsprogs support is at:
https://github.com/johnpgarry/xfsprogs-dev/commits/atomic-writes/

Differences to v1:
- Add Darricks RB tags (thanks)
- Disallow mount for forcealign and RT
- Disallow cp --reflink from forcealign inode
- Comments improvements (Darrick)
- Coding style improvements (Darrick)
- Fix xfs_inode_alloc_unitsize() (Darrick)

Baseline:
xfs/for-next @ 3ba3ab1f6719 ("xfs: enable FITRIM on the realtime device")

[0] https://lore.kernel.org/linux-xfs/20240607143919.2622319-1-john.g.garry@oracle.com/
[1] https://lore.kernel.org/linux-block/20240620125359.2684798-1-john.g.garry@oracle.com/

Darrick J. Wong (2):
  xfs: Introduce FORCEALIGN inode flag
  xfs: Enable file data forcealign feature

Dave Chinner (6):
  xfs: only allow minlen allocations when near ENOSPC
  xfs: always tail align maxlen allocations
  xfs: simplify extent allocation alignment
  xfs: make EOF allocation simpler
  xfs: introduce forced allocation alignment
  xfs: align args->minlen for forced allocation alignment

John Garry (5):
  xfs: Do not free EOF blocks for forcealign
  xfs: Update xfs_inode_alloc_unitsize() for forcealign
  xfs: Unmap blocks according to forcealign
  xfs: Only free full extents for forcealign
  xfs: Don't revert allocated offset for forcealign

 fs/xfs/libxfs/xfs_alloc.c     |  33 ++--
 fs/xfs/libxfs/xfs_alloc.h     |   3 +-
 fs/xfs/libxfs/xfs_bmap.c      | 321 +++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_format.h    |   9 +-
 fs/xfs/libxfs/xfs_ialloc.c    |  12 +-
 fs/xfs/libxfs/xfs_inode_buf.c |  55 ++++++
 fs/xfs/libxfs/xfs_inode_buf.h |   3 +
 fs/xfs/libxfs/xfs_sb.c        |   2 +
 fs/xfs/xfs_bmap_util.c        |  14 +-
 fs/xfs/xfs_inode.c            |  17 +-
 fs/xfs/xfs_inode.h            |  23 +++
 fs/xfs/xfs_ioctl.c            |  51 +++++-
 fs/xfs/xfs_mount.h            |   2 +
 fs/xfs/xfs_reflink.c          |   5 +-
 fs/xfs/xfs_reflink.h          |  10 --
 fs/xfs/xfs_super.c            |  11 ++
 fs/xfs/xfs_trace.h            |   8 +-
 include/uapi/linux/fs.h       |   2 +
 18 files changed, 392 insertions(+), 189 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 01/13] xfs: only allow minlen allocations when near ENOSPC
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 02/13] xfs: always tail align maxlen allocations John Garry
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When we are near ENOSPC and don't have enough free
space for an args->maxlen allocation, xfs_alloc_space_available()
will trim args->maxlen to equal the available space. However, this
function has only checked that there is enough contiguous free space
for an aligned args->minlen allocation to succeed. Hence there is no
guarantee that an args->maxlen allocation will succeed, nor that the
available space will allow for correct alignment of an args->maxlen
allocation.

Further, by trimming args->maxlen arbitrarily, it breaks an
assumption made in xfs_alloc_fix_len() that if the caller wants
aligned allocation, then args->maxlen will be set to an aligned
value. It then skips the tail alignment and so we end up with
extents that aren't aligned to extent size hint boundaries as we
approach ENOSPC.

To avoid this problem, don't reduce args->maxlen by some random,
arbitrary amount. If args->maxlen is too large for the available
space, reduce the allocation to a minlen allocation as we know we
have contiguous free space available for this to succeed and always
be correctly aligned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 63315ddc46c6..74f0a3656458 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2409,14 +2409,23 @@ xfs_alloc_space_available(
 	if (available < (int)max(args->total, alloc_len))
 		return false;

+	if (flags & XFS_ALLOC_FLAG_CHECK)
+		return true;
+
 	/*
-	 * Clamp maxlen to the amount of free space available for the actual
-	 * extent allocation.
+	 * If we can't do a maxlen allocation, then we must reduce the size of
+	 * the allocation to match the available free space. We know how big
+	 * the largest contiguous free space we can allocate is, so that's our
+	 * upper bound. However, we don't exaclty know what alignment/size
+	 * constraints have been placed on the allocation, so we can't
+	 * arbitrarily select some new max size. Hence make this a minlen
+	 * allocation as we know that will definitely succeed and match the
+	 * callers alignment constraints.
 	 */
-	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
-		args->maxlen = available;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	if (longest < alloc_len) {
+		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
-		ASSERT(args->maxlen >= args->minlen);
 	}

 	return true;
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 02/13] xfs: always tail align maxlen allocations
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
  2024-07-05 16:24 ` [PATCH v2 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 03/13] xfs: simplify extent allocation alignment John Garry
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When we do a large allocation, the core free space allocation code
assumes that args->maxlen is aligned to args->prod/args->mod. hence
if we get a maximum sized extent allocated, it does not do tail
alignment of the extent.

However, this assumes that nothing modifies args->maxlen between the
original allocation context setup and trimming the selected free
space extent to size. This assumption has recently been found to be
invalid - xfs_alloc_space_available() modifies args->maxlen in low
space situations - and there may be more situations we haven't yet
found like this.

Force aligned allocation introduces the requirement that extents are
correctly tail aligned, resulting in this occasional latent
alignment failure to be reclassified from an unimportant curiousity
to a must-fix bug.

Removing the assumption about args->maxlen allocations always being
tail aligned is trivial, and should not impact anything because
args->maxlen for inodes with extent size hints configured are
already aligned. Hence all this change does it avoid weird corner
cases that would have resulted in unaligned extent sizes by always
trimming the extent down to an aligned size.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> [provisional on v1 series comment]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 74f0a3656458..2864520c3902 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -432,20 +432,18 @@ xfs_alloc_compute_diff(
  * Fix up the length, based on mod and prod.
  * len should be k * prod + mod for some k.
  * If len is too small it is returned unchanged.
- * If len hits maxlen it is left alone.
  */
-STATIC void
+static void
 xfs_alloc_fix_len(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t	k;
-	xfs_extlen_t	rlen;
+	xfs_extlen_t		k;
+	xfs_extlen_t		rlen = args->len;

 	ASSERT(args->mod < args->prod);
-	rlen = args->len;
 	ASSERT(rlen >= args->minlen);
 	ASSERT(rlen <= args->maxlen);
-	if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
+	if (args->prod <= 1 || rlen < args->mod ||
 	    (args->mod == 0 && rlen < args->prod))
 		return;
 	k = rlen % args->prod;
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 03/13] xfs: simplify extent allocation alignment
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
  2024-07-05 16:24 ` [PATCH v2 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
  2024-07-05 16:24 ` [PATCH v2 02/13] xfs: always tail align maxlen allocations John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 04/13] xfs: make EOF allocation simpler John Garry
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

We currently align extent allocation to stripe unit or stripe width.
That is specified by an external parameter to the allocation code,
which then manipulates the xfs_alloc_args alignment configuration in
interesting ways.

The args->alignment field specifies extent start alignment, but
because we may be attempting non-aligned allocation first there are
also slop variables that allow for those allocation attempts to
account for aligned allocation if they fail.

This gets much more complex as we introduce forced allocation
alignment, where extent size hints are used to generate the extent
start alignment. extent size hints currently only affect extent
lengths (via args->prod and args->mod) and so with this change we
will have two different start alignment conditions.

Avoid this complexity by always using args->alignment to indicate
extent start alignment, and always using args->prod/mod to indicate
extent length adjustment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |  4 +-
 fs/xfs/libxfs/xfs_alloc.h  |  2 +-
 fs/xfs/libxfs/xfs_bmap.c   | 95 ++++++++++++++++----------------------
 fs/xfs/libxfs/xfs_ialloc.c | 10 ++--
 fs/xfs/xfs_trace.h         |  8 ++--
 5 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 2864520c3902..67b11e4d30ae 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2391,7 +2391,7 @@ xfs_alloc_space_available(
 	reservation = xfs_ag_resv_needed(pag, args->resv);
 
 	/* do we have enough contiguous free space for the allocation? */
-	alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->minlen + (args->alignment - 1) + args->alignslop;
 	longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
 	if (longest < alloc_len)
 		return false;
@@ -2420,7 +2420,7 @@ xfs_alloc_space_available(
 	 * allocation as we know that will definitely succeed and match the
 	 * callers alignment constraints.
 	 */
-	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->alignslop;
 	if (longest < alloc_len) {
 		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 3dc8e44fea76..1e9d0bde5640 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
 	xfs_extlen_t	minleft;	/* min blocks must be left after us */
 	xfs_extlen_t	total;		/* total blocks needed in xaction */
 	xfs_extlen_t	alignment;	/* align answer to multiple of this */
-	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
+	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
 	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
 	xfs_agblock_t	max_agbno;	/* ... */
 	xfs_extlen_t	len;		/* output: actual size of extent */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6af6f744fdd6..b5156bafb7be 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3285,6 +3285,10 @@ xfs_bmap_select_minlen(
 	xfs_extlen_t		blen)
 {
 
+	/* Adjust best length for extent start alignment. */
+	if (blen > args->alignment)
+		blen -= args->alignment;
+
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
 	 * possible that there is enough contiguous free space for this request.
@@ -3393,35 +3397,43 @@ xfs_bmap_alloc_account(
 	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
 }
 
-static int
+/*
+ * Calculate the extent start alignment and the extent length adjustments that
+ * constrain this allocation.
+ *
+ * Extent start alignment is currently determined by stripe configuration and is
+ * carried in args->alignment, whilst extent length adjustment is determined by
+ * extent size hints and is carried by args->prod and args->mod.
+ *
+ * Low level allocation code is free to either ignore or override these values
+ * as required.
+ */
+static void
 xfs_bmap_compute_alignments(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
-	int			stripe_align = 0;
 
 	/* stripe alignment for allocation is determined by mount parameters */
 	if (mp->m_swidth && xfs_has_swalloc(mp))
-		stripe_align = mp->m_swidth;
+		args->alignment = mp->m_swidth;
 	else if (mp->m_dalign)
-		stripe_align = mp->m_dalign;
+		args->alignment = mp->m_dalign;
 
 	if (ap->flags & XFS_BMAPI_COWFORK)
 		align = xfs_get_cowextsz_hint(ap->ip);
 	else if (ap->datatype & XFS_ALLOC_USERDATA)
 		align = xfs_get_extsz_hint(ap->ip);
+
 	if (align) {
 		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
 					ap->eof, 0, ap->conv, &ap->offset,
 					&ap->length))
 			ASSERT(0);
 		ASSERT(ap->length);
-	}
 
-	/* apply extent size hints if obtained earlier */
-	if (align) {
 		args->prod = align;
 		div_u64_rem(ap->offset, args->prod, &args->mod);
 		if (args->mod)
@@ -3436,7 +3448,6 @@ xfs_bmap_compute_alignments(
 			args->mod = args->prod - args->mod;
 	}
 
-	return stripe_align;
 }
 
 static void
@@ -3508,7 +3519,7 @@ xfs_bmap_exact_minlen_extent_alloc(
 	args.total = ap->total;
 
 	args.alignment = 1;
-	args.minalignslop = 0;
+	args.alignslop = 0;
 
 	args.minleft = ap->minleft;
 	args.wasdel = ap->wasdel;
@@ -3548,7 +3559,6 @@ xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen,
-	int			stripe_align,
 	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
@@ -3562,23 +3572,15 @@ xfs_bmap_btalloc_at_eof(
 	 * allocation.
 	 */
 	if (ap->offset) {
-		xfs_extlen_t	nextminlen = 0;
+		xfs_extlen_t	alignment = args->alignment;
 
 		/*
-		 * Compute the minlen+alignment for the next case.  Set slop so
-		 * that the value of minlen+alignment+slop doesn't go up between
-		 * the calls.
+		 * Compute the alignment slop for the fallback path so we ensure
+		 * we account for the potential alignment space required by the
+		 * fallback paths before we modify the AGF and AGFL here.
 		 */
 		args->alignment = 1;
-		if (blen > stripe_align && blen <= args->maxlen)
-			nextminlen = blen - stripe_align;
-		else
-			nextminlen = args->minlen;
-		if (nextminlen + stripe_align > args->minlen + 1)
-			args->minalignslop = nextminlen + stripe_align -
-					args->minlen - 1;
-		else
-			args->minalignslop = 0;
+		args->alignslop = alignment - args->alignment;
 
 		if (!caller_pag)
 			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
@@ -3596,19 +3598,8 @@ xfs_bmap_btalloc_at_eof(
 		 * Exact allocation failed. Reset to try an aligned allocation
 		 * according to the original allocation specification.
 		 */
-		args->alignment = stripe_align;
-		args->minlen = nextminlen;
-		args->minalignslop = 0;
-	} else {
-		/*
-		 * Adjust minlen to try and preserve alignment if we
-		 * can't guarantee an aligned maxlen extent.
-		 */
-		args->alignment = stripe_align;
-		if (blen > args->alignment &&
-		    blen <= args->maxlen + args->alignment)
-			args->minlen = blen - args->alignment;
-		args->minalignslop = 0;
+		args->alignment = alignment;
+		args->alignslop = 0;
 	}
 
 	if (ag_only) {
@@ -3626,9 +3617,8 @@ xfs_bmap_btalloc_at_eof(
 		return 0;
 
 	/*
-	 * Allocation failed, so turn return the allocation args to their
-	 * original non-aligned state so the caller can proceed on allocation
-	 * failure as if this function was never called.
+	 * Aligned allocation failed, so all fallback paths from here drop the
+	 * start alignment requirement as we know it will not succeed.
 	 */
 	args->alignment = 1;
 	return 0;
@@ -3636,7 +3626,9 @@ xfs_bmap_btalloc_at_eof(
 
 /*
  * We have failed multiple allocation attempts so now are in a low space
- * allocation situation. Try a locality first full filesystem minimum length
+ * allocation situation. We give up on any attempt at aligned allocation here.
+ *
+ * Try a locality first full filesystem minimum length
  * allocation whilst still maintaining necessary total block reservation
  * requirements.
  *
@@ -3653,6 +3645,7 @@ xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
@@ -3672,13 +3665,11 @@ xfs_bmap_btalloc_low_space(
 static int
 xfs_bmap_btalloc_filestreams(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
-
 	error = xfs_filestream_select_ag(ap, args, &blen);
 	if (error)
 		return error;
@@ -3697,8 +3688,7 @@ xfs_bmap_btalloc_filestreams(
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				true);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
 
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
@@ -3722,8 +3712,7 @@ xfs_bmap_btalloc_filestreams(
 static int
 xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error;
@@ -3747,8 +3736,7 @@ xfs_bmap_btalloc_best_length(
 	 * trying.
 	 */
 	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				false);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
 		if (error || args->fsbno != NULLFSBLOCK)
 			return error;
 	}
@@ -3775,27 +3763,26 @@ xfs_bmap_btalloc(
 		.resv		= XFS_AG_RESV_NONE,
 		.datatype	= ap->datatype,
 		.alignment	= 1,
-		.minalignslop	= 0,
+		.alignslop	= 0,
 	};
 	xfs_fileoff_t		orig_offset;
 	xfs_extlen_t		orig_length;
 	int			error;
-	int			stripe_align;
 
 	ASSERT(ap->length);
 	orig_offset = ap->offset;
 	orig_length = ap->length;
 
-	stripe_align = xfs_bmap_compute_alignments(ap, &args);
+	xfs_bmap_compute_alignments(ap, &args);
 
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 	    xfs_inode_is_filestream(ap->ip))
-		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_filestreams(ap, &args);
 	else
-		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_best_length(ap, &args);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 14c81f227c5b..9f71a9a3a65e 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -758,12 +758,12 @@ xfs_ialloc_ag_alloc(
 		 *
 		 * For an exact allocation, alignment must be 1,
 		 * however we need to take cluster alignment into account when
-		 * fixing up the freelist. Use the minalignslop field to
-		 * indicate that extra blocks might be required for alignment,
-		 * but not to use them in the actual exact allocation.
+		 * fixing up the freelist. Use the alignslop field to indicate
+		 * that extra blocks might be required for alignment, but not
+		 * to use them in the actual exact allocation.
 		 */
 		args.alignment = 1;
-		args.minalignslop = igeo->cluster_align - 1;
+		args.alignslop = igeo->cluster_align - 1;
 
 		/* Allow space for the inode btree to split. */
 		args.minleft = igeo->inobt_maxlevels;
@@ -783,7 +783,7 @@ xfs_ialloc_ag_alloc(
 		 * on, so reset minalignslop to ensure it is not included in
 		 * subsequent requests.
 		 */
-		args.minalignslop = 0;
+		args.alignslop = 0;
 	}
 
 	if (unlikely(args.fsbno == NULLFSBLOCK)) {
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ba839ce6a9cf..19035aa854f9 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1808,7 +1808,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__field(xfs_extlen_t, minleft)
 		__field(xfs_extlen_t, total)
 		__field(xfs_extlen_t, alignment)
-		__field(xfs_extlen_t, minalignslop)
+		__field(xfs_extlen_t, alignslop)
 		__field(xfs_extlen_t, len)
 		__field(char, wasdel)
 		__field(char, wasfromfl)
@@ -1827,7 +1827,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->minleft = args->minleft;
 		__entry->total = args->total;
 		__entry->alignment = args->alignment;
-		__entry->minalignslop = args->minalignslop;
+		__entry->alignslop = args->alignslop;
 		__entry->len = args->len;
 		__entry->wasdel = args->wasdel;
 		__entry->wasfromfl = args->wasfromfl;
@@ -1836,7 +1836,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->highest_agno = args->tp->t_highest_agno;
 	),
 	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
-		  "prod %u minleft %u total %u alignment %u minalignslop %u "
+		  "prod %u minleft %u total %u alignment %u alignslop %u "
 		  "len %u wasdel %d wasfromfl %d resv %d "
 		  "datatype 0x%x highest_agno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
@@ -1849,7 +1849,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		  __entry->minleft,
 		  __entry->total,
 		  __entry->alignment,
-		  __entry->minalignslop,
+		  __entry->alignslop,
 		  __entry->len,
 		  __entry->wasdel,
 		  __entry->wasfromfl,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 04/13] xfs: make EOF allocation simpler
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (2 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 03/13] xfs: simplify extent allocation alignment John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-08-06 18:58   ` Darrick J. Wong
  2024-07-05 16:24 ` [PATCH v2 05/13] xfs: introduce forced allocation alignment John Garry
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

Currently the allocation at EOF is broken into two cases - when the
offset is zero and when the offset is non-zero. When the offset is
non-zero, we try to do exact block allocation for contiguous
extent allocation. When the offset is zero, the allocation is simply
an aligned allocation.

We want aligned allocation as the fallback when exact block
allocation fails, but that complicates the EOF allocation in that it
now has to handle two different allocation cases. The
caller also has to handle allocation when not at EOF, and for the
upcoming forced alignment changes we need that to also be aligned
allocation.

To simplify all this, pull the aligned allocation cases back into
the callers and leave the EOF allocation path for exact block
allocation only. This means that the EOF exact block allocation
fallback path is the normal aligned allocation path and that ends up
making things a lot simpler when forced alignment is introduced.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c   | 129 +++++++++++++++----------------------
 fs/xfs/libxfs/xfs_ialloc.c |   2 +-
 2 files changed, 54 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index b5156bafb7be..4122a2da06ec 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3309,12 +3309,12 @@ xfs_bmap_select_minlen(
 static int
 xfs_bmap_btalloc_select_lengths(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno, startag;
+	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
@@ -3328,19 +3328,18 @@ xfs_bmap_btalloc_select_lengths(
 	if (startag == NULLAGNUMBER)
 		startag = 0;
 
-	*blen = 0;
 	for_each_perag_wrap(mp, startag, agno, pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+		error = xfs_bmap_longest_free_extent(pag, args->tp, &blen);
 		if (error && error != -EAGAIN)
 			break;
 		error = 0;
-		if (*blen >= args->maxlen)
+		if (blen >= args->maxlen)
 			break;
 	}
 	if (pag)
 		xfs_perag_rele(pag);
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
+	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	return error;
 }
 
@@ -3550,78 +3549,40 @@ xfs_bmap_exact_minlen_extent_alloc(
  * If we are not low on available data blocks and we are allocating at
  * EOF, optimise allocation for contiguous file extension and/or stripe
  * alignment of the new extent.
- *
- * NOTE: ap->aeof is only set if the allocation length is >= the
- * stripe unit and the allocation offset is at the end of file.
  */
 static int
 xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		blen,
-	bool			ag_only)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*caller_pag = args->pag;
+	xfs_extlen_t		alignment = args->alignment;
 	int			error;
 
+	ASSERT(ap->aeof && ap->offset);
+	ASSERT(args->alignment >= 1);
+
 	/*
-	 * If there are already extents in the file, try an exact EOF block
-	 * allocation to extend the file as a contiguous extent. If that fails,
-	 * or it's the first allocation in a file, just try for a stripe aligned
-	 * allocation.
+	 * Compute the alignment slop for the fallback path so we ensure
+	 * we account for the potential alignemnt space required by the
+	 * fallback paths before we modify the AGF and AGFL here.
 	 */
-	if (ap->offset) {
-		xfs_extlen_t	alignment = args->alignment;
-
-		/*
-		 * Compute the alignment slop for the fallback path so we ensure
-		 * we account for the potential alignment space required by the
-		 * fallback paths before we modify the AGF and AGFL here.
-		 */
-		args->alignment = 1;
-		args->alignslop = alignment - args->alignment;
-
-		if (!caller_pag)
-			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
-		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-		if (!caller_pag) {
-			xfs_perag_put(args->pag);
-			args->pag = NULL;
-		}
-		if (error)
-			return error;
-
-		if (args->fsbno != NULLFSBLOCK)
-			return 0;
-		/*
-		 * Exact allocation failed. Reset to try an aligned allocation
-		 * according to the original allocation specification.
-		 */
-		args->alignment = alignment;
-		args->alignslop = 0;
-	}
+	args->alignment = 1;
+	args->alignslop = alignment - args->alignment;
 
-	if (ag_only) {
-		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
-	} else {
+	if (!caller_pag)
+		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
+	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
+	if (!caller_pag) {
+		xfs_perag_put(args->pag);
 		args->pag = NULL;
-		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
-		ASSERT(args->pag == NULL);
-		args->pag = caller_pag;
 	}
-	if (error)
-		return error;
 
-	if (args->fsbno != NULLFSBLOCK)
-		return 0;
-
-	/*
-	 * Aligned allocation failed, so all fallback paths from here drop the
-	 * start alignment requirement as we know it will not succeed.
-	 */
-	args->alignment = 1;
-	return 0;
+	/* Reset alignment to original specifications.  */
+	args->alignment = alignment;
+	args->alignslop = 0;
+	return error;
 }
 
 /*
@@ -3687,12 +3648,19 @@ xfs_bmap_btalloc_filestreams(
 	}
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
+	if (ap->aeof && ap->offset)
+		error = xfs_bmap_btalloc_at_eof(ap, args);
 
+	/* This may be an aligned allocation attempt. */
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 
+	/* Attempt non-aligned allocation if we haven't already. */
+	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		args->alignment = 1;
+		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
+	}
+
 out_low_space:
 	/*
 	 * We are now done with the perag reference for the filestreams
@@ -3714,7 +3682,6 @@ xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t		blen = 0;
 	int			error;
 
 	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
@@ -3725,23 +3692,33 @@ xfs_bmap_btalloc_best_length(
 	 * the request.  If one isn't found, then adjust the minimum allocation
 	 * size to the largest space found.
 	 */
-	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
+	error = xfs_bmap_btalloc_select_lengths(ap, args);
 	if (error)
 		return error;
 
 	/*
-	 * Don't attempt optimal EOF allocation if previous allocations barely
-	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
-	 * optimal or even aligned allocations in this case, so don't waste time
-	 * trying.
+	 * If we are in low space mode, then optimal allocation will fail so
+	 * prepare for minimal allocation and run the low space algorithm
+	 * immediately.
 	 */
-	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
-		if (error || args->fsbno != NULLFSBLOCK)
-			return error;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		ASSERT(args->fsbno == NULLFSBLOCK);
+		return xfs_bmap_btalloc_low_space(ap, args);
+	}
+
+	if (ap->aeof && ap->offset)
+		error = xfs_bmap_btalloc_at_eof(ap, args);
+
+	/* This may be an aligned allocation attempt. */
+	if (!error && args->fsbno == NULLFSBLOCK)
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
+
+	/* Attempt non-aligned allocation if we haven't already. */
+	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		args->alignment = 1;
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	}
 
-	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error || args->fsbno != NULLFSBLOCK)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 9f71a9a3a65e..40a2daeea712 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -780,7 +780,7 @@ xfs_ialloc_ag_alloc(
 		 * the exact agbno requirement and increase the alignment
 		 * instead. It is critical that the total size of the request
 		 * (len + alignment + slop) does not increase from this point
-		 * on, so reset minalignslop to ensure it is not included in
+		 * on, so reset alignslop to ensure it is not included in
 		 * subsequent requests.
 		 */
 		args.alignslop = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 05/13] xfs: introduce forced allocation alignment
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (3 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 04/13] xfs: make EOF allocation simpler John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 06/13] xfs: align args->minlen for " John Garry
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When forced allocation alignment is specified, the extent will
be aligned to the extent size hint size rather than stripe
alignment. If aligned allocation cannot be done, then the allocation
is failed rather than attempting non-aligned fallbacks.

Note: none of the per-inode force align configuration is present
yet, so this just triggers off an "always false" wrapper function
for the moment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.h |  1 +
 fs/xfs/libxfs/xfs_bmap.c  | 29 +++++++++++++++++++++++------
 fs/xfs/xfs_inode.h        |  5 +++++
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 1e9d0bde5640..71a85439e5fc 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -66,6 +66,7 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_USERDATA		(1 << 0)/* allocation is for user data*/
 #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
 #define XFS_ALLOC_NOBUSY		(1 << 2)/* Busy extents not allowed */
+#define XFS_ALLOC_FORCEALIGN		(1 << 3)/* forced extent alignment */
 
 /* freespace limit calculations */
 unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4122a2da06ec..1cc2d812a6e9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3400,9 +3400,10 @@ xfs_bmap_alloc_account(
  * Calculate the extent start alignment and the extent length adjustments that
  * constrain this allocation.
  *
- * Extent start alignment is currently determined by stripe configuration and is
- * carried in args->alignment, whilst extent length adjustment is determined by
- * extent size hints and is carried by args->prod and args->mod.
+ * Extent start alignment is currently determined by forced inode alignment or
+ * stripe configuration and is carried in args->alignment, whilst extent length
+ * adjustment is determined by extent size hints and is carried by args->prod
+ * and args->mod.
  *
  * Low level allocation code is free to either ignore or override these values
  * as required.
@@ -3415,11 +3416,18 @@ xfs_bmap_compute_alignments(
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
 
-	/* stripe alignment for allocation is determined by mount parameters */
-	if (mp->m_swidth && xfs_has_swalloc(mp))
+	/*
+	 * Forced inode alignment takes preference over stripe alignment.
+	 * Stripe alignment for allocation is determined by mount parameters.
+	 */
+	if (xfs_inode_has_forcealign(ap->ip)) {
+		args->alignment = xfs_get_extsz_hint(ap->ip);
+		args->datatype |= XFS_ALLOC_FORCEALIGN;
+	} else if (mp->m_swidth && xfs_has_swalloc(mp)) {
 		args->alignment = mp->m_swidth;
-	else if (mp->m_dalign)
+	} else if (mp->m_dalign) {
 		args->alignment = mp->m_dalign;
+	}
 
 	if (ap->flags & XFS_BMAPI_COWFORK)
 		align = xfs_get_cowextsz_hint(ap->ip);
@@ -3606,6 +3614,11 @@ xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	if (args->alignment > 1 && (args->datatype & XFS_ALLOC_FORCEALIGN)) {
+		args->fsbno = NULLFSBLOCK;
+		return 0;
+	}
+
 	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
@@ -3657,6 +3670,8 @@ xfs_bmap_btalloc_filestreams(
 
 	/* Attempt non-aligned allocation if we haven't already. */
 	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		if (args->datatype & XFS_ALLOC_FORCEALIGN)
+			return error;
 		args->alignment = 1;
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 	}
@@ -3715,6 +3730,8 @@ xfs_bmap_btalloc_best_length(
 
 	/* Attempt non-aligned allocation if we haven't already. */
 	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		if (args->datatype & XFS_ALLOC_FORCEALIGN)
+			return error;
 		args->alignment = 1;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 292b90b5f2ac..42f999c1106c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -311,6 +311,11 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
 }
 
+static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
+{
+	return false;
+}
+
 /*
  * Decide if this file is a realtime file whose data allocation unit is larger
  * than a single filesystem block.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 06/13] xfs: align args->minlen for forced allocation alignment
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (4 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 05/13] xfs: introduce forced allocation alignment John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

If args->minlen is not aligned to the constraints of forced
alignment, we may do minlen allocations that are not aligned when we
approach ENOSPC. Avoid this by always aligning args->minlen
appropriately. If alignment of minlen results in a value smaller
than the alignment constraint, fail the allocation immediately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 44 ++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 1cc2d812a6e9..db12f006646a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3278,32 +3278,48 @@ xfs_bmap_longest_free_extent(
 	return 0;
 }
 
-static xfs_extlen_t
+static int
 xfs_bmap_select_minlen(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen)
 {
-
 	/* Adjust best length for extent start alignment. */
 	if (blen > args->alignment)
 		blen -= args->alignment;
 
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
-	 * possible that there is enough contiguous free space for this request.
+	 * possible that there is enough contiguous free space for this request
+	 * even if best length is less that the minimum length we need.
+	 *
+	 * If the best length won't satisfy the maximum length we requested,
+	 * then use it as the minimum length so we get as large an allocation
+	 * as possible.
 	 */
 	if (blen < ap->minlen)
-		return ap->minlen;
+		blen = ap->minlen;
+	else if (blen > args->maxlen)
+		blen = args->maxlen;
 
 	/*
-	 * If the best seen length is less than the request length,
-	 * use the best as the minimum, otherwise we've got the maxlen we
-	 * were asked for.
+	 * If we have alignment constraints, round the minlen down to match the
+	 * constraint so that alignment will be attempted. This may reduce the
+	 * allocation to smaller than was requested, so clamp the minimum to
+	 * ap->minlen to allow unaligned allocation to succeed. If we are forced
+	 * to align the allocation, return ENOSPC at this point because we don't
+	 * have enough contiguous free space to guarantee aligned allocation.
 	 */
-	if (blen < args->maxlen)
-		return blen;
-	return args->maxlen;
+	if (args->alignment > 1) {
+		blen = rounddown(blen, args->alignment);
+		if (blen < ap->minlen) {
+			if (args->datatype & XFS_ALLOC_FORCEALIGN)
+				return -ENOSPC;
+			blen = ap->minlen;
+		}
+	}
+	args->minlen = blen;
+	return 0;
 }
 
 static int
@@ -3339,8 +3355,7 @@ xfs_bmap_btalloc_select_lengths(
 	if (pag)
 		xfs_perag_rele(pag);
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	return error;
+	return xfs_bmap_select_minlen(ap, args, blen);
 }
 
 /* Update all inode and quota accounting for the allocation we just did. */
@@ -3660,7 +3675,10 @@ xfs_bmap_btalloc_filestreams(
 		goto out_low_space;
 	}
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
+	error = xfs_bmap_select_minlen(ap, args, blen);
+	if (error)
+		goto out_low_space;
+
 	if (ap->aeof && ap->offset)
 		error = xfs_bmap_btalloc_at_eof(ap, args);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (5 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 06/13] xfs: align args->minlen for " John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-11  2:59   ` Darrick J. Wong
  2024-07-05 16:24 ` [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign John Garry
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: "Darrick J. Wong" <djwong@kernel.org>

Add a new inode flag to require that all file data extent mappings must
be aligned (both the file offset range and the allocated space itself)
to the extent size hint.  Having a separate COW extent size hint is no
longer allowed.

The goal here is to enable sysadmins and users to mandate that all space
mappings in a file must have a startoff/blockcount that are aligned to
(say) a 2MB alignment and that the startblock/blockcount will follow the
same alignment.

Allocated space will be aligned to start of the AG, and not necessarily
aligned with disk blocks. The upcoming atomic writes feature will rely and
forcealign and will also require allocated space will also be aligned to
disk blocks.

Currently RT is not be supported for forcealign, so reject a mount under
that condition. In future, it should be possible to support forcealign for
RT. In this case, the extent size hint will need to be aligned with
rtextsize, so add inode verification for that now.

reflink link will not be supported for forcealign. This is because we have
the limitation of pageache writeback not knowing how to writeback an
entire allocation unut, so reject a mount with relink.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Co-developed-by: John Garry <john.g.garry@oracle.com>
[jpg: many changes from orig, including forcealign inode verification
 rework, disallow RT and forcealign mount, ioctl setattr rework,
 disallow reflink a forcealign inode]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h    |  6 +++-
 fs/xfs/libxfs/xfs_inode_buf.c | 55 +++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_buf.h |  3 ++
 fs/xfs/libxfs/xfs_sb.c        |  2 ++
 fs/xfs/xfs_inode.c            | 13 +++++++++
 fs/xfs/xfs_inode.h            | 20 ++++++++++++-
 fs/xfs/xfs_ioctl.c            | 51 ++++++++++++++++++++++++++++++--
 fs/xfs/xfs_mount.h            |  2 ++
 fs/xfs/xfs_reflink.c          |  5 ++--
 fs/xfs/xfs_reflink.h          | 10 -------
 fs/xfs/xfs_super.c            | 11 +++++++
 include/uapi/linux/fs.h       |  2 ++
 12 files changed, 164 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 61f51becff4f..b48cd75d34a6 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
@@ -1094,16 +1095,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
+/* data extent mappings for regular files must be aligned to extent size hint */
+#define XFS_DIFLAG2_FORCEALIGN_BIT 5
 
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 513b50da6215..5c61a1d1bb2b 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -657,6 +657,15 @@ xfs_dinode_verify(
 	    !xfs_has_bigtime(mp))
 		return __this_address;
 
+	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
+		fa = xfs_inode_validate_forcealign(mp,
+				be32_to_cpu(dip->di_extsize),
+				be32_to_cpu(dip->di_cowextsize),
+				mode, flags, flags2);
+		if (fa)
+			return fa;
+	}
+
 	return NULL;
 }
 
@@ -824,3 +833,49 @@ xfs_inode_validate_cowextsize(
 
 	return NULL;
 }
+
+/* Validate the forcealign inode flag */
+xfs_failaddr_t
+xfs_inode_validate_forcealign(
+	struct xfs_mount	*mp,
+	uint32_t		extsize,
+	uint32_t		cowextsize,
+	uint16_t		mode,
+	uint16_t		flags,
+	uint64_t		flags2)
+{
+	bool			rt =  flags & XFS_DIFLAG_REALTIME;
+
+	/* superblock rocompat feature flag */
+	if (!xfs_has_forcealign(mp))
+		return __this_address;
+
+	/* Only regular files and directories */
+	if (!S_ISDIR(mode) && !S_ISREG(mode))
+		return __this_address;
+
+	/* We require EXTSIZE or EXTSZINHERIT */
+	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
+		return __this_address;
+
+	/* We require a non-zero extsize */
+	if (!extsize)
+		return __this_address;
+
+	/* Reflink'ed disallowed */
+	if (flags2 & XFS_DIFLAG2_REFLINK)
+		return __this_address;
+
+	/* COW extsize disallowed */
+	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
+		return __this_address;
+
+	if (cowextsize)
+		return __this_address;
+
+	/* extsize must be a multiple of sb_rextsize for RT */
+	if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
+		return __this_address;
+
+	return NULL;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
index 585ed5a110af..b8b65287b037 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.h
+++ b/fs/xfs/libxfs/xfs_inode_buf.h
@@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
 xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
 		uint32_t cowextsize, uint16_t mode, uint16_t flags,
 		uint64_t flags2);
+xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
+		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
+		uint16_t flags, uint64_t flags2);
 
 static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
 {
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 6b56f0f6d4c1..e56911553edd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -164,6 +164,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
+		features |= XFS_FEAT_FORCEALIGN;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a4e3cd8971fc..caffd4c75179 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -609,6 +609,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
+			flags |= FS_XFLAG_FORCEALIGN;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))
@@ -738,6 +740,8 @@ xfs_inode_inherit_flags2(
 	}
 	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
 		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
+		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
 
 	/* Don't let invalid cowextsize hints propagate. */
 	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
@@ -746,6 +750,15 @@ xfs_inode_inherit_flags2(
 		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
 		ip->i_cowextsize = 0;
 	}
+
+	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
+		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
+				ip->i_extsize, ip->i_cowextsize,
+				VFS_I(ip)->i_mode, ip->i_diflags,
+				ip->i_diflags2);
+		if (failaddr)
+			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
+	}
 }
 
 /*
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 42f999c1106c..536e646dd055 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -301,6 +301,16 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
 	return ip->i_cowfp && ip->i_cowfp->if_bytes;
 }
 
+static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
+{
+	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
+}
+
+static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
+{
+	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
+}
+
 static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
 {
 	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
@@ -313,7 +323,15 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 
 static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
 {
-	return false;
+	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
+		return false;
+	if (ip->i_extsize <= 1)
+		return false;
+	if (xfs_is_cow_inode(ip))
+		return false;
+	if (ip->i_diflags & XFS_DIFLAG_REALTIME)
+		return false;
+	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
 }
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index f0117188f302..771ef3954f4e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -525,10 +525,48 @@ xfs_flags2diflags2(
 		di_flags2 |= XFS_DIFLAG2_DAX;
 	if (xflags & FS_XFLAG_COWEXTSIZE)
 		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
+	if (xflags & FS_XFLAG_FORCEALIGN)
+		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
 
 	return di_flags2;
 }
 
+/*
+ * Forcealign requires a non-zero extent size hint and a zero cow
+ * extent size hint.  Don't allow set for RT files yet.
+ */
+static int
+xfs_ioctl_setattr_forcealign(
+	struct xfs_inode	*ip,
+	struct fileattr		*fa)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (!xfs_has_forcealign(mp))
+		return -EINVAL;
+
+	if (xfs_is_reflink_inode(ip))
+		return -EINVAL;
+
+	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
+				FS_XFLAG_EXTSZINHERIT)))
+		return -EINVAL;
+
+	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
+		return -EINVAL;
+
+	if (!fa->fsx_extsize)
+		return -EINVAL;
+
+	if (fa->fsx_cowextsize)
+		return -EINVAL;
+
+	if (fa->fsx_xflags & FS_XFLAG_REALTIME)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -537,10 +575,13 @@ xfs_ioctl_setattr_xflags(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
+	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
 	uint64_t		i_flags2;
+	int			error;
 
-	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
-		/* Can't change realtime flag if any extents are allocated. */
+	/* Can't change RT or forcealign flags if any extents are allocated. */
+	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
+	    forcealign != xfs_inode_has_forcealign(ip)) {
 		if (ip->i_df.if_nextents || ip->i_delayed_blks)
 			return -EINVAL;
 	}
@@ -561,6 +602,12 @@ xfs_ioctl_setattr_xflags(
 	if (i_flags2 && !xfs_has_v3inodes(mp))
 		return -EINVAL;
 
+	if (forcealign) {
+		error = xfs_ioctl_setattr_forcealign(ip, fa);
+		if (error)
+			return error;
+	}
+
 	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_diflags2 = i_flags2;
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d0567dfbc036..30228fea908d 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -299,6 +299,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
+#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
 __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
 __XFS_HAS_V4_FEAT(crc, CRC)
 __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
+__XFS_HAS_FEAT(forcealign, FORCEALIGN)
 
 /*
  * Mount features
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 265a2a418bc7..8da293e8bfa2 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1467,8 +1467,9 @@ xfs_reflink_remap_prep(
 
 	/* Check file eligibility and prepare for block sharing. */
 	ret = -EINVAL;
-	/* Don't reflink realtime inodes */
-	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
+	/* Don't reflink realtime or forcealign inodes */
+	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest) ||
+	    xfs_inode_has_forcealign(src) || xfs_inode_has_forcealign(dest))
 		goto out_unlock;
 
 	/* Don't share DAX file data with non-DAX file. */
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 65c5dfe17ecf..fb55e4ce49fa 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -6,16 +6,6 @@
 #ifndef __XFS_REFLINK_H
 #define __XFS_REFLINK_H 1
 
-static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
-{
-	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
-}
-
-static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
-{
-	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
-}
-
 extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *irec, bool *shared);
 int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 27e9f749c4c7..4f0c77f38de1 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1721,6 +1721,17 @@ xfs_fs_fill_super(
 		mp->m_features &= ~XFS_FEAT_DISCARD;
 	}
 
+	if (xfs_has_forcealign(mp)) {
+		if (xfs_has_realtime(mp)) {
+			xfs_alert(mp,
+	"forcealign not compatible with realtime device!");
+			error = -EINVAL;
+			goto out_filestream_unmount;
+		}
+		xfs_warn(mp,
+"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
+	}
+
 	if (xfs_has_reflink(mp)) {
 		if (mp->m_sb.sb_rblocks) {
 			xfs_alert(mp,
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 45e4e64fd664..8af304c0e29a 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -158,6 +158,8 @@ struct fsxattr {
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+/* data extent mappings for regular files must be aligned to extent size hint */
+#define FS_XFLAG_FORCEALIGN	0x00020000
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (6 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-06  7:56   ` Christoph Hellwig
  2024-07-05 16:24 ` [PATCH v2 09/13] xfs: Update xfs_inode_alloc_unitsize() " John Garry
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when forcealign is enabled, we want the EOF to be aligned as well, so
do not free EOF blocks.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_bmap_util.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index fe2e2c930975..b9f8093ae78c 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -537,8 +537,13 @@ xfs_can_free_eofblocks(
 	 * forever.
 	 */
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
-	if (xfs_inode_has_bigrtalloc(ip))
+
+	/* Only try to free beyond the allocation unit that crosses EOF */
+	if (xfs_inode_has_forcealign(ip))
+		end_fsb = roundup_64(end_fsb, ip->i_extsize);
+	else if (xfs_inode_has_bigrtalloc(ip))
 		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
+
 	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	if (last_fsb <= end_fsb)
 		return false;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 09/13] xfs: Update xfs_inode_alloc_unitsize() for forcealign
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (7 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 10/13] xfs: Unmap blocks according to forcealign John Garry
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For forcealign enabled, the allocation unit size is in ip->i_extsize, and
this must never be zero.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index caffd4c75179..bf9e84243f23 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -4301,7 +4301,9 @@ xfs_inode_alloc_unitsize(
 {
 	unsigned int		blocks = 1;
 
-	if (XFS_IS_REALTIME_INODE(ip))
+	if (xfs_inode_has_forcealign(ip))
+		blocks = ip->i_extsize;
+	else if (XFS_IS_REALTIME_INODE(ip))
 		blocks = ip->i_mount->m_sb.sb_rextsize;
 
 	return XFS_FSB_TO_B(ip->i_mount, blocks);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (8 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 09/13] xfs: Update xfs_inode_alloc_unitsize() " John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-06  7:58   ` Christoph Hellwig
  2024-07-05 16:24 ` [PATCH v2 11/13] xfs: Only free full extents for forcealign John Garry
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when forcealign is enabled, blocks in an inode need to be unmapped
according to extent alignment, like what is already done for rtvol.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 46 ++++++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index db12f006646a..07478c88a51b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5403,6 +5403,25 @@ xfs_bmap_del_extent_real(
 	return 0;
 }
 
+static xfs_extlen_t
+xfs_bunmapi_align(
+	struct xfs_inode	*ip,
+	xfs_fsblock_t		bno)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_agblock_t		agbno;
+
+	if (xfs_inode_has_forcealign(ip)) {
+		if (is_power_of_2(ip->i_extsize))
+			return bno & (ip->i_extsize - 1);
+
+		agbno = XFS_FSB_TO_AGBNO(mp, bno);
+		return agbno % ip->i_extsize;
+	}
+	ASSERT(XFS_IS_REALTIME_INODE(ip));
+	return xfs_rtb_to_rtxoff(ip->i_mount, bno);
+}
+
 /*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
@@ -5425,6 +5444,7 @@ __xfs_bunmapi(
 	struct xfs_bmbt_irec	got;		/* current extent record */
 	struct xfs_ifork	*ifp;		/* inode fork pointer */
 	int			isrt;		/* freeing in rt area */
+	int			isforcealign;	/* freeing for inode with forcealign */
 	int			logflags;	/* transaction logging flags */
 	xfs_extlen_t		mod;		/* rt extent offset */
 	struct xfs_mount	*mp = ip->i_mount;
@@ -5462,6 +5482,8 @@ __xfs_bunmapi(
 	}
 	XFS_STATS_INC(mp, xs_blk_unmap);
 	isrt = xfs_ifork_is_realtime(ip, whichfork);
+	isforcealign = (whichfork != XFS_ATTR_FORK) &&
+			xfs_inode_has_forcealign(ip);
 	end = start + len;
 
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
@@ -5513,14 +5535,13 @@ __xfs_bunmapi(
 		if (del.br_startoff + del.br_blockcount > end + 1)
 			del.br_blockcount = end + 1 - del.br_startoff;
 
-		if (!isrt || (flags & XFS_BMAPI_REMAP))
+		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
 			goto delete;
 
-		mod = xfs_rtb_to_rtxoff(mp,
-				del.br_startblock + del.br_blockcount);
+		mod = xfs_bunmapi_align(ip, del.br_startblock + del.br_blockcount);
 		if (mod) {
 			/*
-			 * Realtime extent not lined up at the end.
+			 * Not aligned to allocation unit on the end.
 			 * The extent could have been split into written
 			 * and unwritten pieces, or we could just be
 			 * unmapping part of it.  But we can't really
@@ -5565,14 +5586,21 @@ __xfs_bunmapi(
 			goto nodelete;
 		}
 
-		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
+		mod = xfs_bunmapi_align(ip, del.br_startblock);
 		if (mod) {
-			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
+			xfs_extlen_t off;
+
+			if (isforcealign) {
+				off = ip->i_extsize - mod;
+			} else {
+				ASSERT(isrt);
+				off = mp->m_sb.sb_rextsize - mod;
+			}
 
 			/*
-			 * Realtime extent is lined up at the end but not
-			 * at the front.  We'll get rid of full extents if
-			 * we can.
+			 * Extent is lined up to the allocation unit at the
+			 * end but not at the front.  We'll get rid of full
+			 * extents if we can.
 			 */
 			if (del.br_blockcount > off) {
 				del.br_blockcount -= off;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 11/13] xfs: Only free full extents for forcealign
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (9 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 10/13] xfs: Unmap blocks according to forcealign John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-06  7:59   ` Christoph Hellwig
  2024-07-05 16:24 ` [PATCH v2 12/13] xfs: Don't revert allocated offset " John Garry
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

Like we already do for rtvol, only free full extents for forcealign in
xfs_free_file_space().

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_bmap_util.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b9f8093ae78c..daef6085208d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -856,8 +856,11 @@ xfs_free_file_space(
 	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
 	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
 
-	/* We can only free complete realtime extents. */
-	if (xfs_inode_has_bigrtalloc(ip)) {
+	/* Free only complete extents. */
+	if (xfs_inode_has_forcealign(ip)) {
+		startoffset_fsb = roundup_64(startoffset_fsb, ip->i_extsize);
+		endoffset_fsb = rounddown_64(endoffset_fsb, ip->i_extsize);
+	} else if (xfs_inode_has_bigrtalloc(ip)) {
 		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
 		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 12/13] xfs: Don't revert allocated offset for forcealign
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (10 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 11/13] xfs: Only free full extents for forcealign John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-05 16:24 ` [PATCH v2 13/13] xfs: Enable file data forcealign feature John Garry
  2024-07-06  7:53 ` [PATCH v2 00/13] forcealign for xfs Christoph Hellwig
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

In xfs_bmap_process_allocated_extent(), for when we found that we could not
provide the requested length completely, the mapping is moved so that we
can provide as much as possible for the original request.

For forcealign, this would mean ignoring alignment guaranteed, so don't do
this.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 07478c88a51b..45694ceea35d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3492,11 +3492,15 @@ xfs_bmap_process_allocated_extent(
 	 * original request as possible.  Free space is apparently
 	 * very fragmented so we're unlikely to be able to satisfy the
 	 * hints anyway.
+	 * However, for an inode with forcealign, continue with the
+	 * found offset as we need to honour the alignment hint.
 	 */
-	if (ap->length <= orig_length)
-		ap->offset = orig_offset;
-	else if (ap->offset + ap->length < orig_offset + orig_length)
-		ap->offset = orig_offset + orig_length - ap->length;
+	if (!xfs_inode_has_forcealign(ap->ip)) {
+		if (ap->length <= orig_length)
+			ap->offset = orig_offset;
+		else if (ap->offset + ap->length < orig_offset + orig_length)
+			ap->offset = orig_offset + orig_length - ap->length;
+	}
 	xfs_bmap_alloc_account(ap);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 13/13] xfs: Enable file data forcealign feature
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (11 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 12/13] xfs: Don't revert allocated offset " John Garry
@ 2024-07-05 16:24 ` John Garry
  2024-07-06  7:53 ` [PATCH v2 00/13] forcealign for xfs Christoph Hellwig
  13 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-05 16:24 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: "Darrick J. Wong" <djwong@kernel.org>

Enable this feature.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b48cd75d34a6..42e1f80206ab 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -358,7 +358,8 @@ xfs_sb_has_compat_feature(
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
 		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+		 XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 00/13] forcealign for xfs
  2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
                   ` (12 preceding siblings ...)
  2024-07-05 16:24 ` [PATCH v2 13/13] xfs: Enable file data forcealign feature John Garry
@ 2024-07-06  7:53 ` Christoph Hellwig
  2024-07-08  7:48   ` John Garry
  13 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-06  7:53 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, djwong, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Fri, Jul 05, 2024 at 04:24:37PM +0000, John Garry wrote:
> The actual forcealign patches are the same in this series, modulo an
> attempt for a fix in xfs_bunmapi_align().
> 
> Why forcealign?
> In some scenarios to may be required to guarantee extent alignment and
> granularity.
> 
> For example, for atomic writes, the maximum atomic write unit size would
> be limited at the extent alignment and granularity, guaranteeing that an
> atomic write would not span data present in multiple extents.
> 
> forcealign may be useful as a performance tuning optimization in other
> scenarios.

From previous side discussion I know Dave disagrees, but given how
much pain the larger than FSB rtextents have caused I'm very skeptical
if taking this on is the right tradeoff.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-05 16:24 ` [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign John Garry
@ 2024-07-06  7:56   ` Christoph Hellwig
  2024-07-08  1:44     ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-06  7:56 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, djwong, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Fri, Jul 05, 2024 at 04:24:45PM +0000, John Garry wrote:
> -	if (xfs_inode_has_bigrtalloc(ip))
> +
> +	/* Only try to free beyond the allocation unit that crosses EOF */
> +	if (xfs_inode_has_forcealign(ip))
> +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
> +	else if (xfs_inode_has_bigrtalloc(ip))
>  		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);

Shouldn't we have a common helper to align things the right way?

But more importantly shouldn't this also cover hole punching if we
really want force aligned boundaries?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-05 16:24 ` [PATCH v2 10/13] xfs: Unmap blocks according to forcealign John Garry
@ 2024-07-06  7:58   ` Christoph Hellwig
  2024-07-08 14:48     ` John Garry
  2024-07-09  9:57     ` Dave Chinner
  0 siblings, 2 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-06  7:58 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, djwong, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

> +static xfs_extlen_t
> +xfs_bunmapi_align(
> +	struct xfs_inode	*ip,
> +	xfs_fsblock_t		bno)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_agblock_t		agbno;
> +
> +	if (xfs_inode_has_forcealign(ip)) {
> +		if (is_power_of_2(ip->i_extsize))
> +			return bno & (ip->i_extsize - 1);
> +
> +		agbno = XFS_FSB_TO_AGBNO(mp, bno);
> +		return agbno % ip->i_extsize;
> +	}
> +	ASSERT(XFS_IS_REALTIME_INODE(ip));
> +	return xfs_rtb_to_rtxoff(ip->i_mount, bno);

This helper isn't really bunmapi sepcific, is it?

> @@ -5425,6 +5444,7 @@ __xfs_bunmapi(
>  	struct xfs_bmbt_irec	got;		/* current extent record */
>  	struct xfs_ifork	*ifp;		/* inode fork pointer */
>  	int			isrt;		/* freeing in rt area */
> +	int			isforcealign;	/* freeing for inode with forcealign */

This is really a bool.  And while it matches the code around it the
code feels a bit too verbose..
> 
> +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
>  			goto delete;
>  
> -		mod = xfs_rtb_to_rtxoff(mp,
> -				del.br_startblock + del.br_blockcount);
> +		mod = xfs_bunmapi_align(ip, del.br_startblock + del.br_blockcount);

Overly long line.

We've been long wanting to split the whole align / convert unwritten /
etc code into a helper outside the main bumapi flow.  And when adding
new logic to it this might indeed be a good time.

> +			if (isforcealign) {
> +				off = ip->i_extsize - mod;
> +			} else {
> +				ASSERT(isrt);
> +				off = mp->m_sb.sb_rextsize - mod;
> +			}

And we'll really need proper helpers so that we don't have to
open code the i_extsize vs sb_rextsize logic all over.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 11/13] xfs: Only free full extents for forcealign
  2024-07-05 16:24 ` [PATCH v2 11/13] xfs: Only free full extents for forcealign John Garry
@ 2024-07-06  7:59   ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-06  7:59 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, djwong, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

> +	if (xfs_inode_has_forcealign(ip)) {
> +		startoffset_fsb = roundup_64(startoffset_fsb, ip->i_extsize);
> +		endoffset_fsb = rounddown_64(endoffset_fsb, ip->i_extsize);
> +	} else if (xfs_inode_has_bigrtalloc(ip)) {
>  		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
>  		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);

And just like elsewhere this should use common helpers.

I mean in the end the rtextsize is basically a special case of force
align.  The checks should be able to cover both easily.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-06  7:56   ` Christoph Hellwig
@ 2024-07-08  1:44     ` Dave Chinner
  2024-07-08  7:36       ` John Garry
  2024-07-09  7:41       ` Christoph Hellwig
  0 siblings, 2 replies; 48+ messages in thread
From: Dave Chinner @ 2024-07-08  1:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Garry, chandan.babu, djwong, dchinner, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Sat, Jul 06, 2024 at 09:56:09AM +0200, Christoph Hellwig wrote:
> On Fri, Jul 05, 2024 at 04:24:45PM +0000, John Garry wrote:
> > -	if (xfs_inode_has_bigrtalloc(ip))
> > +
> > +	/* Only try to free beyond the allocation unit that crosses EOF */
> > +	if (xfs_inode_has_forcealign(ip))
> > +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
> > +	else if (xfs_inode_has_bigrtalloc(ip))
> >  		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
> 
> Shouldn't we have a common helper to align things the right way?

Yes, that's what I keep saying. The common way to do this is:

	align = xfs_inode_alloc_unitsize(ip);
	if (align > mp->m_blocksize)
		end_fsb = roundup_64(end_fsb, align);

Wrapping that into a helper might be appropriate, though we'd need
wrappers for aligning both the start (down) and end (up).

To make this work, the xfs_inode_alloc_unitsize() code needs to grow
a forcealign check. That overrides the RT rextsize value (force
align on RT should work the same as it does on data devs) and needs
to look like this:

	unsigned int		blocks = 1;

+	if (xfs_inode_has_forcealign(ip)
+		blocks = ip->i_extsize;
-	if (XFS_IS_REALTIME_INODE(ip))
+	else if (XFS_IS_REALTIME_INODE(ip))
                blocks = ip->i_mount->m_sb.sb_rextsize;

        return XFS_FSB_TO_B(ip->i_mount, blocks);

> But more importantly shouldn't this also cover hole punching if we
> really want force aligned boundaries?

Yes, that's what I keep saying. There is no difference in the
alignment behaviour needed for "xfs_inode_has_bigrtalloc" and
"xfs_inode_has_forcealign" except for the source of the allocation
alignment value.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-08  1:44     ` Dave Chinner
@ 2024-07-08  7:36       ` John Garry
  2024-07-08 11:12         ` Dave Chinner
  2024-07-09  7:41       ` Christoph Hellwig
  1 sibling, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-08  7:36 UTC (permalink / raw)
  To: Dave Chinner, Christoph Hellwig, djwong
  Cc: chandan.babu, dchinner, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 08/07/2024 02:44, Dave Chinner wrote:
> On Sat, Jul 06, 2024 at 09:56:09AM +0200, Christoph Hellwig wrote:
>> On Fri, Jul 05, 2024 at 04:24:45PM +0000, John Garry wrote:
>>> -	if (xfs_inode_has_bigrtalloc(ip))
>>> +
>>> +	/* Only try to free beyond the allocation unit that crosses EOF */
>>> +	if (xfs_inode_has_forcealign(ip))
>>> +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
>>> +	else if (xfs_inode_has_bigrtalloc(ip))
>>>   		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
>>
>> Shouldn't we have a common helper to align things the right way?
> 
> Yes, that's what I keep saying. 

Such a change was introduced in 
https://lore.kernel.org/linux-xfs/20240501235310.GP360919@frogsfrogsfrogs/

and, as you can see, Darrick was less than happy with it. That is why I 
kept this method which removed recently added RT code.

Darrick, can we find a better method to factor this code out, like below?

> The common way to do this is:
> 
> 	align = xfs_inode_alloc_unitsize(ip);
> 	if (align > mp->m_blocksize)
> 		end_fsb = roundup_64(end_fsb, align);
> 
> Wrapping that into a helper might be appropriate, though we'd need
> wrappers for aligning both the start (down) and end (up).
> 
> To make this work, the xfs_inode_alloc_unitsize() code needs to grow
> a forcealign check. That overrides the RT rextsize value (force
> align on RT should work the same as it does on data devs) and needs
> to look like this:
> 
> 	unsigned int		blocks = 1;
> 
> +	if (xfs_inode_has_forcealign(ip)
> +		blocks = ip->i_extsize;
> -	if (XFS_IS_REALTIME_INODE(ip))
> +	else if (XFS_IS_REALTIME_INODE(ip))
>                  blocks = ip->i_mount->m_sb.sb_rextsize;

That's in 09/13

> 
>          return XFS_FSB_TO_B(ip->i_mount, blocks);
> 
>> But more importantly shouldn't this also cover hole punching if we
>> really want force aligned boundaries?

so doesn't the xfs_file_fallocate(PUNCH_HOLES) -> 
xfs_flush_unmap_range() -> rounding with xfs_inode_alloc_unitsize() do 
the required job?

> 
> Yes, that's what I keep saying. There is no difference in the
> alignment behaviour needed for "xfs_inode_has_bigrtalloc" and
> "xfs_inode_has_forcealign" except for the source of the allocation
> alignment value.
> 




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 00/13] forcealign for xfs
  2024-07-06  7:53 ` [PATCH v2 00/13] forcealign for xfs Christoph Hellwig
@ 2024-07-08  7:48   ` John Garry
  2024-07-09  7:48     ` Christoph Hellwig
  0 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-08  7:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: chandan.babu, djwong, dchinner, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 06/07/2024 08:53, Christoph Hellwig wrote:
> On Fri, Jul 05, 2024 at 04:24:37PM +0000, John Garry wrote:
>> The actual forcealign patches are the same in this series, modulo an
>> attempt for a fix in xfs_bunmapi_align().
>>
>> Why forcealign?
>> In some scenarios to may be required to guarantee extent alignment and
>> granularity.
>>
>> For example, for atomic writes, the maximum atomic write unit size would
>> be limited at the extent alignment and granularity, guaranteeing that an
>> atomic write would not span data present in multiple extents.
>>
>> forcealign may be useful as a performance tuning optimization in other
>> scenarios.
> 
>  From previous side discussion I know Dave disagrees, but given how
> much pain the larger than FSB rtextents have caused I'm very skeptical
> if taking this on is the right tradeoff.
> 

I am not sure what that pain is, but I guess it's the maintainability 
and scalability of the scattered "if RT" checks for rounding up and down 
to larger extent size, right?

For forcealign, at least we can factor that stuff mostly into common 
forcealign+RT helpers, to keep the checks common. That is apart from the 
block allocator code.
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-08  7:36       ` John Garry
@ 2024-07-08 11:12         ` Dave Chinner
  2024-07-08 14:41           ` John Garry
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2024-07-08 11:12 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, djwong, chandan.babu, dchinner, viro, brauner,
	jack, linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Mon, Jul 08, 2024 at 08:36:52AM +0100, John Garry wrote:
> On 08/07/2024 02:44, Dave Chinner wrote:
> > On Sat, Jul 06, 2024 at 09:56:09AM +0200, Christoph Hellwig wrote:
> > > On Fri, Jul 05, 2024 at 04:24:45PM +0000, John Garry wrote:
> > > > -	if (xfs_inode_has_bigrtalloc(ip))
> > > > +
> > > > +	/* Only try to free beyond the allocation unit that crosses EOF */
> > > > +	if (xfs_inode_has_forcealign(ip))
> > > > +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
> > > > +	else if (xfs_inode_has_bigrtalloc(ip))
> > > >   		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
> > > 
> > > Shouldn't we have a common helper to align things the right way?
> > 
> > Yes, that's what I keep saying.
> 
> Such a change was introduced in
> https://lore.kernel.org/linux-xfs/20240501235310.GP360919@frogsfrogsfrogs/
> 
> and, as you can see, Darrick was less than happy with it. That is why I kept
> this method which removed recently added RT code.

I know.

However, "This is pointless busywork!" is not a technical argument
against the observation that rtbigalloc and forcealign are exactly
the same thing from the BMBT management POV and so should be
combined.

Arguing that "but doing the right thing makes more work for me"
doesn't hold any weight. It never has. Shouting and ranting
irrationally is a great way to shut down any further conversation,
though.

Our normal process is to factor the code and add the extra condition
for the new feature. That's all I'm asking to be done. It's not
technically difficult. It makes the code better. it makes the code
easier to test, too, because there are now two entries in the test
matrix taht exercise that code path. It's simpler to understand
months down the track, makes new alignment features easier to add in
future, etc.

Put simply: if we just do what we have always done, then we end up
with better code.  Hence I just don't see why people are trying to
make a mountain out of this...

> Darrick, can we find a better method to factor this code out, like below?
> 
> > The common way to do this is:
> > 
> > 	align = xfs_inode_alloc_unitsize(ip);
> > 	if (align > mp->m_blocksize)
> > 		end_fsb = roundup_64(end_fsb, align);
> > 
> > Wrapping that into a helper might be appropriate, though we'd need
> > wrappers for aligning both the start (down) and end (up).
> > 
> > To make this work, the xfs_inode_alloc_unitsize() code needs to grow
> > a forcealign check. That overrides the RT rextsize value (force
> > align on RT should work the same as it does on data devs) and needs
> > to look like this:
> > 
> > 	unsigned int		blocks = 1;
> > 
> > +	if (xfs_inode_has_forcealign(ip)
> > +		blocks = ip->i_extsize;
> > -	if (XFS_IS_REALTIME_INODE(ip))
> > +	else if (XFS_IS_REALTIME_INODE(ip))
> >                  blocks = ip->i_mount->m_sb.sb_rextsize;
> 
> That's in 09/13

Thanks, I thought it was somewhere in this patch series, I just
wanted to point out (once again) that rtbigalloc and forcealign are
basically the same thing.

And, in case it isn't obvious to everyone, setting forcealign on a
rt inode is basically the equivalent of turning on "rtbigalloc" for
just that inode....

> >          return XFS_FSB_TO_B(ip->i_mount, blocks);
> > 
> > > But more importantly shouldn't this also cover hole punching if we
> > > really want force aligned boundaries?
> 
> so doesn't the xfs_file_fallocate(PUNCH_HOLES) -> xfs_flush_unmap_range() ->
> rounding with xfs_inode_alloc_unitsize() do the required job?

No, xfs_flush_unmap_range() should be flushing to *outwards*
block/page size boundaries because it is cleaning and invalidating
the page cache over the punch range, not manipulating the physical
extents underlying the data.

It's only once we go to punch out the extents in
xfs_free_file_space() that we need to use xfs_inode_alloc_unitsize()
to determine the *inwards* rounding for the extent punch vs writing
physical zeroes....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-08 11:12         ` Dave Chinner
@ 2024-07-08 14:41           ` John Garry
  0 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-08 14:41 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, djwong, chandan.babu, dchinner, viro, brauner,
	jack, linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On 08/07/2024 12:12, Dave Chinner wrote:
> On Mon, Jul 08, 2024 at 08:36:52AM +0100, John Garry wrote:
>> On 08/07/2024 02:44, Dave Chinner wrote:
>>> On Sat, Jul 06, 2024 at 09:56:09AM +0200, Christoph Hellwig wrote:
>>>> On Fri, Jul 05, 2024 at 04:24:45PM +0000, John Garry wrote:
>>>>> -	if (xfs_inode_has_bigrtalloc(ip))
>>>>> +
>>>>> +	/* Only try to free beyond the allocation unit that crosses EOF */
>>>>> +	if (xfs_inode_has_forcealign(ip))
>>>>> +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
>>>>> +	else if (xfs_inode_has_bigrtalloc(ip))
>>>>>    		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
>>>>
>>>> Shouldn't we have a common helper to align things the right way?
>>>
>>> Yes, that's what I keep saying.
>>
>> Such a change was introduced in
>> https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20240501235310.GP360919@frogsfrogsfrogs/__;!!ACWV5N9M2RV99hQ!LU97IJHrwX9otpItjJI_ewbNt-T-Lgyt5ulyz0yGKc5Dmms4jqhwZv5NregBEK_dTEtEDCYCwcA43RxQFnc$
>>
>> and, as you can see, Darrick was less than happy with it. That is why I kept
>> this method which removed recently added RT code.
> 
> I know.
> 
> However, "This is pointless busywork!" is not a technical argument
> against the observation that rtbigalloc and forcealign are exactly
> the same thing from the BMBT management POV and so should be
> combined.
> 
> Arguing that "but doing the right thing makes more work for me"
> doesn't hold any weight. It never has. Shouting and ranting
> irrationally is a great way to shut down any further conversation,
> though.
> 
> Our normal process is to factor the code and add the extra condition
> for the new feature. That's all I'm asking to be done. It's not
> technically difficult. It makes the code better. it makes the code
> easier to test, too, because there are now two entries in the test
> matrix taht exercise that code path. It's simpler to understand
> months down the track, makes new alignment features easier to add in
> future, etc.
> 
> Put simply: if we just do what we have always done, then we end up
> with better code.  Hence I just don't see why people are trying to
> make a mountain out of this...

I tend to agree with what you said; and the conversation was halted, so 
left me in an awkward position.

> 
>> Darrick, can we find a better method to factor this code out, like below?
>>
>>> The common way to do this is:
>>>
>>> 	align = xfs_inode_alloc_unitsize(ip);
>>> 	if (align > mp->m_blocksize)
>>> 		end_fsb = roundup_64(end_fsb, align);
>>>
>>> Wrapping that into a helper might be appropriate, though we'd need
>>> wrappers for aligning both the start (down) and end (up).
>>>
>>> To make this work, the xfs_inode_alloc_unitsize() code needs to grow
>>> a forcealign check. That overrides the RT rextsize value (force
>>> align on RT should work the same as it does on data devs) and needs
>>> to look like this:
>>>
>>> 	unsigned int		blocks = 1;
>>>
>>> +	if (xfs_inode_has_forcealign(ip)
>>> +		blocks = ip->i_extsize;
>>> -	if (XFS_IS_REALTIME_INODE(ip))
>>> +	else if (XFS_IS_REALTIME_INODE(ip))
>>>                   blocks = ip->i_mount->m_sb.sb_rextsize;
>>
>> That's in 09/13
> 
> Thanks, I thought it was somewhere in this patch series, I just
> wanted to point out (once again) that rtbigalloc and forcealign are
> basically the same thing.
> 
> And, in case it isn't obvious to everyone, setting forcealign on a
> rt inode is basically the equivalent of turning on "rtbigalloc" for
> just that inode....

sure

> 
>>>           return XFS_FSB_TO_B(ip->i_mount, blocks);
>>>
>>>> But more importantly shouldn't this also cover hole punching if we
>>>> really want force aligned boundaries?
>>
>> so doesn't the xfs_file_fallocate(PUNCH_HOLES) -> xfs_flush_unmap_range() ->
>> rounding with xfs_inode_alloc_unitsize() do the required job?
> 
> No, xfs_flush_unmap_range() should be flushing to *outwards*
> block/page size boundaries because it is cleaning and invalidating
> the page cache over the punch range, not manipulating the physical
> extents underlying the data.
> 
> It's only once we go to punch out the extents in
> xfs_free_file_space() that we need to use xfs_inode_alloc_unitsize()
> to determine the *inwards* rounding for the extent punch vs writing
> physical zeroes....
> 

ok, well we are covered for forcealign in both xfs_flush_unmap_range() 
and xfs_free_file_space().


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-06  7:58   ` Christoph Hellwig
@ 2024-07-08 14:48     ` John Garry
  2024-07-09  7:46       ` Christoph Hellwig
  2024-07-09  9:57     ` Dave Chinner
  1 sibling, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-08 14:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: chandan.babu, djwong, dchinner, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 06/07/2024 08:58, Christoph Hellwig wrote:
>> +static xfs_extlen_t
>> +xfs_bunmapi_align(
>> +	struct xfs_inode	*ip,
>> +	xfs_fsblock_t		bno)
>> +{
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +	xfs_agblock_t		agbno;
>> +
>> +	if (xfs_inode_has_forcealign(ip)) {
>> +		if (is_power_of_2(ip->i_extsize))
>> +			return bno & (ip->i_extsize - 1);
>> +
>> +		agbno = XFS_FSB_TO_AGBNO(mp, bno);
>> +		return agbno % ip->i_extsize;
>> +	}
>> +	ASSERT(XFS_IS_REALTIME_INODE(ip));
>> +	return xfs_rtb_to_rtxoff(ip->i_mount, bno);
> 
> This helper isn't really bunmapi sepcific, is it?

Right, it is not really. Apart from the ASSERT to ensure that we are not 
calling from a stray context.

> 
>> @@ -5425,6 +5444,7 @@ __xfs_bunmapi(
>>   	struct xfs_bmbt_irec	got;		/* current extent record */
>>   	struct xfs_ifork	*ifp;		/* inode fork pointer */
>>   	int			isrt;		/* freeing in rt area */
>> +	int			isforcealign;	/* freeing for inode with forcealign */
> 
> This is really a bool.  And while it matches the code around it the
> code feels a bit too verbose..

I can change both to a bool - would that be better?

Using isfa (instead of isforcealign) might be interpreted as something 
else :)

>>
>> +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
>>   			goto delete;
>>   
>> -		mod = xfs_rtb_to_rtxoff(mp,
>> -				del.br_startblock + del.br_blockcount);
>> +		mod = xfs_bunmapi_align(ip, del.br_startblock + del.br_blockcount);
> 
> Overly long line.

noted

> 
> We've been long wanting to split the whole align / convert unwritten /
> etc code into a helper outside the main bumapi flow.  And when adding
> new logic to it this might indeed be a good time.

ok, I'll see if can come up with something

> 
>> +			if (isforcealign) {
>> +				off = ip->i_extsize - mod;
>> +			} else {
>> +				ASSERT(isrt);
>> +				off = mp->m_sb.sb_rextsize - mod;
>> +			}
> 
> And we'll really need proper helpers so that we don't have to
> open code the i_extsize vs sb_rextsize logic all over.

sure

> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign
  2024-07-08  1:44     ` Dave Chinner
  2024-07-08  7:36       ` John Garry
@ 2024-07-09  7:41       ` Christoph Hellwig
  1 sibling, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-09  7:41 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, John Garry, chandan.babu, djwong, dchinner,
	viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen

On Mon, Jul 08, 2024 at 11:44:59AM +1000, Dave Chinner wrote:
> On Sat, Jul 06, 2024 at 09:56:09AM +0200, Christoph Hellwig wrote:
> > On Fri, Jul 05, 2024 at 04:24:45PM +0000, John Garry wrote:
> > > -	if (xfs_inode_has_bigrtalloc(ip))
> > > +
> > > +	/* Only try to free beyond the allocation unit that crosses EOF */
> > > +	if (xfs_inode_has_forcealign(ip))
> > > +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
> > > +	else if (xfs_inode_has_bigrtalloc(ip))
> > >  		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
> > 
> > Shouldn't we have a common helper to align things the right way?
> 
> Yes, that's what I keep saying. The common way to do this is:
> 
> 	align = xfs_inode_alloc_unitsize(ip);
> 	if (align > mp->m_blocksize)
> 		end_fsb = roundup_64(end_fsb, align);
> 
> Wrapping that into a helper might be appropriate, though we'd need
> wrappers for aligning both the start (down) and end (up).

I think the above is already good enough.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-08 14:48     ` John Garry
@ 2024-07-09  7:46       ` Christoph Hellwig
  2024-07-17 15:24         ` John Garry
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-09  7:46 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, chandan.babu, djwong, dchinner, viro, brauner,
	jack, linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Mon, Jul 08, 2024 at 03:48:20PM +0100, John Garry wrote:
>>> +	int			isforcealign;	/* freeing for inode with forcealign */
>>
>> This is really a bool.  And while it matches the code around it the
>> code feels a bit too verbose..
>
> I can change both to a bool - would that be better?
>
> Using isfa (instead of isforcealign) might be interpreted as something else 

The check should be used in one single place where we decided if
we need to to the alignment based adjustments.  So IMHO just killing
it and open coding it there seems way easier.   Yes, it is in a loop,
but compared to all the work done is is really cheap.

>> We've been long wanting to split the whole align / convert unwritten /
>> etc code into a helper outside the main bumapi flow.  And when adding
>> new logic to it this might indeed be a good time.
>
> ok, I'll see if can come up with something

I can take a look too.  There is some real mess in there like trying
to account for cases where the transaction doesn't have a block
reservation, which I think could have happen in truncate until
Zhang Yi fixed it for the 6.11 merge window.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 00/13] forcealign for xfs
  2024-07-08  7:48   ` John Garry
@ 2024-07-09  7:48     ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-09  7:48 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, chandan.babu, djwong, dchinner, viro, brauner,
	jack, linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Mon, Jul 08, 2024 at 08:48:19AM +0100, John Garry wrote:
> I am not sure what that pain is, but I guess it's the maintainability and 
> scalability of the scattered "if RT" checks for rounding up and down to 
> larger extent size, right?
>
> For forcealign, at least we can factor that stuff mostly into common 
> forcealign+RT helpers, to keep the checks common. That is apart from the 
> block allocator code.

Maybe its just me hating all the rtalloc larger than block size
allocation granularity mess and hoping it goes away.  With this we'll
make it certain that it is not going away, but that might just have
been a faint hope.  But if we add more of it we'll at least need to
ensure it is done using common helpers and clean the existing mess up
as much as we can.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-06  7:58   ` Christoph Hellwig
  2024-07-08 14:48     ` John Garry
@ 2024-07-09  9:57     ` Dave Chinner
  2024-07-09 11:19       ` Christoph Hellwig
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2024-07-09  9:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: John Garry, chandan.babu, djwong, dchinner, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Sat, Jul 06, 2024 at 09:58:58AM +0200, Christoph Hellwig wrote:
> 
> > +			if (isforcealign) {
> > +				off = ip->i_extsize - mod;
> > +			} else {
> > +				ASSERT(isrt);
> > +				off = mp->m_sb.sb_rextsize - mod;
> > +			}
> 
> And we'll really need proper helpers so that we don't have to
> open code the i_extsize vs sb_rextsize logic all over.

We already have that: xfs_inode_alloc_unitsize().

Have the code get that value, then do all the alignment based on
whether it is allocation unit size > mp->m_sb.sb_blocksize. Then all
the calculations are generic and not dependent on forcealign or rt,
but on whether the inode requires multi-block contiguous extent
alignment....

i.e.
	alloc_size = xfs_inode_alloc_unitsize(ip);
	if (alloc_size > mp->m_sb.sb_blocksize) {
		/* do aligned allocation setup stuff */
		.....
	}
	....

No code should be doing "if (forcealign) ... else if (realtime) ..."
branching for alignment purposes. All the code should all be
generic based on the value xfs_inode_alloc_unitsize() returns.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-09  9:57     ` Dave Chinner
@ 2024-07-09 11:19       ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-09 11:19 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, John Garry, chandan.babu, djwong, dchinner,
	viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen

On Tue, Jul 09, 2024 at 07:57:22PM +1000, Dave Chinner wrote:
> No code should be doing "if (forcealign) ... else if (realtime) ..."
> branching for alignment purposes. All the code should all be
> generic based on the value xfs_inode_alloc_unitsize() returns.

Yes, please.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-05 16:24 ` [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
@ 2024-07-11  2:59   ` Darrick J. Wong
  2024-07-11  3:59     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Darrick J. Wong @ 2024-07-11  2:59 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jul 05, 2024 at 04:24:44PM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Add a new inode flag to require that all file data extent mappings must
> be aligned (both the file offset range and the allocated space itself)
> to the extent size hint.  Having a separate COW extent size hint is no
> longer allowed.
> 
> The goal here is to enable sysadmins and users to mandate that all space
> mappings in a file must have a startoff/blockcount that are aligned to
> (say) a 2MB alignment and that the startblock/blockcount will follow the
> same alignment.
> 
> Allocated space will be aligned to start of the AG, and not necessarily
> aligned with disk blocks. The upcoming atomic writes feature will rely and
> forcealign and will also require allocated space will also be aligned to
> disk blocks.
> 
> Currently RT is not be supported for forcealign, so reject a mount under
> that condition. In future, it should be possible to support forcealign for
> RT. In this case, the extent size hint will need to be aligned with
> rtextsize, so add inode verification for that now.
> 
> reflink link will not be supported for forcealign. This is because we have
> the limitation of pageache writeback not knowing how to writeback an
> entire allocation unut, so reject a mount with relink.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> Co-developed-by: John Garry <john.g.garry@oracle.com>
> [jpg: many changes from orig, including forcealign inode verification
>  rework, disallow RT and forcealign mount, ioctl setattr rework,
>  disallow reflink a forcealign inode]
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h    |  6 +++-
>  fs/xfs/libxfs/xfs_inode_buf.c | 55 +++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_inode_buf.h |  3 ++
>  fs/xfs/libxfs/xfs_sb.c        |  2 ++
>  fs/xfs/xfs_inode.c            | 13 +++++++++
>  fs/xfs/xfs_inode.h            | 20 ++++++++++++-
>  fs/xfs/xfs_ioctl.c            | 51 ++++++++++++++++++++++++++++++--
>  fs/xfs/xfs_mount.h            |  2 ++
>  fs/xfs/xfs_reflink.c          |  5 ++--
>  fs/xfs/xfs_reflink.h          | 10 -------
>  fs/xfs/xfs_super.c            | 11 +++++++
>  include/uapi/linux/fs.h       |  2 ++
>  12 files changed, 164 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 61f51becff4f..b48cd75d34a6 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
>  #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>  #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>  #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
> +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
>  #define XFS_SB_FEAT_RO_COMPAT_ALL \
>  		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>  		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
> @@ -1094,16 +1095,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
>  #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
>  #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
>  #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
> +/* data extent mappings for regular files must be aligned to extent size hint */
> +#define XFS_DIFLAG2_FORCEALIGN_BIT 5
>  
>  #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
>  #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
>  #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
>  #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
>  #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
> +#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
>  
>  #define XFS_DIFLAG2_ANY \
>  	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
> -	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
> +	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
>  
>  static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
>  {
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 513b50da6215..5c61a1d1bb2b 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -657,6 +657,15 @@ xfs_dinode_verify(
>  	    !xfs_has_bigtime(mp))
>  		return __this_address;
>  
> +	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
> +		fa = xfs_inode_validate_forcealign(mp,
> +				be32_to_cpu(dip->di_extsize),
> +				be32_to_cpu(dip->di_cowextsize),
> +				mode, flags, flags2);
> +		if (fa)
> +			return fa;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -824,3 +833,49 @@ xfs_inode_validate_cowextsize(
>  
>  	return NULL;
>  }
> +
> +/* Validate the forcealign inode flag */
> +xfs_failaddr_t
> +xfs_inode_validate_forcealign(
> +	struct xfs_mount	*mp,
> +	uint32_t		extsize,
> +	uint32_t		cowextsize,
> +	uint16_t		mode,
> +	uint16_t		flags,
> +	uint64_t		flags2)
> +{
> +	bool			rt =  flags & XFS_DIFLAG_REALTIME;
> +
> +	/* superblock rocompat feature flag */
> +	if (!xfs_has_forcealign(mp))
> +		return __this_address;
> +
> +	/* Only regular files and directories */
> +	if (!S_ISDIR(mode) && !S_ISREG(mode))
> +		return __this_address;
> +
> +	/* We require EXTSIZE or EXTSZINHERIT */
> +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
> +		return __this_address;
> +
> +	/* We require a non-zero extsize */
> +	if (!extsize)
> +		return __this_address;
> +
> +	/* Reflink'ed disallowed */
> +	if (flags2 & XFS_DIFLAG2_REFLINK)
> +		return __this_address;

Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
superblock verifier or xfs_fs_fill_super fail the mount so that old
kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
support for forcealign'd cow and starts writing out files with both
iflags set?

That said, if the bs>ps patchset lands, then I think forcealign cow is
a simple matter of setting the min folio order to the forcealign size
and making sure that we always write out entire folios if any of the
blocks cached by the folio is shared.  Direct writes to forcealigned
shared files probably has to be aligned to the forcealign size or fall
back to buffered writes for cow.

--D

> +
> +	/* COW extsize disallowed */
> +	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
> +		return __this_address;
> +
> +	if (cowextsize)
> +		return __this_address;
> +
> +	/* extsize must be a multiple of sb_rextsize for RT */
> +	if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
> +		return __this_address;
> +
> +	return NULL;
> +}
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
> index 585ed5a110af..b8b65287b037 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.h
> +++ b/fs/xfs/libxfs/xfs_inode_buf.h
> @@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
>  xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
>  		uint32_t cowextsize, uint16_t mode, uint16_t flags,
>  		uint64_t flags2);
> +xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
> +		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
> +		uint16_t flags, uint64_t flags2);
>  
>  static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
>  {
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 6b56f0f6d4c1..e56911553edd 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -164,6 +164,8 @@ xfs_sb_version_to_features(
>  		features |= XFS_FEAT_REFLINK;
>  	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
>  		features |= XFS_FEAT_INOBTCNT;
> +	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
> +		features |= XFS_FEAT_FORCEALIGN;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
>  		features |= XFS_FEAT_FTYPE;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index a4e3cd8971fc..caffd4c75179 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -609,6 +609,8 @@ xfs_ip2xflags(
>  			flags |= FS_XFLAG_DAX;
>  		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
>  			flags |= FS_XFLAG_COWEXTSIZE;
> +		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
> +			flags |= FS_XFLAG_FORCEALIGN;
>  	}
>  
>  	if (xfs_inode_has_attr_fork(ip))
> @@ -738,6 +740,8 @@ xfs_inode_inherit_flags2(
>  	}
>  	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
>  		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
> +	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
> +		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
>  
>  	/* Don't let invalid cowextsize hints propagate. */
>  	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> @@ -746,6 +750,15 @@ xfs_inode_inherit_flags2(
>  		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
>  		ip->i_cowextsize = 0;
>  	}
> +
> +	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
> +		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
> +				ip->i_extsize, ip->i_cowextsize,
> +				VFS_I(ip)->i_mode, ip->i_diflags,
> +				ip->i_diflags2);
> +		if (failaddr)
> +			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
> +	}
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 42f999c1106c..536e646dd055 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -301,6 +301,16 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
>  	return ip->i_cowfp && ip->i_cowfp->if_bytes;
>  }
>  
> +static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
> +{
> +	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
> +}
> +
> +static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
> +{
> +	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
> +}
> +
>  static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
>  {
>  	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
> @@ -313,7 +323,15 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>  
>  static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
>  {
> -	return false;
> +	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
> +		return false;
> +	if (ip->i_extsize <= 1)
> +		return false;
> +	if (xfs_is_cow_inode(ip))
> +		return false;
> +	if (ip->i_diflags & XFS_DIFLAG_REALTIME)
> +		return false;
> +	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index f0117188f302..771ef3954f4e 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -525,10 +525,48 @@ xfs_flags2diflags2(
>  		di_flags2 |= XFS_DIFLAG2_DAX;
>  	if (xflags & FS_XFLAG_COWEXTSIZE)
>  		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
> +	if (xflags & FS_XFLAG_FORCEALIGN)
> +		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
>  
>  	return di_flags2;
>  }
>  
> +/*
> + * Forcealign requires a non-zero extent size hint and a zero cow
> + * extent size hint.  Don't allow set for RT files yet.
> + */
> +static int
> +xfs_ioctl_setattr_forcealign(
> +	struct xfs_inode	*ip,
> +	struct fileattr		*fa)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (!xfs_has_forcealign(mp))
> +		return -EINVAL;
> +
> +	if (xfs_is_reflink_inode(ip))
> +		return -EINVAL;
> +
> +	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
> +				FS_XFLAG_EXTSZINHERIT)))
> +		return -EINVAL;
> +
> +	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
> +		return -EINVAL;
> +
> +	if (!fa->fsx_extsize)
> +		return -EINVAL;
> +
> +	if (fa->fsx_cowextsize)
> +		return -EINVAL;
> +
> +	if (fa->fsx_xflags & FS_XFLAG_REALTIME)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int
>  xfs_ioctl_setattr_xflags(
>  	struct xfs_trans	*tp,
> @@ -537,10 +575,13 @@ xfs_ioctl_setattr_xflags(
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
> +	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
>  	uint64_t		i_flags2;
> +	int			error;
>  
> -	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
> -		/* Can't change realtime flag if any extents are allocated. */
> +	/* Can't change RT or forcealign flags if any extents are allocated. */
> +	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
> +	    forcealign != xfs_inode_has_forcealign(ip)) {
>  		if (ip->i_df.if_nextents || ip->i_delayed_blks)
>  			return -EINVAL;
>  	}
> @@ -561,6 +602,12 @@ xfs_ioctl_setattr_xflags(
>  	if (i_flags2 && !xfs_has_v3inodes(mp))
>  		return -EINVAL;
>  
> +	if (forcealign) {
> +		error = xfs_ioctl_setattr_forcealign(ip, fa);
> +		if (error)
> +			return error;
> +	}
> +
>  	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  	ip->i_diflags2 = i_flags2;
>  
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index d0567dfbc036..30228fea908d 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -299,6 +299,7 @@ typedef struct xfs_mount {
>  #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
>  #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
>  #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
> +#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
>  
>  /* Mount features */
>  #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
> @@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
>  __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
>  __XFS_HAS_V4_FEAT(crc, CRC)
>  __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
> +__XFS_HAS_FEAT(forcealign, FORCEALIGN)
>  
>  /*
>   * Mount features
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 265a2a418bc7..8da293e8bfa2 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1467,8 +1467,9 @@ xfs_reflink_remap_prep(
>  
>  	/* Check file eligibility and prepare for block sharing. */
>  	ret = -EINVAL;
> -	/* Don't reflink realtime inodes */
> -	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
> +	/* Don't reflink realtime or forcealign inodes */
> +	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest) ||
> +	    xfs_inode_has_forcealign(src) || xfs_inode_has_forcealign(dest))
>  		goto out_unlock;
>  
>  	/* Don't share DAX file data with non-DAX file. */
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 65c5dfe17ecf..fb55e4ce49fa 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -6,16 +6,6 @@
>  #ifndef __XFS_REFLINK_H
>  #define __XFS_REFLINK_H 1
>  
> -static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
> -{
> -	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
> -}
> -
> -static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
> -{
> -	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
> -}
> -
>  extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *irec, bool *shared);
>  int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 27e9f749c4c7..4f0c77f38de1 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1721,6 +1721,17 @@ xfs_fs_fill_super(
>  		mp->m_features &= ~XFS_FEAT_DISCARD;
>  	}
>  
> +	if (xfs_has_forcealign(mp)) {
> +		if (xfs_has_realtime(mp)) {
> +			xfs_alert(mp,
> +	"forcealign not compatible with realtime device!");
> +			error = -EINVAL;
> +			goto out_filestream_unmount;
> +		}
> +		xfs_warn(mp,
> +"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
> +	}
> +
>  	if (xfs_has_reflink(mp)) {
>  		if (mp->m_sb.sb_rblocks) {
>  			xfs_alert(mp,
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 45e4e64fd664..8af304c0e29a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -158,6 +158,8 @@ struct fsxattr {
>  #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
>  #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
>  #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
> +/* data extent mappings for regular files must be aligned to extent size hint */
> +#define FS_XFLAG_FORCEALIGN	0x00020000
>  #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
>  
>  /* the read-only stuff doesn't really belong here, but any other place is
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-11  2:59   ` Darrick J. Wong
@ 2024-07-11  3:59     ` Christoph Hellwig
  2024-07-11  7:17     ` John Garry
  2024-07-11 23:20     ` Dave Chinner
  2 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-11  3:59 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Wed, Jul 10, 2024 at 07:59:58PM -0700, Darrick J. Wong wrote:
> Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> superblock verifier or xfs_fs_fill_super fail the mount so that old
> kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> support for forcealign'd cow and starts writing out files with both
> iflags set?

Yes.

> That said, if the bs>ps patchset lands, then I think forcealign cow is
> a simple matter of setting the min folio order to the forcealign size
> and making sure that we always write out entire folios if any of the
> blocks cached by the folio is shared.  Direct writes to forcealigned
> shared files probably has to be aligned to the forcealign size or fall
> back to buffered writes for cow.

It has all the same problems as rtexsize > 1 + reflink, and suppoting
it will require raiding your patch stack.  Or better just wait until
we've got all that in now that we're actively working on it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-11  2:59   ` Darrick J. Wong
  2024-07-11  3:59     ` Christoph Hellwig
@ 2024-07-11  7:17     ` John Garry
  2024-07-11 23:33       ` Dave Chinner
  2024-07-11 23:20     ` Dave Chinner
  2 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-11  7:17 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 11/07/2024 03:59, Darrick J. Wong wrote:
> On Fri, Jul 05, 2024 at 04:24:44PM +0000, John Garry wrote:
>> From: "Darrick J. Wong" <djwong@kernel.org>
>>
>> Add a new inode flag to require that all file data extent mappings must
>> be aligned (both the file offset range and the allocated space itself)
>> to the extent size hint.  Having a separate COW extent size hint is no
>> longer allowed.
>>
>> The goal here is to enable sysadmins and users to mandate that all space
>> mappings in a file must have a startoff/blockcount that are aligned to
>> (say) a 2MB alignment and that the startblock/blockcount will follow the
>> same alignment.
>>
>> Allocated space will be aligned to start of the AG, and not necessarily
>> aligned with disk blocks. The upcoming atomic writes feature will rely and
>> forcealign and will also require allocated space will also be aligned to
>> disk blocks.
>>
>> Currently RT is not be supported for forcealign, so reject a mount under
>> that condition. In future, it should be possible to support forcealign for
>> RT. In this case, the extent size hint will need to be aligned with
>> rtextsize, so add inode verification for that now.
>>
>> reflink link will not be supported for forcealign. This is because we have
>> the limitation of pageache writeback not knowing how to writeback an
>> entire allocation unut, so reject a mount with relink.
>>
>> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
>> Co-developed-by: John Garry <john.g.garry@oracle.com>
>> [jpg: many changes from orig, including forcealign inode verification
>>   rework, disallow RT and forcealign mount, ioctl setattr rework,
>>   disallow reflink a forcealign inode]
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_format.h    |  6 +++-
>>   fs/xfs/libxfs/xfs_inode_buf.c | 55 +++++++++++++++++++++++++++++++++++
>>   fs/xfs/libxfs/xfs_inode_buf.h |  3 ++
>>   fs/xfs/libxfs/xfs_sb.c        |  2 ++
>>   fs/xfs/xfs_inode.c            | 13 +++++++++
>>   fs/xfs/xfs_inode.h            | 20 ++++++++++++-
>>   fs/xfs/xfs_ioctl.c            | 51 ++++++++++++++++++++++++++++++--
>>   fs/xfs/xfs_mount.h            |  2 ++
>>   fs/xfs/xfs_reflink.c          |  5 ++--
>>   fs/xfs/xfs_reflink.h          | 10 -------
>>   fs/xfs/xfs_super.c            | 11 +++++++
>>   include/uapi/linux/fs.h       |  2 ++
>>   12 files changed, 164 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 61f51becff4f..b48cd75d34a6 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
>>   #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>>   #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>>   #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
>> +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
>>   #define XFS_SB_FEAT_RO_COMPAT_ALL \
>>   		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>>   		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
>> @@ -1094,16 +1095,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
>>   #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
>>   #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
>>   #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
>> +/* data extent mappings for regular files must be aligned to extent size hint */
>> +#define XFS_DIFLAG2_FORCEALIGN_BIT 5
>>   
>>   #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
>>   #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
>>   #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
>>   #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
>>   #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
>> +#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
>>   
>>   #define XFS_DIFLAG2_ANY \
>>   	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
>> -	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
>> +	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
>>   
>>   static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
>>   {
>> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> index 513b50da6215..5c61a1d1bb2b 100644
>> --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> @@ -657,6 +657,15 @@ xfs_dinode_verify(
>>   	    !xfs_has_bigtime(mp))
>>   		return __this_address;
>>   
>> +	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
>> +		fa = xfs_inode_validate_forcealign(mp,
>> +				be32_to_cpu(dip->di_extsize),
>> +				be32_to_cpu(dip->di_cowextsize),
>> +				mode, flags, flags2);
>> +		if (fa)
>> +			return fa;
>> +	}
>> +
>>   	return NULL;
>>   }
>>   
>> @@ -824,3 +833,49 @@ xfs_inode_validate_cowextsize(
>>   
>>   	return NULL;
>>   }
>> +
>> +/* Validate the forcealign inode flag */
>> +xfs_failaddr_t
>> +xfs_inode_validate_forcealign(
>> +	struct xfs_mount	*mp,
>> +	uint32_t		extsize,
>> +	uint32_t		cowextsize,
>> +	uint16_t		mode,
>> +	uint16_t		flags,
>> +	uint64_t		flags2)
>> +{
>> +	bool			rt =  flags & XFS_DIFLAG_REALTIME;
>> +
>> +	/* superblock rocompat feature flag */
>> +	if (!xfs_has_forcealign(mp))
>> +		return __this_address;
>> +
>> +	/* Only regular files and directories */
>> +	if (!S_ISDIR(mode) && !S_ISREG(mode))
>> +		return __this_address;
>> +
>> +	/* We require EXTSIZE or EXTSZINHERIT */
>> +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
>> +		return __this_address;
>> +
>> +	/* We require a non-zero extsize */
>> +	if (!extsize)
>> +		return __this_address;
>> +
>> +	/* Reflink'ed disallowed */
>> +	if (flags2 & XFS_DIFLAG2_REFLINK)
>> +		return __this_address;
> 
> Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> superblock verifier or xfs_fs_fill_super fail the mount so that old
> kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> support for forcealign'd cow and starts writing out files with both
> iflags set?

Fine, I see that we do something similar now for rtdev.

However why even have the rt inode check, below, to disallow for reflink 
cp for rt inode (if we can't even mount with rt and reflink together)?


>>    * Mount features
>> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
>> index 265a2a418bc7..8da293e8bfa2 100644
>> --- a/fs/xfs/xfs_reflink.c
>> +++ b/fs/xfs/xfs_reflink.c
>> @@ -1467,8 +1467,9 @@ xfs_reflink_remap_prep(
>>   
>>   	/* Check file eligibility and prepare for block sharing. */
>>   	ret = -EINVAL;
>> -	/* Don't reflink realtime inodes */
>> -	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
>> +	/* Don't reflink realtime or forcealign inodes */
>> +	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest) ||
>> +	    xfs_inode_has_forcealign(src) || xfs_inode_has_forcealign(dest))
>>   		goto out_unlock;
>>   
>>   	/* Don't share DAX file data with non-DAX file. */
>> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
>> index 65c5dfe17ecf..fb55e4ce49fa 100644
>> --- a/fs/xfs/xfs_reflink.h
>> +++ b/fs/xfs/xfs_reflink.h
>> @@ -6,16 +6,6 @@
>>   #ifndef __XFS_REFLINK_H
>>   #define __XFS_REFLINK_H 1
>>   

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-11  2:59   ` Darrick J. Wong
  2024-07-11  3:59     ` Christoph Hellwig
  2024-07-11  7:17     ` John Garry
@ 2024-07-11 23:20     ` Dave Chinner
  2024-07-12  4:56       ` Christoph Hellwig
  2024-07-18  8:53       ` John Garry
  2 siblings, 2 replies; 48+ messages in thread
From: Dave Chinner @ 2024-07-11 23:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Wed, Jul 10, 2024 at 07:59:58PM -0700, Darrick J. Wong wrote:
> On Fri, Jul 05, 2024 at 04:24:44PM +0000, John Garry wrote:
> > +/* Validate the forcealign inode flag */
> > +xfs_failaddr_t
> > +xfs_inode_validate_forcealign(
> > +	struct xfs_mount	*mp,
> > +	uint32_t		extsize,
> > +	uint32_t		cowextsize,
> > +	uint16_t		mode,
> > +	uint16_t		flags,
> > +	uint64_t		flags2)
> > +{
> > +	bool			rt =  flags & XFS_DIFLAG_REALTIME;
> > +
> > +	/* superblock rocompat feature flag */
> > +	if (!xfs_has_forcealign(mp))
> > +		return __this_address;
> > +
> > +	/* Only regular files and directories */
> > +	if (!S_ISDIR(mode) && !S_ISREG(mode))
> > +		return __this_address;
> > +
> > +	/* We require EXTSIZE or EXTSZINHERIT */
> > +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
> > +		return __this_address;
> > +
> > +	/* We require a non-zero extsize */
> > +	if (!extsize)
> > +		return __this_address;
> > +
> > +	/* Reflink'ed disallowed */
> > +	if (flags2 & XFS_DIFLAG2_REFLINK)
> > +		return __this_address;
> 
> Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> superblock verifier or xfs_fs_fill_super fail the mount so that old
> kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> support for forcealign'd cow and starts writing out files with both
> iflags set?

I don't think we should error out the mount because reflink and
forcealign are enabled - that's going to be the common configuration
for every user of forcealign, right? I also don't think we should
throw a corruption error if both flags are set, either.

We're making an initial *implementation choice* not to implement the
two features on the same inode at the same time. We are not making a
an on-disk format design decision that says "these two on-disk flags
are incompatible".

IOWs, if both are set on a current kernel, it's not corruption but a
more recent kernel that supports both flags has modified this inode.
Put simply, we have detected a ro-compat situation for this specific
inode.

Looking at it as a ro-compat situation rather then corruption,
what I would suggest we do is this:

1. Warn at mount that reflink+force align inodes will be treated
as ro-compat inodes. i.e. read-only.

2. prevent forcealign from being set if the shared extent flag is
set on the inode.

3. prevent shared extents from being created if the force align flag
is set (i.e. ->remap_file_range() and anything else that relies on
shared extents will fail on forcealign inodes).

4. if we read an inode with both set, we emit a warning and force
the inode to be read only so we don't screw up the force alignment
of the file (i.e. that inode operates in ro-compat mode.)

#1 is the mount time warning of potential ro-compat behaviour.

#2 and #3 prevent both from getting set on existing kernels.

#4 is the ro-compat behaviour that would occur from taking a
filesystem that ran on a newer kernel that supports force-align+COW.
This avoids corruption shutdowns and modifications that would screw
up the alignment of the shared and COW'd extents.

> That said, if the bs>ps patchset lands, then I think forcealign cow is
> a simple matter of setting the min folio order to the forcealign size
> and making sure that we always write out entire folios if any of the
> blocks cached by the folio is shared.  Direct writes to forcealigned
> shared files probably has to be aligned to the forcealign size or fall
> back to buffered writes for cow.

Right, I think all the pieces we will need are slowly falling into
place in the near future, so it doesn't seem right to me to actually
prevent filesystems with reflink and force-align both enabled right
now and then end up with a weird filesystem config needed to use
forcealign just for a couple of kernel releases...

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-11  7:17     ` John Garry
@ 2024-07-11 23:33       ` Dave Chinner
  0 siblings, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2024-07-11 23:33 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Thu, Jul 11, 2024 at 08:17:26AM +0100, John Garry wrote:
> On 11/07/2024 03:59, Darrick J. Wong wrote:
> > On Fri, Jul 05, 2024 at 04:24:44PM +0000, John Garry wrote:
> > > +/* Validate the forcealign inode flag */
> > > +xfs_failaddr_t
> > > +xfs_inode_validate_forcealign(
> > > +	struct xfs_mount	*mp,
> > > +	uint32_t		extsize,
> > > +	uint32_t		cowextsize,
> > > +	uint16_t		mode,
> > > +	uint16_t		flags,
> > > +	uint64_t		flags2)
> > > +{
> > > +	bool			rt =  flags & XFS_DIFLAG_REALTIME;
> > > +
> > > +	/* superblock rocompat feature flag */
> > > +	if (!xfs_has_forcealign(mp))
> > > +		return __this_address;
> > > +
> > > +	/* Only regular files and directories */
> > > +	if (!S_ISDIR(mode) && !S_ISREG(mode))
> > > +		return __this_address;
> > > +
> > > +	/* We require EXTSIZE or EXTSZINHERIT */
> > > +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
> > > +		return __this_address;
> > > +
> > > +	/* We require a non-zero extsize */
> > > +	if (!extsize)
> > > +		return __this_address;
> > > +
> > > +	/* Reflink'ed disallowed */
> > > +	if (flags2 & XFS_DIFLAG2_REFLINK)
> > > +		return __this_address;
> > 
> > Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> > superblock verifier or xfs_fs_fill_super fail the mount so that old
> > kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> > support for forcealign'd cow and starts writing out files with both
> > iflags set?
> 
> Fine, I see that we do something similar now for rtdev.
>
> However why even have the rt inode check, below, to disallow for reflink cp
> for rt inode (if we can't even mount with rt and reflink together)?

In theory we should be able to have reflink enabled on a filesystem
with an RT device right now - we just can't share extents on a rt
inode.  Extent sharing should till work just fine on non-rt files,
but the overall config is disallowed at mount time because we
haven't ever tested that configuration. I'm not sure that mkfs.xfs
even allows you to make a filesystem of that config....

That said, it's good practice for the ->remap_file_range()
implementation (and anything else using shared extents) to be
explicitly checking for and preventing extent sharing on RT inodes.
THose operations don't support that config, and so should catch any
attempt that is made to do so and error out. It doesn't matter that
we have mount time checks, the checks in the extent sharing code
explicitly document that it doesn't support RT inodes right now...

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-11 23:20     ` Dave Chinner
@ 2024-07-12  4:56       ` Christoph Hellwig
  2024-07-18  8:53       ` John Garry
  1 sibling, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-12  4:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, John Garry, chandan.babu, dchinner, hch, viro,
	brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen

On Fri, Jul 12, 2024 at 09:20:26AM +1000, Dave Chinner wrote:
> I don't think we should error out the mount because reflink and
> forcealign are enabled - that's going to be the common configuration
> for every user of forcealign, right? I also don't think we should
> throw a corruption error if both flags are set, either.
> 
> We're making an initial *implementation choice* not to implement the
> two features on the same inode at the same time. We are not making a
> an on-disk format design decision that says "these two on-disk flags
> are incompatible".

Oh, right forcealign is per-inode.  In that case we just need to
ensure it never happens.  Which honestly might be a bit confusing if
you can reflink for some files and not others, but that's a separate
discussion.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-09  7:46       ` Christoph Hellwig
@ 2024-07-17 15:24         ` John Garry
  2024-07-17 16:42           ` Christoph Hellwig
  0 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-17 15:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: chandan.babu, djwong, dchinner, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 09/07/2024 08:46, Christoph Hellwig wrote:

Hi Christoph,

>> Using isfa (instead of isforcealign) might be interpreted as something else
> The check should be used in one single place where we decided if
> we need to to the alignment based adjustments.  So IMHO just killing
> it and open coding it there seems way easier.   Yes, it is in a loop,
> but compared to all the work done is is really cheap.
> 
>>> We've been long wanting to split the whole align / convert unwritten /
>>> etc code into a helper outside the main bumapi flow.  And when adding
>>> new logic to it this might indeed be a good time.
>> ok, I'll see if can come up with something
> I can take a look too. 

I was wondering what you plans are for any clean-up/refactoring here, as 
mentioned?

I was starting to look at the whole "if (forcealign) else if (big rt)" 
flow refactoring in this series to use xfs_inode_alloc_unitsize(); 
however, I figure that you have plans wider in scope, which affects this.

? There is some real mess in there like trying
> to account for cases where the transaction doesn't have a block
> reservation, which I think could have happen in truncate until
> Zhang Yi fixed it for the 6.11 merge window.

Cheers,
John



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 10/13] xfs: Unmap blocks according to forcealign
  2024-07-17 15:24         ` John Garry
@ 2024-07-17 16:42           ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-17 16:42 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, chandan.babu, djwong, dchinner, viro, brauner,
	jack, linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Wed, Jul 17, 2024 at 04:24:28PM +0100, John Garry wrote:
>>>> new logic to it this might indeed be a good time.
>>> ok, I'll see if can come up with something
>> I can take a look too. 
>
> I was wondering what you plans are for any clean-up/refactoring here, as 
> mentioned?

Try to split all the convert left over of large rtextent / allocation
size logic out of __xfs_bunmapi and into a separate helper or two.
I started on it after writing that previous mail, but it has been
preempted by more urgent work for now.

> I was starting to look at the whole "if (forcealign) else if (big rt)" flow 
> refactoring in this series to use xfs_inode_alloc_unitsize(); however, I 
> figure that you have plans wider in scope, which affects this.

It will create a bit of conflict, but nothing out of ordinary.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-11 23:20     ` Dave Chinner
  2024-07-12  4:56       ` Christoph Hellwig
@ 2024-07-18  8:53       ` John Garry
  2024-07-23 10:11         ` John Garry
  2024-07-23 23:38         ` Dave Chinner
  1 sibling, 2 replies; 48+ messages in thread
From: John Garry @ 2024-07-18  8:53 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 12/07/2024 00:20, Dave Chinner wrote:
>>> /* Reflink'ed disallowed */
>>> +	if (flags2 & XFS_DIFLAG2_REFLINK)
>>> +		return __this_address;
>> Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
>> superblock verifier or xfs_fs_fill_super fail the mount so that old
>> kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
>> support for forcealign'd cow and starts writing out files with both
>> iflags set?
> I don't think we should error out the mount because reflink and
> forcealign are enabled - that's going to be the common configuration
> for every user of forcealign, right? I also don't think we should
> throw a corruption error if both flags are set, either.
> 
> We're making an initial*implementation choice*  not to implement the
> two features on the same inode at the same time. We are not making a
> an on-disk format design decision that says "these two on-disk flags
> are incompatible".
> 
> IOWs, if both are set on a current kernel, it's not corruption but a
> more recent kernel that supports both flags has modified this inode.
> Put simply, we have detected a ro-compat situation for this specific
> inode.
> 
> Looking at it as a ro-compat situation rather then corruption,
> what I would suggest we do is this:
> 
> 1. Warn at mount that reflink+force align inodes will be treated
> as ro-compat inodes. i.e. read-only.
> 
> 2. prevent forcealign from being set if the shared extent flag is
> set on the inode.
> 
> 3. prevent shared extents from being created if the force align flag
> is set (i.e. ->remap_file_range() and anything else that relies on
> shared extents will fail on forcealign inodes).
> 
> 4. if we read an inode with both set, we emit a warning and force
> the inode to be read only so we don't screw up the force alignment
> of the file (i.e. that inode operates in ro-compat mode.)
> 
> #1 is the mount time warning of potential ro-compat behaviour.
> 
> #2 and #3 prevent both from getting set on existing kernels.
> 
> #4 is the ro-compat behaviour that would occur from taking a
> filesystem that ran on a newer kernel that supports force-align+COW.
> This avoids corruption shutdowns and modifications that would screw
> up the alignment of the shared and COW'd extents.
> 

This seems fine for dealing with forcealign and reflink.

So what about forcealign and RT?

We want to support this config in future, but the current implementation 
will not support it.

In this v2 series, I just disallow a mount for forcealign and RT, 
similar to reflink and RT together.

Furthermore, I am also saying here that still forcealign and RT bits set 
is a valid inode on-disk format and we just have to enforce a 
sb_rextsize to extsize relationship:

xfs_inode_validate_forcealign(
	struct xfs_mount	*mp,
	uint32_t		extsize,
	uint32_t		cowextsize,
	uint16_t		mode,
	uint16_t		flags,
	uint64_t		flags2)
{
	bool			rt =  flags & XFS_DIFLAG_REALTIME;
...


	/* extsize must be a multiple of sb_rextsize for RT */
	if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
		return __this_address;

	return NULL;
}

Is this all ok? Or should we follow similar solution to reflink + 
forcealign?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-18  8:53       ` John Garry
@ 2024-07-23 10:11         ` John Garry
  2024-07-23 14:42           ` Christoph Hellwig
  2024-07-23 23:38         ` Dave Chinner
  1 sibling, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-23 10:11 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 18/07/2024 09:53, John Garry wrote:
> On 12/07/2024 00:20, Dave Chinner wrote:
>>>> /* Reflink'ed disallowed */
>>>> +    if (flags2 & XFS_DIFLAG2_REFLINK)
>>>> +        return __this_address;
>>> Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
>>> superblock verifier or xfs_fs_fill_super fail the mount so that old
>>> kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
>>> support for forcealign'd cow and starts writing out files with both
>>> iflags set?
>> I don't think we should error out the mount because reflink and
>> forcealign are enabled - that's going to be the common configuration
>> for every user of forcealign, right? I also don't think we should
>> throw a corruption error if both flags are set, either.
>>
>> We're making an initial*implementation choice*  not to implement the
>> two features on the same inode at the same time. We are not making a
>> an on-disk format design decision that says "these two on-disk flags
>> are incompatible".
>>
>> IOWs, if both are set on a current kernel, it's not corruption but a
>> more recent kernel that supports both flags has modified this inode.
>> Put simply, we have detected a ro-compat situation for this specific
>> inode.
>>
>> Looking at it as a ro-compat situation rather then corruption,
>> what I would suggest we do is this:
>>
>> 1. Warn at mount that reflink+force align inodes will be treated
>> as ro-compat inodes. i.e. read-only.

I am looking at something like this to implement read-only for those inodes:

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 07f736c42460..444a44ccc11c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -1132,6 +1132,17 @@ xfs_vn_tmpfile(
  	return finish_open_simple(file, err);
  }

+static int xfs_permission(struct mnt_idmap *d, struct inode *inode, int 
mask)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+
+	if (mask & MAY_WRITE) {
+		if (xfs_is_reflink_inode(ip) && xfs_inode_has_forcealign(ip))
+			return -EACCES;
+	}
+	return generic_permission(d, inode, mask);
+}
+
  static const struct inode_operations xfs_inode_operations = {
  	.get_inode_acl		= xfs_get_acl,
  	.set_acl		= xfs_set_acl,
@@ -1142,6 +1153,7 @@ static const struct inode_operations 
xfs_inode_operations = {
  	.update_time		= xfs_vn_update_time,
  	.fileattr_get		= xfs_fileattr_get,
  	.fileattr_set		= xfs_fileattr_set,
+	.permission		= xfs_permission,
  };

Or how else could this be done? I guess that we have something else in 
the xfs code to implement the equivalent of this, but I did not find it.

>>
>> 2. prevent forcealign from being set if the shared extent flag is
>> set on the inode.

This is just XFS_DIFLAG2_REFLINK flag, right?

>>
>> 3. prevent shared extents from being created if the force align flag
>> is set (i.e. ->remap_file_range() and anything else that relies on
>> shared extents will fail on forcealign inodes).

In this series version I extend the RT check in xfs_reflink_remap_prep() 
to cover forcealign - is that good enough?

>>
>> 4. if we read an inode with both set, we emit a warning and force
>> the inode to be read only so we don't screw up the force alignment
>> of the file (i.e. that inode operates in ro-compat mode.)
>>
>> #1 is the mount time warning of potential ro-compat behaviour.
>>
>> #2 and #3 prevent both from getting set on existing kernels.
>>
>> #4 is the ro-compat behaviour that would occur from taking a
>> filesystem that ran on a newer kernel that supports force-align+COW.
>> This avoids corruption shutdowns and modifications that would screw
>> up the alignment of the shared and COW'd extents.
>>
> 
> This seems fine for dealing with forcealign and reflink.
> 
> So what about forcealign and RT?

Any opinion on this?

> 
> We want to support this config in future, but the current implementation 
> will not support it.
> 
> In this v2 series, I just disallow a mount for forcealign and RT, 
> similar to reflink and RT together.
> 
> Furthermore, I am also saying here that still forcealign and RT bits set 
> is a valid inode on-disk format and we just have to enforce a 
> sb_rextsize to extsize relationship:
> 
> xfs_inode_validate_forcealign(
>      struct xfs_mount    *mp,
>      uint32_t        extsize,
>      uint32_t        cowextsize,
>      uint16_t        mode,
>      uint16_t        flags,
>      uint64_t        flags2)
> {
>      bool            rt =  flags & XFS_DIFLAG_REALTIME;
> ...
> 
> 
>      /* extsize must be a multiple of sb_rextsize for RT */
>      if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
>          return __this_address;

And this? If not, I'll just send v3 with this code as-is.

> 
>      return NULL;
> }
> 


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-23 10:11         ` John Garry
@ 2024-07-23 14:42           ` Christoph Hellwig
  2024-07-23 15:01             ` John Garry
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-07-23 14:42 UTC (permalink / raw)
  To: John Garry
  Cc: Dave Chinner, Darrick J. Wong, chandan.babu, dchinner, hch, viro,
	brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen

On Tue, Jul 23, 2024 at 11:11:28AM +0100, John Garry wrote:
> I am looking at something like this to implement read-only for those inodes:

Yikes.  Treating individual inodes in a file systems as read-only
is about the most confusing and harmful behavior we could do.

Just treat it as any other rocompat feature please an mount the entire
file system read-only if not supported.

Or even better let this wait a little, and work with Darrick to work
on the rextsize > 1 reflіnk patches and just make the thing work.

>> So what about forcealign and RT?
>
> Any opinion on this?

What about forcealign and RT? 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-23 14:42           ` Christoph Hellwig
@ 2024-07-23 15:01             ` John Garry
  2024-07-23 22:26               ` Darrick J. Wong
  0 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2024-07-23 15:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Darrick J. Wong, chandan.babu, dchinner, viro,
	brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen

On 23/07/2024 15:42, Christoph Hellwig wrote:
> On Tue, Jul 23, 2024 at 11:11:28AM +0100, John Garry wrote:
>> I am looking at something like this to implement read-only for those inodes:
> 
> Yikes.  Treating individual inodes in a file systems as read-only
> is about the most confusing and harmful behavior we could do.

That was the suggestion which I was given earlier in this thread.

> 
> Just treat it as any other rocompat feature please an mount the entire
> file system read-only if not supported.
> 
> Or even better let this wait a little, and work with Darrick to work
> on the rextsize > 1 reflіnk patches and just make the thing work.

I'll let Darrick comment on this.

> 
>>> So what about forcealign and RT?
>>
>> Any opinion on this?
> 
> What about forcealign and RT?

In this series version I was mounting the whole FS as RO if 
XFS_FEAT_FORCEALIGN and XFS_FEAT_REFLINK was found in the SB. And so 
very different to how I was going to individual treat inodes which 
happen to be forcealign and reflink, above.

So I was asking guidance when whether that approach (for RT and 
forcealign) is sound.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-23 15:01             ` John Garry
@ 2024-07-23 22:26               ` Darrick J. Wong
  2024-07-26 14:14                 ` John Garry
  0 siblings, 1 reply; 48+ messages in thread
From: Darrick J. Wong @ 2024-07-23 22:26 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Dave Chinner, chandan.babu, dchinner, viro,
	brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, Matthew Wilcox

On Tue, Jul 23, 2024 at 04:01:41PM +0100, John Garry wrote:
> On 23/07/2024 15:42, Christoph Hellwig wrote:
> > On Tue, Jul 23, 2024 at 11:11:28AM +0100, John Garry wrote:
> > > I am looking at something like this to implement read-only for those inodes:
> > 
> > Yikes.  Treating individual inodes in a file systems as read-only
> > is about the most confusing and harmful behavior we could do.
> 
> That was the suggestion which I was given earlier in this thread.

Well, Christoph and I suggested failing the mount /earlier/ in this
thread. ;)

> > 
> > Just treat it as any other rocompat feature please an mount the entire
> > file system read-only if not supported.
> > 
> > Or even better let this wait a little, and work with Darrick to work
> > on the rextsize > 1 reflіnk patches and just make the thing work.
> 
> I'll let Darrick comment on this.

COW with alloc_unit > fsblock is not currently possible, whether it's
forcealign or rtreflink because COW must happen at allocation unit
granularity.  Pure overwrites don't need all these twists and turns.

1. For COW to work, each write/page_mkwrite must mark dirty every
fsblock in the entire alloc unit.  Those fsblocks could be cached by
multiple folios, which means (in iomap terms) dirtying each block in
potentially multiple iomap_folio_state structures, as well as their
folios.

2. Similarly, writeback must then be able to issue IO in quantities that
are aligned to allocation units.  IOWs, for every dirty region in the
file, we'd have to find the folios for a given allocation unit, mark
them all for writeback, and issue bios for however much we managed to
do.  If it's not possible to grab a folio, then the entire allocation
unit can't be written out, which implies that writeback can fail to
fully clean folios.

3. Alternately I suppose we could track the number of folios undergoing
writeback for each allocation unit, issue the writeback ios whenever
we're ready, and only remap the allocation unit when the number of
folios undergoing writeback for that allocation unit reaches zero.

If we could get the mapping_set_folio_order patch merged, then we could
at least get partial support for power-of-two alloc_unit > fsblock
configurations by setting the minimum folio order to log2(alloc_unit).
For atomic writes this is probably a hard requirement because we must be
able to submit one bio with one memory region.

For everyone else this sucks because cranking up the min folio order
reduces the flexibility that the page cache can have in finding cache
memory... but until someone figures out how to make the batching work,
there's not much progress to be made.

For non power-of-two alloc_unit we can't just crank up the min folio
order because there will always be misalignments somewhere; we need a
full writeback batching implementation that can handle multiple folios
per alloc unit and partial folio writeback.

djwong-dev implements 1.  It partially handles 2 by enlarging the wbc
range to be aligned to allocation units, but it doesn't guarantee that
all the folios actually got tagged for the batch.  It can't do 3, which
means that it's probably broken if you press it hard enough.

Alternately we could disallow non power-of-two everywhere, which would
make the accounting simpler but that's a regression against ye olde xfs
which supports non power-of-two allocation units.

rtreflink is nowhere near ready to go -- it's still in djwong-wtf behind
metadata directories, rtgroups, realtime rmap, and (probably) hch's
zns patches.

> > > > So what about forcealign and RT?
> > > 
> > > Any opinion on this?
> > 
> > What about forcealign and RT?
> 
> In this series version I was mounting the whole FS as RO if
> XFS_FEAT_FORCEALIGN and XFS_FEAT_REFLINK was found in the SB. And so very
> different to how I was going to individual treat inodes which happen to be
> forcealign and reflink, above.
> 
> So I was asking guidance when whether that approach (for RT and forcealign)
> is sound.

I reiterate: don't allow mounting of (forcealign && reflink) or
(forcealign && rtextsize > 1) filesystems, and then you and I can work
on figuring out the rest.

--D

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-18  8:53       ` John Garry
  2024-07-23 10:11         ` John Garry
@ 2024-07-23 23:38         ` Dave Chinner
  2024-07-24  0:04           ` Darrick J. Wong
  2024-07-24  7:39           ` John Garry
  1 sibling, 2 replies; 48+ messages in thread
From: Dave Chinner @ 2024-07-23 23:38 UTC (permalink / raw)
  To: John Garry
  Cc: Darrick J. Wong, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Thu, Jul 18, 2024 at 09:53:14AM +0100, John Garry wrote:
> On 12/07/2024 00:20, Dave Chinner wrote:
> > > > /* Reflink'ed disallowed */
> > > > +	if (flags2 & XFS_DIFLAG2_REFLINK)
> > > > +		return __this_address;
> > > Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> > > superblock verifier or xfs_fs_fill_super fail the mount so that old
> > > kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> > > support for forcealign'd cow and starts writing out files with both
> > > iflags set?
> > I don't think we should error out the mount because reflink and
> > forcealign are enabled - that's going to be the common configuration
> > for every user of forcealign, right? I also don't think we should
> > throw a corruption error if both flags are set, either.
> > 
> > We're making an initial*implementation choice*  not to implement the
> > two features on the same inode at the same time. We are not making a
> > an on-disk format design decision that says "these two on-disk flags
> > are incompatible".
> > 
> > IOWs, if both are set on a current kernel, it's not corruption but a
> > more recent kernel that supports both flags has modified this inode.
> > Put simply, we have detected a ro-compat situation for this specific
> > inode.
> > 
> > Looking at it as a ro-compat situation rather then corruption,
> > what I would suggest we do is this:
> > 
> > 1. Warn at mount that reflink+force align inodes will be treated
> > as ro-compat inodes. i.e. read-only.
> > 
> > 2. prevent forcealign from being set if the shared extent flag is
> > set on the inode.
> > 
> > 3. prevent shared extents from being created if the force align flag
> > is set (i.e. ->remap_file_range() and anything else that relies on
> > shared extents will fail on forcealign inodes).
> > 
> > 4. if we read an inode with both set, we emit a warning and force
> > the inode to be read only so we don't screw up the force alignment
> > of the file (i.e. that inode operates in ro-compat mode.)
> > 
> > #1 is the mount time warning of potential ro-compat behaviour.
> > 
> > #2 and #3 prevent both from getting set on existing kernels.
> > 
> > #4 is the ro-compat behaviour that would occur from taking a
> > filesystem that ran on a newer kernel that supports force-align+COW.
> > This avoids corruption shutdowns and modifications that would screw
> > up the alignment of the shared and COW'd extents.
> > 
> 
> This seems fine for dealing with forcealign and reflink.
> 
> So what about forcealign and RT?
> 
> We want to support this config in future, but the current implementation
> will not support it.

What's the problem with supporting it right from the start? We
already support forcealign for RT, just it's a global config 
under the "big rt alloc" moniker rather than a per-inode flag.

Like all on-disk format change based features,
forcealign should add the EXPERIMENTAL flag to the filesystem for a
couple of releases after merge, so there will be plenty of time to
test both data and rt dev functionality before removing the
EXPERIMENTAL flag from it.

So why not just enable the per-inode flag with RT right from the
start given that this functionality is supposed to work and be
globally supported by the rtdev right now? It seems like a whole lot
less work to just enable it for RT now than it is to disable it...

> In this v2 series, I just disallow a mount for forcealign and RT, similar to
> reflink and RT together.
> 
> Furthermore, I am also saying here that still forcealign and RT bits set is
> a valid inode on-disk format and we just have to enforce a sb_rextsize to
> extsize relationship:
> 
> xfs_inode_validate_forcealign(
> 	struct xfs_mount	*mp,
> 	uint32_t		extsize,
> 	uint32_t		cowextsize,
> 	uint16_t		mode,
> 	uint16_t		flags,
> 	uint64_t		flags2)
> {
> 	bool			rt =  flags & XFS_DIFLAG_REALTIME;
> ...
> 
> 
> 	/* extsize must be a multiple of sb_rextsize for RT */
> 	if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
> 		return __this_address;
> 
> 	return NULL;
> }

I suspect the logic needs tweaking, but why not just do this right
from the start?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-23 23:38         ` Dave Chinner
@ 2024-07-24  0:04           ` Darrick J. Wong
  2024-07-24 18:50             ` John Garry
  2024-07-24  7:39           ` John Garry
  1 sibling, 1 reply; 48+ messages in thread
From: Darrick J. Wong @ 2024-07-24  0:04 UTC (permalink / raw)
  To: Dave Chinner
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Wed, Jul 24, 2024 at 09:38:18AM +1000, Dave Chinner wrote:
> On Thu, Jul 18, 2024 at 09:53:14AM +0100, John Garry wrote:
> > On 12/07/2024 00:20, Dave Chinner wrote:
> > > > > /* Reflink'ed disallowed */
> > > > > +	if (flags2 & XFS_DIFLAG2_REFLINK)
> > > > > +		return __this_address;
> > > > Hmm.  If we don't support reflink + forcealign ATM, then shouldn't the
> > > > superblock verifier or xfs_fs_fill_super fail the mount so that old
> > > > kernels won't abruptly emit EFSCORRUPTED errors if a future kernel adds
> > > > support for forcealign'd cow and starts writing out files with both
> > > > iflags set?
> > > I don't think we should error out the mount because reflink and
> > > forcealign are enabled - that's going to be the common configuration
> > > for every user of forcealign, right? I also don't think we should
> > > throw a corruption error if both flags are set, either.
> > > 
> > > We're making an initial*implementation choice*  not to implement the
> > > two features on the same inode at the same time. We are not making a
> > > an on-disk format design decision that says "these two on-disk flags
> > > are incompatible".
> > > 
> > > IOWs, if both are set on a current kernel, it's not corruption but a
> > > more recent kernel that supports both flags has modified this inode.
> > > Put simply, we have detected a ro-compat situation for this specific
> > > inode.
> > > 
> > > Looking at it as a ro-compat situation rather then corruption,
> > > what I would suggest we do is this:
> > > 
> > > 1. Warn at mount that reflink+force align inodes will be treated
> > > as ro-compat inodes. i.e. read-only.
> > > 
> > > 2. prevent forcealign from being set if the shared extent flag is
> > > set on the inode.
> > > 
> > > 3. prevent shared extents from being created if the force align flag
> > > is set (i.e. ->remap_file_range() and anything else that relies on
> > > shared extents will fail on forcealign inodes).
> > > 
> > > 4. if we read an inode with both set, we emit a warning and force
> > > the inode to be read only so we don't screw up the force alignment
> > > of the file (i.e. that inode operates in ro-compat mode.)
> > > 
> > > #1 is the mount time warning of potential ro-compat behaviour.
> > > 
> > > #2 and #3 prevent both from getting set on existing kernels.
> > > 
> > > #4 is the ro-compat behaviour that would occur from taking a
> > > filesystem that ran on a newer kernel that supports force-align+COW.
> > > This avoids corruption shutdowns and modifications that would screw
> > > up the alignment of the shared and COW'd extents.
> > > 
> > 
> > This seems fine for dealing with forcealign and reflink.
> > 
> > So what about forcealign and RT?
> > 
> > We want to support this config in future, but the current implementation
> > will not support it.
> 
> What's the problem with supporting it right from the start? We
> already support forcealign for RT, just it's a global config 
> under the "big rt alloc" moniker rather than a per-inode flag.
> 
> Like all on-disk format change based features,
> forcealign should add the EXPERIMENTAL flag to the filesystem for a
> couple of releases after merge, so there will be plenty of time to
> test both data and rt dev functionality before removing the
> EXPERIMENTAL flag from it.
> 
> So why not just enable the per-inode flag with RT right from the
> start given that this functionality is supposed to work and be
> globally supported by the rtdev right now? It seems like a whole lot
> less work to just enable it for RT now than it is to disable it...

What needs to be done to the rt allocator, anyway?

I think it's mostly turning off the fallback to unaligned allocation,
just like what was done for the data device allocator, right?  And
possibly tweaking whatever this does:

	/*
	 * Only bother calculating a real prod factor if offset & length are
	 * perfectly aligned, otherwise it will just get us in trouble.
	 */
	div_u64_rem(ap->offset, align, &mod);
	if (mod || ap->length % align) {
		prod = 1;
	} else {
		prod = xfs_extlen_to_rtxlen(mp, align);
		if (prod > 1)
			xfs_rtalloc_align_minmax(&raminlen, &ralen, &prod);
	}


> > In this v2 series, I just disallow a mount for forcealign and RT, similar to
> > reflink and RT together.
> > 
> > Furthermore, I am also saying here that still forcealign and RT bits set is
> > a valid inode on-disk format and we just have to enforce a sb_rextsize to
> > extsize relationship:
> > 
> > xfs_inode_validate_forcealign(
> > 	struct xfs_mount	*mp,
> > 	uint32_t		extsize,
> > 	uint32_t		cowextsize,
> > 	uint16_t		mode,
> > 	uint16_t		flags,
> > 	uint64_t		flags2)
> > {
> > 	bool			rt =  flags & XFS_DIFLAG_REALTIME;
> > ...
> > 
> > 
> > 	/* extsize must be a multiple of sb_rextsize for RT */
> > 	if (rt && mp->m_sb.sb_rextsize && extsize % mp->m_sb.sb_rextsize)
> > 		return __this_address;
> > 
> > 	return NULL;
> > }
> 
> I suspect the logic needs tweaking, but why not just do this right
> from the start?

Do we even allow (i_extsize % mp->m_sb.sb_rextsize) != 0 for realtime
files?  I didn't think we did.

--D

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-23 23:38         ` Dave Chinner
  2024-07-24  0:04           ` Darrick J. Wong
@ 2024-07-24  7:39           ` John Garry
  1 sibling, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-24  7:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On 24/07/2024 00:38, Dave Chinner wrote:
> What's the problem with supporting it right from the start? 

Simply because I wanted to focus on regular data volume support first.

> We
> already support forcealign for RT, just it's a global config
> under the "big rt alloc" moniker rather than a per-inode flag.
> 
> Like all on-disk format change based features,
> forcealign should add the EXPERIMENTAL flag to the filesystem for a
> couple of releases after merge, so there will be plenty of time to
> test both data and rt dev functionality before removing the
> EXPERIMENTAL flag from it.
> 
> So why not just enable the per-inode flag with RT right from the
> start given that this functionality is supposed to work and be
> globally supported by the rtdev right now? It seems like a whole lot
> less work to just enable it for RT now than it is to disable it...

I'll have a look... if it really is that easy, then - yes - we can have 
it from the start.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-24  0:04           ` Darrick J. Wong
@ 2024-07-24 18:50             ` John Garry
  0 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-24 18:50 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 24/07/2024 01:04, Darrick J. Wong wrote:
>> So why not just enable the per-inode flag with RT right from the
>> start given that this functionality is supposed to work and be
>> globally supported by the rtdev right now? It seems like a whole lot
>> less work to just enable it for RT now than it is to disable it...
> What needs to be done to the rt allocator, anyway?
> 
> I think it's mostly turning off the fallback to unaligned allocation,
> just like what was done for the data device allocator, right?  And
> possibly tweaking whatever this does:
> 
> 	/*
> 	 * Only bother calculating a real prod factor if offset & length are
> 	 * perfectly aligned, otherwise it will just get us in trouble.
> 	 */
> 	div_u64_rem(ap->offset, align, &mod);
> 	if (mod || ap->length % align) {
> 		prod = 1;
> 	} else {
> 		prod = xfs_extlen_to_rtxlen(mp, align);
> 		if (prod > 1)
> 			xfs_rtalloc_align_minmax(&raminlen, &ralen, &prod);
> 	}
> 
> 

My initial impression is that calling xfs_bmap_rtalloc() -> 
xfs_rtpick_extent() for XFS_ALLOC_INITIAL_USER_DATA won't always give an 
aligned extent. However the rest of the allocator paths are giving 
extents aligned as requested - that is from limited testing.

And we would need to not take the xfs_bmap_rtalloc() retry fallback for 
-ENOSPC when align > rtextsize, but I have not hit that yet - maybe 
because xfs_trans_reserve() stops us getting to this point due to lack 
of free rtextents.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-07-23 22:26               ` Darrick J. Wong
@ 2024-07-26 14:14                 ` John Garry
  0 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2024-07-26 14:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dave Chinner, chandan.babu, dchinner, viro,
	brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, Matthew Wilcox


> 
>>>>> So what about forcealign and RT?
>>>> Any opinion on this?
>>> What about forcealign and RT?
>> In this series version I was mounting the whole FS as RO if
>> XFS_FEAT_FORCEALIGN and XFS_FEAT_REFLINK was found in the SB. And so very
>> different to how I was going to individual treat inodes which happen to be
>> forcealign and reflink, above.
>>
>> So I was asking guidance when whether that approach (for RT and forcealign)
>> is sound.
> I reiterate: don't allow mounting of (forcealign && reflink) or
> (forcealign && rtextsize > 1) filesystems, and then you and I can work
> on figuring out the rest.

I'm fine with that approach for forcealign && reflink (no mounting).

As for forcealign && rtextsize > 1 it seems to be working for me. That 
is with not too many changes, so maybe we can go with this support 
initially. Personally I'd rather not, as testing may be spread too thin. 
Anyway, I'll send the patches early next week and we can make the 
judgement then.




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 04/13] xfs: make EOF allocation simpler
  2024-07-05 16:24 ` [PATCH v2 04/13] xfs: make EOF allocation simpler John Garry
@ 2024-08-06 18:58   ` Darrick J. Wong
  0 siblings, 0 replies; 48+ messages in thread
From: Darrick J. Wong @ 2024-08-06 18:58 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jul 05, 2024 at 04:24:41PM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently the allocation at EOF is broken into two cases - when the
> offset is zero and when the offset is non-zero. When the offset is
> non-zero, we try to do exact block allocation for contiguous
> extent allocation. When the offset is zero, the allocation is simply
> an aligned allocation.
> 
> We want aligned allocation as the fallback when exact block
> allocation fails, but that complicates the EOF allocation in that it
> now has to handle two different allocation cases. The
> caller also has to handle allocation when not at EOF, and for the
> upcoming forced alignment changes we need that to also be aligned
> allocation.
> 
> To simplify all this, pull the aligned allocation cases back into
> the callers and leave the EOF allocation path for exact block
> allocation only. This means that the EOF exact block allocation
> fallback path is the normal aligned allocation path and that ends up
> making things a lot simpler when forced alignment is introduced.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c   | 129 +++++++++++++++----------------------
>  fs/xfs/libxfs/xfs_ialloc.c |   2 +-
>  2 files changed, 54 insertions(+), 77 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index b5156bafb7be..4122a2da06ec 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3309,12 +3309,12 @@ xfs_bmap_select_minlen(
>  static int
>  xfs_bmap_btalloc_select_lengths(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		*blen)
> +	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*pag;
>  	xfs_agnumber_t		agno, startag;
> +	xfs_extlen_t		blen = 0;
>  	int			error = 0;
>  
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> @@ -3328,19 +3328,18 @@ xfs_bmap_btalloc_select_lengths(
>  	if (startag == NULLAGNUMBER)
>  		startag = 0;
>  
> -	*blen = 0;
>  	for_each_perag_wrap(mp, startag, agno, pag) {
> -		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
> +		error = xfs_bmap_longest_free_extent(pag, args->tp, &blen);
>  		if (error && error != -EAGAIN)
>  			break;
>  		error = 0;
> -		if (*blen >= args->maxlen)
> +		if (blen >= args->maxlen)
>  			break;
>  	}
>  	if (pag)
>  		xfs_perag_rele(pag);
>  
> -	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
> +	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
>  	return error;
>  }
>  
> @@ -3550,78 +3549,40 @@ xfs_bmap_exact_minlen_extent_alloc(
>   * If we are not low on available data blocks and we are allocating at
>   * EOF, optimise allocation for contiguous file extension and/or stripe
>   * alignment of the new extent.
> - *
> - * NOTE: ap->aeof is only set if the allocation length is >= the
> - * stripe unit and the allocation offset is at the end of file.
>   */
>  static int
>  xfs_bmap_btalloc_at_eof(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		blen,
> -	bool			ag_only)
> +	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*caller_pag = args->pag;
> +	xfs_extlen_t		alignment = args->alignment;
>  	int			error;
>  
> +	ASSERT(ap->aeof && ap->offset);
> +	ASSERT(args->alignment >= 1);
> +
>  	/*
> -	 * If there are already extents in the file, try an exact EOF block
> -	 * allocation to extend the file as a contiguous extent. If that fails,
> -	 * or it's the first allocation in a file, just try for a stripe aligned
> -	 * allocation.
> +	 * Compute the alignment slop for the fallback path so we ensure
> +	 * we account for the potential alignemnt space required by the
> +	 * fallback paths before we modify the AGF and AGFL here.
>  	 */
> -	if (ap->offset) {
> -		xfs_extlen_t	alignment = args->alignment;
> -
> -		/*
> -		 * Compute the alignment slop for the fallback path so we ensure
> -		 * we account for the potential alignment space required by the
> -		 * fallback paths before we modify the AGF and AGFL here.
> -		 */
> -		args->alignment = 1;
> -		args->alignslop = alignment - args->alignment;
> -
> -		if (!caller_pag)
> -			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> -		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> -		if (!caller_pag) {
> -			xfs_perag_put(args->pag);
> -			args->pag = NULL;
> -		}
> -		if (error)
> -			return error;
> -
> -		if (args->fsbno != NULLFSBLOCK)
> -			return 0;
> -		/*
> -		 * Exact allocation failed. Reset to try an aligned allocation
> -		 * according to the original allocation specification.
> -		 */
> -		args->alignment = alignment;
> -		args->alignslop = 0;
> -	}
> +	args->alignment = 1;
> +	args->alignslop = alignment - args->alignment;
>  
> -	if (ag_only) {
> -		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> -	} else {
> +	if (!caller_pag)
> +		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> +	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> +	if (!caller_pag) {
> +		xfs_perag_put(args->pag);
>  		args->pag = NULL;
> -		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> -		ASSERT(args->pag == NULL);
> -		args->pag = caller_pag;
>  	}
> -	if (error)
> -		return error;
>  
> -	if (args->fsbno != NULLFSBLOCK)
> -		return 0;
> -
> -	/*
> -	 * Aligned allocation failed, so all fallback paths from here drop the
> -	 * start alignment requirement as we know it will not succeed.
> -	 */
> -	args->alignment = 1;
> -	return 0;
> +	/* Reset alignment to original specifications.  */
> +	args->alignment = alignment;
> +	args->alignslop = 0;
> +	return error;
>  }
>  
>  /*
> @@ -3687,12 +3648,19 @@ xfs_bmap_btalloc_filestreams(
>  	}
>  
>  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> -	if (ap->aeof)
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
> +	if (ap->aeof && ap->offset)
> +		error = xfs_bmap_btalloc_at_eof(ap, args);
>  
> +	/* This may be an aligned allocation attempt. */
>  	if (!error && args->fsbno == NULLFSBLOCK)
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
>  
> +	/* Attempt non-aligned allocation if we haven't already. */
> +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		args->alignment = 1;
> +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);

and again going back a couple of submissions, do we have to zero
alignslop here?

https://lore.kernel.org/linux-xfs/20240621203556.GU3058325@frogsfrogsfrogs/

--D

> +	}
> +
>  out_low_space:
>  	/*
>  	 * We are now done with the perag reference for the filestreams
> @@ -3714,7 +3682,6 @@ xfs_bmap_btalloc_best_length(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
> -	xfs_extlen_t		blen = 0;
>  	int			error;
>  
>  	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
> @@ -3725,23 +3692,33 @@ xfs_bmap_btalloc_best_length(
>  	 * the request.  If one isn't found, then adjust the minimum allocation
>  	 * size to the largest space found.
>  	 */
> -	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
> +	error = xfs_bmap_btalloc_select_lengths(ap, args);
>  	if (error)
>  		return error;
>  
>  	/*
> -	 * Don't attempt optimal EOF allocation if previous allocations barely
> -	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
> -	 * optimal or even aligned allocations in this case, so don't waste time
> -	 * trying.
> +	 * If we are in low space mode, then optimal allocation will fail so
> +	 * prepare for minimal allocation and run the low space algorithm
> +	 * immediately.
>  	 */
> -	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
> -		if (error || args->fsbno != NULLFSBLOCK)
> -			return error;
> +	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +		ASSERT(args->fsbno == NULLFSBLOCK);
> +		return xfs_bmap_btalloc_low_space(ap, args);
> +	}
> +
> +	if (ap->aeof && ap->offset)
> +		error = xfs_bmap_btalloc_at_eof(ap, args);
> +
> +	/* This may be an aligned allocation attempt. */
> +	if (!error && args->fsbno == NULLFSBLOCK)
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> +
> +	/* Attempt non-aligned allocation if we haven't already. */
> +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		args->alignment = 1;
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	}
>  
> -	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error || args->fsbno != NULLFSBLOCK)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 9f71a9a3a65e..40a2daeea712 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -780,7 +780,7 @@ xfs_ialloc_ag_alloc(
>  		 * the exact agbno requirement and increase the alignment
>  		 * instead. It is critical that the total size of the request
>  		 * (len + alignment + slop) does not increase from this point
> -		 * on, so reset minalignslop to ensure it is not included in
> +		 * on, so reset alignslop to ensure it is not included in
>  		 * subsequent requests.
>  		 */
>  		args.alignslop = 0;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-08-06 18:58 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-05 16:24 [PATCH v2 00/13] forcealign for xfs John Garry
2024-07-05 16:24 ` [PATCH v2 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-07-05 16:24 ` [PATCH v2 02/13] xfs: always tail align maxlen allocations John Garry
2024-07-05 16:24 ` [PATCH v2 03/13] xfs: simplify extent allocation alignment John Garry
2024-07-05 16:24 ` [PATCH v2 04/13] xfs: make EOF allocation simpler John Garry
2024-08-06 18:58   ` Darrick J. Wong
2024-07-05 16:24 ` [PATCH v2 05/13] xfs: introduce forced allocation alignment John Garry
2024-07-05 16:24 ` [PATCH v2 06/13] xfs: align args->minlen for " John Garry
2024-07-05 16:24 ` [PATCH v2 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
2024-07-11  2:59   ` Darrick J. Wong
2024-07-11  3:59     ` Christoph Hellwig
2024-07-11  7:17     ` John Garry
2024-07-11 23:33       ` Dave Chinner
2024-07-11 23:20     ` Dave Chinner
2024-07-12  4:56       ` Christoph Hellwig
2024-07-18  8:53       ` John Garry
2024-07-23 10:11         ` John Garry
2024-07-23 14:42           ` Christoph Hellwig
2024-07-23 15:01             ` John Garry
2024-07-23 22:26               ` Darrick J. Wong
2024-07-26 14:14                 ` John Garry
2024-07-23 23:38         ` Dave Chinner
2024-07-24  0:04           ` Darrick J. Wong
2024-07-24 18:50             ` John Garry
2024-07-24  7:39           ` John Garry
2024-07-05 16:24 ` [PATCH v2 08/13] xfs: Do not free EOF blocks for forcealign John Garry
2024-07-06  7:56   ` Christoph Hellwig
2024-07-08  1:44     ` Dave Chinner
2024-07-08  7:36       ` John Garry
2024-07-08 11:12         ` Dave Chinner
2024-07-08 14:41           ` John Garry
2024-07-09  7:41       ` Christoph Hellwig
2024-07-05 16:24 ` [PATCH v2 09/13] xfs: Update xfs_inode_alloc_unitsize() " John Garry
2024-07-05 16:24 ` [PATCH v2 10/13] xfs: Unmap blocks according to forcealign John Garry
2024-07-06  7:58   ` Christoph Hellwig
2024-07-08 14:48     ` John Garry
2024-07-09  7:46       ` Christoph Hellwig
2024-07-17 15:24         ` John Garry
2024-07-17 16:42           ` Christoph Hellwig
2024-07-09  9:57     ` Dave Chinner
2024-07-09 11:19       ` Christoph Hellwig
2024-07-05 16:24 ` [PATCH v2 11/13] xfs: Only free full extents for forcealign John Garry
2024-07-06  7:59   ` Christoph Hellwig
2024-07-05 16:24 ` [PATCH v2 12/13] xfs: Don't revert allocated offset " John Garry
2024-07-05 16:24 ` [PATCH v2 13/13] xfs: Enable file data forcealign feature John Garry
2024-07-06  7:53 ` [PATCH v2 00/13] forcealign for xfs Christoph Hellwig
2024-07-08  7:48   ` John Garry
2024-07-09  7:48     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).