linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] forcealign for xfs
@ 2024-06-21 10:05 John Garry
  2024-06-21 10:05 ` [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
                   ` (12 more replies)
  0 siblings, 13 replies; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

This series is being spun off the block atomic writes for xfs series at
[0].

That series has got too big and also has a dependency on the core block
atomic writes support at [1].

The actual forcealign patches are the same in this series, modulo an
attempt for a fix in xfs_bunmapi_align()

Why forcealign?
In some scenarios to may be required to guarantee extent alignment and
granularity.

For example, for atomic writes, the maximum atomic write unit size would
be limited at the extent alignment and granularity, guaranteeing that an
atomic write would not span data present in multiple extents.

forcealign may be useful as a performance tuning optimization in other
scenarios.

Early development xfsprogs support is at:
https://github.com/johnpgarry/xfsprogs-dev/tree/forcealign_and_atomicwrites_for_v4_xfs_block_atomic_writes

Catherine has been working on a formal version of this support, which
I need to update to.

Baseline:
xfs/for-next @ 348a1983cf4c ("xfs: fix unlink vs cluster buffer
instantiation race")
+ https://lore.kernel.org/linux-xfs/20240528171510.3562654-1-john.g.garry@oracle.com/

[0] https://lore.kernel.org/linux-xfs/20240607143919.2622319-1-john.g.garry@oracle.com/
[1] https://lore.kernel.org/linux-nvme/20240620125359.2684798-1-john.g.garry@oracle.com/T/#m4ab84ddd627d95e6e957b62c29bbf815ca6e44e2

Darrick J. Wong (2):
  xfs: Introduce FORCEALIGN inode flag
  xfs: Enable file data forcealign feature

Dave Chinner (6):
  xfs: only allow minlen allocations when near ENOSPC
  xfs: always tail align maxlen allocations
  xfs: simplify extent allocation alignment
  xfs: make EOF allocation simpler
  xfs: introduce forced allocation alignment
  xfs: align args->minlen for forced allocation alignment

John Garry (5):
  xfs: Do not free EOF blocks for forcealign
  xfs: Update xfs_inode_alloc_unitsize_fsb() for forcealign
  xfs: Unmap blocks according to forcealign
  xfs: Only free full extents for forcealign
  xfs: Don't revert allocated offset for forcealign

 fs/xfs/libxfs/xfs_alloc.c     |  33 ++--
 fs/xfs/libxfs/xfs_alloc.h     |   3 +-
 fs/xfs/libxfs/xfs_bmap.c      | 313 +++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_format.h    |   9 +-
 fs/xfs/libxfs/xfs_ialloc.c    |  12 +-
 fs/xfs/libxfs/xfs_inode_buf.c |  53 ++++++
 fs/xfs/libxfs/xfs_inode_buf.h |   3 +
 fs/xfs/libxfs/xfs_sb.c        |   2 +
 fs/xfs/xfs_bmap_util.c        |  14 +-
 fs/xfs/xfs_inode.c            |  15 ++
 fs/xfs/xfs_inode.h            |  23 +++
 fs/xfs/xfs_ioctl.c            |  47 ++++-
 fs/xfs/xfs_mount.h            |   2 +
 fs/xfs/xfs_reflink.h          |  10 --
 fs/xfs/xfs_super.c            |   4 +
 fs/xfs/xfs_trace.h            |   8 +-
 include/uapi/linux/fs.h       |   2 +
 17 files changed, 371 insertions(+), 182 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:42   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 02/13] xfs: always tail align maxlen allocations John Garry
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When we are near ENOSPC and don't have enough free
space for an args->maxlen allocation, xfs_alloc_space_available()
will trim args->maxlen to equal the available space. However, this
function has only checked that there is enough contiguous free space
for an aligned args->minlen allocation to succeed. Hence there is no
guarantee that an args->maxlen allocation will succeed, nor that the
available space will allow for correct alignment of an args->maxlen
allocation.

Further, by trimming args->maxlen arbitrarily, it breaks an
assumption made in xfs_alloc_fix_len() that if the caller wants
aligned allocation, then args->maxlen will be set to an aligned
value. It then skips the tail alignment and so we end up with
extents that aren't aligned to extent size hint boundaries as we
approach ENOSPC.

To avoid this problem, don't reduce args->maxlen by some random,
arbitrary amount. If args->maxlen is too large for the available
space, reduce the allocation to a minlen allocation as we know we
have contiguous free space available for this to succeed and always
be correctly aligned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6c55a6e88eba..5855a21d4864 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2409,14 +2409,23 @@ xfs_alloc_space_available(
 	if (available < (int)max(args->total, alloc_len))
 		return false;
 
+	if (flags & XFS_ALLOC_FLAG_CHECK)
+		return true;
+
 	/*
-	 * Clamp maxlen to the amount of free space available for the actual
-	 * extent allocation.
+	 * If we can't do a maxlen allocation, then we must reduce the size of
+	 * the allocation to match the available free space. We know how big
+	 * the largest contiguous free space we can allocate is, so that's our
+	 * upper bound. However, we don't exaclty know what alignment/size
+	 * constraints have been placed on the allocation, so we can't
+	 * arbitrarily select some new max size. Hence make this a minlen
+	 * allocation as we know that will definitely succeed and match the
+	 * callers alignment constraints.
 	 */
-	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
-		args->maxlen = available;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	if (longest < alloc_len) {
+		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
-		ASSERT(args->maxlen >= args->minlen);
 	}
 
 	return true;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/13] xfs: always tail align maxlen allocations
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
  2024-06-21 10:05 ` [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:50   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 03/13] xfs: simplify extent allocation alignment John Garry
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When we do a large allocation, the core free space allocation code
assumes that args->maxlen is aligned to args->prod/args->mod. hence
if we get a maximum sized extent allocated, it does not do tail
alignment of the extent.

However, this assumes that nothing modifies args->maxlen between the
original allocation context setup and trimming the selected free
space extent to size. This assumption has recently been found to be
invalid - xfs_alloc_space_available() modifies args->maxlen in low
space situations - and there may be more situations we haven't yet
found like this.

Force aligned allocation introduces the requirement that extents are
correctly tail aligned, resulting in this occasional latent
alignment failure to e reclassified from an unimportant curiousity
to a must-fix bug.

Removing the assumption about args->maxlen allocations always being
tail aligned is trivial, and should not impact anything because
args->maxlen for inodes with extent size hints configured are
already aligned. Hence all this change does it avoid weird corner
cases that would have resulted in unaligned extent sizes by always
trimming the extent down to an aligned size.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 5855a21d4864..32f72217c126 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -432,20 +432,18 @@ xfs_alloc_compute_diff(
  * Fix up the length, based on mod and prod.
  * len should be k * prod + mod for some k.
  * If len is too small it is returned unchanged.
- * If len hits maxlen it is left alone.
  */
-STATIC void
+static void
 xfs_alloc_fix_len(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t	k;
-	xfs_extlen_t	rlen;
+	xfs_extlen_t		k;
+	xfs_extlen_t		rlen = args->len;
 
 	ASSERT(args->mod < args->prod);
-	rlen = args->len;
 	ASSERT(rlen >= args->minlen);
 	ASSERT(rlen <= args->maxlen);
-	if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
+	if (args->prod <= 1 || rlen < args->mod ||
 	    (args->mod == 0 && rlen < args->prod))
 		return;
 	k = rlen % args->prod;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 03/13] xfs: simplify extent allocation alignment
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
  2024-06-21 10:05 ` [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
  2024-06-21 10:05 ` [PATCH 02/13] xfs: always tail align maxlen allocations John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 20:29   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 04/13] xfs: make EOF allocation simpler John Garry
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

We currently align extent allocation to stripe unit or stripe width.
That is specified by an external parameter to the allocation code,
which then manipulates the xfs_alloc_args alignment configuration in
interesting ways.

The args->alignment field specifies extent start alignment, but
because we may be attempting non-aligned allocation first there are
also slop variables that allow for those allocation attempts to
account for aligned allocation if they fail.

This gets much more complex as we introduce forced allocation
alignment, where extent size hints are used to generate the extent
start alignment. extent size hints currently only affect extent
lengths (via args->prod and args->mod) and so with this change we
will have two different start alignment conditions.

Avoid this complexity by always using args->alignment to indicate
extent start alignment, and always using args->prod/mod to indicate
extent length adjustment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |  4 +-
 fs/xfs/libxfs/xfs_alloc.h  |  2 +-
 fs/xfs/libxfs/xfs_bmap.c   | 96 +++++++++++++++++---------------------
 fs/xfs/libxfs/xfs_ialloc.c | 10 ++--
 fs/xfs/xfs_trace.h         |  8 ++--
 5 files changed, 54 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 32f72217c126..35fbd6b19682 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2391,7 +2391,7 @@ xfs_alloc_space_available(
 	reservation = xfs_ag_resv_needed(pag, args->resv);
 
 	/* do we have enough contiguous free space for the allocation? */
-	alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->minlen + (args->alignment - 1) + args->alignslop;
 	longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
 	if (longest < alloc_len)
 		return false;
@@ -2420,7 +2420,7 @@ xfs_alloc_space_available(
 	 * allocation as we know that will definitely succeed and match the
 	 * callers alignment constraints.
 	 */
-	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->alignslop;
 	if (longest < alloc_len) {
 		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 0b956f8b9d5a..aa2c103d98f0 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
 	xfs_extlen_t	minleft;	/* min blocks must be left after us */
 	xfs_extlen_t	total;		/* total blocks needed in xaction */
 	xfs_extlen_t	alignment;	/* align answer to multiple of this */
-	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
+	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
 	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
 	xfs_agblock_t	max_agbno;	/* ... */
 	xfs_extlen_t	len;		/* output: actual size of extent */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c101cf266bc4..7f8c8e4dd244 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3285,6 +3285,10 @@ xfs_bmap_select_minlen(
 	xfs_extlen_t		blen)
 {
 
+	/* Adjust best length for extent start alignment. */
+	if (blen > args->alignment)
+		blen -= args->alignment;
+
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
 	 * possible that there is enough contiguous free space for this request.
@@ -3300,6 +3304,7 @@ xfs_bmap_select_minlen(
 	if (blen < args->maxlen)
 		return blen;
 	return args->maxlen;
+
 }
 
 static int
@@ -3393,35 +3398,43 @@ xfs_bmap_alloc_account(
 	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
 }
 
-static int
+/*
+ * Calculate the extent start alignment and the extent length adjustments that
+ * constrain this allocation.
+ *
+ * Extent start alignment is currently determined by stripe configuration and is
+ * carried in args->alignment, whilst extent length adjustment is determined by
+ * extent size hints and is carried by args->prod and args->mod.
+ *
+ * Low level allocation code is free to either ignore or override these values
+ * as required.
+ */
+static void
 xfs_bmap_compute_alignments(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
-	int			stripe_align = 0;
 
 	/* stripe alignment for allocation is determined by mount parameters */
 	if (mp->m_swidth && xfs_has_swalloc(mp))
-		stripe_align = mp->m_swidth;
+		args->alignment = mp->m_swidth;
 	else if (mp->m_dalign)
-		stripe_align = mp->m_dalign;
+		args->alignment = mp->m_dalign;
 
 	if (ap->flags & XFS_BMAPI_COWFORK)
 		align = xfs_get_cowextsz_hint(ap->ip);
 	else if (ap->datatype & XFS_ALLOC_USERDATA)
 		align = xfs_get_extsz_hint(ap->ip);
+
 	if (align) {
 		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
 					ap->eof, 0, ap->conv, &ap->offset,
 					&ap->length))
 			ASSERT(0);
 		ASSERT(ap->length);
-	}
 
-	/* apply extent size hints if obtained earlier */
-	if (align) {
 		args->prod = align;
 		div_u64_rem(ap->offset, args->prod, &args->mod);
 		if (args->mod)
@@ -3436,7 +3449,6 @@ xfs_bmap_compute_alignments(
 			args->mod = args->prod - args->mod;
 	}
 
-	return stripe_align;
 }
 
 static void
@@ -3508,7 +3520,7 @@ xfs_bmap_exact_minlen_extent_alloc(
 	args.total = ap->total;
 
 	args.alignment = 1;
-	args.minalignslop = 0;
+	args.alignslop = 0;
 
 	args.minleft = ap->minleft;
 	args.wasdel = ap->wasdel;
@@ -3548,7 +3560,6 @@ xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen,
-	int			stripe_align,
 	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
@@ -3562,23 +3573,15 @@ xfs_bmap_btalloc_at_eof(
 	 * allocation.
 	 */
 	if (ap->offset) {
-		xfs_extlen_t	nextminlen = 0;
+		xfs_extlen_t	alignment = args->alignment;
 
 		/*
-		 * Compute the minlen+alignment for the next case.  Set slop so
-		 * that the value of minlen+alignment+slop doesn't go up between
-		 * the calls.
+		 * Compute the alignment slop for the fallback path so we ensure
+		 * we account for the potential alignemnt space required by the
+		 * fallback paths before we modify the AGF and AGFL here.
 		 */
 		args->alignment = 1;
-		if (blen > stripe_align && blen <= args->maxlen)
-			nextminlen = blen - stripe_align;
-		else
-			nextminlen = args->minlen;
-		if (nextminlen + stripe_align > args->minlen + 1)
-			args->minalignslop = nextminlen + stripe_align -
-					args->minlen - 1;
-		else
-			args->minalignslop = 0;
+		args->alignslop = alignment - args->alignment;
 
 		if (!caller_pag)
 			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
@@ -3596,19 +3599,8 @@ xfs_bmap_btalloc_at_eof(
 		 * Exact allocation failed. Reset to try an aligned allocation
 		 * according to the original allocation specification.
 		 */
-		args->alignment = stripe_align;
-		args->minlen = nextminlen;
-		args->minalignslop = 0;
-	} else {
-		/*
-		 * Adjust minlen to try and preserve alignment if we
-		 * can't guarantee an aligned maxlen extent.
-		 */
-		args->alignment = stripe_align;
-		if (blen > args->alignment &&
-		    blen <= args->maxlen + args->alignment)
-			args->minlen = blen - args->alignment;
-		args->minalignslop = 0;
+		args->alignment = alignment;
+		args->alignslop = 0;
 	}
 
 	if (ag_only) {
@@ -3626,9 +3618,8 @@ xfs_bmap_btalloc_at_eof(
 		return 0;
 
 	/*
-	 * Allocation failed, so turn return the allocation args to their
-	 * original non-aligned state so the caller can proceed on allocation
-	 * failure as if this function was never called.
+	 * Aligned allocation failed, so all fallback paths from here drop the
+	 * start alignment requirement as we know it will not succeed.
 	 */
 	args->alignment = 1;
 	return 0;
@@ -3636,7 +3627,9 @@ xfs_bmap_btalloc_at_eof(
 
 /*
  * We have failed multiple allocation attempts so now are in a low space
- * allocation situation. Try a locality first full filesystem minimum length
+ * allocation situation. We give up on any attempt at aligned allocation here.
+ *
+ * Try a locality first full filesystem minimum length
  * allocation whilst still maintaining necessary total block reservation
  * requirements.
  *
@@ -3653,6 +3646,7 @@ xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
@@ -3672,13 +3666,11 @@ xfs_bmap_btalloc_low_space(
 static int
 xfs_bmap_btalloc_filestreams(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
-
 	error = xfs_filestream_select_ag(ap, args, &blen);
 	if (error)
 		return error;
@@ -3697,8 +3689,7 @@ xfs_bmap_btalloc_filestreams(
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				true);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
 
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
@@ -3722,8 +3713,7 @@ xfs_bmap_btalloc_filestreams(
 static int
 xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error;
@@ -3747,8 +3737,7 @@ xfs_bmap_btalloc_best_length(
 	 * trying.
 	 */
 	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				false);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
 		if (error || args->fsbno != NULLFSBLOCK)
 			return error;
 	}
@@ -3775,27 +3764,26 @@ xfs_bmap_btalloc(
 		.resv		= XFS_AG_RESV_NONE,
 		.datatype	= ap->datatype,
 		.alignment	= 1,
-		.minalignslop	= 0,
+		.alignslop	= 0,
 	};
 	xfs_fileoff_t		orig_offset;
 	xfs_extlen_t		orig_length;
 	int			error;
-	int			stripe_align;
 
 	ASSERT(ap->length);
 	orig_offset = ap->offset;
 	orig_length = ap->length;
 
-	stripe_align = xfs_bmap_compute_alignments(ap, &args);
+	xfs_bmap_compute_alignments(ap, &args);
 
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 	    xfs_inode_is_filestream(ap->ip))
-		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_filestreams(ap, &args);
 	else
-		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_best_length(ap, &args);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 14c81f227c5b..9f71a9a3a65e 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -758,12 +758,12 @@ xfs_ialloc_ag_alloc(
 		 *
 		 * For an exact allocation, alignment must be 1,
 		 * however we need to take cluster alignment into account when
-		 * fixing up the freelist. Use the minalignslop field to
-		 * indicate that extra blocks might be required for alignment,
-		 * but not to use them in the actual exact allocation.
+		 * fixing up the freelist. Use the alignslop field to indicate
+		 * that extra blocks might be required for alignment, but not
+		 * to use them in the actual exact allocation.
 		 */
 		args.alignment = 1;
-		args.minalignslop = igeo->cluster_align - 1;
+		args.alignslop = igeo->cluster_align - 1;
 
 		/* Allow space for the inode btree to split. */
 		args.minleft = igeo->inobt_maxlevels;
@@ -783,7 +783,7 @@ xfs_ialloc_ag_alloc(
 		 * on, so reset minalignslop to ensure it is not included in
 		 * subsequent requests.
 		 */
-		args.minalignslop = 0;
+		args.alignslop = 0;
 	}
 
 	if (unlikely(args.fsbno == NULLFSBLOCK)) {
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 25ff6fe1eb6c..0b2a2a1379bd 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1808,7 +1808,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__field(xfs_extlen_t, minleft)
 		__field(xfs_extlen_t, total)
 		__field(xfs_extlen_t, alignment)
-		__field(xfs_extlen_t, minalignslop)
+		__field(xfs_extlen_t, alignslop)
 		__field(xfs_extlen_t, len)
 		__field(char, wasdel)
 		__field(char, wasfromfl)
@@ -1827,7 +1827,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->minleft = args->minleft;
 		__entry->total = args->total;
 		__entry->alignment = args->alignment;
-		__entry->minalignslop = args->minalignslop;
+		__entry->alignslop = args->alignslop;
 		__entry->len = args->len;
 		__entry->wasdel = args->wasdel;
 		__entry->wasfromfl = args->wasfromfl;
@@ -1836,7 +1836,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->highest_agno = args->tp->t_highest_agno;
 	),
 	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
-		  "prod %u minleft %u total %u alignment %u minalignslop %u "
+		  "prod %u minleft %u total %u alignment %u alignslop %u "
 		  "len %u wasdel %d wasfromfl %d resv %d "
 		  "datatype 0x%x highest_agno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
@@ -1849,7 +1849,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		  __entry->minleft,
 		  __entry->total,
 		  __entry->alignment,
-		  __entry->minalignslop,
+		  __entry->alignslop,
 		  __entry->len,
 		  __entry->wasdel,
 		  __entry->wasfromfl,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 04/13] xfs: make EOF allocation simpler
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (2 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 03/13] xfs: simplify extent allocation alignment John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 20:35   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 05/13] xfs: introduce forced allocation alignment John Garry
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

Currently the allocation at EOF is broken into two cases - when the
offset is zero and when the offset is non-zero. When the offset is
non-zero, we try to do exact block allocation for contiguous
extent allocation. When the offset is zero, the allocation is simply
an aligned allocation.

We want aligned allocation as the fallback when exact block
allocation fails, but that complicates the EOF allocation in that it
now has to handle two different allocation cases. The
caller also has to handle allocation when not at EOF, and for the
upcoming forced alignment changes we need that to also be aligned
allocation.

To simplify all this, pull the aligned allocation cases back into
the callers and leave the EOF allocation path for exact block
allocation only. This means that the EOF exact block allocation
fallback path is the normal aligned allocation path and that ends up
making things a lot simpler when forced alignment is introduced.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c   | 129 +++++++++++++++----------------------
 fs/xfs/libxfs/xfs_ialloc.c |   2 +-
 2 files changed, 54 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7f8c8e4dd244..528e3cd81ee6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3310,12 +3310,12 @@ xfs_bmap_select_minlen(
 static int
 xfs_bmap_btalloc_select_lengths(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno, startag;
+	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
@@ -3329,19 +3329,18 @@ xfs_bmap_btalloc_select_lengths(
 	if (startag == NULLAGNUMBER)
 		startag = 0;
 
-	*blen = 0;
 	for_each_perag_wrap(mp, startag, agno, pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+		error = xfs_bmap_longest_free_extent(pag, args->tp, &blen);
 		if (error && error != -EAGAIN)
 			break;
 		error = 0;
-		if (*blen >= args->maxlen)
+		if (blen >= args->maxlen)
 			break;
 	}
 	if (pag)
 		xfs_perag_rele(pag);
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
+	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	return error;
 }
 
@@ -3551,78 +3550,40 @@ xfs_bmap_exact_minlen_extent_alloc(
  * If we are not low on available data blocks and we are allocating at
  * EOF, optimise allocation for contiguous file extension and/or stripe
  * alignment of the new extent.
- *
- * NOTE: ap->aeof is only set if the allocation length is >= the
- * stripe unit and the allocation offset is at the end of file.
  */
 static int
 xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		blen,
-	bool			ag_only)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*caller_pag = args->pag;
+	xfs_extlen_t		alignment = args->alignment;
 	int			error;
 
+	ASSERT(ap->aeof && ap->offset);
+	ASSERT(args->alignment >= 1);
+
 	/*
-	 * If there are already extents in the file, try an exact EOF block
-	 * allocation to extend the file as a contiguous extent. If that fails,
-	 * or it's the first allocation in a file, just try for a stripe aligned
-	 * allocation.
+	 * Compute the alignment slop for the fallback path so we ensure
+	 * we account for the potential alignemnt space required by the
+	 * fallback paths before we modify the AGF and AGFL here.
 	 */
-	if (ap->offset) {
-		xfs_extlen_t	alignment = args->alignment;
-
-		/*
-		 * Compute the alignment slop for the fallback path so we ensure
-		 * we account for the potential alignemnt space required by the
-		 * fallback paths before we modify the AGF and AGFL here.
-		 */
-		args->alignment = 1;
-		args->alignslop = alignment - args->alignment;
-
-		if (!caller_pag)
-			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
-		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-		if (!caller_pag) {
-			xfs_perag_put(args->pag);
-			args->pag = NULL;
-		}
-		if (error)
-			return error;
-
-		if (args->fsbno != NULLFSBLOCK)
-			return 0;
-		/*
-		 * Exact allocation failed. Reset to try an aligned allocation
-		 * according to the original allocation specification.
-		 */
-		args->alignment = alignment;
-		args->alignslop = 0;
-	}
+	args->alignment = 1;
+	args->alignslop = alignment - args->alignment;
 
-	if (ag_only) {
-		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
-	} else {
+	if (!caller_pag)
+		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
+	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
+	if (!caller_pag) {
+		xfs_perag_put(args->pag);
 		args->pag = NULL;
-		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
-		ASSERT(args->pag == NULL);
-		args->pag = caller_pag;
 	}
-	if (error)
-		return error;
 
-	if (args->fsbno != NULLFSBLOCK)
-		return 0;
-
-	/*
-	 * Aligned allocation failed, so all fallback paths from here drop the
-	 * start alignment requirement as we know it will not succeed.
-	 */
-	args->alignment = 1;
-	return 0;
+	/* Reset alignment to original specifications.  */
+	args->alignment = alignment;
+	args->alignslop = 0;
+	return error;
 }
 
 /*
@@ -3688,12 +3649,19 @@ xfs_bmap_btalloc_filestreams(
 	}
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
+	if (ap->aeof && ap->offset)
+		error = xfs_bmap_btalloc_at_eof(ap, args);
 
+	/* This may be an aligned allocation attempt. */
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 
+	/* Attempt non-aligned allocation if we haven't already. */
+	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		args->alignment = 1;
+		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
+	}
+
 out_low_space:
 	/*
 	 * We are now done with the perag reference for the filestreams
@@ -3715,7 +3683,6 @@ xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t		blen = 0;
 	int			error;
 
 	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
@@ -3726,23 +3693,33 @@ xfs_bmap_btalloc_best_length(
 	 * the request.  If one isn't found, then adjust the minimum allocation
 	 * size to the largest space found.
 	 */
-	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
+	error = xfs_bmap_btalloc_select_lengths(ap, args);
 	if (error)
 		return error;
 
 	/*
-	 * Don't attempt optimal EOF allocation if previous allocations barely
-	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
-	 * optimal or even aligned allocations in this case, so don't waste time
-	 * trying.
+	 * If we are in low space mode, then optimal allocation will fail so
+	 * prepare for minimal allocation and run the low space algorithm
+	 * immediately.
 	 */
-	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
-		if (error || args->fsbno != NULLFSBLOCK)
-			return error;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		ASSERT(args->fsbno == NULLFSBLOCK);
+		return xfs_bmap_btalloc_low_space(ap, args);
+	}
+
+	if (ap->aeof && ap->offset)
+		error = xfs_bmap_btalloc_at_eof(ap, args);
+
+	/* This may be an aligned allocation attempt. */
+	if (!error && args->fsbno == NULLFSBLOCK)
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
+
+	/* Attempt non-aligned allocation if we haven't already. */
+	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		args->alignment = 1;
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	}
 
-	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error || args->fsbno != NULLFSBLOCK)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 9f71a9a3a65e..40a2daeea712 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -780,7 +780,7 @@ xfs_ialloc_ag_alloc(
 		 * the exact agbno requirement and increase the alignment
 		 * instead. It is critical that the total size of the request
 		 * (len + alignment + slop) does not increase from this point
-		 * on, so reset minalignslop to ensure it is not included in
+		 * on, so reset alignslop to ensure it is not included in
 		 * subsequent requests.
 		 */
 		args.alignslop = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 05/13] xfs: introduce forced allocation alignment
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (3 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 04/13] xfs: make EOF allocation simpler John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 20:37   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 06/13] xfs: align args->minlen for " John Garry
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When forced allocation alignment is specified, the extent will
be aligned to the extent size hint size rather than stripe
alignment. If aligned allocation cannot be done, then the allocation
is failed rather than attempting non-aligned fallbacks.

Note: none of the per-inode force align configuration is present
yet, so this just triggers off an "always false" wrapper function
for the moment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.h |  1 +
 fs/xfs/libxfs/xfs_bmap.c  | 29 +++++++++++++++++++++++------
 fs/xfs/xfs_inode.h        |  5 +++++
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index aa2c103d98f0..7de2e6f64882 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -66,6 +66,7 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_USERDATA		(1 << 0)/* allocation is for user data*/
 #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
 #define XFS_ALLOC_NOBUSY		(1 << 2)/* Busy extents not allowed */
+#define XFS_ALLOC_FORCEALIGN		(1 << 3)/* forced extent alignment */
 
 /* freespace limit calculations */
 unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 528e3cd81ee6..9131ba8113a6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3401,9 +3401,10 @@ xfs_bmap_alloc_account(
  * Calculate the extent start alignment and the extent length adjustments that
  * constrain this allocation.
  *
- * Extent start alignment is currently determined by stripe configuration and is
- * carried in args->alignment, whilst extent length adjustment is determined by
- * extent size hints and is carried by args->prod and args->mod.
+ * Extent start alignment is currently determined by forced inode alignment or
+ * stripe configuration and is carried in args->alignment, whilst extent length
+ * adjustment is determined by extent size hints and is carried by args->prod
+ * and args->mod.
  *
  * Low level allocation code is free to either ignore or override these values
  * as required.
@@ -3416,11 +3417,18 @@ xfs_bmap_compute_alignments(
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
 
-	/* stripe alignment for allocation is determined by mount parameters */
-	if (mp->m_swidth && xfs_has_swalloc(mp))
+	/*
+	 * Forced inode alignment takes preference over stripe alignment.
+	 * Stripe alignment for allocation is determined by mount parameters.
+	 */
+	if (xfs_inode_has_forcealign(ap->ip)) {
+		args->alignment = xfs_get_extsz_hint(ap->ip);
+		args->datatype |= XFS_ALLOC_FORCEALIGN;
+	} else if (mp->m_swidth && xfs_has_swalloc(mp)) {
 		args->alignment = mp->m_swidth;
-	else if (mp->m_dalign)
+	} else if (mp->m_dalign) {
 		args->alignment = mp->m_dalign;
+	}
 
 	if (ap->flags & XFS_BMAPI_COWFORK)
 		align = xfs_get_cowextsz_hint(ap->ip);
@@ -3607,6 +3615,11 @@ xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	if (args->alignment > 1 && (args->datatype & XFS_ALLOC_FORCEALIGN)) {
+		args->fsbno = NULLFSBLOCK;
+		return 0;
+	}
+
 	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
@@ -3658,6 +3671,8 @@ xfs_bmap_btalloc_filestreams(
 
 	/* Attempt non-aligned allocation if we haven't already. */
 	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		if (args->datatype & XFS_ALLOC_FORCEALIGN)
+			return error;
 		args->alignment = 1;
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 	}
@@ -3716,6 +3731,8 @@ xfs_bmap_btalloc_best_length(
 
 	/* Attempt non-aligned allocation if we haven't already. */
 	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		if (args->datatype & XFS_ALLOC_FORCEALIGN)
+			return error;
 		args->alignment = 1;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 292b90b5f2ac..42f999c1106c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -311,6 +311,11 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
 }
 
+static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
+{
+	return false;
+}
+
 /*
  * Decide if this file is a realtime file whose data allocation unit is larger
  * than a single filesystem block.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 06/13] xfs: align args->minlen for forced allocation alignment
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (4 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 05/13] xfs: introduce forced allocation alignment John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 20:38   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

If args->minlen is not aligned to the constraints of forced
alignment, we may do minlen allocations that are not aligned when we
approach ENOSPC. Avoid this by always aligning args->minlen
appropriately. If alignment of minlen results in a value smaller
than the alignment constraint, fail the allocation immediately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 45 +++++++++++++++++++++++++++-------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9131ba8113a6..c9cf138e13c4 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3278,33 +3278,48 @@ xfs_bmap_longest_free_extent(
 	return 0;
 }
 
-static xfs_extlen_t
+static int
 xfs_bmap_select_minlen(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen)
 {
-
 	/* Adjust best length for extent start alignment. */
 	if (blen > args->alignment)
 		blen -= args->alignment;
 
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
-	 * possible that there is enough contiguous free space for this request.
+	 * possible that there is enough contiguous free space for this request
+	 * even if best length is less that the minimum length we need.
+	 *
+	 * If the best length won't satisfy the maximum length we requested,
+	 * then use it as the minimum length so we get as large an allocation
+	 * as possible.
 	 */
 	if (blen < ap->minlen)
-		return ap->minlen;
+		blen = ap->minlen;
+	else if (blen > args->maxlen)
+		blen = args->maxlen;
 
 	/*
-	 * If the best seen length is less than the request length,
-	 * use the best as the minimum, otherwise we've got the maxlen we
-	 * were asked for.
+	 * If we have alignment constraints, round the minlen down to match the
+	 * constraint so that alignment will be attempted. This may reduce the
+	 * allocation to smaller than was requested, so clamp the minimum to
+	 * ap->minlen to allow unaligned allocation to succeed. If we are forced
+	 * to align the allocation, return ENOSPC at this point because we don't
+	 * have enough contiguous free space to guarantee aligned allocation.
 	 */
-	if (blen < args->maxlen)
-		return blen;
-	return args->maxlen;
-
+	if (args->alignment > 1) {
+		blen = rounddown(blen, args->alignment);
+		if (blen < ap->minlen) {
+			if (args->datatype & XFS_ALLOC_FORCEALIGN)
+				return -ENOSPC;
+			blen = ap->minlen;
+		}
+	}
+	args->minlen = blen;
+	return 0;
 }
 
 static int
@@ -3340,8 +3355,7 @@ xfs_bmap_btalloc_select_lengths(
 	if (pag)
 		xfs_perag_rele(pag);
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	return error;
+	return xfs_bmap_select_minlen(ap, args, blen);
 }
 
 /* Update all inode and quota accounting for the allocation we just did. */
@@ -3661,7 +3675,10 @@ xfs_bmap_btalloc_filestreams(
 		goto out_low_space;
 	}
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
+	error = xfs_bmap_select_minlen(ap, args, blen);
+	if (error)
+		goto out_low_space;
+
 	if (ap->aeof && ap->offset)
 		error = xfs_bmap_btalloc_at_eof(ap, args);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (5 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 06/13] xfs: align args->minlen for " John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:07   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 08/13] xfs: Do not free EOF blocks for forcealign John Garry
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: "Darrick J. Wong" <djwong@kernel.org>

Add a new inode flag to require that all file data extent mappings must
be aligned (both the file offset range and the allocated space itself)
to the extent size hint.  Having a separate COW extent size hint is no
longer allowed.

The goal here is to enable sysadmins and users to mandate that all space
mappings in a file must have a startoff/blockcount that are aligned to
(say) a 2MB alignment and that the startblock/blockcount will follow the
same alignment.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Co-developed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h    |  6 +++-
 fs/xfs/libxfs/xfs_inode_buf.c | 53 +++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_buf.h |  3 ++
 fs/xfs/libxfs/xfs_sb.c        |  2 ++
 fs/xfs/xfs_inode.c            | 13 +++++++++
 fs/xfs/xfs_inode.h            | 20 ++++++++++++-
 fs/xfs/xfs_ioctl.c            | 47 +++++++++++++++++++++++++++++--
 fs/xfs/xfs_mount.h            |  2 ++
 fs/xfs/xfs_reflink.h          | 10 -------
 fs/xfs/xfs_super.c            |  4 +++
 include/uapi/linux/fs.h       |  2 ++
 11 files changed, 148 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 61f51becff4f..b48cd75d34a6 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
@@ -1094,16 +1095,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
+/* data extent mappings for regular files must be aligned to extent size hint */
+#define XFS_DIFLAG2_FORCEALIGN_BIT 5
 
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index e7a7bfbe75b4..b2c5f466c1a9 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -644,6 +644,15 @@ xfs_dinode_verify(
 	    !xfs_has_bigtime(mp))
 		return __this_address;
 
+	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
+		fa = xfs_inode_validate_forcealign(mp,
+			be32_to_cpu(dip->di_extsize),
+			be32_to_cpu(dip->di_cowextsize),
+			mode, flags, flags2);
+		if (fa)
+			return fa;
+	}
+
 	return NULL;
 }
 
@@ -811,3 +820,47 @@ xfs_inode_validate_cowextsize(
 
 	return NULL;
 }
+
+/* Validate the forcealign inode flag */
+xfs_failaddr_t
+xfs_inode_validate_forcealign(
+	struct xfs_mount	*mp,
+	uint32_t		extsize,
+	uint32_t		cowextsize,
+	uint16_t		mode,
+	uint16_t		flags,
+	uint64_t		flags2)
+{
+	/* superblock rocompat feature flag */
+	if (!xfs_has_forcealign(mp))
+		return __this_address;
+
+	/* Only regular files and directories */
+	if (!S_ISDIR(mode) && !S_ISREG(mode))
+		return __this_address;
+
+	/* We require EXTSIZE or EXTSZINHERIT */
+	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
+		return __this_address;
+
+	/* We require a non-zero extsize */
+	if (!extsize)
+		return __this_address;
+
+	/* Reflink'ed disallowed */
+	if (flags2 & XFS_DIFLAG2_REFLINK)
+		return __this_address;
+
+	/* COW extsize disallowed */
+	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
+		return __this_address;
+
+	if (cowextsize)
+		return __this_address;
+
+	/* A RT device with sb_rextsize=1 could make use of forcealign */
+	if (flags & XFS_DIFLAG_REALTIME && mp->m_sb.sb_rextsize != 1)
+		return __this_address;
+
+	return NULL;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
index 585ed5a110af..b8b65287b037 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.h
+++ b/fs/xfs/libxfs/xfs_inode_buf.h
@@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
 xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
 		uint32_t cowextsize, uint16_t mode, uint16_t flags,
 		uint64_t flags2);
+xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
+		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
+		uint16_t flags, uint64_t flags2);
 
 static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
 {
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 6b56f0f6d4c1..e56911553edd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -164,6 +164,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
+		features |= XFS_FEAT_FORCEALIGN;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f36091e1e7f5..994fb7e184d9 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -608,6 +608,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
+			flags |= FS_XFLAG_FORCEALIGN;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))
@@ -737,6 +739,8 @@ xfs_inode_inherit_flags2(
 	}
 	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
 		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
+		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
 
 	/* Don't let invalid cowextsize hints propagate. */
 	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
@@ -745,6 +749,15 @@ xfs_inode_inherit_flags2(
 		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
 		ip->i_cowextsize = 0;
 	}
+
+	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
+		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
+				ip->i_extsize, ip->i_cowextsize,
+				VFS_I(ip)->i_mode, ip->i_diflags,
+				ip->i_diflags2);
+		if (failaddr)
+			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
+	}
 }
 
 /*
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 42f999c1106c..536e646dd055 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -301,6 +301,16 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
 	return ip->i_cowfp && ip->i_cowfp->if_bytes;
 }
 
+static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
+{
+	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
+}
+
+static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
+{
+	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
+}
+
 static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
 {
 	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
@@ -313,7 +323,15 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 
 static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
 {
-	return false;
+	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
+		return false;
+	if (ip->i_extsize <= 1)
+		return false;
+	if (xfs_is_cow_inode(ip))
+		return false;
+	if (ip->i_diflags & XFS_DIFLAG_REALTIME)
+		return false;
+	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
 }
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index f0117188f302..5eff8fd9fa3e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -525,10 +525,48 @@ xfs_flags2diflags2(
 		di_flags2 |= XFS_DIFLAG2_DAX;
 	if (xflags & FS_XFLAG_COWEXTSIZE)
 		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
+	if (xflags & FS_XFLAG_FORCEALIGN)
+		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
 
 	return di_flags2;
 }
 
+/*
+ * Forcealign requires a non-zero extent size hint and a zero cow
+ * extent size hint.  Don't allow set for RT files yet.
+ */
+static int
+xfs_ioctl_setattr_forcealign(
+	struct xfs_inode	*ip,
+	struct fileattr		*fa)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (!xfs_has_forcealign(mp))
+		return -EINVAL;
+
+	if (xfs_is_reflink_inode(ip))
+		return -EINVAL;
+
+	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
+				FS_XFLAG_EXTSZINHERIT)))
+		return -EINVAL;
+
+	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
+		return -EINVAL;
+
+	if (!fa->fsx_extsize)
+		return -EINVAL;
+
+	if (fa->fsx_cowextsize)
+		return -EINVAL;
+
+	if (fa->fsx_xflags & FS_XFLAG_REALTIME)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -537,10 +575,12 @@ xfs_ioctl_setattr_xflags(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
+	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
 	uint64_t		i_flags2;
 
-	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
-		/* Can't change realtime flag if any extents are allocated. */
+	/* Can't change RT or forcealign flags if any extents are allocated. */
+	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
+	    forcealign != xfs_inode_has_forcealign(ip)) {
 		if (ip->i_df.if_nextents || ip->i_delayed_blks)
 			return -EINVAL;
 	}
@@ -561,6 +601,9 @@ xfs_ioctl_setattr_xflags(
 	if (i_flags2 && !xfs_has_v3inodes(mp))
 		return -EINVAL;
 
+	if (forcealign && (xfs_ioctl_setattr_forcealign(ip, fa) < 0))
+		return -EINVAL;
+
 	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_diflags2 = i_flags2;
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d0567dfbc036..30228fea908d 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -299,6 +299,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
+#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
 __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
 __XFS_HAS_V4_FEAT(crc, CRC)
 __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
+__XFS_HAS_FEAT(forcealign, FORCEALIGN)
 
 /*
  * Mount features
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 65c5dfe17ecf..fb55e4ce49fa 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -6,16 +6,6 @@
 #ifndef __XFS_REFLINK_H
 #define __XFS_REFLINK_H 1
 
-static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
-{
-	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
-}
-
-static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
-{
-	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
-}
-
 extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *irec, bool *shared);
 int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 27e9f749c4c7..852bbfb21506 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1721,6 +1721,10 @@ xfs_fs_fill_super(
 		mp->m_features &= ~XFS_FEAT_DISCARD;
 	}
 
+	if (xfs_has_forcealign(mp))
+		xfs_warn(mp,
+"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
+
 	if (xfs_has_reflink(mp)) {
 		if (mp->m_sb.sb_rblocks) {
 			xfs_alert(mp,
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 45e4e64fd664..8af304c0e29a 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -158,6 +158,8 @@ struct fsxattr {
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+/* data extent mappings for regular files must be aligned to extent size hint */
+#define FS_XFLAG_FORCEALIGN	0x00020000
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 08/13] xfs: Do not free EOF blocks for forcealign
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (6 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:08   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() " John Garry
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when forcealign is enabled, we want the EOF to be aligned as well, so
do not free EOF blocks.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_bmap_util.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index e5d893f93522..56b80a7c0992 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -539,8 +539,13 @@ xfs_can_free_eofblocks(
 	 * forever.
 	 */
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
-	if (xfs_inode_has_bigrtalloc(ip))
+
+	/* Do not free blocks when forcing extent sizes */
+	if (xfs_inode_has_forcealign(ip))
+		end_fsb = roundup_64(end_fsb, ip->i_extsize);
+	else if (xfs_inode_has_bigrtalloc(ip))
 		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
+
 	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	if (last_fsb <= end_fsb)
 		return false;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() for forcealign
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (7 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 08/13] xfs: Do not free EOF blocks for forcealign John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 18:38   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 10/13] xfs: Unmap blocks according to forcealign John Garry
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For forcealign enabled, the allocation unit size is in ip->i_extsize, and
this must never be zero.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 994fb7e184d9..cd07b15478ac 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -4299,6 +4299,8 @@ xfs_inode_alloc_unitsize(
 {
 	unsigned int		blocks = 1;
 
+	if (xfs_inode_has_forcealign(ip))
+		return ip->i_extsize;
 	if (XFS_IS_REALTIME_INODE(ip))
 		blocks = ip->i_mount->m_sb.sb_rextsize;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/13] xfs: Unmap blocks according to forcealign
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (8 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() " John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:12   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 11/13] xfs: Only free full extents for forcealign John Garry
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when forcealign is enabled, blocks in an inode need to be unmapped
according to extent alignment, like what is already done for rtvol.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c9cf138e13c4..ebeb2969b289 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5380,6 +5380,25 @@ xfs_bmap_del_extent_real(
 	return 0;
 }
 
+static xfs_extlen_t
+xfs_bunmapi_align(
+	struct xfs_inode	*ip,
+	xfs_fsblock_t		bno)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_agblock_t		agbno;
+
+	if (xfs_inode_has_forcealign(ip)) {
+		if (is_power_of_2(ip->i_extsize))
+			return bno & (ip->i_extsize - 1);
+
+		agbno = XFS_FSB_TO_AGBNO(mp, bno);
+		return do_div(agbno, ip->i_extsize);
+	}
+	ASSERT(XFS_IS_REALTIME_INODE(ip));
+	return xfs_rtb_to_rtxoff(ip->i_mount, bno);
+}
+
 /*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
@@ -5402,6 +5421,7 @@ __xfs_bunmapi(
 	struct xfs_bmbt_irec	got;		/* current extent record */
 	struct xfs_ifork	*ifp;		/* inode fork pointer */
 	int			isrt;		/* freeing in rt area */
+	int			isforcealign;	/* freeing for inode with forcealign */
 	int			logflags;	/* transaction logging flags */
 	xfs_extlen_t		mod;		/* rt extent offset */
 	struct xfs_mount	*mp = ip->i_mount;
@@ -5439,6 +5459,8 @@ __xfs_bunmapi(
 	}
 	XFS_STATS_INC(mp, xs_blk_unmap);
 	isrt = xfs_ifork_is_realtime(ip, whichfork);
+	isforcealign = (whichfork != XFS_ATTR_FORK) &&
+			xfs_inode_has_forcealign(ip);
 	end = start + len;
 
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
@@ -5490,11 +5512,10 @@ __xfs_bunmapi(
 		if (del.br_startoff + del.br_blockcount > end + 1)
 			del.br_blockcount = end + 1 - del.br_startoff;
 
-		if (!isrt || (flags & XFS_BMAPI_REMAP))
+		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
 			goto delete;
 
-		mod = xfs_rtb_to_rtxoff(mp,
-				del.br_startblock + del.br_blockcount);
+		mod = xfs_bunmapi_align(ip, del.br_startblock + del.br_blockcount);
 		if (mod) {
 			/*
 			 * Realtime extent not lined up at the end.
@@ -5542,9 +5563,16 @@ __xfs_bunmapi(
 			goto nodelete;
 		}
 
-		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
+		mod = xfs_bunmapi_align(ip, del.br_startblock);
 		if (mod) {
-			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
+			xfs_extlen_t off;
+
+			if (isforcealign) {
+				off = ip->i_extsize - mod;
+			} else {
+				ASSERT(isrt);
+				off = mp->m_sb.sb_rextsize - mod;
+			}
 
 			/*
 			 * Realtime extent is lined up at the end but not
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/13] xfs: Only free full extents for forcealign
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (9 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 10/13] xfs: Unmap blocks according to forcealign John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:13   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 12/13] xfs: Don't revert allocated offset " John Garry
  2024-06-21 10:05 ` [PATCH 13/13] xfs: Enable file data forcealign feature John Garry
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

Like we already do for rtvol, only free full extents for forcealign in
xfs_free_file_space().

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_bmap_util.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 56b80a7c0992..ee767a4fd63a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -842,8 +842,11 @@ xfs_free_file_space(
 	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
 	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
 
-	/* We can only free complete realtime extents. */
-	if (xfs_inode_has_bigrtalloc(ip)) {
+	/* Free only complete extents. */
+	if (xfs_inode_has_forcealign(ip)) {
+		startoffset_fsb = roundup_64(startoffset_fsb, ip->i_extsize);
+		endoffset_fsb = rounddown_64(endoffset_fsb, ip->i_extsize);
+	} else if (xfs_inode_has_bigrtalloc(ip)) {
 		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
 		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/13] xfs: Don't revert allocated offset for forcealign
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (10 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 11/13] xfs: Only free full extents for forcealign John Garry
@ 2024-06-21 10:05 ` John Garry
  2024-06-21 19:13   ` Darrick J. Wong
  2024-06-21 10:05 ` [PATCH 13/13] xfs: Enable file data forcealign feature John Garry
  12 siblings, 1 reply; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

In xfs_bmap_process_allocated_extent(), for when we found that we could not
provide the requested length completely, the mapping is moved so that we
can provide as much as possible for the original request.

For forcealign, this would mean ignoring alignment guaranteed, so don't do
this.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index ebeb2969b289..42f3582c1574 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3492,11 +3492,15 @@ xfs_bmap_process_allocated_extent(
 	 * original request as possible.  Free space is apparently
 	 * very fragmented so we're unlikely to be able to satisfy the
 	 * hints anyway.
+	 * However, for an inode with forcealign, continue with the
+	 * found offset as we need to honour the alignment hint.
 	 */
-	if (ap->length <= orig_length)
-		ap->offset = orig_offset;
-	else if (ap->offset + ap->length < orig_offset + orig_length)
-		ap->offset = orig_offset + orig_length - ap->length;
+	if (!xfs_inode_has_forcealign(ap->ip)) {
+		if (ap->length <= orig_length)
+			ap->offset = orig_offset;
+		else if (ap->offset + ap->length < orig_offset + orig_length)
+			ap->offset = orig_offset + orig_length - ap->length;
+	}
 	xfs_bmap_alloc_account(ap);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/13] xfs: Enable file data forcealign feature
  2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
                   ` (11 preceding siblings ...)
  2024-06-21 10:05 ` [PATCH 12/13] xfs: Don't revert allocated offset " John Garry
@ 2024-06-21 10:05 ` John Garry
  12 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2024-06-21 10:05 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: "Darrick J. Wong" <djwong@kernel.org>

Enable this feature.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b48cd75d34a6..42e1f80206ab 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -358,7 +358,8 @@ xfs_sb_has_compat_feature(
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
 		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+		 XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() for forcealign
  2024-06-21 10:05 ` [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() " John Garry
@ 2024-06-21 18:38   ` Darrick J. Wong
  2024-06-24  7:34     ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 18:38 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:36AM +0000, John Garry wrote:
> For forcealign enabled, the allocation unit size is in ip->i_extsize, and
> this must never be zero.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/xfs_inode.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 994fb7e184d9..cd07b15478ac 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4299,6 +4299,8 @@ xfs_inode_alloc_unitsize(
>  {
>  	unsigned int		blocks = 1;
>  
> +	if (xfs_inode_has_forcealign(ip))
> +		return ip->i_extsize;

i_extsize is in units of fsblocks, whereas this function returns bytes.
I think you need XFS_FSB_TO_B here?

--D

>  	if (XFS_IS_REALTIME_INODE(ip))
>  		blocks = ip->i_mount->m_sb.sb_rextsize;
>  
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-06-21 10:05 ` [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
@ 2024-06-21 19:07   ` Darrick J. Wong
  2024-06-24 14:36     ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:07 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:34AM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Add a new inode flag to require that all file data extent mappings must
> be aligned (both the file offset range and the allocated space itself)
> to the extent size hint.  Having a separate COW extent size hint is no
> longer allowed.

It might be worth mentioning that for non-rt files, the allocated space
will be aligned to the start of an AG, not necessary the block device,
though the upcoming atomicwrites inode flag will also require that.

Also this should clarify what happens for rt files -- do we allow
forcealign realtime files?  Or only for the sb_rextsize == 1 case?
Or for any sbrextsize?

> The goal here is to enable sysadmins and users to mandate that all space
> mappings in a file must have a startoff/blockcount that are aligned to
> (say) a 2MB alignment and that the startblock/blockcount will follow the
> same alignment.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> Co-developed-by: John Garry <john.g.garry@oracle.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h    |  6 +++-
>  fs/xfs/libxfs/xfs_inode_buf.c | 53 +++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_inode_buf.h |  3 ++
>  fs/xfs/libxfs/xfs_sb.c        |  2 ++
>  fs/xfs/xfs_inode.c            | 13 +++++++++
>  fs/xfs/xfs_inode.h            | 20 ++++++++++++-
>  fs/xfs/xfs_ioctl.c            | 47 +++++++++++++++++++++++++++++--
>  fs/xfs/xfs_mount.h            |  2 ++
>  fs/xfs/xfs_reflink.h          | 10 -------
>  fs/xfs/xfs_super.c            |  4 +++
>  include/uapi/linux/fs.h       |  2 ++
>  11 files changed, 148 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 61f51becff4f..b48cd75d34a6 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
>  #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>  #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>  #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
> +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
>  #define XFS_SB_FEAT_RO_COMPAT_ALL \
>  		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>  		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
> @@ -1094,16 +1095,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
>  #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
>  #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
>  #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
> +/* data extent mappings for regular files must be aligned to extent size hint */
> +#define XFS_DIFLAG2_FORCEALIGN_BIT 5
>  
>  #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
>  #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
>  #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
>  #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
>  #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
> +#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
>  
>  #define XFS_DIFLAG2_ANY \
>  	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
> -	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
> +	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
>  
>  static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
>  {
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index e7a7bfbe75b4..b2c5f466c1a9 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -644,6 +644,15 @@ xfs_dinode_verify(
>  	    !xfs_has_bigtime(mp))
>  		return __this_address;
>  
> +	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
> +		fa = xfs_inode_validate_forcealign(mp,
> +			be32_to_cpu(dip->di_extsize),
> +			be32_to_cpu(dip->di_cowextsize),
> +			mode, flags, flags2);

Needs another level of indent.

		fa = xfs_inode_validate_forcealign(mp,
				be32_to_cpu(dip->di_extsize),
				be32_to_cpu(dip->di_cowextsize),
				mode, flags, flags2);

> +		if (fa)
> +			return fa;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -811,3 +820,47 @@ xfs_inode_validate_cowextsize(
>  
>  	return NULL;
>  }
> +
> +/* Validate the forcealign inode flag */
> +xfs_failaddr_t
> +xfs_inode_validate_forcealign(
> +	struct xfs_mount	*mp,
> +	uint32_t		extsize,
> +	uint32_t		cowextsize,
> +	uint16_t		mode,
> +	uint16_t		flags,
> +	uint64_t		flags2)
> +{
> +	/* superblock rocompat feature flag */
> +	if (!xfs_has_forcealign(mp))
> +		return __this_address;
> +
> +	/* Only regular files and directories */
> +	if (!S_ISDIR(mode) && !S_ISREG(mode))
> +		return __this_address;
> +
> +	/* We require EXTSIZE or EXTSZINHERIT */
> +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
> +		return __this_address;
> +
> +	/* We require a non-zero extsize */
> +	if (!extsize)
> +		return __this_address;
> +
> +	/* Reflink'ed disallowed */
> +	if (flags2 & XFS_DIFLAG2_REFLINK)
> +		return __this_address;

This is still a significant limitation to end up encoding in the ondisk
format.  Given that we may some day fix that limitation by teaching
pagecache writeback how to do writeback on entire allocation units,
I think for now you should refuse to mount any filesystem with reflink
and forcealign enabled at the same time.

> +
> +	/* COW extsize disallowed */
> +	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
> +		return __this_address;
> +
> +	if (cowextsize)
> +		return __this_address;
> +
> +	/* A RT device with sb_rextsize=1 could make use of forcealign */
> +	if (flags & XFS_DIFLAG_REALTIME && mp->m_sb.sb_rextsize != 1)
> +		return __this_address;

Ok, so forcealign and bigrtalloc are not compatible?  I would have
thought they would be ok together since the rest of your code changes
short circuit into the forcealign case before the bigrtalloc case.
All you need here is to validate that i_extsize % sb_rextsize == 0.

> +
> +	return NULL;

You don't check that i_extsize % sb_agblocks == 0 for non-rt files
because that is only required for atomic writes + forcealign, right?

> +}
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
> index 585ed5a110af..b8b65287b037 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.h
> +++ b/fs/xfs/libxfs/xfs_inode_buf.h
> @@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
>  xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
>  		uint32_t cowextsize, uint16_t mode, uint16_t flags,
>  		uint64_t flags2);
> +xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
> +		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
> +		uint16_t flags, uint64_t flags2);
>  
>  static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
>  {
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 6b56f0f6d4c1..e56911553edd 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -164,6 +164,8 @@ xfs_sb_version_to_features(
>  		features |= XFS_FEAT_REFLINK;
>  	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
>  		features |= XFS_FEAT_INOBTCNT;
> +	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
> +		features |= XFS_FEAT_FORCEALIGN;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
>  		features |= XFS_FEAT_FTYPE;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f36091e1e7f5..994fb7e184d9 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -608,6 +608,8 @@ xfs_ip2xflags(
>  			flags |= FS_XFLAG_DAX;
>  		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
>  			flags |= FS_XFLAG_COWEXTSIZE;
> +		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
> +			flags |= FS_XFLAG_FORCEALIGN;
>  	}
>  
>  	if (xfs_inode_has_attr_fork(ip))
> @@ -737,6 +739,8 @@ xfs_inode_inherit_flags2(
>  	}
>  	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
>  		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
> +	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
> +		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
>  
>  	/* Don't let invalid cowextsize hints propagate. */
>  	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> @@ -745,6 +749,15 @@ xfs_inode_inherit_flags2(
>  		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
>  		ip->i_cowextsize = 0;
>  	}
> +
> +	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
> +		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
> +				ip->i_extsize, ip->i_cowextsize,
> +				VFS_I(ip)->i_mode, ip->i_diflags,
> +				ip->i_diflags2);
> +		if (failaddr)
> +			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
> +	}
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 42f999c1106c..536e646dd055 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -301,6 +301,16 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
>  	return ip->i_cowfp && ip->i_cowfp->if_bytes;
>  }
>  
> +static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
> +{
> +	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
> +}
> +
> +static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
> +{
> +	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
> +}
> +
>  static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
>  {
>  	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
> @@ -313,7 +323,15 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>  
>  static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
>  {
> -	return false;
> +	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
> +		return false;
> +	if (ip->i_extsize <= 1)
> +		return false;
> +	if (xfs_is_cow_inode(ip))
> +		return false;
> +	if (ip->i_diflags & XFS_DIFLAG_REALTIME)
> +		return false;
> +	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;

In theory we already validated all of these fields when we loaded the
inode, right?  In which case you only need to check diflags2.

>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index f0117188f302..5eff8fd9fa3e 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -525,10 +525,48 @@ xfs_flags2diflags2(
>  		di_flags2 |= XFS_DIFLAG2_DAX;
>  	if (xflags & FS_XFLAG_COWEXTSIZE)
>  		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
> +	if (xflags & FS_XFLAG_FORCEALIGN)
> +		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
>  
>  	return di_flags2;
>  }
>  
> +/*
> + * Forcealign requires a non-zero extent size hint and a zero cow
> + * extent size hint.  Don't allow set for RT files yet.
> + */
> +static int
> +xfs_ioctl_setattr_forcealign(
> +	struct xfs_inode	*ip,
> +	struct fileattr		*fa)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (!xfs_has_forcealign(mp))
> +		return -EINVAL;
> +
> +	if (xfs_is_reflink_inode(ip))
> +		return -EINVAL;
> +
> +	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
> +				FS_XFLAG_EXTSZINHERIT)))
> +		return -EINVAL;
> +
> +	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
> +		return -EINVAL;
> +
> +	if (!fa->fsx_extsize)
> +		return -EINVAL;
> +
> +	if (fa->fsx_cowextsize)
> +		return -EINVAL;
> +
> +	if (fa->fsx_xflags & FS_XFLAG_REALTIME)
> +		return -EINVAL;

The inode verifier allows realtime files so long as sb_rextsize is
nonzero.

> +
> +	return 0;
> +}
> +
>  static int
>  xfs_ioctl_setattr_xflags(
>  	struct xfs_trans	*tp,
> @@ -537,10 +575,12 @@ xfs_ioctl_setattr_xflags(
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
> +	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
>  	uint64_t		i_flags2;
>  
> -	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
> -		/* Can't change realtime flag if any extents are allocated. */
> +	/* Can't change RT or forcealign flags if any extents are allocated. */
> +	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
> +	    forcealign != xfs_inode_has_forcealign(ip)) {
>  		if (ip->i_df.if_nextents || ip->i_delayed_blks)
>  			return -EINVAL;
>  	}
> @@ -561,6 +601,9 @@ xfs_ioctl_setattr_xflags(
>  	if (i_flags2 && !xfs_has_v3inodes(mp))
>  		return -EINVAL;
>  
> +	if (forcealign && (xfs_ioctl_setattr_forcealign(ip, fa) < 0))
> +		return -EINVAL;

Either make xfs_ioctl_setattr_forcealign return a boolean, or extract
the error code and return that; don't just squash
xfs_ioctl_setattr_forcealign's negative errno into -EINVAL.

> +
>  	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  	ip->i_diflags2 = i_flags2;
>  
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index d0567dfbc036..30228fea908d 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -299,6 +299,7 @@ typedef struct xfs_mount {
>  #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
>  #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
>  #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
> +#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
>  
>  /* Mount features */
>  #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
> @@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
>  __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
>  __XFS_HAS_V4_FEAT(crc, CRC)
>  __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
> +__XFS_HAS_FEAT(forcealign, FORCEALIGN)
>  
>  /*
>   * Mount features
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 65c5dfe17ecf..fb55e4ce49fa 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -6,16 +6,6 @@
>  #ifndef __XFS_REFLINK_H
>  #define __XFS_REFLINK_H 1
>  
> -static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
> -{
> -	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
> -}
> -
> -static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
> -{
> -	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
> -}
> -
>  extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>  		struct xfs_bmbt_irec *irec, bool *shared);
>  int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 27e9f749c4c7..852bbfb21506 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1721,6 +1721,10 @@ xfs_fs_fill_super(
>  		mp->m_features &= ~XFS_FEAT_DISCARD;
>  	}
>  
> +	if (xfs_has_forcealign(mp))
> +		xfs_warn(mp,
> +"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");

Here would be the place to fail the mount if reflink is enabled.

--D

> +
>  	if (xfs_has_reflink(mp)) {
>  		if (mp->m_sb.sb_rblocks) {
>  			xfs_alert(mp,
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 45e4e64fd664..8af304c0e29a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -158,6 +158,8 @@ struct fsxattr {
>  #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
>  #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
>  #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
> +/* data extent mappings for regular files must be aligned to extent size hint */
> +#define FS_XFLAG_FORCEALIGN	0x00020000
>  #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
>  
>  /* the read-only stuff doesn't really belong here, but any other place is
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 08/13] xfs: Do not free EOF blocks for forcealign
  2024-06-21 10:05 ` [PATCH 08/13] xfs: Do not free EOF blocks for forcealign John Garry
@ 2024-06-21 19:08   ` Darrick J. Wong
  2024-06-24 15:04     ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:08 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:35AM +0000, John Garry wrote:
> For when forcealign is enabled, we want the EOF to be aligned as well, so
> do not free EOF blocks.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/xfs_bmap_util.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index e5d893f93522..56b80a7c0992 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -539,8 +539,13 @@ xfs_can_free_eofblocks(
>  	 * forever.
>  	 */
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> -	if (xfs_inode_has_bigrtalloc(ip))
> +
> +	/* Do not free blocks when forcing extent sizes */

"Only try to free blocks beyond the allocation unit that crosses EOF" ?

Otherwise seems fine to me
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +	if (xfs_inode_has_forcealign(ip))
> +		end_fsb = roundup_64(end_fsb, ip->i_extsize);
> +	else if (xfs_inode_has_bigrtalloc(ip))
>  		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
> +
>  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
>  	if (last_fsb <= end_fsb)
>  		return false;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 10/13] xfs: Unmap blocks according to forcealign
  2024-06-21 10:05 ` [PATCH 10/13] xfs: Unmap blocks according to forcealign John Garry
@ 2024-06-21 19:12   ` Darrick J. Wong
  2024-06-24 15:12     ` John Garry
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:12 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:37AM +0000, John Garry wrote:
> For when forcealign is enabled, blocks in an inode need to be unmapped
> according to extent alignment, like what is already done for rtvol.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++++++++++++++++++++++++-----
>  1 file changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index c9cf138e13c4..ebeb2969b289 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5380,6 +5380,25 @@ xfs_bmap_del_extent_real(
>  	return 0;
>  }
>  
> +static xfs_extlen_t
> +xfs_bunmapi_align(
> +	struct xfs_inode	*ip,
> +	xfs_fsblock_t		bno)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +	xfs_agblock_t		agbno;
> +
> +	if (xfs_inode_has_forcealign(ip)) {
> +		if (is_power_of_2(ip->i_extsize))
> +			return bno & (ip->i_extsize - 1);
> +
> +		agbno = XFS_FSB_TO_AGBNO(mp, bno);
> +		return do_div(agbno, ip->i_extsize);

Huh.  The inode verifier allows realtime forcealign files, but this code
will not handle that properly.  Either don't allow realtime files, or
make this handle them correctly:

	if (XFS_IS_REALTIME_INODE(ip)) {
		if (xfs_inode_has_forcealign(ip))
			return offset_in_block(bno, ip->i_extsize);
		return xfs_rtb_to_rtxoff(ip->i_mount, bno);
	} else if (xfs_inode_has_forcealign(ip)) {
		xfs_agblock_t	agbno = XFS_FSB_TO_AGBNO(mp, bno);

		return offset_in_block(agbno, ip->i_extsize);
	}

	return 1; /* or assert, or whatever */

> +	}
> +	ASSERT(XFS_IS_REALTIME_INODE(ip));
> +	return xfs_rtb_to_rtxoff(ip->i_mount, bno);
> +}
> +
>  /*
>   * Unmap (remove) blocks from a file.
>   * If nexts is nonzero then the number of extents to remove is limited to
> @@ -5402,6 +5421,7 @@ __xfs_bunmapi(
>  	struct xfs_bmbt_irec	got;		/* current extent record */
>  	struct xfs_ifork	*ifp;		/* inode fork pointer */
>  	int			isrt;		/* freeing in rt area */
> +	int			isforcealign;	/* freeing for inode with forcealign */
>  	int			logflags;	/* transaction logging flags */
>  	xfs_extlen_t		mod;		/* rt extent offset */
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -5439,6 +5459,8 @@ __xfs_bunmapi(
>  	}
>  	XFS_STATS_INC(mp, xs_blk_unmap);
>  	isrt = xfs_ifork_is_realtime(ip, whichfork);
> +	isforcealign = (whichfork != XFS_ATTR_FORK) &&
> +			xfs_inode_has_forcealign(ip);
>  	end = start + len;
>  
>  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
> @@ -5490,11 +5512,10 @@ __xfs_bunmapi(
>  		if (del.br_startoff + del.br_blockcount > end + 1)
>  			del.br_blockcount = end + 1 - del.br_startoff;
>  
> -		if (!isrt || (flags & XFS_BMAPI_REMAP))
> +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
>  			goto delete;
>  
> -		mod = xfs_rtb_to_rtxoff(mp,
> -				del.br_startblock + del.br_blockcount);
> +		mod = xfs_bunmapi_align(ip, del.br_startblock + del.br_blockcount);
>  		if (mod) {
>  			/*
>  			 * Realtime extent not lined up at the end.

"Not aligned to allocation unit on the end." ?

> @@ -5542,9 +5563,16 @@ __xfs_bunmapi(
>  			goto nodelete;
>  		}
>  
> -		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
> +		mod = xfs_bunmapi_align(ip, del.br_startblock);
>  		if (mod) {
> -			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
> +			xfs_extlen_t off;
> +
> +			if (isforcealign) {
> +				off = ip->i_extsize - mod;
> +			} else {
> +				ASSERT(isrt);
> +				off = mp->m_sb.sb_rextsize - mod;
> +			}
>  
>  			/*
>  			 * Realtime extent is lined up at the end but not

Same here -- now this code is handling more than just rt extents.

--D

> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 11/13] xfs: Only free full extents for forcealign
  2024-06-21 10:05 ` [PATCH 11/13] xfs: Only free full extents for forcealign John Garry
@ 2024-06-21 19:13   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:13 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:38AM +0000, John Garry wrote:
> Like we already do for rtvol, only free full extents for forcealign in
> xfs_free_file_space().
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Seems fine to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_bmap_util.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 56b80a7c0992..ee767a4fd63a 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -842,8 +842,11 @@ xfs_free_file_space(
>  	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
>  	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
>  
> -	/* We can only free complete realtime extents. */
> -	if (xfs_inode_has_bigrtalloc(ip)) {
> +	/* Free only complete extents. */
> +	if (xfs_inode_has_forcealign(ip)) {
> +		startoffset_fsb = roundup_64(startoffset_fsb, ip->i_extsize);
> +		endoffset_fsb = rounddown_64(endoffset_fsb, ip->i_extsize);
> +	} else if (xfs_inode_has_bigrtalloc(ip)) {
>  		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
>  		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
>  	}
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 12/13] xfs: Don't revert allocated offset for forcealign
  2024-06-21 10:05 ` [PATCH 12/13] xfs: Don't revert allocated offset " John Garry
@ 2024-06-21 19:13   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:13 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:39AM +0000, John Garry wrote:
> In xfs_bmap_process_allocated_extent(), for when we found that we could not
> provide the requested length completely, the mapping is moved so that we
> can provide as much as possible for the original request.
> 
> For forcealign, this would mean ignoring alignment guaranteed, so don't do
> this.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Makes sense,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index ebeb2969b289..42f3582c1574 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3492,11 +3492,15 @@ xfs_bmap_process_allocated_extent(
>  	 * original request as possible.  Free space is apparently
>  	 * very fragmented so we're unlikely to be able to satisfy the
>  	 * hints anyway.
> +	 * However, for an inode with forcealign, continue with the
> +	 * found offset as we need to honour the alignment hint.
>  	 */
> -	if (ap->length <= orig_length)
> -		ap->offset = orig_offset;
> -	else if (ap->offset + ap->length < orig_offset + orig_length)
> -		ap->offset = orig_offset + orig_length - ap->length;
> +	if (!xfs_inode_has_forcealign(ap->ip)) {
> +		if (ap->length <= orig_length)
> +			ap->offset = orig_offset;
> +		else if (ap->offset + ap->length < orig_offset + orig_length)
> +			ap->offset = orig_offset + orig_length - ap->length;
> +	}
>  	xfs_bmap_alloc_account(ap);
>  }
>  
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC
  2024-06-21 10:05 ` [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
@ 2024-06-21 19:42   ` Darrick J. Wong
  2024-06-21 20:04     ` Darrick J. Wong
  0 siblings, 1 reply; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:42 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:28AM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we are near ENOSPC and don't have enough free
> space for an args->maxlen allocation, xfs_alloc_space_available()
> will trim args->maxlen to equal the available space. However, this
> function has only checked that there is enough contiguous free space
> for an aligned args->minlen allocation to succeed. Hence there is no
> guarantee that an args->maxlen allocation will succeed, nor that the
> available space will allow for correct alignment of an args->maxlen
> allocation.
> 
> Further, by trimming args->maxlen arbitrarily, it breaks an
> assumption made in xfs_alloc_fix_len() that if the caller wants
> aligned allocation, then args->maxlen will be set to an aligned
> value. It then skips the tail alignment and so we end up with
> extents that aren't aligned to extent size hint boundaries as we
> approach ENOSPC.
> 
> To avoid this problem, don't reduce args->maxlen by some random,
> arbitrary amount. If args->maxlen is too large for the available
> space, reduce the allocation to a minlen allocation as we know we
> have contiguous free space available for this to succeed and always
> be correctly aligned.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 6c55a6e88eba..5855a21d4864 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2409,14 +2409,23 @@ xfs_alloc_space_available(
>  	if (available < (int)max(args->total, alloc_len))
>  		return false;
>  
> +	if (flags & XFS_ALLOC_FLAG_CHECK)
> +		return true;
> +
>  	/*
> -	 * Clamp maxlen to the amount of free space available for the actual
> -	 * extent allocation.
> +	 * If we can't do a maxlen allocation, then we must reduce the size of
> +	 * the allocation to match the available free space. We know how big
> +	 * the largest contiguous free space we can allocate is, so that's our
> +	 * upper bound. However, we don't exaclty know what alignment/size
> +	 * constraints have been placed on the allocation, so we can't
> +	 * arbitrarily select some new max size. Hence make this a minlen
> +	 * allocation as we know that will definitely succeed and match the
> +	 * callers alignment constraints.
>  	 */
> -	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
> -		args->maxlen = available;
> +	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;

Didn't we already calculate alloc_len identically under "do we have
enough contiguous free space for the allocation?"?  AFAICT we haven't
alter anything in @args since then, right?

> +	if (longest < alloc_len) {
> +		args->maxlen = args->minlen;

Is it possible to reduce maxlen the largest multiple of the alignment
that is still less than @longest?

--D

>  		ASSERT(args->maxlen > 0);
> -		ASSERT(args->maxlen >= args->minlen);
>  	}
>  
>  	return true;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] xfs: always tail align maxlen allocations
  2024-06-21 10:05 ` [PATCH 02/13] xfs: always tail align maxlen allocations John Garry
@ 2024-06-21 19:50   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 19:50 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:29AM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we do a large allocation, the core free space allocation code
> assumes that args->maxlen is aligned to args->prod/args->mod. hence
> if we get a maximum sized extent allocated, it does not do tail
> alignment of the extent.
> 
> However, this assumes that nothing modifies args->maxlen between the
> original allocation context setup and trimming the selected free
> space extent to size. This assumption has recently been found to be
> invalid - xfs_alloc_space_available() modifies args->maxlen in low
> space situations - and there may be more situations we haven't yet
> found like this.
> 
> Force aligned allocation introduces the requirement that extents are
> correctly tail aligned, resulting in this occasional latent
> alignment failure to e reclassified from an unimportant curiousity

                    to be

> to a must-fix bug.
> 
> Removing the assumption about args->maxlen allocations always being
> tail aligned is trivial, and should not impact anything because
> args->maxlen for inodes with extent size hints configured are
> already aligned. Hence all this change does it avoid weird corner
> cases that would have resulted in unaligned extent sizes by always
> trimming the extent down to an aligned size.

IOWs, we always trim rlen, unless there is no alignment (prod==1) or
rlen is less than mod.  For a forcealign file, it should never be the
case that minlen < mod because we'll have returned ENOSPC, right?

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>

If the answer is 'yes' and the typo gets fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 5855a21d4864..32f72217c126 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -432,20 +432,18 @@ xfs_alloc_compute_diff(
>   * Fix up the length, based on mod and prod.
>   * len should be k * prod + mod for some k.
>   * If len is too small it is returned unchanged.
> - * If len hits maxlen it is left alone.
>   */
> -STATIC void
> +static void
>  xfs_alloc_fix_len(
> -	xfs_alloc_arg_t	*args)		/* allocation argument structure */
> +	struct xfs_alloc_arg	*args)
>  {
> -	xfs_extlen_t	k;
> -	xfs_extlen_t	rlen;
> +	xfs_extlen_t		k;
> +	xfs_extlen_t		rlen = args->len;
>  
>  	ASSERT(args->mod < args->prod);
> -	rlen = args->len;
>  	ASSERT(rlen >= args->minlen);
>  	ASSERT(rlen <= args->maxlen);
> -	if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
> +	if (args->prod <= 1 || rlen < args->mod ||
>  	    (args->mod == 0 && rlen < args->prod))
>  		return;
>  	k = rlen % args->prod;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC
  2024-06-21 19:42   ` Darrick J. Wong
@ 2024-06-21 20:04     ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 20:04 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 12:42:25PM -0700, Darrick J. Wong wrote:
> On Fri, Jun 21, 2024 at 10:05:28AM +0000, John Garry wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we are near ENOSPC and don't have enough free
> > space for an args->maxlen allocation, xfs_alloc_space_available()
> > will trim args->maxlen to equal the available space. However, this
> > function has only checked that there is enough contiguous free space
> > for an aligned args->minlen allocation to succeed. Hence there is no
> > guarantee that an args->maxlen allocation will succeed, nor that the
> > available space will allow for correct alignment of an args->maxlen
> > allocation.
> > 
> > Further, by trimming args->maxlen arbitrarily, it breaks an
> > assumption made in xfs_alloc_fix_len() that if the caller wants
> > aligned allocation, then args->maxlen will be set to an aligned
> > value. It then skips the tail alignment and so we end up with
> > extents that aren't aligned to extent size hint boundaries as we
> > approach ENOSPC.
> > 
> > To avoid this problem, don't reduce args->maxlen by some random,
> > arbitrary amount. If args->maxlen is too large for the available
> > space, reduce the allocation to a minlen allocation as we know we
> > have contiguous free space available for this to succeed and always
> > be correctly aligned.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 6c55a6e88eba..5855a21d4864 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2409,14 +2409,23 @@ xfs_alloc_space_available(
> >  	if (available < (int)max(args->total, alloc_len))
> >  		return false;
> >  
> > +	if (flags & XFS_ALLOC_FLAG_CHECK)
> > +		return true;
> > +
> >  	/*
> > -	 * Clamp maxlen to the amount of free space available for the actual
> > -	 * extent allocation.
> > +	 * If we can't do a maxlen allocation, then we must reduce the size of
> > +	 * the allocation to match the available free space. We know how big
> > +	 * the largest contiguous free space we can allocate is, so that's our
> > +	 * upper bound. However, we don't exaclty know what alignment/size
> > +	 * constraints have been placed on the allocation, so we can't
> > +	 * arbitrarily select some new max size. Hence make this a minlen
> > +	 * allocation as we know that will definitely succeed and match the
> > +	 * callers alignment constraints.
> >  	 */
> > -	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
> > -		args->maxlen = available;
> > +	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
> 
> Didn't we already calculate alloc_len identically under "do we have
> enough contiguous free space for the allocation?"?  AFAICT we haven't
> alter anything in @args since then, right?

Oops, the first computation uses minlen, whereas this one uses maxlen.
Disregard this question, please.

--D

> > +	if (longest < alloc_len) {
> > +		args->maxlen = args->minlen;
> 
> Is it possible to reduce maxlen the largest multiple of the alignment
> that is still less than @longest?
> 
> --D
> 
> >  		ASSERT(args->maxlen > 0);
> > -		ASSERT(args->maxlen >= args->minlen);
> >  	}
> >  
> >  	return true;
> > -- 
> > 2.31.1
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/13] xfs: simplify extent allocation alignment
  2024-06-21 10:05 ` [PATCH 03/13] xfs: simplify extent allocation alignment John Garry
@ 2024-06-21 20:29   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 20:29 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:30AM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently align extent allocation to stripe unit or stripe width.
> That is specified by an external parameter to the allocation code,
> which then manipulates the xfs_alloc_args alignment configuration in
> interesting ways.

Is this the "stripe_align" wrangling?  That has confused me in the
past...

> The args->alignment field specifies extent start alignment, but
> because we may be attempting non-aligned allocation first there are
> also slop variables that allow for those allocation attempts to
> account for aligned allocation if they fail.
> 
> This gets much more complex as we introduce forced allocation
> alignment, where extent size hints are used to generate the extent
> start alignment. extent size hints currently only affect extent
> lengths (via args->prod and args->mod) and so with this change we
> will have two different start alignment conditions.
> 
> Avoid this complexity by always using args->alignment to indicate
> extent start alignment, and always using args->prod/mod to indicate
> extent length adjustment.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c  |  4 +-
>  fs/xfs/libxfs/xfs_alloc.h  |  2 +-
>  fs/xfs/libxfs/xfs_bmap.c   | 96 +++++++++++++++++---------------------
>  fs/xfs/libxfs/xfs_ialloc.c | 10 ++--
>  fs/xfs/xfs_trace.h         |  8 ++--
>  5 files changed, 54 insertions(+), 66 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 32f72217c126..35fbd6b19682 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2391,7 +2391,7 @@ xfs_alloc_space_available(
>  	reservation = xfs_ag_resv_needed(pag, args->resv);
>  
>  	/* do we have enough contiguous free space for the allocation? */
> -	alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
> +	alloc_len = args->minlen + (args->alignment - 1) + args->alignslop;
>  	longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
>  	if (longest < alloc_len)
>  		return false;
> @@ -2420,7 +2420,7 @@ xfs_alloc_space_available(
>  	 * allocation as we know that will definitely succeed and match the
>  	 * callers alignment constraints.
>  	 */
> -	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
> +	alloc_len = args->maxlen + (args->alignment - 1) + args->alignslop;
>  	if (longest < alloc_len) {
>  		args->maxlen = args->minlen;
>  		ASSERT(args->maxlen > 0);
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 0b956f8b9d5a..aa2c103d98f0 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
>  	xfs_extlen_t	minleft;	/* min blocks must be left after us */
>  	xfs_extlen_t	total;		/* total blocks needed in xaction */
>  	xfs_extlen_t	alignment;	/* align answer to multiple of this */
> -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
> +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */

Er... what is "slop", exactly?  I've never felt like I understood how
the aligned allocation code works.  AFAICT, the rough idea is that we'll
search the free space for an extent that is at least (len + alignment)
blocks long, and then pick whatever subset of that extent satisfies the
alignment requirements?

But that's the 30,000ft version -- sometimes for the sake of contiguity,
we want to try for an allocation that starts at an *exact* location.
Contiguity is always important for storage allocation, so we're willing
to forego the alignment of the start of the free space, but we should
still try to align the end of the allocation.  Therefore, we add this
"alignslop" parameter and look for an AG with a free extent that is at
least (len + alignment - 1 + alignslop) blocks long?  And then we pick
the subset that satisfies us?

>  	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
>  	xfs_agblock_t	max_agbno;	/* ... */
>  	xfs_extlen_t	len;		/* output: actual size of extent */
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index c101cf266bc4..7f8c8e4dd244 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3285,6 +3285,10 @@ xfs_bmap_select_minlen(
>  	xfs_extlen_t		blen)
>  {
>  
> +	/* Adjust best length for extent start alignment. */
> +	if (blen > args->alignment)
> +		blen -= args->alignment;

But why?  I guess this is artifact of moving stripe_align into
args->alignment?  So now we actually need to subtract out the alignment
when figuring out how to set minlen?

<confused>

> +
>  	/*
>  	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
>  	 * possible that there is enough contiguous free space for this request.
> @@ -3300,6 +3304,7 @@ xfs_bmap_select_minlen(
>  	if (blen < args->maxlen)
>  		return blen;
>  	return args->maxlen;
> +

Nit: unnecessary whitespace.

>  }
>  
>  static int
> @@ -3393,35 +3398,43 @@ xfs_bmap_alloc_account(
>  	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
>  }
>  
> -static int
> +/*
> + * Calculate the extent start alignment and the extent length adjustments that
> + * constrain this allocation.
> + *
> + * Extent start alignment is currently determined by stripe configuration and is
> + * carried in args->alignment, whilst extent length adjustment is determined by
> + * extent size hints and is carried by args->prod and args->mod.
> + *
> + * Low level allocation code is free to either ignore or override these values
> + * as required.
> + */
> +static void
>  xfs_bmap_compute_alignments(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_extlen_t		align = 0; /* minimum allocation alignment */
> -	int			stripe_align = 0;
>  
>  	/* stripe alignment for allocation is determined by mount parameters */
>  	if (mp->m_swidth && xfs_has_swalloc(mp))
> -		stripe_align = mp->m_swidth;
> +		args->alignment = mp->m_swidth;
>  	else if (mp->m_dalign)
> -		stripe_align = mp->m_dalign;
> +		args->alignment = mp->m_dalign;
>  
>  	if (ap->flags & XFS_BMAPI_COWFORK)
>  		align = xfs_get_cowextsz_hint(ap->ip);
>  	else if (ap->datatype & XFS_ALLOC_USERDATA)
>  		align = xfs_get_extsz_hint(ap->ip);
> +
>  	if (align) {
>  		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
>  					ap->eof, 0, ap->conv, &ap->offset,
>  					&ap->length))
>  			ASSERT(0);
>  		ASSERT(ap->length);
> -	}
>  
> -	/* apply extent size hints if obtained earlier */
> -	if (align) {
>  		args->prod = align;
>  		div_u64_rem(ap->offset, args->prod, &args->mod);
>  		if (args->mod)
> @@ -3436,7 +3449,6 @@ xfs_bmap_compute_alignments(
>  			args->mod = args->prod - args->mod;
>  	}
>  
> -	return stripe_align;
>  }
>  
>  static void
> @@ -3508,7 +3520,7 @@ xfs_bmap_exact_minlen_extent_alloc(
>  	args.total = ap->total;
>  
>  	args.alignment = 1;
> -	args.minalignslop = 0;
> +	args.alignslop = 0;
>  
>  	args.minleft = ap->minleft;
>  	args.wasdel = ap->wasdel;
> @@ -3548,7 +3560,6 @@ xfs_bmap_btalloc_at_eof(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args,
>  	xfs_extlen_t		blen,
> -	int			stripe_align,
>  	bool			ag_only)
>  {
>  	struct xfs_mount	*mp = args->mp;
> @@ -3562,23 +3573,15 @@ xfs_bmap_btalloc_at_eof(
>  	 * allocation.
>  	 */
>  	if (ap->offset) {
> -		xfs_extlen_t	nextminlen = 0;
> +		xfs_extlen_t	alignment = args->alignment;
>  
>  		/*
> -		 * Compute the minlen+alignment for the next case.  Set slop so
> -		 * that the value of minlen+alignment+slop doesn't go up between
> -		 * the calls.
> +		 * Compute the alignment slop for the fallback path so we ensure
> +		 * we account for the potential alignemnt space required by the

Spelling:                                       alignment

> +		 * fallback paths before we modify the AGF and AGFL here.
>  		 */
>  		args->alignment = 1;
> -		if (blen > stripe_align && blen <= args->maxlen)
> -			nextminlen = blen - stripe_align;
> -		else
> -			nextminlen = args->minlen;
> -		if (nextminlen + stripe_align > args->minlen + 1)
> -			args->minalignslop = nextminlen + stripe_align -
> -					args->minlen - 1;
> -		else
> -			args->minalignslop = 0;
> +		args->alignslop = alignment - args->alignment;

I think this is easier to follow, assuming I'm not just completely
confused. :/

--D

>  
>  		if (!caller_pag)
>  			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> @@ -3596,19 +3599,8 @@ xfs_bmap_btalloc_at_eof(
>  		 * Exact allocation failed. Reset to try an aligned allocation
>  		 * according to the original allocation specification.
>  		 */
> -		args->alignment = stripe_align;
> -		args->minlen = nextminlen;
> -		args->minalignslop = 0;
> -	} else {
> -		/*
> -		 * Adjust minlen to try and preserve alignment if we
> -		 * can't guarantee an aligned maxlen extent.
> -		 */
> -		args->alignment = stripe_align;
> -		if (blen > args->alignment &&
> -		    blen <= args->maxlen + args->alignment)
> -			args->minlen = blen - args->alignment;
> -		args->minalignslop = 0;
> +		args->alignment = alignment;
> +		args->alignslop = 0;
>  	}
>  
>  	if (ag_only) {
> @@ -3626,9 +3618,8 @@ xfs_bmap_btalloc_at_eof(
>  		return 0;
>  
>  	/*
> -	 * Allocation failed, so turn return the allocation args to their
> -	 * original non-aligned state so the caller can proceed on allocation
> -	 * failure as if this function was never called.
> +	 * Aligned allocation failed, so all fallback paths from here drop the
> +	 * start alignment requirement as we know it will not succeed.
>  	 */
>  	args->alignment = 1;
>  	return 0;
> @@ -3636,7 +3627,9 @@ xfs_bmap_btalloc_at_eof(
>  
>  /*
>   * We have failed multiple allocation attempts so now are in a low space
> - * allocation situation. Try a locality first full filesystem minimum length
> + * allocation situation. We give up on any attempt at aligned allocation here.
> + *
> + * Try a locality first full filesystem minimum length
>   * allocation whilst still maintaining necessary total block reservation
>   * requirements.
>   *
> @@ -3653,6 +3646,7 @@ xfs_bmap_btalloc_low_space(
>  {
>  	int			error;
>  
> +	args->alignment = 1;
>  	if (args->minlen > ap->minlen) {
>  		args->minlen = ap->minlen;
>  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> @@ -3672,13 +3666,11 @@ xfs_bmap_btalloc_low_space(
>  static int
>  xfs_bmap_btalloc_filestreams(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	int			stripe_align)
> +	struct xfs_alloc_arg	*args)
>  {
>  	xfs_extlen_t		blen = 0;
>  	int			error = 0;
>  
> -
>  	error = xfs_filestream_select_ag(ap, args, &blen);
>  	if (error)
>  		return error;
> @@ -3697,8 +3689,7 @@ xfs_bmap_btalloc_filestreams(
>  
>  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
>  	if (ap->aeof)
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
> -				true);
> +		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
>  
>  	if (!error && args->fsbno == NULLFSBLOCK)
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> @@ -3722,8 +3713,7 @@ xfs_bmap_btalloc_filestreams(
>  static int
>  xfs_bmap_btalloc_best_length(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	int			stripe_align)
> +	struct xfs_alloc_arg	*args)
>  {
>  	xfs_extlen_t		blen = 0;
>  	int			error;
> @@ -3747,8 +3737,7 @@ xfs_bmap_btalloc_best_length(
>  	 * trying.
>  	 */
>  	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
> -				false);
> +		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
>  		if (error || args->fsbno != NULLFSBLOCK)
>  			return error;
>  	}
> @@ -3775,27 +3764,26 @@ xfs_bmap_btalloc(
>  		.resv		= XFS_AG_RESV_NONE,
>  		.datatype	= ap->datatype,
>  		.alignment	= 1,
> -		.minalignslop	= 0,
> +		.alignslop	= 0,
>  	};
>  	xfs_fileoff_t		orig_offset;
>  	xfs_extlen_t		orig_length;
>  	int			error;
> -	int			stripe_align;
>  
>  	ASSERT(ap->length);
>  	orig_offset = ap->offset;
>  	orig_length = ap->length;
>  
> -	stripe_align = xfs_bmap_compute_alignments(ap, &args);
> +	xfs_bmap_compute_alignments(ap, &args);
>  
>  	/* Trim the allocation back to the maximum an AG can fit. */
>  	args.maxlen = min(ap->length, mp->m_ag_max_usable);
>  
>  	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
>  	    xfs_inode_is_filestream(ap->ip))
> -		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
> +		error = xfs_bmap_btalloc_filestreams(ap, &args);
>  	else
> -		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
> +		error = xfs_bmap_btalloc_best_length(ap, &args);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 14c81f227c5b..9f71a9a3a65e 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -758,12 +758,12 @@ xfs_ialloc_ag_alloc(
>  		 *
>  		 * For an exact allocation, alignment must be 1,
>  		 * however we need to take cluster alignment into account when
> -		 * fixing up the freelist. Use the minalignslop field to
> -		 * indicate that extra blocks might be required for alignment,
> -		 * but not to use them in the actual exact allocation.
> +		 * fixing up the freelist. Use the alignslop field to indicate
> +		 * that extra blocks might be required for alignment, but not
> +		 * to use them in the actual exact allocation.
>  		 */
>  		args.alignment = 1;
> -		args.minalignslop = igeo->cluster_align - 1;
> +		args.alignslop = igeo->cluster_align - 1;
>  
>  		/* Allow space for the inode btree to split. */
>  		args.minleft = igeo->inobt_maxlevels;
> @@ -783,7 +783,7 @@ xfs_ialloc_ag_alloc(
>  		 * on, so reset minalignslop to ensure it is not included in
>  		 * subsequent requests.
>  		 */
> -		args.minalignslop = 0;
> +		args.alignslop = 0;
>  	}
>  
>  	if (unlikely(args.fsbno == NULLFSBLOCK)) {
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 25ff6fe1eb6c..0b2a2a1379bd 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1808,7 +1808,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		__field(xfs_extlen_t, minleft)
>  		__field(xfs_extlen_t, total)
>  		__field(xfs_extlen_t, alignment)
> -		__field(xfs_extlen_t, minalignslop)
> +		__field(xfs_extlen_t, alignslop)
>  		__field(xfs_extlen_t, len)
>  		__field(char, wasdel)
>  		__field(char, wasfromfl)
> @@ -1827,7 +1827,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		__entry->minleft = args->minleft;
>  		__entry->total = args->total;
>  		__entry->alignment = args->alignment;
> -		__entry->minalignslop = args->minalignslop;
> +		__entry->alignslop = args->alignslop;
>  		__entry->len = args->len;
>  		__entry->wasdel = args->wasdel;
>  		__entry->wasfromfl = args->wasfromfl;
> @@ -1836,7 +1836,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		__entry->highest_agno = args->tp->t_highest_agno;
>  	),
>  	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
> -		  "prod %u minleft %u total %u alignment %u minalignslop %u "
> +		  "prod %u minleft %u total %u alignment %u alignslop %u "
>  		  "len %u wasdel %d wasfromfl %d resv %d "
>  		  "datatype 0x%x highest_agno 0x%x",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> @@ -1849,7 +1849,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		  __entry->minleft,
>  		  __entry->total,
>  		  __entry->alignment,
> -		  __entry->minalignslop,
> +		  __entry->alignslop,
>  		  __entry->len,
>  		  __entry->wasdel,
>  		  __entry->wasfromfl,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 04/13] xfs: make EOF allocation simpler
  2024-06-21 10:05 ` [PATCH 04/13] xfs: make EOF allocation simpler John Garry
@ 2024-06-21 20:35   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 20:35 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:31AM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently the allocation at EOF is broken into two cases - when the
> offset is zero and when the offset is non-zero. When the offset is
> non-zero, we try to do exact block allocation for contiguous
> extent allocation. When the offset is zero, the allocation is simply
> an aligned allocation.
> 
> We want aligned allocation as the fallback when exact block
> allocation fails, but that complicates the EOF allocation in that it
> now has to handle two different allocation cases. The
> caller also has to handle allocation when not at EOF, and for the
> upcoming forced alignment changes we need that to also be aligned
> allocation.
> 
> To simplify all this, pull the aligned allocation cases back into
> the callers and leave the EOF allocation path for exact block
> allocation only. This means that the EOF exact block allocation
> fallback path is the normal aligned allocation path and that ends up
> making things a lot simpler when forced alignment is introduced.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c   | 129 +++++++++++++++----------------------
>  fs/xfs/libxfs/xfs_ialloc.c |   2 +-
>  2 files changed, 54 insertions(+), 77 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7f8c8e4dd244..528e3cd81ee6 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3310,12 +3310,12 @@ xfs_bmap_select_minlen(
>  static int
>  xfs_bmap_btalloc_select_lengths(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		*blen)
> +	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*pag;
>  	xfs_agnumber_t		agno, startag;
> +	xfs_extlen_t		blen = 0;
>  	int			error = 0;
>  
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> @@ -3329,19 +3329,18 @@ xfs_bmap_btalloc_select_lengths(
>  	if (startag == NULLAGNUMBER)
>  		startag = 0;
>  
> -	*blen = 0;
>  	for_each_perag_wrap(mp, startag, agno, pag) {
> -		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
> +		error = xfs_bmap_longest_free_extent(pag, args->tp, &blen);
>  		if (error && error != -EAGAIN)
>  			break;
>  		error = 0;
> -		if (*blen >= args->maxlen)
> +		if (blen >= args->maxlen)
>  			break;
>  	}
>  	if (pag)
>  		xfs_perag_rele(pag);
>  
> -	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
> +	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
>  	return error;
>  }
>  
> @@ -3551,78 +3550,40 @@ xfs_bmap_exact_minlen_extent_alloc(
>   * If we are not low on available data blocks and we are allocating at
>   * EOF, optimise allocation for contiguous file extension and/or stripe
>   * alignment of the new extent.
> - *
> - * NOTE: ap->aeof is only set if the allocation length is >= the
> - * stripe unit and the allocation offset is at the end of file.
>   */
>  static int
>  xfs_bmap_btalloc_at_eof(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		blen,
> -	bool			ag_only)
> +	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*caller_pag = args->pag;
> +	xfs_extlen_t		alignment = args->alignment;
>  	int			error;
>  
> +	ASSERT(ap->aeof && ap->offset);
> +	ASSERT(args->alignment >= 1);
> +
>  	/*
> -	 * If there are already extents in the file, try an exact EOF block
> -	 * allocation to extend the file as a contiguous extent. If that fails,
> -	 * or it's the first allocation in a file, just try for a stripe aligned
> -	 * allocation.
> +	 * Compute the alignment slop for the fallback path so we ensure
> +	 * we account for the potential alignemnt space required by the
> +	 * fallback paths before we modify the AGF and AGFL here.
>  	 */
> -	if (ap->offset) {
> -		xfs_extlen_t	alignment = args->alignment;
> -
> -		/*
> -		 * Compute the alignment slop for the fallback path so we ensure
> -		 * we account for the potential alignemnt space required by the
> -		 * fallback paths before we modify the AGF and AGFL here.
> -		 */
> -		args->alignment = 1;
> -		args->alignslop = alignment - args->alignment;
> -
> -		if (!caller_pag)
> -			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> -		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> -		if (!caller_pag) {
> -			xfs_perag_put(args->pag);
> -			args->pag = NULL;
> -		}
> -		if (error)
> -			return error;
> -
> -		if (args->fsbno != NULLFSBLOCK)
> -			return 0;
> -		/*
> -		 * Exact allocation failed. Reset to try an aligned allocation
> -		 * according to the original allocation specification.
> -		 */
> -		args->alignment = alignment;
> -		args->alignslop = 0;
> -	}
> +	args->alignment = 1;
> +	args->alignslop = alignment - args->alignment;
>  
> -	if (ag_only) {
> -		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> -	} else {
> +	if (!caller_pag)
> +		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> +	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> +	if (!caller_pag) {
> +		xfs_perag_put(args->pag);
>  		args->pag = NULL;
> -		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> -		ASSERT(args->pag == NULL);
> -		args->pag = caller_pag;
>  	}
> -	if (error)
> -		return error;
>  
> -	if (args->fsbno != NULLFSBLOCK)
> -		return 0;
> -
> -	/*
> -	 * Aligned allocation failed, so all fallback paths from here drop the
> -	 * start alignment requirement as we know it will not succeed.
> -	 */
> -	args->alignment = 1;
> -	return 0;
> +	/* Reset alignment to original specifications.  */
> +	args->alignment = alignment;
> +	args->alignslop = 0;
> +	return error;
>  }
>  
>  /*
> @@ -3688,12 +3649,19 @@ xfs_bmap_btalloc_filestreams(
>  	}
>  
>  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> -	if (ap->aeof)
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
> +	if (ap->aeof && ap->offset)
> +		error = xfs_bmap_btalloc_at_eof(ap, args);
>  
> +	/* This may be an aligned allocation attempt. */
>  	if (!error && args->fsbno == NULLFSBLOCK)
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
>  
> +	/* Attempt non-aligned allocation if we haven't already. */
> +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		args->alignment = 1;

Do we have to zero alignslop here?

Kicking the calls to the near/startag allocators out of
xfs_bmap_btalloc_at_eof makes it a lot easier to follow.  I appreciate
that.

--D

> +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> +	}
> +
>  out_low_space:
>  	/*
>  	 * We are now done with the perag reference for the filestreams
> @@ -3715,7 +3683,6 @@ xfs_bmap_btalloc_best_length(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
> -	xfs_extlen_t		blen = 0;
>  	int			error;
>  
>  	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
> @@ -3726,23 +3693,33 @@ xfs_bmap_btalloc_best_length(
>  	 * the request.  If one isn't found, then adjust the minimum allocation
>  	 * size to the largest space found.
>  	 */
> -	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
> +	error = xfs_bmap_btalloc_select_lengths(ap, args);
>  	if (error)
>  		return error;
>  
>  	/*
> -	 * Don't attempt optimal EOF allocation if previous allocations barely
> -	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
> -	 * optimal or even aligned allocations in this case, so don't waste time
> -	 * trying.
> +	 * If we are in low space mode, then optimal allocation will fail so
> +	 * prepare for minimal allocation and run the low space algorithm
> +	 * immediately.
>  	 */
> -	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
> -		if (error || args->fsbno != NULLFSBLOCK)
> -			return error;
> +	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +		ASSERT(args->fsbno == NULLFSBLOCK);
> +		return xfs_bmap_btalloc_low_space(ap, args);
> +	}
> +
> +	if (ap->aeof && ap->offset)
> +		error = xfs_bmap_btalloc_at_eof(ap, args);
> +
> +	/* This may be an aligned allocation attempt. */
> +	if (!error && args->fsbno == NULLFSBLOCK)
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> +
> +	/* Attempt non-aligned allocation if we haven't already. */
> +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		args->alignment = 1;

And here?

> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	}
>  
> -	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error || args->fsbno != NULLFSBLOCK)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 9f71a9a3a65e..40a2daeea712 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -780,7 +780,7 @@ xfs_ialloc_ag_alloc(
>  		 * the exact agbno requirement and increase the alignment
>  		 * instead. It is critical that the total size of the request
>  		 * (len + alignment + slop) does not increase from this point
> -		 * on, so reset minalignslop to ensure it is not included in
> +		 * on, so reset alignslop to ensure it is not included in

This belongs in the previous patch, yes?

--D

>  		 * subsequent requests.
>  		 */
>  		args.alignslop = 0;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/13] xfs: introduce forced allocation alignment
  2024-06-21 10:05 ` [PATCH 05/13] xfs: introduce forced allocation alignment John Garry
@ 2024-06-21 20:37   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 20:37 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:32AM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When forced allocation alignment is specified, the extent will
> be aligned to the extent size hint size rather than stripe
> alignment. If aligned allocation cannot be done, then the allocation
> is failed rather than attempting non-aligned fallbacks.
> 
> Note: none of the per-inode force align configuration is present
> yet, so this just triggers off an "always false" wrapper function
> for the moment.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Pretty straightfoward!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.h |  1 +
>  fs/xfs/libxfs/xfs_bmap.c  | 29 +++++++++++++++++++++++------
>  fs/xfs/xfs_inode.h        |  5 +++++
>  3 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index aa2c103d98f0..7de2e6f64882 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -66,6 +66,7 @@ typedef struct xfs_alloc_arg {
>  #define XFS_ALLOC_USERDATA		(1 << 0)/* allocation is for user data*/
>  #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
>  #define XFS_ALLOC_NOBUSY		(1 << 2)/* Busy extents not allowed */
> +#define XFS_ALLOC_FORCEALIGN		(1 << 3)/* forced extent alignment */
>  
>  /* freespace limit calculations */
>  unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 528e3cd81ee6..9131ba8113a6 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3401,9 +3401,10 @@ xfs_bmap_alloc_account(
>   * Calculate the extent start alignment and the extent length adjustments that
>   * constrain this allocation.
>   *
> - * Extent start alignment is currently determined by stripe configuration and is
> - * carried in args->alignment, whilst extent length adjustment is determined by
> - * extent size hints and is carried by args->prod and args->mod.
> + * Extent start alignment is currently determined by forced inode alignment or
> + * stripe configuration and is carried in args->alignment, whilst extent length
> + * adjustment is determined by extent size hints and is carried by args->prod
> + * and args->mod.
>   *
>   * Low level allocation code is free to either ignore or override these values
>   * as required.
> @@ -3416,11 +3417,18 @@ xfs_bmap_compute_alignments(
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_extlen_t		align = 0; /* minimum allocation alignment */
>  
> -	/* stripe alignment for allocation is determined by mount parameters */
> -	if (mp->m_swidth && xfs_has_swalloc(mp))
> +	/*
> +	 * Forced inode alignment takes preference over stripe alignment.
> +	 * Stripe alignment for allocation is determined by mount parameters.
> +	 */
> +	if (xfs_inode_has_forcealign(ap->ip)) {
> +		args->alignment = xfs_get_extsz_hint(ap->ip);
> +		args->datatype |= XFS_ALLOC_FORCEALIGN;
> +	} else if (mp->m_swidth && xfs_has_swalloc(mp)) {
>  		args->alignment = mp->m_swidth;
> -	else if (mp->m_dalign)
> +	} else if (mp->m_dalign) {
>  		args->alignment = mp->m_dalign;
> +	}
>  
>  	if (ap->flags & XFS_BMAPI_COWFORK)
>  		align = xfs_get_cowextsz_hint(ap->ip);
> @@ -3607,6 +3615,11 @@ xfs_bmap_btalloc_low_space(
>  {
>  	int			error;
>  
> +	if (args->alignment > 1 && (args->datatype & XFS_ALLOC_FORCEALIGN)) {
> +		args->fsbno = NULLFSBLOCK;
> +		return 0;
> +	}
> +
>  	args->alignment = 1;
>  	if (args->minlen > ap->minlen) {
>  		args->minlen = ap->minlen;
> @@ -3658,6 +3671,8 @@ xfs_bmap_btalloc_filestreams(
>  
>  	/* Attempt non-aligned allocation if we haven't already. */
>  	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		if (args->datatype & XFS_ALLOC_FORCEALIGN)
> +			return error;
>  		args->alignment = 1;
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
>  	}
> @@ -3716,6 +3731,8 @@ xfs_bmap_btalloc_best_length(
>  
>  	/* Attempt non-aligned allocation if we haven't already. */
>  	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		if (args->datatype & XFS_ALLOC_FORCEALIGN)
> +			return error;
>  		args->alignment = 1;
>  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 292b90b5f2ac..42f999c1106c 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -311,6 +311,11 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>  	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
>  }
>  
> +static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
> +{
> +	return false;
> +}
> +
>  /*
>   * Decide if this file is a realtime file whose data allocation unit is larger
>   * than a single filesystem block.
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/13] xfs: align args->minlen for forced allocation alignment
  2024-06-21 10:05 ` [PATCH 06/13] xfs: align args->minlen for " John Garry
@ 2024-06-21 20:38   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2024-06-21 20:38 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Fri, Jun 21, 2024 at 10:05:33AM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If args->minlen is not aligned to the constraints of forced
> alignment, we may do minlen allocations that are not aligned when we
> approach ENOSPC. Avoid this by always aligning args->minlen
> appropriately. If alignment of minlen results in a value smaller
> than the alignment constraint, fail the allocation immediately.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Looks good!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 45 +++++++++++++++++++++++++++-------------
>  1 file changed, 31 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 9131ba8113a6..c9cf138e13c4 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3278,33 +3278,48 @@ xfs_bmap_longest_free_extent(
>  	return 0;
>  }
>  
> -static xfs_extlen_t
> +static int
>  xfs_bmap_select_minlen(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args,
>  	xfs_extlen_t		blen)
>  {
> -
>  	/* Adjust best length for extent start alignment. */
>  	if (blen > args->alignment)
>  		blen -= args->alignment;
>  
>  	/*
>  	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
> -	 * possible that there is enough contiguous free space for this request.
> +	 * possible that there is enough contiguous free space for this request
> +	 * even if best length is less that the minimum length we need.
> +	 *
> +	 * If the best length won't satisfy the maximum length we requested,
> +	 * then use it as the minimum length so we get as large an allocation
> +	 * as possible.
>  	 */
>  	if (blen < ap->minlen)
> -		return ap->minlen;
> +		blen = ap->minlen;
> +	else if (blen > args->maxlen)
> +		blen = args->maxlen;
>  
>  	/*
> -	 * If the best seen length is less than the request length,
> -	 * use the best as the minimum, otherwise we've got the maxlen we
> -	 * were asked for.
> +	 * If we have alignment constraints, round the minlen down to match the
> +	 * constraint so that alignment will be attempted. This may reduce the
> +	 * allocation to smaller than was requested, so clamp the minimum to
> +	 * ap->minlen to allow unaligned allocation to succeed. If we are forced
> +	 * to align the allocation, return ENOSPC at this point because we don't
> +	 * have enough contiguous free space to guarantee aligned allocation.
>  	 */
> -	if (blen < args->maxlen)
> -		return blen;
> -	return args->maxlen;
> -
> +	if (args->alignment > 1) {
> +		blen = rounddown(blen, args->alignment);
> +		if (blen < ap->minlen) {
> +			if (args->datatype & XFS_ALLOC_FORCEALIGN)
> +				return -ENOSPC;
> +			blen = ap->minlen;
> +		}
> +	}
> +	args->minlen = blen;
> +	return 0;
>  }
>  
>  static int
> @@ -3340,8 +3355,7 @@ xfs_bmap_btalloc_select_lengths(
>  	if (pag)
>  		xfs_perag_rele(pag);
>  
> -	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> -	return error;
> +	return xfs_bmap_select_minlen(ap, args, blen);
>  }
>  
>  /* Update all inode and quota accounting for the allocation we just did. */
> @@ -3661,7 +3675,10 @@ xfs_bmap_btalloc_filestreams(
>  		goto out_low_space;
>  	}
>  
> -	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> +	error = xfs_bmap_select_minlen(ap, args, blen);
> +	if (error)
> +		goto out_low_space;
> +
>  	if (ap->aeof && ap->offset)
>  		error = xfs_bmap_btalloc_at_eof(ap, args);
>  
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() for forcealign
  2024-06-21 18:38   ` Darrick J. Wong
@ 2024-06-24  7:34     ` John Garry
  0 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2024-06-24  7:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 21/06/2024 19:38, Darrick J. Wong wrote:
> On Fri, Jun 21, 2024 at 10:05:36AM +0000, John Garry wrote:
>> For forcealign enabled, the allocation unit size is in ip->i_extsize, and
>> this must never be zero.
>>
>> Signed-off-by: John Garry<john.g.garry@oracle.com>
>> ---
>>   fs/xfs/xfs_inode.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 994fb7e184d9..cd07b15478ac 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -4299,6 +4299,8 @@ xfs_inode_alloc_unitsize(
>>   {
>>   	unsigned int		blocks = 1;
>>   
>> +	if (xfs_inode_has_forcealign(ip))
>> +		return ip->i_extsize;
> i_extsize is in units of fsblocks, whereas this function returns bytes.
> I think you need XFS_FSB_TO_B here?

Right, that was a copy-and-paste error from the previous series.

Thanks,
John

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag
  2024-06-21 19:07   ` Darrick J. Wong
@ 2024-06-24 14:36     ` John Garry
  0 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2024-06-24 14:36 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 21/06/2024 20:07, Darrick J. Wong wrote:
> On Fri, Jun 21, 2024 at 10:05:34AM +0000, John Garry wrote:
>> From: "Darrick J. Wong" <djwong@kernel.org>
>>
>> Add a new inode flag to require that all file data extent mappings must
>> be aligned (both the file offset range and the allocated space itself)
>> to the extent size hint.  Having a separate COW extent size hint is no
>> longer allowed.
> 
> It might be worth mentioning that for non-rt files, the allocated space
> will be aligned to the start of an AG, not necessary the block device,
> though the upcoming atomicwrites inode flag will also require that.

ok

> 
> Also this should clarify what happens for rt files -- do we allow
> forcealign realtime files? 

We would, but not yet. So how to handle in the kernel now?

Dave said that forcealign for RT inode is a viable on-disk format now. 
So, since we won't support forcealign for RT initailly, should we make 
the inode read-only?

> Or only for the sb_rextsize == 1 case?
> Or for any sbrextsize?

I think that any sb_rextsize is ok with forcealign, as long as 
forcealign extsize % sb_rextsize == 0. And, to support atomic writes, we 
need power-of-2 extsize.

> 
>> The goal here is to enable sysadmins and users to mandate that all space
>> mappings in a file must have a startoff/blockcount that are aligned to
>> (say) a 2MB alignment and that the startblock/blockcount will follow the
>> same alignment.
>>
>> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
>> Co-developed-by: John Garry <john.g.garry@oracle.com>

note: I really should mention that I have made big changes since you 
touched this patch

>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_format.h    |  6 +++-
>>   fs/xfs/libxfs/xfs_inode_buf.c | 53 +++++++++++++++++++++++++++++++++++
>>   fs/xfs/libxfs/xfs_inode_buf.h |  3 ++
>>   fs/xfs/libxfs/xfs_sb.c        |  2 ++
>>   fs/xfs/xfs_inode.c            | 13 +++++++++
>>   fs/xfs/xfs_inode.h            | 20 ++++++++++++-
>>   fs/xfs/xfs_ioctl.c            | 47 +++++++++++++++++++++++++++++--
>>   fs/xfs/xfs_mount.h            |  2 ++
>>   fs/xfs/xfs_reflink.h          | 10 -------
>>   fs/xfs/xfs_super.c            |  4 +++
>>   include/uapi/linux/fs.h       |  2 ++
>>   11 files changed, 148 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 61f51becff4f..b48cd75d34a6 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -353,6 +353,7 @@ xfs_sb_has_compat_feature(
>>   #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>>   #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>>   #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
>> +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
>>   #define XFS_SB_FEAT_RO_COMPAT_ALL \
>>   		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>>   		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
>> @@ -1094,16 +1095,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
>>   #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
>>   #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
>>   #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
>> +/* data extent mappings for regular files must be aligned to extent size hint */
>> +#define XFS_DIFLAG2_FORCEALIGN_BIT 5
>>   
>>   #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
>>   #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
>>   #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
>>   #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
>>   #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
>> +#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
>>   
>>   #define XFS_DIFLAG2_ANY \
>>   	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
>> -	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
>> +	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
>>   
>>   static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
>>   {
>> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
>> index e7a7bfbe75b4..b2c5f466c1a9 100644
>> --- a/fs/xfs/libxfs/xfs_inode_buf.c
>> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
>> @@ -644,6 +644,15 @@ xfs_dinode_verify(
>>   	    !xfs_has_bigtime(mp))
>>   		return __this_address;
>>   
>> +	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
>> +		fa = xfs_inode_validate_forcealign(mp,
>> +			be32_to_cpu(dip->di_extsize),
>> +			be32_to_cpu(dip->di_cowextsize),
>> +			mode, flags, flags2);
> 
> Needs another level of indent.

ok

> 
> 		fa = xfs_inode_validate_forcealign(mp,
> 				be32_to_cpu(dip->di_extsize),
> 				be32_to_cpu(dip->di_cowextsize),
> 				mode, flags, flags2);
> 
>> +		if (fa)
>> +			return fa;
>> +	}
>> +
>>   	return NULL;
>>   }
>>   
>> @@ -811,3 +820,47 @@ xfs_inode_validate_cowextsize(
>>   
>>   	return NULL;
>>   }
>> +
>> +/* Validate the forcealign inode flag */
>> +xfs_failaddr_t
>> +xfs_inode_validate_forcealign(
>> +	struct xfs_mount	*mp,
>> +	uint32_t		extsize,
>> +	uint32_t		cowextsize,
>> +	uint16_t		mode,
>> +	uint16_t		flags,
>> +	uint64_t		flags2)
>> +{
>> +	/* superblock rocompat feature flag */
>> +	if (!xfs_has_forcealign(mp))
>> +		return __this_address;
>> +
>> +	/* Only regular files and directories */
>> +	if (!S_ISDIR(mode) && !S_ISREG(mode))
>> +		return __this_address;
>> +
>> +	/* We require EXTSIZE or EXTSZINHERIT */
>> +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
>> +		return __this_address;
>> +
>> +	/* We require a non-zero extsize */
>> +	if (!extsize)
>> +		return __this_address;
>> +
>> +	/* Reflink'ed disallowed */
>> +	if (flags2 & XFS_DIFLAG2_REFLINK)
>> +		return __this_address;
> 
> This is still a significant limitation to end up encoding in the ondisk
> format.  Given that we may some day fix that limitation by teaching
> pagecache writeback how to do writeback on entire allocation units,
> I think for now you should refuse to mount any filesystem with reflink
> and forcealign enabled at the same time.

That seems pretty drastic.

> 
>> +
>> +	/* COW extsize disallowed */
>> +	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
>> +		return __this_address;
>> +
>> +	if (cowextsize)
>> +		return __this_address;
>> +
>> +	/* A RT device with sb_rextsize=1 could make use of forcealign */
>> +	if (flags & XFS_DIFLAG_REALTIME && mp->m_sb.sb_rextsize != 1)
>> +		return __this_address;
> 
> Ok, so forcealign and bigrtalloc are not compatible?  I would have
> thought they would be ok together since the rest of your code changes
> short circuit into the forcealign case before the bigrtalloc case.
> All you need here is to validate that i_extsize % sb_rextsize == 0.

See what Dave wrote at 
https://lore.kernel.org/linux-xfs/20240429174746.2132161-1-john.g.garry@oracle.com/T/#m808100f5699c3068bb5d5939297ff54ce3a3081f 
that begins with "A rt device with an extsize of 1 fsb could make use of
forced alignment just like the data device to allow larger atomic
writes to be done.". How do you read that exactly?

> 
>> +
>> +	return NULL;
> 
> You don't check that i_extsize % sb_agblocks == 0 for non-rt files
> because that is only required for atomic writes + forcealign, right?

right

> 
>> +}
>> diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
>> index 585ed5a110af..b8b65287b037 100644
>> --- a/fs/xfs/libxfs/xfs_inode_buf.h
>> +++ b/fs/xfs/libxfs/xfs_inode_buf.h
>> @@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
>>   xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
>>   		uint32_t cowextsize, uint16_t mode, uint16_t flags,
>>   		uint64_t flags2);
>> +xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
>> +		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
>> +		uint16_t flags, uint64_t flags2);
>>   
>>   static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
>>   {
>> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
>> index 6b56f0f6d4c1..e56911553edd 100644
>> --- a/fs/xfs/libxfs/xfs_sb.c
>> +++ b/fs/xfs/libxfs/xfs_sb.c
>> @@ -164,6 +164,8 @@ xfs_sb_version_to_features(
>>   		features |= XFS_FEAT_REFLINK;
>>   	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
>>   		features |= XFS_FEAT_INOBTCNT;
>> +	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
>> +		features |= XFS_FEAT_FORCEALIGN;
>>   	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
>>   		features |= XFS_FEAT_FTYPE;
>>   	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index f36091e1e7f5..994fb7e184d9 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -608,6 +608,8 @@ xfs_ip2xflags(
>>   			flags |= FS_XFLAG_DAX;
>>   		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
>>   			flags |= FS_XFLAG_COWEXTSIZE;
>> +		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
>> +			flags |= FS_XFLAG_FORCEALIGN;
>>   	}
>>   
>>   	if (xfs_inode_has_attr_fork(ip))
>> @@ -737,6 +739,8 @@ xfs_inode_inherit_flags2(
>>   	}
>>   	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
>>   		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
>> +	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
>> +		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
>>   
>>   	/* Don't let invalid cowextsize hints propagate. */
>>   	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
>> @@ -745,6 +749,15 @@ xfs_inode_inherit_flags2(
>>   		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
>>   		ip->i_cowextsize = 0;
>>   	}
>> +
>> +	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
>> +		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
>> +				ip->i_extsize, ip->i_cowextsize,
>> +				VFS_I(ip)->i_mode, ip->i_diflags,
>> +				ip->i_diflags2);
>> +		if (failaddr)
>> +			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
>> +	}
>>   }
>>   
>>   /*
>> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
>> index 42f999c1106c..536e646dd055 100644
>> --- a/fs/xfs/xfs_inode.h
>> +++ b/fs/xfs/xfs_inode.h
>> @@ -301,6 +301,16 @@ static inline bool xfs_inode_has_cow_data(struct xfs_inode *ip)
>>   	return ip->i_cowfp && ip->i_cowfp->if_bytes;
>>   }
>>   
>> +static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
>> +{
>> +	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
>> +}
>> +
>> +static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
>> +{
>> +	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
>> +}
>> +
>>   static inline bool xfs_inode_has_bigtime(struct xfs_inode *ip)
>>   {
>>   	return ip->i_diflags2 & XFS_DIFLAG2_BIGTIME;
>> @@ -313,7 +323,15 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>>   
>>   static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
>>   {
>> -	return false;
>> +	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
>> +		return false;
>> +	if (ip->i_extsize <= 1)
>> +		return false;
>> +	if (xfs_is_cow_inode(ip))
>> +		return false;
>> +	if (ip->i_diflags & XFS_DIFLAG_REALTIME)
>> +		return false;
>> +	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
> 
> In theory we already validated all of these fields when we loaded the
> inode, right?  In which case you only need to check diflags2.

It was suggested to have all these checks in one place to safeguard 
against some other functions not detecting invalid flags.

> 
>>   }
>>   
>>   /*
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index f0117188f302..5eff8fd9fa3e 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -525,10 +525,48 @@ xfs_flags2diflags2(
>>   		di_flags2 |= XFS_DIFLAG2_DAX;
>>   	if (xflags & FS_XFLAG_COWEXTSIZE)
>>   		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
>> +	if (xflags & FS_XFLAG_FORCEALIGN)
>> +		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
>>   
>>   	return di_flags2;
>>   }
>>   
>> +/*
>> + * Forcealign requires a non-zero extent size hint and a zero cow
>> + * extent size hint.  Don't allow set for RT files yet.
>> + */
>> +static int
>> +xfs_ioctl_setattr_forcealign(
>> +	struct xfs_inode	*ip,
>> +	struct fileattr		*fa)
>> +{
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +
>> +	if (!xfs_has_forcealign(mp))
>> +		return -EINVAL;
>> +
>> +	if (xfs_is_reflink_inode(ip))
>> +		return -EINVAL;
>> +
>> +	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
>> +				FS_XFLAG_EXTSZINHERIT)))
>> +		return -EINVAL;
>> +
>> +	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
>> +		return -EINVAL;
>> +
>> +	if (!fa->fsx_extsize)
>> +		return -EINVAL;
>> +
>> +	if (fa->fsx_cowextsize)
>> +		return -EINVAL;
>> +
>> +	if (fa->fsx_xflags & FS_XFLAG_REALTIME)
>> +		return -EINVAL;
> 
> The inode verifier allows realtime files so long as sb_rextsize is
> nonzero.

Yeah, again, I am not sure on this.

> 
>> +
>> +	return 0;
>> +}
>> +
>>   static int
>>   xfs_ioctl_setattr_xflags(
>>   	struct xfs_trans	*tp,
>> @@ -537,10 +575,12 @@ xfs_ioctl_setattr_xflags(
>>   {
>>   	struct xfs_mount	*mp = ip->i_mount;
>>   	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
>> +	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
>>   	uint64_t		i_flags2;
>>   
>> -	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
>> -		/* Can't change realtime flag if any extents are allocated. */
>> +	/* Can't change RT or forcealign flags if any extents are allocated. */
>> +	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
>> +	    forcealign != xfs_inode_has_forcealign(ip)) {
>>   		if (ip->i_df.if_nextents || ip->i_delayed_blks)
>>   			return -EINVAL;
>>   	}
>> @@ -561,6 +601,9 @@ xfs_ioctl_setattr_xflags(
>>   	if (i_flags2 && !xfs_has_v3inodes(mp))
>>   		return -EINVAL;
>>   
>> +	if (forcealign && (xfs_ioctl_setattr_forcealign(ip, fa) < 0))
>> +		return -EINVAL;
> 
> Either make xfs_ioctl_setattr_forcealign return a boolean, or extract
> the error code and return that; don't just squash
> xfs_ioctl_setattr_forcealign's negative errno into -EINVAL.

ok, fine

> 
>> +
>>   	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
>>   	ip->i_diflags2 = i_flags2;
>>   
>> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
>> index d0567dfbc036..30228fea908d 100644
>> --- a/fs/xfs/xfs_mount.h
>> +++ b/fs/xfs/xfs_mount.h
>> @@ -299,6 +299,7 @@ typedef struct xfs_mount {
>>   #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
>>   #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
>>   #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
>> +#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
>>   
>>   /* Mount features */
>>   #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
>> @@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
>>   __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
>>   __XFS_HAS_V4_FEAT(crc, CRC)
>>   __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
>> +__XFS_HAS_FEAT(forcealign, FORCEALIGN)
>>   
>>   /*
>>    * Mount features
>> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
>> index 65c5dfe17ecf..fb55e4ce49fa 100644
>> --- a/fs/xfs/xfs_reflink.h
>> +++ b/fs/xfs/xfs_reflink.h
>> @@ -6,16 +6,6 @@
>>   #ifndef __XFS_REFLINK_H
>>   #define __XFS_REFLINK_H 1
>>   
>> -static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
>> -{
>> -	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
>> -}
>> -
>> -static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
>> -{
>> -	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
>> -}
>> -
>>   extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
>>   		struct xfs_bmbt_irec *irec, bool *shared);
>>   int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index 27e9f749c4c7..852bbfb21506 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -1721,6 +1721,10 @@ xfs_fs_fill_super(
>>   		mp->m_features &= ~XFS_FEAT_DISCARD;
>>   	}
>>   
>> +	if (xfs_has_forcealign(mp))
>> +		xfs_warn(mp,
>> +"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
> 
> Here would be the place to fail the mount if reflink is enabled.

Right


Thanks,
John

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 08/13] xfs: Do not free EOF blocks for forcealign
  2024-06-21 19:08   ` Darrick J. Wong
@ 2024-06-24 15:04     ` John Garry
  0 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2024-06-24 15:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 21/06/2024 20:08, Darrick J. Wong wrote:
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index e5d893f93522..56b80a7c0992 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -539,8 +539,13 @@ xfs_can_free_eofblocks(
>>   	 * forever.
>>   	 */
>>   	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
>> -	if (xfs_inode_has_bigrtalloc(ip))
>> +
>> +	/* Do not free blocks when forcing extent sizes */
> "Only try to free blocks beyond the allocation unit that crosses EOF" ?

ok, fine

> 
> Otherwise seems fine to me
> Reviewed-by: Darrick J. Wong<djwong@kernel.org>

cheers


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 10/13] xfs: Unmap blocks according to forcealign
  2024-06-21 19:12   ` Darrick J. Wong
@ 2024-06-24 15:12     ` John Garry
  0 siblings, 0 replies; 31+ messages in thread
From: John Garry @ 2024-06-24 15:12 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 21/06/2024 20:12, Darrick J. Wong wrote:
> On Fri, Jun 21, 2024 at 10:05:37AM +0000, John Garry wrote:
>> For when forcealign is enabled, blocks in an inode need to be unmapped
>> according to extent alignment, like what is already done for rtvol.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++++++++++++++++++++++++-----
>>   1 file changed, 33 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index c9cf138e13c4..ebeb2969b289 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -5380,6 +5380,25 @@ xfs_bmap_del_extent_real(
>>   	return 0;
>>   }
>>   
>> +static xfs_extlen_t
>> +xfs_bunmapi_align(
>> +	struct xfs_inode	*ip,
>> +	xfs_fsblock_t		bno)
>> +{
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +	xfs_agblock_t		agbno;
>> +
>> +	if (xfs_inode_has_forcealign(ip)) {
>> +		if (is_power_of_2(ip->i_extsize))
>> +			return bno & (ip->i_extsize - 1);
>> +
>> +		agbno = XFS_FSB_TO_AGBNO(mp, bno);
>> +		return do_div(agbno, ip->i_extsize);
> 
> Huh.  The inode verifier allows realtime forcealign files, but this code
> will not handle that properly.  Either don't allow realtime files, or
> make this handle them correctly:

ok, so XFS_FSB_TO_AGBNO() is not always suitable

> 
> 	if (XFS_IS_REALTIME_INODE(ip)) {
> 		if (xfs_inode_has_forcealign(ip))
> 			return offset_in_block(bno, ip->i_extsize);
> 		return xfs_rtb_to_rtxoff(ip->i_mount, bno);
> 	} else if (xfs_inode_has_forcealign(ip)) {
> 		xfs_agblock_t	agbno = XFS_FSB_TO_AGBNO(mp, bno);
> 
> 		return offset_in_block(agbno, ip->i_extsize);
> 	}
> 
> 	return 1; /* or assert, or whatever */
> 
>> +	}
>> +	ASSERT(XFS_IS_REALTIME_INODE(ip));
>> +	return xfs_rtb_to_rtxoff(ip->i_mount, bno);
>> +}
>> +
>>   /*
>>    * Unmap (remove) blocks from a file.
>>    * If nexts is nonzero then the number of extents to remove is limited to
>> @@ -5402,6 +5421,7 @@ __xfs_bunmapi(
>>   	struct xfs_bmbt_irec	got;		/* current extent record */
>>   	struct xfs_ifork	*ifp;		/* inode fork pointer */
>>   	int			isrt;		/* freeing in rt area */
>> +	int			isforcealign;	/* freeing for inode with forcealign */
>>   	int			logflags;	/* transaction logging flags */
>>   	xfs_extlen_t		mod;		/* rt extent offset */
>>   	struct xfs_mount	*mp = ip->i_mount;
>> @@ -5439,6 +5459,8 @@ __xfs_bunmapi(
>>   	}
>>   	XFS_STATS_INC(mp, xs_blk_unmap);
>>   	isrt = xfs_ifork_is_realtime(ip, whichfork);
>> +	isforcealign = (whichfork != XFS_ATTR_FORK) &&
>> +			xfs_inode_has_forcealign(ip);
>>   	end = start + len;
>>   
>>   	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
>> @@ -5490,11 +5512,10 @@ __xfs_bunmapi(
>>   		if (del.br_startoff + del.br_blockcount > end + 1)
>>   			del.br_blockcount = end + 1 - del.br_startoff;
>>   
>> -		if (!isrt || (flags & XFS_BMAPI_REMAP))
>> +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
>>   			goto delete;
>>   
>> -		mod = xfs_rtb_to_rtxoff(mp,
>> -				del.br_startblock + del.br_blockcount);
>> +		mod = xfs_bunmapi_align(ip, del.br_startblock + del.br_blockcount);
>>   		if (mod) {
>>   			/*
>>   			 * Realtime extent not lined up at the end.
> 
> "Not aligned to allocation unit on the end." ?

ok

> 
>> @@ -5542,9 +5563,16 @@ __xfs_bunmapi(
>>   			goto nodelete;
>>   		}
>>   
>> -		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
>> +		mod = xfs_bunmapi_align(ip, del.br_startblock);
>>   		if (mod) {
>> -			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
>> +			xfs_extlen_t off;
>> +
>> +			if (isforcealign) {
>> +				off = ip->i_extsize - mod;
>> +			} else {
>> +				ASSERT(isrt);
>> +				off = mp->m_sb.sb_rextsize - mod;
>> +			}
>>   
>>   			/*
>>   			 * Realtime extent is lined up at the end but not
> 
> Same here -- now this code is handling more than just rt extents.
> 

ok

Thanks!


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-06-24 15:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-21 10:05 [PATCH 00/13] forcealign for xfs John Garry
2024-06-21 10:05 ` [PATCH 01/13] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-06-21 19:42   ` Darrick J. Wong
2024-06-21 20:04     ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 02/13] xfs: always tail align maxlen allocations John Garry
2024-06-21 19:50   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 03/13] xfs: simplify extent allocation alignment John Garry
2024-06-21 20:29   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 04/13] xfs: make EOF allocation simpler John Garry
2024-06-21 20:35   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 05/13] xfs: introduce forced allocation alignment John Garry
2024-06-21 20:37   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 06/13] xfs: align args->minlen for " John Garry
2024-06-21 20:38   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 07/13] xfs: Introduce FORCEALIGN inode flag John Garry
2024-06-21 19:07   ` Darrick J. Wong
2024-06-24 14:36     ` John Garry
2024-06-21 10:05 ` [PATCH 08/13] xfs: Do not free EOF blocks for forcealign John Garry
2024-06-21 19:08   ` Darrick J. Wong
2024-06-24 15:04     ` John Garry
2024-06-21 10:05 ` [PATCH 09/13] xfs: Update xfs_inode_alloc_unitsize_fsb() " John Garry
2024-06-21 18:38   ` Darrick J. Wong
2024-06-24  7:34     ` John Garry
2024-06-21 10:05 ` [PATCH 10/13] xfs: Unmap blocks according to forcealign John Garry
2024-06-21 19:12   ` Darrick J. Wong
2024-06-24 15:12     ` John Garry
2024-06-21 10:05 ` [PATCH 11/13] xfs: Only free full extents for forcealign John Garry
2024-06-21 19:13   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 12/13] xfs: Don't revert allocated offset " John Garry
2024-06-21 19:13   ` Darrick J. Wong
2024-06-21 10:05 ` [PATCH 13/13] xfs: Enable file data forcealign feature John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).