[PATCH v3 00/14] forcealign for xfs

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 00/14] forcealign for xfs
@ 2024-08-01 16:30 John Garry
  2024-08-01 16:30 ` [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
                   ` (13 more replies)
  0 siblings, 14 replies; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

This series is being spun off the block atomic writes for xfs series at
[0].

That series got too big.

The actual forcealign patches are roughly the same in this series.

Why forcealign?
In some scenarios to may be required to guarantee extent alignment and
granularity.

For example, for atomic writes, the maximum atomic write unit size would
be limited at the extent alignment and granularity, guaranteeing that an
atomic write would not span data present in multiple extents.

forcealign may be useful as a performance tuning optimization in other
scenarios.

I decided not to support forcealign for RT devices here. Initially I
thought that it would be quite simple of implement. However, I recently
discovered through much testing and subsequent debug that this was not
true, so I decided to defer support to later.

Early development xfsprogs support is at:
https://github.com/johnpgarry/xfsprogs-dev/commits/atomic-writes/

Differences to v2:
- Add rounding to alloc unit helpers
- Update xfs_setattr_size()
- Disallow mount for forcealign and reflink
- Remove forcealign and RT/reflink inode checks
- Relocate setting of XFS_ALLOC_FORCEALIGN

Differences to v1:
- Add Darricks RB tags (thanks)
- Disallow mount for forcealign and RT
- Disallow cp --reflink from forcealign inode
- Comments improvements (Darrick)
- Coding style improvements (Darrick)
- Fix xfs_inode_alloc_unitsize() (Darrick)

Baseline:
v6.11-rc1

[0] https://lore.kernel.org/linux-xfs/20240607143919.2622319-1-john.g.garry@oracle.com/
[1] https://lore.kernel.org/linux-block/20240620125359.2684798-1-john.g.garry@oracle.com/

Darrick J. Wong (2):
  xfs: Introduce FORCEALIGN inode flag
  xfs: Enable file data forcealign feature

Dave Chinner (6):
  xfs: only allow minlen allocations when near ENOSPC
  xfs: always tail align maxlen allocations
  xfs: simplify extent allocation alignment
  xfs: make EOF allocation simpler
  xfs: introduce forced allocation alignment
  xfs: align args->minlen for forced allocation alignment

John Garry (6):
  xfs: Update xfs_inode_alloc_unitsize() for forcealign
  xfs: Update xfs_setattr_size() for forcealign
  xfs: Do not free EOF blocks for forcealign
  xfs: Only free full extents for forcealign
  xfs: Unmap blocks according to forcealign
  xfs: Don't revert allocated offset for forcealign

 fs/xfs/libxfs/xfs_alloc.c      |  33 ++--
 fs/xfs/libxfs/xfs_alloc.h      |   3 +-
 fs/xfs/libxfs/xfs_bmap.c       | 322 ++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_format.h     |   9 +-
 fs/xfs/libxfs/xfs_ialloc.c     |  12 +-
 fs/xfs/libxfs/xfs_inode_buf.c  |  46 +++++
 fs/xfs/libxfs/xfs_inode_buf.h  |   3 +
 fs/xfs/libxfs/xfs_inode_util.c |  14 ++
 fs/xfs/libxfs/xfs_sb.c         |   2 +
 fs/xfs/xfs_bmap_util.c         |  14 +-
 fs/xfs/xfs_inode.c             |  61 ++++++-
 fs/xfs/xfs_inode.h             |  18 ++
 fs/xfs/xfs_ioctl.c             |  46 ++++-
 fs/xfs/xfs_iops.c              |   4 +-
 fs/xfs/xfs_mount.h             |   2 +
 fs/xfs/xfs_reflink.c           |   5 +-
 fs/xfs/xfs_super.c             |  18 ++
 fs/xfs/xfs_trace.h             |   8 +-
 include/uapi/linux/fs.h        |   2 +
 19 files changed, 431 insertions(+), 191 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 18:51   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 02/14] xfs: always tail align maxlen allocations John Garry
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When we are near ENOSPC and don't have enough free
space for an args->maxlen allocation, xfs_alloc_space_available()
will trim args->maxlen to equal the available space. However, this
function has only checked that there is enough contiguous free space
for an aligned args->minlen allocation to succeed. Hence there is no
guarantee that an args->maxlen allocation will succeed, nor that the
available space will allow for correct alignment of an args->maxlen
allocation.

Further, by trimming args->maxlen arbitrarily, it breaks an
assumption made in xfs_alloc_fix_len() that if the caller wants
aligned allocation, then args->maxlen will be set to an aligned
value. It then skips the tail alignment and so we end up with
extents that aren't aligned to extent size hint boundaries as we
approach ENOSPC.

To avoid this problem, don't reduce args->maxlen by some random,
arbitrary amount. If args->maxlen is too large for the available
space, reduce the allocation to a minlen allocation as we know we
have contiguous free space available for this to succeed and always
be correctly aligned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 59326f84f6a5..d559d992c6ef 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2524,14 +2524,23 @@ xfs_alloc_space_available(
 	if (available < (int)max(args->total, alloc_len))
 		return false;

+	if (flags & XFS_ALLOC_FLAG_CHECK)
+		return true;
+
 	/*
-	 * Clamp maxlen to the amount of free space available for the actual
-	 * extent allocation.
+	 * If we can't do a maxlen allocation, then we must reduce the size of
+	 * the allocation to match the available free space. We know how big
+	 * the largest contiguous free space we can allocate is, so that's our
+	 * upper bound. However, we don't exaclty know what alignment/size
+	 * constraints have been placed on the allocation, so we can't
+	 * arbitrarily select some new max size. Hence make this a minlen
+	 * allocation as we know that will definitely succeed and match the
+	 * callers alignment constraints.
 	 */
-	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
-		args->maxlen = available;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	if (longest < alloc_len) {
+		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
-		ASSERT(args->maxlen >= args->minlen);
 	}

 	return true;
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 02/14] xfs: always tail align maxlen allocations
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
  2024-08-01 16:30 ` [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-13 15:01   ` John Garry
  2024-08-01 16:30 ` [PATCH v3 03/14] xfs: simplify extent allocation alignment John Garry
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When we do a large allocation, the core free space allocation code
assumes that args->maxlen is aligned to args->prod/args->mod. hence
if we get a maximum sized extent allocated, it does not do tail
alignment of the extent.

However, this assumes that nothing modifies args->maxlen between the
original allocation context setup and trimming the selected free
space extent to size. This assumption has recently been found to be
invalid - xfs_alloc_space_available() modifies args->maxlen in low
space situations - and there may be more situations we haven't yet
found like this.

Force aligned allocation introduces the requirement that extents are
correctly tail aligned, resulting in this occasional latent
alignment failure to be reclassified from an unimportant curiousity
to a must-fix bug.

Removing the assumption about args->maxlen allocations always being
tail aligned is trivial, and should not impact anything because
args->maxlen for inodes with extent size hints configured are
already aligned. Hence all this change does it avoid weird corner
cases that would have resulted in unaligned extent sizes by always
trimming the extent down to an aligned size.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> [provisional on v1 series comment]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index d559d992c6ef..bf08b9e9d9ac 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -433,20 +433,18 @@ xfs_alloc_compute_diff(
  * Fix up the length, based on mod and prod.
  * len should be k * prod + mod for some k.
  * If len is too small it is returned unchanged.
- * If len hits maxlen it is left alone.
  */
-STATIC void
+static void
 xfs_alloc_fix_len(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t	k;
-	xfs_extlen_t	rlen;
+	xfs_extlen_t		k;
+	xfs_extlen_t		rlen = args->len;

 	ASSERT(args->mod < args->prod);
-	rlen = args->len;
 	ASSERT(rlen >= args->minlen);
 	ASSERT(rlen <= args->maxlen);
-	if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
+	if (args->prod <= 1 || rlen < args->mod ||
 	    (args->mod == 0 && rlen < args->prod))
 		return;
 	k = rlen % args->prod;
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 03/14] xfs: simplify extent allocation alignment
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
  2024-08-01 16:30 ` [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
  2024-08-01 16:30 ` [PATCH v3 02/14] xfs: always tail align maxlen allocations John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 18:56   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 04/14] xfs: make EOF allocation simpler John Garry
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

We currently align extent allocation to stripe unit or stripe width.
That is specified by an external parameter to the allocation code,
which then manipulates the xfs_alloc_args alignment configuration in
interesting ways.

The args->alignment field specifies extent start alignment, but
because we may be attempting non-aligned allocation first there are
also slop variables that allow for those allocation attempts to
account for aligned allocation if they fail.

This gets much more complex as we introduce forced allocation
alignment, where extent size hints are used to generate the extent
start alignment. extent size hints currently only affect extent
lengths (via args->prod and args->mod) and so with this change we
will have two different start alignment conditions.

Avoid this complexity by always using args->alignment to indicate
extent start alignment, and always using args->prod/mod to indicate
extent length adjustment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |  4 +-
 fs/xfs/libxfs/xfs_alloc.h  |  2 +-
 fs/xfs/libxfs/xfs_bmap.c   | 95 ++++++++++++++++----------------------
 fs/xfs/libxfs/xfs_ialloc.c | 10 ++--
 fs/xfs/xfs_trace.h         |  8 ++--
 5 files changed, 53 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index bf08b9e9d9ac..a9ab7d71c558 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2506,7 +2506,7 @@ xfs_alloc_space_available(
 	reservation = xfs_ag_resv_needed(pag, args->resv);
 
 	/* do we have enough contiguous free space for the allocation? */
-	alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->minlen + (args->alignment - 1) + args->alignslop;
 	longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
 	if (longest < alloc_len)
 		return false;
@@ -2535,7 +2535,7 @@ xfs_alloc_space_available(
 	 * allocation as we know that will definitely succeed and match the
 	 * callers alignment constraints.
 	 */
-	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
+	alloc_len = args->maxlen + (args->alignment - 1) + args->alignslop;
 	if (longest < alloc_len) {
 		args->maxlen = args->minlen;
 		ASSERT(args->maxlen > 0);
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index fae170825be0..473822a5d4e9 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
 	xfs_extlen_t	minleft;	/* min blocks must be left after us */
 	xfs_extlen_t	total;		/* total blocks needed in xaction */
 	xfs_extlen_t	alignment;	/* align answer to multiple of this */
-	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
+	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
 	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
 	xfs_agblock_t	max_agbno;	/* ... */
 	xfs_extlen_t	len;		/* output: actual size of extent */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7df74c35d9f9..25a87e1154bb 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3286,6 +3286,10 @@ xfs_bmap_select_minlen(
 	xfs_extlen_t		blen)
 {
 
+	/* Adjust best length for extent start alignment. */
+	if (blen > args->alignment)
+		blen -= args->alignment;
+
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
 	 * possible that there is enough contiguous free space for this request.
@@ -3394,35 +3398,43 @@ xfs_bmap_alloc_account(
 	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
 }
 
-static int
+/*
+ * Calculate the extent start alignment and the extent length adjustments that
+ * constrain this allocation.
+ *
+ * Extent start alignment is currently determined by stripe configuration and is
+ * carried in args->alignment, whilst extent length adjustment is determined by
+ * extent size hints and is carried by args->prod and args->mod.
+ *
+ * Low level allocation code is free to either ignore or override these values
+ * as required.
+ */
+static void
 xfs_bmap_compute_alignments(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
-	int			stripe_align = 0;
 
 	/* stripe alignment for allocation is determined by mount parameters */
 	if (mp->m_swidth && xfs_has_swalloc(mp))
-		stripe_align = mp->m_swidth;
+		args->alignment = mp->m_swidth;
 	else if (mp->m_dalign)
-		stripe_align = mp->m_dalign;
+		args->alignment = mp->m_dalign;
 
 	if (ap->flags & XFS_BMAPI_COWFORK)
 		align = xfs_get_cowextsz_hint(ap->ip);
 	else if (ap->datatype & XFS_ALLOC_USERDATA)
 		align = xfs_get_extsz_hint(ap->ip);
+
 	if (align) {
 		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
 					ap->eof, 0, ap->conv, &ap->offset,
 					&ap->length))
 			ASSERT(0);
 		ASSERT(ap->length);
-	}
 
-	/* apply extent size hints if obtained earlier */
-	if (align) {
 		args->prod = align;
 		div_u64_rem(ap->offset, args->prod, &args->mod);
 		if (args->mod)
@@ -3437,7 +3449,6 @@ xfs_bmap_compute_alignments(
 			args->mod = args->prod - args->mod;
 	}
 
-	return stripe_align;
 }
 
 static void
@@ -3509,7 +3520,7 @@ xfs_bmap_exact_minlen_extent_alloc(
 	args.total = ap->total;
 
 	args.alignment = 1;
-	args.minalignslop = 0;
+	args.alignslop = 0;
 
 	args.minleft = ap->minleft;
 	args.wasdel = ap->wasdel;
@@ -3549,7 +3560,6 @@ xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen,
-	int			stripe_align,
 	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
@@ -3563,23 +3573,15 @@ xfs_bmap_btalloc_at_eof(
 	 * allocation.
 	 */
 	if (ap->offset) {
-		xfs_extlen_t	nextminlen = 0;
+		xfs_extlen_t	alignment = args->alignment;
 
 		/*
-		 * Compute the minlen+alignment for the next case.  Set slop so
-		 * that the value of minlen+alignment+slop doesn't go up between
-		 * the calls.
+		 * Compute the alignment slop for the fallback path so we ensure
+		 * we account for the potential alignment space required by the
+		 * fallback paths before we modify the AGF and AGFL here.
 		 */
 		args->alignment = 1;
-		if (blen > stripe_align && blen <= args->maxlen)
-			nextminlen = blen - stripe_align;
-		else
-			nextminlen = args->minlen;
-		if (nextminlen + stripe_align > args->minlen + 1)
-			args->minalignslop = nextminlen + stripe_align -
-					args->minlen - 1;
-		else
-			args->minalignslop = 0;
+		args->alignslop = alignment - args->alignment;
 
 		if (!caller_pag)
 			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
@@ -3597,19 +3599,8 @@ xfs_bmap_btalloc_at_eof(
 		 * Exact allocation failed. Reset to try an aligned allocation
 		 * according to the original allocation specification.
 		 */
-		args->alignment = stripe_align;
-		args->minlen = nextminlen;
-		args->minalignslop = 0;
-	} else {
-		/*
-		 * Adjust minlen to try and preserve alignment if we
-		 * can't guarantee an aligned maxlen extent.
-		 */
-		args->alignment = stripe_align;
-		if (blen > args->alignment &&
-		    blen <= args->maxlen + args->alignment)
-			args->minlen = blen - args->alignment;
-		args->minalignslop = 0;
+		args->alignment = alignment;
+		args->alignslop = 0;
 	}
 
 	if (ag_only) {
@@ -3627,9 +3618,8 @@ xfs_bmap_btalloc_at_eof(
 		return 0;
 
 	/*
-	 * Allocation failed, so turn return the allocation args to their
-	 * original non-aligned state so the caller can proceed on allocation
-	 * failure as if this function was never called.
+	 * Aligned allocation failed, so all fallback paths from here drop the
+	 * start alignment requirement as we know it will not succeed.
 	 */
 	args->alignment = 1;
 	return 0;
@@ -3637,7 +3627,9 @@ xfs_bmap_btalloc_at_eof(
 
 /*
  * We have failed multiple allocation attempts so now are in a low space
- * allocation situation. Try a locality first full filesystem minimum length
+ * allocation situation. We give up on any attempt at aligned allocation here.
+ *
+ * Try a locality first full filesystem minimum length
  * allocation whilst still maintaining necessary total block reservation
  * requirements.
  *
@@ -3654,6 +3646,7 @@ xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
@@ -3673,13 +3666,11 @@ xfs_bmap_btalloc_low_space(
 static int
 xfs_bmap_btalloc_filestreams(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
-
 	error = xfs_filestream_select_ag(ap, args, &blen);
 	if (error)
 		return error;
@@ -3698,8 +3689,7 @@ xfs_bmap_btalloc_filestreams(
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				true);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
 
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
@@ -3723,8 +3713,7 @@ xfs_bmap_btalloc_filestreams(
 static int
 xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	int			stripe_align)
+	struct xfs_alloc_arg	*args)
 {
 	xfs_extlen_t		blen = 0;
 	int			error;
@@ -3748,8 +3737,7 @@ xfs_bmap_btalloc_best_length(
 	 * trying.
 	 */
 	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				false);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
 		if (error || args->fsbno != NULLFSBLOCK)
 			return error;
 	}
@@ -3776,27 +3764,26 @@ xfs_bmap_btalloc(
 		.resv		= XFS_AG_RESV_NONE,
 		.datatype	= ap->datatype,
 		.alignment	= 1,
-		.minalignslop	= 0,
+		.alignslop	= 0,
 	};
 	xfs_fileoff_t		orig_offset;
 	xfs_extlen_t		orig_length;
 	int			error;
-	int			stripe_align;
 
 	ASSERT(ap->length);
 	orig_offset = ap->offset;
 	orig_length = ap->length;
 
-	stripe_align = xfs_bmap_compute_alignments(ap, &args);
+	xfs_bmap_compute_alignments(ap, &args);
 
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 	    xfs_inode_is_filestream(ap->ip))
-		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_filestreams(ap, &args);
 	else
-		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
+		error = xfs_bmap_btalloc_best_length(ap, &args);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 0af5b7a33d05..2fa29d2f004e 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -758,12 +758,12 @@ xfs_ialloc_ag_alloc(
 		 *
 		 * For an exact allocation, alignment must be 1,
 		 * however we need to take cluster alignment into account when
-		 * fixing up the freelist. Use the minalignslop field to
-		 * indicate that extra blocks might be required for alignment,
-		 * but not to use them in the actual exact allocation.
+		 * fixing up the freelist. Use the alignslop field to indicate
+		 * that extra blocks might be required for alignment, but not
+		 * to use them in the actual exact allocation.
 		 */
 		args.alignment = 1;
-		args.minalignslop = igeo->cluster_align - 1;
+		args.alignslop = igeo->cluster_align - 1;
 
 		/* Allow space for the inode btree to split. */
 		args.minleft = igeo->inobt_maxlevels;
@@ -783,7 +783,7 @@ xfs_ialloc_ag_alloc(
 		 * on, so reset minalignslop to ensure it is not included in
 		 * subsequent requests.
 		 */
-		args.minalignslop = 0;
+		args.alignslop = 0;
 	}
 
 	if (unlikely(args.fsbno == NULLFSBLOCK)) {
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 5646d300b286..fb0c46d9a6d9 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1811,7 +1811,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__field(xfs_extlen_t, minleft)
 		__field(xfs_extlen_t, total)
 		__field(xfs_extlen_t, alignment)
-		__field(xfs_extlen_t, minalignslop)
+		__field(xfs_extlen_t, alignslop)
 		__field(xfs_extlen_t, len)
 		__field(char, wasdel)
 		__field(char, wasfromfl)
@@ -1830,7 +1830,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->minleft = args->minleft;
 		__entry->total = args->total;
 		__entry->alignment = args->alignment;
-		__entry->minalignslop = args->minalignslop;
+		__entry->alignslop = args->alignslop;
 		__entry->len = args->len;
 		__entry->wasdel = args->wasdel;
 		__entry->wasfromfl = args->wasfromfl;
@@ -1839,7 +1839,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->highest_agno = args->tp->t_highest_agno;
 	),
 	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
-		  "prod %u minleft %u total %u alignment %u minalignslop %u "
+		  "prod %u minleft %u total %u alignment %u alignslop %u "
 		  "len %u wasdel %d wasfromfl %d resv %d "
 		  "datatype 0x%x highest_agno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
@@ -1852,7 +1852,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		  __entry->minleft,
 		  __entry->total,
 		  __entry->alignment,
-		  __entry->minalignslop,
+		  __entry->alignslop,
 		  __entry->len,
 		  __entry->wasdel,
 		  __entry->wasfromfl,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 04/14] xfs: make EOF allocation simpler
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (2 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 03/14] xfs: simplify extent allocation alignment John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 18:58   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 05/14] xfs: introduce forced allocation alignment John Garry
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

Currently the allocation at EOF is broken into two cases - when the
offset is zero and when the offset is non-zero. When the offset is
non-zero, we try to do exact block allocation for contiguous
extent allocation. When the offset is zero, the allocation is simply
an aligned allocation.

We want aligned allocation as the fallback when exact block
allocation fails, but that complicates the EOF allocation in that it
now has to handle two different allocation cases. The
caller also has to handle allocation when not at EOF, and for the
upcoming forced alignment changes we need that to also be aligned
allocation.

To simplify all this, pull the aligned allocation cases back into
the callers and leave the EOF allocation path for exact block
allocation only. This means that the EOF exact block allocation
fallback path is the normal aligned allocation path and that ends up
making things a lot simpler when forced alignment is introduced.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c   | 129 +++++++++++++++----------------------
 fs/xfs/libxfs/xfs_ialloc.c |   2 +-
 2 files changed, 54 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 25a87e1154bb..42a75d257b35 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3310,12 +3310,12 @@ xfs_bmap_select_minlen(
 static int
 xfs_bmap_btalloc_select_lengths(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno, startag;
+	xfs_extlen_t		blen = 0;
 	int			error = 0;
 
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
@@ -3329,19 +3329,18 @@ xfs_bmap_btalloc_select_lengths(
 	if (startag == NULLAGNUMBER)
 		startag = 0;
 
-	*blen = 0;
 	for_each_perag_wrap(mp, startag, agno, pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+		error = xfs_bmap_longest_free_extent(pag, args->tp, &blen);
 		if (error && error != -EAGAIN)
 			break;
 		error = 0;
-		if (*blen >= args->maxlen)
+		if (blen >= args->maxlen)
 			break;
 	}
 	if (pag)
 		xfs_perag_rele(pag);
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
+	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	return error;
 }
 
@@ -3551,78 +3550,40 @@ xfs_bmap_exact_minlen_extent_alloc(
  * If we are not low on available data blocks and we are allocating at
  * EOF, optimise allocation for contiguous file extension and/or stripe
  * alignment of the new extent.
- *
- * NOTE: ap->aeof is only set if the allocation length is >= the
- * stripe unit and the allocation offset is at the end of file.
  */
 static int
 xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		blen,
-	bool			ag_only)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*caller_pag = args->pag;
+	xfs_extlen_t		alignment = args->alignment;
 	int			error;
 
+	ASSERT(ap->aeof && ap->offset);
+	ASSERT(args->alignment >= 1);
+
 	/*
-	 * If there are already extents in the file, try an exact EOF block
-	 * allocation to extend the file as a contiguous extent. If that fails,
-	 * or it's the first allocation in a file, just try for a stripe aligned
-	 * allocation.
+	 * Compute the alignment slop for the fallback path so we ensure
+	 * we account for the potential alignemnt space required by the
+	 * fallback paths before we modify the AGF and AGFL here.
 	 */
-	if (ap->offset) {
-		xfs_extlen_t	alignment = args->alignment;
-
-		/*
-		 * Compute the alignment slop for the fallback path so we ensure
-		 * we account for the potential alignment space required by the
-		 * fallback paths before we modify the AGF and AGFL here.
-		 */
-		args->alignment = 1;
-		args->alignslop = alignment - args->alignment;
-
-		if (!caller_pag)
-			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
-		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-		if (!caller_pag) {
-			xfs_perag_put(args->pag);
-			args->pag = NULL;
-		}
-		if (error)
-			return error;
-
-		if (args->fsbno != NULLFSBLOCK)
-			return 0;
-		/*
-		 * Exact allocation failed. Reset to try an aligned allocation
-		 * according to the original allocation specification.
-		 */
-		args->alignment = alignment;
-		args->alignslop = 0;
-	}
+	args->alignment = 1;
+	args->alignslop = alignment - args->alignment;
 
-	if (ag_only) {
-		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
-	} else {
+	if (!caller_pag)
+		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
+	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
+	if (!caller_pag) {
+		xfs_perag_put(args->pag);
 		args->pag = NULL;
-		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
-		ASSERT(args->pag == NULL);
-		args->pag = caller_pag;
 	}
-	if (error)
-		return error;
 
-	if (args->fsbno != NULLFSBLOCK)
-		return 0;
-
-	/*
-	 * Aligned allocation failed, so all fallback paths from here drop the
-	 * start alignment requirement as we know it will not succeed.
-	 */
-	args->alignment = 1;
-	return 0;
+	/* Reset alignment to original specifications.  */
+	args->alignment = alignment;
+	args->alignslop = 0;
+	return error;
 }
 
 /*
@@ -3688,12 +3649,19 @@ xfs_bmap_btalloc_filestreams(
 	}
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	if (ap->aeof)
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
+	if (ap->aeof && ap->offset)
+		error = xfs_bmap_btalloc_at_eof(ap, args);
 
+	/* This may be an aligned allocation attempt. */
 	if (!error && args->fsbno == NULLFSBLOCK)
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 
+	/* Attempt non-aligned allocation if we haven't already. */
+	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		args->alignment = 1;
+		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
+	}
+
 out_low_space:
 	/*
 	 * We are now done with the perag reference for the filestreams
@@ -3715,7 +3683,6 @@ xfs_bmap_btalloc_best_length(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
 {
-	xfs_extlen_t		blen = 0;
 	int			error;
 
 	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
@@ -3726,23 +3693,33 @@ xfs_bmap_btalloc_best_length(
 	 * the request.  If one isn't found, then adjust the minimum allocation
 	 * size to the largest space found.
 	 */
-	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
+	error = xfs_bmap_btalloc_select_lengths(ap, args);
 	if (error)
 		return error;
 
 	/*
-	 * Don't attempt optimal EOF allocation if previous allocations barely
-	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
-	 * optimal or even aligned allocations in this case, so don't waste time
-	 * trying.
+	 * If we are in low space mode, then optimal allocation will fail so
+	 * prepare for minimal allocation and run the low space algorithm
+	 * immediately.
 	 */
-	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
-		if (error || args->fsbno != NULLFSBLOCK)
-			return error;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		ASSERT(args->fsbno == NULLFSBLOCK);
+		return xfs_bmap_btalloc_low_space(ap, args);
+	}
+
+	if (ap->aeof && ap->offset)
+		error = xfs_bmap_btalloc_at_eof(ap, args);
+
+	/* This may be an aligned allocation attempt. */
+	if (!error && args->fsbno == NULLFSBLOCK)
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
+
+	/* Attempt non-aligned allocation if we haven't already. */
+	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		args->alignment = 1;
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	}
 
-	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error || args->fsbno != NULLFSBLOCK)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 2fa29d2f004e..c5d220d51757 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -780,7 +780,7 @@ xfs_ialloc_ag_alloc(
 		 * the exact agbno requirement and increase the alignment
 		 * instead. It is critical that the total size of the request
 		 * (len + alignment + slop) does not increase from this point
-		 * on, so reset minalignslop to ensure it is not included in
+		 * on, so reset alignslop to ensure it is not included in
 		 * subsequent requests.
 		 */
 		args.alignslop = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 05/14] xfs: introduce forced allocation alignment
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (3 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 04/14] xfs: make EOF allocation simpler John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-01 16:30 ` [PATCH v3 06/14] xfs: align args->minlen for " John Garry
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

When forced allocation alignment is specified, the extent will
be aligned to the extent size hint size rather than stripe
alignment. If aligned allocation cannot be done, then the allocation
is failed rather than attempting non-aligned fallbacks.

Note: none of the per-inode force align configuration is present
yet, so this just triggers off an "always false" wrapper function
for the moment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[jpg: Set XFS_ALLOC_FORCEALIGN in xfs_bmap_alloc_userdata(), rather than
      xfs_bmap_compute_alignments()]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.h |  1 +
 fs/xfs/libxfs/xfs_bmap.c  | 28 +++++++++++++++++++++++-----
 fs/xfs/xfs_inode.h        |  5 +++++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 473822a5d4e9..17b062e8b925 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -66,6 +66,7 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_USERDATA		(1 << 0)/* allocation is for user data*/
 #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
 #define XFS_ALLOC_NOBUSY		(1 << 2)/* Busy extents not allowed */
+#define XFS_ALLOC_FORCEALIGN		(1 << 3)/* forced extent alignment */
 
 /* freespace limit calculations */
 unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 42a75d257b35..602a5a50bcca 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3401,9 +3401,10 @@ xfs_bmap_alloc_account(
  * Calculate the extent start alignment and the extent length adjustments that
  * constrain this allocation.
  *
- * Extent start alignment is currently determined by stripe configuration and is
- * carried in args->alignment, whilst extent length adjustment is determined by
- * extent size hints and is carried by args->prod and args->mod.
+ * Extent start alignment is currently determined by forced inode alignment or
+ * stripe configuration and is carried in args->alignment, whilst extent length
+ * adjustment is determined by extent size hints and is carried by args->prod
+ * and args->mod.
  *
  * Low level allocation code is free to either ignore or override these values
  * as required.
@@ -3416,8 +3417,13 @@ xfs_bmap_compute_alignments(
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		align = 0; /* minimum allocation alignment */
 
-	/* stripe alignment for allocation is determined by mount parameters */
-	if (mp->m_swidth && xfs_has_swalloc(mp))
+	/*
+	 * Forced inode alignment takes preference over stripe alignment.
+	 * Stripe alignment for allocation is determined by mount parameters.
+	 */
+	if (xfs_inode_has_forcealign(ap->ip))
+		args->alignment = xfs_get_extsz_hint(ap->ip);
+	else if (mp->m_swidth && xfs_has_swalloc(mp))
 		args->alignment = mp->m_swidth;
 	else if (mp->m_dalign)
 		args->alignment = mp->m_dalign;
@@ -3607,6 +3613,11 @@ xfs_bmap_btalloc_low_space(
 {
 	int			error;
 
+	if (args->alignment > 1 && (args->datatype & XFS_ALLOC_FORCEALIGN)) {
+		args->fsbno = NULLFSBLOCK;
+		return 0;
+	}
+
 	args->alignment = 1;
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
@@ -3658,6 +3669,8 @@ xfs_bmap_btalloc_filestreams(
 
 	/* Attempt non-aligned allocation if we haven't already. */
 	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		if (args->datatype & XFS_ALLOC_FORCEALIGN)
+			return error;
 		args->alignment = 1;
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 	}
@@ -3716,6 +3729,8 @@ xfs_bmap_btalloc_best_length(
 
 	/* Attempt non-aligned allocation if we haven't already. */
 	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
+		if (args->datatype & XFS_ALLOC_FORCEALIGN)
+			return error;
 		args->alignment = 1;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	}
@@ -4151,6 +4166,9 @@ xfs_bmap_alloc_userdata(
 		if (bma->offset == 0)
 			bma->datatype |= XFS_ALLOC_INITIAL_USER_DATA;
 
+		if (xfs_inode_has_forcealign(bma->ip))
+			bma->datatype |= XFS_ALLOC_FORCEALIGN;
+
 		if (mp->m_dalign && bma->length >= mp->m_dalign) {
 			error = xfs_bmap_isaeof(bma, whichfork);
 			if (error)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 51defdebef30..bf0f4f8b9e64 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -310,6 +310,11 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 	return ip->i_diflags2 & XFS_DIFLAG2_NREXT64;
 }
 
+static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
+{
+	return false;
+}
+
 /*
  * Decide if this file is a realtime file whose data allocation unit is larger
  * than a single filesystem block.
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 06/14] xfs: align args->minlen for forced allocation alignment
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (4 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 05/14] xfs: introduce forced allocation alignment John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-01 16:30 ` [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: Dave Chinner <dchinner@redhat.com>

If args->minlen is not aligned to the constraints of forced
alignment, we may do minlen allocations that are not aligned when we
approach ENOSPC. Avoid this by always aligning args->minlen
appropriately. If alignment of minlen results in a value smaller
than the alignment constraint, fail the allocation immediately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 44 ++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 602a5a50bcca..0c3df8c71c6d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3279,32 +3279,48 @@ xfs_bmap_longest_free_extent(
 	return 0;
 }
 
-static xfs_extlen_t
+static int
 xfs_bmap_select_minlen(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen)
 {
-
 	/* Adjust best length for extent start alignment. */
 	if (blen > args->alignment)
 		blen -= args->alignment;
 
 	/*
 	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
-	 * possible that there is enough contiguous free space for this request.
+	 * possible that there is enough contiguous free space for this request
+	 * even if best length is less that the minimum length we need.
+	 *
+	 * If the best length won't satisfy the maximum length we requested,
+	 * then use it as the minimum length so we get as large an allocation
+	 * as possible.
 	 */
 	if (blen < ap->minlen)
-		return ap->minlen;
+		blen = ap->minlen;
+	else if (blen > args->maxlen)
+		blen = args->maxlen;
 
 	/*
-	 * If the best seen length is less than the request length,
-	 * use the best as the minimum, otherwise we've got the maxlen we
-	 * were asked for.
+	 * If we have alignment constraints, round the minlen down to match the
+	 * constraint so that alignment will be attempted. This may reduce the
+	 * allocation to smaller than was requested, so clamp the minimum to
+	 * ap->minlen to allow unaligned allocation to succeed. If we are forced
+	 * to align the allocation, return ENOSPC at this point because we don't
+	 * have enough contiguous free space to guarantee aligned allocation.
 	 */
-	if (blen < args->maxlen)
-		return blen;
-	return args->maxlen;
+	if (args->alignment > 1) {
+		blen = rounddown(blen, args->alignment);
+		if (blen < ap->minlen) {
+			if (args->datatype & XFS_ALLOC_FORCEALIGN)
+				return -ENOSPC;
+			blen = ap->minlen;
+		}
+	}
+	args->minlen = blen;
+	return 0;
 }
 
 static int
@@ -3340,8 +3356,7 @@ xfs_bmap_btalloc_select_lengths(
 	if (pag)
 		xfs_perag_rele(pag);
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	return error;
+	return xfs_bmap_select_minlen(ap, args, blen);
 }
 
 /* Update all inode and quota accounting for the allocation we just did. */
@@ -3659,7 +3674,10 @@ xfs_bmap_btalloc_filestreams(
 		goto out_low_space;
 	}
 
-	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
+	error = xfs_bmap_select_minlen(ap, args, blen);
+	if (error)
+		goto out_low_space;
+
 	if (ap->aeof && ap->offset)
 		error = xfs_bmap_btalloc_at_eof(ap, args);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (5 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 06/14] xfs: align args->minlen for " John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 19:02   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: "Darrick J. Wong" <djwong@kernel.org>

Add a new inode flag to require that all file data extent mappings must
be aligned (both the file offset range and the allocated space itself)
to the extent size hint.  Having a separate COW extent size hint is no
longer allowed.

The goal here is to enable sysadmins and users to mandate that all space
mappings in a file must have a startoff/blockcount that are aligned to
(say) a 2MB alignment and that the startblock/blockcount will follow the
same alignment.

Allocated space will be aligned to start of the AG, and not necessarily
aligned with disk blocks. The upcoming atomic writes feature will rely and
forcealign and will also require allocated space will also be aligned to
disk blocks.

reflink will not be supported for forcealign yet, so disallow a mount under
this condition. This is because we have the limitation of pageache
writeback not knowing how to writeback an entire allocation unut, so
reject a mount with relink.

RT vol will not be supported for forcealign yet, so disallow a mount under
this condition. It will be possible to support RT vol and forcealign in
future. For this, the inode extsize must be a multiple of rtextsize - this
is enforced already in xfs_ioctl_setattr_check_extsize() and
xfs_inode_validate_extsize().

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Co-developed-by: John Garry <john.g.garry@oracle.com>
[jpg: many changes from orig, including forcealign inode verification
 rework, ioctl setattr rework disallow reflink a forcealign inode,
 disallow mount for forcealign + reflink or rt]
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h     |  6 ++++-
 fs/xfs/libxfs/xfs_inode_buf.c  | 46 ++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_buf.h  |  3 +++
 fs/xfs/libxfs/xfs_inode_util.c | 14 +++++++++++
 fs/xfs/libxfs/xfs_sb.c         |  2 ++
 fs/xfs/xfs_inode.h             |  8 +++++-
 fs/xfs/xfs_ioctl.c             | 46 ++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_mount.h             |  2 ++
 fs/xfs/xfs_reflink.c           |  5 ++--
 fs/xfs/xfs_super.c             | 18 +++++++++++++
 include/uapi/linux/fs.h        |  2 ++
 11 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e1bfee0c3b1a..95f5259c4255 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -352,6 +352,7 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
+#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
@@ -1093,16 +1094,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
 #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
 #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
+/* data extent mappings for regular files must be aligned to extent size hint */
+#define XFS_DIFLAG2_FORCEALIGN_BIT 5
 
 #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
 #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
 #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
 #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
 #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
+#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
 
 #define XFS_DIFLAG2_ANY \
 	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
-	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
+	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
 
 static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
 {
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 513b50da6215..1c59891fa9e2 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -657,6 +657,15 @@ xfs_dinode_verify(
 	    !xfs_has_bigtime(mp))
 		return __this_address;
 
+	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
+		fa = xfs_inode_validate_forcealign(mp,
+				be32_to_cpu(dip->di_extsize),
+				be32_to_cpu(dip->di_cowextsize),
+				mode, flags, flags2);
+		if (fa)
+			return fa;
+	}
+
 	return NULL;
 }
 
@@ -824,3 +833,40 @@ xfs_inode_validate_cowextsize(
 
 	return NULL;
 }
+
+/* Validate the forcealign inode flag */
+xfs_failaddr_t
+xfs_inode_validate_forcealign(
+	struct xfs_mount	*mp,
+	uint32_t		extsize,
+	uint32_t		cowextsize,
+	uint16_t		mode,
+	uint16_t		flags,
+	uint64_t		flags2)
+{
+	/* superblock rocompat feature flag */
+	if (!xfs_has_forcealign(mp))
+		return __this_address;
+
+	/* Only regular files and directories */
+	if (!S_ISDIR(mode) && !S_ISREG(mode))
+		return __this_address;
+
+	/* We require EXTSIZE or EXTSZINHERIT */
+	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
+		return __this_address;
+
+	/* We require a non-zero extsize */
+	if (!extsize)
+		return __this_address;
+
+	/* COW extsize disallowed */
+	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
+		return __this_address;
+
+	/* cowextsize must be zero */
+	if (cowextsize)
+		return __this_address;
+
+	return NULL;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
index 585ed5a110af..b8b65287b037 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.h
+++ b/fs/xfs/libxfs/xfs_inode_buf.h
@@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
 xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
 		uint32_t cowextsize, uint16_t mode, uint16_t flags,
 		uint64_t flags2);
+xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
+		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
+		uint16_t flags, uint64_t flags2);
 
 static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
 {
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 032333289113..b264939d8855 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -80,6 +80,8 @@ xfs_flags2diflags2(
 		di_flags2 |= XFS_DIFLAG2_DAX;
 	if (xflags & FS_XFLAG_COWEXTSIZE)
 		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
+	if (xflags & FS_XFLAG_FORCEALIGN)
+		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
 
 	return di_flags2;
 }
@@ -126,6 +128,8 @@ xfs_ip2xflags(
 			flags |= FS_XFLAG_DAX;
 		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 			flags |= FS_XFLAG_COWEXTSIZE;
+		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
+			flags |= FS_XFLAG_FORCEALIGN;
 	}
 
 	if (xfs_inode_has_attr_fork(ip))
@@ -224,6 +228,8 @@ xfs_inode_inherit_flags2(
 	}
 	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
 		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
+	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
+		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
 
 	/* Don't let invalid cowextsize hints propagate. */
 	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
@@ -232,6 +238,14 @@ xfs_inode_inherit_flags2(
 		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
 		ip->i_cowextsize = 0;
 	}
+	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
+		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
+				ip->i_extsize, ip->i_cowextsize,
+				VFS_I(ip)->i_mode, ip->i_diflags,
+				ip->i_diflags2);
+		if (failaddr)
+			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
+	}
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 6b56f0f6d4c1..e56911553edd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -164,6 +164,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_REFLINK;
 	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
 		features |= XFS_FEAT_INOBTCNT;
+	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
+		features |= XFS_FEAT_FORCEALIGN;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
 		features |= XFS_FEAT_FTYPE;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index bf0f4f8b9e64..3e7664ec4d6c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -312,7 +312,13 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
 
 static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
 {
-	return false;
+	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
+		return false;
+	if (ip->i_extsize <= 1)
+		return false;
+	if (xfs_is_cow_inode(ip))
+		return false;
+	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
 }
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 4e933db75b12..7a6757a4d2bd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -469,6 +469,39 @@ xfs_fileattr_get(
 	return 0;
 }
 
+/*
+ * Forcealign requires a non-zero extent size hint and a zero cow
+ * extent size hint.
+ */
+static int
+xfs_ioctl_setattr_forcealign(
+	struct xfs_inode	*ip,
+	struct fileattr		*fa)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (!xfs_has_forcealign(mp))
+		return -EINVAL;
+
+	if (xfs_is_reflink_inode(ip))
+		return -EINVAL;
+
+	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
+				FS_XFLAG_EXTSZINHERIT)))
+		return -EINVAL;
+
+	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
+		return -EINVAL;
+
+	if (!fa->fsx_extsize)
+		return -EINVAL;
+
+	if (fa->fsx_cowextsize)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,
@@ -477,10 +510,13 @@ xfs_ioctl_setattr_xflags(
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
+	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
 	uint64_t		i_flags2;
+	int			error;
 
-	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
-		/* Can't change realtime flag if any extents are allocated. */
+	/* Can't change RT or forcealign flags if any extents are allocated. */
+	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
+	    forcealign != xfs_inode_has_forcealign(ip)) {
 		if (ip->i_df.if_nextents || ip->i_delayed_blks)
 			return -EINVAL;
 	}
@@ -501,6 +537,12 @@ xfs_ioctl_setattr_xflags(
 	if (i_flags2 && !xfs_has_v3inodes(mp))
 		return -EINVAL;
 
+	if (forcealign) {
+		error = xfs_ioctl_setattr_forcealign(ip, fa);
+		if (error)
+			return error;
+	}
+
 	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
 	ip->i_diflags2 = i_flags2;
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d0567dfbc036..30228fea908d 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -299,6 +299,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
 #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
 #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
+#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
@@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
 __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
 __XFS_HAS_V4_FEAT(crc, CRC)
 __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
+__XFS_HAS_FEAT(forcealign, FORCEALIGN)
 
 /*
  * Mount features
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 6fde6ec8092f..a836bfec7878 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1467,8 +1467,9 @@ xfs_reflink_remap_prep(
 
 	/* Check file eligibility and prepare for block sharing. */
 	ret = -EINVAL;
-	/* Don't reflink realtime inodes */
-	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
+	/* Don't reflink realtime or forcealign inodes */
+	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest) ||
+	    xfs_inode_has_forcealign(src) || xfs_inode_has_forcealign(dest))
 		goto out_unlock;
 
 	/* Don't share DAX file data with non-DAX file. */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 27e9f749c4c7..b52a01b50387 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1729,12 +1729,30 @@ xfs_fs_fill_super(
 			goto out_filestream_unmount;
 		}
 
+		if (xfs_has_forcealign(mp)) {
+			xfs_alert(mp,
+	"reflink not compatible with forcealign!");
+			error = -EINVAL;
+			goto out_filestream_unmount;
+		}
+
 		if (xfs_globals.always_cow) {
 			xfs_info(mp, "using DEBUG-only always_cow mode.");
 			mp->m_always_cow = true;
 		}
 	}
 
+	if (xfs_has_forcealign(mp)) {
+		if (xfs_has_realtime(mp)) {
+			xfs_alert(mp,
+		"forcealign not supported for realtime device!");
+			error = -EINVAL;
+			goto out_filestream_unmount;
+		}
+		xfs_warn(mp,
+"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
+	}
+
 	if (xfs_has_rmapbt(mp) && mp->m_sb.sb_rblocks) {
 		xfs_alert(mp,
 	"reverse mapping btree not compatible with realtime device!");
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..f55d650f904a 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -158,6 +158,8 @@ struct fsxattr {
 #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
 #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
 #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
+/* data extent mappings for regular files must be aligned to extent size hint */
+#define FS_XFLAG_FORCEALIGN	0x00020000
 #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /* the read-only stuff doesn't really belong here, but any other place is
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (6 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 19:02   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 09/14] xfs: Update xfs_setattr_size() " John Garry
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For forcealign enabled, the allocation unit size is in ip->i_extsize, and
this must never be zero.

Add helper xfs_inode_alloc_fsbsize() to return alloc unit in FSBs, and use
it in xfs_inode_alloc_unitsize().

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_inode.c | 17 +++++++++++++----
 fs/xfs/xfs_inode.h |  1 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7dc6f326936c..5af12f35062d 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3107,17 +3107,26 @@ xfs_break_layouts(
 	return error;
 }
 
-/* Returns the size of fundamental allocation unit for a file, in bytes. */
 unsigned int
-xfs_inode_alloc_unitsize(
+xfs_inode_alloc_fsbsize(
 	struct xfs_inode	*ip)
 {
 	unsigned int		blocks = 1;
 
-	if (XFS_IS_REALTIME_INODE(ip))
+	if (xfs_inode_has_forcealign(ip))
+		blocks = ip->i_extsize;
+	else if (XFS_IS_REALTIME_INODE(ip))
 		blocks = ip->i_mount->m_sb.sb_rextsize;
 
-	return XFS_FSB_TO_B(ip->i_mount, blocks);
+	return blocks;
+}
+
+/* Returns the size of fundamental allocation unit for a file, in bytes. */
+unsigned int
+xfs_inode_alloc_unitsize(
+	struct xfs_inode	*ip)
+{
+	return XFS_FSB_TO_B(ip->i_mount, xfs_inode_alloc_fsbsize(ip));
 }
 
 /* Should we always be using copy on write for file writes? */
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 3e7664ec4d6c..158afad8c7a4 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -641,6 +641,7 @@ int xfs_inode_reload_unlinked(struct xfs_inode *ip);
 bool xfs_ifork_zapped(const struct xfs_inode *ip, int whichfork);
 void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
+unsigned int xfs_inode_alloc_fsbsize(struct xfs_inode *ip);
 unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
 
 int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 09/14] xfs: Update xfs_setattr_size() for forcealign
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (7 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 19:03   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 10/14] xfs: Do not free EOF blocks " John Garry
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when an inode has forcealign, reserve blocks for same reason which we
were doing for big RT alloc.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_iops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 1cdc8034f54d..6e017aa6f61d 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -926,12 +926,12 @@ xfs_setattr_size(
 	}
 
 	/*
-	 * For realtime inode with more than one block rtextsize, we need the
+	 * For inodes with more than one block alloc unitsize, we need the
 	 * block reservation for bmap btree block allocations/splits that can
 	 * happen since it could split the tail written extent and convert the
 	 * right beyond EOF one to unwritten.
 	 */
-	if (xfs_inode_has_bigrtalloc(ip))
+	if (xfs_inode_alloc_fsbsize(ip) > 1)
 		resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 10/14] xfs: Do not free EOF blocks for forcealign
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (8 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 09/14] xfs: Update xfs_setattr_size() " John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 19:24   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 11/14] xfs: Only free full extents " John Garry
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when forcealign is enabled, we want the EOF to be aligned as well, so
do not free EOF blocks.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> #earlier version
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |  7 +++++--
 fs/xfs/xfs_inode.c     | 14 ++++++++++++++
 fs/xfs/xfs_inode.h     |  2 ++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index fe2e2c930975..60389ac8bd45 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -496,6 +496,7 @@ xfs_can_free_eofblocks(
 	struct xfs_mount	*mp = ip->i_mount;
 	xfs_fileoff_t		end_fsb;
 	xfs_fileoff_t		last_fsb;
+	xfs_fileoff_t		dummy_fsb;
 	int			nimaps = 1;
 	int			error;
 
@@ -537,8 +538,10 @@ xfs_can_free_eofblocks(
 	 * forever.
 	 */
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
-	if (xfs_inode_has_bigrtalloc(ip))
-		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
+
+	/* Only try to free beyond the allocation unit that crosses EOF */
+	xfs_roundout_to_alloc_fsbsize(ip, &dummy_fsb, &end_fsb);
+
 	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	if (last_fsb <= end_fsb)
 		return false;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5af12f35062d..d765dedebc15 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3129,6 +3129,20 @@ xfs_inode_alloc_unitsize(
 	return XFS_FSB_TO_B(ip->i_mount, xfs_inode_alloc_fsbsize(ip));
 }
 
+void
+xfs_roundout_to_alloc_fsbsize(
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		*start,
+	xfs_fileoff_t		*end)
+{
+	unsigned int		blocks = xfs_inode_alloc_fsbsize(ip);
+
+	if (blocks == 1)
+		return;
+	*start = rounddown_64(*start, blocks);
+	*end = roundup_64(*end, blocks);
+}
+
 /* Should we always be using copy on write for file writes? */
 bool
 xfs_is_always_cow_inode(
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 158afad8c7a4..7f86c4781bd8 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -643,6 +643,8 @@ void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
 unsigned int xfs_inode_alloc_fsbsize(struct xfs_inode *ip);
 unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
+void xfs_roundout_to_alloc_fsbsize(struct xfs_inode *ip,
+		xfs_fileoff_t *start, xfs_fileoff_t *end);
 
 int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
 		struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 11/14] xfs: Only free full extents for forcealign
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (9 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 10/14] xfs: Do not free EOF blocks " John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 19:27   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 12/14] xfs: Unmap blocks according to forcealign John Garry
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

Like we already do for rtvol, only free full extents for forcealign in
xfs_free_file_space().

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> #earlier version
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |  7 ++-----
 fs/xfs/xfs_inode.c     | 14 ++++++++++++++
 fs/xfs/xfs_inode.h     |  2 ++
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 60389ac8bd45..46eebecd7bba 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -854,11 +854,8 @@ xfs_free_file_space(
 	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
 	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
 
-	/* We can only free complete realtime extents. */
-	if (xfs_inode_has_bigrtalloc(ip)) {
-		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
-		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
-	}
+	/* Free only complete extents. */
+	xfs_roundin_to_alloc_fsbsize(ip, &startoffset_fsb, &endoffset_fsb);
 
 	/*
 	 * Need to zero the stuff we're not freeing, on disk.
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d765dedebc15..e7fa155fcbde 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3143,6 +3143,20 @@ xfs_roundout_to_alloc_fsbsize(
 	*end = roundup_64(*end, blocks);
 }
 
+void
+xfs_roundin_to_alloc_fsbsize(
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		*start,
+	xfs_fileoff_t		*end)
+{
+	unsigned int		blocks = xfs_inode_alloc_fsbsize(ip);
+
+	if (blocks == 1)
+		return;
+	*start = roundup_64(*start, blocks);
+	*end = rounddown_64(*end, blocks);
+}
+
 /* Should we always be using copy on write for file writes? */
 bool
 xfs_is_always_cow_inode(
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 7f86c4781bd8..6dd8055c98b3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -645,6 +645,8 @@ unsigned int xfs_inode_alloc_fsbsize(struct xfs_inode *ip);
 unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
 void xfs_roundout_to_alloc_fsbsize(struct xfs_inode *ip,
 		xfs_fileoff_t *start, xfs_fileoff_t *end);
+void xfs_roundin_to_alloc_fsbsize(struct xfs_inode *ip,
+		xfs_fileoff_t *start, xfs_fileoff_t *end);
 
 int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
 		struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 12/14] xfs: Unmap blocks according to forcealign
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (10 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 11/14] xfs: Only free full extents " John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 20:14   ` Darrick J. Wong
  2024-08-01 16:30 ` [PATCH v3 13/14] xfs: Don't revert allocated offset for forcealign John Garry
  2024-08-01 16:30 ` [PATCH v3 14/14] xfs: Enable file data forcealign feature John Garry
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

For when forcealign is enabled, blocks in an inode need to be unmapped
according to extent alignment, like what is already done for rtvol.

Change variable isrt in __xfs_bunmapi() to a bool, as that is really what
it is.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 48 +++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_inode.c       | 16 ++++++++++++++
 fs/xfs/xfs_inode.h       |  2 ++
 3 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0c3df8c71c6d..d6ae344a17fc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5409,6 +5409,25 @@ xfs_bmap_del_extent_real(
 	return 0;
 }
 
+static xfs_extlen_t
+xfs_bunmapi_align(
+	struct xfs_inode	*ip,
+	xfs_fsblock_t		fsbno,
+	xfs_extlen_t *off)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		return xfs_inode_alloc_fsbsize_align(ip, fsbno, off);
+	/*
+	 * The agbno for the fsbno is aligned to extsize, but the fsbno itself
+	 * is not necessarily aligned (to extsize), so use agbno to determine
+	 * mod+offset to the alloc unit boundary.
+	 */
+	return xfs_inode_alloc_fsbsize_align(ip, XFS_FSB_TO_AGBNO(mp, fsbno),
+					off);
+}
+
 /*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
@@ -5430,7 +5449,8 @@ __xfs_bunmapi(
 	xfs_extnum_t		extno;		/* extent number in list */
 	struct xfs_bmbt_irec	got;		/* current extent record */
 	struct xfs_ifork	*ifp;		/* inode fork pointer */
-	int			isrt;		/* freeing in rt area */
+	bool			isrt;		/* freeing in rt area */
+	bool			isforcealign;	/* forcealign inode */
 	int			logflags;	/* transaction logging flags */
 	xfs_extlen_t		mod;		/* rt extent offset */
 	struct xfs_mount	*mp = ip->i_mount;
@@ -5468,6 +5488,8 @@ __xfs_bunmapi(
 	}
 	XFS_STATS_INC(mp, xs_blk_unmap);
 	isrt = xfs_ifork_is_realtime(ip, whichfork);
+	isforcealign = (whichfork != XFS_ATTR_FORK) &&
+			xfs_inode_has_forcealign(ip);
 	end = start + len;
 
 	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
@@ -5486,6 +5508,8 @@ __xfs_bunmapi(
 	extno = 0;
 	while (end != (xfs_fileoff_t)-1 && end >= start &&
 	       (nexts == 0 || extno < nexts)) {
+		xfs_extlen_t off;
+
 		/*
 		 * Is the found extent after a hole in which end lives?
 		 * Just back up to the previous extent, if so.
@@ -5519,18 +5543,18 @@ __xfs_bunmapi(
 		if (del.br_startoff + del.br_blockcount > end + 1)
 			del.br_blockcount = end + 1 - del.br_startoff;
 
-		if (!isrt || (flags & XFS_BMAPI_REMAP))
+		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
 			goto delete;
 
-		mod = xfs_rtb_to_rtxoff(mp,
-				del.br_startblock + del.br_blockcount);
+		mod = xfs_bunmapi_align(ip,
+				del.br_startblock + del.br_blockcount, &off);
 		if (mod) {
 			/*
-			 * Realtime extent not lined up at the end.
+			 * Not aligned to allocation unit on the end.
 			 * The extent could have been split into written
 			 * and unwritten pieces, or we could just be
 			 * unmapping part of it.  But we can't really
-			 * get rid of part of a realtime extent.
+			 * get rid of part of an extent.
 			 */
 			if (del.br_state == XFS_EXT_UNWRITTEN) {
 				/*
@@ -5554,7 +5578,7 @@ __xfs_bunmapi(
 			ASSERT(del.br_state == XFS_EXT_NORM);
 			ASSERT(tp->t_blk_res > 0);
 			/*
-			 * If this spans a realtime extent boundary,
+			 * If this spans an extent boundary,
 			 * chop it back to the start of the one we end at.
 			 */
 			if (del.br_blockcount > mod) {
@@ -5571,14 +5595,12 @@ __xfs_bunmapi(
 			goto nodelete;
 		}
 
-		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
+		mod = xfs_bunmapi_align(ip, del.br_startblock, &off);
 		if (mod) {
-			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
-
 			/*
-			 * Realtime extent is lined up at the end but not
-			 * at the front.  We'll get rid of full extents if
-			 * we can.
+			 * Extent is lined up to the allocation unit at the
+			 * end but not at the front.  We'll get rid of full
+			 * extents if we can.
 			 */
 			if (del.br_blockcount > off) {
 				del.br_blockcount -= off;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e7fa155fcbde..bb8abf990186 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3164,3 +3164,19 @@ xfs_is_always_cow_inode(
 {
 	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
 }
+
+/* Return mod+offset for a blkno to an extent boundary */
+xfs_extlen_t
+xfs_inode_alloc_fsbsize_align(
+	struct xfs_inode	*ip,
+	xfs_fileoff_t		blkno,
+	xfs_extlen_t		*off)
+{
+	xfs_fileoff_t		blkno_start = blkno;
+	xfs_fileoff_t		blkno_end = blkno;
+
+	xfs_roundout_to_alloc_fsbsize(ip, &blkno_start, &blkno_end);
+
+	*off = blkno_end - blkno;
+	return blkno - blkno_start;
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 6dd8055c98b3..7b77797c3943 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -647,6 +647,8 @@ void xfs_roundout_to_alloc_fsbsize(struct xfs_inode *ip,
 		xfs_fileoff_t *start, xfs_fileoff_t *end);
 void xfs_roundin_to_alloc_fsbsize(struct xfs_inode *ip,
 		xfs_fileoff_t *start, xfs_fileoff_t *end);
+xfs_extlen_t xfs_inode_alloc_fsbsize_align(struct xfs_inode *ip,
+		xfs_fileoff_t blkno, xfs_extlen_t *off);
 
 int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
 		struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 13/14] xfs: Don't revert allocated offset for forcealign
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (11 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 12/14] xfs: Unmap blocks according to forcealign John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-01 16:30 ` [PATCH v3 14/14] xfs: Enable file data forcealign feature John Garry
  13 siblings, 0 replies; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

In xfs_bmap_process_allocated_extent(), for when we found that we could not
provide the requested length completely, the mapping is moved so that we
can provide as much as possible for the original request.

For forcealign, this would mean ignoring alignment guaranteed, so don't do
this.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index d6ae344a17fc..d246b15160ac 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3491,11 +3491,15 @@ xfs_bmap_process_allocated_extent(
 	 * original request as possible.  Free space is apparently
 	 * very fragmented so we're unlikely to be able to satisfy the
 	 * hints anyway.
+	 * However, for an inode with forcealign, continue with the
+	 * found offset as we need to honour the alignment hint.
 	 */
-	if (ap->length <= orig_length)
-		ap->offset = orig_offset;
-	else if (ap->offset + ap->length < orig_offset + orig_length)
-		ap->offset = orig_offset + orig_length - ap->length;
+	if (!xfs_inode_has_forcealign(ap->ip)) {
+		if (ap->length <= orig_length)
+			ap->offset = orig_offset;
+		else if (ap->offset + ap->length < orig_offset + orig_length)
+			ap->offset = orig_offset + orig_length - ap->length;
+	}
 	xfs_bmap_alloc_account(ap);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v3 14/14] xfs: Enable file data forcealign feature
  2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
                   ` (12 preceding siblings ...)
  2024-08-01 16:30 ` [PATCH v3 13/14] xfs: Don't revert allocated offset for forcealign John Garry
@ 2024-08-01 16:30 ` John Garry
  2024-08-06 19:43   ` Darrick J. Wong
  13 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-01 16:30 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen, John Garry

From: "Darrick J. Wong" <djwong@kernel.org>

Enable this feature.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 95f5259c4255..04c6cbc943c2 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -357,7 +357,8 @@ xfs_sb_has_compat_feature(
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
 		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
-		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
+		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
+		 XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC
  2024-08-01 16:30 ` [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
@ 2024-08-06 18:51   ` Darrick J. Wong
  2024-08-07  0:26     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 18:51 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:44PM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we are near ENOSPC and don't have enough free
> space for an args->maxlen allocation, xfs_alloc_space_available()
> will trim args->maxlen to equal the available space. However, this
> function has only checked that there is enough contiguous free space
> for an aligned args->minlen allocation to succeed. Hence there is no
> guarantee that an args->maxlen allocation will succeed, nor that the
> available space will allow for correct alignment of an args->maxlen
> allocation.
> 
> Further, by trimming args->maxlen arbitrarily, it breaks an
> assumption made in xfs_alloc_fix_len() that if the caller wants
> aligned allocation, then args->maxlen will be set to an aligned
> value. It then skips the tail alignment and so we end up with
> extents that aren't aligned to extent size hint boundaries as we
> approach ENOSPC.
> 
> To avoid this problem, don't reduce args->maxlen by some random,
> arbitrary amount. If args->maxlen is too large for the available
> space, reduce the allocation to a minlen allocation as we know we
> have contiguous free space available for this to succeed and always
> be correctly aligned.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 59326f84f6a5..d559d992c6ef 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2524,14 +2524,23 @@ xfs_alloc_space_available(
>  	if (available < (int)max(args->total, alloc_len))
>  		return false;
>  
> +	if (flags & XFS_ALLOC_FLAG_CHECK)
> +		return true;
> +
>  	/*
> -	 * Clamp maxlen to the amount of free space available for the actual
> -	 * extent allocation.
> +	 * If we can't do a maxlen allocation, then we must reduce the size of
> +	 * the allocation to match the available free space. We know how big
> +	 * the largest contiguous free space we can allocate is, so that's our
> +	 * upper bound. However, we don't exaclty know what alignment/size
> +	 * constraints have been placed on the allocation, so we can't
> +	 * arbitrarily select some new max size. Hence make this a minlen
> +	 * allocation as we know that will definitely succeed and match the
> +	 * callers alignment constraints.
>  	 */
> -	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
> -		args->maxlen = available;
> +	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
> +	if (longest < alloc_len) {
> +		args->maxlen = args->minlen;

Same question as the June 21st posting:

Is it possible to reduce maxlen the largest multiple of the alignment
that is still less than @longest?

--D

>  		ASSERT(args->maxlen > 0);
> -		ASSERT(args->maxlen >= args->minlen);
>  	}
>  
>  	return true;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 03/14] xfs: simplify extent allocation alignment
  2024-08-01 16:30 ` [PATCH v3 03/14] xfs: simplify extent allocation alignment John Garry
@ 2024-08-06 18:56   ` Darrick J. Wong
  2024-08-06 23:52     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 18:56 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:46PM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently align extent allocation to stripe unit or stripe width.
> That is specified by an external parameter to the allocation code,
> which then manipulates the xfs_alloc_args alignment configuration in
> interesting ways.
> 
> The args->alignment field specifies extent start alignment, but
> because we may be attempting non-aligned allocation first there are
> also slop variables that allow for those allocation attempts to
> account for aligned allocation if they fail.
> 
> This gets much more complex as we introduce forced allocation
> alignment, where extent size hints are used to generate the extent
> start alignment. extent size hints currently only affect extent
> lengths (via args->prod and args->mod) and so with this change we
> will have two different start alignment conditions.
> 
> Avoid this complexity by always using args->alignment to indicate
> extent start alignment, and always using args->prod/mod to indicate
> extent length adjustment.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Going back to the 6/21 posting[1], what were the answers to the
questions I posted?  Did I correctly figure out what alignslop refers
to?  AFAICT this is the same patch though at least it has the nits
fixed.

[1] https://lore.kernel.org/linux-xfs/20240621202914.GT3058325@frogsfrogsfrogs/

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c  |  4 +-
>  fs/xfs/libxfs/xfs_alloc.h  |  2 +-
>  fs/xfs/libxfs/xfs_bmap.c   | 95 ++++++++++++++++----------------------
>  fs/xfs/libxfs/xfs_ialloc.c | 10 ++--
>  fs/xfs/xfs_trace.h         |  8 ++--
>  5 files changed, 53 insertions(+), 66 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index bf08b9e9d9ac..a9ab7d71c558 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2506,7 +2506,7 @@ xfs_alloc_space_available(
>  	reservation = xfs_ag_resv_needed(pag, args->resv);
>  
>  	/* do we have enough contiguous free space for the allocation? */
> -	alloc_len = args->minlen + (args->alignment - 1) + args->minalignslop;
> +	alloc_len = args->minlen + (args->alignment - 1) + args->alignslop;
>  	longest = xfs_alloc_longest_free_extent(pag, min_free, reservation);
>  	if (longest < alloc_len)
>  		return false;
> @@ -2535,7 +2535,7 @@ xfs_alloc_space_available(
>  	 * allocation as we know that will definitely succeed and match the
>  	 * callers alignment constraints.
>  	 */
> -	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
> +	alloc_len = args->maxlen + (args->alignment - 1) + args->alignslop;
>  	if (longest < alloc_len) {
>  		args->maxlen = args->minlen;
>  		ASSERT(args->maxlen > 0);
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index fae170825be0..473822a5d4e9 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -46,7 +46,7 @@ typedef struct xfs_alloc_arg {
>  	xfs_extlen_t	minleft;	/* min blocks must be left after us */
>  	xfs_extlen_t	total;		/* total blocks needed in xaction */
>  	xfs_extlen_t	alignment;	/* align answer to multiple of this */
> -	xfs_extlen_t	minalignslop;	/* slop for minlen+alignment calcs */
> +	xfs_extlen_t	alignslop;	/* slop for alignment calcs */
>  	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
>  	xfs_agblock_t	max_agbno;	/* ... */
>  	xfs_extlen_t	len;		/* output: actual size of extent */
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7df74c35d9f9..25a87e1154bb 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3286,6 +3286,10 @@ xfs_bmap_select_minlen(
>  	xfs_extlen_t		blen)
>  {
>  
> +	/* Adjust best length for extent start alignment. */
> +	if (blen > args->alignment)
> +		blen -= args->alignment;
> +
>  	/*
>  	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
>  	 * possible that there is enough contiguous free space for this request.
> @@ -3394,35 +3398,43 @@ xfs_bmap_alloc_account(
>  	xfs_trans_mod_dquot_byino(ap->tp, ap->ip, fld, ap->length);
>  }
>  
> -static int
> +/*
> + * Calculate the extent start alignment and the extent length adjustments that
> + * constrain this allocation.
> + *
> + * Extent start alignment is currently determined by stripe configuration and is
> + * carried in args->alignment, whilst extent length adjustment is determined by
> + * extent size hints and is carried by args->prod and args->mod.
> + *
> + * Low level allocation code is free to either ignore or override these values
> + * as required.
> + */
> +static void
>  xfs_bmap_compute_alignments(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_extlen_t		align = 0; /* minimum allocation alignment */
> -	int			stripe_align = 0;
>  
>  	/* stripe alignment for allocation is determined by mount parameters */
>  	if (mp->m_swidth && xfs_has_swalloc(mp))
> -		stripe_align = mp->m_swidth;
> +		args->alignment = mp->m_swidth;
>  	else if (mp->m_dalign)
> -		stripe_align = mp->m_dalign;
> +		args->alignment = mp->m_dalign;
>  
>  	if (ap->flags & XFS_BMAPI_COWFORK)
>  		align = xfs_get_cowextsz_hint(ap->ip);
>  	else if (ap->datatype & XFS_ALLOC_USERDATA)
>  		align = xfs_get_extsz_hint(ap->ip);
> +
>  	if (align) {
>  		if (xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 0,
>  					ap->eof, 0, ap->conv, &ap->offset,
>  					&ap->length))
>  			ASSERT(0);
>  		ASSERT(ap->length);
> -	}
>  
> -	/* apply extent size hints if obtained earlier */
> -	if (align) {
>  		args->prod = align;
>  		div_u64_rem(ap->offset, args->prod, &args->mod);
>  		if (args->mod)
> @@ -3437,7 +3449,6 @@ xfs_bmap_compute_alignments(
>  			args->mod = args->prod - args->mod;
>  	}
>  
> -	return stripe_align;
>  }
>  
>  static void
> @@ -3509,7 +3520,7 @@ xfs_bmap_exact_minlen_extent_alloc(
>  	args.total = ap->total;
>  
>  	args.alignment = 1;
> -	args.minalignslop = 0;
> +	args.alignslop = 0;
>  
>  	args.minleft = ap->minleft;
>  	args.wasdel = ap->wasdel;
> @@ -3549,7 +3560,6 @@ xfs_bmap_btalloc_at_eof(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args,
>  	xfs_extlen_t		blen,
> -	int			stripe_align,
>  	bool			ag_only)
>  {
>  	struct xfs_mount	*mp = args->mp;
> @@ -3563,23 +3573,15 @@ xfs_bmap_btalloc_at_eof(
>  	 * allocation.
>  	 */
>  	if (ap->offset) {
> -		xfs_extlen_t	nextminlen = 0;
> +		xfs_extlen_t	alignment = args->alignment;
>  
>  		/*
> -		 * Compute the minlen+alignment for the next case.  Set slop so
> -		 * that the value of minlen+alignment+slop doesn't go up between
> -		 * the calls.
> +		 * Compute the alignment slop for the fallback path so we ensure
> +		 * we account for the potential alignment space required by the
> +		 * fallback paths before we modify the AGF and AGFL here.
>  		 */
>  		args->alignment = 1;
> -		if (blen > stripe_align && blen <= args->maxlen)
> -			nextminlen = blen - stripe_align;
> -		else
> -			nextminlen = args->minlen;
> -		if (nextminlen + stripe_align > args->minlen + 1)
> -			args->minalignslop = nextminlen + stripe_align -
> -					args->minlen - 1;
> -		else
> -			args->minalignslop = 0;
> +		args->alignslop = alignment - args->alignment;
>  
>  		if (!caller_pag)
>  			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> @@ -3597,19 +3599,8 @@ xfs_bmap_btalloc_at_eof(
>  		 * Exact allocation failed. Reset to try an aligned allocation
>  		 * according to the original allocation specification.
>  		 */
> -		args->alignment = stripe_align;
> -		args->minlen = nextminlen;
> -		args->minalignslop = 0;
> -	} else {
> -		/*
> -		 * Adjust minlen to try and preserve alignment if we
> -		 * can't guarantee an aligned maxlen extent.
> -		 */
> -		args->alignment = stripe_align;
> -		if (blen > args->alignment &&
> -		    blen <= args->maxlen + args->alignment)
> -			args->minlen = blen - args->alignment;
> -		args->minalignslop = 0;
> +		args->alignment = alignment;
> +		args->alignslop = 0;
>  	}
>  
>  	if (ag_only) {
> @@ -3627,9 +3618,8 @@ xfs_bmap_btalloc_at_eof(
>  		return 0;
>  
>  	/*
> -	 * Allocation failed, so turn return the allocation args to their
> -	 * original non-aligned state so the caller can proceed on allocation
> -	 * failure as if this function was never called.
> +	 * Aligned allocation failed, so all fallback paths from here drop the
> +	 * start alignment requirement as we know it will not succeed.
>  	 */
>  	args->alignment = 1;
>  	return 0;
> @@ -3637,7 +3627,9 @@ xfs_bmap_btalloc_at_eof(
>  
>  /*
>   * We have failed multiple allocation attempts so now are in a low space
> - * allocation situation. Try a locality first full filesystem minimum length
> + * allocation situation. We give up on any attempt at aligned allocation here.
> + *
> + * Try a locality first full filesystem minimum length
>   * allocation whilst still maintaining necessary total block reservation
>   * requirements.
>   *
> @@ -3654,6 +3646,7 @@ xfs_bmap_btalloc_low_space(
>  {
>  	int			error;
>  
> +	args->alignment = 1;
>  	if (args->minlen > ap->minlen) {
>  		args->minlen = ap->minlen;
>  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> @@ -3673,13 +3666,11 @@ xfs_bmap_btalloc_low_space(
>  static int
>  xfs_bmap_btalloc_filestreams(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	int			stripe_align)
> +	struct xfs_alloc_arg	*args)
>  {
>  	xfs_extlen_t		blen = 0;
>  	int			error = 0;
>  
> -
>  	error = xfs_filestream_select_ag(ap, args, &blen);
>  	if (error)
>  		return error;
> @@ -3698,8 +3689,7 @@ xfs_bmap_btalloc_filestreams(
>  
>  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
>  	if (ap->aeof)
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
> -				true);
> +		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
>  
>  	if (!error && args->fsbno == NULLFSBLOCK)
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> @@ -3723,8 +3713,7 @@ xfs_bmap_btalloc_filestreams(
>  static int
>  xfs_bmap_btalloc_best_length(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	int			stripe_align)
> +	struct xfs_alloc_arg	*args)
>  {
>  	xfs_extlen_t		blen = 0;
>  	int			error;
> @@ -3748,8 +3737,7 @@ xfs_bmap_btalloc_best_length(
>  	 * trying.
>  	 */
>  	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
> -				false);
> +		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
>  		if (error || args->fsbno != NULLFSBLOCK)
>  			return error;
>  	}
> @@ -3776,27 +3764,26 @@ xfs_bmap_btalloc(
>  		.resv		= XFS_AG_RESV_NONE,
>  		.datatype	= ap->datatype,
>  		.alignment	= 1,
> -		.minalignslop	= 0,
> +		.alignslop	= 0,
>  	};
>  	xfs_fileoff_t		orig_offset;
>  	xfs_extlen_t		orig_length;
>  	int			error;
> -	int			stripe_align;
>  
>  	ASSERT(ap->length);
>  	orig_offset = ap->offset;
>  	orig_length = ap->length;
>  
> -	stripe_align = xfs_bmap_compute_alignments(ap, &args);
> +	xfs_bmap_compute_alignments(ap, &args);
>  
>  	/* Trim the allocation back to the maximum an AG can fit. */
>  	args.maxlen = min(ap->length, mp->m_ag_max_usable);
>  
>  	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
>  	    xfs_inode_is_filestream(ap->ip))
> -		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
> +		error = xfs_bmap_btalloc_filestreams(ap, &args);
>  	else
> -		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
> +		error = xfs_bmap_btalloc_best_length(ap, &args);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 0af5b7a33d05..2fa29d2f004e 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -758,12 +758,12 @@ xfs_ialloc_ag_alloc(
>  		 *
>  		 * For an exact allocation, alignment must be 1,
>  		 * however we need to take cluster alignment into account when
> -		 * fixing up the freelist. Use the minalignslop field to
> -		 * indicate that extra blocks might be required for alignment,
> -		 * but not to use them in the actual exact allocation.
> +		 * fixing up the freelist. Use the alignslop field to indicate
> +		 * that extra blocks might be required for alignment, but not
> +		 * to use them in the actual exact allocation.
>  		 */
>  		args.alignment = 1;
> -		args.minalignslop = igeo->cluster_align - 1;
> +		args.alignslop = igeo->cluster_align - 1;
>  
>  		/* Allow space for the inode btree to split. */
>  		args.minleft = igeo->inobt_maxlevels;
> @@ -783,7 +783,7 @@ xfs_ialloc_ag_alloc(
>  		 * on, so reset minalignslop to ensure it is not included in
>  		 * subsequent requests.
>  		 */
> -		args.minalignslop = 0;
> +		args.alignslop = 0;
>  	}
>  
>  	if (unlikely(args.fsbno == NULLFSBLOCK)) {
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 5646d300b286..fb0c46d9a6d9 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1811,7 +1811,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		__field(xfs_extlen_t, minleft)
>  		__field(xfs_extlen_t, total)
>  		__field(xfs_extlen_t, alignment)
> -		__field(xfs_extlen_t, minalignslop)
> +		__field(xfs_extlen_t, alignslop)
>  		__field(xfs_extlen_t, len)
>  		__field(char, wasdel)
>  		__field(char, wasfromfl)
> @@ -1830,7 +1830,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		__entry->minleft = args->minleft;
>  		__entry->total = args->total;
>  		__entry->alignment = args->alignment;
> -		__entry->minalignslop = args->minalignslop;
> +		__entry->alignslop = args->alignslop;
>  		__entry->len = args->len;
>  		__entry->wasdel = args->wasdel;
>  		__entry->wasfromfl = args->wasfromfl;
> @@ -1839,7 +1839,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		__entry->highest_agno = args->tp->t_highest_agno;
>  	),
>  	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
> -		  "prod %u minleft %u total %u alignment %u minalignslop %u "
> +		  "prod %u minleft %u total %u alignment %u alignslop %u "
>  		  "len %u wasdel %d wasfromfl %d resv %d "
>  		  "datatype 0x%x highest_agno 0x%x",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> @@ -1852,7 +1852,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>  		  __entry->minleft,
>  		  __entry->total,
>  		  __entry->alignment,
> -		  __entry->minalignslop,
> +		  __entry->alignslop,
>  		  __entry->len,
>  		  __entry->wasdel,
>  		  __entry->wasfromfl,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 04/14] xfs: make EOF allocation simpler
  2024-08-01 16:30 ` [PATCH v3 04/14] xfs: make EOF allocation simpler John Garry
@ 2024-08-06 18:58   ` Darrick J. Wong
  2024-08-07  0:00     ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 18:58 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:47PM +0000, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently the allocation at EOF is broken into two cases - when the
> offset is zero and when the offset is non-zero. When the offset is
> non-zero, we try to do exact block allocation for contiguous
> extent allocation. When the offset is zero, the allocation is simply
> an aligned allocation.
> 
> We want aligned allocation as the fallback when exact block
> allocation fails, but that complicates the EOF allocation in that it
> now has to handle two different allocation cases. The
> caller also has to handle allocation when not at EOF, and for the
> upcoming forced alignment changes we need that to also be aligned
> allocation.
> 
> To simplify all this, pull the aligned allocation cases back into
> the callers and leave the EOF allocation path for exact block
> allocation only. This means that the EOF exact block allocation
> fallback path is the normal aligned allocation path and that ends up
> making things a lot simpler when forced alignment is introduced.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c   | 129 +++++++++++++++----------------------
>  fs/xfs/libxfs/xfs_ialloc.c |   2 +-
>  2 files changed, 54 insertions(+), 77 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 25a87e1154bb..42a75d257b35 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3310,12 +3310,12 @@ xfs_bmap_select_minlen(
>  static int
>  xfs_bmap_btalloc_select_lengths(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		*blen)
> +	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*pag;
>  	xfs_agnumber_t		agno, startag;
> +	xfs_extlen_t		blen = 0;
>  	int			error = 0;
>  
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> @@ -3329,19 +3329,18 @@ xfs_bmap_btalloc_select_lengths(
>  	if (startag == NULLAGNUMBER)
>  		startag = 0;
>  
> -	*blen = 0;
>  	for_each_perag_wrap(mp, startag, agno, pag) {
> -		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
> +		error = xfs_bmap_longest_free_extent(pag, args->tp, &blen);
>  		if (error && error != -EAGAIN)
>  			break;
>  		error = 0;
> -		if (*blen >= args->maxlen)
> +		if (blen >= args->maxlen)
>  			break;
>  	}
>  	if (pag)
>  		xfs_perag_rele(pag);
>  
> -	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
> +	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
>  	return error;
>  }
>  
> @@ -3551,78 +3550,40 @@ xfs_bmap_exact_minlen_extent_alloc(
>   * If we are not low on available data blocks and we are allocating at
>   * EOF, optimise allocation for contiguous file extension and/or stripe
>   * alignment of the new extent.
> - *
> - * NOTE: ap->aeof is only set if the allocation length is >= the
> - * stripe unit and the allocation offset is at the end of file.
>   */
>  static int
>  xfs_bmap_btalloc_at_eof(
>  	struct xfs_bmalloca	*ap,
> -	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		blen,
> -	bool			ag_only)
> +	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*caller_pag = args->pag;
> +	xfs_extlen_t		alignment = args->alignment;
>  	int			error;
>  
> +	ASSERT(ap->aeof && ap->offset);
> +	ASSERT(args->alignment >= 1);
> +
>  	/*
> -	 * If there are already extents in the file, try an exact EOF block
> -	 * allocation to extend the file as a contiguous extent. If that fails,
> -	 * or it's the first allocation in a file, just try for a stripe aligned
> -	 * allocation.
> +	 * Compute the alignment slop for the fallback path so we ensure
> +	 * we account for the potential alignemnt space required by the
> +	 * fallback paths before we modify the AGF and AGFL here.
>  	 */
> -	if (ap->offset) {
> -		xfs_extlen_t	alignment = args->alignment;
> -
> -		/*
> -		 * Compute the alignment slop for the fallback path so we ensure
> -		 * we account for the potential alignment space required by the
> -		 * fallback paths before we modify the AGF and AGFL here.
> -		 */
> -		args->alignment = 1;
> -		args->alignslop = alignment - args->alignment;
> -
> -		if (!caller_pag)
> -			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> -		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> -		if (!caller_pag) {
> -			xfs_perag_put(args->pag);
> -			args->pag = NULL;
> -		}
> -		if (error)
> -			return error;
> -
> -		if (args->fsbno != NULLFSBLOCK)
> -			return 0;
> -		/*
> -		 * Exact allocation failed. Reset to try an aligned allocation
> -		 * according to the original allocation specification.
> -		 */
> -		args->alignment = alignment;
> -		args->alignslop = 0;
> -	}
> +	args->alignment = 1;
> +	args->alignslop = alignment - args->alignment;
>  
> -	if (ag_only) {
> -		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> -	} else {
> +	if (!caller_pag)
> +		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> +	error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> +	if (!caller_pag) {
> +		xfs_perag_put(args->pag);
>  		args->pag = NULL;
> -		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> -		ASSERT(args->pag == NULL);
> -		args->pag = caller_pag;
>  	}
> -	if (error)
> -		return error;
>  
> -	if (args->fsbno != NULLFSBLOCK)
> -		return 0;
> -
> -	/*
> -	 * Aligned allocation failed, so all fallback paths from here drop the
> -	 * start alignment requirement as we know it will not succeed.
> -	 */
> -	args->alignment = 1;
> -	return 0;
> +	/* Reset alignment to original specifications.  */
> +	args->alignment = alignment;
> +	args->alignslop = 0;
> +	return error;
>  }
>  
>  /*
> @@ -3688,12 +3649,19 @@ xfs_bmap_btalloc_filestreams(
>  	}
>  
>  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> -	if (ap->aeof)
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
> +	if (ap->aeof && ap->offset)
> +		error = xfs_bmap_btalloc_at_eof(ap, args);
>  
> +	/* This may be an aligned allocation attempt. */
>  	if (!error && args->fsbno == NULLFSBLOCK)
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
>  
> +	/* Attempt non-aligned allocation if we haven't already. */
> +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		args->alignment = 1;
> +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);

Oops, I just replied to the v2 thread instead of this.

From
https://lore.kernel.org/linux-xfs/20240621203556.GU3058325@frogsfrogsfrogs/

Do we have to zero args->alignslop here?

--D

> +	}
> +
>  out_low_space:
>  	/*
>  	 * We are now done with the perag reference for the filestreams
> @@ -3715,7 +3683,6 @@ xfs_bmap_btalloc_best_length(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args)
>  {
> -	xfs_extlen_t		blen = 0;
>  	int			error;
>  
>  	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
> @@ -3726,23 +3693,33 @@ xfs_bmap_btalloc_best_length(
>  	 * the request.  If one isn't found, then adjust the minimum allocation
>  	 * size to the largest space found.
>  	 */
> -	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
> +	error = xfs_bmap_btalloc_select_lengths(ap, args);
>  	if (error)
>  		return error;
>  
>  	/*
> -	 * Don't attempt optimal EOF allocation if previous allocations barely
> -	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
> -	 * optimal or even aligned allocations in this case, so don't waste time
> -	 * trying.
> +	 * If we are in low space mode, then optimal allocation will fail so
> +	 * prepare for minimal allocation and run the low space algorithm
> +	 * immediately.
>  	 */
> -	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, false);
> -		if (error || args->fsbno != NULLFSBLOCK)
> -			return error;
> +	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +		ASSERT(args->fsbno == NULLFSBLOCK);
> +		return xfs_bmap_btalloc_low_space(ap, args);
> +	}
> +
> +	if (ap->aeof && ap->offset)
> +		error = xfs_bmap_btalloc_at_eof(ap, args);
> +
> +	/* This may be an aligned allocation attempt. */
> +	if (!error && args->fsbno == NULLFSBLOCK)
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> +
> +	/* Attempt non-aligned allocation if we haven't already. */
> +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> +		args->alignment = 1;
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	}
>  
> -	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error || args->fsbno != NULLFSBLOCK)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 2fa29d2f004e..c5d220d51757 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -780,7 +780,7 @@ xfs_ialloc_ag_alloc(
>  		 * the exact agbno requirement and increase the alignment
>  		 * instead. It is critical that the total size of the request
>  		 * (len + alignment + slop) does not increase from this point
> -		 * on, so reset minalignslop to ensure it is not included in
> +		 * on, so reset alignslop to ensure it is not included in
>  		 * subsequent requests.
>  		 */
>  		args.alignslop = 0;
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag
  2024-08-01 16:30 ` [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
@ 2024-08-06 19:02   ` Darrick J. Wong
  2024-08-07 11:42     ` John Garry
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 19:02 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:50PM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Add a new inode flag to require that all file data extent mappings must
> be aligned (both the file offset range and the allocated space itself)
> to the extent size hint.  Having a separate COW extent size hint is no
> longer allowed.
> 
> The goal here is to enable sysadmins and users to mandate that all space
> mappings in a file must have a startoff/blockcount that are aligned to
> (say) a 2MB alignment and that the startblock/blockcount will follow the
> same alignment.
> 
> Allocated space will be aligned to start of the AG, and not necessarily
> aligned with disk blocks. The upcoming atomic writes feature will rely and
> forcealign and will also require allocated space will also be aligned to
> disk blocks.
> 
> reflink will not be supported for forcealign yet, so disallow a mount under
> this condition. This is because we have the limitation of pageache
> writeback not knowing how to writeback an entire allocation unut, so
> reject a mount with relink.
> 
> RT vol will not be supported for forcealign yet, so disallow a mount under
> this condition. It will be possible to support RT vol and forcealign in
> future. For this, the inode extsize must be a multiple of rtextsize - this
> is enforced already in xfs_ioctl_setattr_check_extsize() and
> xfs_inode_validate_extsize().
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> Co-developed-by: John Garry <john.g.garry@oracle.com>
> [jpg: many changes from orig, including forcealign inode verification
>  rework, ioctl setattr rework disallow reflink a forcealign inode,
>  disallow mount for forcealign + reflink or rt]
> Signed-off-by: John Garry <john.g.garry@oracle.com>

This patch looks ready to me but as I'm the original author I cannot add
a RVB tag.  Someone else needs to add that -- frankly, John is the best
candidate because he grabbed my patch into his tree and actually
modified it to do what he wants, which means he's the most familiar with
it.

--D

> ---
>  fs/xfs/libxfs/xfs_format.h     |  6 ++++-
>  fs/xfs/libxfs/xfs_inode_buf.c  | 46 ++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_inode_buf.h  |  3 +++
>  fs/xfs/libxfs/xfs_inode_util.c | 14 +++++++++++
>  fs/xfs/libxfs/xfs_sb.c         |  2 ++
>  fs/xfs/xfs_inode.h             |  8 +++++-
>  fs/xfs/xfs_ioctl.c             | 46 ++++++++++++++++++++++++++++++++--
>  fs/xfs/xfs_mount.h             |  2 ++
>  fs/xfs/xfs_reflink.c           |  5 ++--
>  fs/xfs/xfs_super.c             | 18 +++++++++++++
>  include/uapi/linux/fs.h        |  2 ++
>  11 files changed, 146 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index e1bfee0c3b1a..95f5259c4255 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -352,6 +352,7 @@ xfs_sb_has_compat_feature(
>  #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>  #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>  #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
> +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
>  #define XFS_SB_FEAT_RO_COMPAT_ALL \
>  		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>  		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
> @@ -1093,16 +1094,19 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
>  #define XFS_DIFLAG2_COWEXTSIZE_BIT   2  /* copy on write extent size hint */
>  #define XFS_DIFLAG2_BIGTIME_BIT	3	/* big timestamps */
>  #define XFS_DIFLAG2_NREXT64_BIT 4	/* large extent counters */
> +/* data extent mappings for regular files must be aligned to extent size hint */
> +#define XFS_DIFLAG2_FORCEALIGN_BIT 5
>  
>  #define XFS_DIFLAG2_DAX		(1 << XFS_DIFLAG2_DAX_BIT)
>  #define XFS_DIFLAG2_REFLINK     (1 << XFS_DIFLAG2_REFLINK_BIT)
>  #define XFS_DIFLAG2_COWEXTSIZE  (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
>  #define XFS_DIFLAG2_BIGTIME	(1 << XFS_DIFLAG2_BIGTIME_BIT)
>  #define XFS_DIFLAG2_NREXT64	(1 << XFS_DIFLAG2_NREXT64_BIT)
> +#define XFS_DIFLAG2_FORCEALIGN	(1 << XFS_DIFLAG2_FORCEALIGN_BIT)
>  
>  #define XFS_DIFLAG2_ANY \
>  	(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \
> -	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64)
> +	 XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_FORCEALIGN)
>  
>  static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip)
>  {
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 513b50da6215..1c59891fa9e2 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -657,6 +657,15 @@ xfs_dinode_verify(
>  	    !xfs_has_bigtime(mp))
>  		return __this_address;
>  
> +	if (flags2 & XFS_DIFLAG2_FORCEALIGN) {
> +		fa = xfs_inode_validate_forcealign(mp,
> +				be32_to_cpu(dip->di_extsize),
> +				be32_to_cpu(dip->di_cowextsize),
> +				mode, flags, flags2);
> +		if (fa)
> +			return fa;
> +	}
> +
>  	return NULL;
>  }
>  
> @@ -824,3 +833,40 @@ xfs_inode_validate_cowextsize(
>  
>  	return NULL;
>  }
> +
> +/* Validate the forcealign inode flag */
> +xfs_failaddr_t
> +xfs_inode_validate_forcealign(
> +	struct xfs_mount	*mp,
> +	uint32_t		extsize,
> +	uint32_t		cowextsize,
> +	uint16_t		mode,
> +	uint16_t		flags,
> +	uint64_t		flags2)
> +{
> +	/* superblock rocompat feature flag */
> +	if (!xfs_has_forcealign(mp))
> +		return __this_address;
> +
> +	/* Only regular files and directories */
> +	if (!S_ISDIR(mode) && !S_ISREG(mode))
> +		return __this_address;
> +
> +	/* We require EXTSIZE or EXTSZINHERIT */
> +	if (!(flags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT)))
> +		return __this_address;
> +
> +	/* We require a non-zero extsize */
> +	if (!extsize)
> +		return __this_address;
> +
> +	/* COW extsize disallowed */
> +	if (flags2 & XFS_DIFLAG2_COWEXTSIZE)
> +		return __this_address;
> +
> +	/* cowextsize must be zero */
> +	if (cowextsize)
> +		return __this_address;
> +
> +	return NULL;
> +}
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h
> index 585ed5a110af..b8b65287b037 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.h
> +++ b/fs/xfs/libxfs/xfs_inode_buf.h
> @@ -33,6 +33,9 @@ xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp,
>  xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp,
>  		uint32_t cowextsize, uint16_t mode, uint16_t flags,
>  		uint64_t flags2);
> +xfs_failaddr_t xfs_inode_validate_forcealign(struct xfs_mount *mp,
> +		uint32_t extsize, uint32_t cowextsize, uint16_t mode,
> +		uint16_t flags, uint64_t flags2);
>  
>  static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
>  {
> diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
> index 032333289113..b264939d8855 100644
> --- a/fs/xfs/libxfs/xfs_inode_util.c
> +++ b/fs/xfs/libxfs/xfs_inode_util.c
> @@ -80,6 +80,8 @@ xfs_flags2diflags2(
>  		di_flags2 |= XFS_DIFLAG2_DAX;
>  	if (xflags & FS_XFLAG_COWEXTSIZE)
>  		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
> +	if (xflags & FS_XFLAG_FORCEALIGN)
> +		di_flags2 |= XFS_DIFLAG2_FORCEALIGN;
>  
>  	return di_flags2;
>  }
> @@ -126,6 +128,8 @@ xfs_ip2xflags(
>  			flags |= FS_XFLAG_DAX;
>  		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
>  			flags |= FS_XFLAG_COWEXTSIZE;
> +		if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
> +			flags |= FS_XFLAG_FORCEALIGN;
>  	}
>  
>  	if (xfs_inode_has_attr_fork(ip))
> @@ -224,6 +228,8 @@ xfs_inode_inherit_flags2(
>  	}
>  	if (pip->i_diflags2 & XFS_DIFLAG2_DAX)
>  		ip->i_diflags2 |= XFS_DIFLAG2_DAX;
> +	if (pip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)
> +		ip->i_diflags2 |= XFS_DIFLAG2_FORCEALIGN;
>  
>  	/* Don't let invalid cowextsize hints propagate. */
>  	failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize,
> @@ -232,6 +238,14 @@ xfs_inode_inherit_flags2(
>  		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
>  		ip->i_cowextsize = 0;
>  	}
> +	if (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN) {
> +		failaddr = xfs_inode_validate_forcealign(ip->i_mount,
> +				ip->i_extsize, ip->i_cowextsize,
> +				VFS_I(ip)->i_mode, ip->i_diflags,
> +				ip->i_diflags2);
> +		if (failaddr)
> +			ip->i_diflags2 &= ~XFS_DIFLAG2_FORCEALIGN;
> +	}
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 6b56f0f6d4c1..e56911553edd 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -164,6 +164,8 @@ xfs_sb_version_to_features(
>  		features |= XFS_FEAT_REFLINK;
>  	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
>  		features |= XFS_FEAT_INOBTCNT;
> +	if (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
> +		features |= XFS_FEAT_FORCEALIGN;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_FTYPE)
>  		features |= XFS_FEAT_FTYPE;
>  	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_SPINODES)
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index bf0f4f8b9e64..3e7664ec4d6c 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -312,7 +312,13 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>  
>  static inline bool xfs_inode_has_forcealign(struct xfs_inode *ip)
>  {
> -	return false;
> +	if (!(ip->i_diflags & XFS_DIFLAG_EXTSIZE))
> +		return false;
> +	if (ip->i_extsize <= 1)
> +		return false;
> +	if (xfs_is_cow_inode(ip))
> +		return false;
> +	return ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 4e933db75b12..7a6757a4d2bd 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -469,6 +469,39 @@ xfs_fileattr_get(
>  	return 0;
>  }
>  
> +/*
> + * Forcealign requires a non-zero extent size hint and a zero cow
> + * extent size hint.
> + */
> +static int
> +xfs_ioctl_setattr_forcealign(
> +	struct xfs_inode	*ip,
> +	struct fileattr		*fa)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (!xfs_has_forcealign(mp))
> +		return -EINVAL;
> +
> +	if (xfs_is_reflink_inode(ip))
> +		return -EINVAL;
> +
> +	if (!(fa->fsx_xflags & (FS_XFLAG_EXTSIZE |
> +				FS_XFLAG_EXTSZINHERIT)))
> +		return -EINVAL;
> +
> +	if (fa->fsx_xflags & FS_XFLAG_COWEXTSIZE)
> +		return -EINVAL;
> +
> +	if (!fa->fsx_extsize)
> +		return -EINVAL;
> +
> +	if (fa->fsx_cowextsize)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int
>  xfs_ioctl_setattr_xflags(
>  	struct xfs_trans	*tp,
> @@ -477,10 +510,13 @@ xfs_ioctl_setattr_xflags(
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	bool			rtflag = (fa->fsx_xflags & FS_XFLAG_REALTIME);
> +	bool			forcealign = fa->fsx_xflags & FS_XFLAG_FORCEALIGN;
>  	uint64_t		i_flags2;
> +	int			error;
>  
> -	if (rtflag != XFS_IS_REALTIME_INODE(ip)) {
> -		/* Can't change realtime flag if any extents are allocated. */
> +	/* Can't change RT or forcealign flags if any extents are allocated. */
> +	if (rtflag != XFS_IS_REALTIME_INODE(ip) ||
> +	    forcealign != xfs_inode_has_forcealign(ip)) {
>  		if (ip->i_df.if_nextents || ip->i_delayed_blks)
>  			return -EINVAL;
>  	}
> @@ -501,6 +537,12 @@ xfs_ioctl_setattr_xflags(
>  	if (i_flags2 && !xfs_has_v3inodes(mp))
>  		return -EINVAL;
>  
> +	if (forcealign) {
> +		error = xfs_ioctl_setattr_forcealign(ip, fa);
> +		if (error)
> +			return error;
> +	}
> +
>  	ip->i_diflags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  	ip->i_diflags2 = i_flags2;
>  
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index d0567dfbc036..30228fea908d 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -299,6 +299,7 @@ typedef struct xfs_mount {
>  #define XFS_FEAT_NEEDSREPAIR	(1ULL << 25)	/* needs xfs_repair */
>  #define XFS_FEAT_NREXT64	(1ULL << 26)	/* large extent counters */
>  #define XFS_FEAT_EXCHANGE_RANGE	(1ULL << 27)	/* exchange range */
> +#define XFS_FEAT_FORCEALIGN	(1ULL << 28)	/* aligned file data extents */
>  
>  /* Mount features */
>  #define XFS_FEAT_NOATTR2	(1ULL << 48)	/* disable attr2 creation */
> @@ -385,6 +386,7 @@ __XFS_ADD_V4_FEAT(projid32, PROJID32)
>  __XFS_HAS_V4_FEAT(v3inodes, V3INODES)
>  __XFS_HAS_V4_FEAT(crc, CRC)
>  __XFS_HAS_V4_FEAT(pquotino, PQUOTINO)
> +__XFS_HAS_FEAT(forcealign, FORCEALIGN)
>  
>  /*
>   * Mount features
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 6fde6ec8092f..a836bfec7878 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -1467,8 +1467,9 @@ xfs_reflink_remap_prep(
>  
>  	/* Check file eligibility and prepare for block sharing. */
>  	ret = -EINVAL;
> -	/* Don't reflink realtime inodes */
> -	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
> +	/* Don't reflink realtime or forcealign inodes */
> +	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest) ||
> +	    xfs_inode_has_forcealign(src) || xfs_inode_has_forcealign(dest))
>  		goto out_unlock;
>  
>  	/* Don't share DAX file data with non-DAX file. */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 27e9f749c4c7..b52a01b50387 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1729,12 +1729,30 @@ xfs_fs_fill_super(
>  			goto out_filestream_unmount;
>  		}
>  
> +		if (xfs_has_forcealign(mp)) {
> +			xfs_alert(mp,
> +	"reflink not compatible with forcealign!");
> +			error = -EINVAL;
> +			goto out_filestream_unmount;
> +		}
> +
>  		if (xfs_globals.always_cow) {
>  			xfs_info(mp, "using DEBUG-only always_cow mode.");
>  			mp->m_always_cow = true;
>  		}
>  	}
>  
> +	if (xfs_has_forcealign(mp)) {
> +		if (xfs_has_realtime(mp)) {
> +			xfs_alert(mp,
> +		"forcealign not supported for realtime device!");
> +			error = -EINVAL;
> +			goto out_filestream_unmount;
> +		}
> +		xfs_warn(mp,
> +"EXPERIMENTAL forced data extent alignment feature in use. Use at your own risk!");
> +	}
> +
>  	if (xfs_has_rmapbt(mp) && mp->m_sb.sb_rblocks) {
>  		xfs_alert(mp,
>  	"reverse mapping btree not compatible with realtime device!");
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 753971770733..f55d650f904a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -158,6 +158,8 @@ struct fsxattr {
>  #define FS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
>  #define FS_XFLAG_DAX		0x00008000	/* use DAX for IO */
>  #define FS_XFLAG_COWEXTSIZE	0x00010000	/* CoW extent size allocator hint */
> +/* data extent mappings for regular files must be aligned to extent size hint */
> +#define FS_XFLAG_FORCEALIGN	0x00020000
>  #define FS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
>  
>  /* the read-only stuff doesn't really belong here, but any other place is
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign
  2024-08-01 16:30 ` [PATCH v3 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
@ 2024-08-06 19:02   ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 19:02 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:51PM +0000, John Garry wrote:
> For forcealign enabled, the allocation unit size is in ip->i_extsize, and
> this must never be zero.
> 
> Add helper xfs_inode_alloc_fsbsize() to return alloc unit in FSBs, and use
> it in xfs_inode_alloc_unitsize().
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_inode.c | 17 +++++++++++++----
>  fs/xfs/xfs_inode.h |  1 +
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 7dc6f326936c..5af12f35062d 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3107,17 +3107,26 @@ xfs_break_layouts(
>  	return error;
>  }
>  
> -/* Returns the size of fundamental allocation unit for a file, in bytes. */
>  unsigned int
> -xfs_inode_alloc_unitsize(
> +xfs_inode_alloc_fsbsize(
>  	struct xfs_inode	*ip)
>  {
>  	unsigned int		blocks = 1;
>  
> -	if (XFS_IS_REALTIME_INODE(ip))
> +	if (xfs_inode_has_forcealign(ip))
> +		blocks = ip->i_extsize;
> +	else if (XFS_IS_REALTIME_INODE(ip))
>  		blocks = ip->i_mount->m_sb.sb_rextsize;
>  
> -	return XFS_FSB_TO_B(ip->i_mount, blocks);
> +	return blocks;
> +}
> +
> +/* Returns the size of fundamental allocation unit for a file, in bytes. */
> +unsigned int
> +xfs_inode_alloc_unitsize(
> +	struct xfs_inode	*ip)
> +{
> +	return XFS_FSB_TO_B(ip->i_mount, xfs_inode_alloc_fsbsize(ip));
>  }
>  
>  /* Should we always be using copy on write for file writes? */
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 3e7664ec4d6c..158afad8c7a4 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -641,6 +641,7 @@ int xfs_inode_reload_unlinked(struct xfs_inode *ip);
>  bool xfs_ifork_zapped(const struct xfs_inode *ip, int whichfork);
>  void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
>  		xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
> +unsigned int xfs_inode_alloc_fsbsize(struct xfs_inode *ip);
>  unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
>  
>  int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 09/14] xfs: Update xfs_setattr_size() for forcealign
  2024-08-01 16:30 ` [PATCH v3 09/14] xfs: Update xfs_setattr_size() " John Garry
@ 2024-08-06 19:03   ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 19:03 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:52PM +0000, John Garry wrote:
> For when an inode has forcealign, reserve blocks for same reason which we
> were doing for big RT alloc.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_iops.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 1cdc8034f54d..6e017aa6f61d 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -926,12 +926,12 @@ xfs_setattr_size(
>  	}
>  
>  	/*
> -	 * For realtime inode with more than one block rtextsize, we need the
> +	 * For inodes with more than one block alloc unitsize, we need the
>  	 * block reservation for bmap btree block allocations/splits that can
>  	 * happen since it could split the tail written extent and convert the
>  	 * right beyond EOF one to unwritten.
>  	 */
> -	if (xfs_inode_has_bigrtalloc(ip))
> +	if (xfs_inode_alloc_fsbsize(ip) > 1)
>  		resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
>  
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 10/14] xfs: Do not free EOF blocks for forcealign
  2024-08-01 16:30 ` [PATCH v3 10/14] xfs: Do not free EOF blocks " John Garry
@ 2024-08-06 19:24   ` Darrick J. Wong
  2024-08-07 12:33     ` John Garry
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 19:24 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:53PM +0000, John Garry wrote:
> For when forcealign is enabled, we want the EOF to be aligned as well, so
> do not free EOF blocks.
> 
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> #earlier version
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/xfs_bmap_util.c |  7 +++++--
>  fs/xfs/xfs_inode.c     | 14 ++++++++++++++
>  fs/xfs/xfs_inode.h     |  2 ++
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index fe2e2c930975..60389ac8bd45 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -496,6 +496,7 @@ xfs_can_free_eofblocks(
>  	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_fileoff_t		end_fsb;
>  	xfs_fileoff_t		last_fsb;
> +	xfs_fileoff_t		dummy_fsb;
>  	int			nimaps = 1;
>  	int			error;
>  
> @@ -537,8 +538,10 @@ xfs_can_free_eofblocks(
>  	 * forever.
>  	 */
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
> -	if (xfs_inode_has_bigrtalloc(ip))
> -		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
> +
> +	/* Only try to free beyond the allocation unit that crosses EOF */
> +	xfs_roundout_to_alloc_fsbsize(ip, &dummy_fsb, &end_fsb);
> +
>  	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
>  	if (last_fsb <= end_fsb)
>  		return false;
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 5af12f35062d..d765dedebc15 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3129,6 +3129,20 @@ xfs_inode_alloc_unitsize(
>  	return XFS_FSB_TO_B(ip->i_mount, xfs_inode_alloc_fsbsize(ip));
>  }
>  
> +void
> +xfs_roundout_to_alloc_fsbsize(
> +	struct xfs_inode	*ip,
> +	xfs_fileoff_t		*start,
> +	xfs_fileoff_t		*end)
> +{
> +	unsigned int		blocks = xfs_inode_alloc_fsbsize(ip);
> +
> +	if (blocks == 1)
> +		return;
> +	*start = rounddown_64(*start, blocks);
> +	*end = roundup_64(*end, blocks);
> +}

This is probably going to start another round of shouting, but I think
it's silly to do two rounding operations when you only care about one
value.  In patch 12 it results in a bunch more dummy variables that you
then ignore.

Can't this be:

static inline xfs_fileoff_t
xfs_inode_rounddown_alloc_unit(
	struct xfs_inode	*ip,
	xfs_fileoff		off)
{
	unsigned int		rounding = xfs_inode_alloc_fsbsize(ip);

	if (rounding == 1)
		return off;
	return rounddown_64(off, rounding);
}

static inline xfs_fileoff_t
xfs_inode_roundup_alloc_unit(
	struct xfs_inode	*ip,
	xfs_fileoff		off)
{
	unsigned int		rounding = xfs_inode_alloc_fsbsize(ip);

	if (rounding == 1)
		return off;
	return roundup_64(off, rounding);
}

Then that callsite can be:

	end_fsb = xfs_inode_roundup_alloc_unit(ip,
			XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)));

--D

> +
>  /* Should we always be using copy on write for file writes? */
>  bool
>  xfs_is_always_cow_inode(
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 158afad8c7a4..7f86c4781bd8 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -643,6 +643,8 @@ void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip,
>  		xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
>  unsigned int xfs_inode_alloc_fsbsize(struct xfs_inode *ip);
>  unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
> +void xfs_roundout_to_alloc_fsbsize(struct xfs_inode *ip,
> +		xfs_fileoff_t *start, xfs_fileoff_t *end);
>  
>  int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
>  		struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 11/14] xfs: Only free full extents for forcealign
  2024-08-01 16:30 ` [PATCH v3 11/14] xfs: Only free full extents " John Garry
@ 2024-08-06 19:27   ` Darrick J. Wong
  2024-08-07  0:08     ` Dave Chinner
  2024-08-07 13:06     ` John Garry
  0 siblings, 2 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 19:27 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:54PM +0000, John Garry wrote:
> Like we already do for rtvol, only free full extents for forcealign in
> xfs_free_file_space().
> 
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> #earlier version
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/xfs_bmap_util.c |  7 ++-----
>  fs/xfs/xfs_inode.c     | 14 ++++++++++++++
>  fs/xfs/xfs_inode.h     |  2 ++
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 60389ac8bd45..46eebecd7bba 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -854,11 +854,8 @@ xfs_free_file_space(
>  	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
>  	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
>  
> -	/* We can only free complete realtime extents. */
> -	if (xfs_inode_has_bigrtalloc(ip)) {
> -		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
> -		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
> -	}
> +	/* Free only complete extents. */
> +	xfs_roundin_to_alloc_fsbsize(ip, &startoffset_fsb, &endoffset_fsb);

...and then this becomes:

	/* We can only free complete allocation units. */
	startoffset_fsb = xfs_inode_roundup_alloc_unit(ip, startoffset_fsb);
	endoffset_fsb = xfs_inode_rounddown_alloc_unit(ip, endoffset_fsb);

I guess "roundout" means "extend start and end to fit allocation unit"
whereas "roundin" means "shrink start and end to fit allocation unit"?

--D

>  
>  	/*
>  	 * Need to zero the stuff we're not freeing, on disk.
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index d765dedebc15..e7fa155fcbde 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3143,6 +3143,20 @@ xfs_roundout_to_alloc_fsbsize(
>  	*end = roundup_64(*end, blocks);
>  }
>  
> +void
> +xfs_roundin_to_alloc_fsbsize(
> +	struct xfs_inode	*ip,
> +	xfs_fileoff_t		*start,
> +	xfs_fileoff_t		*end)
> +{
> +	unsigned int		blocks = xfs_inode_alloc_fsbsize(ip);
> +
> +	if (blocks == 1)
> +		return;
> +	*start = roundup_64(*start, blocks);
> +	*end = rounddown_64(*end, blocks);
> +}
> +
>  /* Should we always be using copy on write for file writes? */
>  bool
>  xfs_is_always_cow_inode(
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 7f86c4781bd8..6dd8055c98b3 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -645,6 +645,8 @@ unsigned int xfs_inode_alloc_fsbsize(struct xfs_inode *ip);
>  unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip);
>  void xfs_roundout_to_alloc_fsbsize(struct xfs_inode *ip,
>  		xfs_fileoff_t *start, xfs_fileoff_t *end);
> +void xfs_roundin_to_alloc_fsbsize(struct xfs_inode *ip,
> +		xfs_fileoff_t *start, xfs_fileoff_t *end);
>  
>  int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
>  		struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 14/14] xfs: Enable file data forcealign feature
  2024-08-01 16:30 ` [PATCH v3 14/14] xfs: Enable file data forcealign feature John Garry
@ 2024-08-06 19:43   ` Darrick J. Wong
  2024-08-07 13:50     ` John Garry
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 19:43 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:57PM +0000, John Garry wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Enable this feature.
> 
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: John Garry <john.g.garry@oracle.com>

You really ought to add your own Reviewed-by tag here...

--D

> ---
>  fs/xfs/libxfs/xfs_format.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 95f5259c4255..04c6cbc943c2 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -357,7 +357,8 @@ xfs_sb_has_compat_feature(
>  		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>  		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
>  		 XFS_SB_FEAT_RO_COMPAT_REFLINK| \
> -		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)
> +		 XFS_SB_FEAT_RO_COMPAT_INOBTCNT | \
> +		 XFS_SB_FEAT_RO_COMPAT_FORCEALIGN)
>  #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
>  static inline bool
>  xfs_sb_has_ro_compat_feature(
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 12/14] xfs: Unmap blocks according to forcealign
  2024-08-01 16:30 ` [PATCH v3 12/14] xfs: Unmap blocks according to forcealign John Garry
@ 2024-08-06 20:14   ` Darrick J. Wong
  2024-08-07 13:40     ` John Garry
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-06 20:14 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Thu, Aug 01, 2024 at 04:30:55PM +0000, John Garry wrote:
> For when forcealign is enabled, blocks in an inode need to be unmapped
> according to extent alignment, like what is already done for rtvol.
> 
> Change variable isrt in __xfs_bunmapi() to a bool, as that is really what
> it is.
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 48 +++++++++++++++++++++++++++++-----------
>  fs/xfs/xfs_inode.c       | 16 ++++++++++++++
>  fs/xfs/xfs_inode.h       |  2 ++
>  3 files changed, 53 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0c3df8c71c6d..d6ae344a17fc 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5409,6 +5409,25 @@ xfs_bmap_del_extent_real(
>  	return 0;
>  }
>  
> +static xfs_extlen_t
> +xfs_bunmapi_align(
> +	struct xfs_inode	*ip,
> +	xfs_fsblock_t		fsbno,
> +	xfs_extlen_t *off)
> +{
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		return xfs_inode_alloc_fsbsize_align(ip, fsbno, off);
> +	/*
> +	 * The agbno for the fsbno is aligned to extsize, but the fsbno itself
> +	 * is not necessarily aligned (to extsize), so use agbno to determine
> +	 * mod+offset to the alloc unit boundary.
> +	 */
> +	return xfs_inode_alloc_fsbsize_align(ip, XFS_FSB_TO_AGBNO(mp, fsbno),
> +					off);
> +}
> +
>  /*
>   * Unmap (remove) blocks from a file.
>   * If nexts is nonzero then the number of extents to remove is limited to
> @@ -5430,7 +5449,8 @@ __xfs_bunmapi(
>  	xfs_extnum_t		extno;		/* extent number in list */
>  	struct xfs_bmbt_irec	got;		/* current extent record */
>  	struct xfs_ifork	*ifp;		/* inode fork pointer */
> -	int			isrt;		/* freeing in rt area */
> +	bool			isrt;		/* freeing in rt area */
> +	bool			isforcealign;	/* forcealign inode */
>  	int			logflags;	/* transaction logging flags */
>  	xfs_extlen_t		mod;		/* rt extent offset */
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -5468,6 +5488,8 @@ __xfs_bunmapi(
>  	}
>  	XFS_STATS_INC(mp, xs_blk_unmap);
>  	isrt = xfs_ifork_is_realtime(ip, whichfork);
> +	isforcealign = (whichfork != XFS_ATTR_FORK) &&
> +			xfs_inode_has_forcealign(ip);
>  	end = start + len;
>  
>  	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
> @@ -5486,6 +5508,8 @@ __xfs_bunmapi(
>  	extno = 0;
>  	while (end != (xfs_fileoff_t)-1 && end >= start &&
>  	       (nexts == 0 || extno < nexts)) {
> +		xfs_extlen_t off;

I got really confused because I thought this was a file block offset and
only after more digging realized that this is a sometimes dummy
adjustment variable.

> +
>  		/*
>  		 * Is the found extent after a hole in which end lives?
>  		 * Just back up to the previous extent, if so.
> @@ -5519,18 +5543,18 @@ __xfs_bunmapi(
>  		if (del.br_startoff + del.br_blockcount > end + 1)
>  			del.br_blockcount = end + 1 - del.br_startoff;
>  
> -		if (!isrt || (flags & XFS_BMAPI_REMAP))
> +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
>  			goto delete;
>  
> -		mod = xfs_rtb_to_rtxoff(mp,
> -				del.br_startblock + del.br_blockcount);
> +		mod = xfs_bunmapi_align(ip,
> +				del.br_startblock + del.br_blockcount, &off);
>  		if (mod) {

Oof.  I don't like how this loop body has the rtx adjustment code
inlined into it.  We only use the isrt flag for the one test above.
I tried hoisting this into something less gross involving separate
adjustment functions but then you have to pass in so many outer
variables that it becomes a mess.

The best I can come up with for now is:

	unsigned int		alloc_fsb = xfs_inode_alloc_fsbsize(ip);
	/* no more isrt/isforcealign bools */

...

		if (alloc_fsb == 1 || (flags & XFS_BMAPI_REMAP))
			goto delete;

		mod = do_div(del.br_startblock + del.br_blockcount,
				alloc_fsb);
		if (mod) {

>  			/*
> -			 * Realtime extent not lined up at the end.
> +			 * Not aligned to allocation unit on the end.
>  			 * The extent could have been split into written
>  			 * and unwritten pieces, or we could just be
>  			 * unmapping part of it.  But we can't really
> -			 * get rid of part of a realtime extent.
> +			 * get rid of part of an extent.
>  			 */
>  			if (del.br_state == XFS_EXT_UNWRITTEN) {
>  				/*
> @@ -5554,7 +5578,7 @@ __xfs_bunmapi(
>  			ASSERT(del.br_state == XFS_EXT_NORM);
>  			ASSERT(tp->t_blk_res > 0);
>  			/*
> -			 * If this spans a realtime extent boundary,
> +			 * If this spans an extent boundary,
>  			 * chop it back to the start of the one we end at.
>  			 */
>  			if (del.br_blockcount > mod) {
> @@ -5571,14 +5595,12 @@ __xfs_bunmapi(
>  			goto nodelete;
>  		}
>  
> -		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
> +		mod = xfs_bunmapi_align(ip, del.br_startblock, &off);
>  		if (mod) {
> -			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;

		mod = do_div(del.br_startblock, alloc_fsb);
		if (mod) {
			xfs_extlen_t off = alloc_fsb - mod;

At least then you don't need this weird xfs_inode_alloc_fsbsize_align
that passes back two xfs_extlen_t arguments.

--D

> -
>  			/*
> -			 * Realtime extent is lined up at the end but not
> -			 * at the front.  We'll get rid of full extents if
> -			 * we can.
> +			 * Extent is lined up to the allocation unit at the
> +			 * end but not at the front.  We'll get rid of full
> +			 * extents if we can.
>  			 */
>  			if (del.br_blockcount > off) {
>  				del.br_blockcount -= off;
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index e7fa155fcbde..bb8abf990186 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3164,3 +3164,19 @@ xfs_is_always_cow_inode(
>  {
>  	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
>  }
> +
> +/* Return mod+offset for a blkno to an extent boundary */
> +xfs_extlen_t
> +xfs_inode_alloc_fsbsize_align(
> +	struct xfs_inode	*ip,
> +	xfs_fileoff_t		blkno,
> +	xfs_extlen_t		*off)
> +{
> +	xfs_fileoff_t		blkno_start = blkno;
> +	xfs_fileoff_t		blkno_end = blkno;
> +
> +	xfs_roundout_to_alloc_fsbsize(ip, &blkno_start, &blkno_end);
> +
> +	*off = blkno_end - blkno;
> +	return blkno - blkno_start;
> +}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 6dd8055c98b3..7b77797c3943 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -647,6 +647,8 @@ void xfs_roundout_to_alloc_fsbsize(struct xfs_inode *ip,
>  		xfs_fileoff_t *start, xfs_fileoff_t *end);
>  void xfs_roundin_to_alloc_fsbsize(struct xfs_inode *ip,
>  		xfs_fileoff_t *start, xfs_fileoff_t *end);
> +xfs_extlen_t xfs_inode_alloc_fsbsize_align(struct xfs_inode *ip,
> +		xfs_fileoff_t blkno, xfs_extlen_t *off);
>  
>  int xfs_icreate_dqalloc(const struct xfs_icreate_args *args,
>  		struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp,
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 03/14] xfs: simplify extent allocation alignment
  2024-08-06 18:56   ` Darrick J. Wong
@ 2024-08-06 23:52     ` Dave Chinner
  2024-08-07  0:23       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2024-08-06 23:52 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Tue, Aug 06, 2024 at 11:56:51AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:46PM +0000, John Garry wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We currently align extent allocation to stripe unit or stripe width.
> > That is specified by an external parameter to the allocation code,
> > which then manipulates the xfs_alloc_args alignment configuration in
> > interesting ways.
> > 
> > The args->alignment field specifies extent start alignment, but
> > because we may be attempting non-aligned allocation first there are
> > also slop variables that allow for those allocation attempts to
> > account for aligned allocation if they fail.
> > 
> > This gets much more complex as we introduce forced allocation
> > alignment, where extent size hints are used to generate the extent
> > start alignment. extent size hints currently only affect extent
> > lengths (via args->prod and args->mod) and so with this change we
> > will have two different start alignment conditions.
> > 
> > Avoid this complexity by always using args->alignment to indicate
> > extent start alignment, and always using args->prod/mod to indicate
> > extent length adjustment.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > [jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
> > Signed-off-by: John Garry <john.g.garry@oracle.com>
> 
> Going back to the 6/21 posting[1], what were the answers to the
> questions I posted?  Did I correctly figure out what alignslop refers
> to?

Hard to say.

alignslop is basically an temporary accounting mechanism used to
prevent filesystem shutdowns when the AG is near ENOSPC and exact
BNO allocation is attempted and fails because there isn't an exact
free space available. This exact bno allocation attempt can dirty
the AGFL, and before we dirty the transaction *we must guarantee the
allocation will succeed*. If the allocation fails after we've
started modifying metadata (for whatever reason) we will cancel a
dirty transaction and shut down the filesystem.

Hence the first allocation done from the xfs_bmap_btalloc() context
needs to account for every block the specific allocation and all the
failure fallback attempts *may require* before it starts modifying
metadata. The contiguous exact bno allocation case isn't an aligned
allocation, but it will be followed by an aligned allocation attempt
if it fails and so it must take into account the space requirements
of aligned allocation even though it is not an aligned allocation
itself.

args->alignslop allows xfs_alloc_space_available() to take this
space requirement into account for any allocation that has lesser
alignment requirements than any subsequent allocation attempt that
may follow if this specific allocation attempt fails.

IOWs, args->alignslop is similar to args->minleft and args->total in
purpose, but it only affects the accounting for this specific
allocation attempt rather than defining the amount of space
that needs to remain available at the successful completion of this
allocation for future allocations within this transaction context.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 04/14] xfs: make EOF allocation simpler
  2024-08-06 18:58   ` Darrick J. Wong
@ 2024-08-07  0:00     ` Dave Chinner
  2024-08-07  0:24       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2024-08-07  0:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Tue, Aug 06, 2024 at 11:58:53AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:47PM +0000, John Garry wrote:
> > @@ -3688,12 +3649,19 @@ xfs_bmap_btalloc_filestreams(
> >  	}
> >  
> >  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> > -	if (ap->aeof)
> > -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
> > +	if (ap->aeof && ap->offset)
> > +		error = xfs_bmap_btalloc_at_eof(ap, args);
> >  
> > +	/* This may be an aligned allocation attempt. */
> >  	if (!error && args->fsbno == NULLFSBLOCK)
> >  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> >  
> > +	/* Attempt non-aligned allocation if we haven't already. */
> > +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> > +		args->alignment = 1;
> > +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> 
> Oops, I just replied to the v2 thread instead of this.
> 
> From
> https://lore.kernel.org/linux-xfs/20240621203556.GU3058325@frogsfrogsfrogs/
> 
> Do we have to zero args->alignslop here?

No. It should always be zero here to begin with. It is the
responsibility of the allocation attempt that modifies
args->alignment and args->alignslop to reset them to original values
on allocation failure.

The only places we use alignslop are xfs_bmap_btalloc_at_eof() and
xfs_ialloc_ag_alloc(). They both zero it and reset args->alignment
on allocation failure before falling through to the next allocation
attempt.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 11/14] xfs: Only free full extents for forcealign
  2024-08-06 19:27   ` Darrick J. Wong
@ 2024-08-07  0:08     ` Dave Chinner
  2024-08-07 13:06     ` John Garry
  1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2024-08-07  0:08 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Tue, Aug 06, 2024 at 12:27:38PM -0700, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:54PM +0000, John Garry wrote:
> > Like we already do for rtvol, only free full extents for forcealign in
> > xfs_free_file_space().
> > 
> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> #earlier version
> > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > ---
> >  fs/xfs/xfs_bmap_util.c |  7 ++-----
> >  fs/xfs/xfs_inode.c     | 14 ++++++++++++++
> >  fs/xfs/xfs_inode.h     |  2 ++
> >  3 files changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 60389ac8bd45..46eebecd7bba 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -854,11 +854,8 @@ xfs_free_file_space(
> >  	startoffset_fsb = XFS_B_TO_FSB(mp, offset);
> >  	endoffset_fsb = XFS_B_TO_FSBT(mp, offset + len);
> >  
> > -	/* We can only free complete realtime extents. */
> > -	if (xfs_inode_has_bigrtalloc(ip)) {
> > -		startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb);
> > -		endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb);
> > -	}
> > +	/* Free only complete extents. */
> > +	xfs_roundin_to_alloc_fsbsize(ip, &startoffset_fsb, &endoffset_fsb);
> 
> ...and then this becomes:
> 
> 	/* We can only free complete allocation units. */
> 	startoffset_fsb = xfs_inode_roundup_alloc_unit(ip, startoffset_fsb);
> 	endoffset_fsb = xfs_inode_rounddown_alloc_unit(ip, endoffset_fsb);

I much prefer this (roundup/rounddown) to the "in/out" API.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 03/14] xfs: simplify extent allocation alignment
  2024-08-06 23:52     ` Dave Chinner
@ 2024-08-07  0:23       ` Darrick J. Wong
  2024-08-07  0:34         ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-07  0:23 UTC (permalink / raw)
  To: Dave Chinner
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Wed, Aug 07, 2024 at 09:52:06AM +1000, Dave Chinner wrote:
> On Tue, Aug 06, 2024 at 11:56:51AM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 01, 2024 at 04:30:46PM +0000, John Garry wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > We currently align extent allocation to stripe unit or stripe width.
> > > That is specified by an external parameter to the allocation code,
> > > which then manipulates the xfs_alloc_args alignment configuration in
> > > interesting ways.
> > > 
> > > The args->alignment field specifies extent start alignment, but
> > > because we may be attempting non-aligned allocation first there are
> > > also slop variables that allow for those allocation attempts to
> > > account for aligned allocation if they fail.
> > > 
> > > This gets much more complex as we introduce forced allocation
> > > alignment, where extent size hints are used to generate the extent
> > > start alignment. extent size hints currently only affect extent
> > > lengths (via args->prod and args->mod) and so with this change we
> > > will have two different start alignment conditions.
> > > 
> > > Avoid this complexity by always using args->alignment to indicate
> > > extent start alignment, and always using args->prod/mod to indicate
> > > extent length adjustment.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > [jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > 
> > Going back to the 6/21 posting[1], what were the answers to the
> > questions I posted?  Did I correctly figure out what alignslop refers
> > to?
> 
> Hard to say.
> 
> alignslop is basically an temporary accounting mechanism used to
> prevent filesystem shutdowns when the AG is near ENOSPC and exact
> BNO allocation is attempted and fails because there isn't an exact
> free space available. This exact bno allocation attempt can dirty
> the AGFL, and before we dirty the transaction *we must guarantee the
> allocation will succeed*. If the allocation fails after we've
> started modifying metadata (for whatever reason) we will cancel a
> dirty transaction and shut down the filesystem.
> 
> Hence the first allocation done from the xfs_bmap_btalloc() context
> needs to account for every block the specific allocation and all the
> failure fallback attempts *may require* before it starts modifying
> metadata. The contiguous exact bno allocation case isn't an aligned
> allocation, but it will be followed by an aligned allocation attempt
> if it fails and so it must take into account the space requirements
> of aligned allocation even though it is not an aligned allocation
> itself.
> 
> args->alignslop allows xfs_alloc_space_available() to take this
> space requirement into account for any allocation that has lesser
> alignment requirements than any subsequent allocation attempt that
> may follow if this specific allocation attempt fails.
> 
> IOWs, args->alignslop is similar to args->minleft and args->total in
> purpose, but it only affects the accounting for this specific
> allocation attempt rather than defining the amount of space
> that needs to remain available at the successful completion of this
> allocation for future allocations within this transaction context.

Oh, okay.  IOWs, "slop" is a means for alloc callers to communicate that
they need to perform an aligned allocation, but that right now they want
to try an exact allocation (with looser alignment) so that they might
get better locality.  However, they don't want the exact allocation scan
to commit to a certain AG unless the aligned allocation will be able to
find an aligned space *somewhere* in that AG if the exact scan fails.
For any other kind of allocation situation, slop should be zero.

If the above statement is correct, could we paste that into the
definition of struct xfs_alloc_arg so that I don't have to recollect all
this the next time I see something involving alignslop?  After which
I'm ok saying:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 04/14] xfs: make EOF allocation simpler
  2024-08-07  0:00     ` Dave Chinner
@ 2024-08-07  0:24       ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-07  0:24 UTC (permalink / raw)
  To: Dave Chinner
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Wed, Aug 07, 2024 at 10:00:46AM +1000, Dave Chinner wrote:
> On Tue, Aug 06, 2024 at 11:58:53AM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 01, 2024 at 04:30:47PM +0000, John Garry wrote:
> > > @@ -3688,12 +3649,19 @@ xfs_bmap_btalloc_filestreams(
> > >  	}
> > >  
> > >  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> > > -	if (ap->aeof)
> > > -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, true);
> > > +	if (ap->aeof && ap->offset)
> > > +		error = xfs_bmap_btalloc_at_eof(ap, args);
> > >  
> > > +	/* This may be an aligned allocation attempt. */
> > >  	if (!error && args->fsbno == NULLFSBLOCK)
> > >  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> > >  
> > > +	/* Attempt non-aligned allocation if we haven't already. */
> > > +	if (!error && args->fsbno == NULLFSBLOCK && args->alignment > 1)  {
> > > +		args->alignment = 1;
> > > +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> > 
> > Oops, I just replied to the v2 thread instead of this.
> > 
> > From
> > https://lore.kernel.org/linux-xfs/20240621203556.GU3058325@frogsfrogsfrogs/
> > 
> > Do we have to zero args->alignslop here?
> 
> No. It should always be zero here to begin with. It is the
> responsibility of the allocation attempt that modifies
> args->alignment and args->alignslop to reset them to original values
> on allocation failure.
> 
> The only places we use alignslop are xfs_bmap_btalloc_at_eof() and
> xfs_ialloc_ag_alloc(). They both zero it and reset args->alignment
> on allocation failure before falling through to the next allocation
> attempt.

Aha, that's even in this patch.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC
  2024-08-06 18:51   ` Darrick J. Wong
@ 2024-08-07  0:26     ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2024-08-07  0:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Tue, Aug 06, 2024 at 11:51:38AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:44PM +0000, John Garry wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we are near ENOSPC and don't have enough free
> > space for an args->maxlen allocation, xfs_alloc_space_available()
> > will trim args->maxlen to equal the available space. However, this
> > function has only checked that there is enough contiguous free space
> > for an aligned args->minlen allocation to succeed. Hence there is no
> > guarantee that an args->maxlen allocation will succeed, nor that the
> > available space will allow for correct alignment of an args->maxlen
> > allocation.
> > 
> > Further, by trimming args->maxlen arbitrarily, it breaks an
> > assumption made in xfs_alloc_fix_len() that if the caller wants
> > aligned allocation, then args->maxlen will be set to an aligned
> > value. It then skips the tail alignment and so we end up with
> > extents that aren't aligned to extent size hint boundaries as we
> > approach ENOSPC.
> > 
> > To avoid this problem, don't reduce args->maxlen by some random,
> > arbitrary amount. If args->maxlen is too large for the available
> > space, reduce the allocation to a minlen allocation as we know we
> > have contiguous free space available for this to succeed and always
> > be correctly aligned.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 59326f84f6a5..d559d992c6ef 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2524,14 +2524,23 @@ xfs_alloc_space_available(
> >  	if (available < (int)max(args->total, alloc_len))
> >  		return false;
> >  
> > +	if (flags & XFS_ALLOC_FLAG_CHECK)
> > +		return true;
> > +
> >  	/*
> > -	 * Clamp maxlen to the amount of free space available for the actual
> > -	 * extent allocation.
> > +	 * If we can't do a maxlen allocation, then we must reduce the size of
> > +	 * the allocation to match the available free space. We know how big
> > +	 * the largest contiguous free space we can allocate is, so that's our
> > +	 * upper bound. However, we don't exaclty know what alignment/size
> > +	 * constraints have been placed on the allocation, so we can't
> > +	 * arbitrarily select some new max size. Hence make this a minlen
> > +	 * allocation as we know that will definitely succeed and match the
> > +	 * callers alignment constraints.
> >  	 */
> > -	if (available < (int)args->maxlen && !(flags & XFS_ALLOC_FLAG_CHECK)) {
> > -		args->maxlen = available;
> > +	alloc_len = args->maxlen + (args->alignment - 1) + args->minalignslop;
> > +	if (longest < alloc_len) {
> > +		args->maxlen = args->minlen;
> 
> Same question as the June 21st posting:
> 
> Is it possible to reduce maxlen the largest multiple of the alignment
> that is still less than @longest?

Perhaps.

The comment does say "we don't exaclty know what alignment/size
constraints have been placed on the allocation, so we can't
arbitrarily select some new max size."

Given this unknown I simply punted the issue and went straight to
selecting a size the caller has guaranteed will be valid for their
constraints.

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 03/14] xfs: simplify extent allocation alignment
  2024-08-07  0:23       ` Darrick J. Wong
@ 2024-08-07  0:34         ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2024-08-07  0:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: John Garry, chandan.babu, dchinner, hch, viro, brauner, jack,
	linux-xfs, linux-kernel, linux-fsdevel, catherine.hoang,
	martin.petersen

On Tue, Aug 06, 2024 at 05:23:58PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 07, 2024 at 09:52:06AM +1000, Dave Chinner wrote:
> > On Tue, Aug 06, 2024 at 11:56:51AM -0700, Darrick J. Wong wrote:
> > > On Thu, Aug 01, 2024 at 04:30:46PM +0000, John Garry wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > We currently align extent allocation to stripe unit or stripe width.
> > > > That is specified by an external parameter to the allocation code,
> > > > which then manipulates the xfs_alloc_args alignment configuration in
> > > > interesting ways.
> > > > 
> > > > The args->alignment field specifies extent start alignment, but
> > > > because we may be attempting non-aligned allocation first there are
> > > > also slop variables that allow for those allocation attempts to
> > > > account for aligned allocation if they fail.
> > > > 
> > > > This gets much more complex as we introduce forced allocation
> > > > alignment, where extent size hints are used to generate the extent
> > > > start alignment. extent size hints currently only affect extent
> > > > lengths (via args->prod and args->mod) and so with this change we
> > > > will have two different start alignment conditions.
> > > > 
> > > > Avoid this complexity by always using args->alignment to indicate
> > > > extent start alignment, and always using args->prod/mod to indicate
> > > > extent length adjustment.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > [jpg: fixup alignslop references in xfs_trace.h and xfs_ialloc.c]
> > > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > 
> > > Going back to the 6/21 posting[1], what were the answers to the
> > > questions I posted?  Did I correctly figure out what alignslop refers
> > > to?
> > 
> > Hard to say.
> > 
> > alignslop is basically an temporary accounting mechanism used to
> > prevent filesystem shutdowns when the AG is near ENOSPC and exact
> > BNO allocation is attempted and fails because there isn't an exact
> > free space available. This exact bno allocation attempt can dirty
> > the AGFL, and before we dirty the transaction *we must guarantee the
> > allocation will succeed*. If the allocation fails after we've
> > started modifying metadata (for whatever reason) we will cancel a
> > dirty transaction and shut down the filesystem.
> > 
> > Hence the first allocation done from the xfs_bmap_btalloc() context
> > needs to account for every block the specific allocation and all the
> > failure fallback attempts *may require* before it starts modifying
> > metadata. The contiguous exact bno allocation case isn't an aligned
> > allocation, but it will be followed by an aligned allocation attempt
> > if it fails and so it must take into account the space requirements
> > of aligned allocation even though it is not an aligned allocation
> > itself.
> > 
> > args->alignslop allows xfs_alloc_space_available() to take this
> > space requirement into account for any allocation that has lesser
> > alignment requirements than any subsequent allocation attempt that
> > may follow if this specific allocation attempt fails.
> > 
> > IOWs, args->alignslop is similar to args->minleft and args->total in
> > purpose, but it only affects the accounting for this specific
> > allocation attempt rather than defining the amount of space
> > that needs to remain available at the successful completion of this
> > allocation for future allocations within this transaction context.
> 
> Oh, okay.  IOWs, "slop" is a means for alloc callers to communicate that
> they need to perform an aligned allocation, but that right now they want
> to try an exact allocation (with looser alignment) so that they might
> get better locality.  However, they don't want the exact allocation scan
> to commit to a certain AG unless the aligned allocation will be able to
> find an aligned space *somewhere* in that AG if the exact scan fails.
> For any other kind of allocation situation, slop should be zero.
> 
> If the above statement is correct,

It is, except for a small nit: alignslop isn't exact bno allocation
specific, it allows any sort of unaligned -> fail -> aligned
allocation fallback pattern to select an AG where the fallback
aligned allocation will have space to succeed.

> could we paste that into the
> definition of struct xfs_alloc_arg so that I don't have to recollect all
> this the next time I see something involving alignslop?

Sure.

> After which
> I'm ok saying:
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag
  2024-08-06 19:02   ` Darrick J. Wong
@ 2024-08-07 11:42     ` John Garry
  2024-08-07 14:40       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-07 11:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 06/08/2024 20:02, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:50PM +0000, John Garry wrote:
>> From: "Darrick J. Wong" <djwong@kernel.org>
>>
>> Add a new inode flag to require that all file data extent mappings must
>> be aligned (both the file offset range and the allocated space itself)
>> to the extent size hint.  Having a separate COW extent size hint is no
>> longer allowed.
>>
>> The goal here is to enable sysadmins and users to mandate that all space
>> mappings in a file must have a startoff/blockcount that are aligned to
>> (say) a 2MB alignment and that the startblock/blockcount will follow the
>> same alignment.
>>
>> Allocated space will be aligned to start of the AG, and not necessarily
>> aligned with disk blocks. The upcoming atomic writes feature will rely and
>> forcealign and will also require allocated space will also be aligned to
>> disk blocks.
>>
>> reflink will not be supported for forcealign yet, so disallow a mount under
>> this condition. This is because we have the limitation of pageache
>> writeback not knowing how to writeback an entire allocation unut, so
>> reject a mount with relink.
>>
>> RT vol will not be supported for forcealign yet, so disallow a mount under
>> this condition. It will be possible to support RT vol and forcealign in
>> future. For this, the inode extsize must be a multiple of rtextsize - this
>> is enforced already in xfs_ioctl_setattr_check_extsize() and
>> xfs_inode_validate_extsize().
>>
>> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
>> Co-developed-by: John Garry <john.g.garry@oracle.com>
>> [jpg: many changes from orig, including forcealign inode verification
>>   rework, ioctl setattr rework disallow reflink a forcealign inode,
>>   disallow mount for forcealign + reflink or rt]
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
> 
> This patch looks ready to me but as I'm the original author I cannot add
> a RVB tag.  Someone else needs to add that -- frankly, John is the best
> candidate because he grabbed my patch into his tree and actually
> modified it to do what he wants, which means he's the most familiar with
> it.

I thought my review would be implied since I noted how I appended it, above.

Anyway,

Reviewed-by: John Garry <john.g.garry@oracle.com>

I am hoping that Dave and Christoph will give some formal ack/review 
when they get a chance.

BTW, at what stage do we give XFS_SB_FEAT_RO_COMPAT_FORCEALIGN a more 
proper value? So far it has the experimental dev value of 1 << 30, below.

Thanks!


>>
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index e1bfee0c3b1a..95f5259c4255 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -352,6 +352,7 @@ xfs_sb_has_compat_feature(
>>   #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>>   #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
>>   #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
>> +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
>>   #define XFS_SB_FEAT_RO_COMPAT_ALL \
>>   		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
>>   		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 10/14] xfs: Do not free EOF blocks for forcealign
  2024-08-06 19:24   ` Darrick J. Wong
@ 2024-08-07 12:33     ` John Garry
  2024-08-07 15:12       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-07 12:33 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

>> +void
>> +xfs_roundout_to_alloc_fsbsize(
>> +	struct xfs_inode	*ip,
>> +	xfs_fileoff_t		*start,
>> +	xfs_fileoff_t		*end)
>> +{
>> +	unsigned int		blocks = xfs_inode_alloc_fsbsize(ip);
>> +
>> +	if (blocks == 1)
>> +		return;
>> +	*start = rounddown_64(*start, blocks);
>> +	*end = roundup_64(*end, blocks);
>> +}
> 
> This is probably going to start another round of shouting, but I think
> it's silly to do two rounding operations when you only care about one
> value. 

Sure, but the "in" version does use the 2x values and I wanted to be 
consistent. Anyway, I really don't feel strongly about this.

> In patch 12 it results in a bunch more dummy variables that you
> then ignore.
> 
> Can't this be:
> 
> static inline xfs_fileoff_t
> xfs_inode_rounddown_alloc_unit(

Just a question about the naming:
xfs_inode_alloc_unitsize() returns bytes, so I would expect 
xfs_inode_rounddown_alloc_unit() to deal in bytes. Would you be 
satisfied with xfs_rounddown_alloc_fsbsize()? Or any other suggestion?

> 	struct xfs_inode	*ip,
> 	xfs_fileoff		off)
> {
> 	unsigned int		rounding = xfs_inode_alloc_fsbsize(ip);
> 
> 	if (rounding == 1)
> 		return off;
> 	return rounddown_64(off, rounding);
> }
> 
> static inline xfs_fileoff_t
> xfs_inode_roundup_alloc_unit(
> 	struct xfs_inode	*ip,
> 	xfs_fileoff		off)
> {
> 	unsigned int		rounding = xfs_inode_alloc_fsbsize(ip);
> 
> 	if (rounding == 1)
> 		return off;
> 	return roundup_64(off, rounding);
> }
> 
> Then that callsite can be:
> 
> 	end_fsb = xfs_inode_roundup_alloc_unit(ip,
> 			XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)));


Thanks,
John

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 11/14] xfs: Only free full extents for forcealign
  2024-08-06 19:27   ` Darrick J. Wong
  2024-08-07  0:08     ` Dave Chinner
@ 2024-08-07 13:06     ` John Garry
  1 sibling, 0 replies; 42+ messages in thread
From: John Garry @ 2024-08-07 13:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 06/08/2024 20:27, Darrick J. Wong wrote:
> I guess "roundout" means "extend start and end to fit allocation unit"
> whereas "roundin" means "shrink start and end to fit allocation unit"?

Yes, I was inspired by wording "inwards" and "outwards" used in a 
discussion in the previous series. Anyway, I can change it as suggested.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 12/14] xfs: Unmap blocks according to forcealign
  2024-08-06 20:14   ` Darrick J. Wong
@ 2024-08-07 13:40     ` John Garry
  2024-08-07 16:19       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-07 13:40 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 06/08/2024 21:14, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:55PM +0000, John Garry wrote:
>> For when forcealign is enabled, blocks in an inode need to be unmapped
>> according to extent alignment, like what is already done for rtvol.
>>
>> Change variable isrt in __xfs_bunmapi() to a bool, as that is really what
>> it is.
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_bmap.c | 48 +++++++++++++++++++++++++++++-----------
>>   fs/xfs/xfs_inode.c       | 16 ++++++++++++++
>>   fs/xfs/xfs_inode.h       |  2 ++
>>   3 files changed, 53 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
>> index 0c3df8c71c6d..d6ae344a17fc 100644
>> --- a/fs/xfs/libxfs/xfs_bmap.c
>> +++ b/fs/xfs/libxfs/xfs_bmap.c
>> @@ -5409,6 +5409,25 @@ xfs_bmap_del_extent_real(
>>   	return 0;
>>   }
>>   
>> +static xfs_extlen_t
>> +xfs_bunmapi_align(
>> +	struct xfs_inode	*ip,
>> +	xfs_fsblock_t		fsbno,
>> +	xfs_extlen_t *off)
>> +{
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +
>> +	if (XFS_IS_REALTIME_INODE(ip))
>> +		return xfs_inode_alloc_fsbsize_align(ip, fsbno, off);
>> +	/*
>> +	 * The agbno for the fsbno is aligned to extsize, but the fsbno itself
>> +	 * is not necessarily aligned (to extsize), so use agbno to determine
>> +	 * mod+offset to the alloc unit boundary.
>> +	 */
>> +	return xfs_inode_alloc_fsbsize_align(ip, XFS_FSB_TO_AGBNO(mp, fsbno),
>> +					off);
>> +}
>> +
>>   /*
>>    * Unmap (remove) blocks from a file.
>>    * If nexts is nonzero then the number of extents to remove is limited to
>> @@ -5430,7 +5449,8 @@ __xfs_bunmapi(
>>   	xfs_extnum_t		extno;		/* extent number in list */
>>   	struct xfs_bmbt_irec	got;		/* current extent record */
>>   	struct xfs_ifork	*ifp;		/* inode fork pointer */
>> -	int			isrt;		/* freeing in rt area */
>> +	bool			isrt;		/* freeing in rt area */
>> +	bool			isforcealign;	/* forcealign inode */
>>   	int			logflags;	/* transaction logging flags */
>>   	xfs_extlen_t		mod;		/* rt extent offset */
>>   	struct xfs_mount	*mp = ip->i_mount;
>> @@ -5468,6 +5488,8 @@ __xfs_bunmapi(
>>   	}
>>   	XFS_STATS_INC(mp, xs_blk_unmap);
>>   	isrt = xfs_ifork_is_realtime(ip, whichfork);
>> +	isforcealign = (whichfork != XFS_ATTR_FORK) &&
>> +			xfs_inode_has_forcealign(ip);
>>   	end = start + len;
>>   
>>   	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
>> @@ -5486,6 +5508,8 @@ __xfs_bunmapi(
>>   	extno = 0;
>>   	while (end != (xfs_fileoff_t)-1 && end >= start &&
>>   	       (nexts == 0 || extno < nexts)) {
>> +		xfs_extlen_t off;
> 
> I got really confused because I thought this was a file block offset and
> only after more digging realized that this is a sometimes dummy
> adjustment variable.

yeah, I put it here to use the common helper in both callsites

> 
>> +
>>   		/*
>>   		 * Is the found extent after a hole in which end lives?
>>   		 * Just back up to the previous extent, if so.
>> @@ -5519,18 +5543,18 @@ __xfs_bunmapi(
>>   		if (del.br_startoff + del.br_blockcount > end + 1)
>>   			del.br_blockcount = end + 1 - del.br_startoff;
>>   
>> -		if (!isrt || (flags & XFS_BMAPI_REMAP))
>> +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
>>   			goto delete;
>>   
>> -		mod = xfs_rtb_to_rtxoff(mp,
>> -				del.br_startblock + del.br_blockcount);
>> +		mod = xfs_bunmapi_align(ip,
>> +				del.br_startblock + del.br_blockcount, &off);
>>   		if (mod) {
> 
> Oof.  I don't like how this loop body has the rtx adjustment code
> inlined into it.  We only use the isrt flag for the one test above.
> I tried hoisting this into something less gross involving separate
> adjustment functions but then you have to pass in so many outer
> variables that it becomes a mess.
> 
> The best I can come up with for now is:
> 
> 	unsigned int		alloc_fsb = xfs_inode_alloc_fsbsize(ip);
> 	/* no more isrt/isforcealign bools */
> 
> ...
> 
> 		if (alloc_fsb == 1 || (flags & XFS_BMAPI_REMAP))
> 			goto delete;

ok, good

> 
> 		mod = do_div(del.br_startblock + del.br_blockcount,
> 				alloc_fsb);

Note that xfs_bunmapi_align() uses agbno for !rt

> 		if (mod) {
> 
>>   			/*
>> -			 * Realtime extent not lined up at the end.
>> +			 * Not aligned to allocation unit on the end.
>>   			 * The extent could have been split into written
>>   			 * and unwritten pieces, or we could just be
>>   			 * unmapping part of it.  But we can't really
>> -			 * get rid of part of a realtime extent.
>> +			 * get rid of part of an extent.
>>   			 */
>>   			if (del.br_state == XFS_EXT_UNWRITTEN) {
>>   				/*
>> @@ -5554,7 +5578,7 @@ __xfs_bunmapi(
>>   			ASSERT(del.br_state == XFS_EXT_NORM);
>>   			ASSERT(tp->t_blk_res > 0);
>>   			/*
>> -			 * If this spans a realtime extent boundary,
>> +			 * If this spans an extent boundary,
>>   			 * chop it back to the start of the one we end at.
>>   			 */
>>   			if (del.br_blockcount > mod) {
>> @@ -5571,14 +5595,12 @@ __xfs_bunmapi(
>>   			goto nodelete;
>>   		}
>>   
>> -		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
>> +		mod = xfs_bunmapi_align(ip, del.br_startblock, &off);
>>   		if (mod) {
>> -			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
> 
> 		mod = do_div(del.br_startblock, alloc_fsb);
> 		if (mod) {
> 			xfs_extlen_t off = alloc_fsb - mod;
> 
> At least then you don't need this weird xfs_inode_alloc_fsbsize_align
> that passes back two xfs_extlen_t arguments.

Sure, but same point about xfs_bunmapi_align() using agbno. I suppose I 
can just make the change to not have xfs_bunmapi_align() be passed the 
off address pointer, and do the calcation here for off here (as you 
suggest).

Thanks,
John

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 14/14] xfs: Enable file data forcealign feature
  2024-08-06 19:43   ` Darrick J. Wong
@ 2024-08-07 13:50     ` John Garry
  2024-08-07 15:17       ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: John Garry @ 2024-08-07 13:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On 06/08/2024 20:43, Darrick J. Wong wrote:
> On Thu, Aug 01, 2024 at 04:30:57PM +0000, John Garry wrote:
>> From: "Darrick J. Wong"<djwong@kernel.org>
>>
>> Enable this feature.
>>
>> Signed-off-by: "Darrick J. Wong"<djwong@kernel.org>
>> Signed-off-by: John Garry<john.g.garry@oracle.com>
> You really ought to add your own Reviewed-by tag here...

ok, if you prefer. Normally I think it's implied by the context and 
signed-off tag.

Thanks,
John

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag
  2024-08-07 11:42     ` John Garry
@ 2024-08-07 14:40       ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-07 14:40 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Wed, Aug 07, 2024 at 12:42:32PM +0100, John Garry wrote:
> On 06/08/2024 20:02, Darrick J. Wong wrote:
> > On Thu, Aug 01, 2024 at 04:30:50PM +0000, John Garry wrote:
> > > From: "Darrick J. Wong" <djwong@kernel.org>
> > > 
> > > Add a new inode flag to require that all file data extent mappings must
> > > be aligned (both the file offset range and the allocated space itself)
> > > to the extent size hint.  Having a separate COW extent size hint is no
> > > longer allowed.
> > > 
> > > The goal here is to enable sysadmins and users to mandate that all space
> > > mappings in a file must have a startoff/blockcount that are aligned to
> > > (say) a 2MB alignment and that the startblock/blockcount will follow the
> > > same alignment.
> > > 
> > > Allocated space will be aligned to start of the AG, and not necessarily
> > > aligned with disk blocks. The upcoming atomic writes feature will rely and
> > > forcealign and will also require allocated space will also be aligned to
> > > disk blocks.
> > > 
> > > reflink will not be supported for forcealign yet, so disallow a mount under
> > > this condition. This is because we have the limitation of pageache
> > > writeback not knowing how to writeback an entire allocation unut, so
> > > reject a mount with relink.
> > > 
> > > RT vol will not be supported for forcealign yet, so disallow a mount under
> > > this condition. It will be possible to support RT vol and forcealign in
> > > future. For this, the inode extsize must be a multiple of rtextsize - this
> > > is enforced already in xfs_ioctl_setattr_check_extsize() and
> > > xfs_inode_validate_extsize().
> > > 
> > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > Co-developed-by: John Garry <john.g.garry@oracle.com>
> > > [jpg: many changes from orig, including forcealign inode verification
> > >   rework, ioctl setattr rework disallow reflink a forcealign inode,
> > >   disallow mount for forcealign + reflink or rt]
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > 
> > This patch looks ready to me but as I'm the original author I cannot add
> > a RVB tag.  Someone else needs to add that -- frankly, John is the best
> > candidate because he grabbed my patch into his tree and actually
> > modified it to do what he wants, which means he's the most familiar with
> > it.
> 
> I thought my review would be implied since I noted how I appended it, above.
> 
> Anyway,
> 
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
> I am hoping that Dave and Christoph will give some formal ack/review when
> they get a chance.
> 
> BTW, at what stage do we give XFS_SB_FEAT_RO_COMPAT_FORCEALIGN a more proper
> value? So far it has the experimental dev value of 1 << 30, below.

When you're ready for the release manager to merge it, change the value,
resend the entire series for archival purposes, and send a pull request.

--D

> Thanks!
> 
> 
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index e1bfee0c3b1a..95f5259c4255 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -352,6 +352,7 @@ xfs_sb_has_compat_feature(
> > >   #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
> > >   #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
> > >   #define XFS_SB_FEAT_RO_COMPAT_INOBTCNT (1 << 3)		/* inobt block counts */
> > > +#define XFS_SB_FEAT_RO_COMPAT_FORCEALIGN (1 << 30)	/* aligned file data extents */
> > >   #define XFS_SB_FEAT_RO_COMPAT_ALL \
> > >   		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
> > >   		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 10/14] xfs: Do not free EOF blocks for forcealign
  2024-08-07 12:33     ` John Garry
@ 2024-08-07 15:12       ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-07 15:12 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Wed, Aug 07, 2024 at 01:33:49PM +0100, John Garry wrote:
> > > +void
> > > +xfs_roundout_to_alloc_fsbsize(
> > > +	struct xfs_inode	*ip,
> > > +	xfs_fileoff_t		*start,
> > > +	xfs_fileoff_t		*end)
> > > +{
> > > +	unsigned int		blocks = xfs_inode_alloc_fsbsize(ip);
> > > +
> > > +	if (blocks == 1)
> > > +		return;
> > > +	*start = rounddown_64(*start, blocks);
> > > +	*end = roundup_64(*end, blocks);
> > > +}
> > 
> > This is probably going to start another round of shouting, but I think
> > it's silly to do two rounding operations when you only care about one
> > value.
> 
> Sure, but the "in" version does use the 2x values and I wanted to be
> consistent. Anyway, I really don't feel strongly about this.
> 
> > In patch 12 it results in a bunch more dummy variables that you
> > then ignore.
> > 
> > Can't this be:
> > 
> > static inline xfs_fileoff_t
> > xfs_inode_rounddown_alloc_unit(
> 
> Just a question about the naming:
> xfs_inode_alloc_unitsize() returns bytes, so I would expect
> xfs_inode_rounddown_alloc_unit() to deal in bytes. Would you be satisfied
> with xfs_rounddown_alloc_fsbsize()? Or any other suggestion?

xfs_fileoff_t is the unit for file logical blocks, no need to append
stuff to the name.  It's clear that this function takes a file block
offset and returns another one.  If we need a second function for file
byte offsets then we can add another function and maybe wrap the whole
mess in _Generic.

--D

> > 	struct xfs_inode	*ip,
> > 	xfs_fileoff		off)
> > {
> > 	unsigned int		rounding = xfs_inode_alloc_fsbsize(ip);
> > 
> > 	if (rounding == 1)
> > 		return off;
> > 	return rounddown_64(off, rounding);
> > }
> > 
> > static inline xfs_fileoff_t
> > xfs_inode_roundup_alloc_unit(
> > 	struct xfs_inode	*ip,
> > 	xfs_fileoff		off)
> > {
> > 	unsigned int		rounding = xfs_inode_alloc_fsbsize(ip);
> > 
> > 	if (rounding == 1)
> > 		return off;
> > 	return roundup_64(off, rounding);
> > }
> > 
> > Then that callsite can be:
> > 
> > 	end_fsb = xfs_inode_roundup_alloc_unit(ip,
> > 			XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)));
> 
> 
> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 14/14] xfs: Enable file data forcealign feature
  2024-08-07 13:50     ` John Garry
@ 2024-08-07 15:17       ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-07 15:17 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Wed, Aug 07, 2024 at 02:50:18PM +0100, John Garry wrote:
> On 06/08/2024 20:43, Darrick J. Wong wrote:
> > On Thu, Aug 01, 2024 at 04:30:57PM +0000, John Garry wrote:
> > > From: "Darrick J. Wong"<djwong@kernel.org>
> > > 
> > > Enable this feature.
> > > 
> > > Signed-off-by: "Darrick J. Wong"<djwong@kernel.org>
> > > Signed-off-by: John Garry<john.g.garry@oracle.com>
> > You really ought to add your own Reviewed-by tag here...
> 
> ok, if you prefer. Normally I think it's implied by the context and
> signed-off tag.

SoB means only that either you've written the patch yourself and have
the right to submit it, or that you have the right to pass on a patch
written by someone else who certified the same thing.

An actual code review should be signalled explicitly, not implied by
context.

--D

> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 12/14] xfs: Unmap blocks according to forcealign
  2024-08-07 13:40     ` John Garry
@ 2024-08-07 16:19       ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2024-08-07 16:19 UTC (permalink / raw)
  To: John Garry
  Cc: chandan.babu, dchinner, hch, viro, brauner, jack, linux-xfs,
	linux-kernel, linux-fsdevel, catherine.hoang, martin.petersen

On Wed, Aug 07, 2024 at 02:40:36PM +0100, John Garry wrote:
> On 06/08/2024 21:14, Darrick J. Wong wrote:
> > On Thu, Aug 01, 2024 at 04:30:55PM +0000, John Garry wrote:
> > > For when forcealign is enabled, blocks in an inode need to be unmapped
> > > according to extent alignment, like what is already done for rtvol.
> > > 
> > > Change variable isrt in __xfs_bunmapi() to a bool, as that is really what
> > > it is.
> > > 
> > > Signed-off-by: John Garry <john.g.garry@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_bmap.c | 48 +++++++++++++++++++++++++++++-----------
> > >   fs/xfs/xfs_inode.c       | 16 ++++++++++++++
> > >   fs/xfs/xfs_inode.h       |  2 ++
> > >   3 files changed, 53 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > index 0c3df8c71c6d..d6ae344a17fc 100644
> > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > @@ -5409,6 +5409,25 @@ xfs_bmap_del_extent_real(
> > >   	return 0;
> > >   }
> > > +static xfs_extlen_t
> > > +xfs_bunmapi_align(
> > > +	struct xfs_inode	*ip,
> > > +	xfs_fsblock_t		fsbno,
> > > +	xfs_extlen_t *off)
> > > +{
> > > +	struct xfs_mount	*mp = ip->i_mount;
> > > +
> > > +	if (XFS_IS_REALTIME_INODE(ip))
> > > +		return xfs_inode_alloc_fsbsize_align(ip, fsbno, off);
> > > +	/*
> > > +	 * The agbno for the fsbno is aligned to extsize, but the fsbno itself
> > > +	 * is not necessarily aligned (to extsize), so use agbno to determine
> > > +	 * mod+offset to the alloc unit boundary.
> > > +	 */
> > > +	return xfs_inode_alloc_fsbsize_align(ip, XFS_FSB_TO_AGBNO(mp, fsbno),
> > > +					off);
> > > +}
> > > +
> > >   /*
> > >    * Unmap (remove) blocks from a file.
> > >    * If nexts is nonzero then the number of extents to remove is limited to
> > > @@ -5430,7 +5449,8 @@ __xfs_bunmapi(
> > >   	xfs_extnum_t		extno;		/* extent number in list */
> > >   	struct xfs_bmbt_irec	got;		/* current extent record */
> > >   	struct xfs_ifork	*ifp;		/* inode fork pointer */
> > > -	int			isrt;		/* freeing in rt area */
> > > +	bool			isrt;		/* freeing in rt area */
> > > +	bool			isforcealign;	/* forcealign inode */
> > >   	int			logflags;	/* transaction logging flags */
> > >   	xfs_extlen_t		mod;		/* rt extent offset */
> > >   	struct xfs_mount	*mp = ip->i_mount;
> > > @@ -5468,6 +5488,8 @@ __xfs_bunmapi(
> > >   	}
> > >   	XFS_STATS_INC(mp, xs_blk_unmap);
> > >   	isrt = xfs_ifork_is_realtime(ip, whichfork);
> > > +	isforcealign = (whichfork != XFS_ATTR_FORK) &&
> > > +			xfs_inode_has_forcealign(ip);
> > >   	end = start + len;
> > >   	if (!xfs_iext_lookup_extent_before(ip, ifp, &end, &icur, &got)) {
> > > @@ -5486,6 +5508,8 @@ __xfs_bunmapi(
> > >   	extno = 0;
> > >   	while (end != (xfs_fileoff_t)-1 && end >= start &&
> > >   	       (nexts == 0 || extno < nexts)) {
> > > +		xfs_extlen_t off;
> > 
> > I got really confused because I thought this was a file block offset and
> > only after more digging realized that this is a sometimes dummy
> > adjustment variable.
> 
> yeah, I put it here to use the common helper in both callsites
> 
> > 
> > > +
> > >   		/*
> > >   		 * Is the found extent after a hole in which end lives?
> > >   		 * Just back up to the previous extent, if so.
> > > @@ -5519,18 +5543,18 @@ __xfs_bunmapi(
> > >   		if (del.br_startoff + del.br_blockcount > end + 1)
> > >   			del.br_blockcount = end + 1 - del.br_startoff;
> > > -		if (!isrt || (flags & XFS_BMAPI_REMAP))
> > > +		if ((!isrt && !isforcealign) || (flags & XFS_BMAPI_REMAP))
> > >   			goto delete;
> > > -		mod = xfs_rtb_to_rtxoff(mp,
> > > -				del.br_startblock + del.br_blockcount);
> > > +		mod = xfs_bunmapi_align(ip,
> > > +				del.br_startblock + del.br_blockcount, &off);
> > >   		if (mod) {
> > 
> > Oof.  I don't like how this loop body has the rtx adjustment code
> > inlined into it.  We only use the isrt flag for the one test above.
> > I tried hoisting this into something less gross involving separate
> > adjustment functions but then you have to pass in so many outer
> > variables that it becomes a mess.
> > 
> > The best I can come up with for now is:
> > 
> > 	unsigned int		alloc_fsb = xfs_inode_alloc_fsbsize(ip);
> > 	/* no more isrt/isforcealign bools */
> > 
> > ...
> > 
> > 		if (alloc_fsb == 1 || (flags & XFS_BMAPI_REMAP))
> > 			goto delete;
> 
> ok, good
> 
> > 
> > 		mod = do_div(del.br_startblock + del.br_blockcount,
> > 				alloc_fsb);
> 
> Note that xfs_bunmapi_align() uses agbno for !rt

Right, I forgot about that.  Can we make the name say that we're
computing the offset within an allocunit?  It's not actually aligning
any of its parameters.

static inline unsigned int
xfs_bmap_alloc_unit_offset(
	struct xfs_inode	*ip,
	xfs_fsblock_t		fsbno,
	unsigned int		alloc_fsb)
{
	xfs_agblock_t		agbno;

	if (XFS_IS_REALTIME_INODE(ip))
		return do_div(fsbno, alloc_fsb);

	agbno = XFS_FSB_TO_AGBNO(ip->i_mount, fsbno);
	return agbno % alloc_fsb;
}

		mod = xfs_bmap_alloc_unit_offset(ip,
				del.br_startblock + del.br_blockcount,
				alloc_fsb);

and

		mod = xfs_bmap_alloc_unit_offset(ip, del.br_startblock,
				alloc_fsb);

--D

> 
> > 		if (mod) {
> > 
> > >   			/*
> > > -			 * Realtime extent not lined up at the end.
> > > +			 * Not aligned to allocation unit on the end.
> > >   			 * The extent could have been split into written
> > >   			 * and unwritten pieces, or we could just be
> > >   			 * unmapping part of it.  But we can't really
> > > -			 * get rid of part of a realtime extent.
> > > +			 * get rid of part of an extent.
> > >   			 */
> > >   			if (del.br_state == XFS_EXT_UNWRITTEN) {
> > >   				/*
> > > @@ -5554,7 +5578,7 @@ __xfs_bunmapi(
> > >   			ASSERT(del.br_state == XFS_EXT_NORM);
> > >   			ASSERT(tp->t_blk_res > 0);
> > >   			/*
> > > -			 * If this spans a realtime extent boundary,
> > > +			 * If this spans an extent boundary,
> > >   			 * chop it back to the start of the one we end at.
> > >   			 */
> > >   			if (del.br_blockcount > mod) {
> > > @@ -5571,14 +5595,12 @@ __xfs_bunmapi(
> > >   			goto nodelete;
> > >   		}
> > > -		mod = xfs_rtb_to_rtxoff(mp, del.br_startblock);
> > > +		mod = xfs_bunmapi_align(ip, del.br_startblock, &off);
> > >   		if (mod) {
> > > -			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
> > 
> > 		mod = do_div(del.br_startblock, alloc_fsb);
> > 		if (mod) {
> > 			xfs_extlen_t off = alloc_fsb - mod;
> > 
> > At least then you don't need this weird xfs_inode_alloc_fsbsize_align
> > that passes back two xfs_extlen_t arguments.
> 
> Sure, but same point about xfs_bunmapi_align() using agbno. I suppose I can
> just make the change to not have xfs_bunmapi_align() be passed the off
> address pointer, and do the calcation here for off here (as you suggest).
> 
> Thanks,
> John
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 02/14] xfs: always tail align maxlen allocations
  2024-08-01 16:30 ` [PATCH v3 02/14] xfs: always tail align maxlen allocations John Garry
@ 2024-08-13 15:01   ` John Garry
  0 siblings, 0 replies; 42+ messages in thread
From: John Garry @ 2024-08-13 15:01 UTC (permalink / raw)
  To: chandan.babu, djwong, dchinner, hch
  Cc: viro, brauner, jack, linux-xfs, linux-kernel, linux-fsdevel,
	catherine.hoang, martin.petersen

On 01/08/2024 17:30, John Garry wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we do a large allocation, the core free space allocation code
> assumes that args->maxlen is aligned to args->prod/args->mod. hence
> if we get a maximum sized extent allocated, it does not do tail
> alignment of the extent.
> 
> However, this assumes that nothing modifies args->maxlen between the
> original allocation context setup and trimming the selected free
> space extent to size. This assumption has recently been found to be
> invalid - xfs_alloc_space_available() modifies args->maxlen in low
> space situations - and there may be more situations we haven't yet
> found like this.
> 
> Force aligned allocation introduces the requirement that extents are
> correctly tail aligned, resulting in this occasional latent
> alignment failure to be reclassified from an unimportant curiousity
> to a must-fix bug.
> 
> Removing the assumption about args->maxlen allocations always being
> tail aligned is trivial, and should not impact anything because
> args->maxlen for inodes with extent size hints configured are
> already aligned. Hence all this change does it avoid weird corner
> cases that would have resulted in unaligned extent sizes by always
> trimming the extent down to an aligned size.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> [provisional on v1 series comment]

Note that I marked this as provisional, as not all questions were 
answered from v1:
https://lore.kernel.org/linux-xfs/20240621195058.GS3058325@frogsfrogsfrogs/

I plan on sending v4 today like this, and hope to clear up any 
unanswered questions then.

> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>   fs/xfs/libxfs/xfs_alloc.c | 12 +++++-------
>   1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index d559d992c6ef..bf08b9e9d9ac 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -433,20 +433,18 @@ xfs_alloc_compute_diff(
>    * Fix up the length, based on mod and prod.
>    * len should be k * prod + mod for some k.
>    * If len is too small it is returned unchanged.
> - * If len hits maxlen it is left alone.
>    */
> -STATIC void
> +static void
>   xfs_alloc_fix_len(
> -	xfs_alloc_arg_t	*args)		/* allocation argument structure */
> +	struct xfs_alloc_arg	*args)
>   {
> -	xfs_extlen_t	k;
> -	xfs_extlen_t	rlen;
> +	xfs_extlen_t		k;
> +	xfs_extlen_t		rlen = args->len;
>   
>   	ASSERT(args->mod < args->prod);
> -	rlen = args->len;
>   	ASSERT(rlen >= args->minlen);
>   	ASSERT(rlen <= args->maxlen);
> -	if (args->prod <= 1 || rlen < args->mod || rlen == args->maxlen ||
> +	if (args->prod <= 1 || rlen < args->mod ||
>   	    (args->mod == 0 && rlen < args->prod))
>   		return;
>   	k = rlen % args->prod;


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-08-13 15:01 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-01 16:30 [PATCH v3 00/14] forcealign for xfs John Garry
2024-08-01 16:30 ` [PATCH v3 01/14] xfs: only allow minlen allocations when near ENOSPC John Garry
2024-08-06 18:51   ` Darrick J. Wong
2024-08-07  0:26     ` Dave Chinner
2024-08-01 16:30 ` [PATCH v3 02/14] xfs: always tail align maxlen allocations John Garry
2024-08-13 15:01   ` John Garry
2024-08-01 16:30 ` [PATCH v3 03/14] xfs: simplify extent allocation alignment John Garry
2024-08-06 18:56   ` Darrick J. Wong
2024-08-06 23:52     ` Dave Chinner
2024-08-07  0:23       ` Darrick J. Wong
2024-08-07  0:34         ` Dave Chinner
2024-08-01 16:30 ` [PATCH v3 04/14] xfs: make EOF allocation simpler John Garry
2024-08-06 18:58   ` Darrick J. Wong
2024-08-07  0:00     ` Dave Chinner
2024-08-07  0:24       ` Darrick J. Wong
2024-08-01 16:30 ` [PATCH v3 05/14] xfs: introduce forced allocation alignment John Garry
2024-08-01 16:30 ` [PATCH v3 06/14] xfs: align args->minlen for " John Garry
2024-08-01 16:30 ` [PATCH v3 07/14] xfs: Introduce FORCEALIGN inode flag John Garry
2024-08-06 19:02   ` Darrick J. Wong
2024-08-07 11:42     ` John Garry
2024-08-07 14:40       ` Darrick J. Wong
2024-08-01 16:30 ` [PATCH v3 08/14] xfs: Update xfs_inode_alloc_unitsize() for forcealign John Garry
2024-08-06 19:02   ` Darrick J. Wong
2024-08-01 16:30 ` [PATCH v3 09/14] xfs: Update xfs_setattr_size() " John Garry
2024-08-06 19:03   ` Darrick J. Wong
2024-08-01 16:30 ` [PATCH v3 10/14] xfs: Do not free EOF blocks " John Garry
2024-08-06 19:24   ` Darrick J. Wong
2024-08-07 12:33     ` John Garry
2024-08-07 15:12       ` Darrick J. Wong
2024-08-01 16:30 ` [PATCH v3 11/14] xfs: Only free full extents " John Garry
2024-08-06 19:27   ` Darrick J. Wong
2024-08-07  0:08     ` Dave Chinner
2024-08-07 13:06     ` John Garry
2024-08-01 16:30 ` [PATCH v3 12/14] xfs: Unmap blocks according to forcealign John Garry
2024-08-06 20:14   ` Darrick J. Wong
2024-08-07 13:40     ` John Garry
2024-08-07 16:19       ` Darrick J. Wong
2024-08-01 16:30 ` [PATCH v3 13/14] xfs: Don't revert allocated offset for forcealign John Garry
2024-08-01 16:30 ` [PATCH v3 14/14] xfs: Enable file data forcealign feature John Garry
2024-08-06 19:43   ` Darrick J. Wong
2024-08-07 13:50     ` John Garry
2024-08-07 15:17       ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).