* [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
@ 2026-01-08 14:11 Brian Foster
2026-01-08 17:10 ` Darrick J. Wong
[not found] ` <32881f8f-3b68-49c3-95b0-b1889c08d281@oracle.com>
0 siblings, 2 replies; 6+ messages in thread
From: Brian Foster @ 2026-01-08 14:11 UTC
To: linux-xfs

Sparse inode cluster allocation sets min/max agbno values to avoid
allocating an inode cluster that might map to an invalid inode
chunk. For example, we can't have an inode record mapped to agbno 0
or that extends past the end of a runt AG of misaligned size.

The initial calculation of max_agbno is unnecessarily conservative,
however. This has triggered a corner case allocation failure where a
small runt AG (i.e. 2063 blocks) is mostly full save for an extent
to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
case, which happens to be the offset of the last possible valid
inode chunk in the AG. In practice, we should be able to allocate
the 4-block cluster at agbno 2052 to map to the parent inode record
at agbno 2048, but the max_agbno value precludes it.

Note that this can result in filesystem shutdown via dirty trans
cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
the tail AG selection by the allocator sets t_highest_agno on the
transaction. If the inode allocator spins around and finds an inode
chunk with free inodes in an earlier AG, the subsequent dir name
creation path may still fail to allocate due to the AG restriction
and cancel.

To avoid this problem, update the max_agbno calculation to the agbno
prior to the last chunk aligned agbno in the AG. This is not
necessarily the last valid allocation target for a sparse chunk, but
since inode chunks (i.e. records) are chunk aligned and sparse
allocs are cluster sized/aligned, this allows the sb_spino_align
alignment restriction to take over and round down the max effective
agbno to within the last valid inode chunk in the AG.

Note that even though the allocator improvements in the
aforementioned commit seem to avoid this particular dirty trans
cancel situation, the max_agbno logic improvement still applies as
we should be able to allocate from an AG that has been appropriately
selected. The more important targets for this patch, however, are
older/stable kernels prior to this allocator rework/improvement.

Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index d97295eaebe6..c19d6d713780 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
 		 * invalid inode records, such as records that start at agbno 0
 		 * or extend beyond the AG.
 		 *
-		 * Set min agbno to the first aligned, non-zero agbno and max to
-		 * the last aligned agbno that is at least one full chunk from
-		 * the end of the AG.
+		 * Set min agbno to the first chunk aligned, non-zero agbno and
+		 * max to one less than the last chunk aligned agbno from the
+		 * end of the AG. We subtract 1 from max so that the cluster
+		 * allocation alignment takes over and allows allocation within
+		 * the last full inode chunk in the AG.
 		 */
 		args.min_agbno = args.mp->m_sb.sb_inoalignmt;
 		args.max_agbno = round_down(xfs_ag_block_count(args.mp,
 							pag_agno(pag)),
-					    args.mp->m_sb.sb_inoalignmt) -
-				 igeo->ialloc_blks;
+					    args.mp->m_sb.sb_inoalignmt) - 1;
 
 		error = xfs_alloc_vextent_near_bno(&args,
 				xfs_agbno_to_fsb(pag,
--
2.52.0
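
As a rough illustration of the arithmetic in the commit message (a
standalone sketch, not the kernel code), the helpers below compare the
old and new max_agbno bounds. The names round_down_u32, max_agbno_old,
max_agbno_new and last_cluster_start are hypothetical, and the idea that
the effective upper bound is max_agbno rounded down to the cluster
alignment is taken from the description above rather than from the
allocator source.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align (hypothetical helper). */
static uint32_t round_down_u32(uint32_t x, uint32_t align)
{
	assert(align != 0);
	return x - (x % align);
}

/* Pre-patch bound: last chunk aligned agbno minus one full chunk. */
static uint32_t max_agbno_old(uint32_t ag_blocks, uint32_t chunk_align,
			      uint32_t chunk_blks)
{
	return round_down_u32(ag_blocks, chunk_align) - chunk_blks;
}

/* Post-patch bound: one less than the last chunk aligned agbno. */
static uint32_t max_agbno_new(uint32_t ag_blocks, uint32_t chunk_align)
{
	return round_down_u32(ag_blocks, chunk_align) - 1;
}

/*
 * Assumption: the highest usable start block is max_agbno rounded down
 * to the sparse cluster alignment.
 */
static uint32_t last_cluster_start(uint32_t max_agbno, uint32_t cluster_align)
{
	return round_down_u32(max_agbno, cluster_align);
}
```

Assuming the geometry of the example in this thread (2063-block AG,
8-block chunk alignment and size, 4-block cluster alignment),
max_agbno_old(2063, 8, 8) is 2048 and max_agbno_new(2063, 8) is 2055, so
last_cluster_start(2055, 4) is 2052, the cluster the commit message says
should be allocatable.
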

* Re: [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
From: Darrick J. Wong @ 2026-01-08 17:10 UTC
To: Brian Foster; +Cc: linux-xfs
On Thu, Jan 08, 2026 at 09:11:29AM -0500, Brian Foster wrote:
> Sparse inode cluster allocation sets min/max agbno values to avoid
> allocating an inode cluster that might map to an invalid inode
> chunk. For example, we can't have an inode record mapped to agbno 0
> or that extends past the end of a runt AG of misaligned size.
>
> The initial calculation of max_agbno is unnecessarily conservative,
> however. This has triggered a corner case allocation failure where a
> small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> case, which happens to be the offset of the last possible valid
> inode chunk in the AG. In practice, we should be able to allocate
> the 4-block cluster at agbno 2052 to map to the parent inode record
> at agbno 2048, but the max_agbno value precludes it.
>
> Note that this can result in filesystem shutdown via dirty trans
> cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
> all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
> the tail AG selection by the allocator sets t_highest_agno on the
> transaction. If the inode allocator spins around and finds an inode
> chunk with free inodes in an earlier AG, the subsequent dir name
> creation path may still fail to allocate due to the AG restriction
> and cancel.
>
> To avoid this problem, update the max_agbno calculation to the agbno
> prior to the last chunk aligned agbno in the AG. This is not
> necessarily the last valid allocation target for a sparse chunk, but
> since inode chunks (i.e. records) are chunk aligned and sparse
> allocs are cluster sized/aligned, this allows the sb_spino_align
> alignment restriction to take over and round down the max effective
> agbno to within the last valid inode chunk in the AG.
>
> Note that even though the allocator improvements in the
> aforementioned commit seem to avoid this particular dirty trans
> cancel situation, the max_agbno logic improvement still applies as
> we should be able to allocate from an AG that has been appropriately
> selected. The more important target for this patch however are
> older/stable kernels prior to this allocator rework/improvement.

<nod> It makes sense to me that we ought to be able to examine space out
to the final(ish) agbno of the runt AG.

Question for you: There are 16 holemask bits for 64 inodes per inobt
record, or in other words the new allocation has to be aligned at least
to the number of blocks needed to write 4 inodes. I /think/
sb_spino_align reflects that, right?
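
To make the 16-bits-for-64-inodes ratio concrete, here is a small
standalone sketch (not kernel code). The macro and function names are
hypothetical, and it assumes that a set holemask bit marks a hole, i.e.
four inodes with no backing blocks.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64u
#define HOLEMASK_BITS		16u
/* Each holemask bit covers 64 / 16 = 4 inodes. */
#define INODES_PER_HOLEMASK_BIT	(INODES_PER_CHUNK / HOLEMASK_BITS)

/*
 * Build the holemask of a record in which only the inodes
 * [first_ino, first_ino + count) are physically allocated.
 * Assumption: bit set == hole, bit clear == allocated.
 */
static uint16_t holemask_for_range(uint32_t first_ino, uint32_t count)
{
	uint32_t allocated = 0;
	uint32_t bit;

	/* A sparse allocation must cover whole holemask bits. */
	assert(first_ino % INODES_PER_HOLEMASK_BIT == 0);
	assert(count % INODES_PER_HOLEMASK_BIT == 0);
	assert(first_ino + count <= INODES_PER_CHUNK);

	for (bit = first_ino / INODES_PER_HOLEMASK_BIT;
	     bit < (first_ino + count) / INODES_PER_HOLEMASK_BIT; bit++)
		allocated |= 1u << bit;

	return (uint16_t)(~allocated & 0xffffu);
}
```

For instance, if a 4-block cluster holding 32 inodes backs the second
half of a record, holemask_for_range(32, 32) yields 0x00ff: the low
eight bits flag the unbacked first 32 inodes.
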

> Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")

Cc: <stable@vger.kernel.org> # v4.2

--D

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index d97295eaebe6..c19d6d713780 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
> * invalid inode records, such as records that start at agbno 0
> * or extend beyond the AG.
> *
> - * Set min agbno to the first aligned, non-zero agbno and max to
> - * the last aligned agbno that is at least one full chunk from
> - * the end of the AG.
> + * Set min agbno to the first chunk aligned, non-zero agbno and
> + * max to one less than the last chunk aligned agbno from the
> + * end of the AG. We subtract 1 from max so that the cluster
> + * allocation alignment takes over and allows allocation within
> + * the last full inode chunk in the AG.
> */
> args.min_agbno = args.mp->m_sb.sb_inoalignmt;
> args.max_agbno = round_down(xfs_ag_block_count(args.mp,
> pag_agno(pag)),
> - args.mp->m_sb.sb_inoalignmt) -
> - igeo->ialloc_blks;
> + args.mp->m_sb.sb_inoalignmt) - 1;
>
> error = xfs_alloc_vextent_near_bno(&args,
> xfs_agbno_to_fsb(pag,
> --
> 2.52.0
>
>

* Re: [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
From: Brian Foster @ 2026-01-08 19:39 UTC
To: Darrick J. Wong; +Cc: linux-xfs
On Thu, Jan 08, 2026 at 09:10:47AM -0800, Darrick J. Wong wrote:
> On Thu, Jan 08, 2026 at 09:11:29AM -0500, Brian Foster wrote:
> > Sparse inode cluster allocation sets min/max agbno values to avoid
> > allocating an inode cluster that might map to an invalid inode
> > chunk. For example, we can't have an inode record mapped to agbno 0
> > or that extends past the end of a runt AG of misaligned size.
> >
> > The initial calculation of max_agbno is unnecessarily conservative,
> > however. This has triggered a corner case allocation failure where a
> > small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> > to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> > case, which happens to be the offset of the last possible valid
> > inode chunk in the AG. In practice, we should be able to allocate
> > the 4-block cluster at agbno 2052 to map to the parent inode record
> > at agbno 2048, but the max_agbno value precludes it.
> >
> > Note that this can result in filesystem shutdown via dirty trans
> > cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
> > all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
> > the tail AG selection by the allocator sets t_highest_agno on the
> > transaction. If the inode allocator spins around and finds an inode
> > chunk with free inodes in an earlier AG, the subsequent dir name
> > creation path may still fail to allocate due to the AG restriction
> > and cancel.
> >
> > To avoid this problem, update the max_agbno calculation to the agbno
> > prior to the last chunk aligned agbno in the AG. This is not
> > necessarily the last valid allocation target for a sparse chunk, but
> > since inode chunks (i.e. records) are chunk aligned and sparse
> > allocs are cluster sized/aligned, this allows the sb_spino_align
> > alignment restriction to take over and round down the max effective
> > agbno to within the last valid inode chunk in the AG.
> >
> > Note that even though the allocator improvements in the
> > aforementioned commit seem to avoid this particular dirty trans
> > cancel situation, the max_agbno logic improvement still applies as
> > we should be able to allocate from an AG that has been appropriately
> > selected. The more important target for this patch however are
> > older/stable kernels prior to this allocator rework/improvement.
>
> <nod> It makes sense to me that we ought to be able to examine space out
> to the final(ish) agbno of the runt AG.
>
> Question for you: There are 16 holemask bits for 64 inodes per inobt
> record, or in other words the new allocation has to be aligned at least
> to the number of blocks needed to write 4 inodes. I /think/
> sb_spino_align reflects that, right?
>

Pretty much.. If I recall all the details correctly, the holemask bit
per 4-inode ratio was more of a data structure thing. That was just
based on how much space we had in the standard inode chunk record
freecount field to repurpose to track "holes" in an inode chunk.

The alignment rules had to change for higher level design raisins,
because if we allocated some sparse chunk out of fragmented free space
we need a consistent way to map it back to an inode record without
causing conflicts across multiple inode records (i.e. accidental record
overlap or whatever else). So therefore when sparse inodes are enabled,
at mkfs time we change inode chunk alignment from cluster size to full
inode chunk size, and set sparse chunk alignment to the cluster size.

This creates an inherent mapping for a sparse inode chunk to an inode
record because the cluster aligned sparse chunk always maps to whatever
chunk aligned record covers it (so we know whether to allocate a new
inode record or use one that might already be sparse based on the sparse
alloc, etc.).
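
The cluster-to-record mapping described above can be sketched as follows
(a standalone illustration, not kernel code; both function names are
hypothetical). It assumes records are aligned to the full chunk size in
blocks and sparse clusters to the cluster size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start block of the chunk aligned inode record that covers a cluster
 * aligned sparse allocation.
 */
static uint32_t record_agbno_for_cluster(uint32_t cluster_agbno,
					 uint32_t chunk_align)
{
	assert(chunk_align != 0);
	return cluster_agbno - (cluster_agbno % chunk_align);
}

/* Block offset of the cluster within its covering record. */
static uint32_t cluster_offset_in_record(uint32_t cluster_agbno,
					 uint32_t chunk_align)
{
	return cluster_agbno -
	       record_agbno_for_cluster(cluster_agbno, chunk_align);
}
```

With 8-block chunks, a cluster at agbno 2052 maps to the record at agbno
2048 at block offset 4, and a cluster at agbno 2048 maps to that same
record at offset 0, so both land in one record rather than two
overlapping ones.
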
> > Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
>
> Cc: <stable@vger.kernel.org> # v4.2
>

Thanks. Do you want me to repost with that or shall the maintainer
handle it? ;)

Brian

> --D
>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
> > 1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index d97295eaebe6..c19d6d713780 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
> > * invalid inode records, such as records that start at agbno 0
> > * or extend beyond the AG.
> > *
> > - * Set min agbno to the first aligned, non-zero agbno and max to
> > - * the last aligned agbno that is at least one full chunk from
> > - * the end of the AG.
> > + * Set min agbno to the first chunk aligned, non-zero agbno and
> > + * max to one less than the last chunk aligned agbno from the
> > + * end of the AG. We subtract 1 from max so that the cluster
> > + * allocation alignment takes over and allows allocation within
> > + * the last full inode chunk in the AG.
> > */
> > args.min_agbno = args.mp->m_sb.sb_inoalignmt;
> > args.max_agbno = round_down(xfs_ag_block_count(args.mp,
> > pag_agno(pag)),
> > - args.mp->m_sb.sb_inoalignmt) -
> > - igeo->ialloc_blks;
> > + args.mp->m_sb.sb_inoalignmt) - 1;
> >
> > error = xfs_alloc_vextent_near_bno(&args,
> > xfs_agbno_to_fsb(pag,
> > --
> > 2.52.0
> >
> >
>

* Re: [External] : [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
From: Brian Foster @ 2026-01-08 19:40 UTC
To: Mark Tinguely; +Cc: linux-xfs
Adding linux-xfs back to cc.
On Thu, Jan 08, 2026 at 12:37:46PM -0600, Mark Tinguely wrote:
> On 1/8/26 8:11 AM, Brian Foster wrote:
> > Sparse inode cluster allocation sets min/max agbno values to avoid
> > allocating an inode cluster that might map to an invalid inode
> > chunk. For example, we can't have an inode record mapped to agbno 0
> > or that extends past the end of a runt AG of misaligned size.
> >
> > The initial calculation of max_agbno is unnecessarily conservative,
> > however. This has triggered a corner case allocation failure where a
> > small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> > to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> > case, which happens to be the offset of the last possible valid
> > inode chunk in the AG. In practice, we should be able to allocate
> > the 4-block cluster at agbno 2052 to map to the parent inode record
> > at agbno 2048, but the max_agbno value precludes it.
> >
>
> With the same logic, wouldn't the 4 block cluster at agbno 2056 also
> be a valid sparse inode cluster?
>
Nope.. the problem there is that 4 block cluster would map to an inode
record at the same agbno 2056, but that record in the inobt would be
invalid because there are only 7 blocks before the end of the 2063 block
runt AG (full inode records are 8 blocks in this example). So IIRC the
metadata verifiers will complain about this and consider it corruption
and whatnot.
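
A standalone sketch of that check (not the actual verifier code; the
function name is hypothetical). It assumes the record alignment equals
the full chunk size in blocks, and that a record is invalid if it starts
at agbno 0 or runs past the end of the AG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Would a sparse cluster at cluster_agbno map to an inode record that
 * fits entirely inside an AG of ag_blocks blocks?
 */
static bool sparse_cluster_record_fits(uint32_t cluster_agbno,
				       uint32_t chunk_blks,
				       uint32_t ag_blocks)
{
	uint32_t rec_agbno;

	assert(chunk_blks != 0);
	/* The covering record starts at the chunk aligned agbno. */
	rec_agbno = cluster_agbno - (cluster_agbno % chunk_blks);

	return rec_agbno != 0 && rec_agbno + chunk_blks <= ag_blocks;
}
```

For the 2063-block AG with 8-block chunks, the cluster at 2052 maps to
the record covering blocks 2048-2055 and fits, while the cluster at 2056
maps to a record that would need blocks 2056-2063 and so runs one block
past the end of the AG.
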

That was actually the issue fixed by Dave's more recent commit
13325333582d ("xfs: fix sparse inode limits on runt AG") in this same
area. The subtlety here is that the calculation was off in this regard
from the start, but it never mattered as such because it wasn't
effective in this small runt AG case in the first place. So that fix
made the min/max agbno bounding logic effective, and with that in place
this fell out more recently pointing out that the original calculation
was a bit too conservative.

Brian

> thanks,
>
> Mark.
>

* Re: [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
From: Darrick J. Wong @ 2026-01-09 16:07 UTC
To: Brian Foster; +Cc: linux-xfs
On Thu, Jan 08, 2026 at 02:39:12PM -0500, Brian Foster wrote:
> On Thu, Jan 08, 2026 at 09:10:47AM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 08, 2026 at 09:11:29AM -0500, Brian Foster wrote:
> > > Sparse inode cluster allocation sets min/max agbno values to avoid
> > > allocating an inode cluster that might map to an invalid inode
> > > chunk. For example, we can't have an inode record mapped to agbno 0
> > > or that extends past the end of a runt AG of misaligned size.
> > >
> > > The initial calculation of max_agbno is unnecessarily conservative,
> > > however. This has triggered a corner case allocation failure where a
> > > small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> > > to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> > > case, which happens to be the offset of the last possible valid
> > > inode chunk in the AG. In practice, we should be able to allocate
> > > the 4-block cluster at agbno 2052 to map to the parent inode record
> > > at agbno 2048, but the max_agbno value precludes it.
> > >
> > > Note that this can result in filesystem shutdown via dirty trans
> > > cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
> > > all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
> > > the tail AG selection by the allocator sets t_highest_agno on the
> > > transaction. If the inode allocator spins around and finds an inode
> > > chunk with free inodes in an earlier AG, the subsequent dir name
> > > creation path may still fail to allocate due to the AG restriction
> > > and cancel.
> > >
> > > To avoid this problem, update the max_agbno calculation to the agbno
> > > prior to the last chunk aligned agbno in the AG. This is not
> > > necessarily the last valid allocation target for a sparse chunk, but
> > > since inode chunks (i.e. records) are chunk aligned and sparse
> > > allocs are cluster sized/aligned, this allows the sb_spino_align
> > > alignment restriction to take over and round down the max effective
> > > agbno to within the last valid inode chunk in the AG.
> > >
> > > Note that even though the allocator improvements in the
> > > aforementioned commit seem to avoid this particular dirty trans
> > > cancel situation, the max_agbno logic improvement still applies as
> > > we should be able to allocate from an AG that has been appropriately
> > > selected. The more important target for this patch however are
> > > older/stable kernels prior to this allocator rework/improvement.
> >
> > <nod> It makes sense to me that we ought to be able to examine space out
> > to the final(ish) agbno of the runt AG.
> >
> > Question for you: There are 16 holemask bits for 64 inodes per inobt
> > record, or in other words the new allocation has to be aligned at least
> > to the number of blocks needed to write 4 inodes. I /think/
> > sb_spino_align reflects that, right?
> >
>
> Pretty much.. If I recall all the details correctly, the holemask bit
> per 4-inode ratio was more of a data structure thing. That was just
> based on how much space we had in the standard inode chunk record
> freecount field to repurpose to track "holes" in an inode chunk.
>
> The alignment rules had to change for higher level design raisins,
> because if we allocated some sparse chunk out of fragmented free space
> we need a consistent way to map it back to an inode record without
> causing conflicts across multiple inode records (i.e. accidental record
> overlap or whatever else). So therefore when sparse inodes are enabled,
> at mkfs time we change inode chunk alignment from cluster size to full
> inode chunk size, and set sparse chunk alignment to the cluster size.
>
> This creates an inherent mapping for a sparse inode chunk to an inode
> record because the cluster aligned sparse chunk always maps to whatever
> chunk aligned record covers it (so we know whether to allocate a new
> inode record or use one that might already be sparse based on the sparse
> alloc, etc.).

<nod> Ok, that's what I was thinking, so I'm glad I asked. :)

> > > Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
> >
> > Cc: <stable@vger.kernel.org> # v4.2
> >
>
> Thanks. Do you want me to repost with that or shall the maintainer
> handle it? ;)

Well first things first ;)

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

If you want to repost with the cc and the rvb tag then please do that.
If not, then Carlos, can you include both when you add this to for-next,
please?

--D

> Brian
>
> > --D
> >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
> > > 1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > > index d97295eaebe6..c19d6d713780 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > > @@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
> > > * invalid inode records, such as records that start at agbno 0
> > > * or extend beyond the AG.
> > > *
> > > - * Set min agbno to the first aligned, non-zero agbno and max to
> > > - * the last aligned agbno that is at least one full chunk from
> > > - * the end of the AG.
> > > + * Set min agbno to the first chunk aligned, non-zero agbno and
> > > + * max to one less than the last chunk aligned agbno from the
> > > + * end of the AG. We subtract 1 from max so that the cluster
> > > + * allocation alignment takes over and allows allocation within
> > > + * the last full inode chunk in the AG.
> > > */
> > > args.min_agbno = args.mp->m_sb.sb_inoalignmt;
> > > args.max_agbno = round_down(xfs_ag_block_count(args.mp,
> > > pag_agno(pag)),
> > > - args.mp->m_sb.sb_inoalignmt) -
> > > - igeo->ialloc_blks;
> > > + args.mp->m_sb.sb_inoalignmt) - 1;
> > >
> > > error = xfs_alloc_vextent_near_bno(&args,
> > > xfs_agbno_to_fsb(pag,
> > > --
> > > 2.52.0
> > >
> > >
> >
>
>

* Re: [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
From: Brian Foster @ 2026-01-09 17:39 UTC
To: Darrick J. Wong; +Cc: linux-xfs
On Fri, Jan 09, 2026 at 08:07:54AM -0800, Darrick J. Wong wrote:
> On Thu, Jan 08, 2026 at 02:39:12PM -0500, Brian Foster wrote:
> > On Thu, Jan 08, 2026 at 09:10:47AM -0800, Darrick J. Wong wrote:
> > > On Thu, Jan 08, 2026 at 09:11:29AM -0500, Brian Foster wrote:
> > > > Sparse inode cluster allocation sets min/max agbno values to avoid
> > > > allocating an inode cluster that might map to an invalid inode
> > > > chunk. For example, we can't have an inode record mapped to agbno 0
> > > > or that extends past the end of a runt AG of misaligned size.
> > > >
> > > > The initial calculation of max_agbno is unnecessarily conservative,
> > > > however. This has triggered a corner case allocation failure where a
> > > > small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> > > > to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> > > > case, which happens to be the offset of the last possible valid
> > > > inode chunk in the AG. In practice, we should be able to allocate
> > > > the 4-block cluster at agbno 2052 to map to the parent inode record
> > > > at agbno 2048, but the max_agbno value precludes it.
> > > >
> > > > Note that this can result in filesystem shutdown via dirty trans
> > > > cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
> > > > all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
> > > > the tail AG selection by the allocator sets t_highest_agno on the
> > > > transaction. If the inode allocator spins around and finds an inode
> > > > chunk with free inodes in an earlier AG, the subsequent dir name
> > > > creation path may still fail to allocate due to the AG restriction
> > > > and cancel.
> > > >
> > > > To avoid this problem, update the max_agbno calculation to the agbno
> > > > prior to the last chunk aligned agbno in the AG. This is not
> > > > necessarily the last valid allocation target for a sparse chunk, but
> > > > since inode chunks (i.e. records) are chunk aligned and sparse
> > > > allocs are cluster sized/aligned, this allows the sb_spino_align
> > > > alignment restriction to take over and round down the max effective
> > > > agbno to within the last valid inode chunk in the AG.
> > > >
> > > > Note that even though the allocator improvements in the
> > > > aforementioned commit seem to avoid this particular dirty trans
> > > > cancel situation, the max_agbno logic improvement still applies as
> > > > we should be able to allocate from an AG that has been appropriately
> > > > selected. The more important target for this patch however are
> > > > older/stable kernels prior to this allocator rework/improvement.
> > >
> > > <nod> It makes sense to me that we ought to be able to examine space out
> > > to the final(ish) agbno of the runt AG.
> > >
> > > Question for you: There are 16 holemask bits for 64 inodes per inobt
> > > record, or in other words the new allocation has to be aligned at least
> > > to the number of blocks needed to write 4 inodes. I /think/
> > > sb_spino_align reflects that, right?
> > >
> >
> > Pretty much.. If I recall all the details correctly, the holemask bit
> > per 4-inode ratio was more of a data structure thing. That was just
> > based on how much space we had in the standard inode chunk record
> > freecount field to repurpose to track "holes" in an inode chunk.
> >
> > The alignment rules had to change for higher level design raisins,
> > because if we allocated some sparse chunk out of fragmented free space
> > we need a consistent way to map it back to an inode record without
> > causing conflicts across multiple inode records (i.e. accidental record
> > overlap or whatever else). So therefore when sparse inodes are enabled,
> > at mkfs time we change inode chunk alignment from cluster size to full
> > inode chunk size, and set sparse chunk alignment to the cluster size.
> >
> > This creates an inherent mapping for a sparse inode chunk to an inode
> > record because the cluster aligned sparse chunk always maps to whatever
> > chunk aligned record covers it (so we know whether to allocate a new
> > inode record or use one that might already be sparse based on the sparse
> > alloc, etc.).
>
> <nod> Ok, that's what I was thinking, so I'm glad I asked. :)
>
> > > > Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
> > >
> > > Cc: <stable@vger.kernel.org> # v4.2
> > >
> >
> > Thanks. Do you want me to repost with that or shall the maintainer
> > handle it? ;)
>
> Well first things first ;)
>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
>
> If you want to repost with the cc and the rvb tag then please do that.
> If not, then Carlos, can you include both when you add this to for-next,
> please?
>

Thanks! No problem.. I'll spin a v2 with all of this in a sec..

Brian

> --D
>
>
> > Brian
> >
> > > --D
> > >
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
> > > > 1 file changed, 6 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > > > index d97295eaebe6..c19d6d713780 100644
> > > > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > > > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > > > @@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
> > > > * invalid inode records, such as records that start at agbno 0
> > > > * or extend beyond the AG.
> > > > *
> > > > - * Set min agbno to the first aligned, non-zero agbno and max to
> > > > - * the last aligned agbno that is at least one full chunk from
> > > > - * the end of the AG.
> > > > + * Set min agbno to the first chunk aligned, non-zero agbno and
> > > > + * max to one less than the last chunk aligned agbno from the
> > > > + * end of the AG. We subtract 1 from max so that the cluster
> > > > + * allocation alignment takes over and allows allocation within
> > > > + * the last full inode chunk in the AG.
> > > > */
> > > > args.min_agbno = args.mp->m_sb.sb_inoalignmt;
> > > > args.max_agbno = round_down(xfs_ag_block_count(args.mp,
> > > > pag_agno(pag)),
> > > > - args.mp->m_sb.sb_inoalignmt) -
> > > > - igeo->ialloc_blks;
> > > > + args.mp->m_sb.sb_inoalignmt) - 1;
> > > >
> > > > error = xfs_alloc_vextent_near_bno(&args,
> > > > xfs_agbno_to_fsb(pag,
> > > > --
> > > > 2.52.0
> > > >
> > > >
> > >
> >
> >
>