From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
Date: Fri, 9 Jan 2026 12:39:38 -0500
Message-ID: <aWE9WpcO3HBpJTQy@bfoster>
In-Reply-To: <20260109160754.GM15551@frogsfrogsfrogs>

On Fri, Jan 09, 2026 at 08:07:54AM -0800, Darrick J. Wong wrote:
> On Thu, Jan 08, 2026 at 02:39:12PM -0500, Brian Foster wrote:
> > On Thu, Jan 08, 2026 at 09:10:47AM -0800, Darrick J. Wong wrote:
> > > On Thu, Jan 08, 2026 at 09:11:29AM -0500, Brian Foster wrote:
> > > > Sparse inode cluster allocation sets min/max agbno values to avoid
> > > > allocating an inode cluster that might map to an invalid inode
> > > > chunk. For example, we can't have an inode record mapped to agbno 0
> > > > or that extends past the end of a runt AG of misaligned size.
> > > >
> > > > The initial calculation of max_agbno is unnecessarily conservative,
> > > > however. This has triggered a corner case allocation failure where a
> > > > small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> > > > to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> > > > case, which happens to be the offset of the last possible valid
> > > > inode chunk in the AG. In practice, we should be able to allocate
> > > > the 4-block cluster at agbno 2052 to map to the parent inode record
> > > > at agbno 2048, but the max_agbno value precludes it.
> > > >
> > > > Note that this can result in filesystem shutdown via dirty trans
> > > > cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
> > > > all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
> > > > the tail AG selection by the allocator sets t_highest_agno on the
> > > > transaction. If the inode allocator spins around and finds an inode
> > > > chunk with free inodes in an earlier AG, the subsequent dir name
> > > > creation path may still fail to allocate due to the AG restriction
> > > > and cancel.
> > > >
> > > > To avoid this problem, update the max_agbno calculation to the agbno
> > > > prior to the last chunk aligned agbno in the AG. This is not
> > > > necessarily the last valid allocation target for a sparse chunk, but
> > > > since inode chunks (i.e. records) are chunk aligned and sparse
> > > > allocs are cluster sized/aligned, this allows the sb_spino_align
> > > > alignment restriction to take over and round down the max effective
> > > > agbno to within the last valid inode chunk in the AG.
> > > >
> > > > Note that even though the allocator improvements in the
> > > > aforementioned commit seem to avoid this particular dirty trans
> > > > cancel situation, the max_agbno logic improvement still applies as
> > > > we should be able to allocate from an AG that has been appropriately
> > > > selected. The more important target for this patch, however, is
> > > > older/stable kernels prior to this allocator rework/improvement.
> > >
> > > <nod> It makes sense to me that we ought to be able to examine space out
> > > to the final(ish) agbno of the runt AG.
> > >
> > > Question for you: There are 16 holemask bits for 64 inodes per inobt
> > > record, or in other words the new allocation has to be aligned at least
> > > to the number of blocks needed to write 4 inodes. I /think/
> > > sb_spino_align reflects that, right?
> > >
> >
> > Pretty much.. If I recall all the details correctly, the holemask bit
> > per 4-inode ratio was more of a data structure thing. That was just
> > based on how much space we had in the standard inode chunk record
> > freecount field to repurpose to track "holes" in an inode chunk.
> >
> > The alignment rules had to change for higher level design raisins,
> > because if we allocated some sparse chunk out of fragmented free space
> > we need a consistent way to map it back to an inode record without
> > causing conflicts across multiple inode records (i.e. accidental record
> > overlap or whatever else). So therefore when sparse inodes are enabled,
> > at mkfs time we change inode chunk alignment from cluster size to full
> > inode chunk size, and set sparse chunk alignment to the cluster size.
> >
> > This creates an inherent mapping for a sparse inode chunk to an inode
> > record because the cluster aligned sparse chunk always maps to whatever
> > chunk aligned record covers it (so we know whether to allocate a new
> > inode record or use one that might already be sparse based on the sparse
> > alloc, etc.).
>
> <nod> Ok, that's what I was thinking, so I'm glad I asked. :)
>
> > > > Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
> > >
> > > Cc: <stable@vger.kernel.org> # v4.2
> > >
> >
> > Thanks. Do you want me to repost with that or shall the maintainer
> > handle it? ;)
>
> Well first things first ;)
>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
>
> If you want to repost with the cc and the rvb tag then please do that.
> If not, then Carlos, can you include both when you add this to for-next,
> please?
>
Thanks! No problem.. I'll spin a v2 with all of this in a sec..

Brian

> --D
>
>
> > Brian
> >
> > > --D
> > >
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
> > > > 1 file changed, 6 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > > > index d97295eaebe6..c19d6d713780 100644
> > > > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > > > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > > > @@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
> > > > * invalid inode records, such as records that start at agbno 0
> > > > * or extend beyond the AG.
> > > > *
> > > > - * Set min agbno to the first aligned, non-zero agbno and max to
> > > > - * the last aligned agbno that is at least one full chunk from
> > > > - * the end of the AG.
> > > > + * Set min agbno to the first chunk aligned, non-zero agbno and
> > > > + * max to one less than the last chunk aligned agbno from the
> > > > + * end of the AG. We subtract 1 from max so that the cluster
> > > > + * allocation alignment takes over and allows allocation within
> > > > + * the last full inode chunk in the AG.
> > > > */
> > > > args.min_agbno = args.mp->m_sb.sb_inoalignmt;
> > > > args.max_agbno = round_down(xfs_ag_block_count(args.mp,
> > > > pag_agno(pag)),
> > > > - args.mp->m_sb.sb_inoalignmt) -
> > > > - igeo->ialloc_blks;
> > > > + args.mp->m_sb.sb_inoalignmt) - 1;
> > > >
> > > > error = xfs_alloc_vextent_near_bno(&args,
> > > > xfs_agbno_to_fsb(pag,
> > > > --
> > > > 2.52.0
> > > >
> > > >
> > >
> >
> >
>
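
For anyone following the arithmetic in the commit message, here is a rough
standalone sketch of the old and new max_agbno calculations applied to the
2063-block runt AG. It is not the kernel code. The helper names are
hypothetical, and the geometry is an assumption chosen to match the numbers
quoted above: a chunk alignment and chunk size of 8 blocks (standing in for
sb_inoalignmt and igeo->ialloc_blks) and a cluster alignment of 4 blocks
(standing in for sb_spino_align). Only the 4-block cluster size is stated in
the thread; the 8-block values are inferred from max_agbno coming out as 2048.

```c
#include <assert.h>
#include <stdbool.h>

/* Round x down to the nearest multiple of align. */
static inline unsigned int round_down_to(unsigned int x, unsigned int align)
{
	assert(align != 0);
	return x - (x % align);
}

/*
 * Old limit: the last chunk aligned agbno that is at least one full
 * chunk from the end of the AG. Assumes the AG holds at least one chunk.
 */
static inline unsigned int max_agbno_old(unsigned int ag_blocks,
					 unsigned int chunk_align,
					 unsigned int chunk_blocks)
{
	return round_down_to(ag_blocks, chunk_align) - chunk_blocks;
}

/* New limit: one less than the last chunk aligned agbno in the AG. */
static inline unsigned int max_agbno_new(unsigned int ag_blocks,
					 unsigned int chunk_align)
{
	return round_down_to(ag_blocks, chunk_align) - 1;
}

/* Highest cluster aligned start block at or below max_agbno. */
static inline unsigned int last_cluster_start(unsigned int max_agbno,
					      unsigned int cluster_align)
{
	return round_down_to(max_agbno, cluster_align);
}

/* Start of the chunk aligned inode record that covers a sparse cluster. */
static inline unsigned int record_start(unsigned int cluster_agbno,
					unsigned int chunk_align)
{
	return round_down_to(cluster_agbno, chunk_align);
}

/* A record must not start at agbno 0 and must end within the AG. */
static inline bool record_valid(unsigned int rec_agbno,
				unsigned int chunk_blocks,
				unsigned int ag_blocks)
{
	return rec_agbno != 0 && rec_agbno + chunk_blocks <= ag_blocks;
}
```

Under those assumed values, max_agbno_old(2063, 8, 8) gives 2048, so nothing
in the free extent [2050,13] is eligible. max_agbno_new(2063, 8) gives 2055,
last_cluster_start(2055, 4) rounds that down to 2052, and
record_start(2052, 8) maps the cluster back to the record at 2048, which
record_valid() accepts. A record at 2056 would end at block 2064 and is
rejected, which is why 2048 is the last valid chunk in this AG.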