public inbox for linux-xfs@vger.kernel.org
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk
Date: Fri, 9 Jan 2026 12:39:38 -0500
Message-ID: <aWE9WpcO3HBpJTQy@bfoster>
In-Reply-To: <20260109160754.GM15551@frogsfrogsfrogs>

On Fri, Jan 09, 2026 at 08:07:54AM -0800, Darrick J. Wong wrote:
> On Thu, Jan 08, 2026 at 02:39:12PM -0500, Brian Foster wrote:
> > On Thu, Jan 08, 2026 at 09:10:47AM -0800, Darrick J. Wong wrote:
> > > On Thu, Jan 08, 2026 at 09:11:29AM -0500, Brian Foster wrote:
> > > > Sparse inode cluster allocation sets min/max agbno values to avoid
> > > > allocating an inode cluster that might map to an invalid inode
> > > > chunk. For example, we can't have an inode record mapped to agbno 0
> > > > or that extends past the end of a runt AG of misaligned size.
> > > > 
> > > > The initial calculation of max_agbno is unnecessarily conservative,
> > > > however. This has triggered a corner case allocation failure where a
> > > > small runt AG (i.e. 2063 blocks) is mostly full save for an extent
> > > > to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
> > > > case, which happens to be the offset of the last possible valid
> > > > inode chunk in the AG. In practice, we should be able to allocate
> > > > the 4-block cluster at agbno 2052 to map to the parent inode record
> > > > at agbno 2048, but the max_agbno value precludes it.
> > > > 
> > > > Note that this can result in filesystem shutdown via dirty trans
> > > > cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
> > > > all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
> > > > the tail AG selection by the allocator sets t_highest_agno on the
> > > > transaction. If the inode allocator spins around and finds an inode
> > > > chunk with free inodes in an earlier AG, the subsequent dir name
> > > > creation path may still fail to allocate due to the AG restriction
> > > > and cancel.
> > > > 
> > > > To avoid this problem, update the max_agbno calculation to the agbno
> > > > prior to the last chunk aligned agbno in the AG. This is not
> > > > necessarily the last valid allocation target for a sparse chunk, but
> > > > since inode chunks (i.e. records) are chunk aligned and sparse
> > > > allocs are cluster sized/aligned, this allows the sb_spino_align
> > > > alignment restriction to take over and round down the max effective
> > > > agbno to within the last valid inode chunk in the AG.
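The following C sketch makes the arithmetic concrete. It applies the old and new max_agbno calculations from the patch to the runt AG example above. Several assumptions are inferred from the numbers rather than stated in the thread:

- Inode chunks are 8 blocks, so sb_inoalignmt == ialloc_blks == 8.
- Sparse clusters are 4 blocks and 4-block aligned.
- max_agbno bounds the *starting* agbno of the allocation.

The helper names are hypothetical and do not correspond to kernel functions.

```c
#include <assert.h>

/* Round x down to a multiple of y (y > 0); equivalent to the kernel's
 * round_down() when y is a power of two, as it is here. */
#define ROUND_DOWN(x, y) ((x) / (y) * (y))

/* Old bound: last chunk-aligned agbno at least one full chunk from EOAG. */
static unsigned int old_max_agbno(unsigned int agblocks,
				  unsigned int inoalignmt,
				  unsigned int ialloc_blks)
{
	return ROUND_DOWN(agblocks, inoalignmt) - ialloc_blks;
}

/* New bound: one block before the last chunk-aligned agbno. */
static unsigned int new_max_agbno(unsigned int agblocks,
				  unsigned int inoalignmt)
{
	return ROUND_DOWN(agblocks, inoalignmt) - 1;
}

/*
 * Hypothetical helper: highest cluster-aligned start in the free extent
 * [fbno, fbno + flen) such that a len-block cluster fits and the start
 * does not exceed max_agbno. Returns -1 if there is none.
 */
static long last_sparse_start(unsigned int fbno, unsigned int flen,
			      unsigned int max_agbno, unsigned int align,
			      unsigned int len)
{
	unsigned int hi = max_agbno;

	assert(align > 0);
	if (flen < len)
		return -1;
	/* The cluster must end within the free extent. */
	if (fbno + flen - len < hi)
		hi = fbno + flen - len;
	/* Sparse alignment rounds the candidate start down. */
	hi = ROUND_DOWN(hi, align);
	return hi < fbno ? -1 : (long)hi;
}
```

Under these assumptions, the old bound (2063 rounded down to 2056, minus 8, giving 2048) leaves no 4-aligned start at or above the free extent's first block, 2050. The new bound (2055) lets the alignment pick 2052. That cluster lies inside the chunk-aligned record at agbno 2048, which is the last full chunk in the AG.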
> > > > 
> > > > Note that even though the allocator improvements in the
> > > > aforementioned commit seem to avoid this particular dirty trans
> > > > cancel situation, the max_agbno logic improvement still applies as
> > > > we should be able to allocate from an AG that has been appropriately
> > > > selected. The more important targets for this patch, however, are
> > > > older/stable kernels prior to this allocator rework/improvement.
> > > 
> > > <nod> It makes sense to me that we ought to be able to examine space out
> > > to the final(ish) agbno of the runt AG.
> > > 
> > > Question for you: There are 16 holemask bits for 64 inodes per inobt
> > > record, or in other words the new allocation has to be aligned at least
> > > to the number of blocks needed to write 4 inodes.  I /think/
> > > sb_spino_align reflects that, right?
> > > 
> > 
> > Pretty much. If I recall all the details correctly, the holemask bit
> > per 4-inode ratio was more of a data structure thing. That was just
> > based on how much space we had in the standard inode chunk record
> > freecount field to repurpose to track "holes" in an inode chunk.
> > 
> > The alignment rules had to change for higher level design raisins,
> > because if we allocated some sparse chunk out of fragmented free space
> > we need a consistent way to map it back to an inode record without
> > causing conflicts across multiple inode records (i.e. accidental record
> > overlap or whatever else). So therefore when sparse inodes are enabled,
> > at mkfs time we change inode chunk alignment from cluster size to full
> > inode chunk size, and set sparse chunk alignment to the cluster size.
> > 
> > This creates an inherent mapping for a sparse inode chunk to an inode
> > record because the cluster aligned sparse chunk always maps to whatever
> > chunk aligned record covers it (so we know whether to allocate a new
> > inode record or use one that might already be sparse based on the sparse
> > alloc, etc.).
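The record mapping described here can be illustrated with a minimal C sketch. It assumes:

- the 64-inode record and 16-bit holemask discussed above (4 inodes per bit);
- the same hypothetical geometry as the runt AG example (8-block chunks, 8 inodes per block);
- XFS's convention that a set holemask bit marks a hole.

struct sparse_map and map_sparse_alloc() are hypothetical and are not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64	/* inodes per inobt record */
#define HOLEMASK_BITS		16	/* bits in the record holemask */
#define INODES_PER_HOLEMASK_BIT	(INODES_PER_CHUNK / HOLEMASK_BITS)	/* 4 */

struct sparse_map {
	uint32_t rec_agbno;	/* chunk-aligned start of the covering record */
	uint16_t holemask;	/* set bit = 4 inodes not allocated */
};

/*
 * Hypothetical helper: map a cluster-aligned sparse allocation of len
 * blocks at agbno onto the chunk-aligned inode record that covers it.
 */
static struct sparse_map map_sparse_alloc(uint32_t agbno, uint32_t len,
					  uint32_t chunk_align,
					  uint32_t inodes_per_block)
{
	struct sparse_map m;
	uint32_t first, count, bit;
	uint16_t allocmask = 0;

	assert(chunk_align > 0);
	/* The covering record starts at the chunk-aligned agbno at or below. */
	m.rec_agbno = agbno / chunk_align * chunk_align;
	/* Inode offset and count of the new cluster within that record. */
	first = (agbno - m.rec_agbno) * inodes_per_block;
	count = len * inodes_per_block;
	assert(first + count <= INODES_PER_CHUNK);
	for (bit = first / INODES_PER_HOLEMASK_BIT;
	     bit < (first + count) / INODES_PER_HOLEMASK_BIT; bit++)
		allocmask |= (uint16_t)(1u << bit);
	/* Every inode not supplied by this cluster is a hole. */
	m.holemask = (uint16_t)~allocmask;
	return m;
}
```

For the 4-block cluster at agbno 2052, this yields the record at agbno 2048 with holemask 0x00FF. The first 32 inodes are holes, and the cluster supplies the last 32. A cluster at agbno 2048 maps to the same record with the complementary mask, 0xFF00, so the two allocations can never land in conflicting records.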
> 
> <nod> Ok, that's what I was thinking, so I'm glad I asked. :)
> 
> > > > Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
> > > 
> > > Cc: <stable@vger.kernel.org> # v4.2
> > > 
> > 
> > Thanks. Do you want me to repost with that or shall the maintainer
> > handle it? ;)
> 
> Well first things first ;)
> 
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> If you want to repost with the cc and the rvb tag then please do that.
> If not, then Carlos, can you include both when you add this to for-next,
> please?
> 

Thanks! No problem. I'll spin a v2 with all of this in a sec.

Brian

> --D
> 
> 
> > Brian
> > 
> > > --D
> > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
> > > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > > > index d97295eaebe6..c19d6d713780 100644
> > > > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > > > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > > > @@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
> > > >  		 * invalid inode records, such as records that start at agbno 0
> > > >  		 * or extend beyond the AG.
> > > >  		 *
> > > > -		 * Set min agbno to the first aligned, non-zero agbno and max to
> > > > -		 * the last aligned agbno that is at least one full chunk from
> > > > -		 * the end of the AG.
> > > > +		 * Set min agbno to the first chunk aligned, non-zero agbno and
> > > > +		 * max to one less than the last chunk aligned agbno from the
> > > > +		 * end of the AG. We subtract 1 from max so that the cluster
> > > > +		 * allocation alignment takes over and allows allocation within
> > > > +		 * the last full inode chunk in the AG.
> > > >  		 */
> > > >  		args.min_agbno = args.mp->m_sb.sb_inoalignmt;
> > > >  		args.max_agbno = round_down(xfs_ag_block_count(args.mp,
> > > >  							pag_agno(pag)),
> > > > -					    args.mp->m_sb.sb_inoalignmt) -
> > > > -				 igeo->ialloc_blks;
> > > > +					    args.mp->m_sb.sb_inoalignmt) - 1;
> > > >  
> > > >  		error = xfs_alloc_vextent_near_bno(&args,
> > > >  				xfs_agbno_to_fsb(pag,
> > > > -- 
> > > > 2.52.0
> > > > 
> > > > 
> > > 
> > 
> > 
> 



Thread overview: 6+ messages
2026-01-08 14:11 [PATCH] xfs: set max_agbno to allow sparse alloc of last full inode chunk Brian Foster
2026-01-08 17:10 ` Darrick J. Wong
2026-01-08 19:39   ` Brian Foster
2026-01-09 16:07     ` Darrick J. Wong
2026-01-09 17:39       ` Brian Foster [this message]
     [not found] ` <32881f8f-3b68-49c3-95b0-b1889c08d281@oracle.com>
2026-01-08 19:40   ` [External] : " Brian Foster
