From: "Darrick J. Wong" <djwong@kernel.org>
To: Kundan Kumar <kundan.kumar@samsung.com>
Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	willy@infradead.org, mcgrof@kernel.org, clm@meta.com,
	david@fromorbit.com, amir73il@gmail.com, axboe@kernel.dk,
	hch@lst.de, ritesh.list@gmail.com, dave@stgolabs.net,
	cem@kernel.org, wangyufei@vivo.com,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-xfs@vger.kernel.org, gost.dev@samsung.com,
	anuj20.g@samsung.com, vishak.g@samsung.com, joshi.k@samsung.com
Subject: Re: [PATCH v3 3/6] xfs: add per-inode AG prediction map and dirty-AG bitmap
Date: Thu, 5 Feb 2026 08:42:38 -0800
Message-ID: <20260205164238.GS7712@frogsfrogsfrogs>
In-Reply-To: <2c485586-83c9-4697-91fc-7b0cee697704@samsung.com>

On Tue, Feb 03, 2026 at 12:50:53PM +0530, Kundan Kumar wrote:
> On 1/29/2026 6:14 AM, Darrick J. Wong wrote:
> > On Fri, Jan 16, 2026 at 03:38:15PM +0530, Kundan Kumar wrote:
> >> Add per-inode structures to track predicted AGs of dirty folios using
> >> an xarray and bitmap. This enables efficient identification of AGs
> >> involved in writeback.
> >>
> >> Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
> >> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
> >> ---
> >>   fs/xfs/xfs_icache.c | 27 +++++++++++++++++++++++++++
> >>   fs/xfs/xfs_inode.h  |  5 +++++
> >>   2 files changed, 32 insertions(+)
> >>
> >> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> >> index e44040206851..f97aa6d66271 100644
> >> --- a/fs/xfs/xfs_icache.c
> >> +++ b/fs/xfs/xfs_icache.c
> >> @@ -80,6 +80,25 @@ static inline xa_mark_t ici_tag_to_mark(unsigned int tag)
> >>   	return XFS_PERAG_BLOCKGC_MARK;
> >>   }
> >>   
> >> +static int xfs_inode_init_ag_bitmap(struct xfs_inode *ip)
> >> +{
> >> +	unsigned int bits = ip->i_mount->m_sb.sb_agcount;
> >> +	unsigned int nlongs;
> >> +
> >> +	xa_init_flags(&ip->i_ag_pmap, XA_FLAGS_LOCK_IRQ);
> > 
> > This increases the size of struct xfs_inode by 40 bytes...
> > 
> 
> I’ll make this lazy and sparse: move AG writeback state behind a pointer
> allocated on first use, and replace the bitmap with a sparse dirty-AG
> set (xarray keyed by agno) so memory scales with AGs actually touched by
> the inode.
> 
> >> +	ip->i_ag_dirty_bitmap = NULL;
> >> +	ip->i_ag_dirty_bits = bits;
> >> +
> >> +	if (!bits)
> >> +		return 0;
> >> +
> >> +	nlongs = BITS_TO_LONGS(bits);
> >> +	ip->i_ag_dirty_bitmap = kcalloc(nlongs, sizeof(unsigned long),
> >> +					GFP_NOFS);
> > 
> > ...and there could be hundreds or thousands of AGs for each filesystem.
> > > That's a lot of kernel memory to handle this prediction stuff, and I'm
> > not even sure what ag_dirty_bitmap does yet.
> > 
> 
> The bit for an AG is set in ag_dirty_bitmap at write time. During
> writeback, we check which AG bits are set, wake only those AG-specific
> workers, and each worker scans the page cache, filters folios tagged for
> its AG, and submits the I/O.
> 
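
As a rough illustration of the flow described above (set a bit at write
time, then at writeback visit only the AGs whose bits are set), here is a
minimal userspace sketch. It is not the patch's code: struct ag_dirty_set
and the ag_dirty_*() helpers are hypothetical names, calloc() stands in for
kcalloc(), and filling an output array stands in for waking the per-AG
workers. A kernel version would presumably use atomic bitops such as
set_bit() and test_and_clear_bit() instead of the plain loads and stores
shown here.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for the per-inode dirty-AG state. */
struct ag_dirty_set {
	unsigned long	*bitmap;	/* one bit per AG */
	unsigned int	nbits;		/* sb_agcount in the patch */
};

/* Allocate a zeroed bitmap with one bit per AG; returns 0 or -1. */
static int ag_dirty_init(struct ag_dirty_set *s, unsigned int agcount)
{
	size_t nlongs = (agcount + BITS_PER_ULONG - 1) / BITS_PER_ULONG;

	s->nbits = agcount;
	s->bitmap = NULL;
	if (!agcount)
		return 0;
	s->bitmap = calloc(nlongs, sizeof(unsigned long));
	return s->bitmap ? 0 : -1;
}

/* Write path: note that this inode has dirty data predicted for agno. */
static void ag_dirty_mark(struct ag_dirty_set *s, unsigned int agno)
{
	assert(agno < s->nbits);
	s->bitmap[agno / BITS_PER_ULONG] |= 1UL << (agno % BITS_PER_ULONG);
}

/*
 * Writeback path: clear each set bit and record its AG number in out[],
 * which stands in for waking that AG's worker.  Stops after max entries
 * and returns how many AGs were collected.
 */
static unsigned int ag_dirty_drain(struct ag_dirty_set *s,
				   unsigned int *out, unsigned int max)
{
	unsigned int agno, n = 0;

	for (agno = 0; agno < s->nbits && n < max; agno++) {
		unsigned long *word = &s->bitmap[agno / BITS_PER_ULONG];
		unsigned long mask = 1UL << (agno % BITS_PER_ULONG);

		if (!(*word & mask))
			continue;
		*word &= ~mask;
		out[n++] = agno;
	}
	return n;
}
```

Marking the same AG twice is idempotent, so an inode that dirties many
folios in one AG still results in a single wakeup for that AG.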
> >> +
> >> +	return ip->i_ag_dirty_bitmap ? 0 : -ENOMEM;
> >> +}
> >> +
> >>   /*
> >>    * Allocate and initialise an xfs_inode.
> >>    */
> >> @@ -131,6 +150,8 @@ xfs_inode_alloc(
> >>   	ip->i_next_unlinked = NULLAGINO;
> >>   	ip->i_prev_unlinked = 0;
> >>   
> >> +	xfs_inode_init_ag_bitmap(ip);
> > 
> > Unchecked return value???
> 
> Will correct in next version
> 
> > 
> >> +
> >>   	return ip;
> >>   }
> >>   
> >> @@ -194,6 +215,12 @@ xfs_inode_free(
> >>   	ip->i_ino = 0;
> >>   	spin_unlock(&ip->i_flags_lock);
> >>   
> >> +	/* free xarray contents (values are immediate packed ints) */
> >> +	xa_destroy(&ip->i_ag_pmap);
> >> +	kfree(ip->i_ag_dirty_bitmap);
> >> +	ip->i_ag_dirty_bitmap = NULL;
> >> +	ip->i_ag_dirty_bits = 0;
> >> +
> >>   	__xfs_inode_free(ip);
> >>   }
> >>   
> >> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> >> index bd6d33557194..dee449168605 100644
> >> --- a/fs/xfs/xfs_inode.h
> >> +++ b/fs/xfs/xfs_inode.h
> >> @@ -99,6 +99,11 @@ typedef struct xfs_inode {
> >>   	spinlock_t		i_ioend_lock;
> >>   	struct work_struct	i_ioend_work;
> >>   	struct list_head	i_ioend_list;
> >> +
> >> +	/* AG prediction map: pgoff_t -> packed u32 */
> > 
> > What about blocksize < pagesize filesystems?  Which packed agno do you
> > associate with the pgoff_t?
> > 
> > Also, do you have an xarray entry for each pgoff_t in a large folio?
> > 
> > --D
> > 
> 
> pgoff_t here is the pagecache index (folio->index), i.e. file offset in
> PAGE_SIZE units, not a filesystem block index. So blocksize < PAGE_SIZE
> doesn’t change the association; the packed agno is attached to the folio
> at that pagecache index.

Ok, so the tag is entirely determined by the AG of the first fsblock
within the folio.
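
To make the blocksize < pagesize case concrete, the sketch below walks
from a pagecache index to the first fsblock behind it and then to an AG
number. It is a simplified userspace model under stated assumptions, not
code from the series: struct geom, struct extent and the helper names are
made up for illustration, and the caller is assumed to already hold an
extent that covers the folio's first block. The one detail taken from XFS
itself is that an fsblock number carries the AG number in its upper bits,
so shifting right by sb_agblklog recovers it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; field names are illustrative. */
struct geom {
	unsigned int	page_shift;	/* log2(PAGE_SIZE), e.g. 12 */
	unsigned int	blk_shift;	/* log2(fs block size), e.g. 10 */
	unsigned int	agblklog;	/* log2(AG size in blocks), rounded up */
};

/* File blocks [startoff, startoff + count) map to fsblocks from startblock. */
struct extent {
	uint64_t	startoff;
	uint64_t	startblock;
	uint64_t	count;
};

/* File block offset of the first fs block behind this pagecache index. */
static uint64_t folio_first_fileblock(const struct geom *g, uint64_t index)
{
	assert(g->blk_shift <= g->page_shift);
	return index << (g->page_shift - g->blk_shift);
}

/* The AG number lives in the upper bits of an fsblock number. */
static uint32_t fsb_to_agno(const struct geom *g, uint64_t fsbno)
{
	return (uint32_t)(fsbno >> g->agblklog);
}

/* AG of the folio's first block; the extent must cover that block. */
static uint32_t folio_first_agno(const struct geom *g,
				 const struct extent *e, uint64_t index)
{
	uint64_t fileblk = folio_first_fileblock(g, index);

	assert(fileblk >= e->startoff && fileblk - e->startoff < e->count);
	return fsb_to_agno(g, e->startblock + (fileblk - e->startoff));
}
```

With 4k pages and 1k blocks, pagecache index 3 starts at file block 12, so
only the mapping of block 12 decides the tag; blocks 13 through 15 of the
same page are not consulted.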

> We store one xarray entry per folio index (the start of the folio). We 
> do not create entries for each base-page inside a large folio. If a 
> large folio could span multiple extents/AGs, we’ll treat the hint as 
> advisory and tag it invalid (fallback to normal writeback routing) 
> rather than trying to encode per-subpage AGs.

Oh, ok, so if you have the mapping and the folio at the same time you
can determine that the entire large folio maps to a single extent, and
tag the whole large folio as belonging to a single AG.  That clears
things up, thank you.

It's only in the case of extreme fragmentation that a large folio gets
flung at the old writeback paths, which is probably good enough anyway.
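
A sketch of that decision, assuming the caller has both the folio's block
range and the extent it was mapped from: tag the folio with one AG only if
the whole folio fits inside that single extent, otherwise mark the hint
invalid so the folio takes the regular writeback path. AG_TAG_INVALID,
struct extent and folio_ag_tag() are hypothetical names chosen for this
example, not identifiers from the patches. The sketch relies on an XFS
extent never crossing an AG boundary, which is why "one extent" implies
"one AG".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AG_TAG_INVALID	UINT32_MAX	/* hypothetical "no usable hint" value */

/* File blocks [startoff, startoff + count) map to fsblocks from startblock. */
struct extent {
	uint64_t	startoff;
	uint64_t	startblock;
	uint64_t	count;
};

/* True if file blocks [first, first + nblocks) lie entirely inside e. */
static bool extent_covers(const struct extent *e, uint64_t first,
			  uint64_t nblocks)
{
	return first >= e->startoff &&
	       nblocks <= e->count &&
	       first - e->startoff <= e->count - nblocks;
}

/*
 * Return the AG for a folio spanning nblocks file blocks starting at
 * first, or AG_TAG_INVALID if the folio is not fully covered by e.
 */
static uint32_t folio_ag_tag(const struct extent *e, uint64_t first,
			     uint64_t nblocks, unsigned int agblklog)
{
	uint64_t fsbno;

	if (!extent_covers(e, first, nblocks))
		return AG_TAG_INVALID;

	fsbno = e->startblock + (first - e->startoff);
	return (uint32_t)(fsbno >> agblklog);
}
```

The coverage check is written with subtractions rather than
first + nblocks so that it cannot overflow for large offsets.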

--D

> >> +	struct xarray           i_ag_pmap;
> >> +	unsigned long           *i_ag_dirty_bitmap;
> >> +	unsigned int            i_ag_dirty_bits;
> >>   } xfs_inode_t;
> >>   
> >>   static inline bool xfs_inode_on_unlinked_list(const struct xfs_inode *ip)
> >> -- 
> >> 2.25.1
> >>
> >>
> > 
> 
> 
