From: Kanchan Joshi <joshi.k@samsung.com>
To: brauner@kernel.org, hch@lst.de, djwong@kernel.org,
	dgc@kernel.org, jack@suse.cz, cem@kernel.org, axboe@kernel.dk,
	kbusch@kernel.org, ritesh.list@gmail.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, gost.dev@samsung.com,
	Kanchan Joshi <joshi.k@samsung.com>
Subject: [PATCH v3 5/6] xfs: write stream based AG placement
Date: Tue, 16 Jun 2026 23:35:54 +0530
Message-ID: <20260616180555.33338-6-joshi.k@samsung.com>
In-Reply-To: <20260616180555.33338-1-joshi.k@samsung.com>

When a write stream is set on the file, choose the AG set based on the
write stream value.

Isolating distinct write streams into dedicated allocation groups reduces
block interleaving between concurrent writers. Keeping these streams
spatially separated reduces AGF lock contention and logical file
fragmentation.

If there are fewer AGs than write streams, the write streams are
distributed across the available AGs in round-robin fashion.
Otherwise, the available AGs are partitioned among the write streams. The
write-stream value is used to derive the AG set, and the low bits of the
inode number are used to derive the AG within the AG set.

Each stream provides isolation, while the AG set size determines the
concurrency available within a stream.

Example: 8 Allocation Groups, 4 write streams
AG set size = 2 AGs per write stream

   Stream 1 (ID: 1)         Stream 2 (ID: 2)         Streams 3 & 4
 +---------+---------+    +---------+---------+    +-------------
 |   AG0   |   AG1   |    |   AG2   |   AG3   |    |  AG4...AG7
 +---------+---------+    +---------+---------+    +-------------
      ^         ^              ^         ^
      |         |              |         |
      | File B (ino: 101)      | File D (ino: 201)
      | 101 % 2 = 1 -> AG 1    | 201 % 2 = 1 -> AG 3
      |                        |
 File A (ino: 100)        File C (ino: 200)
 100 % 2 = 0 -> AG 0      200 % 2 = 0 -> AG 2

If the AGs cannot be evenly distributed among the streams, the last stream
absorbs the remaining AGs.

Note that there are no hard boundaries; this only provides an explicit
routing hint to the XFS allocator. We still preserve file contiguity, and
the full space can be utilized even with a single stream.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
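A minimal userspace sketch of the mapping described in the commit message,
for checking the arithmetic without building a kernel. It is not the patch
code: struct ag_set, stream_to_ag_set() and pick_ag() are hypothetical
names, and the agino parameter stands in for the value the patch takes
from XFS_INO_TO_AGINO(). It assumes stream IDs are 1-based and within
1..nr_streams.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an AG set: a base AG and a number of AGs. */
struct ag_set {
	uint32_t base;	/* first AG of the set */
	uint32_t size;	/* number of AGs in the set */
};

/*
 * Map a 1-based stream ID to its AG set.
 * Assumption: 1 <= stream_id <= nr_streams, and nr_ags is non-zero.
 */
static struct ag_set
stream_to_ag_set(uint32_t stream_id, uint32_t nr_streams, uint32_t nr_ags)
{
	struct ag_set set;
	uint32_t idx;

	assert(nr_ags > 0 && nr_streams > 0);
	assert(stream_id >= 1 && stream_id <= nr_streams);

	idx = stream_id - 1;		/* 0-based math */
	set.size = nr_ags / nr_streams;

	if (set.size) {
		/* Partition the AGs; the last stream takes the remainder. */
		set.base = idx * set.size;
		if (idx == nr_streams - 1)
			set.size = nr_ags - set.base;
	} else {
		/* Fewer AGs than streams: round robin, one AG per stream. */
		set.base = idx % nr_ags;
		set.size = 1;
	}
	return set;
}

/* Fan out within the set using the low bits of the inode number. */
static uint32_t
pick_ag(struct ag_set set, uint32_t agino, uint32_t nr_ags)
{
	return (set.base + (agino % set.size)) % nr_ags;
}
```

With 8 AGs and 4 streams, stream_to_ag_set(2, 4, 8) yields base 2 and
size 2, and pick_ag() with agino 201 then selects AG 3, matching File D
in the diagram.
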
 fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6685220ec59a..325987b5bd9e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3205,6 +3205,38 @@ xfs_default_ag_set_size(
 	return min_t(xfs_agnumber_t, GENERIC_AG_SET_SZ, mp->m_sb.sb_agcount);
 }
 
+static xfs_agnumber_t
+xfs_inode_write_stream_ag_set(
+	struct xfs_inode	*ip,
+	xfs_agnumber_t		*target_agno)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	uint32_t		nr_streams = xfs_inode_max_write_streams(ip);
+	uint32_t		stream_id = ip->i_write_stream;
+	uint32_t		nr_ags = mp->m_sb.sb_agcount;
+	xfs_agnumber_t		set_size;
+
+
+	if (!nr_streams)
+		return xfs_default_ag_set_size(ip);
+
+	stream_id -= 1; /* For 0-based math; stream-ids are 1-based */
+	set_size = nr_ags / nr_streams;
+
+	if (set_size) {
+		*target_agno = stream_id * set_size;
+		/* uneven distribution, last stream will cover extra AGs */
+		if (stream_id == nr_streams - 1)
+			set_size = nr_ags - *target_agno;
+	} else {
+		/* for the case when we have fewer AGs than streams */
+		*target_agno = stream_id % nr_ags;
+		set_size = 1;
+	}
+
+	return set_size;
+}
+
 static xfs_agnumber_t
 xfs_ag_to_ag_set(
 	struct xfs_bmalloca	*ap,
@@ -3218,7 +3250,11 @@ xfs_ag_to_ag_set(
 	if (!(ap->datatype & XFS_ALLOC_USERDATA))
 		return base_agno;
 
-	set_size = xfs_default_ag_set_size(ip);
+	if (ip->i_write_stream)
+		set_size = xfs_inode_write_stream_ag_set(ip, &base_agno);
+	else
+		set_size = xfs_default_ag_set_size(ip);
+
 	/* Fan out within the AG set using low bits of the inode */
 	return (base_agno + (XFS_INO_TO_AGINO(mp, ip->i_ino) % set_size)) %
 		mp->m_sb.sb_agcount;
-- 
2.25.1

