From: Kanchan Joshi <joshi.k@samsung.com>
To: brauner@kernel.org, hch@lst.de, djwong@kernel.org,
	dgc@kernel.org, jack@suse.cz, cem@kernel.org, axboe@kernel.dk,
	kbusch@kernel.org, ritesh.list@gmail.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, gost.dev@samsung.com,
	Kanchan Joshi <joshi.k@samsung.com>
Subject: [PATCH v3 0/6] xfs write streams
Date: Tue, 16 Jun 2026 23:35:49 +0530
Message-ID: <20260616180555.33338-1-joshi.k@samsung.com>
In-Reply-To: CGME20260616181240epcas5p3f86fbb67f0d04cb0ee4b34839c9522b5@epcas5p3.samsung.com

Hi All,

At LSFMMBPF'26, we discussed 'write streams' in several sessions as a
mechanism to reduce filesystem allocator bottlenecks and improve
buffered/direct IO scalability.

This series introduces a generic interface [1,2] for write stream management
on files. It achieves spatial isolation and concurrency improvements [3] in
xfs using:
- generic AG-set (patch #4)
- write-stream based AG-set (patch #5)

Write streams allow the abstraction provider (fs, block device, raid etc.) to
leverage the application's intent (file relationships/lifecycle).
- application: sends grouping/isolation intent with a stream id.
- xfs: maps streams to AGs; allocates without interleaving; gains higher
  concurrency due to reduced lock contention.
- hardware: maps streams to the underlying allocation unit; reduces
  internal write amplification in the device, improves device life, and
  gives predictable QoS.

A few other points:

- Since high-level write streams (in xfs) can work without the
  low-level write streams (in block device), the series has general
  value beyond hardware with a particular capability.

- For hardware-only spatial isolation, only the first 3 patches are needed.

- write-stream is different from the existing 'filestream' allocator, which
  maintains directory-to-AG associations in a global MRU cache. That
  requires state management and memory (and its reclaim). The proposed
  AG-set based steering relies on simple, stateless/lockless arithmetic
  that aligns more closely with the default allocator heuristics.
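
To illustrate what "stateless/lockless arithmetic" can look like, here is a
minimal sketch. It is not the code from patches #4 and #5: the function name,
the modulo mapping and the handling of leftover AGs are all assumptions made
for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patches: split agcount AGs into
 * fixed-size sets of set_size AGs and choose the set for a stream by modulo.
 * Returns the number of the first AG in the chosen set. No cache, no lock
 * and no per-stream state is needed; the result depends only on the inputs.
 */
uint32_t stream_to_first_ag(uint32_t stream_id, uint32_t agcount,
                            uint32_t set_size)
{
        uint32_t nr_sets;

        if (agcount == 0)
                return 0;
        if (set_size == 0 || set_size > agcount)
                set_size = agcount;     /* degenerate case: one set spans all AGs */

        /* AGs left over after the last full set are ignored in this sketch */
        nr_sets = agcount / set_size;
        return (stream_id % nr_sets) * set_size;
}
```

With 16 AGs and sets of 4 AGs, streams 0..3 start at AGs 0, 4, 8 and 12, and
stream 5 wraps around to AG 4.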

[3]
### Performance
1. On regular NVMe:
fio 4k write, direct IO, 16 jobs, 16 files * 8GiB, iodepth 32, XFS with 16 AGs

Base: 41 KIOPS
With generic ag-set: 93 KIOPS (+126%)
With write-stream ag-set: 227 KIOPS (+453%)

2. On FDP-capable NVMe:
RocksDB YCSB
WAF (base vs write-stream): 35% reduction
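
For reference, the percentage gains in the regular NVMe run follow directly
from the KIOPS figures. The helper below is only a worked-arithmetic sketch;
its name is made up for this illustration.

```c
/* Percentage change of 'value' relative to 'base'. */
double pct_gain(double base, double value)
{
        return (value - base) / base * 100.0;
}
```

pct_gain(41, 93) is about 126.8 and pct_gain(41, 227) is about 453.7, so the
+126% and +453% quoted above are truncated rather than rounded.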

[1]
### Application interface

New vfs ioctl 'FS_IOC_WRITE_STREAM'.
The application communicates the intended operation using the 'op_flags'
field of the passed 'struct fs_write_stream'.
Valid flags are:
FS_WRITE_STREAM_OP_GET_MAX: Returns the number of available streams.
FS_WRITE_STREAM_OP_SET: Assign a specific stream value to the file.
FS_WRITE_STREAM_OP_GET: Query what stream value is set on the file.

[2]
### Comparison with Write Hints (RWH_WRITE_LIFE_*)

- Semantics: Write Hints describe 'data temperature' (e.g.,
short/long/extreme), implying a lifetime. Write Streams describe 'data
placement' (e.g., Bin 1/Bin 2), implying only separation.

- Scalability: Write Hints are limited to a small, fixed enum (6
values). Write streams are dynamic, provider-dependent values that can
scale much higher (kernel limit: up to 255 due to u8 field).

- Discovery: The existing write-hint interface is advisory and decoupled
from underlying capabilities; the application has no way to probe support
and cannot deterministically know which hints are valid. OTOH, write-streams
provide explicit discovery.
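
A minimal sketch of how an application might use that discovery step before
assigning a stream. The struct and ioctl definitions are copied from the
interface example in this cover letter; the helper names and the assumed ID
range (1..max, with 0 meaning "no stream") are illustrative assumptions, not
something this cover letter specifies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Definitions copied from the interface example in this cover letter */
struct fs_write_stream {
        uint32_t op_flags;
        union {
                uint32_t stream_id;
                uint32_t max_streams;
        };
        uint64_t __reserved;
};

#define FS_WRITE_STREAM_OP_GET_MAX      (1 << 0)
#define FS_WRITE_STREAM_OP_SET          (1 << 2)
#define FS_IOC_WRITE_STREAM     _IOWR('f', 135, struct fs_write_stream)

/* ASSUMPTION: valid IDs are 1..max_streams and 0 means "no stream" */
bool stream_id_in_range(uint32_t stream_id, uint32_t max_streams)
{
        return stream_id >= 1 && stream_id <= max_streams;
}

/* Hypothetical wrapper: probe the limit first, then set the stream */
int set_stream_checked(int fd, uint32_t stream_id)
{
        struct fs_write_stream req;

        memset(&req, 0, sizeof(req));
        req.op_flags = FS_WRITE_STREAM_OP_GET_MAX;
        if (ioctl(fd, FS_IOC_WRITE_STREAM, &req) < 0)
                return -1;      /* errno was set by ioctl() */

        if (!stream_id_in_range(stream_id, req.max_streams)) {
                errno = EINVAL;
                return -1;
        }

        memset(&req, 0, sizeof(req));
        req.op_flags = FS_WRITE_STREAM_OP_SET;
        req.stream_id = stream_id;
        return ioctl(fd, FS_IOC_WRITE_STREAM, &req);
}
```

On a kernel or filesystem without this series, the first ioctl() is expected
to fail, which gives the application the probe that write hints lack.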

Note: within the kernel, the separation between the two constructs
(write-hint and write-stream) started in 6.16.

### Changelog
since v2:
https://lore.kernel.org/linux-fsdevel/20260309052944.156054-1-joshi.k@samsung.com/
- xfs default allocator optimization using fixed-size generic AG set (Dave)
- reuse the above to simplify the write-stream AG set handling
- streamline the uapi; Use union for GET_MAX and GET/SET (Darrick)
- uint16_t for write-stream within xfs inode and other cleanups (Darrick)

since v1:
https://lore.kernel.org/linux-fsdevel/20260216052540.217920-1-joshi.k@samsung.com/
- switch from fcntl-based to ioctl-based interface (Christian)
- new patch (#4) that makes xfs allocator use the write streams for AG
  selection
- new patch (#5) that introduces software write streams in xfs.

### Interface example

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <string.h>
#include <errno.h>

/* Duplicate the kernel UAPI definitions */
struct fs_write_stream {
        uint32_t op_flags;
        union {
                uint32_t stream_id;
                uint32_t max_streams;
        };
        uint64_t __reserved;
};

#define FS_WRITE_STREAM_OP_GET_MAX              (1 << 0)
#define FS_WRITE_STREAM_OP_GET                  (1 << 1)
#define FS_WRITE_STREAM_OP_SET                  (1 << 2)

#define FS_IOC_WRITE_STREAM             _IOWR('f', 135, struct fs_write_stream)

void print_usage(const char *progname) {
        fprintf(stderr, "Usage:\n");
        fprintf(stderr, "  %s <file> max       - Get max supported streams\n", progname);
        fprintf(stderr, "  %s <file> get       - Get current stream ID\n", progname);
        fprintf(stderr, "  %s <file> set <id>  - Set stream ID\n", progname);
        exit(EXIT_FAILURE);
}

int main(int argc, char *argv[]) {
        if (argc < 3)
                print_usage(argv[0]);

        const char *filepath = argv[1];
        const char *cmd = argv[2];
        int fd = open(filepath, O_RDWR);
        if (fd < 0) {
                perror("Error opening file");
                return EXIT_FAILURE;
        }

        struct fs_write_stream req;
        memset(&req, 0, sizeof(req));

        if (strcmp(cmd, "max") == 0) {
                req.op_flags = FS_WRITE_STREAM_OP_GET_MAX;
                if (ioctl(fd, FS_IOC_WRITE_STREAM, &req) < 0) {
                        perror("ioctl(GET_MAX) failed");
                        close(fd);
                        return EXIT_FAILURE;
                }
                printf("Max streams supported: %u\n", req.max_streams);
        } else if (strcmp(cmd, "get") == 0) {
                req.op_flags = FS_WRITE_STREAM_OP_GET;
                if (ioctl(fd, FS_IOC_WRITE_STREAM, &req) < 0) {
                        perror("ioctl(GET) failed");
                        close(fd);
                        return EXIT_FAILURE;
                }
                printf("Current stream ID: %u\n", req.stream_id);
        } else if (strcmp(cmd, "set") == 0) {
                if (argc != 4)
                        print_usage(argv[0]);

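                /* Parse the ID strictly; atoi() would silently turn garbage into 0 */
                char *end;
                errno = 0;
                unsigned long id = strtoul(argv[3], &end, 0);
                if (errno || end == argv[3] || *end != '\0' || id > UINT32_MAX) {
                        fprintf(stderr, "Invalid stream ID: %s\n", argv[3]);
                        close(fd);
                        return EXIT_FAILURE;
                }

                req.op_flags = FS_WRITE_STREAM_OP_SET;
                req.stream_id = (uint32_t)id;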

                if (ioctl(fd, FS_IOC_WRITE_STREAM, &req) < 0) {
                        perror("ioctl(SET) failed");
                        close(fd);
                        return EXIT_FAILURE;
                }
                printf("Set stream ID to: %u\n", req.stream_id);
        } else {
                fprintf(stderr, "Unknown command: %s\n", cmd);
                close(fd);
                print_usage(argv[0]);
        }

        close(fd);
        return EXIT_SUCCESS;
}

Kanchan Joshi (6):
  fs: add generic write-stream management ioctl
  iomap: introduce and propagate write_stream
  xfs: implement write-stream management support
  xfs: generic AG set based steering
  xfs: write stream based AG placement
  xfs: introduce software write streams

 fs/iomap/direct-io.c     |  1 +
 fs/iomap/ioend.c         |  3 ++
 fs/xfs/libxfs/xfs_bmap.c | 74 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_icache.c      |  1 +
 fs/xfs/xfs_inode.c       | 69 +++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h       |  6 ++++
 fs/xfs/xfs_ioctl.c       | 38 +++++++++++++++++++++
 fs/xfs/xfs_iomap.c       |  1 +
 include/linux/iomap.h    |  2 ++
 include/uapi/linux/fs.h  | 14 ++++++++
 10 files changed, 209 insertions(+)

-- 
2.25.1

