public inbox for linux-xfs@vger.kernel.org
* support for zoned devices v3
@ 2025-02-18  8:10 Christoph Hellwig
From: Christoph Hellwig @ 2025-02-18  8:10 UTC
  To: Carlos Maiolino; +Cc: Darrick J. Wong, Hans Holmberg, linux-xfs

Hi all,

this series adds support for zoned devices:

    https://zonedstorage.io/docs/introduction/zoned-storage

to XFS. It has been developed for and tested on both SMR hard drives,
which are the oldest and most common class of zoned devices:

   https://zonedstorage.io/docs/introduction/smr

and ZNS SSDs:

   https://zonedstorage.io/docs/introduction/zns

It has not been tested with zoned UFS devices, as their current capacity
points and performance characteristics aren't too interesting for XFS
use cases (but never say never).

Sequential-write-only zones are supported only for data, using a new
allocator for the RT device that maps each zone to an rtgroup, which
is written sequentially.  All metadata and (for now) the log require
randomly writable space.  This means a realtime device is required
to support zoned storage.  For the common case of SMR hard drives
that contain both randomly writable zones and sequential write required
zones on the same block device, the concept of an internal RT device is
added, which makes using XFS on an SMR HDD as simple as:

$ mkfs.xfs /dev/sda
$ mount /dev/sda /mnt
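The core idea of the zoned allocator, handing out space only at the
write pointer of a zone that backs an rtgroup, can be sketched as
below.  This is a minimal, hypothetical model assuming one zone per
rtgroup and ignoring locking, open zone limits and garbage collection;
struct demo_zone and its helpers are illustrative names, not the XFS
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of one sequential write required zone
 * backing one rtgroup.  Names are illustrative, not the XFS structures.
 */
struct demo_zone {
	uint64_t start;		/* first block of the zone on the device */
	uint64_t capacity;	/* writable blocks in the zone */
	uint64_t write_pointer;	/* blocks written so far, relative to start */
};

/*
 * Allocate up to @len blocks at the write pointer.  Space is only handed
 * out sequentially, so the returned extent always starts exactly at the
 * current write pointer.  Returns the number of blocks allocated (0 if
 * the zone is full) and stores the device block number in *bno.
 */
uint64_t demo_zone_alloc(struct demo_zone *z, uint64_t len, uint64_t *bno)
{
	uint64_t avail;

	assert(z->write_pointer <= z->capacity);
	avail = z->capacity - z->write_pointer;
	if (len > avail)
		len = avail;	/* short allocation: caller moves to a new zone */
	if (len == 0)
		return 0;
	*bno = z->start + z->write_pointer;
	z->write_pointer += len;
	return len;
}

/* A zone is full once its write pointer reaches its capacity. */
bool demo_zone_full(const struct demo_zone *z)
{
	return z->write_pointer == z->capacity;
}
```

Because space is never reused in place, freeing blocks only lowers the
amount of live data in a zone; reclaiming the zone is left to garbage
collection.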

When using NVMe ZNS SSDs that do not support conventional zones, the
traditional multi-device RT configuration is required.  E.g. for an
SSD with a conventional namespace 1 and a zoned namespace 2:

$ mkfs.xfs -r rtdev=/dev/nvme0n2 /dev/nvme0n1
$ mount -o rtdev=/dev/nvme0n2 /dev/nvme0n1 /mnt

The zoned allocator can also be used on conventional block devices, or
on conventional zones (e.g. when using an SMR HDD as the external RT
device).  For example, using zoned XFS on normal SSDs shows clear
performance advantages and reduced write amplification for intelligent
workloads such as RocksDB.

Some work is still in progress or planned, but should not affect the
integration with the rest of XFS or the on-disk format:

 - support for quotas
 - support for reflinks - the I/O path already supports them, but
   garbage collection currently isn't refcount aware and would unshare
   them, rendering the feature useless
 - more scalable garbage collection victim selection
 - various improvements to hint-based data placement
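As a rough illustration of the general idea behind hint-based data
placement (not the policy implemented in this series), the hypothetical
sketch below keeps one open zone per write lifetime class, so data
expected to be overwritten at similar times shares a zone.  The demo_*
names and the lifetime classes, loosely modelled on the Linux
RWH_WRITE_LIFE_* hints, are assumptions; the model ignores zone
capacity and the open zone limit.

```c
#include <assert.h>

/* Hypothetical write lifetime classes; values are illustrative only. */
enum demo_life {
	DEMO_LIFE_NOT_SET,
	DEMO_LIFE_SHORT,
	DEMO_LIFE_MEDIUM,
	DEMO_LIFE_LONG,
	DEMO_LIFE_NR,
};

struct demo_placement {
	int open_zone[DEMO_LIFE_NR];	/* -1: no zone open for this class */
	int next_free_zone;		/* next empty zone to open */
	int nr_zones;			/* total zones available */
};

void demo_placement_init(struct demo_placement *p, int nr_zones)
{
	for (int i = 0; i < DEMO_LIFE_NR; i++)
		p->open_zone[i] = -1;
	p->next_free_zone = 0;
	p->nr_zones = nr_zones;
}

/*
 * Pick the zone for a write with lifetime hint @life.  Writes with the
 * same hint go to the same open zone, so their data tends to become
 * invalid together and the zone can later be reclaimed with little
 * copying.  Returns -1 if no empty zone is left to open.
 */
int demo_pick_zone(struct demo_placement *p, enum demo_life life)
{
	assert((unsigned int)life < DEMO_LIFE_NR);
	if (p->open_zone[life] < 0) {
		if (p->next_free_zone >= p->nr_zones)
			return -1;
		p->open_zone[life] = p->next_free_zone++;
	}
	return p->open_zone[life];
}
```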

To make testing easier, a git tree is provided that contains the required
iomap changes (merged through the VFS tree), this code, and a few
miscellaneous patches that make VM testing easier:

    git://git.infradead.org/users/hch/xfs.git xfs-zoned

The matching xfsprogs is available here:

    git://git.infradead.org/users/hch/xfsprogs.git xfs-zoned

An xfstests branch that enables the zoned code and adds various new
tests is here:

    git://git.infradead.org/users/hch/xfstests-dev.git xfs-zoned

An updated xfs-documentation branch documenting the on-disk format is
here:

    git://git.infradead.org/users/hch/xfs-documentation.git xfs-zoned

Gitweb:

    http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-zoned
    http://git.infradead.org/users/hch/xfsprogs.git/shortlog/refs/heads/xfs-zoned
    http://git.infradead.org/users/hch/xfstests-dev.git/shortlog/refs/heads/xfs-zoned
    http://git.infradead.org/users/hch/xfs-documentation.git/shortlog/refs/heads/xfs-zoned

Changes since RFCv2:
 - split the freecounter changes into two patches and changed the
   structure to have a single array with a struct as entry for the
   percpu counter and the reserved blocks fields
 - split up the global metabtree reservation patch
 - fix up the metabtree reservation when rebuilding metabtrees
 - guard the sysfs code with IS_ENABLED(CONFIG_XFS_RT)
 - make the in-core sb_rtstart an xfs_rfsblock_t
 - remove very verbose printks
 - improve a few tracepoints to include more information
 - move a few hunks to earlier patches
 - use an if/else block in xfs_vm_writepages to keep the context
   in its own block
 - use vmalloc for zi_used_bucket_bitmap
 - improve a comment message
 - more typo and whitespace fixin'
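The restructured free space accounting from the first item above (a
single array with one struct per counter, holding the counter and its
reserved blocks fields) can be pictured roughly as follows.  This is a
hypothetical user-space sketch: the demo_* names, the plain integer
standing in for the percpu counter and the reserve fallback policy are
assumptions, not the actual XFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter indices; names are illustrative. */
enum demo_free_counter {
	DEMO_FREE_BLOCKS,	/* data device free blocks */
	DEMO_FREE_RTEXTENTS,	/* realtime device free extents */
	DEMO_FREE_NR,
};

/*
 * One entry per counter: the free space counter itself (a percpu counter
 * in the kernel, a plain integer here) and its reserved blocks pool.
 */
struct demo_freecounter {
	int64_t count;		/* free units not handed out */
	uint64_t res_total;	/* configured size of the reserve pool */
	uint64_t res_avail;	/* reserve currently available */
};

struct demo_mount {
	struct demo_freecounter free[DEMO_FREE_NR];
};

/*
 * Take @delta units from counter @idx.  Normal allocations may not dip
 * into the reserve; @use_reserve allows falling back to it.
 * Returns 0 on success, -1 if there is not enough space.
 */
int demo_dec_freecounter(struct demo_mount *mp, enum demo_free_counter idx,
		uint64_t delta, bool use_reserve)
{
	struct demo_freecounter *fc = &mp->free[idx];

	assert(fc->count >= 0);
	if (fc->count >= (int64_t)delta) {
		fc->count -= delta;
		return 0;
	}
	if (use_reserve && (uint64_t)fc->count + fc->res_avail >= delta) {
		/* use up the regular free space, take the rest from the reserve */
		fc->res_avail -= delta - (uint64_t)fc->count;
		fc->count = 0;
		return 0;
	}
	return -1;
}
```

Indexing one array by counter type lets the same helpers serve both the
data device and the RT device instead of duplicating them per counter.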

Changes since RFC:
 - rebased to current Linus' tree that has rtrmap and rtreflink merged
 - adjust for minor changes in the iomap series
 - add one more caller of rtg_rmap
 - comment on the sb_dblocks access in statfs
 - use xfs_inode_alloc_unitsize to report dio alignments
 - improve various commit messages
 - misc spelling fixes
 - misc whitespace fixes
 - add separate helpers for raw vs always positive free space counters
 - print the pool name when reservations failed
 - return bool from xfs_zone_validate
 - use more rtg locking helpers
 - use more XFS_IS_CORRUPT
 - misc cleanups and minor renames
 - document the XFS_ZR_* constants
 - rename the IN_GC flag
 - make gc_bio.state an enum
 - don't join rtg to empty transaction in xfs_zone_gc_query
 - update copyrights
 - better inode and sb verifiers
 - allocate GC thread specific data outside the thread
 - clean up GC naming and add more comments
 - use the cmp_int trick
 - rework zone list locking a bit to avoid kmallocing under a spinlock
 - export rtstart in the fsgeometry in fsblocks
 - use buckets to speed up GC victim selection
 - stop the GC thread when freezing the file system
 - drop an assert that was racy
 - move some code additions between patches in the series
 - keep an active open zone reference for outstanding I/O
 - correctly handle, at mount time, the case where all available open
   zones and an open GC zone were in use at shutdown
 - reject zoned specific mount options for non-zoned file systems
 - export the max_open_zones limit in sysfs
 - add freecounter tracing
 - reduce metafile reservations
 - fix GC I/O splitting on devices with non-LBA-aligned max_sectors
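Using buckets to speed up GC victim selection can be illustrated with
the hypothetical sketch below: full zones are grouped into buckets by
how much live data they still hold, so garbage collection only has to
look at the least-used non-empty bucket instead of every zone.  The
bucket count, the 64-zone bitmaps and all demo_* names are assumptions
for illustration and do not describe the actual XFS data structures.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_BUCKETS	4
#define DEMO_MAX_ZONES	64

struct demo_gc {
	uint64_t zone_capacity;			/* blocks per zone */
	uint64_t used[DEMO_MAX_ZONES];		/* live blocks per zone */
	uint64_t bucket_mask[DEMO_NR_BUCKETS];	/* bit z set: zone z in bucket */
};

/* Map a live block count to a bucket; a completely full zone goes last. */
static unsigned int demo_bucket(const struct demo_gc *gc, uint64_t used)
{
	uint64_t b = used * DEMO_NR_BUCKETS / gc->zone_capacity;

	return b < DEMO_NR_BUCKETS ? (unsigned int)b : DEMO_NR_BUCKETS - 1;
}

/* Record that @zone is full and holds @used live blocks. */
void demo_gc_add_zone(struct demo_gc *gc, unsigned int zone, uint64_t used)
{
	assert(zone < DEMO_MAX_ZONES && used <= gc->zone_capacity);
	gc->used[zone] = used;
	gc->bucket_mask[demo_bucket(gc, used)] |= UINT64_C(1) << zone;
}

/* @len blocks in @zone were freed or overwritten: move it between buckets. */
void demo_gc_free_blocks(struct demo_gc *gc, unsigned int zone, uint64_t len)
{
	assert(zone < DEMO_MAX_ZONES && len <= gc->used[zone]);
	gc->bucket_mask[demo_bucket(gc, gc->used[zone])] &= ~(UINT64_C(1) << zone);
	gc->used[zone] -= len;
	gc->bucket_mask[demo_bucket(gc, gc->used[zone])] |= UINT64_C(1) << zone;
}

/*
 * Scan buckets from least to most used and, within the first non-empty
 * bucket, return the zone with the fewest live blocks (least copying).
 * Returns -1 if no full zone is tracked.
 */
int demo_gc_pick_victim(const struct demo_gc *gc)
{
	for (unsigned int b = 0; b < DEMO_NR_BUCKETS; b++) {
		int victim = -1;

		for (unsigned int z = 0; z < DEMO_MAX_ZONES; z++) {
			if (!(gc->bucket_mask[b] & (UINT64_C(1) << z)))
				continue;
			if (victim < 0 || gc->used[z] < gc->used[victim])
				victim = (int)z;
		}
		if (victim >= 0)
			return victim;
	}
	return -1;
}
```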



Thread overview: 61+ messages
2025-02-18  8:10 support for zoned devices v3 Christoph Hellwig
2025-02-18  8:10 ` [PATCH 01/45] xfs: reflow xfs_dec_freecounter Christoph Hellwig
2025-02-18  8:10 ` [PATCH 02/45] xfs: generalize the freespace and reserved blocks handling Christoph Hellwig
2025-02-19 22:11   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 03/45] xfs: support reserved blocks for the rt extent counter Christoph Hellwig
2025-02-25 18:05   ` Darrick J. Wong
2025-02-26  1:20     ` Christoph Hellwig
2025-02-18  8:10 ` [PATCH 04/45] xfs: trace in-memory freecounter reservations Christoph Hellwig
2025-02-18  8:10 ` [PATCH 05/45] xfs: preserve RT reservations across remounts Christoph Hellwig
2025-02-18  8:10 ` [PATCH 06/45] xfs: fixup the metabtree reservation in xrep_reap_metadir_fsblocks Christoph Hellwig
2025-02-19 18:41   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 07/45] xfs: make metabtree reservations global Christoph Hellwig
2025-02-19 18:44   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 08/45] xfs: reduce metafile reservations Christoph Hellwig
2025-02-18  8:10 ` [PATCH 09/45] xfs: factor out a xfs_rt_check_size helper Christoph Hellwig
2025-02-18  8:10 ` [PATCH 10/45] xfs: add a rtg_blocks helper Christoph Hellwig
2025-02-18  8:10 ` [PATCH 11/45] xfs: move xfs_bmapi_reserve_delalloc to xfs_iomap.c Christoph Hellwig
2025-02-19 21:47   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 12/45] xfs: skip always_cow inodes in xfs_reflink_trim_around_shared Christoph Hellwig
2025-02-18  8:10 ` [PATCH 13/45] xfs: refine the unaligned check for always COW inodes in xfs_file_dio_write Christoph Hellwig
2025-02-18  8:10 ` [PATCH 14/45] xfs: support XFS_BMAPI_REMAP in xfs_bmap_del_extent_delay Christoph Hellwig
2025-02-18  8:10 ` [PATCH 15/45] xfs: add a xfs_rtrmap_highest_rgbno helper Christoph Hellwig
2025-02-18  8:10 ` [PATCH 16/45] xfs: define the zoned on-disk format Christoph Hellwig
2025-02-18  8:10 ` [PATCH 17/45] xfs: allow internal RT devices for zoned mode Christoph Hellwig
2025-02-18  8:10 ` [PATCH 18/45] xfs: export zoned geometry via XFS_FSOP_GEOM Christoph Hellwig
2025-02-18  8:10 ` [PATCH 19/45] xfs: disable sb_frextents for zoned file systems Christoph Hellwig
2025-02-18  8:10 ` [PATCH 20/45] xfs: disable FITRIM for zoned RT devices Christoph Hellwig
2025-02-18  8:10 ` [PATCH 21/45] xfs: don't call xfs_can_free_eofblocks from ->release for zoned inodes Christoph Hellwig
2025-02-18  8:10 ` [PATCH 22/45] xfs: skip zoned RT inodes in xfs_inodegc_want_queue_rt_file Christoph Hellwig
2025-02-18  8:10 ` [PATCH 23/45] xfs: parse and validate hardware zone information Christoph Hellwig
2025-02-18  8:10 ` [PATCH 24/45] xfs: add the zoned space allocator Christoph Hellwig
2025-02-19 21:58   ` Darrick J. Wong
2025-02-20  6:17     ` Christoph Hellwig
2025-02-18  8:10 ` [PATCH 25/45] xfs: add support for zoned space reservations Christoph Hellwig
2025-02-19 22:00   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 26/45] xfs: implement zoned garbage collection Christoph Hellwig
2025-02-19 22:02   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 27/45] xfs: implement buffered writes to zoned RT devices Christoph Hellwig
2025-02-19 21:47   ` Darrick J. Wong
2025-02-20  6:16     ` Christoph Hellwig
2025-02-20 16:57       ` Darrick J. Wong
2025-02-25 17:59   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 28/45] xfs: implement direct " Christoph Hellwig
2025-02-18  8:10 ` [PATCH 29/45] xfs: wire up zoned block freeing in xfs_rtextent_free_finish_item Christoph Hellwig
2025-02-18  8:10 ` [PATCH 30/45] xfs: hide reserved RT blocks from statfs Christoph Hellwig
2025-02-18  8:10 ` [PATCH 31/45] xfs: support growfs on zoned file systems Christoph Hellwig
2025-02-18  8:10 ` [PATCH 32/45] xfs: allow COW forks on zoned file systems in xchk_bmap Christoph Hellwig
2025-02-18  8:10 ` [PATCH 33/45] xfs: support xchk_xref_is_used_rt_space on zoned file systems Christoph Hellwig
2025-02-18  8:10 ` [PATCH 34/45] xfs: support xrep_require_rtext_inuse " Christoph Hellwig
2025-02-18  8:10 ` [PATCH 35/45] xfs: enable fsmap reporting for internal RT devices Christoph Hellwig
2025-02-18  8:10 ` [PATCH 36/45] xfs: disable reflink for zoned file systems Christoph Hellwig
2025-02-18  8:10 ` [PATCH 37/45] xfs: disable rt quotas " Christoph Hellwig
2025-02-18  8:10 ` [PATCH 38/45] xfs: enable the zoned RT device feature Christoph Hellwig
2025-02-18  8:10 ` [PATCH 39/45] xfs: support zone gaps Christoph Hellwig
2025-02-18  8:10 ` [PATCH 40/45] xfs: add a max_open_zones mount option Christoph Hellwig
2025-02-18  8:10 ` [PATCH 41/45] xfs: support write life time based data placement Christoph Hellwig
2025-02-19 18:49   ` Darrick J. Wong
2025-02-18  8:10 ` [PATCH 42/45] xfs: wire up the show_stats super operation Christoph Hellwig
2025-02-18  8:10 ` [PATCH 43/45] xfs: export zone stats in /proc/*/mountstats Christoph Hellwig
2025-02-18  8:10 ` [PATCH 44/45] xfs: contain more sysfs code in xfs_sysfs.c Christoph Hellwig
2025-02-18  8:10 ` [PATCH 45/45] xfs: export max_open_zones in sysfs Christoph Hellwig
