From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
To: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>,
Christoph Hellwig <hch@lst.de>,
Naohiro Aota <naohiro.aota@wdc.com>, Qu Wenruo <wqu@suse.com>,
Damien Le Moal <dlemoal@kernel.org>,
linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
Anand Jain <anand.jain@oracle.com>
Subject: [PATCH v8 00/11] btrfs: introduce RAID stripe tree
Date: Mon, 11 Sep 2023 05:52:01 -0700
Message-ID: <20230911-raid-stripe-tree-v8-0-647676fa852c@wdc.com>
Updates of the raid-stripe-tree are done at ordered extent write time to save
on bandwidth, while for reading we do the stripe-tree lookup at bio mapping
time, i.e. when the logical to physical translation happens for regular btrfs
RAID as well.
The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
its contents are the respective physical device id and position.
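As a rough illustration of the keying described above, the following
user-space Python sketch models a stripe tree as a sorted list of entries
keyed by (disk_bytenr, disk_num_bytes), each holding (devid, physical offset)
pairs. The names StripeTree, Stride, insert and lookup are hypothetical and
are not the kernel's. The lookup only covers the mirrored (RAID1-style) case,
where every stripe holds a full copy of the extent, and the sample values are
taken from the RAID1 dump further down in this letter.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Stride:
    devid: int      # device id the copy lives on
    physical: int   # byte offset of the copy on that device


@dataclass
class StripeTree:
    # Entries are (disk_bytenr, disk_num_bytes, strides), sorted by bytenr.
    entries: list = field(default_factory=list)

    def insert(self, bytenr, num_bytes, strides):
        self.entries.append((bytenr, num_bytes, tuple(strides)))
        self.entries.sort(key=lambda e: e[0])

    def lookup(self, logical, mirror=0):
        """Translate a logical address to (devid, physical) for one mirror."""
        starts = [e[0] for e in self.entries]
        idx = bisect_right(starts, logical) - 1
        if idx < 0:
            raise KeyError(logical)
        bytenr, num_bytes, strides = self.entries[idx]
        if logical >= bytenr + num_bytes:
            raise KeyError(logical)
        stride = strides[mirror]
        # Assumption: mirrored layout, so the offset into the extent is
        # applied unchanged to the chosen device's physical start.
        return stride.devid, stride.physical + (logical - bytenr)


tree = StripeTree()
tree.insert(939524096, 131072, [Stride(1, 939524096), Stride(2, 536870912)])
print(tree.lookup(939524096 + 4096, mirror=1))
```

A striped (RAID0) lookup would additionally have to pick the device from the
offset within the extent; that is left out here on purpose.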
For an example 1M write (split into 126K segments due to zone-append):
rapido2:/home/johannes/src/fstests# xfs_io -fdc "pwrite -b 1M 0 1M" -c fsync /mnt/test/test
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0065 sec (151.538 MiB/sec and 151.5381 ops/sec)
The tree will look as follows (both 128k buffered writes to a ZNS drive):
RAID0 case:
bash-5.2# btrfs inspect-internal dump-tree -t raid_stripe /dev/nvme0n1
btrfs-progs v6.3
raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0)
leaf 805535744 items 1 free space 16218 generation 8 owner RAID_STRIPE_TREE
leaf 805535744 flags 0x1(WRITTEN) backref revision 1
checksum stored 2d2d2262
checksum calced 2d2d2262
fs uuid ab05cfc6-9859-404e-970d-3999b1cb5438
chunk uuid c9470ba2-49ac-4d46-8856-438a18e6bd23
item 0 key (1073741824 RAID_STRIPE_KEY 131072) itemoff 16243 itemsize 40
encoding: RAID0
stripe 0 devid 1 offset 805306368
stripe 1 devid 2 offset 536870912
total bytes 42949672960
bytes used 294912
uuid ab05cfc6-9859-404e-970d-3999b1cb5438
RAID1 case:
bash-5.2# btrfs inspect-internal dump-tree -t raid_stripe /dev/nvme0n1
btrfs-progs v6.3
raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0)
leaf 805535744 items 1 free space 16218 generation 8 owner RAID_STRIPE_TREE
leaf 805535744 flags 0x1(WRITTEN) backref revision 1
checksum stored 56199539
checksum calced 56199539
fs uuid 9e693a37-fbd1-4891-aed2-e7fe64605045
chunk uuid 691874fc-1b9c-469b-bd7f-05e0e6ba88c4
item 0 key (939524096 RAID_STRIPE_KEY 131072) itemoff 16243 itemsize 40
encoding: RAID1
stripe 0 devid 1 offset 939524096
stripe 1 devid 2 offset 536870912
total bytes 42949672960
bytes used 294912
uuid 9e693a37-fbd1-4891-aed2-e7fe64605045
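For scripting against output like the above, the following Python sketch
extracts each RAID_STRIPE_KEY item together with its encoding and stripes. It
assumes the line layout shown in this letter ("item N key (...)",
"encoding: ...", "stripe N devid D offset O"); the output of other
btrfs-progs versions may differ, and the function name parse_stripe_items is
hypothetical.

```python
import re

ITEM_RE = re.compile(r"item \d+ key \((\d+) RAID_STRIPE_KEY (\d+)\)")
ENC_RE = re.compile(r"encoding:\s*(\S+)")
STRIPE_RE = re.compile(r"stripe (\d+) devid (\d+) offset (\d+)")


def parse_stripe_items(text):
    """Return a list of dicts, one per RAID_STRIPE_KEY item in the dump."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        m = ITEM_RE.match(line)
        if m:
            items.append({
                "logical": int(m.group(1)),   # key objectid (disk_bytenr)
                "length": int(m.group(2)),    # key offset (disk_num_bytes)
                "encoding": None,
                "stripes": [],                # list of (devid, physical)
            })
            continue
        if not items:
            continue  # skip header lines before the first item
        m = ENC_RE.match(line)
        if m:
            items[-1]["encoding"] = m.group(1)
            continue
        m = STRIPE_RE.match(line)
        if m:
            items[-1]["stripes"].append((int(m.group(2)), int(m.group(3))))
    return items


SAMPLE = """\
item 0 key (1073741824 RAID_STRIPE_KEY 131072) itemoff 16243 itemsize 40
encoding: RAID0
stripe 0 devid 1 offset 805306368
stripe 1 devid 2 offset 536870912
total bytes 42949672960
"""

for item in parse_stripe_items(SAMPLE):
    print(item)
```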
A design document can be found here:
https://docs.google.com/document/d/1Iui_jMidCd4MVBNSSLXRfO7p5KmvnoQL/edit?usp=sharing&ouid=103609947580185458266&rtpof=true&sd=true
The user-space part of this series can be found here:
https://lore.kernel.org/linux-btrfs/20230215143109.2721722-1-johannes.thumshirn@wdc.com
Changes to v7:
- Huge rewrite
v7 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1677750131.git.johannes.thumshirn@wdc.com/
Changes to v6:
- Fix degraded RAID1 mounts
- Fix RAID0/10 mounts
v6 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1676470614.git.johannes.thumshirn@wdc.com
Changes to v5:
- Incorporated review comments from Josef and Christoph
- Rebased onto misc-next
v5 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1675853489.git.johannes.thumshirn@wdc.com
Changes to v4:
- Added patch to check for RST feature in sysfs
- Added RST lookups for scrubbing
- Fixed the error handling bug Josef pointed out
- Only check if we need to write out an RST once per delayed_ref head
- Added support for zoned data DUP with RST
Changes to v3:
- Rebased onto 20221120124734.18634-1-hch@lst.de
- Incorporated Josef's review
- Merged related patches
v3 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1666007330.git.johannes.thumshirn@wdc.com
Changes to v2:
- Bug fixes
- Rebased onto 20220901074216.1849941-1-hch@lst.de
- Added tracepoints
- Added leak checker
- Added RAID0 and RAID10
v2 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1656513330.git.johannes.thumshirn@wdc.com
Changes to v1:
- Write the stripe-tree at delayed-ref time (Qu)
- Add a different write path for preallocation
v1 of the patchset can be found here:
https://lore.kernel.org/linux-btrfs/cover.1652711187.git.johannes.thumshirn@wdc.com/
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
Johannes Thumshirn (11):
btrfs: add raid stripe tree definitions
btrfs: read raid-stripe-tree from disk
btrfs: add support for inserting raid stripe extents
btrfs: delete stripe extent on extent deletion
btrfs: lookup physical address from stripe extent
btrfs: implement RST version of scrub
btrfs: zoned: allow zoned RAID
btrfs: add raid stripe tree pretty printer
btrfs: announce presence of raid-stripe-tree in sysfs
btrfs: add trace events for RST
btrfs: add raid-stripe-tree to features enabled with debug
fs/btrfs/Makefile | 2 +-
fs/btrfs/accessors.h | 10 +
fs/btrfs/bio.c | 23 ++
fs/btrfs/block-rsv.c | 6 +
fs/btrfs/disk-io.c | 18 ++
fs/btrfs/disk-io.h | 5 +
fs/btrfs/extent-tree.c | 7 +
fs/btrfs/fs.h | 4 +-
fs/btrfs/inode.c | 8 +-
fs/btrfs/locking.c | 5 +-
fs/btrfs/ordered-data.c | 1 +
fs/btrfs/ordered-data.h | 2 +
fs/btrfs/print-tree.c | 49 ++++
fs/btrfs/raid-stripe-tree.c | 493 ++++++++++++++++++++++++++++++++++++++++
fs/btrfs/raid-stripe-tree.h | 52 +++++
fs/btrfs/scrub.c | 56 +++++
fs/btrfs/sysfs.c | 3 +
fs/btrfs/volumes.c | 43 +++-
fs/btrfs/volumes.h | 15 +-
fs/btrfs/zoned.c | 113 ++++++++-
include/trace/events/btrfs.h | 75 ++++++
include/uapi/linux/btrfs.h | 1 +
include/uapi/linux/btrfs_tree.h | 33 ++-
23 files changed, 999 insertions(+), 25 deletions(-)
---
base-commit: 133da717263112d81bb95b5535ceb2c1eeddd4e7
change-id: 20230613-raid-stripe-tree-e330c9a45cc3
Best regards,
--
Johannes Thumshirn <johannes.thumshirn@wdc.com>