From: Lukas Straub <lukasstraub2@web.de>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree")
Date: Wed, 13 Jul 2022 15:24:24 +0000
Message-ID: <20220713152424.0d93e5fd@gecko>
In-Reply-To: <PH0PR04MB741638E2A15F4E106D8A6FAF9B899@PH0PR04MB7416.namprd04.prod.outlook.com>
On Wed, 13 Jul 2022 14:01:32 +0000
Johannes Thumshirn <Johannes.Thumshirn@wdc.com> wrote:
> On 13.07.22 15:47, Qu Wenruo wrote:
> >
> >
> > On 2022/7/13 20:42, Johannes Thumshirn wrote:
> >> On 13.07.22 14:01, Qu Wenruo wrote:
> >>>
> >>>
> >>> On 2022/7/13 19:43, Johannes Thumshirn wrote:
> >>>> On 13.07.22 12:54, Qu Wenruo wrote:
> >>>>>
> >>>>>
> >>>>> On 2022/5/16 22:31, Johannes Thumshirn wrote:
> >>>>>> Introduce a raid-stripe-tree to record writes in a RAID environment.
> >>>>>>
> >>>>>> In essence this adds another address translation layer between the logical
> >>>>>> and the physical addresses in btrfs and is designed to close two gaps. The
> >>>>>> first is the ominous RAID-write-hole we suffer from with RAID5/6 and the
> >>>>>> second one is the inability of doing RAID with zoned block devices due to the
> >>>>>> constraints we have with REQ_OP_ZONE_APPEND writes.
> >>>>>
> >>>>> Here I want to discuss something related to RAID56 and RST.
> >>>>>
> >>>>> One of my long-standing concerns is that P/Q stripes have a higher
> >>>>> update frequency, so with certain transaction commit/data writeback
> >>>>> timing, wouldn't the device storing P/Q stripes run out of space
> >>>>> before the data stripe devices?
> >>>>
> >>>> P/Q stripes on a dedicated drive would be RAID4, which we don't have.
> >>>
> >>> I'm just using one block group as an example.
> >>>
> >>> Sure, the next bg can definitely go somewhere else.
> >>>
> >>> But inside one bg, we are still using one zone for the bg, right?
> >>
> >> Ok maybe I'm not understanding the code in volumes.c correctly, but
> >> doesn't __btrfs_map_block() calculate a rotation per stripe-set?
> >>
> >> I'm looking at this code:
> >>
> >> 	/* Build raid_map */
> >> 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
> >> 	    (need_full_stripe(op) || mirror_num > 1)) {
> >> 		u64 tmp;
> >> 		unsigned rot;
> >>
> >> 		/* Work out the disk rotation on this stripe-set */
> >> 		div_u64_rem(stripe_nr, num_stripes, &rot);
> >>
> >> 		/* Fill in the logical address of each stripe */
> >> 		tmp = stripe_nr * data_stripes;
> >> 		for (i = 0; i < data_stripes; i++)
> >> 			bioc->raid_map[(i + rot) % num_stripes] =
> >> 				em->start + (tmp + i) * map->stripe_len;
> >>
> >> 		bioc->raid_map[(i + rot) % map->num_stripes] = RAID5_P_STRIPE;
> >> 		if (map->type & BTRFS_BLOCK_GROUP_RAID6)
> >> 			bioc->raid_map[(i + rot + 1) % num_stripes] =
> >> 				RAID6_Q_STRIPE;
> >>
> >> 		sort_parity_stripes(bioc, num_stripes);
> >> 	}
> >
> > That's per full-stripe. AKA, the rotation only kicks in after a full stripe.
> >
> > In my example, we're inside one full stripe, no rotation, until next
> > full stripe.
> >
>
>
> Ah ok, my apologies. For sub-stripe size writes my idea was to 0-pad up to
> stripe size. Then we can do full CoW of stripes. If we have an older generation
> of a stripe, we can just overwrite it on regular btrfs. On zoned btrfs this
> just accounts for more zone_unusable bytes and waits for the GC to kick in.
>
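To make the per-full-stripe rotation from the quoted snippet concrete,
here is a minimal userspace sketch for the RAID5 case. It is not the
kernel code: fill_raid_map(), parity_dev() and P_STRIPE are hypothetical
names, and it assumes (as Qu describes above) that the stripe number
fed into the rotation is the full-stripe index, so the parity device
only changes from one full stripe to the next.

```c
#include <assert.h>
#include <stdint.h>

#define P_STRIPE UINT64_MAX	/* stand-in for RAID5_P_STRIPE (assumption) */

/*
 * Fill raid_map[] (num_stripes entries) for one RAID5 full stripe.
 * raid_map[d] becomes the logical offset of the data stripe placed on
 * device d, or P_STRIPE for the device that holds parity.
 */
static void fill_raid_map(uint64_t *raid_map, unsigned num_stripes,
			  uint64_t full_stripe_nr, uint64_t stripe_len)
{
	unsigned data_stripes = num_stripes - 1;
	/* The rotation depends only on the full-stripe index. */
	unsigned rot = (unsigned)(full_stripe_nr % num_stripes);
	uint64_t tmp = full_stripe_nr * data_stripes;
	unsigned i;

	assert(num_stripes >= 2);
	for (i = 0; i < data_stripes; i++)
		raid_map[(i + rot) % num_stripes] = (tmp + i) * stripe_len;
	/* i == data_stripes here, so parity lands right after the data. */
	raid_map[(i + rot) % num_stripes] = P_STRIPE;
}

/* Device index that holds parity for a given full stripe. */
static unsigned parity_dev(unsigned num_stripes, uint64_t full_stripe_nr)
{
	return (unsigned)((num_stripes - 1 + full_stripe_nr % num_stripes) %
			  num_stripes);
}
```

With three devices, full stripe 0 puts parity on device 2 and full
stripe 1 puts it on device 0. Every write inside the same full stripe
keeps hitting the same parity device, which is the case Qu is worried
about.
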
Have you considered variable stripe size? I believe ZFS does this.
Should be easy for raid5 since it's just xor, not sure for raid6.
PS: ZFS seems to do variable-_width_ stripes
https://pthree.org/2012/12/05/zfs-administration-part-ii-raidz/
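
To illustrate why I think this is easy for raid5, here is a rough
sketch of XOR parity over a variable number of data stripes. This is
neither btrfs nor ZFS code; xor_parity() and xor_rebuild() are
hypothetical helpers, and the only assumption is that the width of each
stripe set is recorded somewhere (for example in the stripe tree) so it
is known again at rebuild time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compute XOR parity over `width` data stripes of stripe_len bytes each.
 * width may be anything from 1 up to (number of devices - 1); no zero
 * padding is needed because absent stripes simply do not take part.
 */
static void xor_parity(uint8_t *parity, const uint8_t *const *data,
		       size_t width, size_t stripe_len)
{
	assert(width >= 1);
	memcpy(parity, data[0], stripe_len);
	for (size_t k = 1; k < width; k++)
		for (size_t b = 0; b < stripe_len; b++)
			parity[b] ^= data[k][b];
}

/*
 * Rebuild the data stripe at index `missing` from the parity and the
 * surviving data stripes. data[missing] is never dereferenced.
 */
static void xor_rebuild(uint8_t *out, const uint8_t *parity,
			const uint8_t *const *data, size_t width,
			size_t missing, size_t stripe_len)
{
	assert(missing < width);
	memcpy(out, parity, stripe_len);
	for (size_t k = 0; k < width; k++) {
		if (k == missing)
			continue;
		for (size_t b = 0; b < stripe_len; b++)
			out[b] ^= data[k][b];
	}
}
```

With width == 1 the parity is just a copy of the single data stripe, so
a short write degenerates into something like a mirror. I have not
worked out what the equivalent would look like for the raid6 Q stripe.
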
Regards,
Lukas Straub