From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Thiago Ramon <thiagoramon@gmail.com>, kreijack@inwind.it
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
Qu Wenruo <wqu@suse.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree")
Date: Sat, 16 Jul 2022 08:34:30 +0800
Message-ID: <1dcfecba-92fc-6f49-bdea-705896ece036@gmx.com>
In-Reply-To: <CAO1Y9woJUhuQ+Q2yWSvscnBJb9D5cYiBaY-WG3Re=7V=OzWVhw@mail.gmail.com>
On 2022/7/16 03:08, Thiago Ramon wrote:
> As a user of RAID6 here, let me jump in because I think this
> suggestion is actually a very good compromise.
>
> With stripes written only once, we completely eliminate any possible
> write-hole, and even without any changes on the current disk layout
> and allocation,
Unfortunately, the current extent allocator doesn't understand this
requirement at all.
Although the extent allocator currently tends to use clustered free
space, when it cannot find a clustered region it goes wherever it can
find free space, regardless of whether that results in a sub-stripe write.
Thus, to get full-stripe-only writes, we are really back to the old idea
of a new extent allocator that avoids sub-stripe writes.
With the zoned code in place, I guess it is more feasible now than it was before.
Now I think it's time to revive the extent allocator idea and explore
that approach. At least it requires no on-disk format change, whereas
even write-intent still needs an on-disk format change (at least a
compat RO flag).
Thanks,
Qu
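
To make the sub-stripe problem concrete, here is a minimal Python sketch (not btrfs code) of what a full-stripe-only allocation check could look like. It assumes the 64 KiB per-device stripe length btrfs uses today; the function names, the free-space representation as (start, size) pairs, and the round-up-to-a-full-stripe policy are all hypothetical simplifications for illustration.

```python
STRIPE_LEN = 64 * 1024  # assumed per-device stripe length in bytes


def full_stripe_bytes(num_devices, num_parity):
    """Data bytes held by one full stripe (parity devices excluded)."""
    return (num_devices - num_parity) * STRIPE_LEN


def is_sub_stripe_write(start, length, full_stripe):
    """True if [start, start + length) does not cover whole full stripes."""
    return start % full_stripe != 0 or length % full_stripe != 0


def find_full_stripe_extent(free_ranges, length, full_stripe):
    """Return (aligned_start, padded_length) or None.

    Hypothetical policy: round the request up to whole full stripes and
    only accept free ranges that can hold it at a full-stripe boundary.
    Unlike the fallback described above, it never settles for a
    misaligned range.
    """
    needed = -(-length // full_stripe) * full_stripe  # round length up
    for start, size in free_ranges:
        aligned = -(-start // full_stripe) * full_stripe  # round start up
        if aligned + needed <= start + size:
            return aligned, needed
    return None
```

In this sketch a request that cannot be placed at a full-stripe boundary simply fails instead of falling back to arbitrary free space, which is the behavioural difference from the current allocator described above.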
> there shouldn't be much wasted space (in my case, I
> have a 12-disk RAID6, so each full stripe holds 640kb, and discounting
> single-sector writes that should go into metadata space, any
> reasonable write should fill that buffer in a few seconds).
>
> The additional suggestion of using smaller stripe widths in case there
> isn't enough data to fill a whole stripe would make it very easy to
> reclaim the wasted space by rebalancing with a stripe count filter,
> which can be easily automated and run very frequently.
>
> On-disk format also wouldn't change and be fully usable by older
> kernels, and it should "only" require changes on the allocator to
> implement.
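
The 640 KiB figure and the "smaller stripe width" suggestion quoted above can be worked through with a short Python sketch. It assumes a 64 KiB per-device stripe length and two parity devices for RAID6; `narrowest_width` is a hypothetical helper for illustration, not an existing btrfs function.

```python
STRIPE_LEN = 64 * 1024  # assumed per-device stripe length in bytes


def full_stripe_data_bytes(num_devices, num_parity, stripe_len=STRIPE_LEN):
    """Data capacity of one full stripe, excluding parity devices."""
    if num_devices <= num_parity:
        raise ValueError("need more devices than parity stripes")
    return (num_devices - num_parity) * stripe_len


def narrowest_width(data_len, max_devices, num_parity, stripe_len=STRIPE_LEN):
    """Smallest device count whose full stripe can hold data_len bytes.

    Capped at max_devices; larger writes would span several full stripes.
    """
    data_stripes = max(1, -(-data_len // stripe_len))  # ceiling division
    return min(max_devices, data_stripes + num_parity)


if __name__ == "__main__":
    # 12-disk RAID6: 10 data devices * 64 KiB = 640 KiB per full stripe.
    print(full_stripe_data_bytes(12, 2) // 1024, "KiB")
    # A 100 KiB write needs 2 data stripes, so 4 devices with RAID6 parity.
    print(narrowest_width(100 * 1024, 12, 2), "devices")
```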
>
> On Fri, Jul 15, 2022 at 2:58 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>>
>> On 14/07/2022 09.46, Johannes Thumshirn wrote:
>>> On 14.07.22 09:32, Qu Wenruo wrote:
>>>> [...]
>>>
>>> Again if you're doing sub-stripe size writes, you're asking stupid things and
>>> then there's no reason to not give the user stupid answers.
>>>
>>
>> Qu is right: if we consider only full-stripe writes, the "raid hole" problem
>> disappears, because if a "full stripe" is not fully written it is not
>> referenced either.
>>
>>
>> Personally I think that the ZFS variable stripe size may be interesting
>> to evaluate. Moreover, because the BTRFS disk format is quite flexible,
>> we can store different BGs with different numbers of disks. Let me give an
>> example: if we have 10 disks, we could allocate:
>> 1 BG RAID1
>> 1 BG RAID5, spread over 4 disks only
>> 1 BG RAID5, spread over 8 disks only
>> 1 BG RAID5, spread over 10 disks
>>
>> So if we have short writes, we could put the extents in the RAID1 BG; for longer
>> writes we could use a RAID5 BG with 4, 8 or 10 disks, depending on the length
>> of the data.
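
The selection rule proposed in the quoted example can be sketched in Python as follows. This is only an illustration under stated assumptions: a 64 KiB per-device stripe length, one parity device per RAID5 stripe, a two-device RAID1 block group, and a hypothetical policy of picking the widest RAID5 block group whose full stripe the write can fill completely. `BlockGroup` and `pick_block_group` are made-up names.

```python
from dataclasses import dataclass

STRIPE_LEN = 64 * 1024  # assumed per-device stripe length in bytes


@dataclass(frozen=True)
class BlockGroup:
    profile: str       # "raid1" or "raid5"
    num_devices: int

    @property
    def full_stripe_data(self):
        """Data bytes per full stripe; 0 for RAID1 (no parity stripe)."""
        if self.profile == "raid5":
            return (self.num_devices - 1) * STRIPE_LEN
        return 0


# The block groups from the 10-disk example in the quoted text.
BLOCK_GROUPS = [
    BlockGroup("raid1", 2),
    BlockGroup("raid5", 4),
    BlockGroup("raid5", 8),
    BlockGroup("raid5", 10),
]


def pick_block_group(write_len, block_groups=BLOCK_GROUPS):
    """Widest RAID5 BG whose full stripe the write fills, else the RAID1 BG."""
    best = None
    for bg in block_groups:
        if bg.profile == "raid5" and write_len >= bg.full_stripe_data:
            if best is None or bg.num_devices > best.num_devices:
                best = bg
    if best is not None:
        return best
    return next(bg for bg in block_groups if bg.profile == "raid1")
```

Under these assumptions the thresholds are 192 KiB, 448 KiB and 576 KiB for the 4-, 8- and 10-disk RAID5 block groups; anything shorter than 192 KiB goes to RAID1.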
>>
>> Yes, this would require a sort of garbage collector to move the data to the biggest
>> RAID5 BG, but this would avoid (or reduce) the fragmentation which affects the
>> variable stripe size.
>>
>> Doing so we don't need any disk format change and it would be backward compatible.
>>
>>
>> Moreover, if we could put the smaller BG in the faster disks, we could have a
>> decent tiering....
>>
>>
>>> If a user is concerned about the write or space amplification of sub-stripe
>>> writes on RAID56 he/she really needs to rethink the architecture.
>>>
>>>
>>>
>>> [1]
>>> S. K. Mishra and P. Mohapatra,
>>> "Performance study of RAID-5 disk arrays with data and parity cache,"
>>> Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing,
>>> 1996, pp. 222-229 vol.1, doi: 10.1109/ICPP.1996.537164.
>>
>> --
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
>>