public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Thiago Ramon <thiagoramon@gmail.com>, kreijack@inwind.it
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Qu Wenruo <wqu@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree")
Date: Sat, 16 Jul 2022 08:34:30 +0800	[thread overview]
Message-ID: <1dcfecba-92fc-6f49-bdea-705896ece036@gmx.com> (raw)
In-Reply-To: <CAO1Y9woJUhuQ+Q2yWSvscnBJb9D5cYiBaY-WG3Re=7V=OzWVhw@mail.gmail.com>



On 2022/7/16 03:08, Thiago Ramon wrote:
> As a user of RAID6 here, let me jump in because I think this
> suggestion is actually a very good compromise.
>
> With stripes written only once, we completely eliminate any possible
> write-hole, and even without any changes on the current disk layout
> and allocation,

Unfortunately current extent allocator won't understand the requirement
at all.

Currently the extent allocator although tends to use clustered free
space, when it can not find a clustered space, it goes where it can find
a free space. No matter if it's a substripe write.


Thus to full stripe only write, it's really the old idea about a new
extent allocator to avoid sub-stripe writes.

Nowadays with the zoned code, I guess it is now more feasible than previous.

Now I think it's time to revive the extent allcator idea, and explore
the extent allocator based idea, at least it requires no on-disk format
change, which even write-intent still needs a on-disk format change (at
least needs a compat ro flag)

Thanks,
Qu

> there shouldn't be much wasted space (in my case, I
> have a 12-disk RAID6, so each full stripe holds 640kb, and discounting
> single-sector writes that should go into metadata space, any
> reasonable write should fill that buffer in a few seconds).
>
> The additional suggestion of using smaller stripe widths in case there
> isn't enough data to fill a whole stripe would make it very easy to
> reclaim the wasted space by rebalancing with a stripe count filter,
> which can be easily automated and run very frequently.
>
> On-disk format also wouldn't change and be fully usable by older
> kernels, and it should "only" require changes on the allocator to
> implement.
>
> On Fri, Jul 15, 2022 at 2:58 PM Goffredo Baroncelli <kreijack@libero.it> wrote:
>>
>> On 14/07/2022 09.46, Johannes Thumshirn wrote:
>>> On 14.07.22 09:32, Qu Wenruo wrote:
>>>> [...]
>>>
>>> Again if you're doing sub-stripe size writes, you're asking stupid things and
>>> then there's no reason to not give the user stupid answers.
>>>
>>
>> Qu is right, if we consider only full stripe write the "raid hole" problem
>> disappear, because if a "full stripe" is not fully written it is not
>> referenced either.
>>
>>
>> Personally I think that the ZFS variable stripe size, may be interesting
>> to evaluate. Moreover, because the BTRFS disk format is quite flexible,
>> we can store different BG with different number of disks. Let me to make an
>> example: if we have 10 disks, we could allocate:
>> 1 BG RAID1
>> 1 BG RAID5, spread over 4 disks only
>> 1 BG RAID5, spread over 8 disks only
>> 1 BG RAID5, spread over 10 disks
>>
>> So if we have short writes, we could put the extents in the RAID1 BG; for longer
>> writes we could use a RAID5 BG with 4 or 8 or 10 disks depending by length
>> of the data.
>>
>> Yes this would require a sort of garbage collector to move the data to the biggest
>> raid5 BG, but this would avoid (or reduce) the fragmentation which affect the
>> variable stripe size.
>>
>> Doing so we don't need any disk format change and it would be backward compatible.
>>
>>
>> Moreover, if we could put the smaller BG in the faster disks, we could have a
>> decent tiering....
>>
>>
>>> If a user is concerned about the write or space amplicfication of sub-stripe
>>> writes on RAID56 he/she really needs to rethink the architecture.
>>>
>>>
>>>
>>> [1]
>>> S. K. Mishra and P. Mohapatra,
>>> "Performance study of RAID-5 disk arrays with data and parity cache,"
>>> Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing,
>>> 1996, pp. 222-229 vol.1, doi: 10.1109/ICPP.1996.537164.
>>
>> --
>> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
>> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
>>

  reply	other threads:[~2022-07-16  0:34 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-16 14:31 [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 1/8] btrfs: add raid stripe tree definitions Johannes Thumshirn
2022-05-17  7:39   ` Qu Wenruo
2022-05-17  7:45     ` Johannes Thumshirn
2022-05-17  7:56       ` Qu Wenruo
2022-05-16 14:31 ` [RFC ONLY 2/8] btrfs: move btrfs_io_context to volumes.h Johannes Thumshirn
2022-05-17  7:42   ` Qu Wenruo
2022-05-17  7:51     ` Johannes Thumshirn
2022-05-17  7:58       ` Qu Wenruo
2022-05-17  8:01         ` Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 3/8] btrfs: read raid-stripe-tree from disk Johannes Thumshirn
2022-05-17  8:09   ` Qu Wenruo
2022-05-17  8:13     ` Johannes Thumshirn
2022-05-17  8:28       ` Qu Wenruo
2022-05-18 11:29         ` Johannes Thumshirn
2022-05-19  8:36           ` Qu Wenruo
2022-05-19  8:39             ` Johannes Thumshirn
2022-05-19 10:37               ` Qu Wenruo
2022-05-19 11:44                 ` Johannes Thumshirn
2022-05-19 11:48                   ` Qu Wenruo
2022-05-19 11:53                     ` Johannes Thumshirn
2022-05-19 13:26                       ` Qu Wenruo
2022-05-19 13:49                         ` Johannes Thumshirn
2022-05-19 22:56                           ` Qu Wenruo
2022-05-20  8:27                             ` Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 4/8] btrfs: add boilerplate code to insert raid extent Johannes Thumshirn
2022-05-17  7:53   ` Qu Wenruo
2022-05-17  8:00   ` Qu Wenruo
2022-05-17  8:05     ` Johannes Thumshirn
2022-05-17  8:09       ` Qu Wenruo
2022-05-16 14:31 ` [RFC ONLY 5/8] btrfs: add code to delete " Johannes Thumshirn
2022-05-17  8:06   ` Qu Wenruo
2022-05-17  8:10     ` Johannes Thumshirn
2022-05-17  8:14       ` Qu Wenruo
2022-05-17  8:20         ` Johannes Thumshirn
2022-05-17  8:31           ` Qu Wenruo
2022-05-16 14:31 ` [RFC ONLY 6/8] btrfs: add code to read " Johannes Thumshirn
2022-05-16 14:55   ` Josef Bacik
2022-05-16 14:31 ` [RFC ONLY 7/8] btrfs: zoned: allow zoned RAID1 Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 8/8] btrfs: add raid stripe tree pretty printer Johannes Thumshirn
2022-05-16 14:58 ` [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree Josef Bacik
2022-05-16 15:04   ` Johannes Thumshirn
2022-05-16 15:10     ` Josef Bacik
2022-05-16 15:47       ` Johannes Thumshirn
2022-05-17  7:23 ` Nikolay Borisov
2022-05-17  7:31   ` Qu Wenruo
2022-05-17  7:41     ` Johannes Thumshirn
2022-05-17  7:32   ` Johannes Thumshirn
2022-07-13 10:54 ` RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree") Qu Wenruo
2022-07-13 11:43   ` Johannes Thumshirn
2022-07-13 12:01     ` Qu Wenruo
2022-07-13 12:42       ` Johannes Thumshirn
2022-07-13 13:47         ` Qu Wenruo
2022-07-13 14:01           ` Johannes Thumshirn
2022-07-13 15:24             ` Lukas Straub
2022-07-13 15:28               ` Johannes Thumshirn
2022-07-14  1:08             ` Qu Wenruo
2022-07-14  7:08               ` Johannes Thumshirn
2022-07-14  7:32                 ` Qu Wenruo
2022-07-14  7:46                   ` Johannes Thumshirn
2022-07-14  7:53                     ` Qu Wenruo
2022-07-15 17:54                     ` Goffredo Baroncelli
2022-07-15 19:08                       ` Thiago Ramon
2022-07-16  0:34                         ` Qu Wenruo [this message]
2022-07-16 11:11                           ` Qu Wenruo
2022-07-16 13:52                             ` Thiago Ramon
2022-07-16 14:26                               ` Goffredo Baroncelli
2022-07-17 17:58                                 ` Goffredo Baroncelli
2022-07-17  0:30                               ` Qu Wenruo
2022-07-17 15:18                                 ` Thiago Ramon
2022-07-17 22:01                                   ` Qu Wenruo
2022-07-17 23:00                           ` Zygo Blaxell
2022-07-18  1:04                             ` Qu Wenruo
2022-07-15 20:14                       ` Chris Murphy
2022-07-18  7:33                         ` Johannes Thumshirn
2022-07-18  8:03                           ` Qu Wenruo
2022-07-18 21:49                         ` Forza
2022-07-19  1:19                           ` Qu Wenruo
2022-07-21 14:51                             ` Forza
2022-07-24 11:27                               ` Qu Wenruo
2022-07-25  0:00                             ` Zygo Blaxell
2022-07-25  0:25                               ` Qu Wenruo
2022-07-25  5:41                                 ` Zygo Blaxell
2022-07-25  7:49                                   ` Qu Wenruo
2022-07-25 19:58                               ` Goffredo Baroncelli
2022-07-25 21:29                                 ` Qu Wenruo
2022-07-18  7:30                       ` Johannes Thumshirn
2022-07-19 18:58                         ` Goffredo Baroncelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1dcfecba-92fc-6f49-bdea-705896ece036@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=kreijack@inwind.it \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=thiagoramon@gmail.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox