Linux Btrfs filesystem development
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: waxhead@dirtcellar.net, Christoph Hellwig <hch@infradead.org>
Cc: dsterba@suse.cz, Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH DRAFT] btrfs: RAID56J journal on-disk format draft
Date: Thu, 26 May 2022 17:26:52 +0800
Message-ID: <50cb070b-2e2e-1987-3726-1e67eaf060cf@gmx.com>
In-Reply-To: <9d1e2fc6-9ee6-68f8-bda8-8dd7e59e74e5@dirtcellar.net>



On 2022/5/26 17:06, waxhead wrote:
> Qu Wenruo wrote:
>>
>>
>> On 2022/5/25 17:26, Christoph Hellwig wrote:
>>> On Wed, May 25, 2022 at 05:13:11PM +0800, Qu Wenruo wrote:
>>>> The problem is, we can have partial write for RAID56, no matter if we
>>>> use NODATACOW or not.
>>>>
>>>> For example, we have a very typical 3 disks RAID5:
>>>>
>>>>         0       32K     64K
>>>> Disk 1  |DDDDDDD|       |
>>>> Disk 2  |ddddddd|ddddddd|
>>>> Disk 3  |PPPPPPP|PPPPPPP|
>>>>
>>>>
>>>> D = old data, it's there for a while.
>>>> d = new data, we want to write.
>>>
>>> Oh.  I keep forgetting that the striping is entirely on the physical
>>> block basis and not logical block basis.  Which makes the whole idea
>>> of btrfs integrated raid5/6 not all that useful compared to just using
>>> mdraid :(
>>
>> Yep, that's why I have to go the old journal way.
>>
>> But you may want to explore the super awesome idea of raid stripe tree
>> from Johannes.
>>
>> The idea is we introduce a new layer of logical addr -> internal mapping
>> -> physical addr.
>> By that, we get rid of the strict physical address requirement.
>>
>> And when we update the stripe, we just insert two new mappings for
>> (dddd), and two new mappings for the new (PPPPP).
>>
>> If a power loss happens, we still see the old internal mapping, and can
>> get the correct recovery.
>>
>> But it still seems to have a lot of things to resolve for now.
>>
>> Thanks,
>> Qu
>
> I am just a humble BTRFS user and while I think the journaled approach
> sounds superinteresting I believe that the stripe tree sounds like the
> better solution in the long run.
>
> Is it really such a good idea to add a (potentially temporary) journaled
> raid mode if the stripe tree version really is better?

A journal is simpler to implement and has been tried and true for a long
time, although the on-disk format change is unavoidable.

Another problem is that, for now, we don't even know whether it's
possible to use the stripe tree for metadata.
(And the stripe tree itself is still at an early stage.)

Sure, forcing RAID10/RAID1C* on metadata would be acceptable for most
users, but it's still something to take into consideration.

> What about Josef
> Bacik's extent tree v2 ? Would that fit better with the stripe tree /
> would it cause problems with the journaled mode?

I don't believe extent tree v2 would affect RAID56J at all.

Not 100% sure about RST (raid stripe tree), but from the initial
impression, some tricks from extent tree v2 may help RST.

>
> As a regular user I think that adding another raid56 mode may be
> confusing, especially for people who do not understand how things work
> (which absolutely sometimes includes me too). Quite some BTRFS use is
> also done outside the datacenter, and it is regular Joe and co. who
> complain the most when they screw up, which to some extent prevents
> adoption on non-stellar hardware, which again would/could lead to
> bug reports and a better filesystem in the long run. So therefore:
>
> If the standard raid56 mode is unstable and discouraged to use, would it
> not be better to sneakily drop that once and for all e.g. just make it
> so that new filesystems created with raid56 automatically use the new
> (and better) raid56j mode? Effectively preventing users from making
> filesystems with the "bad" raid56 after a certain btrfs-progs version?

Deprecation needs time, and unfortunately RAID56J is not a drop-in
replacement: it needs an on-disk format change and introduces new RAID
profiles.

If the code is finished and properly tested (through several kernel
releases), we may switch all raid56 to raid56j in mkfs.btrfs and balance
(that is, the raid56j balance profile becomes the default for raid56).

For RST, it's harder to say anything with confidence now; a lot of things
are not yet determined...

Thanks,
Qu

>
> This way the raid56 code would seem to be fixed albeit getting slower
> (as I understand it), but the number of configurations available is not
> overwhelming for us regular people.
>
> PS! I understand that I sound like I am not too keen on the new raid56j
> mode which is sort of true, but that does not mean that I am ungrateful
> for it :)
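
The partial-write problem from the quoted 3-disk RAID5 example can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions: it models one vertical slice of the stripe with
4-byte blocks, and the "journal" is a generic write-ahead log invented
for the example, not the on-disk format from the RAID56J draft. All
names (xor_blocks, rebuild_disk1, journal) are hypothetical.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One vertical slice of a 3-disk RAID5 stripe: two data blocks plus parity.
old_d1 = b"DDDD"                # old data on disk 1, untouched by the write
old_d2 = b"\x00\x00\x00\x00"    # disk 2 before the write (assumed empty)
new_d2 = b"dddd"                # new data destined for disk 2

disks = {1: old_d1, 2: old_d2, 3: xor_blocks(old_d1, old_d2)}

def rebuild_disk1(d: dict) -> bytes:
    """Recover disk 1 from disk 2 and parity, as after losing disk 1."""
    return xor_blocks(d[2], d[3])

# Case 1: in-place update torn by power loss: data written, parity not.
torn = dict(disks)
torn[2] = new_d2
lost_old_data = rebuild_disk1(torn) != old_d1

# Case 2: hypothetical write-ahead journal. Both blocks were logged before
# the in-place write started, so replay after reboot completes the update.
journal = {2: new_d2, 3: xor_blocks(old_d1, new_d2)}
replayed = dict(torn)
replayed.update(journal)
recovered = rebuild_disk1(replayed)

print("torn stripe corrupts old data:", lost_old_data)
print("after journal replay, disk 1 rebuilds to:", recovered)
```

The point of the sketch is that the torn write damages the old data D
on disk 1, which was never part of the write, once that disk has to be
rebuilt from parity. Replaying a complete journal entry makes data and
parity consistent again before any rebuild is attempted.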
