From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>, Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
Date: Mon, 23 May 2022 15:46:02 +0800 [thread overview]
Message-ID: <84b022dc-6310-1d52-b8e3-33f915a4fee7@gmx.com> (raw)
In-Reply-To: <20220523062636.GA29750@lst.de>
On 2022/5/23 14:26, Christoph Hellwig wrote:
> On Mon, May 23, 2022 at 08:07:01AM +0800, Qu Wenruo wrote:
>>
>> I checked the code, but I still find the code in patch "btrfs: add new
>> read repair infrastructure" not that intuitive.
>>
>> - Why do we bother with different repair methods in btrfs_repair_one_mirror()?
>>    In fact btrfs_repair_io_failure() can handle all profiles.
>
> Because btrfs_repair_io_failure can't handle multiple-page I/O. It
> is also rather cumbersome because it bypasses the normal bio
> mapping. As a follow-on I'd rather move it over to btrfs_map_bio
> with a special flag for the single-mirror parity write than keep
> that hack.
In fact, so far all callers of btrfs_repair_io_failure() only handle
ranges inside one stripe.
Thus we could easily enhance that function to handle multi-page ranges.
Although a dedicated btrfs_map_bio() flag does seem more generic and better.
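The single-stripe restriction is what makes the enhancement easy: a multi-page range only needs to be split at stripe boundaries before each piece maps to a single device. A toy sketch of that split (plain Python with a made-up constant, purely illustrative, not btrfs code; btrfs's data stripe length is commonly 64 KiB):

```python
# Illustrative only: split a (start, length) byte range so each chunk
# stays inside one stripe. STRIPE_LEN is an assumed example value.
STRIPE_LEN = 64 * 1024

def split_at_stripe_boundaries(start, length):
    """Yield (offset, len) chunks that never cross a stripe boundary."""
    chunks = []
    while length > 0:
        # next stripe boundary strictly after 'start'
        boundary = (start // STRIPE_LEN + 1) * STRIPE_LEN
        step = min(length, boundary - start)
        chunks.append((start, step))
        start += step
        length -= step
    return chunks

# A 16 KiB range starting 4 KiB before a boundary splits into two chunks:
print(split_at_stripe_boundaries(60 * 1024, 16 * 1024))
# -> [(61440, 4096), (65536, 12288)]
```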
>
>> Then why do we go back and write the whole bio?
>
> Because at this point the whole bio contains only the bad sectors.
> There is no point in writing only parts of the bio, because that would
> leave corruption on disk.
>
>> The only reason I can think of is that we're still trying to do some
>> "optimization".
>>
>> But all our bio submission is already synchronous, so I doubt such an
>> "optimization" would make much difference.
>
> Can you explain what you mean here?
We wait for the read bio anyway, so I doubt the batched write part is
that important.

If you really want, I can try to make the write part asynchronous while
keeping the read part synchronous and easier to read.
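A rough sketch of the split I mean (plain Python, not kernel code; all helper names here are hypothetical): repair reads stay synchronous because we need the result to decide on the next mirror, while repair writes are fire-and-settle and can run in the background.

```python
# Illustrative only: synchronous reads gate the control flow, writes are
# handed to a worker pool and settled once at the end.
from concurrent.futures import ThreadPoolExecutor, wait

def repair_range(bad_sectors, read_sector_sync, write_repair):
    """read_sector_sync(sector) -> data, or None on csum mismatch;
    write_repair(sector, data) rewrites the bad copy (hypothetical)."""
    pending = []
    with ThreadPoolExecutor(max_workers=4) as writers:
        for sector in bad_sectors:
            data = read_sector_sync(sector)  # synchronous: result needed now
            if data is not None:
                # asynchronous: submit the repair write and keep going
                pending.append(writers.submit(write_repair, sector, data))
        wait(pending)  # settle all writes before completing the bio
```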
>
>>
>> - The bio truncation
>> This really looks like a bandage to address the checker-pattern
>> corruption.
>> I wonder why we don't just do per-sector read/write, like:
>
> Because now you wait for each I/O.
Yeah, that's the biggest performance drop here, no doubt about it.
>
>> To me, the "optimization" of batched read/write is only relevant if we
>> have tons of corrupted sectors in a read bio, which I don't believe is
>> a hot path in the real world anyway.
>
> It is very easy low-hanging fruit,
Unfortunately, this is not that simple a fruit, or I wouldn't have gone
the full bitmap way.
In your current version, the do {} while() loop iterates through all
mirrors.
But for the following case, we will hit problems thanks to RAID1C3 again:
Mirror 1 |X|X|X|X|
Mirror 2 |X| |X| |
Mirror 3 | |X| |X|
We hit mirror 1 initially, thus @initial_mirror is 1.
Then when we try mirror 2, since the first sector is still bad there, we
jump to the next mirror.
From mirror 3 we fix the first sector only; the 2nd sector is still read
from mirror 3 and doesn't pass.
Now we have no more mirrors and still return -EIO, even though every
sector is recoverable from some mirror.
What saves the day is the VFS read retry, which re-reads the range
sector by sector and gets every sector fixed.
Unfortunately we spend a lot of code hacking the bio size, but it still
doesn't work, and it falls back to exactly the same sector-by-sector
wait behavior.
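The failure on the checker pattern above can be simulated in a few lines. This is a toy model (plain Python, purely illustrative, not the actual btrfs code; the loop shapes and names are assumptions), contrasting the whole-range retry with a per-sector retry:

```python
# True means that mirror's copy of the sector is good (passes csum).
MIRRORS = {
    1: [False, False, False, False],  # mirror 1: all four sectors bad
    2: [False, True, False, True],    # mirror 2: sectors 0 and 2 bad
    3: [True, False, True, False],    # mirror 3: sectors 1 and 3 bad
}
NUM_MIRRORS = 3
NUM_SECTORS = 4
EIO = -5

def next_mirror(mirror):
    return mirror % NUM_MIRRORS + 1

def whole_range_retry(initial_mirror=1):
    """Batched retry: on the first bad sector, move the whole remaining
    range to the next mirror; give up once we wrap back to the initial
    mirror. Returns (fixed sectors, error code)."""
    fixed = set()
    sector = 0
    mirror = next_mirror(initial_mirror)
    while sector < NUM_SECTORS:
        if MIRRORS[mirror][sector]:
            fixed.add(sector)        # this mirror's copy passed
            sector += 1
        else:
            mirror = next_mirror(mirror)
            if mirror == initial_mirror:
                return fixed, EIO    # mirrors exhausted mid-range
    return fixed, 0

def per_sector_retry(initial_mirror=1):
    """Bitmap-style retry: each bad sector independently walks the other
    mirrors, so any sector with one good copy anywhere gets repaired."""
    fixed = set()
    for sector in range(NUM_SECTORS):
        mirror = next_mirror(initial_mirror)
        while mirror != initial_mirror:
            if MIRRORS[mirror][sector]:
                fixed.add(sector)
                break
            mirror = next_mirror(mirror)
    return fixed, (0 if len(fixed) == NUM_SECTORS else EIO)

print(whole_range_retry())  # ({0}, -5): only the first sector repaired
print(per_sector_retry())   # ({0, 1, 2, 3}, 0): everything repaired
```

With the same mirror layout, the whole-range loop exhausts the mirrors after repairing only the first sector, while the per-sector walk repairs all four.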
So my point still stands: if we want to do batched handling, either we
go the bitmap way or we give up.
Such a hacky bandage seems to work at first glance and will pass your
new test cases, but it does no better than sector-by-sector waiting.
(Forgot to mention: the new RAID1C3 test case may also be flawed, as any
read on the other mirrors will trigger read-repair and screw up our
later retry, thus we must check the pid first before doing any read.)
Thus it's low-hanging but rotten fruit.
Thanks,
Qu
> and actually pretty common for
> actual failures of components. In other words: single-sector failures
> happen sometimes, usually due to corruption on the transfer wire. If
> actual corruption happens on the drive, it will usually affect quite
> big areas. These are rather rare these days as a lot of devices shut
> down before returning bad data, but if it happens you'll rarely see
> just a single sector.