public inbox for linux-btrfs@vger.kernel.org
From: Christoph Hellwig <hch@lst.de>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Christoph Hellwig <hch@lst.de>, Chris Mason <clm@fb.com>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio
Date: Tue, 24 May 2022 09:32:16 +0200	[thread overview]
Message-ID: <20220524073216.GB26145@lst.de> (raw)
In-Reply-To: <84b022dc-6310-1d52-b8e3-33f915a4fee7@gmx.com>

On Mon, May 23, 2022 at 03:46:02PM +0800, Qu Wenruo wrote:
>> Because btrfs_repair_io_failure can't handle multiple-page I/O.  It
>> is also rather cumbersome because it bypasses the normal bio
>> mapping.  As a follow-on I'd rather move it over to btrfs_map_bio
>> with a special flag for the single mirror parity write rather than that
>> hack.
>
> In fact so far for all callers of btrfs_repair_io_failure(), we are
> always handling things inside one stripe.
>
> Thus we can easily enhance that function to handle multi page ranges.
>
> Although a dedicated btrfs_map_bio() flag seems more generic and better.

I did in fact think of moving btrfs_repair_io_failure over to my new
infrastructure, because it seems inherently possible.  It's just
not the highest priority right now.

>> Because the whole bio at this point is all the bad sectors.  There
>> is no point in writing only parts of the bio because that would leave
>> corruption on disk.
>>
>>>    The only reason I can think of is, we're still trying to do some
>>>    "optimization".
>>>
>>>    But all our bio submission is already synchronous, I doubt such
>>>    "optimization" would make much difference.
>>
>> Can you explain what you mean here?
>
> We wait for the read bio anyway, I doubt the batched write part is that
> important.

I still don't understand the point.  Once we read more than a single
page, writing it back as a batch is completely trivial, as shown by
this series.  Why would we not do it?

>
> If you really want, I can try to make the write part asynchronous, while
> still keeping the read part synchronous and easier to read.

Asynchronous writes get us back into all the I/O completion handler
complexities, which was the whole reason to start on the synchronous
repair.

> In your current version, the do {} while() loop iterates through all
> mirrors.
>
> But for the following case, we will hit problems thanks to RAID1C3 again:
>
> Mirror 1 	|X|X|X|X|
> Mirror 2	|X| |X| |
> Mirror 3	| |X| |X|
>
> We hit mirror 1 initially, thus @initial_mirror is 1.
>
> Then when we try mirror 2, since the first sector is still bad, we jump
> to the next mirror.
>
> For mirror 3, we fix the first sector only.  The 2nd sector is still
> read from mirror 3 and doesn't pass.
> Now we have no more mirrors, and still return -EIO.

Can you share a test case?  The code resets initial_mirror as soon as
we make any progress, so that should not happen.

> So my points still stand, if we want to do batched handling, either we
> go bitmap or we give up.

Why?  For the very common case of clustered corruption or entirely
failing reads it is significantly faster than a simple synchronous
read of each sector, and also much better than the existing code.
It is also a lot less code than the existing code base, and (maybe
I'm biased) a lot more readable.

Bitmaps only help you with randomly splattered corruption, which simply
is not how SSDs or hard drives actually fail.

> Such a hacky bandage seems to work at first glance and will pass your new
> test cases, but it doesn't do any better than sector-by-sector waiting.
> (Forgot to mention, the new RAID1C3 test case may also be flawed, as any
> read on other mirrors will cause a read-repair, screwing up our later
> retry, thus we must check the pid first before doing any read.)

The updated version uses the read-from-mirror loop from btrfs/142,
which cleverly uses bash internals to not issue the read if it would
be served from the wrong mirror.  That also really nicely speeds up
the tests, including the existing btrfs/140 and btrfs/141 ones.
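For readers who haven't seen btrfs/142: it exploits the fact that btrfs
picks the RAID1 read mirror from the reader's pid (pid % number of
copies), and that $BASHPID is different in every command-substitution
subshell.  A generic, hypothetical sketch of that loop (the helper name
and its interface are mine, not from fstests):

```shell
#!/bin/bash
# Keep forking command-substitution subshells until one gets a pid
# whose "pid % ncopies" selects the mirror we want, and only then run
# the command (the real test runs an xfs_io pread here, whose output
# terminates the loop -- the command must print something on success).
run_on_mirror() {
	local want=$1 ncopies=$2
	shift 2
	local out=""

	while [[ -z "$out" ]]; do
		# $BASHPID differs in each $(...) subshell, so retry
		# until its remainder matches the desired mirror.
		out=$( (( BASHPID % ncopies == want )) && "$@" )
	done
	printf '%s\n' "$out"
}

# In the real test this would be something like:
#   run_on_mirror 1 2 xfs_io -c "pread -b 4096 0 4096" "$SCRATCH_MNT/foobar"
```

Because consecutive subshell pids are normally sequential, the loop
terminates after at most a handful of forks, which is where the speedup
over a retry-until-lucky approach comes from.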


Thread overview: 36+ messages
2022-05-22 11:47 misc btrfs cleanups Christoph Hellwig
2022-05-22 11:47 ` [PATCH 1/8] btrfs: quit early if the fs has no RAID56 support for raid56 related checks Christoph Hellwig
2022-05-22 11:47 ` [PATCH 2/8] btrfs: introduce a pure data checksum checking helper Christoph Hellwig
2022-05-23  0:38   ` Qu Wenruo
2022-05-24  7:24     ` Christoph Hellwig
2022-05-24  8:07       ` Qu Wenruo
2022-05-24  8:11         ` Christoph Hellwig
2022-05-25 16:20           ` David Sterba
2022-05-22 11:47 ` [PATCH 3/8] btrfs: remove duplicated parameters from submit_data_read_repair() Christoph Hellwig
2022-05-22 11:47 ` [PATCH 4/8] btrfs: factor out a helper to end a single sector buffere I/O Christoph Hellwig
2022-05-25 14:54   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 5/8] btrfs: refactor end_bio_extent_readpage Christoph Hellwig
2022-05-25 14:57   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 6/8] btrfs: factor out a btrfs_csum_ptr helper Christoph Hellwig
2022-05-25 14:32   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 7/8] btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks Christoph Hellwig
2022-05-26 12:58   ` Nikolay Borisov
2022-05-22 11:47 ` [PATCH 8/8] btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio Christoph Hellwig
2022-05-22 12:21   ` Qu Wenruo
2022-05-22 12:31     ` Christoph Hellwig
2022-05-22 12:38       ` Qu Wenruo
2022-05-22 12:53         ` Christoph Hellwig
2022-05-23  0:07           ` Qu Wenruo
2022-05-23  6:26             ` Christoph Hellwig
2022-05-23  7:46               ` Qu Wenruo
2022-05-24  7:32                 ` Christoph Hellwig [this message]
2022-05-24  8:04                   ` Qu Wenruo
2022-05-24  8:21                     ` Qu Wenruo
2022-05-24 12:08                       ` Christoph Hellwig
2022-05-24 13:13                         ` Qu Wenruo
2022-05-24 14:02                           ` Qu Wenruo
2022-05-25 15:12                           ` Nikolay Borisov
2022-05-25 22:46                             ` Qu Wenruo
2022-05-26 12:58   ` Nikolay Borisov
2022-05-25 20:20 ` misc btrfs cleanups David Sterba
2022-05-27 15:20   ` David Sterba
