public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: dsterba@suse.cz
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/3] btrfs: introduce "abort=" groups for more strict error handling
Date: Sat, 23 Sep 2023 06:46:26 +0930	[thread overview]
Message-ID: <25c4f01f-a355-43b9-ab22-725353dc6380@suse.com> (raw)
In-Reply-To: <20230922145513.GF13697@twin.jikos.cz>



On 2023/9/23 00:25, David Sterba wrote:
> On Fri, Sep 22, 2023 at 12:25:18PM +0930, Qu Wenruo wrote:
>> During a very interesting (and weird) debugging session, it turns out
>> that btrfs will ignore a lot of write errors until we hit some critical
>> location, then btrfs started reacting, normally by aborting the
>> transaction.
>>
>> This can be problematic for two types of people:
>>
>> - Developers
>>    Sometimes we want to catch the earlies sign, continuing without any
>>    obvious errors (other than kernel error messages) can make debugging
>>    much harder.
>>
>> - Sysadmins who wants to catch problems early
>>    Dmesg is not really something users would check frequently, and even
>>    they check it, it may already be too late.
>>    Meanwhile if the fs flips read-only early it's not only noisy but also
>>    saves the context much better (more relevant dmesgs etc).
> 
> For sysadmins I think that the preferred way is to get events (like via
> the uevents interface) that can be monitored and then reacted to by some
> tools.

I think this is a future development idea, to have a generic way to do 
error reporting.

> 
>> On the other hand, I totally understand if just a single sector failed
>> to be write and we mark the whole fs read-only, it can be super
>> frustrating for regular end users, thus we can not make it the default
>> behavior.
> 
> I can't imagine a realistic scenario where a user would like this
> behaviour, one EIO takes down whole filesystem could make sense only for
> some testing environments.

I doubt, for some environment with expensive hardware, one may not even 
expect any -EIO for valid operations.
If that happens, it may mean bad firmware or bad hardware, neither is a 
good thing especially if they paid extra money for the fancy hardware or 
the support fee.

> 
>> So here we introduce a mount option group "abort=", and make the
>> following errors more noisy and abort early if specified by the user.
> 
> Default andswer for a new mount option is 'no', here we also have one
> that is probably doing the same, 'fatal_errors', so if you really want
> to do that by a mount option then please use this one.

Or I can go sysfs interface and put it under some debug directory.

> 
>> - Any super block write back failure
>>    Currently we're very loose on the super block writeback failure.
>>    The failure has to meet both conditions below:
>>    * The primary super block writeback failed
> 
> Doesn't this fail with flip to read-only?

Nope, just by itself is not enough to go read-only, as long as we have 
other devices.

If the primary super block writeback failed, it would only make 
write_dev_supers() to return error.
But inside write_all_supers(), error from write_dev_super() would only 
increase @total_errors, not directly erroring out.

> 
>>    * Total failed devices go beyond tolerance
>>      The tolerance is super high, num_devices - 1. To me this is
>>      too high, but I don't have a better idea yet.
> 
> Does this depend on the profile constraints?

Nope, it's profile independent.

The @max_errors inside write_all_supers() is purely determined by 
btrfs_super_num_devices() - 1.

> 
>>    This new "rescue=super" may be more frequently used considering how
>>    loose our existing tolerance is.
>>
>> - Any data writeback failure
>>    This is only for the data writeback at btrfs bio layer.
>>    This means, if a data sector is going to be written to a RAID1 chunk,
>>    and one mirror failed, we still consider the writeback succeeded.
>>
>> There would be another one for btrfs bio layer, but I have found
>> something weird in the code, thus it would only be introduced after I
>> solved the problem there, meanwhile we can discuss on the usefulness of
>> this patchset.
> 
> We can possibly enhance the error checking with additional knobs and
> checkpoints that will have to survive or detect specific testing, but as
> mount options it's not very flexible. We can possibly do it via sysfs or
> BPF but this may not be the proper interface anyway.

I think sysfs would be better, but not familiar enough with BPF to 
determine if it's any better or worse.

Thanks,
Qu

  reply	other threads:[~2023-09-22 21:16 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-22  2:55 [PATCH 0/3] btrfs: introduce "abort=" groups for more strict error handling Qu Wenruo
2023-09-22  2:55 ` [PATCH 1/3] btrfs: explicitly mark BTRFS_MOUNT_ enum as 64bit Qu Wenruo
2023-09-22  2:55 ` [PATCH 2/3] btrfs: introduce "abort=super" mount option Qu Wenruo
2023-09-22  2:55 ` [PATCH 3/3] btrfs: introduce "abort=data" " Qu Wenruo
2023-09-22 14:55 ` [PATCH 0/3] btrfs: introduce "abort=" groups for more strict error handling David Sterba
2023-09-22 21:16   ` Qu Wenruo [this message]
2023-09-25 16:37     ` David Sterba
2023-09-25 21:14       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25c4f01f-a355-43b9-ab22-725353dc6380@suse.com \
    --to=wqu@suse.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox