From: Kai Stian Olstad <btrfs+list@olstad.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: How to fix "BTRFS error (device dm-3): error writing primary super block to device 1"?
Date: Sun, 13 Apr 2025 10:14:03 +0200
Message-ID: <0098f9689655ea11f9f0913f2b797201@olstad.com>
In-Reply-To: <c15e6edd-0bbb-4670-a4de-db500080601e@gmx.com>
On 12.04.2025 05:15, Qu Wenruo wrote:
> On 2025/4/12 10:32, Kai Stian Olstad wrote:
>> On 12.04.2025 02:43, Qu Wenruo wrote:
>>> On 2025/4/12 09:59, Kai Stian Olstad wrote:
>>>> On 12.04.2025 00:10, Qu Wenruo wrote:
>>>>> On 2025/4/12 01:18, Kai Stian Olstad wrote:
>>>>>> Kubuntu 24.04
>>>>>> Kernel 6.8.0-57-generic
>>>>>>
>>>>>> 2 days ago I got a sector error on one of the BTRFS disks
>>>>>>
>>>>>> $ journalctl -k -S 2025-04-09 | grep -A 20 mpt3sas_cm0
>>>>>> Apr 09 03:16:26 cb kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
>>>>>> Apr 09 03:16:26 cb kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Sense Key : Illegal Request [current]
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Add. Sense: Logical block address out of range
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 CDB: Write(16) 8a 08 00 00 00 00 00 00 10 80 00 00 00 08 00 00
>>>>>> Apr 09 03:16:26 cb kernel: critical target error, dev sdd, sector 4224 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
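For reference, the failing command can be decoded from the CDB bytes in the log above. This is only a minimal Python sketch, assuming the standard 16-byte WRITE(16) layout (opcode, flags, 8-byte big-endian LBA, 4-byte transfer length, group number, control). The function name decode_write16 is made up for illustration and is not part of any tool mentioned in this thread.

```python
import struct


def decode_write16(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    # Big-endian fields: opcode, flags, 64-bit LBA, 32-bit transfer
    # length (in logical blocks), group number, control byte.
    _opcode, flags, lba, nblocks, _group, _control = struct.unpack(">BBQIBB", cdb)
    return {
        "lba": lba,
        "blocks": nblocks,
        "fua": bool(flags & 0x08),  # bit 3 of byte 1 is Force Unit Access
    }


# CDB bytes copied from the kernel log quoted above.
print(decode_write16("8a 08 00 00 00 00 00 00 10 80 00 00 00 08 00 00"))
# → {'lba': 4224, 'blocks': 8, 'fua': True}
```

The decoded LBA matches the "sector 4224" in the kernel message. Assuming 512-byte logical sectors, 8 blocks is 4096 bytes, which is the size of a Btrfs super block.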
>>>>>
>>>>> This error is completely from the lower layer (the block device).
>>>>>
>>>>> Neither Btrfs nor the LUKS layer on top of the disk can do anything about it.
>>>>
>>>> Thank you for the response.
>>>>
>>>> This disk supports scterc
>>>>
>>>> $ sudo smartctl -l scterc /dev/sdd
>>>> SCT Error Recovery Control:
>>>> Read: 70 (7.0 seconds)
>>>> Write: 70 (7.0 seconds)
>>>>
>>>> Doesn't that mean that the disk gives up after 7 seconds, and then the
>>>> sector is mapped to a spare?
>>>> So if Btrfs does a write to the sector again, it will be written to the
>>>> spare?
>>>>
>>>> I've experienced numerous sector errors throughout the years with mdadm
>>>> and they have been fixed with a check.
>>>> Also a few with Btrfs I think, but they have been fixed automatically.
>>>
>>> Whatever the feature is, it's the block device driver's behavior.
>>>
>>> Btrfs only errors out because the disk reported the write failed.
>>>
>>> For the detailed reason you should check these lines:
>>>
>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Sense Key : Illegal Request [current]
>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Add. Sense: Logical block address out of range
>>
>> I'll check them, but this is what I usually see when a disk has a
>> sector error.
>>
>>
>>>> So why not this time?
>>>> To me this looks like an ordinary faulty sector that can be "fixed" with
>>>> a write?
>>>>
>>> I'm not sure what the "SCT Error Recovery Control" feature is, but if it
>>> is designed to re-map a write, it should not return -EIO for the initial
>>> write failure, but OK as long as the write eventually succeeds.
>>>
>>> It should not require any upper layer to do any extra work.
>>>
>>> But since the write eventually failed, there is nothing the upper layer
>>> can do, unless the dm or fs layer has some extra recovery mechanism.
>>
>> Now I'm confused. I'm running RAID1 and only one disk has/had 1 sector
>> failure.
>> Shouldn't Btrfs manage to write this data? It should exist on one of
>> the other drives because of RAID1.
>> And shouldn't a scrub fix it?
>
> Sorry, I finally understand your concern: it's not about the initial
> write failure, but about the later error messages.
>
> It turns out to be a bug in older kernels: after one super block
> write-back error, the folio keeps its error flag without ever clearing
> it, so the error message keeps being shown.
>
> And since it's RAID1, btrfs continues the fs (thus your fs is still
> running, not flipping into read-only).
>
> Scrub won't solve it because there is nothing to resolve, everything is
> fine, except the false warning messages.
>
>
> Upstream, it's fixed by a rework: commit bc00965dbff7
> ("btrfs: count super block write errors in device instead of tracking
> folio error state") takes a completely different path and counts the
> super block write-back errors instead.
>
> Unfortunately that commit is only in v6.10 and later, and since it's not
> explicitly marked as a bug fix (even though it does fix a hidden bug),
> it's not backported to any older kernel. (BTW, the 6.8 kernel is already
> EOL.)
>
> Please update to v6.12 or newer LTS kernels.
>
> Or just unmount and remount the fs and pray no more super block
> writeback errors happen again...
Unfortunately, Canonical ships 6.8 with their latest LTS release and, I
guess, manages backports in-house.
But they offer an upgrade to 6.11 through their Hardware Enablement
(HWE) program.
At least I get that patch by upgrading to 6.11.
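A quick way to check whether a kernel release string is at least v6.10, where Qu says the rework commit landed, is to compare the leading major.minor numbers. This is a minimal Python sketch. The helper name and the regular expression are my own, and it only looks at the version string, so it cannot tell whether a distribution kernel carries its own backport.

```python
import re

# First upstream release containing commit bc00965dbff7, per the thread.
FIXED_IN = (6, 10)


def has_sb_write_error_rework(release: str) -> bool:
    """Return True if the release string is v6.10 or newer."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles 6.8 < 6.10 correctly, unlike a string compare.
    return (major, minor) >= FIXED_IN


# On a live system, platform.release() would supply the string.
print(has_sb_write_error_rework("6.8.0-57-generic"))  # → False
print(has_sb_write_error_rework("6.11.0"))            # → True
```

Comparing integers rather than strings matters here, since "6.8" sorts after "6.10" as text.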
Thank you for your help, Qu.
--
Kai Stian Olstad