Re: How to fix "BTRFS error (device dm-3): error writing primary super block to device 1"?


From: Kai Stian Olstad <btrfs+list@olstad.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: How to fix "BTRFS error (device dm-3): error writing primary super block to device 1"?
Date: Sun, 13 Apr 2025 10:14:03 +0200
Message-ID: <0098f9689655ea11f9f0913f2b797201@olstad.com>
In-Reply-To: <c15e6edd-0bbb-4670-a4de-db500080601e@gmx.com>

On 12.04.2025 05:15, Qu Wenruo wrote:
> On 2025/4/12 10:32, Kai Stian Olstad wrote:
>> On 12.04.2025 02:43, Qu Wenruo wrote:
>>> On 2025/4/12 09:59, Kai Stian Olstad wrote:
>>>> On 12.04.2025 00:10, Qu Wenruo wrote:
>>>>> On 2025/4/12 01:18, Kai Stian Olstad wrote:
>>>>>> Kubuntu 24.04
>>>>>> Kernel 6.8.0-57-generic
>>>>>> 
>>>>>> 2 days ago I got a sector error on one of the BTRFS disks
>>>>>> 
>>>>>> $ journalctl -k -S 2025-04-09 | grep -A 20 mpt3sas_cm0
>>>>>> Apr 09 03:16:26 cb kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
>>>>>> Apr 09 03:16:26 cb kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Sense Key : Illegal Request [current]
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Add. Sense: Logical block address out of range
>>>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 CDB: Write(16) 8a 08 00 00 00 00 00 00 10 80 00 00 00 08 00 00
>>>>>> Apr 09 03:16:26 cb kernel: critical target error, dev sdd, sector 4224 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
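
As an aside, the Write(16) CDB in the log above can be decoded by hand to
cross-check it against the "sector 4224" in the block layer message. Below is
a minimal Python sketch, assuming the standard SCSI WRITE(16) layout (opcode
0x8a, one flags byte, 64-bit big-endian LBA, 32-bit big-endian transfer
length, group number, control). The function name decode_write16 is made up
for illustration and is not part of any tool.

```python
import struct


def decode_write16(cdb_hex: str) -> dict:
    """Decode a 16-byte SCSI WRITE(16) CDB given as space-separated hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    # Layout: opcode, flags, 64-bit LBA, 32-bit transfer length,
    # group number, control (all big-endian).
    _opcode, flags, lba, blocks, _group, _control = struct.unpack(">BBQIBB", cdb)
    return {
        "lba": lba,                  # first logical block of the write
        "blocks": blocks,            # number of logical blocks to write
        "fua": bool(flags & 0x08),   # Force Unit Access bit (bit 3 of byte 1)
    }


# CDB bytes copied from the kernel log quoted above.
info = decode_write16("8a 08 00 00 00 00 00 00 10 80 00 00 00 08 00 00")
print(info)
```

The decoded LBA is 0x1080 = 4224, which matches the sector in the "critical
target error" line, and the transfer length is 8 blocks. Assuming 512-byte
logical blocks, that is a 4096-byte write, which would be consistent with a
4 KiB super block write, although the log alone does not prove that.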
>>>>> 
>>>>> This error is completely from the lower layer (the block device).
>>>>> 
>>>>> Neither Btrfs nor the LUKS layer on top of the disk can do anything
>>>>> about it.
>>>> 
>>>> Thank you for the response.
>>>> 
>>>> This disk supports scterc
>>>> 
>>>> $ sudo smartctl -l scterc /dev/sdd
>>>> SCT Error Recovery Control:
>>>>             Read:     70 (7.0 seconds)
>>>>            Write:     70 (7.0 seconds)
>>>> 
>>>> Doesn't that mean that the disk gives up after 7 seconds, and then the
>>>> sector is mapped to a spare?
>>>> So if Btrfs writes to the sector again, it will be written to the
>>>> spare?
>>>> 
>>>> I've experienced numerous sector errors throughout the years with mdadm
>>>> and they have been fixed with a check.
>>>> Also a few with Btrfs I think, but they have been fixed automatically.
>>> 
>>> Whatever the feature is, it's the block device driver's behavior.
>>> 
>>> Btrfs only errors out because the disk reported the write failed.
>>> 
>>> For the detailed reason you should check these lines:
>>> 
>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=6s
>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Sense Key : Illegal Request [current]
>>>> Apr 09 03:16:26 cb kernel: sd 4:0:1:0: [sdd] tag#5552 Add. Sense: Logical block address out of range
>> 
>> I'll check them, but this is what I usually see when a disk has a
>> sector error.
>> 
>> 
>>>> So why not this time?
>>>> To me this looks like an ordinary faulty sector that can be "fixed"
>>>> with a write?
>>>>
>>>> 
>>> I'm not sure what the "SCT Error Recovery Control" feature is, but
>>> if it is designed to re-map a write, it should not return -EIO for the
>>> initial write failure, but OK as long as the write eventually
>>> succeeded.
>>> 
>>> It should not require any upper layer to do any extra work.
>>> 
>>> But since the write eventually failed, there is nothing the upper
>>> layer can do, unless the dm or fs layer has some extra recovery
>>> mechanism.
>> 
>> Now I'm confused. I'm running RAID1 and only one disk has/had 1 sector
>> failure.
>> Shouldn't Btrfs manage to write this data? It should exist on one of
>> the other drives because of RAID1.
>> And shouldn't a scrub fix it?
> 
> Sorry, I finally understand your concern: it's not about the initial
> write failure, but the later error messages.
> 
> It turns out to be a bug in older kernels: after one super block
> writeback error, the folio keeps its error flag and never clears it,
> so the error message is shown on every later super block write.
> 
> And since it's RAID1, btrfs keeps the fs going (which is why your fs is
> still running rather than flipping to read-only).
> 
> Scrub won't solve it because there is nothing to resolve, everything is
> fine, except the false warning messages.
> 
> 
> Upstream, it's fixed by a rework patch: commit bc00965dbff7
> ("btrfs: count super block write errors in device instead of tracking
> folio error state") fixes the bug by taking a completely different
> approach and counting the super block writeback errors per device.
> 
> Unfortunately that commit is only in v6.10, and since it's not
> explicitly marked as a bug fix (even though it does fix a hidden bug),
> it's not backported to any older kernel. (BTW, the 6.8 kernel is
> already EOL.)
> 
> Please update to v6.12 or newer LTS kernels.
> 
> Or just unmount and remount the fs and pray no more super block
> writeback errors happen again...

Unfortunately Canonical is shipping 6.8 with their latest LTS release
and managing backports in-house, I guess.
But they have an option to upgrade to 6.11 through their Hardware
Enablement (HWE) program.

At least I will get that patch by upgrading to 6.11.
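
To double-check a given machine, the kernel release string can be compared
against v6.10, the first upstream release Qu mentions as containing commit
bc00965dbff7. A small Python sketch of that comparison follows. The names
FIX_VERSION, parse_release and has_upstream_fix are made up for illustration,
and the check is only a heuristic: it assumes the running kernel follows
mainline version numbering and cannot tell whether a distribution has
backported (or dropped) the patch.

```python
import re

# First upstream release containing commit bc00965dbff7, per Qu's reply.
FIX_VERSION = (6, 10)


def parse_release(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '6.8.0-57-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2)))


def has_upstream_fix(release: str) -> bool:
    """Heuristic only: True if the version number is at least v6.10."""
    return parse_release(release) >= FIX_VERSION


# The kernel from the original report predates the rework.
print(has_upstream_fix("6.8.0-57-generic"))
```

On a live system the release string could be taken from platform.release()
in the Python standard library, which reports the same value as "uname -r".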

Thank you for your help Qu.

-- 
Kai Stian Olstad
