Re: Filesystem corruption when adding a new RAID device (delayed-resync, write-mostly)

Linux RAID subsystem development

From: Paul E Luse <paul.e.luse@linux.intel.com>
To: "Mateusz Jończyk" <mat.jonczyk@o2.pl>
Cc: Yu Kuai <yukuai3@huawei.com>,
	linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	Song Liu <song@kernel.org>,
	regressions@lists.linux.dev
Subject: Re: Filesystem corruption when adding a new RAID device (delayed-resync, write-mostly)
Date: Wed, 24 Jul 2024 14:19:06 -0700
Message-ID: <20240724141906.10b4fc4e@peluse-desk5>
In-Reply-To: <f28f9eec-d318-46e2-b2a1-430c9302ba43@o2.pl>

On Wed, 24 Jul 2024 22:35:49 +0200
Mateusz Jończyk <mat.jonczyk@o2.pl> wrote:

> On 22.07.2024 at 07:39, Mateusz Jończyk wrote:
> > On 20.07.2024 at 16:47, Mateusz Jończyk wrote:
> >> Hello,
> >>
> >> On my laptop, I used to have two RAID1 arrays on top of NVMe and
> >> SATA SSD drives: /dev/md0 for /boot (not partitioned), /dev/md1
> >> for remaining data (LUKS + LVM + ext4). For performance, I have
> >> marked the RAID component device for /dev/md1 on the SATA SSD
> >> drive write-mostly, which "means that the 'md' driver will avoid
> >> reading from these devices if at all possible" (man mdadm).
> >>
> >> Recently, the NVMe drive started having problems (PCI AER errors
> >> and the controller disappearing), so I removed it from the arrays
> >> and wiped it. However, I have reseated the drive in the M.2 socket
> >> and this apparently fixed it (verified with tests).
> >>
> >>     $ cat /proc/mdstat
> >>     Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
> >>     md1 : active raid1 sdb5[1](W)
> >>           471727104 blocks super 1.2 [2/1] [_U]
> >>           bitmap: 4/4 pages [16KB], 65536KB chunk
> >>
> >>     md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
> >>           3142656 blocks super 1.2 [2/2] [UU]
> >>           bitmap: 0/1 pages [0KB], 65536KB chunk
> >>
> >>     md0 : active raid1 sdb4[3]
> >>           2094080 blocks super 1.2 [2/1] [_U]
> >>          
> >>     unused devices: <none>
> >>
> >> (md2 was used just for testing, ignore it).
> >>
> >> Today, I have tried to add the drive back to the arrays by using a
> >> script that executed in quick succession:
> >>
> >>     mdadm /dev/md0 --add --readwrite /dev/nvme0n1p2
> >>     mdadm /dev/md1 --add --readwrite /dev/nvme0n1p3
> >>
> >> This was on Linux 6.10.0, patched with my previous patch:
> >>
> >>     https://lore.kernel.org/linux-raid/20240711202316.10775-1-mat.jonczyk@o2.pl/
> >>
> >> (which fixed a regression in the kernel and allows it to start
> >> /dev/md1 with a single drive in write-mostly mode).
> >> In the background, I was running "rdiff-backup --compare" that was
> >> comparing data between my array contents and a backup attached via
> >> USB.
> >>
> >> This, however, resulted in mayhem: I was unable to start any
> >> program (input/output errors, etc.). I used SysRq + C to save
> >> a kernel log:
> >>
> > Hello,
> >
> > It is possible that my second SSD has some problems and high read
> > activity during RAID resync triggered it. Reads from that drive are
> > now very slow (between 10 - 30 MB/s) and this suggests that
> > something is not OK.
> 
> Hello,
> 
> Unfortunately, hardware failure does not seem to be the cause.
> 
> I did test it again on 6.10, twice, and in both cases I got
> filesystem corruption (but not as severe).
> 
> On Linux 6.1.96 it seems to be working well (also did two tries).
> 
> Please note: in my tests, I was using a RAID component device with
> a write-mostly bit set. This setup does not work on 6.9+ out of the
> box and requires the following patch:
> 
> commit 36a5c03f23271 ("md/raid1: set max_sectors during early return
> from choose_slow_rdev()")
> 
> that is in master now.
> 
> It is also heading into stable, which I'm going to interrupt.

Hi Mateusz,

I'm pretty interested in what is happening here, especially as it
relates to write-mostly.  A couple of questions for you:

1) Are you able to find a simpler reproduction for this, for example
without mixing SATA and NVMe?  Maybe just use two known-good NVMe
SSDs and follow your steps to repro?

2) I don't fully understand your last two statements, maybe you can
clarify?  With your max_sectors patch does it pass or fail?  If pass,
what do you mean by "I'm going to interrupt"?  It sounds like you mean
the patch doesn't work and you are trying to stop it?

thanks
Paul

> 
> Greetings,
> Mateusz
> 
>
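
For readers following the quoted /proc/mdstat output, here is a minimal
sketch of how the array state could be checked from a script before and
after re-adding a device. It assumes the mdstat layout shown in the
quoted output. The function names degraded_arrays and
write_mostly_members are hypothetical helpers invented for this
illustration; they are not part of mdadm or the kernel.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm): both read text in
# /proc/mdstat format on stdin.

# Print the name of every array whose member map (e.g. [_U]) contains
# an underscore, which marks a missing or not yet in-sync member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1; next }      # start of an array entry
        name != "" && /\[[U_]*_[U_]*\]/ {      # member map with a gap
            print name
            name = ""
        }
    '
}

# Print "<array> <device>" for every member flagged (W), the marker
# mdstat uses for write-mostly devices.
write_mostly_members() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(W\)/) {
                    dev = $i
                    sub(/\[.*/, "", dev)       # drop the "[n](W)" suffix
                    print $1, dev
                }
            }
        }
    '
}
```

Possible usage: "degraded_arrays < /proc/mdstat". Applied to the output
quoted above, this should list md1 and md0 as degraded, and sdb5 (md1)
and sdb6 (md2) as write-mostly members.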

