Date: Wed, 24 Jul 2024 14:19:06 -0700
From: Paul E Luse
To: Mateusz Jończyk
Cc: Yu Kuai, linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org, Song Liu, regressions@lists.linux.dev
Subject: Re: Filesystem corruption when adding a new RAID device (delayed-resync, write-mostly)
Message-ID: <20240724141906.10b4fc4e@peluse-desk5>
References: <9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl>
X-Mailing-List: linux-raid@vger.kernel.org

On Wed, 24 Jul 2024 22:35:49 +0200
Mateusz Jończyk wrote:

> On 22.07.2024 at 07:39, Mateusz Jończyk wrote:
> > On 20.07.2024 at 16:47, Mateusz Jończyk wrote:
> >> Hello,
> >>
> >> In my laptop, I used to have two RAID1 arrays on top of NVMe and
> >> SATA SSD drives: /dev/md0 for /boot (not partitioned), /dev/md1
> >> for remaining data (LUKS + LVM + ext4). For performance, I have
> >> marked the RAID component device for /dev/md1 on the SATA SSD
> >> drive write-mostly, which "means that the 'md' driver will avoid
> >> reading from these devices if at all possible" (man mdadm).
> >>
> >> Recently, the NVMe drive started having problems (PCI AER errors
> >> and the controller disappearing), so I removed it from the arrays
> >> and wiped it. However, I have reseated the drive in the M.2 socket
> >> and this apparently fixed it (verified with tests).
> >>
> >>     $ cat /proc/mdstat
> >>     Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
> >>     md1 : active raid1 sdb5[1](W)
> >>           471727104 blocks super 1.2 [2/1] [_U]
> >>           bitmap: 4/4 pages [16KB], 65536KB chunk
> >>
> >>     md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
> >>           3142656 blocks super 1.2 [2/2] [UU]
> >>           bitmap: 0/1 pages [0KB], 65536KB chunk
> >>
> >>     md0 : active raid1 sdb4[3]
> >>           2094080 blocks super 1.2 [2/1] [_U]
> >>
> >>     unused devices:
> >>
> >> (md2 was used just for testing, ignore it).
> >>
> >> Today, I have tried to add the drive back to the arrays by using a
> >> script that executed in quick succession:
> >>
> >>     mdadm /dev/md0 --add --readwrite /dev/nvme0n1p2
> >>     mdadm /dev/md1 --add --readwrite /dev/nvme0n1p3
> >>
> >> This was on Linux 6.10.0, patched with my previous patch:
> >>
> >>     https://lore.kernel.org/linux-raid/20240711202316.10775-1-mat.jonczyk@o2.pl/
> >>
> >> (which fixed a regression in the kernel and allows it to start
> >> /dev/md1 with a single drive in write-mostly mode).
> >> In the background, I was running "rdiff-backup --compare" that was
> >> comparing data between my array contents and a backup attached via
> >> USB.
> >>
> >> This, however resulted in mayhem - I was unable to start any
> >> program with an input-output error, etc. I used SysRQ + C to save
> >> a kernel log:
> >>
> > Hello,
> >
> > It is possible that my second SSD has some problems and high read
> > activity during RAID resync triggered it. Reads from that drive are
> > now very slow (between 10 - 30 MB/s) and this suggests that
> > something is not OK.
>
> Hello,
>
> Unfortunately, hardware failure seems not to be the case.
>
> I did test it again on 6.10, twice, and in both cases I got
> filesystem corruption (but not as severe).
>
> On Linux 6.1.96 it seems to be working well (also did two tries).
>
> Please note: in my tests, I was using a RAID component device with
> a write-mostly bit set. This setup does not work on 6.9+ out of the
> box and requires the following patch:
>
> commit 36a5c03f23271 ("md/raid1: set max_sectors during early return
> from choose_slow_rdev()")
>
> that is in master now.
>
> It is also heading into stable, which I'm going to interrupt.

Hi Mateusz,

I'm pretty interested in what is happening here especially as it
relates to write-mostly.
Couple of questions for you:

1) Are you able to find a simpler reproduction for this, for example
without mixing SATA and NVMe? Maybe just use two known-good NVMe SSDs
and follow your steps to reproduce?

2) I don't fully understand your last two statements; maybe you can
clarify? With your max_sectors patch, does it pass or fail? If it
passes, what do you mean by "I'm going to interrupt"? It sounds like
you mean the patch doesn't work and you are trying to stop it?

thanks
Paul

>
> Greetings,
> Mateusz
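
When scripting a simpler reproduction along the lines of question 1, it may help to record the array state before and after each `mdadm --add`. The following is only a minimal sketch and not something posted in this thread: `parse_mdstat` and `SAMPLE` are hypothetical names, the sample text is copied from the /proc/mdstat output quoted above, and the parser assumes the layout shown there (an `mdN : ...` line followed by a `[total/active] [U_]` status line). It reports which arrays are degraded and which members carry the (W) write-mostly flag.

```python
import re

# Sample copied from the /proc/mdstat output quoted in this thread.
SAMPLE = """\
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb5[1](W)
      471727104 blocks super 1.2 [2/1] [_U]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
      3142656 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid1 sdb4[3]
      2094080 blocks super 1.2 [2/1] [_U]
"""

# "md1 : active raid1 sdb5[1](W)", optionally with "(auto-read-only)" after the state.
ARRAY_RE = re.compile(r"^(md\d+) : (\S+)(?: \(([^)]*)\))? (raid\d+|linear|multipath) (.*)$")
# "sdb5[1](W)": device name, bracketed number as printed, optional flags.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")
# "[2/1] [_U]": configured devices / active devices, then the per-slot map.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text):
    """Return {array_name: {"level", "members", "degraded"}} (hypothetical helper)."""
    arrays = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            name, _state, _note, level, rest = m.groups()
            members = [
                {"dev": dev, "index": int(idx), "write_mostly": "(W)" in flags}
                for dev, idx, flags in MEMBER_RE.findall(rest)
            ]
            current = {"level": level, "members": members, "degraded": None}
            arrays[name] = current
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            total, active, _slot_map = s.groups()
            # Fewer active devices than configured means the array is degraded.
            current["degraded"] = int(active) < int(total)
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        wm = [m["dev"] for m in info["members"] if m["write_mostly"]]
        state = "degraded" if info["degraded"] else "complete"
        print(name, info["level"], state, "write-mostly:", wm)
```

On a live system the same function could be fed the contents of /proc/mdstat instead of `SAMPLE`. Lines it does not recognise are skipped, so resync progress lines are ignored rather than parsed.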