From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files
To: Christian Theune, Yu Kuai
CC: John Stoffel, "linux-raid@vger.kernel.org", Dragan Milivojević
References: <2d85e9ab-1d0f-70a1-fab2-1e469764ef28@huaweicloud.com>
 <3CF4B28B-52D7-414E-96A1-FDFA5A5EF172@flyingcircus.io>
 <3DB33849-56C5-4C5C-BF56-F54205BEFCF2@flyingcircus.io>
 <1f2c74f4-8ba9-1a9f-0c11-018a25e785e5@huaweicloud.com>
 <22A202B1-A802-406F-8F38-F4F486A92F81@flyingcircus.io>
 <45d44ed5-da7c-6480-9143-f611385b2e92@huaweicloud.com>
 <9C03DED0-3A6A-42F8-B935-6EB500F8BCE2@flyingcircus.io>
 <78517565-B1AB-4441-B4F8-EB380E98EB0F@flyingcircus.io>
 <26403.59789.480428.418012@quad.stoffel.home>
 <5fb0a6f0-066d-c490-3010-8a047aae2c29@huaweicloud.com>
From: Yu Kuai
Message-ID: <5170f0d2-cb0f-2e0f-eb5e-31aa9d6ff65d@huawei.com>
Date: Mon, 4 Nov 2024 19:40:14 +0800
X-Mailing-List: linux-raid@vger.kernel.org

Hi,

On 2024/11/01 16:33, Christian Theune wrote:
> I dug out a different one that goes back longer but even that one seems like something was missing early on when I didn’t have the serial console attached.
>
> I’m wondering whether this indicates an issue during initialization? I’m going to reboot the machine and make sure i get the early logs with those numbers.
>
> [ 405.347345] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22301786792+8) 4294967259

For this log, let's assume the first start is from here.

> [ 432.542465] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967260
> [ 432.542469] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967261
> [ 434.272964] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967262
> [ 434.273175] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967263
> [ 434.273189] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967264
> [ 434.273285] __add_stripe_bio: md127: start ff2721beec8c2fa0(22837701992+8) 4294967265
> [ 434.274063] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967264
> [ 434.274066] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967263
> [ 434.274070] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967262
> [ 434.274073] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967261
> [ 434.274078] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967260
> [ 434.274083] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(22837701992+8) 4294967259
> [ 434.276609] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967260
> [ 434.278939] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967261
> [ 464.922354] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967260
> [ 464.931833] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967259
> [ 466.964557] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967260
> [ 466.964616] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967261
> [ 474.399930] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967262
> [ 474.451451] __add_stripe_bio: md127: start ff2721beec8c2fa0(23912715112+8) 4294967263
> [ 489.447079] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967262
> [ 489.456574] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967261
> [ 489.466069] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967260
> [ 489.475565] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23912715112+8) 4294967259
> [ 491.235517] __add_stripe_bio: md127: start ff2721beec8c2fa0(24448073512+8) 4294967260
> [ 491.235602] __add_stripe_bio: md127: start ff2721beec8c2fa0(24448073512+8) 4294967261
> [ 498.153108] __add_stripe_bio: md127: start ff2721beec8c2fa0(24716445096+8) 4294967262
> [ 498.156307] __add_stripe_bio: md127: start ff2721beec8c2fa0(24716445096+8) 4294967263
> [ 530.332619] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967262
> [ 530.342110] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967261
> [ 530.351595] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967260
> [ 530.361082] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24716445096+8) 4294967259
> [ 535.176774] __add_stripe_bio: md127: start ff2721beec8c2fa0(24985208424+8) 4294967260
> [ 549.125326] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(24985208424+8) 4294967259

Up to this point everything is fine: start and end are balanced for this stripe head.
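
That bookkeeping can be sketched as a small script. This is only an illustration, assuming the debug lines keep the format quoted above; `LINE_RE` and `outstanding_per_sh` are hypothetical names and not anything in the md code. It replays the lines in order and keeps, per stripe head, the number of starts minus the number of ends.

```python
import re

# Assumed line format, copied from the quoted debug output, e.g.
# [ 434.276609] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967260
LINE_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\w+:\s+\w+:\s+(start|end)\s+([0-9a-f]+)\(\d+\+\d+\)"
)


def outstanding_per_sh(lines):
    """Return {sh: starts - ends} after replaying the lines in order."""
    outstanding = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not one of the start/end debug lines
        op, sh = m.groups()
        outstanding[sh] = outstanding.get(sh, 0) + (1 if op == "start" else -1)
    return outstanding


# Four consecutive lines taken from the log above: two starts, two ends.
sample = [
    "[ 434.276609] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967260",
    "[ 434.278939] __add_stripe_bio: md127: start ff2721beec8c2fa0(23374951848+8) 4294967261",
    "[ 464.922354] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967260",
    "[ 464.931833] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(23374951848+8) 4294967259",
]
print(outstanding_per_sh(sample))  # {'ff2721beec8c2fa0': 0}
```

A result of 0 for a stripe head means its starts and ends are balanced over the replayed window; a negative value means more ends than starts were logged. The key is the stripe head only, not the sector, because the sector changes while the same 'sh' is still in flight (see the 491.x and 498.x lines).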

> [ 549.635782] __add_stripe_bio: md127: start ff2721beec8c2fa0(25521770024+8) 4294967261
> [ 590.875593] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(25521770024+8) 4294967260
> [ 590.885081] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(25521770024+8) 4294967259
> [ 596.973863] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967263
> [ 596.973866] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967262
> [ 596.973869] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967261
> [ 596.973871] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967260
> [ 596.973881] handle_stripe_clean_event: md127: end ff2721beec8c2fa0(26057037928+8) 4294967259

Then, oops: this 'sh' starts only once here, but ends many times. It's unlikely that those ends correspond to starts logged much earlier, so I'm almost convinced that this problem is caused by unbalanced start and end, and that the huge number is the result of underflow.

Let me dig more. :)

Thanks,
Kuai
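
The underflow reading of the huge numbers can be checked with a little arithmetic. A minimal sketch, assuming the logged counter is an unsigned 32-bit value (the log itself does not state its type; `as_signed32` is a hypothetical helper used only for illustration):

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Lowest and highest counter values that appear in the quoted log.
for logged in (4294967259, 4294967265):
    print(logged, "->", as_signed32(logged))
# 4294967259 -> -37
# 4294967265 -> -31

# Decrementing an unsigned 32-bit zero 37 times wraps to the same value.
print((0 - 37) & 0xFFFFFFFF)  # 4294967259
```

Under that assumption, the counter behaves like a small negative number, i.e. roughly 37 more ends than starts had been counted before the first line of this log.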