From: Yu Kuai <yukuai1@huaweicloud.com>
To: Christian Theune <ct@flyingcircus.io>, Yu Kuai <yukuai1@huaweicloud.com>
Cc: "John Stoffel" <john@stoffel.org>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
dm-devel@lists.linux.dev,
"Dragan Milivojević" <galileo@pkm-inc.com>,
"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files
Date: Mon, 4 Nov 2024 20:18:21 +0800
Message-ID: <2b093abc-cd9a-0b84-bcba-baec689fa153@huaweicloud.com>
In-Reply-To: <5170f0d2-cb0f-2e0f-eb5e-31aa9d6ff65d@huawei.com>
Hi,
On 2024/11/04 19:40, Yu Kuai wrote:
> Hi,
>
> On 2024/11/01 16:33, Christian Theune wrote:
>> I dug out a different one that goes back longer but even that one
>> seems like something was missing early on when I didn’t have the
>> serial console attached.
>>
>> I’m wondering whether this indicates an issue during initialization?
>> I’m going to reboot the machine and make sure I get the early logs
>> with those numbers.
>>
>> [ 405.347345] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22301786792+8) 4294967259
>
> For this log, let's assume the first start is from here.
>> [ 432.542465] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(22837701992+8) 4294967260
>> [ 432.542469] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(22837701992+8) 4294967261
>> [ 434.272964] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(22837701992+8) 4294967262
>> [ 434.273175] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(22837701992+8) 4294967263
>> [ 434.273189] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(22837701992+8) 4294967264
>> [ 434.273285] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(22837701992+8) 4294967265
>> [ 434.274063] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22837701992+8) 4294967264
>> [ 434.274066] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22837701992+8) 4294967263
>> [ 434.274070] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22837701992+8) 4294967262
>> [ 434.274073] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22837701992+8) 4294967261
>> [ 434.274078] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22837701992+8) 4294967260
>> [ 434.274083] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(22837701992+8) 4294967259
>> [ 434.276609] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(23374951848+8) 4294967260
>> [ 434.278939] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(23374951848+8) 4294967261
>> [ 464.922354] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(23374951848+8) 4294967260
>> [ 464.931833] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(23374951848+8) 4294967259
>> [ 466.964557] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(23912715112+8) 4294967260
>> [ 466.964616] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(23912715112+8) 4294967261
>> [ 474.399930] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(23912715112+8) 4294967262
>> [ 474.451451] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(23912715112+8) 4294967263
>> [ 489.447079] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(23912715112+8) 4294967262
>> [ 489.456574] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(23912715112+8) 4294967261
>> [ 489.466069] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(23912715112+8) 4294967260
>> [ 489.475565] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(23912715112+8) 4294967259
>> [ 491.235517] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(24448073512+8) 4294967260
>> [ 491.235602] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(24448073512+8) 4294967261
>> [ 498.153108] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(24716445096+8) 4294967262
>> [ 498.156307] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(24716445096+8) 4294967263
>> [ 530.332619] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(24716445096+8) 4294967262
>> [ 530.342110] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(24716445096+8) 4294967261
>> [ 530.351595] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(24716445096+8) 4294967260
>> [ 530.361082] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(24716445096+8) 4294967259
>> [ 535.176774] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(24985208424+8) 4294967260
>> [ 549.125326] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(24985208424+8) 4294967259
>
> Up to this point everything is fine: start and end are balanced for this
> stripe head.
>> [ 549.635782] __add_stripe_bio: md127: start
>> ff2721beec8c2fa0(25521770024+8) 4294967261
>> [ 590.875593] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(25521770024+8) 4294967260
>> [ 590.885081] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(25521770024+8) 4294967259
>> [ 596.973863] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(26057037928+8) 4294967263
>> [ 596.973866] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(26057037928+8) 4294967262
>> [ 596.973869] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(26057037928+8) 4294967261
>> [ 596.973871] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(26057037928+8) 4294967260
>> [ 596.973881] handle_stripe_clean_event: md127: end
>> ff2721beec8c2fa0(26057037928+8) 4294967259
>
> Then, oops: this 'sh' starts just once here but ends many times. It's
> unlikely that those ends correspond to starts logged much earlier, so
> I'm almost convinced that this problem is due to unbalanced start and
> end, and that the huge number is due to underflow.
>
> Let me dig more. :)
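
To illustrate the underflow described above, here is a minimal sketch. It assumes the value printed at the end of each log line is an unsigned 32-bit counter that goes up on "start" and down on "end"; the struct and function names are hypothetical and do not come from the debug patch or from raid5.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for the counter printed by the debug patch. */
struct write_counter {
	uint32_t pending;
};

/* One "start" event: a write was accounted for. */
static uint32_t counter_start(struct write_counter *c)
{
	return ++c->pending;
}

/* One "end" event: unsigned arithmetic wraps modulo 2^32 below zero. */
static uint32_t counter_end(struct write_counter *c)
{
	return --c->pending;
}

/* Replay `starts` start events followed by `ends` end events. */
static uint32_t replay(uint32_t starts, uint32_t ends)
{
	struct write_counter c = { 0 };
	uint32_t i;

	for (i = 0; i < starts; i++)
		counter_start(&c);
	for (i = 0; i < ends; i++)
		counter_end(&c);
	return c.pending;
}
```

Under that assumption, replay(1, 38) returns 4294967259, i.e. 2^32 - 37, which is the baseline value seen in the log, while a balanced sequence such as replay(6, 6) returns 0.
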
I think I found a problem through code review. Can you test the following
patch? (Note that it is still based on the latest mainline.)

Thanks,
Kuai

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dc2ea636d173..04f32173839a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4042,6 +4042,8 @@ static void handle_stripe_clean_event(struct r5conf *conf,
 			     test_bit(R5_SkipCopy, &dev->flags))) {
 				/* We can return any write requests */
 				struct bio *wbi, *wbi2;
+				bool written = false;
+
 				pr_debug("Return write for disc %d\n", i);
 				if (test_and_clear_bit(R5_Discard, &dev->flags))
 					clear_bit(R5_UPTODATE, &dev->flags);
@@ -4054,6 +4056,9 @@ static void handle_stripe_clean_event(struct r5conf *conf,
 				dev->page = dev->orig_page;
 				wbi = dev->written;
 				dev->written = NULL;
+				if (wbi)
+					written = true;
+
 				while (wbi && wbi->bi_iter.bi_sector <
 					dev->sector + RAID5_STRIPE_SECTORS(conf)) {
 					wbi2 = r5_next_bio(conf, wbi, dev->sector);
@@ -4061,10 +4066,13 @@ static void handle_stripe_clean_event(struct r5conf *conf,
 					bio_endio(wbi);
 					wbi = wbi2;
 				}
-				conf->mddev->bitmap_ops->endwrite(conf->mddev,
-					sh->sector, RAID5_STRIPE_SECTORS(conf),
-					!test_bit(STRIPE_DEGRADED, &sh->state),
-					false);
+
+				if (written)
+					conf->mddev->bitmap_ops->endwrite(conf->mddev,
+						sh->sector, RAID5_STRIPE_SECTORS(conf),
+						!test_bit(STRIPE_DEGRADED, &sh->state),
+						false);
+
 				if (head_sh->batch_head) {
 					sh = list_first_entry(&sh->batch_list,
 							      struct stripe_head,
>
> Thanks,
> Kuai
>
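
The idea behind the patch, calling the end-of-write accounting only when a write was actually pending, can be sketched in a small userspace model. This is a simplified illustration under stated assumptions, not the kernel code path: every name below (model_bio, model_dev, model_bitmap and the functions) is hypothetical, and the scenario of the completion step running twice on the same device is chosen only to show the effect of the guard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_bio {
	struct model_bio *next;
};

struct model_dev {
	struct model_bio *written;	/* completed writes, or NULL */
};

struct model_bitmap {
	uint32_t pending;		/* writes started but not yet ended */
};

static void model_startwrite(struct model_bitmap *b)
{
	b->pending++;
}

static void model_endwrite(struct model_bitmap *b)
{
	b->pending--;			/* wraps if called without a start */
}

/* Completion step; with guarded == false it always calls the end hook. */
static void model_clean_event(struct model_dev *dev, struct model_bitmap *b,
			      bool guarded)
{
	struct model_bio *wbi = dev->written;
	bool written = wbi != NULL;

	dev->written = NULL;
	while (wbi)			/* stand-in for completing each bio */
		wbi = wbi->next;

	if (!guarded || written)
		model_endwrite(b);
}

/* One start, then the completion step runs twice on the same device. */
static uint32_t model_run_twice(bool guarded)
{
	struct model_bio bio = { NULL };
	struct model_dev dev = { &bio };
	struct model_bitmap b = { 0 };

	model_startwrite(&b);
	model_clean_event(&dev, &b, guarded);
	model_clean_event(&dev, &b, guarded);
	return b.pending;
}
```

In this model, model_run_twice(false) returns 4294967295 because the second call decrements a counter that is already zero, while model_run_twice(true) returns 0 because the second call finds no pending write and skips the end hook.
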