From: Christian Theune <ct@flyingcircus.io>
To: Yu Kuai <yukuai1@huaweicloud.com>
Cc: "John Stoffel" <john@stoffel.org>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
dm-devel@lists.linux.dev,
"Dragan Milivojević" <galileo@pkm-inc.com>,
"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files
Date: Thu, 31 Oct 2024 20:46:42 +0100 [thread overview]
Message-ID: <78517565-B1AB-4441-B4F8-EB380E98EB0F@flyingcircus.io> (raw)
In-Reply-To: <DD99A1F0-56E7-473D-B917-66885810233E@flyingcircus.io>
Hi,
the system has been running under stress for a while on 6.11.5 with the debugging. I have two observations so far:
1. The bitmap_counts are sometimes low and sometimes very high and intermingled like this:
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bf1db20000(29009381448+8) 7
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bf9d6fbf80(29009382168+8) 5
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721beec896f20(29009381928+8) 4294967242
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 3
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bfb083df40(29009381456+8) 7
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bfc92a2fa0(29009381936+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 2
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721c074f8df40(29009381464+8) 7
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721bfa3b2df40(29009381944+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 1
Oct 31 20:41:27 barbrady09 kernel: __add_stripe_bio: md127: start ff2721beec219fc0(29009381472+8) 4294967268
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c108f26f20(29009374480+8) 0
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967247
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967246
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967245
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967244
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967243
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967242
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721beec030000(29009374488+8) 4294967241
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 6
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 4
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 3
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 2
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 1
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721bf21496f20(29009374496+8) 0
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 6
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 5
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 4
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 3
Oct 31 20:41:27 barbrady09 kernel: handle_stripe_clean_event: md127: end ff2721c1aa216f20(29009374504+8) 2
Is the high number an indicator of something weird?
2. The printk seems expensive - it might be because there’s a serial console attached, I’m trying to detach it, but it seems things are stubborn in our config.
This results in constant high CPU of the md127_raid6 (full single core at 100% accounted as sys time of course) as well as intermittent RCU stalls on printk.
I’m wondering whether this will limit the ability to reproduce this …
I’ll keep the system running for a bit more and if it doesn’t lock up I’ll try harder to get rid of the serial console.
Christian
--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
next prev parent reply other threads:[~2024-10-31 19:47 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-06 14:10 PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files Christian Theune
2024-08-06 14:10 ` Christian Theune
2024-08-07 2:55 ` Yu Kuai
2024-08-07 5:31 ` Christian Theune
2024-08-07 6:46 ` Christian Theune
2024-08-07 8:59 ` Christian Theune
2024-08-07 21:05 ` John Stoffel
2024-08-08 1:33 ` Yu Kuai
2024-08-08 6:02 ` Christian Theune
2024-08-08 6:55 ` Yu Kuai
2024-08-08 7:06 ` Christian Theune
2024-08-08 8:53 ` Christian Theune
2024-08-09 1:13 ` Yu Kuai
2024-08-09 6:10 ` Christian Theune
2024-08-09 22:51 ` John Stoffel
2024-08-12 6:58 ` Christian Theune
2024-08-12 18:37 ` John Stoffel
2024-08-14 8:53 ` Christian Theune
2024-08-15 6:19 ` Christian Theune
2024-08-15 10:03 ` Christian Theune
2024-08-15 11:14 ` Yu Kuai
2024-08-15 11:24 ` Christian Theune
2024-08-15 11:49 ` Yu Kuai
2024-10-22 15:02 ` Christian Theune
2024-10-23 1:13 ` Yu Kuai
2024-10-23 6:03 ` Christian Theune
2024-10-23 17:50 ` Christian Theune
2024-10-25 8:39 ` Christian Theune
2024-10-25 13:31 ` Dragan Milivojević
2024-10-25 14:02 ` Christian Theune
2024-10-26 5:37 ` Christian Theune
2024-10-26 9:07 ` Yu Kuai
2024-10-26 11:51 ` Christian Theune
2024-10-26 12:07 ` Christian Theune
2024-10-26 12:11 ` Christian Theune
2024-10-30 1:25 ` Yu Kuai
2024-10-30 6:29 ` Christian Theune
2024-10-31 7:48 ` Yu Kuai
2024-10-31 8:04 ` Christian Theune
2024-10-31 15:07 ` Christian Theune
2024-10-31 19:46 ` Christian Theune [this message]
2024-10-31 20:33 ` John Stoffel
2024-11-01 2:02 ` Yu Kuai
2024-11-01 7:56 ` Christian Theune
2024-11-01 8:33 ` Christian Theune
2024-11-03 15:54 ` Christian Theune
2024-11-03 16:16 ` Dragan Milivojević
2024-11-04 11:29 ` Yu Kuai
2024-11-04 11:51 ` Christian Theune
2024-11-04 12:30 ` Yu Kuai
2024-11-04 11:40 ` Yu Kuai
2024-11-04 12:18 ` Yu Kuai
2024-11-04 14:45 ` Christian Theune
2024-11-04 20:04 ` Christian Theune
2024-11-05 1:20 ` Yu Kuai
2024-11-05 6:23 ` Christian Theune
2024-11-05 10:15 ` Christian Theune
2024-11-06 6:35 ` Yu Kuai
2024-11-06 6:40 ` Christian Theune
2024-11-07 7:55 ` Yu Kuai
2024-11-07 8:01 ` Yu Kuai
2024-11-09 11:35 ` Xiao Ni
2024-11-11 2:25 ` Yu Kuai
2024-11-11 8:00 ` Christian Theune
2024-11-11 14:34 ` Christian Theune
2024-11-12 6:57 ` Christian Theune
2024-11-14 15:07 ` Christian Theune
2024-11-15 8:07 ` Xiao Ni
2024-11-15 8:44 ` Christian Theune
2024-11-15 10:11 ` Xiao Ni
2024-11-15 11:06 ` Christian Theune
2024-12-10 8:33 ` Christian Theune
2024-12-16 13:25 ` Christian Theune
2024-12-16 13:36 ` Yu Kuai
2024-12-16 14:18 ` Christian Theune
2025-01-20 9:19 ` Christian Theune
2025-01-24 6:22 ` Christian Theune
2025-01-24 6:35 ` Yu Kuai
2025-01-24 6:38 ` Christian Theune
2024-08-15 15:53 ` John Stoffel
2024-08-15 19:13 ` Christian Theune
2024-08-26 14:38 ` Christian Theune
2024-08-08 14:23 ` John Stoffel
2024-08-19 19:12 ` tihmstar
2024-08-19 21:05 ` John Stoffel
2024-08-24 16:56 ` tihmstar
2024-08-24 18:12 ` Dragan Milivojević
2024-08-27 1:27 ` John Stoffel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=78517565-B1AB-4441-B4F8-EB380E98EB0F@flyingcircus.io \
--to=ct@flyingcircus.io \
--cc=dm-devel@lists.linux.dev \
--cc=galileo@pkm-inc.com \
--cc=john@stoffel.org \
--cc=linux-raid@vger.kernel.org \
--cc=yukuai1@huaweicloud.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox