From: Yu Kuai <yukuai1@huaweicloud.com>
To: Christian Theune <ct@flyingcircus.io>, Yu Kuai <yukuai1@huaweicloud.com>
Cc: John Stoffel <john@stoffel.org>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
dm-devel@lists.linux.dev, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files
Date: Wed, 23 Oct 2024 09:13:03 +0800
Message-ID: <1bbc86a8-1abf-11a1-e724-b6868a8d9f88@huaweicloud.com>
In-Reply-To: <143E09BF-BD10-43EB-B0F1-7421F8200DB1@flyingcircus.io>
Hi,
On 2024/10/22 23:02, Christian Theune wrote:
> Hi,
>
> I had to put this issue aside and as Yu indicated he was busy I didn’t follow up yet.
>
> @Yu: I don’t have new insights, but I have a basically identical machine to which I will soon start adding new data with a similar structure.
>
> I couldn’t directly reproduce the issue there - likely because the network is a bit slower, as it’s connected from a remote site and has only 1G instead of 10G due to the long distances.
>
> Let me know if you’re interested in following up here and I’ll try to make room on my side to get you more input as needed.
Yes, sorry that I was totally busy with other things. :(
BTW, what is the result after bypassing the bitmap (disabling the bitmap by
kernel hacking)?
Thanks,
Kuai
>
> Christian
>
>> On 15. Aug 2024, at 13:14, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2024/08/15 18:03, Christian Theune wrote:
>>> Hi,
>>> small insight: even given my dataset that can reliably trigger this (after around 1.5 hours of rsyncing) it does not trigger on a specific set of files. I’ve deleted the data and started the rsync on a fresh directory (not a fresh filesystem, I can’t delete that as it carries important data) but it doesn’t always get stuck on the same files, even though rsync processes them in a repeatable order.
>>> I’m wondering how to generate more insights from that. Maybe keeping a blktrace log might help?
>>> It sounds like the specific pattern relies on XFS doing a specific thing there …
>>> Wild idea: maybe running the xfstest suite on an in-memory raid 6 setup could reproduce this?
>>> I’m guessing that the xfs people do not regularly run their test suite on a layered setup like mine with encryption and software raid?
>>
>> That sounds great.
>>> Christian
>>>> On 15. Aug 2024, at 08:19, Christian Theune <ct@flyingcircus.io> wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 14. Aug 2024, at 10:53, Christian Theune <ct@flyingcircus.io> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> On 12. Aug 2024, at 20:37, John Stoffel <john@stoffel.org> wrote:
>>>>>>
>>>>>> I'd probably just do the RAID6 tests first, get them out of the way.
>>>>>
>>>>> Alright, those are running right now - I’ll let you know what happens.
>>>>
>>>> I’m not making progress here. I can’t reproduce those on in-memory loopback raid 6. However, I can’t fully reproduce the rsync run. For me this only triggered after around 1.5 hours of progress on the NVMe, which resulted in the hangup. I can only create around 20 GiB worth of raid 6 volume on this machine. I’ve tried running rsync until it exhausts the space, deleting the content and running rsync again, but I feel like this isn’t sufficient to trigger the issue. :(
>>>>
>>>> I’m trying to find whether any specific pattern in the files around the time it locks up might be relevant here and try to run the rsync over that
>>>> portion.
>>>>
>>>> On the plus side, I have a script now that can create the various loopback settings quickly, so I can try out things as needed. Not that valuable without a reproducer, yet, though.
>>>>
>>>> @Yu: you mentioned that you might be able to provide me a kernel that produces more error logging to diagnose this? Any chance we could try that route?
>>
>> Yes, however, I still need some time to sort out the internal process of
>> raid5. I'm quite busy with some other work stuff, and I'm familiar with
>> raid1/10, but not so much with raid5. :(
>>
>> The main idea is to figure out why IO is not dispatched to the underlying
>> disks.
>>
>> Thanks,
>> Kuai
>>
>>>>
>>>> Christian
>>>>
>>>> --
>>>> Christian Theune · ct@flyingcircus.io · +49 345 219401 0
>>>> Flying Circus Internet Operations GmbH · https://flyingcircus.io
>>>> Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
>>>> HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
>>> Kind regards,
>>> Christian Theune
>
>
> Kind regards,
> Christian Theune
>