From: Christian Theune <ct@flyingcircus.io>
To: Xiao Ni <xni@redhat.com>
Cc: "Yu Kuai" <yukuai1@huaweicloud.com>,
"John Stoffel" <john@stoffel.org>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
dm-devel@lists.linux.dev,
"Dragan Milivojević" <galileo@pkm-inc.com>,
"yangerkun@huawei.com" <yangerkun@huawei.com>,
"yukuai (C)" <yukuai3@huawei.com>,
"David Jeffery" <djeffery@redhat.com>
Subject: Re: PROBLEM: repeatable lockup on RAID-6 with LUKS dm-crypt on NVMe devices when rsyncing many files
Date: Thu, 14 Nov 2024 16:07:17 +0100
Message-ID: <4DA6F1FE-D465-40C7-A116-F49CF6A2CFF0@flyingcircus.io>
In-Reply-To: <0B1D29D1-523C-4E42-95F9-62B32B741930@flyingcircus.io>
Hi,
just a follow-up: the system ran for over two days without my workload being able to trigger the issue. I’ve seen there is another thread where this patch wasn’t sufficient, and if I understand correctly, Yu and Xiao are working on an amalgamated fix?
Christian
> On 12. Nov 2024, at 07:57, Christian Theune <ct@flyingcircus.io> wrote:
>
> Hi,
>
> my workload has now been running successfully for 22 hours - it seems that the patch works.
>
> If this gets accepted then I’d kindly ask for an LTS backport to 6.6.
>
> Thanks to everyone for helping figure this out and fix it!
>
> Christian
>
>> On 11. Nov 2024, at 15:34, Christian Theune <ct@flyingcircus.io> wrote:
>>
>> I’ve been running with my workflow that is known to trigger the issue reliably for about 6 hours now. This is longer than it lasted before.
>> I’m leaving the office for today and will leave things running over night and report back tomorrow.
>>
>> Christian
>>
>>> On 11. Nov 2024, at 09:00, Christian Theune <ct@flyingcircus.io> wrote:
>>>
>>> Hi,
>>>
>>> I’m trying this with 6.11.7 today.
>>>
>>> Christian
>>>
>>>> On 9. Nov 2024, at 12:35, Xiao Ni <xni@redhat.com> wrote:
>>>>
>>>> On Thu, Nov 7, 2024 at 3:55 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> On 2024/11/06 14:40, Christian Theune wrote:
>>>>>> Hi,
>>>>>>
>>>>>>> On 6. Nov 2024, at 07:35, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2024/11/05 18:15, Christian Theune wrote:
>>>>>>>> Hi,
>>>>>>>> after about 2 hours it stalled again. Here’s the full blocked process dump. (Tell me if this isn’t helpful, otherwise I’ll keep posting that as it’s the only real data I can show)
>>>>>>>
>>>>>>> This is bad news :(
>>>>>>
>>>>>> Yeah. But: the good news is that we aren’t eating any data so far … ;)
>>>>>>
>>>>>>> While reviewing the related code, I came up with a plan to move the
>>>>>>> bitmap start/end write ops to the upper layer. If each write IO from
>>>>>>> the upper layer starts once and ends once, it is easy to make sure
>>>>>>> the calls are balanced, and avoiding many redundant calls should
>>>>>>> improve performance as well.
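>>>>>>>
>>>>>>> Roughly, as a simplified sketch of the intended pairing (not the
>>>>>>> final code; the function names below are the md entry points the
>>>>>>> patch touches):
>>>>>>>
>>>>>>> /* submission path: runs exactly once per IO from above */
>>>>>>> md_account_bio()
>>>>>>>   -> md_bitmap_startwrite(mddev->bitmap, range of the bio);
>>>>>>>
>>>>>>> /* completion path: the clone's endio runs exactly once per IO */
>>>>>>> md_end_clone_io()
>>>>>>>   -> md_bitmap_endwrite(mddev->bitmap, the same range);
>>>>>>>
>>>>>>> Each startwrite is then paired with exactly one endwrite, instead
>>>>>>> of the per-stripe calls inside raid5.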
>>>>>>
>>>>>> Sounds like a plan!
>>>>>>
>>>>>>> However, I need a few days to cook a patch after work.
>>>>>>
>>>>>> Sure thing! I’ll switch off bitmaps for that time - I’m happy we found a workaround so we can take time to resolve it cleanly. :)
>>>>>
>>>>> I wrote a simple and crude version; please give it another test.
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>>> index d3a837506a36..5e1a82b79e41 100644
>>>>> --- a/drivers/md/md.c
>>>>> +++ b/drivers/md/md.c
>>>>> @@ -8753,6 +8753,30 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(md_submit_discard_bio);
>>>>>
>>>>> +static bool is_raid456(struct mddev *mddev)
>>>>> +{
>>>>> + return mddev->pers->level == 4 || mddev->pers->level == 5 ||
>>>>> + mddev->pers->level == 6;
>>>>> +}
>>>>> +
>>>>> +static void bitmap_startwrite(struct mddev *mddev, struct bio *bio)
>>>>> +{
>>>>> + if (!is_raid456(mddev) || !mddev->bitmap)
>>>>> + return;
>>>>> +
>>>>> + md_bitmap_startwrite(mddev->bitmap, bio_offset(bio),
>>>>> + bio_sectors(bio), 0);
>>>>> +}
>>>>> +
>>>>> +static void bitmap_endwrite(struct mddev *mddev, struct bio *bio,
>>>>> + sector_t sectors)
>>>>> +{
>>>>> + if (!is_raid456(mddev) || !mddev->bitmap)
>>>>> + return;
>>>>> +
>>>>> + md_bitmap_endwrite(mddev->bitmap, bio_offset(bio), sectors,
>>>>> + bio->bi_status == BLK_STS_OK, 0);
>>>>> +}
>>>>> +
>>>>> static void md_end_clone_io(struct bio *bio)
>>>>> {
>>>>> struct md_io_clone *md_io_clone = bio->bi_private;
>>>>> @@ -8765,6 +8789,7 @@ static void md_end_clone_io(struct bio *bio)
>>>>> if (md_io_clone->start_time)
>>>>> bio_end_io_acct(orig_bio, md_io_clone->start_time);
>>>>>
>>>>> + bitmap_endwrite(mddev, orig_bio, md_io_clone->sectors);
>>>>> bio_put(bio);
>>>>> bio_endio(orig_bio);
>>>>> percpu_ref_put(&mddev->active_io);
>>>>> @@ -8778,6 +8803,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
>>>>> bio_alloc_clone(bdev, *bio, GFP_NOIO,
>>>>> &mddev->io_clone_set);
>>>>>
>>>>> md_io_clone = container_of(clone, struct md_io_clone, bio_clone);
>>>>> + md_io_clone->sectors = bio_sectors(*bio);
>>>>> md_io_clone->orig_bio = *bio;
>>>>> md_io_clone->mddev = mddev;
>>>>> if (blk_queue_io_stat(bdev->bd_disk->queue))
>>>>> @@ -8790,6 +8816,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
>>>>>
>>>>> void md_account_bio(struct mddev *mddev, struct bio **bio)
>>>>> {
>>>>> + bitmap_startwrite(mddev, *bio);
>>>>> percpu_ref_get(&mddev->active_io);
>>>>> md_clone_bio(mddev, bio);
>>>>> }
>>>>> @@ -8807,6 +8834,8 @@ void md_free_cloned_bio(struct bio *bio)
>>>>> if (md_io_clone->start_time)
>>>>> bio_end_io_acct(orig_bio, md_io_clone->start_time);
>>>>>
>>>>> + bitmap_endwrite(mddev, orig_bio, md_io_clone->sectors);
>>>>> +
>>>>> bio_put(bio);
>>>>> percpu_ref_put(&mddev->active_io);
>>>>> }
>>>>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>>>>> index a0d6827dced9..0c2794230e0a 100644
>>>>> --- a/drivers/md/md.h
>>>>> +++ b/drivers/md/md.h
>>>>> @@ -837,6 +837,7 @@ struct md_io_clone {
>>>>> struct mddev *mddev;
>>>>> struct bio *orig_bio;
>>>>> unsigned long start_time;
>>>>> + sector_t sectors;
>>>>> struct bio bio_clone;
>>>>> };
>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>>>> index c14cf2410365..4f009e32f68a 100644
>>>>> --- a/drivers/md/raid5.c
>>>>> +++ b/drivers/md/raid5.c
>>>>> @@ -3561,12 +3561,6 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
>>>>> * is added to a batch, STRIPE_BIT_DELAY cannot be changed
>>>>> * any more.
>>>>> */
>>>>> - set_bit(STRIPE_BITMAP_PENDING, &sh->state);
>>>>> - spin_unlock_irq(&sh->stripe_lock);
>>>>> - md_bitmap_startwrite(conf->mddev->bitmap, sh->sector,
>>>>> - RAID5_STRIPE_SECTORS(conf), 0);
>>>>> - spin_lock_irq(&sh->stripe_lock);
>>>>> - clear_bit(STRIPE_BITMAP_PENDING, &sh->state);
>>>>> if (!sh->batch_head) {
>>>>> sh->bm_seq = conf->seq_flush+1;
>>>>> set_bit(STRIPE_BIT_DELAY, &sh->state);
>>>>> @@ -3621,7 +3615,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>>>>> BUG_ON(sh->batch_head);
>>>>> for (i = disks; i--; ) {
>>>>> struct bio *bi;
>>>>> - int bitmap_end = 0;
>>>>>
>>>>> if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
>>>>> struct md_rdev *rdev = conf->disks[i].rdev;
>>>>> @@ -3646,8 +3639,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>>>>> sh->dev[i].towrite = NULL;
>>>>> sh->overwrite_disks = 0;
>>>>> spin_unlock_irq(&sh->stripe_lock);
>>>>> - if (bi)
>>>>> - bitmap_end = 1;
>>>>>
>>>>> log_stripe_write_finished(sh);
>>>>> @@ -3662,10 +3653,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>>>>> bio_io_error(bi);
>>>>> bi = nextbi;
>>>>> }
>>>>> - if (bitmap_end)
>>>>> - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
>>>>> - RAID5_STRIPE_SECTORS(conf), 0, 0);
>>>>> - bitmap_end = 0;
>>>>> /* and fail all 'written' */
>>>>> bi = sh->dev[i].written;
>>>>> sh->dev[i].written = NULL;
>>>>> @@ -3674,7 +3661,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>>>>> sh->dev[i].page = sh->dev[i].orig_page;
>>>>> }
>>>>>
>>>>> - if (bi) bitmap_end = 1;
>>>>> while (bi && bi->bi_iter.bi_sector <
>>>>> sh->dev[i].sector + RAID5_STRIPE_SECTORS(conf)) {
>>>>> struct bio *bi2 = r5_next_bio(conf, bi,
>>>>> sh->dev[i].sector);
>>>>> @@ -3708,9 +3694,6 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>>>>> bi = nextbi;
>>>>> }
>>>>> }
>>>>> - if (bitmap_end)
>>>>> - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
>>>>> - RAID5_STRIPE_SECTORS(conf), 0, 0);
>>>>> /* If we were in the middle of a write the parity block might
>>>>> * still be locked - so just clear all R5_LOCKED flags
>>>>> */
>>>>> @@ -4059,10 +4042,6 @@ static void handle_stripe_clean_event(struct r5conf *conf,
>>>>> bio_endio(wbi);
>>>>> wbi = wbi2;
>>>>> }
>>>>> - md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
>>>>> - RAID5_STRIPE_SECTORS(conf),
>>>>> - !test_bit(STRIPE_DEGRADED, &sh->state),
>>>>> - 0);
>>>>> if (head_sh->batch_head) {
>>>>> sh = list_first_entry(&sh->batch_list,
>>>>> struct stripe_head,
>>>>> @@ -5788,13 +5767,6 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
>>>>> }
>>>>> spin_unlock_irq(&sh->stripe_lock);
>>>>> if (conf->mddev->bitmap) {
>>>>> - for (d = 0;
>>>>> - d < conf->raid_disks - conf->max_degraded;
>>>>> - d++)
>>>>> - md_bitmap_startwrite(mddev->bitmap,
>>>>> - sh->sector,
>>>>> - RAID5_STRIPE_SECTORS(conf),
>>>>> - 0);
>>>>> sh->bm_seq = conf->seq_flush + 1;
>>>>> set_bit(STRIPE_BIT_DELAY, &sh->state);
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks a lot for your help!
>>>>>> Christian
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Hi Kuai
>>>>
>>>> Maybe it's not good to move the bitmap operations from raid5 into md
>>>> when the new API is only used by raid5. Also, the bitmap region that
>>>> raid5 needs to handle is based on the member disk, so it should be
>>>> calculated per stripe rather than taken from the bio's address space,
>>>> because the bio's address space covers the whole array.
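>>>>
>>>> To illustrate the distinction with the calls already in play (a
>>>> rough sketch, not new code): raid5 today accounts the bitmap range
>>>> per stripe, in member-disk sectors:
>>>>
>>>> md_bitmap_startwrite(conf->mddev->bitmap, sh->sector,
>>>> RAID5_STRIPE_SECTORS(conf), 0);
>>>>
>>>> while the proposed md-level helper takes the range from the bio,
>>>> which lives in the address space of the whole array:
>>>>
>>>> md_bitmap_startwrite(mddev->bitmap, bio_offset(bio),
>>>> bio_sectors(bio), 0);
>>>>
>>>> The two ranges are not interchangeable, which is why the region
>>>> raid5 needs has to be calculated instead of taken from the bio.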
>>>>
>>>> We have a customer who reports a similar problem. There is a patch
>>>> from David. I put it in the attachment.
>>>>
>>>> @Christian, could you give the patch a try? It applies cleanly on
>>>> 6.11-rc6.
>>>>
>>>> Regards
>>>> Xiao
>>>> <md_raid5_one_bitmap_claim_per_stripe_head.patch>
>>>
>>
>
>
Kind regards,
Christian Theune
--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick