From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Mikulas Patocka <mpatocka@redhat.com>, Song Liu <song@kernel.org>
Cc: linux-raid@vger.kernel.org, dm-devel@redhat.com,
	Zdenek Kabelac <zkabelac@redhat.com>
Subject: Re: [dm-devel] A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
Date: Thu, 3 Nov 2022 11:47:53 +0800
Message-ID: <78646e88-2457-81e1-e3e7-cf66b67ba923@linux.dev>
In-Reply-To: <alpine.LRH.2.21.2211021214390.25745@file01.intranet.prod.int.rdu2.redhat.com>

Hi,

On 11/3/22 12:27 AM, Mikulas Patocka wrote:
> Hi
>
> There's a crash in the test shell/lvchange-rebuild-raid.sh when running
> the lvm testsuite. It can be reproduced by running "make check_local
> T=shell/lvchange-rebuild-raid.sh" in a loop.

I had a problem running the command (not sure what I missed). It would be
better if the relevant commands were extracted from the script, so that I
could reproduce the crash with those commands directly.

[root@localhost lvm2]# git log | head -1
commit 36a923926c2c27c1a8a5ac262387d2a4d3e620f8
[root@localhost lvm2]# make check_local T=shell/lvchange-rebuild-raid.sh
make -C libdm device-mapper
[...]
make -C daemons
make[1]: Nothing to be done for 'all'.
make -C test check_local
VERBOSE=0 ./lib/runner \
         --testdir . --outdir results \
         --flavours ndev-vanilla --only shell/lvchange-rebuild-raid.sh --skip @
running 1 tests
###      running: [ndev-vanilla] shell/lvchange-rebuild-raid.sh 0
| [ 0:00] lib/inittest: line 133: /tmp/LVMTEST317948.iCoLwmDhZW/dev/testnull: Permission denied
| [ 0:00] Filesystem does support devices in /tmp/LVMTEST317948.iCoLwmDhZW/dev (mounted with nodev?)
| [ 0:00] ## - /root/lvm2/test/shell/lvchange-rebuild-raid.sh:16
| [ 0:00] ## 1 STACKTRACE() called from lib/inittest:134
| [ 0:00] ## 2 source() called from /root/lvm2/test/shell/lvchange-rebuild-raid.sh:16
| [ 0:00] ## teardown....ok
###       failed: [ndev-vanilla] shell/lvchange-rebuild-raid.sh

### 1 tests: 0 passed, 0 skipped, 0 timed out, 0 warned, 1 failed
make[1]: *** [Makefile:137: check_local] Error 1
make: *** [Makefile:89: check_local] Error 2

And line 16 is this:

[root@localhost lvm2]# head -16 /root/lvm2/test/shell/lvchange-rebuild-raid.sh | tail -1
. lib/inittest
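The failure above happens while sourcing lib/inittest: the runner gets "Permission denied" on the testnull device node it creates under its /tmp test directory, and the log itself hints that the directory may be on a filesystem mounted with nodev. A quick way to check that mount flag before retrying is sketched below, assuming a glibc-based Linux system (statvfs(3) plus the GNU ST_NODEV flag). The helpers is_nodev() and report_nodev() are hypothetical and are not part of the lvm2 test suite.

```c
#define _GNU_SOURCE /* glibc only exposes ST_NODEV with _GNU_SOURCE */
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: returns 1 if the filesystem holding `path` is
 * mounted with nodev, 0 if device nodes are allowed, -1 on error.
 */
int is_nodev(const char *path)
{
	struct statvfs st;

	assert(path != NULL);
	if (statvfs(path, &st) != 0) {
		perror("statvfs");
		return -1;
	}
	return (st.f_flag & ST_NODEV) ? 1 : 0;
}

/* Print a one-line report for a directory, e.g. a test scratch area. */
void report_nodev(const char *path)
{
	int r = is_nodev(path);

	if (r < 0)
		printf("%s: could not query mount flags\n", path);
	else
		printf("%s: %s\n", path,
		       r ? "mounted nodev (device nodes unusable)"
			 : "device nodes allowed");
}
```

For example, report_nodev("/tmp") shows whether the default scratch location can hold device nodes; if it cannot, pointing the test suite at a directory on a filesystem mounted without nodev is likely needed.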

For the "lvchange --rebuild" action, I guess it relates to the CTR_FLAG_REBUILD
flag, which is checked from two paths:

1. raid_ctr -> parse_raid_params
            -> analyse_superblocks -> super_validate -> super_init_validation

2. raid_status, which might be invoked by ioctls (DM_DEV_WAIT_CMD and
   DM_TABLE_STATUS_CMD) from lvm

Since the commit you mentioned changes the behavior of raid_dtr, I think the
crash is caused by path 2; please correct me if my understanding is wrong.
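The two paths share state: the rebuild flag is recorded when the table is constructed and consulted again each time status is queried. The toy model below illustrates only that sharing. It is not dm-raid code: the struct, the bit value of TOY_CTR_FLAG_REBUILD and the function names are hypothetical, and the real raid_ctr/raid_status do much more (including the superblock validation on path 1).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bit; the real dm-raid value may differ. */
#define TOY_CTR_FLAG_REBUILD (1UL << 0)
#define TOY_MAX_DEVS 8

struct toy_raid_set {
	unsigned long ctr_flags;   /* flags parsed at construction time */
	unsigned int rebuild_mask; /* one bit per device to rebuild */
};

/* Path 1 (construction): parse "rebuild <idx>" and remember it. */
int toy_parse_param(struct toy_raid_set *rs, const char *key, unsigned int idx)
{
	if (strcmp(key, "rebuild") != 0)
		return -1; /* only the rebuild key is modelled */
	if (idx >= TOY_MAX_DEVS)
		return -1;
	rs->ctr_flags |= TOY_CTR_FLAG_REBUILD;
	rs->rebuild_mask |= 1U << idx;
	return 0;
}

/* Path 2 (status): re-emit the rebuild arguments if the flag is set. */
int toy_status(const struct toy_raid_set *rs, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int i;

	assert(len > 0);
	buf[0] = '\0';
	if (!(rs->ctr_flags & TOY_CTR_FLAG_REBUILD))
		return 0;
	for (i = 0; i < TOY_MAX_DEVS; i++) {
		int n;

		if (!(rs->rebuild_mask & (1U << i)))
			continue;
		n = snprintf(buf + used, len - used, "%srebuild %u",
			     used ? " " : "", i);
		if (n < 0 || (size_t)n >= len - used)
			return -1; /* buffer too small */
		used += (size_t)n;
	}
	return (int)used;
}
```

In this model, anything that tears down the state read by path 2 while status can still be queried would be a problem, which is the kind of interaction the question about raid_dtr is probing.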

> The crash happens in the kernel 6.0 and 6.1-rc3, but not in 5.19.
>
> I bisected the crash and it is caused by the commit
> 0dd84b319352bb8ba64752d4e45396d8b13e6018.
>
> I uploaded my .config here (it's 12-core virtual machine):
> https://people.redhat.com/~mpatocka/testcases/md-crash-config/config.txt
>
> Mikulas
>
> [   78.478417] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [   78.479166] #PF: supervisor write access in kernel mode
> [   78.479671] #PF: error_code(0x0002) - not-present page
> [   78.480171] PGD 11557f0067 P4D 11557f0067 PUD 0
> [   78.480626] Oops: 0002 [#1] PREEMPT SMP
> [   78.481001] CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5
> [   78.481661] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [   78.482471] Workqueue: kdelayd flush_expired_bios [dm_delay]
> [   78.483021] RIP: 0010:mempool_free+0x47/0x80
> [   78.483455] Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00
> [   78.485220] RSP: 0018:ffff88910036bda8 EFLAGS: 00010093
> [   78.485719] RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001
> [   78.486404] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8
> [   78.487080] RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900
> [   78.487764] R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000
> [   78.488451] R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05
> [   78.489146] FS:  0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000
> [   78.489913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   78.490474] CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0
> [   78.491165] Call Trace:
> [   78.491429]  <TASK>
> [   78.491640]  clone_endio+0xf4/0x1c0 [dm_mod]
> [   78.492072]  clone_endio+0xf4/0x1c0 [dm_mod]

The clone_endio belongs to "clone" target_type.

> [   78.492505]  __submit_bio+0x76/0x120
> [   78.492859]  submit_bio_noacct_nocheck+0xb6/0x2a0
> [   78.493325]  flush_expired_bios+0x28/0x2f [dm_delay]

This is the "delay" target_type. Could you shed light on how the two targets
connect with dm-raid? I only have a shallow knowledge of dm...

> [   78.493808]  process_one_work+0x1b4/0x300
> [   78.494211]  worker_thread+0x45/0x3e0
> [   78.494570]  ? rescuer_thread+0x380/0x380
> [   78.494957]  kthread+0xc2/0x100
> [   78.495279]  ? kthread_complete_and_exit+0x20/0x20
> [   78.495743]  ret_from_fork+0x1f/0x30
> [   78.496096]  </TASK>
> [   78.496326] Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd]
> [   78.500425] CR2: 0000000000000000
> [   78.500752] ---[ end trace 0000000000000000 ]---
> [   78.501214] RIP: 0010:mempool_free+0x47/0x80

BTW, does the mempool_free come from endio -> dec_count -> complete_io?
And the I/O which caused the crash comes from dm_io -> async_io / sync_io
-> dispatch_io. It seems dm-raid1 (rather than dm-raid) can call that path,
so I suppose the I/O is for a mirror image.
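On the mempool_free question: the faulting instruction in the Code: line (the bytes marked <48> 89 2c d0) is a 64-bit store to RAX + 8*RDX, and the dump shows RAX = 0 and RDX = 0, matching CR2 = 0. That is consistent with mempool_free trying to put an element back into a pool whose reserve array pointer is NULL, for example after the pool has already been torn down. This is an inference from the register dump, not something the trace proves. The user-space model below illustrates that sequence; toy_pool and its functions are hypothetical simplifications (no locking, no waiters), not the kernel's mempool API, and the model reports the condition instead of writing through NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified pool: a small reserve of spare elements. */
struct toy_pool {
	int min_nr;      /* reserve size */
	int curr_nr;     /* elements currently held in the reserve */
	void **elements; /* reserve array; NULL once the pool is torn down */
};

int toy_pool_init(struct toy_pool *p, int min_nr)
{
	assert(min_nr > 0);
	p->min_nr = min_nr;
	p->curr_nr = 0;
	p->elements = calloc((size_t)min_nr, sizeof(void *));
	return p->elements ? 0 : -1;
}

/* Tear-down: drain the reserve, free the array and clear the pointer. */
void toy_pool_exit(struct toy_pool *p)
{
	while (p->curr_nr > 0)
		free(p->elements[--p->curr_nr]);
	free(p->elements);
	p->elements = NULL;
}

/*
 * Return an element. While the reserve is below min_nr the element is
 * stored in the array; with a NULL array that store is what would fault
 * in the kernel. Here it returns -1 and the caller still owns element.
 */
int toy_pool_free(struct toy_pool *p, void *element)
{
	if (element == NULL)
		return 0;
	if (p->curr_nr < p->min_nr) {
		if (p->elements == NULL)
			return -1; /* free after tear-down */
		p->elements[p->curr_nr++] = element;
		return 0;
	}
	free(element);
	return 0;
}
```

Note that in this model a late free is only caught because tear-down leaves curr_nr at 0, below min_nr, so the "refill the reserve" branch is always taken.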

Thanks,
Guoqing

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
