* A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Mikulas Patocka @ 2022-11-02 16:27 UTC
  To: Guoqing Jiang, Song Liu
  Cc: Zdenek Kabelac, linux-raid, dm-devel

Hi

There's a crash in the test shell/lvchange-rebuild-raid.sh when running
the lvm testsuite. It can be reproduced by running "make check_local
T=shell/lvchange-rebuild-raid.sh" in a loop.

The crash happens in kernels 6.0 and 6.1-rc3, but not in 5.19.

I bisected the crash and it is caused by the commit
0dd84b319352bb8ba64752d4e45396d8b13e6018.

I uploaded my .config here (it's a 12-core virtual machine):
https://people.redhat.com/~mpatocka/testcases/md-crash-config/config.txt

Mikulas

[ 78.478417] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 78.479166] #PF: supervisor write access in kernel mode
[ 78.479671] #PF: error_code(0x0002) - not-present page
[ 78.480171] PGD 11557f0067 P4D 11557f0067 PUD 0
[ 78.480626] Oops: 0002 [#1] PREEMPT SMP
[ 78.481001] CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5
[ 78.481661] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 78.482471] Workqueue: kdelayd flush_expired_bios [dm_delay]
[ 78.483021] RIP: 0010:mempool_free+0x47/0x80
[ 78.483455] Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00
[ 78.485220] RSP: 0018:ffff88910036bda8 EFLAGS: 00010093
[ 78.485719] RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001
[ 78.486404] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8
[ 78.487080] RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900
[ 78.487764] R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000
[ 78.488451] R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05
[ 78.489146] FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000
[ 78.489913] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.490474] CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0
[ 78.491165] Call Trace:
[ 78.491429]  <TASK>
[ 78.491640]  clone_endio+0xf4/0x1c0 [dm_mod]
[ 78.492072]  clone_endio+0xf4/0x1c0 [dm_mod]
[ 78.492505]  __submit_bio+0x76/0x120
[ 78.492859]  submit_bio_noacct_nocheck+0xb6/0x2a0
[ 78.493325]  flush_expired_bios+0x28/0x2f [dm_delay]
[ 78.493808]  process_one_work+0x1b4/0x300
[ 78.494211]  worker_thread+0x45/0x3e0
[ 78.494570]  ? rescuer_thread+0x380/0x380
[ 78.494957]  kthread+0xc2/0x100
[ 78.495279]  ? kthread_complete_and_exit+0x20/0x20
[ 78.495743]  ret_from_fork+0x1f/0x30
[ 78.496096]  </TASK>
[ 78.496326] Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd]
[ 78.500425] CR2: 0000000000000000
[ 78.500752] ---[ end trace 0000000000000000 ]---
[ 78.501214] RIP: 0010:mempool_free+0x47/0x80
[ 78.501633] Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00
[ 78.503420] RSP: 0018:ffff88910036bda8 EFLAGS: 00010093
[ 78.503921] RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001
[ 78.504611] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8
[ 78.505374] RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900
[ 78.506075] R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000
[ 78.506766] R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05
[ 78.507481] FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000
[ 78.508273] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 78.508837] CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0
[ 78.509542] note: kworker/0:1[73] exited with preempt_count 1
[ 78.510427] md/raid10:mdX: active with 4 out of 8 devices
[ 96.902910] sysrq: Resetting
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Guoqing Jiang @ 2022-11-03  3:47 UTC
  To: Mikulas Patocka, Song Liu
  Cc: Zdenek Kabelac, linux-raid, dm-devel

Hi,

On 11/3/22 12:27 AM, Mikulas Patocka wrote:
> Hi
>
> There's a crash in the test shell/lvchange-rebuild-raid.sh when running
> the lvm testsuite. It can be reproduced by running "make check_local
> T=shell/lvchange-rebuild-raid.sh" in a loop.

I have a problem running the command (not sure what I missed); it would
be better if the relevant commands were extracted from the script, then
I could reproduce the crash with those commands directly.

[root@localhost lvm2]# git log | head -1
commit 36a923926c2c27c1a8a5ac262387d2a4d3e620f8
[root@localhost lvm2]# make check_local T=shell/lvchange-rebuild-raid.sh
make -C libdm device-mapper
[...]
make -C daemons
make[1]: Nothing to be done for 'all'.
make -C test check_local
VERBOSE=0 ./lib/runner \
    --testdir . --outdir results \
    --flavours ndev-vanilla --only shell/lvchange-rebuild-raid.sh --skip @
running 1 tests
### running: [ndev-vanilla] shell/lvchange-rebuild-raid.sh      0
| [ 0:00] lib/inittest: line 133: /tmp/LVMTEST317948.iCoLwmDhZW/dev/testnull: Permission denied
| [ 0:00] Filesystem does support devices in /tmp/LVMTEST317948.iCoLwmDhZW/dev (mounted with nodev?)
| [ 0:00] ## - /root/lvm2/test/shell/lvchange-rebuild-raid.sh:16
| [ 0:00] ##   1 STACKTRACE() called from lib/inittest:134
| [ 0:00] ##   2 source() called from /root/lvm2/test/shell/lvchange-rebuild-raid.sh:16
| [ 0:00] ## teardown....ok
### failed: [ndev-vanilla] shell/lvchange-rebuild-raid.sh

### 1 tests: 0 passed, 0 skipped, 0 timed out, 0 warned, 1 failed
make[1]: *** [Makefile:137: check_local] Error 1
make: *** [Makefile:89: check_local] Error 2

And line 16 is this:

[root@localhost lvm2]# head -16 /root/lvm2/test/shell/lvchange-rebuild-raid.sh | tail -1
. lib/inittest

For the "lvchange --rebuild" action, I guess it relates to the
CTR_FLAG_REBUILD flag, which is checked from two paths:

1. raid_ctr -> parse_raid_params
   -> analyse_superblocks -> super_validate -> super_init_validation

2. raid_status, which might be invoked by ioctls (DM_DEV_WAIT_CMD and
   DM_TABLE_STATUS_CMD) from lvm

Since the commit you mentioned changed the behavior of raid_dtr, I think
the crash is caused by path 2; please correct me if my understanding is
wrong.

> The crash happens in kernels 6.0 and 6.1-rc3, but not in 5.19.
>
> I bisected the crash and it is caused by the commit
> 0dd84b319352bb8ba64752d4e45396d8b13e6018.
>
> I uploaded my .config here (it's a 12-core virtual machine):
> https://people.redhat.com/~mpatocka/testcases/md-crash-config/config.txt
>
> Mikulas
>
> [ 78.478417] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 78.479166] #PF: supervisor write access in kernel mode
> [ 78.479671] #PF: error_code(0x0002) - not-present page
> [ 78.480171] PGD 11557f0067 P4D 11557f0067 PUD 0
> [ 78.480626] Oops: 0002 [#1] PREEMPT SMP
> [ 78.481001] CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5
> [ 78.481661] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [ 78.482471] Workqueue: kdelayd flush_expired_bios [dm_delay]
> [ 78.483021] RIP: 0010:mempool_free+0x47/0x80
> [ 78.483455] Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00
> [ 78.485220] RSP: 0018:ffff88910036bda8 EFLAGS: 00010093
> [ 78.485719] RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001
> [ 78.486404] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8
> [ 78.487080] RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900
> [ 78.487764] R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000
> [ 78.488451] R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05
> [ 78.489146] FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000
> [ 78.489913] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 78.490474] CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0
> [ 78.491165] Call Trace:
> [ 78.491429]  <TASK>
> [ 78.491640]  clone_endio+0xf4/0x1c0 [dm_mod]
> [ 78.492072]  clone_endio+0xf4/0x1c0 [dm_mod]

The clone_endio belongs to the "clone" target_type.

> [ 78.492505]  __submit_bio+0x76/0x120
> [ 78.492859]  submit_bio_noacct_nocheck+0xb6/0x2a0
> [ 78.493325]  flush_expired_bios+0x28/0x2f [dm_delay]

This is the "delay" target_type. Could you shed light on how the two
targets connect with dm-raid? I have only shallow knowledge about dm ...

> [ 78.493808]  process_one_work+0x1b4/0x300
> [ 78.494211]  worker_thread+0x45/0x3e0
> [ 78.494570]  ? rescuer_thread+0x380/0x380
> [ 78.494957]  kthread+0xc2/0x100
> [ 78.495279]  ? kthread_complete_and_exit+0x20/0x20
> [ 78.495743]  ret_from_fork+0x1f/0x30
> [ 78.496096]  </TASK>
> [ 78.496326] Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd]
> [ 78.500425] CR2: 0000000000000000
> [ 78.500752] ---[ end trace 0000000000000000 ]---
> [ 78.501214] RIP: 0010:mempool_free+0x47/0x80

BTW, is the mempool_free from endio -> dec_count -> complete_io?
And the io which caused the crash is from dm_io -> async_io / sync_io
-> dispatch_io; it seems dm-raid1 can call it instead of dm-raid, so I
suppose the io is for a mirror image.

Thanks,
Guoqing
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Guoqing Jiang @ 2022-11-03  7:28 UTC
  To: Mikulas Patocka, Song Liu
  Cc: Zdenek Kabelac, linux-raid, dm-devel

On 11/3/22 11:47 AM, Guoqing Jiang wrote:
>> [ 78.491429]  <TASK>
>> [ 78.491640]  clone_endio+0xf4/0x1c0 [dm_mod]
>> [ 78.492072]  clone_endio+0xf4/0x1c0 [dm_mod]
>
> The clone_endio belongs to the "clone" target_type.

Hmm, it could be the clone_endio from dm.c instead of dm-clone-target.c.

>> [ 78.492505]  __submit_bio+0x76/0x120
>> [ 78.492859]  submit_bio_noacct_nocheck+0xb6/0x2a0
>> [ 78.493325]  flush_expired_bios+0x28/0x2f [dm_delay]
>
> This is the "delay" target_type. Could you shed light on how the two
> targets connect with dm-raid? I have only shallow knowledge about dm ...
>
>> [ 78.493808]  process_one_work+0x1b4/0x300
>> [ 78.494211]  worker_thread+0x45/0x3e0
>> [ 78.494570]  ? rescuer_thread+0x380/0x380
>> [ 78.494957]  kthread+0xc2/0x100
>> [ 78.495279]  ? kthread_complete_and_exit+0x20/0x20
>> [ 78.495743]  ret_from_fork+0x1f/0x30
>> [ 78.496096]  </TASK>
>> [ 78.496326] Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd]
>> [ 78.500425] CR2: 0000000000000000
>> [ 78.500752] ---[ end trace 0000000000000000 ]---
>> [ 78.501214] RIP: 0010:mempool_free+0x47/0x80
>
> BTW, is the mempool_free from endio -> dec_count -> complete_io?

I guess it is "mempool_free(io, &io->client->pool)", and the pool is
freed by dm_io_client_destroy; it seems dm-raid is responsible for
neither creating nor destroying the pool.

> And the io which caused the crash is from dm_io -> async_io / sync_io
> -> dispatch_io; it seems dm-raid1 can call it instead of dm-raid, so I
> suppose the io is for a mirror image.

The io should be from another path (dm_submit_bio ->
dm_split_and_process_bio -> __split_and_process_bio -> __map_bio, which
sets "bi_end_io = clone_endio").

My guess is that there is a race condition between "lvchange --rebuild"
and raid_dtr, since the crash was reproduced by running the command in a
loop.

Anyway, we can revert the mentioned commit and go back to Neil's
solution [1], but I'd like to reproduce it and learn DM a bit.

[1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470

Thanks,
Guoqing
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Mikulas Patocka @ 2022-11-03 13:00 UTC
  To: Guoqing Jiang, Zdenek Kabelac
  Cc: Song Liu, linux-raid, dm-devel

On Thu, 3 Nov 2022, Guoqing Jiang wrote:

> Hi,
>
> On 11/3/22 12:27 AM, Mikulas Patocka wrote:
> > Hi
> >
> > There's a crash in the test shell/lvchange-rebuild-raid.sh when running
> > the lvm testsuite. It can be reproduced by running "make check_local
> > T=shell/lvchange-rebuild-raid.sh" in a loop.
>
> I have a problem running the command (not sure what I missed); it would
> be better if the relevant commands were extracted from the script, then
> I could reproduce the crash with those commands directly.

Ask Zdenek Kabelac to help you with running the testsuite. He has a
better understanding of the testsuite than I do.

> BTW, is the mempool_free from endio -> dec_count -> complete_io?
> And the io which caused the crash is from dm_io -> async_io / sync_io
> -> dispatch_io; it seems dm-raid1 can call it instead of dm-raid, so I
> suppose the io is for a mirror image.
>
> Thanks,
> Guoqing

I presume that the bug is caused by destruction of a bio set while a bio
from that set was in progress. When the bio finishes and an attempt is
made to free it, a crash happens when the code tries to free the bio
into a destroyed mempool.

I can do more testing to validate this theory.

Mikulas
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Mikulas Patocka @ 2022-11-03 15:20 UTC
  To: Guoqing Jiang, Zdenek Kabelac
  Cc: Song Liu, linux-raid, dm-devel

On Thu, 3 Nov 2022, Mikulas Patocka wrote:

> > BTW, is the mempool_free from endio -> dec_count -> complete_io?
> > And the io which caused the crash is from dm_io -> async_io / sync_io
> > -> dispatch_io; it seems dm-raid1 can call it instead of dm-raid, so I
> > suppose the io is for a mirror image.
> >
> > Thanks,
> > Guoqing
>
> I presume that the bug is caused by destruction of a bio set while a bio
> from that set was in progress. When the bio finishes and an attempt is
> made to free it, a crash happens when the code tries to free the bio
> into a destroyed mempool.
>
> I can do more testing to validate this theory.
>
> Mikulas

When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
I get this stacktrace:

[ 200.105367] Call Trace:
[ 200.105611]  <TASK>
[ 200.105825]  dump_stack_lvl+0x33/0x42
[ 200.106196]  dump_stack+0xc/0xd
[ 200.106516]  mempool_free.cold+0x22/0x32
[ 200.106921]  bio_free+0x49/0x60
[ 200.107239]  bio_put+0x95/0x100
[ 200.107567]  super_written+0x4f/0x120 [md_mod]
[ 200.108020]  bio_endio+0xe8/0x100
[ 200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
[ 200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
[ 200.109288]  bio_endio+0xe8/0x100
[ 200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
[ 200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]
[ 200.110543]  bio_endio+0xe8/0x100
[ 200.110877]  brd_submit_bio+0xf8/0x123 [brd]
[ 200.111310]  __submit_bio+0x7a/0x120
[ 200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
[ 200.112138]  submit_bio_noacct+0x12e/0x3e0
[ 200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
[ 200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]
[ 200.113536]  process_one_work+0x1b4/0x320
[ 200.113943]  worker_thread+0x45/0x3e0
[ 200.114319]  ? rescuer_thread+0x380/0x380
[ 200.114714]  kthread+0xc2/0x100
[ 200.115035]  ? kthread_complete_and_exit+0x20/0x20
[ 200.115517]  ret_from_fork+0x1f/0x30
[ 200.115874]  </TASK>

The function super_written is obviously buggy, because it first wakes up
a process and then calls bio_put(bio) - so the woken-up process is racing
with bio_put(bio), and the result is that we attempt to free a bio into a
destroyed bio set.

When I fix super_written, there are no longer any crashes. I'm posting a
patch in the next email.

Mikulas
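(For reference, a minimal sketch of the reordering Mikulas describes --
simplified from drivers/md/md.c of this era, with error handling elided;
the patch actually posted may differ in detail:)

    /*
     * Sketch of the race in super_written(). The process woken by
     * wake_up() may go on to tear down the mddev's bio sets, e.g. via
     * md_stop() calling bioset_exit(), so the bio must be released
     * before any wakeup can happen.
     */
    static void super_written(struct bio *bio)
    {
            struct md_rdev *rdev = bio->bi_private;
            struct mddev *mddev = rdev->mddev;

            /*
             * Buggy order (what 6.1-rc3 did): wake the waiter first and
             * free the bio last, so bio_put() can run against an
             * already-destroyed bio set:
             *
             *      if (atomic_dec_and_test(&mddev->pending_writes))
             *              wake_up(&mddev->sb_wait);
             *      rdev_dec_pending(rdev, mddev);
             *      bio_put(bio);
             */

            /* Fixed order: drop the bio before anything that can wake
             * the process that owns the bio set. */
            bio_put(bio);

            rdev_dec_pending(rdev, mddev);
            if (atomic_dec_and_test(&mddev->pending_writes))
                    wake_up(&mddev->sb_wait);
    }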
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Guoqing Jiang @ 2022-11-04  2:41 UTC
  To: Mikulas Patocka, Zdenek Kabelac
  Cc: Song Liu, linux-raid, dm-devel, Neil Brown

On 11/3/22 11:20 PM, Mikulas Patocka wrote:
>
> On Thu, 3 Nov 2022, Mikulas Patocka wrote:
>
>>> BTW, is the mempool_free from endio -> dec_count -> complete_io?
>>> And the io which caused the crash is from dm_io -> async_io / sync_io
>>> -> dispatch_io; it seems dm-raid1 can call it instead of dm-raid, so I
>>> suppose the io is for a mirror image.
>>>
>>> Thanks,
>>> Guoqing
>> I presume that the bug is caused by destruction of a bio set while a bio
>> from that set was in progress. When the bio finishes and an attempt is
>> made to free it, a crash happens when the code tries to free the bio
>> into a destroyed mempool.
>>
>> I can do more testing to validate this theory.
>>
>> Mikulas
> When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
> I get this stacktrace:

Just curious, is the option used for compiling the kernel or lvm?

BTW, this trace is different from the previous one, and it is more
understandable to me, thanks.

> [ 200.105367] Call Trace:
> [ 200.105611]  <TASK>
> [ 200.105825]  dump_stack_lvl+0x33/0x42
> [ 200.106196]  dump_stack+0xc/0xd
> [ 200.106516]  mempool_free.cold+0x22/0x32
> [ 200.106921]  bio_free+0x49/0x60
> [ 200.107239]  bio_put+0x95/0x100
> [ 200.107567]  super_written+0x4f/0x120 [md_mod]
> [ 200.108020]  bio_endio+0xe8/0x100
> [ 200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> [ 200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
> [ 200.109288]  bio_endio+0xe8/0x100
> [ 200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> [ 200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]

I assume the above comes from this chain:

clone_endio -> dm_io_dec_pending -> __dm_io_dec_pending -> dm_io_complete
-> __dm_io_complete -> bio_endio

> [ 200.110543]  bio_endio+0xe8/0x100
> [ 200.110877]  brd_submit_bio+0xf8/0x123 [brd]
> [ 200.111310]  __submit_bio+0x7a/0x120
> [ 200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
> [ 200.112138]  submit_bio_noacct+0x12e/0x3e0
> [ 200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
> [ 200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]

Was flush_expired_bios triggered by this path?

__dm_destroy or __dm_suspend -> suspend_targets -> delay_presuspend
-> del_timer_sync(&dc->delay_timer) -> handle_delayed_timer

> [ 200.113536]  process_one_work+0x1b4/0x320
> [ 200.113943]  worker_thread+0x45/0x3e0
> [ 200.114319]  ? rescuer_thread+0x380/0x380
> [ 200.114714]  kthread+0xc2/0x100
> [ 200.115035]  ? kthread_complete_and_exit+0x20/0x20
> [ 200.115517]  ret_from_fork+0x1f/0x30
> [ 200.115874]  </TASK>
>
> The function super_written is obviously buggy, because it first wakes up
> a process and then calls bio_put(bio) - so the woken-up process is racing
> with bio_put(bio), and the result is that we attempt to free a bio into a
> destroyed bio set.

Does it mean the woken-up process destroyed the bio set?

The io for the super write is allocated from sync_set, and the bio set is
mostly destroyed in md_free_disk or md_stop; I assume md_stop is more
relevant here, as it is called by dm-raid.

So I guess the problem is that raid_dtr is called while an in-flight (or
expired) bio is still not completed; maybe lvm issued a command that
called dm_table_destroy while dm was in the process of suspend or
destroy. Just my $0.02.

> When I fix super_written, there are no longer any crashes. I'm posting a
> patch in the next email.

IIRC, the code has existed for a long time; I'd suggest trying the mdadm
test suite first with the change. Cc Neil too.

Thanks,
Guoqing
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Mikulas Patocka @ 2022-11-04 13:40 UTC
  To: Guoqing Jiang
  Cc: Zdenek Kabelac, Song Liu, linux-raid, dm-devel, Neil Brown

On Fri, 4 Nov 2022, Guoqing Jiang wrote:

> On 11/3/22 11:20 PM, Mikulas Patocka wrote:
> >
> > On Thu, 3 Nov 2022, Mikulas Patocka wrote:
> >
> > > > BTW, is the mempool_free from endio -> dec_count -> complete_io?
> > > > And the io which caused the crash is from dm_io -> async_io / sync_io
> > > > -> dispatch_io; it seems dm-raid1 can call it instead of dm-raid, so I
> > > > suppose the io is for a mirror image.
> > > >
> > > > Thanks,
> > > > Guoqing
> > > I presume that the bug is caused by destruction of a bio set while a bio
> > > from that set was in progress. When the bio finishes and an attempt is
> > > made to free it, a crash happens when the code tries to free the bio
> > > into a destroyed mempool.
> > >
> > > I can do more testing to validate this theory.
> > >
> > > Mikulas
> > When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
> > I get this stacktrace:
>
> Just curious, is the option used for compiling the kernel or lvm?

I used it to compile the kernel. I set

export KCFLAGS="-march=barcelona -fno-optimize-sibling-calls"

and recompiled the kernel.

> BTW, this trace is different from the previous one, and it is more
> understandable to me, thanks.
>
> > [ 200.105367] Call Trace:
> > [ 200.105611]  <TASK>
> > [ 200.105825]  dump_stack_lvl+0x33/0x42
> > [ 200.106196]  dump_stack+0xc/0xd
> > [ 200.106516]  mempool_free.cold+0x22/0x32
> > [ 200.106921]  bio_free+0x49/0x60
> > [ 200.107239]  bio_put+0x95/0x100
> > [ 200.107567]  super_written+0x4f/0x120 [md_mod]
> > [ 200.108020]  bio_endio+0xe8/0x100
> > [ 200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> > [ 200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
> > [ 200.109288]  bio_endio+0xe8/0x100
> > [ 200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> > [ 200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]
>
> I assume the above comes from this chain:
>
> clone_endio -> dm_io_dec_pending -> __dm_io_dec_pending -> dm_io_complete
> -> __dm_io_complete -> bio_endio
>
> > [ 200.110543]  bio_endio+0xe8/0x100
> > [ 200.110877]  brd_submit_bio+0xf8/0x123 [brd]
> > [ 200.111310]  __submit_bio+0x7a/0x120
> > [ 200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
> > [ 200.112138]  submit_bio_noacct+0x12e/0x3e0
> > [ 200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
> > [ 200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]
>
> Was flush_expired_bios triggered by this path?
>
> __dm_destroy or __dm_suspend -> suspend_targets -> delay_presuspend
> -> del_timer_sync(&dc->delay_timer) -> handle_delayed_timer

No - del_timer_sync doesn't call handle_delayed_timer.

The timer was set by "mod_timer(&dc->delay_timer, expires);". When the
time expires, it calls handle_delayed_timer. handle_delayed_timer calls
"queue_work(dc->kdelayd_wq, &dc->flush_expired_bios);" - this submits a
work item that calls "flush_expired_bios" and triggers this stacktrace.

> > [ 200.113536]  process_one_work+0x1b4/0x320
> > [ 200.113943]  worker_thread+0x45/0x3e0
> > [ 200.114319]  ? rescuer_thread+0x380/0x380
> > [ 200.114714]  kthread+0xc2/0x100
> > [ 200.115035]  ? kthread_complete_and_exit+0x20/0x20
> > [ 200.115517]  ret_from_fork+0x1f/0x30
> > [ 200.115874]  </TASK>
> >
> > The function super_written is obviously buggy, because it first wakes up
> > a process and then calls bio_put(bio) - so the woken-up process is racing
> > with bio_put(bio), and the result is that we attempt to free a bio into a
> > destroyed bio set.
>
> Does it mean the woken-up process destroyed the bio set?

Yes.

> The io for the super write is allocated from sync_set, and the bio set is
> mostly destroyed in md_free_disk or md_stop; I assume md_stop is more
> relevant here, as it is called by dm-raid.

Yes.

> So I guess the problem is that raid_dtr is called while an in-flight (or
> expired) bio is still not completed;

When device mapper calls raid_dtr, there are no external bios in progress
(this is guaranteed by dm design). However, the bio that writes the
superblock is still in progress and it races with md_stop. I am not an
expert in md, so I don't know if this is OK or not.

> maybe lvm issued a command that called dm_table_destroy while dm was in
> the process of suspend or destroy. Just my $0.02.
>
> > When I fix super_written, there are no longer any crashes. I'm posting a
> > patch in the next email.
>
> IIRC, the code has existed for a long time; I'd suggest trying the mdadm
> test suite first with the change. Cc Neil too.

I get an error when attempting to run the testsuite:

[debian:/usr/src/git/mdadm]# ./test
Testing on linux-6.1.0-rc3 kernel
/usr/src/git/mdadm/tests/00createnames... FAILED - see
/var/tmp/00createnames.log and /var/tmp/fail00createnames.log for details

[debian:/usr/src/git/mdadm]# cat /var/tmp/00createnames.log
+ . /usr/src/git/mdadm/tests/00createnames
++ set -x -e
++ _create /dev/md/name
++ local DEVNAME=/dev/md/name
++ local NAME=
++ [[ -z '' ]]
++ mdadm -CR /dev/md/name -l0 -n 1 /dev/loop0 --force
++ rm -f /var/tmp/stderr
++ case $* in
++ case $* in
++ for args in $*
++ [[ -CR =~ /dev/ ]]
++ for args in $*
++ [[ /dev/md/name =~ /dev/ ]]
++ [[ /dev/md/name =~ md ]]
++ for args in $*
++ [[ -l0 =~ /dev/ ]]
++ for args in $*
++ [[ -n =~ /dev/ ]]
++ for args in $*
++ [[ 1 =~ /dev/ ]]
++ for args in $*
++ [[ /dev/loop0 =~ /dev/ ]]
++ [[ /dev/loop0 =~ md ]]
++ /usr/src/git/mdadm/mdadm --zero /dev/loop0
mdadm: Unrecognised md component device - /dev/loop0
++ for args in $*
++ [[ --force =~ /dev/ ]]
++ /usr/src/git/mdadm/mdadm --quiet -CR /dev/md/name -l0 -n 1 /dev/loop0 --force --auto=yes
++ rv=1
++ case $* in
++ cat /var/tmp/stderr
mdadm: unexpected failure opening /dev/md127
++ return 1
++ [[ 1 != \0 ]]
++ echo 'Cannot create device.'
Cannot create device.
++ exit 1
mdadm: unexpected failure opening /dev/md127

[debian:/usr/src/git/mdadm]# cat /var/tmp/fail00createnames.log
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!

> Thanks,
> Guoqing

Mikulas
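(For readers unfamiliar with dm-delay, the timer-to-workqueue handoff
Mikulas describes looks roughly like this -- paraphrased from
drivers/md/dm-delay.c around this kernel version; details may differ:)

    /* Armed via queue_timeout() when a bio is delayed; fires once the
     * earliest expiry time is reached ... */
    static void handle_delayed_timer(struct timer_list *t)
    {
            struct delay_c *dc = from_timer(dc, t, delay_timer);

            /* ... and hands off to the kdelayd workqueue; the work item
             * (flush_expired_bios) resubmits the expired bios -- this is
             * the "Workqueue: kdelayd flush_expired_bios" context seen
             * in the oops. */
            queue_work(dc->kdelayd_wq, &dc->flush_expired_bios);
    }

    static void queue_timeout(struct delay_c *dc, unsigned long expires)
    {
            mutex_lock(&dc->timer_lock);

            /* only pull the expiry in, never push it out */
            if (!timer_pending(&dc->delay_timer) ||
                expires < dc->delay_timer.expires)
                    mod_timer(&dc->delay_timer, expires);

            mutex_unlock(&dc->timer_lock);
    }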
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Guoqing Jiang @ 2022-11-07  9:32 UTC
  To: Mikulas Patocka
  Cc: Zdenek Kabelac, Song Liu, linux-raid, dm-devel, Neil Brown

On 11/4/22 9:40 PM, Mikulas Patocka wrote:
> On Fri, 4 Nov 2022, Guoqing Jiang wrote:
>
>> On 11/3/22 11:20 PM, Mikulas Patocka wrote:
>>> On Thu, 3 Nov 2022, Mikulas Patocka wrote:
>>> [...]
>>> When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
>>> I get this stacktrace:
>> Just curious, is the option used for compiling the kernel or lvm?
> I used it to compile the kernel. I set
> export KCFLAGS="-march=barcelona -fno-optimize-sibling-calls"
> and recompiled the kernel.

Thanks for sharing!

>> BTW, this trace is different from the previous one, and it is more
>> understandable to me, thanks.
>>
>>> [ 200.105367] Call Trace:
>>> [ 200.105611]  <TASK>
>>> [ 200.105825]  dump_stack_lvl+0x33/0x42
>>> [ 200.106196]  dump_stack+0xc/0xd
>>> [ 200.106516]  mempool_free.cold+0x22/0x32
>>> [ 200.106921]  bio_free+0x49/0x60
>>> [ 200.107239]  bio_put+0x95/0x100
>>> [ 200.107567]  super_written+0x4f/0x120 [md_mod]
>>> [ 200.108020]  bio_endio+0xe8/0x100
>>> [ 200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
>>> [ 200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
>>> [ 200.109288]  bio_endio+0xe8/0x100
>>> [ 200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
>>> [ 200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]
>> I assume the above comes from this chain:
>>
>> clone_endio -> dm_io_dec_pending -> __dm_io_dec_pending -> dm_io_complete
>> -> __dm_io_complete -> bio_endio
>>
>>> [ 200.110543]  bio_endio+0xe8/0x100
>>> [ 200.110877]  brd_submit_bio+0xf8/0x123 [brd]
>>> [ 200.111310]  __submit_bio+0x7a/0x120
>>> [ 200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
>>> [ 200.112138]  submit_bio_noacct+0x12e/0x3e0
>>> [ 200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
>>> [ 200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]
>> Was flush_expired_bios triggered by this path?
>>
>> __dm_destroy or __dm_suspend -> suspend_targets -> delay_presuspend
>> -> del_timer_sync(&dc->delay_timer) -> handle_delayed_timer
> No - del_timer_sync doesn't call handle_delayed_timer.
>
> The timer was set by "mod_timer(&dc->delay_timer, expires);". When the
> time expires, it calls handle_delayed_timer. handle_delayed_timer calls
> "queue_work(dc->kdelayd_wq, &dc->flush_expired_bios);" - this submits a
> work item that calls "flush_expired_bios" and triggers this stacktrace.

OK, so queue_timeout, which sets the expiry timer, is called by delay_map
-> delay_bio or by flush_delayed_bios, and either way the bio originates
from the delayed_bios list.

>>> [ 200.113536]  process_one_work+0x1b4/0x320
>>> [ 200.113943]  worker_thread+0x45/0x3e0
>>> [ 200.114319]  ? rescuer_thread+0x380/0x380
>>> [ 200.114714]  kthread+0xc2/0x100
>>> [ 200.115035]  ? kthread_complete_and_exit+0x20/0x20
>>> [ 200.115517]  ret_from_fork+0x1f/0x30
>>> [ 200.115874]  </TASK>
>>>
>>> The function super_written is obviously buggy, because it first wakes up
>>> a process and then calls bio_put(bio) - so the woken-up process is racing
>>> with bio_put(bio), and the result is that we attempt to free a bio into a
>>> destroyed bio set.
>> Does it mean the woken-up process destroyed the bio set?
> Yes.
>
>> The io for the super write is allocated from sync_set, and the bio set is
>> mostly destroyed in md_free_disk or md_stop; I assume md_stop is more
>> relevant here, as it is called by dm-raid.
> Yes.
>
>> So I guess the problem is that raid_dtr is called while an in-flight (or
>> expired) bio is still not completed;
> When device mapper calls raid_dtr, there are no external bios in progress
> (this is guaranteed by dm design). However, the bio that writes the
> superblock is still in progress and it races with md_stop. I am not an
> expert in md, so I don't know if this is OK or not.

For md raid, I think the same situation can't happen, given that the bio
set can only be destroyed from disk_release, and the block layer
guarantees that all bios issued to the array are finished before the disk
is released.

Also, I believe dm can guarantee that every bio issued to dm is finished
before dtr, but the bio in the call trace was issued to the md layer
instead of inside dm; I am not sure how dm handles this case correctly.

And the super write bio is kind of special: it happens mostly when the
configuration of the array is changed (e.g. size or disk membership), but
it seems mddev_detach cannot guarantee the completion of the super write
bio, which means that when the path (md_stop -> __md_stop) destroys the
bio sets, the super write bio could still exist. Maybe it is necessary to
check pending_writes for the super write io in md_stop before
bioset_exit.

> maybe lvm issued a command that called dm_table_destroy while dm was in
> the process of suspend or destroy. Just my $0.02.
>
>>> When I fix super_written, there are no longer any crashes. I'm posting a
>>> patch in the next email.
>> IIRC, the code has existed for a long time; I'd suggest trying the mdadm
>> test suite first with the change. Cc Neil too.
> I get an error when attempting to run the testsuite:
>
> [debian:/usr/src/git/mdadm]# ./test
> Testing on linux-6.1.0-rc3 kernel
> /usr/src/git/mdadm/tests/00createnames... FAILED - see
> /var/tmp/00createnames.log and /var/tmp/fail00createnames.log for details
>
> [debian:/usr/src/git/mdadm]# cat /var/tmp/00createnames.log
> [...]
> ++ /usr/src/git/mdadm/mdadm --zero /dev/loop0
> mdadm: Unrecognised md component device - /dev/loop0
> [...]
> mdadm: unexpected failure opening /dev/md127
>
> [debian:/usr/src/git/mdadm]# cat /var/tmp/fail00createnames.log
> [...]
> ## debian.vm: no array assembled!

I guess the problem is that the loop module was not loaded, but I don't
have experience with Debian.

Thanks,
Guoqing
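(A sketch of the check Guoqing suggests above -- hypothetical code, not
what __md_stop() actually did at the time; the names follow the md driver
of this era:)

    /*
     * Hypothetical ordering in __md_stop(): wait until super_written()
     * has completed every in-flight superblock write (those bios are
     * allocated from mddev->sync_set and accounted in
     * mddev->pending_writes) before destroying the bio sets.
     */
    mddev_detach(mddev);
    /* ... */
    wait_event(mddev->sb_wait,
               atomic_read(&mddev->pending_writes) == 0);

    bioset_exit(&mddev->bio_set);
    bioset_exit(&mddev->sync_set);

(Note that with the ordering super_written() had at the time, this wait
alone would not close the race: pending_writes is decremented, and the
waiter woken, before bio_put() runs, which is exactly the window Mikulas
identified.)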
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Heming Zhao @ 2022-11-03 14:46 UTC
  To: Guoqing Jiang, Mikulas Patocka, Song Liu
  Cc: Zdenek Kabelac, linux-raid, dm-devel

On 11/3/22 11:47 AM, Guoqing Jiang wrote:
> Hi,
>
> On 11/3/22 12:27 AM, Mikulas Patocka wrote:
>> Hi
>>
>> There's a crash in the test shell/lvchange-rebuild-raid.sh when running
>> the lvm testsuite. It can be reproduced by running "make check_local
>> T=shell/lvchange-rebuild-raid.sh" in a loop.
>
> I have a problem running the command (not sure what I missed); it would
> be better if the relevant commands were extracted from the script, then
> I could reproduce the crash with those commands directly.
>
> [root@localhost lvm2]# git log | head -1
> commit 36a923926c2c27c1a8a5ac262387d2a4d3e620f8
> [root@localhost lvm2]# make check_local T=shell/lvchange-rebuild-raid.sh
> make -C libdm device-mapper
> [...]
> running 1 tests
> ### running: [ndev-vanilla] shell/lvchange-rebuild-raid.sh      0
> | [ 0:00] lib/inittest: line 133: /tmp/LVMTEST317948.iCoLwmDhZW/dev/testnull: Permission denied
> | [ 0:00] Filesystem does support devices in /tmp/LVMTEST317948.iCoLwmDhZW/dev (mounted with nodev?)

I didn't read the other mails in this thread, only for the above issue.
If you use openSUSE, the systemd service tmp.mount uses the nodev option
to mount tmpfs on /tmp.
From my experience, there are two methods to fix (work around) it:
1. systemctl disable tmp.mount && systemctl mask tmp.mount && reboot
2. mv /usr/lib/systemd/system/tmp.mount /root/ && reboot

Thanks,
Heming
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Guoqing Jiang @ 2022-11-04  1:23 UTC
  To: Heming Zhao, Mikulas Patocka, Song Liu
  Cc: Zdenek Kabelac, linux-raid, dm-devel

On 11/3/22 10:46 PM, Heming Zhao wrote:
> On 11/3/22 11:47 AM, Guoqing Jiang wrote:
>> Hi,
>>
>> On 11/3/22 12:27 AM, Mikulas Patocka wrote:
>>> Hi
>>>
>>> There's a crash in the test shell/lvchange-rebuild-raid.sh when running
>>> the lvm testsuite. It can be reproduced by running "make check_local
>>> T=shell/lvchange-rebuild-raid.sh" in a loop.
>>
>> I have a problem running the command (not sure what I missed); it would
>> be better if the relevant commands were extracted from the script, then
>> I could reproduce the crash with those commands directly.
>>
>> [...]
>> | [ 0:00] lib/inittest: line 133: /tmp/LVMTEST317948.iCoLwmDhZW/dev/testnull: Permission denied
>> | [ 0:00] Filesystem does support devices in /tmp/LVMTEST317948.iCoLwmDhZW/dev (mounted with nodev?)
>
> I didn't read the other mails in this thread, only for the above issue.
> If you use openSUSE, the systemd service tmp.mount uses the nodev option
> to mount tmpfs on /tmp.
> From my experience, there are two methods to fix (work around) it:
> 1. systemctl disable tmp.mount && systemctl mask tmp.mount && reboot
> 2. mv /usr/lib/systemd/system/tmp.mount /root/ && reboot

I am using a centos-like system; I can try Leap later. Appreciate the
tips, Heming.

Thanks,
Guoqing
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Zdenek Kabelac @ 2022-11-04 11:10 UTC
  To: Guoqing Jiang, Heming Zhao, Mikulas Patocka, Song Liu
  Cc: linux-raid, dm-devel

On 04. 11. 22 at 2:23, Guoqing Jiang wrote:
>
> On 11/3/22 10:46 PM, Heming Zhao wrote:
>> On 11/3/22 11:47 AM, Guoqing Jiang wrote:
>>> [...]
>>
>> I didn't read the other mails in this thread, only for the above issue.
>> If you use openSUSE, the systemd service tmp.mount uses the nodev option
>> to mount tmpfs on /tmp.
>> From my experience, there are two methods to fix (work around) it:
>> 1. systemctl disable tmp.mount && systemctl mask tmp.mount && reboot
>> 2. mv /usr/lib/systemd/system/tmp.mount /root/ && reboot
>
> I am using a centos-like system; I can try Leap later. Appreciate the
> tips, Heming.

You can always redirect the default /tmp dir to some other
place/filesystem that allows you to create /dev nodes. Eventually, for
'brave men', you can let the lvm2 test suite play directly with your /dev
dir. Normally nothing bad should happen, but we tend to prefer a more
controlled '/dev' managed for a test.

Here are two envvars to play with:

make check_local T=shell/lvchange-rebuild-raid.sh LVM_TEST_DIR=/myhomefsdir LVM_TEST_DEVDIR=/dev

LVM_TEST_DIR     sets the dir where the test creates all its files
LVM_TEST_DEVDIR  explicitly tells the test to keep using the system's
                 /dev (instead of a dir created within the tmpdir)

Regards
Zdenek
* Re: [dm-devel] A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Xiao Ni @ 2022-11-04 15:18 UTC
  To: Zdenek Kabelac
  Cc: Guoqing Jiang, Heming Zhao, Mikulas Patocka, Song Liu, linux-raid, dm-devel

On Fri, Nov 4, 2022 at 7:10 PM Zdenek Kabelac <zkabelac@redhat.com> wrote:
>
> [...]
>
> Here are two envvars to play with:
>
> make check_local T=shell/lvchange-rebuild-raid.sh LVM_TEST_DIR=/myhomefsdir
> LVM_TEST_DEVDIR=/dev
>
> LVM_TEST_DIR     sets the dir where the test creates all its files
>
> LVM_TEST_DEVDIR  explicitly tells the test to keep using the system's
>                  /dev (instead of a dir created within the tmpdir)

Hi Zdenek

I tried this command and the test was skipped. Does the command need any
more options?

make check_local T=shell/lvchange-rebuild-raid.sh LVM_TEST_DIR=/root/test LVM_TEST_DEVDIR=/dev
VERBOSE=0 ./lib/runner \
    --testdir . --outdir results \
    --flavours ndev-vanilla --only shell/lvchange-rebuild-raid.sh --skip @
running 1 tests
### skipped: [ndev-vanilla] shell/lvchange-rebuild-raid.sh      0

### 1 tests: 0 passed, 1 skipped, 0 timed out, 0 warned, 0 failed

Regards
Xiao
* Re: A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
  From: Guoqing Jiang @ 2022-11-07  1:52 UTC
  To: Zdenek Kabelac, Heming Zhao, Mikulas Patocka, Song Liu
  Cc: linux-raid, dm-devel

On 11/4/22 7:10 PM, Zdenek Kabelac wrote:
> On 04. 11. 22 at 2:23, Guoqing Jiang wrote:
>> [...]
>
> You can always redirect the default /tmp dir to some other
> place/filesystem that allows you to create /dev nodes. Eventually, for
> 'brave men', you can let the lvm2 test suite play directly with your /dev
> dir. Normally nothing bad should happen, but we tend to prefer a more
> controlled '/dev' managed for a test.
>
> Here are two envvars to play with:
>
> make check_local T=shell/lvchange-rebuild-raid.sh LVM_TEST_DIR=/myhomefsdir LVM_TEST_DEVDIR=/dev
>
> LVM_TEST_DIR     sets the dir where the test creates all its files
> LVM_TEST_DEVDIR  explicitly tells the test to keep using the system's
>                  /dev (instead of a dir created within the tmpdir)

Thank you! This works for me.

[root@localhost lvm2]# make check_local T=shell/lvchange-rebuild-raid.sh LVM_TEST_DIR=/root/test LVM_TEST_DEVDIR=/dev
make -C libdm device-mapper
make[1]: Nothing to be done for 'device-mapper'.
[...]
make -C test check_local
VERBOSE=0 ./lib/runner \
    --testdir . --outdir results \
    --flavours ndev-vanilla --only shell/lvchange-rebuild-raid.sh --skip @
running 1 tests
### passed: [ndev-vanilla] shell/lvchange-rebuild-raid.sh      20

### 1 tests: 1 passed, 0 skipped, 0 timed out, 0 warned, 0 failed

Thanks,
Guoqing