* [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
@ 2024-06-04 7:25 Shinichiro Kawasaki
2024-06-04 9:26 ` Zhu Yanjun
0 siblings, 1 reply; 9+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-04 7:25 UTC (permalink / raw)
To: linux-rdma@vger.kernel.org; +Cc: Bart Van Assche, Zhu Yanjun
As I noted in another thread [1], a KASAN slab-use-after-free is observed when
I repeat the blktests test case srp/002 with the siw driver [2]. The kernel
version was v6.10-rc2. The failure reproduces reliably when the test case is
repeated around 30 times. It was not observed with the rxe driver.

I think this failure is the same as the one I reported in Jun/2023 [3]. The
Call Trace reported is quite similar. Also, I confirmed that the trial fix
patch that I created in Jun/2023 avoids the KASAN failure at srp/002.

In Jun/2023, the KASAN failure was observed with the test cases nvme/030 and
nvme/031, but the symptom disappeared in Sep/2023 [4]. I guess the failure has
become observable again with srp/002.

As for the root cause, it was advised that "There is something wrong with the
iwarp cm if it is destroying IDs in handlers" [5]. A fix would be appreciated.
I'm willing to test fix patches.

[1] https://lore.kernel.org/linux-block/n2adhqzr6x5fss6jff7pxhubkkalvxeyesmg7jre4uomfcdudb@dwn3wgkqhmj7/
[2]
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000006d1c31fe with status 5
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000916ce050 with status 5
Jun 04 09:23:11 testnode2 kernel: ==================================================================
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001770ef1b with status 5
Jun 04 09:23:11 testnode2 kernel: BUG: KASAN: slab-use-after-free in __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: Read of size 8 at addr ffff888131a3e418 by task kworker/u16:6/1345
Jun 04 09:23:11 testnode2 kernel:
Jun 04 09:23:11 testnode2 kernel: CPU: 1 PID: 1345 Comm: kworker/u16:6 Not tainted 6.10.0-rc2+ #288
Jun 04 09:23:11 testnode2 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Jun 04 09:23:11 testnode2 kernel: Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Jun 04 09:23:11 testnode2 kernel: Call Trace:
Jun 04 09:23:11 testnode2 kernel: <TASK>
Jun 04 09:23:11 testnode2 kernel: dump_stack_lvl+0x6a/0x90
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000f727e5c2 with status 5
Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: print_report+0x174/0x505
Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: ? __virt_addr_valid+0x1b9/0x400
Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: kasan_report+0xa7/0x180
Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: __mutex_lock+0x1110/0x13c0
Jun 04 09:23:11 testnode2 kernel: ? cma_iw_handler+0xac/0x500 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: ? __lock_acquire+0x139d/0x5d60
Jun 04 09:23:11 testnode2 kernel: ? __pfx___mutex_lock+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ? mark_lock+0xf5/0x1580
Jun 04 09:23:11 testnode2 kernel: ? __pfx_mark_lock+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000009bc71497 with status 5
Jun 04 09:23:11 testnode2 kernel: ? cma_iw_handler+0xac/0x500 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 0000000041c0fa4b with status 5
Jun 04 09:23:11 testnode2 kernel: cma_iw_handler+0xac/0x500 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: ? __pfx_cma_iw_handler+0x10/0x10 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: ? mark_held_locks+0x94/0xe0
Jun 04 09:23:11 testnode2 kernel: ? _raw_spin_unlock_irqrestore+0x4c/0x60
Jun 04 09:23:11 testnode2 kernel: cm_work_handler+0xb54/0x1c50 [iw_cm]
Jun 04 09:23:11 testnode2 kernel: ? __pfx_cm_work_handler+0x10/0x10 [iw_cm]
Jun 04 09:23:11 testnode2 kernel: ? __pfx_lock_release+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000f48094cb with status 5
Jun 04 09:23:11 testnode2 kernel: process_one_work+0x865/0x1410
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001c3faa8a with status 5
Jun 04 09:23:11 testnode2 kernel: ? __pfx_lock_acquire+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ? __pfx_process_one_work+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ? assign_work+0x16c/0x240
Jun 04 09:23:11 testnode2 kernel: ? lock_is_held_type+0xd5/0x130
Jun 04 09:23:11 testnode2 kernel: worker_thread+0x5e2/0x1010
Jun 04 09:23:11 testnode2 kernel: ? __pfx_worker_thread+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: kthread+0x2d1/0x3a0
Jun 04 09:23:11 testnode2 kernel: ? _raw_spin_unlock_irq+0x24/0x50
Jun 04 09:23:11 testnode2 kernel: ? __pfx_kthread+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ret_from_fork+0x30/0x70
Jun 04 09:23:11 testnode2 kernel: ? __pfx_kthread+0x10/0x10
Jun 04 09:23:11 testnode2 kernel: ret_from_fork_asm+0x1a/0x30
Jun 04 09:23:11 testnode2 kernel: </TASK>
Jun 04 09:23:11 testnode2 kernel:
Jun 04 09:23:11 testnode2 kernel: Allocated by task 75327:
Jun 04 09:23:11 testnode2 kernel: kasan_save_stack+0x2c/0x50
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001bd9ea09 with status 5
Jun 04 09:23:11 testnode2 kernel: kasan_save_track+0x10/0x30
Jun 04 09:23:11 testnode2 kernel: __kasan_kmalloc+0xa6/0xb0
Jun 04 09:23:11 testnode2 kernel: __rdma_create_id+0x5b/0x5d0 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: __rdma_create_kernel_id+0x12/0x40 [rdma_cm]
Jun 04 09:23:11 testnode2 kernel: srp_new_rdma_cm_id+0x7c/0x200 [ib_srp]
Jun 04 09:23:11 testnode2 kernel: add_target_store+0x135e/0x29f0 [ib_srp]
Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000005afc8065 with status 5
Jun 04 09:23:11 testnode2 kernel: kernfs_fop_write_iter+0x3a4/0x5a0
Jun 04 09:23:11 testnode2 kernel: vfs_write+0x5e3/0xe70
Jun 04 09:23:11 testnode2 kernel: ksys_write+0xf7/0x1d0
Jun 04 09:23:11 testnode2 kernel: do_syscall_64+0x93/0x180
Jun 04 09:23:11 testnode2 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 04 09:23:11 testnode2 kernel:
Jun 04 09:23:11 testnode2 kernel: Freed by task 66344:
Jun 04 09:23:11 testnode2 kernel: kasan_save_stack+0x2c/0x50
Jun 04 09:23:11 testnode2 kernel: kasan_save_track+0x10/0x30
Jun 04 09:23:11 testnode2 kernel: kasan_save_free_info+0x37/0x60
Jun 04 09:23:11 testnode2 kernel: poison_slab_object+0x109/0x180
Jun 04 09:23:11 testnode2 kernel: __kasan_slab_free+0x2e/0x50
Jun 04 09:23:11 testnode2 kernel: kfree+0x11a/0x390
Jun 04 09:23:11 testnode2 kernel: srp_free_ch_ib+0x895/0xc80 [ib_srp]
Jun 04 09:23:11 testnode2 kernel: srp_remove_work+0x309/0x6c0 [ib_srp]
Jun 04 09:23:11 testnode2 kernel: process_one_work+0x865/0x1410
Jun 04 09:23:11 testnode2 kernel: worker_thread+0x5e2/0x1010
Jun 04 09:23:11 testnode2 kernel: kthread+0x2d1/0x3a0
Jun 04 09:23:11 testnode2 kernel: ret_from_fork+0x30/0x70
Jun 04 09:23:11 testnode2 kernel: ret_from_fork_asm+0x1a/0x30
Jun 04 09:23:11 testnode2 kernel:
Jun 04 09:23:11 testnode2 kernel: The buggy address belongs to the object at ffff888131a3e000
which belongs to the cache kmalloc-2k of size 2048
Jun 04 09:23:11 testnode2 kernel: The buggy address is located 1048 bytes inside of
freed 2048-byte region [ffff888131a3e000, ffff888131a3e800)
Jun 04 09:23:11 testnode2 kernel:
Jun 04 09:23:11 testnode2 kernel: The buggy address belongs to the physical page:
Jun 04 09:23:11 testnode2 kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888131a38000 pfn:0x131a38
Jun 04 09:23:11 testnode2 kernel: head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
Jun 04 09:23:11 testnode2 kernel: flags: 0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
Jun 04 09:23:11 testnode2 kernel: page_type: 0xffffefff(slab)
Jun 04 09:23:11 testnode2 kernel: raw: 0017ffffc0000240 ffff888100042f00 ffffea0004c89610 ffffea0004a3c010
Jun 04 09:23:11 testnode2 kernel: raw: ffff888131a38000 0000000000080006 00000001ffffefff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: head: 0017ffffc0000240 ffff888100042f00 ffffea0004c89610 ffffea0004a3c010
Jun 04 09:23:11 testnode2 kernel: head: ffff888131a38000 0000000000080006 00000001ffffefff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: head: 0017ffffc0000003 ffffea0004c68e01 ffffffffffffffff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
Jun 04 09:23:11 testnode2 kernel: page dumped because: kasan: bad access detected
Jun 04 09:23:11 testnode2 kernel:
Jun 04 09:23:11 testnode2 kernel: Memory state around the buggy address:
Jun 04 09:23:11 testnode2 kernel: ffff888131a3e300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: ffff888131a3e380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: >ffff888131a3e400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: ^
Jun 04 09:23:11 testnode2 kernel: ffff888131a3e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: ffff888131a3e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jun 04 09:23:11 testnode2 kernel: ==================================================================
Jun 04 09:23:11 testnode2 kernel: Disabling lock debugging due to kernel taint
Jun 04 09:23:11 testnode2 kernel: device-mapper: multipath: 253:2: Failing path 8:80.
Jun 04 09:23:11 testnode2 kernel: device-mapper: uevent: dm_send_uevents: skipping sending uevent for lost device
...
[3] https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/
[4] https://lore.kernel.org/linux-rdma/g2lh3wh6e6yossw2ktqmxx2rf63m36mumqmx4qbtzvxuygsr6h@gpgftgfigllv/
[5] https://lore.kernel.org/linux-rdma/ZIn6ul5jPuxC+uIG@ziepe.ca/
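
The address arithmetic in the KASAN report above can be checked by hand. The
sketch below is only an illustration in userspace C; the helper names
kasan_offset() and in_object() are hypothetical and are not kernel or KASAN
APIs. It takes the faulting address (ffff888131a3e418) and the object base
(ffff888131a3e000) from the log and confirms the "1048 bytes inside of freed
2048-byte region" statement. Since the object was allocated in
__rdma_create_id() and the read happens in __mutex_lock() called from
cma_iw_handler(), offset 1048 is presumably a mutex embedded in the freed
struct rdma_id_private, but that is an inference from the trace, not
something the log states.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the faulting address inside its slab object (hypothetical helper). */
static uint64_t kasan_offset(uint64_t access_addr, uint64_t object_base)
{
	assert(access_addr >= object_base);
	return access_addr - object_base;
}

/* True if addr lies in the half-open region [base, base + size). */
static int in_object(uint64_t addr, uint64_t base, uint64_t size)
{
	return addr >= base && addr - base < size;
}
```

Calling kasan_offset(0xffff888131a3e418ULL, 0xffff888131a3e000ULL) yields
0x418, which is 1048 in decimal and matches the report.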
* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-04 7:25 [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver Shinichiro Kawasaki
@ 2024-06-04 9:26 ` Zhu Yanjun
2024-06-04 20:15 ` Bart Van Assche
2024-06-05 7:39 ` Shinichiro Kawasaki
0 siblings, 2 replies; 9+ messages in thread
From: Zhu Yanjun @ 2024-06-04 9:26 UTC (permalink / raw)
To: Shinichiro Kawasaki, linux-rdma@vger.kernel.org; +Cc: Bart Van Assche

On 04.06.24 09:25, Shinichiro Kawasaki wrote:
> As I noted in another thread [1], KASAN slab-use-after-free is observed when
> I repeat the blktests test case srp/002 with the siw driver [2]. The kernel
> version was v6.10-rc2. The failure is recreated in stable manner when the test
> case is repeated around 30 times. It was not observed with the rxe driver.
>
> I think this failure is same as that I reported in Jun/2023 [3]. The Call Trace
> reported is quite similar. Also, I confirmed that the trial fix patch that I
> created in Jun/2023 avoided the KASAN failure at srp/002.

"the trial fix patch that I created in Jun/2023" that you mentioned is the
commit in the link?

https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/

Thanks,
Zhu Yanjun

>
> In Jun/2023, the KASAN failure was observed with the test cases nvme/030 and
> nvme/031. But the symptom disappeared in Sep/2023 [4]. I guess the failure has
> got observable again with srp/002.
>
> As for the root cause, it was advised that "There is something wrong with the
> iwarp cm if it is destroying IDs in handlers" [5]. Actions for fix will be
> appreciated. I'm willing to test fix patches.
>
> [...]

--
Best Regards,
Yanjun.Zhu

* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-04 9:26 ` Zhu Yanjun
@ 2024-06-04 20:15 ` Bart Van Assche
2024-06-04 20:22 ` Jason Gunthorpe
` (2 more replies)
2024-06-05 7:39 ` Shinichiro Kawasaki
1 sibling, 3 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-06-04 20:15 UTC (permalink / raw)
To: Zhu Yanjun, Shinichiro Kawasaki, linux-rdma@vger.kernel.org,
Jason Gunthorpe, Leon Romanovsky

On 6/4/24 03:26, Zhu Yanjun wrote:
>
> On 04.06.24 09:25, Shinichiro Kawasaki wrote:
>> As I noted in another thread [1], KASAN slab-use-after-free is
>> observed when
>> I repeat the blktests test case srp/002 with the siw driver [2]. The
>> kernel
>> version was v6.10-rc2. The failure is recreated in stable manner when
>> the test
>> case is repeated around 30 times. It was not observed with the rxe
>> driver.
>>
>> I think this failure is same as that I reported in Jun/2023 [3]. The
>> Call Trace
>> reported is quite similar. Also, I confirmed that the trial fix patch
>> that I
>> created in Jun/2023 avoided the KASAN failure at srp/002.
>
> "the trial fix patch that I created in Jun/2023" that you mentioned is
> the commit in the link?
>
> https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/

To me that patch doesn't seem correct. Jason and Leon, is my understanding
correct that you are the maintainers for the iwcm code? Can you please help
with reviewing this patch?

Thanks,

Bart.

From 879ca4e5f9ab8c4ce522b4edc144a3938a2f4afb Mon Sep 17 00:00:00 2001
From: Bart Van Assche <bvanassche@acm.org>
Date: Tue, 4 Jun 2024 12:49:44 -0700
Subject: [PATCH] RDMA/iwcm: Fix a use-after-free related to destroying CM IDs

iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
an existing struct iw_cm_id (cm_id) as follows:

        conn_id->cm_id.iw = cm_id;
        cm_id->context = conn_id;
        cm_id->cm_handler = cma_iw_handler;

rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
sure that cm_work_handler() does not trigger a use-after-free by delaying
freeing of the struct rdma_id_private until all pending work has finished.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/core/iwcm.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index d608952c6e8e..ea9dc26bf563 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -368,8 +368,10 @@ EXPORT_SYMBOL(iw_cm_disconnect);
  *
  * Clean up all resources associated with the connection and release
  * the initial reference taken by iw_create_cm_id.
+ *
+ * Returns true if and only if the last cm_id_priv reference has been dropped.
  */
-static void destroy_cm_id(struct iw_cm_id *cm_id)
+static bool destroy_cm_id(struct iw_cm_id *cm_id)
 {
 	struct iwcm_id_private *cm_id_priv;
 	struct ib_qp *qp;
@@ -439,7 +441,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
 		iwpm_remove_mapping(&cm_id->local_addr, RDMA_NL_IWCM);
 	}
 
-	(void)iwcm_deref_id(cm_id_priv);
+	return iwcm_deref_id(cm_id_priv);
 }
 
 /*
@@ -450,7 +452,8 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
  */
 void iw_destroy_cm_id(struct iw_cm_id *cm_id)
 {
-	destroy_cm_id(cm_id);
+	if (!destroy_cm_id(cm_id))
+		flush_workqueue(iwcm_wq);
 }
 EXPORT_SYMBOL(iw_destroy_cm_id);
 
@@ -1031,7 +1034,7 @@ static void cm_work_handler(struct work_struct *_work)
 		if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
 			ret = process_event(cm_id_priv, &levent);
 			if (ret)
-				destroy_cm_id(&cm_id_priv->id);
+				WARN_ON_ONCE(destroy_cm_id(&cm_id_priv->id));
 		} else
 			pr_debug("dropping event %d\n", levent.event);
 		if (iwcm_deref_id(cm_id_priv))

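
The idea behind the patch in this message (flush pending work before the
caller frees the context when the destroy path did not drop the last
reference) can be sketched in a small single-threaded userspace model. This
is not the kernel code and does not reproduce the real race: struct cm_id,
queue_event(), flush_work(), destroy_id() and simulate() are hypothetical
names, the "workqueue" is a plain array, and a use-after-free is only counted
through a flag instead of actually touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 8

/* Toy stand-in for a CM ID whose context is owned and freed by the caller. */
struct cm_id {
	int refcount;    /* initial reference plus one per queued work item */
	bool ctx_alive;  /* false once the caller has "freed" the context */
	int uaf_hits;    /* handler runs that saw a dead context */
};

static struct cm_id *queue[MAX_WORK];
static size_t q_head, q_tail;

/* Drop one reference; returns true iff it was the last one. */
static bool deref_id(struct cm_id *id)
{
	return --id->refcount == 0;
}

/* Queue an event; every queued work item holds a reference. */
static void queue_event(struct cm_id *id)
{
	assert(q_tail < MAX_WORK);
	id->refcount++;
	queue[q_tail++] = id;
}

/* Run everything that is pending, in the spirit of flush_workqueue(). */
static void flush_work(void)
{
	while (q_head < q_tail) {
		struct cm_id *id = queue[q_head++];

		if (!id->ctx_alive)
			id->uaf_hits++;  /* the handler would touch freed memory */
		(void)deref_id(id);
	}
}

/* Drop the initial reference and report whether it was the last one. */
static bool destroy_id(struct cm_id *id)
{
	return deref_id(id);
}

/* Returns how often a handler ran after the context was released. */
static int simulate(bool flush_before_free)
{
	struct cm_id id = { .refcount = 1, .ctx_alive = true, .uaf_hits = 0 };

	q_head = q_tail = 0;
	queue_event(&id);                /* an event is still pending */
	if (!destroy_id(&id) && flush_before_free)
		flush_work();            /* patched behaviour: drain first */
	id.ctx_alive = false;            /* caller now frees the context */
	flush_work();                    /* leftover work runs afterwards */
	return id.uaf_hits;
}
```

In this model simulate(false) reports one stale handler run and
simulate(true) reports none, which mirrors why the flush is only needed when
the destroy call did not drop the last reference.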
* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-04 20:15 ` Bart Van Assche
@ 2024-06-04 20:22 ` Jason Gunthorpe
2024-06-04 20:25 ` Bart Van Assche
2024-06-05 7:36 ` Shinichiro Kawasaki
2024-06-05 10:42 ` Zhu Yanjun
2 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 20:22 UTC (permalink / raw)
To: Bart Van Assche
Cc: Zhu Yanjun, Shinichiro Kawasaki, linux-rdma@vger.kernel.org,
Leon Romanovsky

On Tue, Jun 04, 2024 at 02:15:44PM -0600, Bart Van Assche wrote:
> On 6/4/24 03:26, Zhu Yanjun wrote:
> >
> > On 04.06.24 09:25, Shinichiro Kawasaki wrote:
> > > As I noted in another thread [1], KASAN slab-use-after-free is
> > > observed when
> > > I repeat the blktests test case srp/002 with the siw driver [2]. The
> > > kernel
> > > version was v6.10-rc2. The failure is recreated in stable manner
> > > when the test
> > > case is repeated around 30 times. It was not observed with the rxe
> > > driver.
> > >
> > > I think this failure is same as that I reported in Jun/2023 [3]. The
> > > Call Trace
> > > reported is quite similar. Also, I confirmed that the trial fix
> > > patch that I
> > > created in Jun/2023 avoided the KASAN failure at srp/002.
> >
> > "the trial fix patch that I created in Jun/2023" that you mentioned is
> > the commit in the link?
> >
> > https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/
>
> To me that patch doesn't seem correct. Jason and Leon, is my understanding
> correct that you are the maintainers for the iwcm code? Can you please help
> with reviewing this patch?
>
> Thanks,
>
> Bart.
>
> From 879ca4e5f9ab8c4ce522b4edc144a3938a2f4afb Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bvanassche@acm.org>
> Date: Tue, 4 Jun 2024 12:49:44 -0700
> Subject: [PATCH] RDMA/iwcm: Fix a use-after-free related to destroying CM IDs
>
> iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
> an existing struct iw_cm_id (cm_id) as follows:
>
>         conn_id->cm_id.iw = cm_id;
>         cm_id->context = conn_id;
>         cm_id->cm_handler = cma_iw_handler;
>
> rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
> sure that cm_work_handler() does not trigger a use-after-free by delaying
> freeing of the struct rdma_id_private until all pending work has finished.

I didn't try to look in detail but this certainly makes more sense to
me as a possible solution to a UAF.

Presumably destroy_cm_id() does something to prevent new work from
being scheduled?

Jason

* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-04 20:22 ` Jason Gunthorpe
@ 2024-06-04 20:25 ` Bart Van Assche
0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-06-04 20:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Zhu Yanjun, Shinichiro Kawasaki, linux-rdma@vger.kernel.org,
Leon Romanovsky

On 6/4/24 14:22, Jason Gunthorpe wrote:
>> From 879ca4e5f9ab8c4ce522b4edc144a3938a2f4afb Mon Sep 17 00:00:00 2001
>> From: Bart Van Assche <bvanassche@acm.org>
>> Date: Tue, 4 Jun 2024 12:49:44 -0700
>> Subject: [PATCH] RDMA/iwcm: Fix a use-after-free related to destroying CM IDs
>>
>> iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
>> an existing struct iw_cm_id (cm_id) as follows:
>>
>>         conn_id->cm_id.iw = cm_id;
>>         cm_id->context = conn_id;
>>         cm_id->cm_handler = cma_iw_handler;
>>
>> rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
>> sure that cm_work_handler() does not trigger a use-after-free by delaying
>> freeing of the struct rdma_id_private until all pending work has finished.
>
> I didn't try to look in detail but this certainly makes more sense to
> me as a possible solution to a UAF
>
> Presumably destroy_cm_id() does something to prevent new work from
> being scheduled?

Yes, it removes the iWARP CM ID from all the data structures that are
consulted when an incoming CM packet arrives.

Thanks,

Bart.

* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-04 20:15 ` Bart Van Assche
2024-06-04 20:22 ` Jason Gunthorpe
@ 2024-06-05 7:36 ` Shinichiro Kawasaki
2024-06-05 10:42 ` Zhu Yanjun
2 siblings, 0 replies; 9+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-05 7:36 UTC (permalink / raw)
To: Bart Van Assche
Cc: Zhu Yanjun, linux-rdma@vger.kernel.org, Jason Gunthorpe,
Leon Romanovsky

On Jun 04, 2024 / 14:15, Bart Van Assche wrote:

[...]

> From 879ca4e5f9ab8c4ce522b4edc144a3938a2f4afb Mon Sep 17 00:00:00 2001
> From: Bart Van Assche <bvanassche@acm.org>
> Date: Tue, 4 Jun 2024 12:49:44 -0700
> Subject: [PATCH] RDMA/iwcm: Fix a use-after-free related to destroying CM IDs
>
> iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with
> an existing struct iw_cm_id (cm_id) as follows:
>
>         conn_id->cm_id.iw = cm_id;
>         cm_id->context = conn_id;
>         cm_id->cm_handler = cma_iw_handler;
>
> rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make
> sure that cm_work_handler() does not trigger a use-after-free by delaying
> freeing of the struct rdma_id_private until all pending work has finished.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Thank you Bart, I applied this patch on top of the kernel v6.10-rc2, and the
KASAN slab-use-after-free disappeared. I repeated the test case 100 times and
did not see the failure. I also ran the whole blktests suite with my test
setup and saw no regression.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver 2024-06-04 20:15 ` Bart Van Assche 2024-06-04 20:22 ` Jason Gunthorpe 2024-06-05 7:36 ` Shinichiro Kawasaki @ 2024-06-05 10:42 ` Zhu Yanjun 2024-06-05 13:07 ` Bart Van Assche 2 siblings, 1 reply; 9+ messages in thread From: Zhu Yanjun @ 2024-06-05 10:42 UTC (permalink / raw) To: Bart Van Assche, Shinichiro Kawasaki, linux-rdma@vger.kernel.org, Jason Gunthorpe, Leon Romanovsky On 04.06.24 22:15, Bart Van Assche wrote: > On 6/4/24 03:26, Zhu Yanjun wrote: >> >> On 04.06.24 09:25, Shinichiro Kawasaki wrote: >>> As I noted in another thread [1], KASAN slab-use-after-free is >>> observed when >>> I repeat the blktests test case srp/002 with the siw driver [2]. The >>> kernel >>> version was v6.10-rc2. The failure is recreated in stable manner when >>> the test >>> case is repeated around 30 times. It was not observed with the rxe >>> driver. >>> >>> I think this failure is same as that I reported in Jun/2023 [3]. The >>> Call Trace >>> reported is quite similar. Also, I confirmed that the trial fix patch >>> that I >>> created in Jun/2023 avoided the KASAN failure at srp/002. >> >> "the trial fix patch that I created in Jun/2023" that you mentioned is >> the commit in the link? >> >> https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/ > > To me that patch doesn't seem correct. Jason and Leon, is my understanding > correct that you are the maintainers for the iwcm code? Can you please help > with reviewing this patch? > > Thanks, > > Bart. 
> > From 879ca4e5f9ab8c4ce522b4edc144a3938a2f4afb Mon Sep 17 00:00:00 2001 > From: Bart Van Assche <bvanassche@acm.org> > Date: Tue, 4 Jun 2024 12:49:44 -0700 > Subject: [PATCH] RDMA/iwcm: Fix a use-after-free related to destroying > CM IDs > > iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) > with > an existing struct iw_cm_id (cm_id) as follows: > > conn_id->cm_id.iw = cm_id; > cm_id->context = conn_id; > cm_id->cm_handler = cma_iw_handler; > > rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make > sure that cm_work_handler() does not trigger a use-after-free by delaing > freeing of the struct rdma_id_private until all pending work has finished. > > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/infiniband/core/iwcm.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/core/iwcm.c > b/drivers/infiniband/core/iwcm.c > index d608952c6e8e..ea9dc26bf563 100644 > --- a/drivers/infiniband/core/iwcm.c > +++ b/drivers/infiniband/core/iwcm.c > @@ -368,8 +368,10 @@ EXPORT_SYMBOL(iw_cm_disconnect); > * > * Clean up all resources associated with the connection and release > * the initial reference taken by iw_create_cm_id. > + * > + * Returns true if and only if the last cm_id_priv reference has been > dropped. > */ > -static void destroy_cm_id(struct iw_cm_id *cm_id) > +static bool destroy_cm_id(struct iw_cm_id *cm_id) Now the type of destroy_cm_id is changed from void to bool. > { > struct iwcm_id_private *cm_id_priv; > struct ib_qp *qp; > @@ -439,7 +441,7 @@ static void destroy_cm_id(struct iw_cm_id *cm_id) > iwpm_remove_mapping(&cm_id->local_addr, RDMA_NL_IWCM); > } > > - (void)iwcm_deref_id(cm_id_priv); > + return iwcm_deref_id(cm_id_priv); static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv) The type of iwcm_deref_id is int. Not sure if we should make iwcm_deref_id and destroy_cm_id have the different type or not. 
We can make them use one of the 2 types: int and bool.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

>  }
>
>  /*
> @@ -450,7 +452,8 @@ static void destroy_cm_id(struct iw_cm_id *cm_id)
>   */
>  void iw_destroy_cm_id(struct iw_cm_id *cm_id)
>  {
> -       destroy_cm_id(cm_id);
> +       if (!destroy_cm_id(cm_id))
> +               flush_workqueue(iwcm_wq);
>  }
>  EXPORT_SYMBOL(iw_destroy_cm_id);
>
> @@ -1031,7 +1034,7 @@ static void cm_work_handler(struct work_struct *_work)
>                 if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
>                         ret = process_event(cm_id_priv, &levent);
>                         if (ret)
> -                               destroy_cm_id(&cm_id_priv->id);
> +                               WARN_ON_ONCE(destroy_cm_id(&cm_id_priv->id));
>                 } else
>                         pr_debug("dropping event %d\n", levent.event);
>                 if (iwcm_deref_id(cm_id_priv))
>
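The control flow of the patch quoted above (drop the initial reference, and flush pending work if other references remain) can be modelled in plain userspace C. This is only a sketch under stated assumptions: model_id, model_work, model_queue_work, model_deref, model_flush and model_destroy are hypothetical names that do not exist in iwcm.c, the queue is a fixed array rather than a kernel workqueue, and there is no locking or concurrency. It is meant to show the shape of the logic, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in iwcm.c. */
struct model_id {
	int refcount;	/* references currently held on this ID */
	bool freed;	/* set when the last reference is dropped */
};

struct model_work {
	struct model_id *id;	/* each queued work item holds one reference */
};

#define MODEL_QUEUE_MAX 8
static struct model_work model_queue[MODEL_QUEUE_MAX];
static size_t model_queue_len;

/* Take a reference on the ID and queue one work item for it. */
bool model_queue_work(struct model_id *id)
{
	if (model_queue_len == MODEL_QUEUE_MAX)
		return false;
	id->refcount++;
	model_queue[model_queue_len++].id = id;
	return true;
}

/* Drop one reference; returns true only if it was the last one. */
bool model_deref(struct model_id *id)
{
	assert(id->refcount > 0);
	if (--id->refcount == 0) {
		id->freed = true;	/* a real implementation would free here */
		return true;
	}
	return false;
}

/* Run every queued work item; each item drops the reference it holds. */
void model_flush(void)
{
	for (size_t i = 0; i < model_queue_len; i++)
		(void)model_deref(model_queue[i].id);
	model_queue_len = 0;
}

/*
 * Same shape as the patched iw_destroy_cm_id(): drop the initial
 * reference and, if queued work still holds references, wait for it.
 */
void model_destroy(struct model_id *id)
{
	if (!model_deref(id))
		model_flush();
}
```

For example, starting from a reference count of 1 and queuing two work items gives a count of 3. model_destroy() then drops it to 2, sees that this was not the last reference, and flushes the queue, after which the count is 0 and the object is marked freed.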
* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-05 10:42 ` Zhu Yanjun
@ 2024-06-05 13:07 ` Bart Van Assche
0 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2024-06-05 13:07 UTC (permalink / raw)
To: Zhu Yanjun, Shinichiro Kawasaki, linux-rdma@vger.kernel.org, Jason Gunthorpe, Leon Romanovsky
On 6/5/24 04:42, Zhu Yanjun wrote:
> The type of iwcm_deref_id is int.
>
> Not sure if we should make iwcm_deref_id and destroy_cm_id have the different type or not. We can make them use one of the 2 types: int and bool.

Since iwcm_deref_id() either returns 0 or 1, I will change its return type
into bool.

> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Thanks!

Bart.
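As a small illustration of why bool suits a function that only reports whether the last reference was dropped, here is a userspace sketch using C11 atomics. struct counted and counted_put are hypothetical names; the kernel function discussed above is not reproduced here and its internals may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object. */
struct counted {
	atomic_int refcount;
};

/* Returns true if this call dropped the last reference, false otherwise. */
bool counted_put(struct counted *c)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(&c->refcount, 1) == 1;
}
```

Callers that previously tested an int result with a plain if (...) keep working unchanged, since true and false convert to 1 and 0.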
* Re: [bug report] KASAN slab-use-after-free at blktests srp/002 with siw driver
2024-06-04 9:26 ` Zhu Yanjun
2024-06-04 20:15 ` Bart Van Assche
@ 2024-06-05 7:39 ` Shinichiro Kawasaki
1 sibling, 0 replies; 9+ messages in thread
From: Shinichiro Kawasaki @ 2024-06-05 7:39 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: linux-rdma@vger.kernel.org, Bart Van Assche
On Jun 04, 2024 / 11:26, Zhu Yanjun wrote:
>
> On 04.06.24 09:25, Shinichiro Kawasaki wrote:
> > As I noted in another thread [1], KASAN slab-use-after-free is observed when
> > I repeat the blktests test case srp/002 with the siw driver [2]. The kernel
> > version was v6.10-rc2. The failure is recreated in stable manner when the test
> > case is repeated around 30 times. It was not observed with the rxe driver.
> >
> > I think this failure is same as that I reported in Jun/2023 [3]. The Call Trace
> > reported is quite similar. Also, I confirmed that the trial fix patch that I
> > created in Jun/2023 avoided the KASAN failure at srp/002.
>
> "the trial fix patch that I created in Jun/2023" that you mentioned is the
> commit in the link?
>
> https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@wdc.com/

Yes, but I understand that it is not a good fix. I used the patch just to check
if the KASAN observed now is the same issue as the KASAN observed in Jun/2023.