public inbox for linux-rdma@vger.kernel.org
* blktests failures with v6.12-rc1 kernel
@ 2024-10-03  8:02 Shinichiro Kawasaki
  2024-10-03 20:56 ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Shinichiro Kawasaki @ 2024-10-03  8:02 UTC (permalink / raw)
  To: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

Hi all,

I ran the latest blktests (git hash: 80430afc5589) with the v6.12-rc1 kernel,
and I observed three failure symptoms listed below.

Comparing with the previous report for the v6.11 kernel [1]:

- the v6.12 kernel has one new failure symptom, #3 in the srp test group, and
- the v6.12 kernel has one failure fewer: the failure of the test case
  scsi/008, which has been addressed on the kernel side.

[1] https://lore.kernel.org/linux-block/3aydm6iazrkdxb4d5yb3tc7fjqax6nvukrn3tpvzjcom6woc5g@qbai6zlvsrbs/


List of failures
================
#1: nvme/014 (tcp transport)
#2: nvme/041 (fc transport)
#3: srp/001,002,011,012,013,014,016


Failure description
===================
#1: nvme/014 (tcp transport)

   With the trtype=tcp configuration, nvme/014 fails occasionally with the
   kernel message "DEBUG_LOCKS_WARN_ON(lock->magic != lock)". It is rare;
   around 200 repetitions are required to recreate the failure. A fix patch
   candidate was posted [2].

   [2] https://lore.kernel.org/linux-nvme/20241002045141.1975881-1-shinichiro.kawasaki@wdc.com/

#2: nvme/041 (fc transport)

   With the trtype=fc configuration, nvme/041 fails:

  nvme/041 (Create authenticated connections)                  [failed]
      runtime  2.677s  ...  4.823s
      --- tests/nvme/041.out      2023-11-29 12:57:17.206898664 +0900
      +++ /home/shin/Blktests/blktests/results/nodev/nvme/041.out.bad     2024-03-19 14:50:56.399101323 +0900
      @@ -2,5 +2,5 @@
       Test unauthenticated connection (should fail)
       disconnected 0 controller(s)
       Test authenticated connection
      -disconnected 1 controller(s)
      +disconnected 0 controller(s)
       Test complete

   nvme/044 had the same failure symptom until kernel v6.9. A solution was
   suggested and discussed in February 2024 [3].

   [3] https://lore.kernel.org/linux-nvme/20240221132404.6311-1-dwagner@suse.de/

#3: srp/001,002,011,012,013,014,016

   The seven test cases in the srp test group failed due to the WARN
   "kmem_cache of name 'srpt-rsp-buf' already exists" [4]. The failures are
   reproduced reliably. They need further debugging effort.


[4]

[ 3833.868986] [ T120648] ------------[ cut here ]------------
[ 3833.870223] [ T120648] kmem_cache of name 'srpt-rsp-buf' already exists
[ 3833.871490] [ T120648] WARNING: CPU: 1 PID: 120648 at mm/slab_common.c:107 __kmem_cache_create_args+0xa3/0x300
[ 3833.873136] [ T120648] Modules linked in: ib_srp scsi_transport_srp target_core_user target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod rdma_cm scsi_debug siw ib_uverbs null_blk ib_umad crc32_generic dm_service_time nbd iw_cm ib_cm ib_core pktcdvd nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc 9pnet_virtio 9pnet ppdev netfs pcspkr i2c_piix4 e1000 parport_pc i2c_smbus parport fuse loop nfnetlink zram bochs drm_vram_helper drm_ttm_helper ttm drm
_kms_helper xfs nvme nvme_core drm floppy sym53c8xx scsi_transport_spi nvme_auth serio_raw ata_generic pata_acpi dm_multipath qemu_fw_cfg [last unloaded: null_blk]
[ 3833.882920] [ T120648] CPU: 1 UID: 0 PID: 120648 Comm: kworker/u16:55 Tainted: G    B   W          6.12.0-rc1 #334
[ 3833.884767] [ T120648] Tainted: [B]=BAD_PAGE, [W]=WARN
[ 3833.886258] [ T120648] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
[ 3833.887979] [ T120648] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[ 3833.889520] [ T120648] RIP: 0010:__kmem_cache_create_args+0xa3/0x300
[ 3833.891016] [ T120648] Code: 8d 58 98 48 3d d0 a7 25 99 74 21 48 8b 7b 60 48 89 ee e8 30 cd 06 02 85 c0 75 e0 48 89 ee 48 c7 c7 d0 db b0 98 e8 dd 92 82 ff <0f> 0b be 20 00 00 00 48 89 ef e8 8e cd 06 02 48 85 c0 0f 85 02 02
[ 3833.894873] [ T120648] RSP: 0018:ffff8881788f7508 EFLAGS: 00010292
[ 3833.896546] [ T120648] RAX: 0000000000000000 RBX: ffff888104be3540 RCX: 0000000000000000
[ 3833.898237] [ T120648] RDX: 0000000000000000 RSI: ffffffff981bea60 RDI: 0000000000000001
[ 3833.899973] [ T120648] RBP: ffffffffc1f52c20 R08: 0000000000000001 R09: ffffed102f11ee4b
[ 3833.901715] [ T120648] R10: ffff8881788f725f R11: 00000000001b9378 R12: 0000000000000100
[ 3833.903509] [ T120648] R13: ffff8881788f76c8 R14: 0000000000000000 R15: 0000000000000000
[ 3833.905378] [ T120648] FS:  0000000000000000(0000) GS:ffff8883ae080000(0000) knlGS:0000000000000000
[ 3833.907167] [ T120648] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3833.908972] [ T120648] CR2: 00007fdbbefa1474 CR3: 0000000124b3a000 CR4: 00000000000006f0
[ 3833.910941] [ T120648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3833.912807] [ T120648] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3833.914626] [ T120648] Call Trace:
[ 3833.915994] [ T120648]  <TASK>
[ 3833.917398] [ T120648]  ? __warn.cold+0x5f/0x1f8
[ 3833.918855] [ T120648]  ? __kmem_cache_create_args+0xa3/0x300
[ 3833.920464] [ T120648]  ? report_bug+0x1ec/0x390
[ 3833.921945] [ T120648]  ? handle_bug+0x58/0x90
[ 3833.923442] [ T120648]  ? exc_invalid_op+0x13/0x40
[ 3833.924906] [ T120648]  ? asm_exc_invalid_op+0x16/0x20
[ 3833.926457] [ T120648]  ? __kmem_cache_create_args+0xa3/0x300
[ 3833.928255] [ T120648]  ? __kmem_cache_create_args+0xa3/0x300
[ 3833.929985] [ T120648]  srpt_cm_req_recv.cold+0xea0/0x44cc [ib_srpt]
[ 3833.931717] [ T120648]  ? vsnprintf+0x38b/0x18f0
[ 3833.933255] [ T120648]  ? __pfx_vsnprintf+0x10/0x10
[ 3833.934858] [ T120648]  ? xas_start+0x93/0x500
[ 3833.936400] [ T120648]  ? __pfx_srpt_cm_req_recv+0x10/0x10 [ib_srpt]
[ 3833.938150] [ T120648]  ? snprintf+0xa5/0xe0
[ 3833.939611] [ T120648]  ? __pfx_snprintf+0x10/0x10
[ 3833.941121] [ T120648]  ? lock_release+0x57a/0x7a0
[ 3833.942652] [ T120648]  srpt_rdma_cm_req_recv+0x35d/0x460 [ib_srpt]
[ 3833.944234] [ T120648]  ? __pfx_srpt_rdma_cm_req_recv+0x10/0x10 [ib_srpt]
[ 3833.945844] [ T120648]  ? rcu_is_watching+0x11/0xb0
[ 3833.947311] [ T120648]  ? trace_cm_event_handler+0xf5/0x140 [rdma_cm]
[ 3833.948835] [ T120648]  cma_cm_event_handler+0x88/0x210 [rdma_cm]
[ 3833.950302] [ T120648]  iw_conn_req_handler+0x7a8/0xf10 [rdma_cm]
[ 3833.951766] [ T120648]  ? __pfx_iw_conn_req_handler+0x10/0x10 [rdma_cm]
[ 3833.953252] [ T120648]  ? alloc_work_entries+0x12f/0x260 [iw_cm]
[ 3833.954602] [ T120648]  cm_work_handler+0x143f/0x1ba0 [iw_cm]
[ 3833.955904] [ T120648]  ? __pfx_cm_work_handler+0x10/0x10 [iw_cm]
[ 3833.957213] [ T120648]  ? process_one_work+0x7de/0x1460
[ 3833.958412] [ T120648]  ? lock_acquire+0x2d/0xc0
[ 3833.959538] [ T120648]  ? process_one_work+0x7de/0x1460
[ 3833.960672] [ T120648]  process_one_work+0x85a/0x1460
[ 3833.961764] [ T120648]  ? __pfx_process_one_work+0x10/0x10
[ 3833.962861] [ T120648]  ? assign_work+0x16c/0x240
[ 3833.963901] [ T120648]  worker_thread+0x5e2/0xfc0
[ 3833.964926] [ T120648]  ? __pfx_worker_thread+0x10/0x10
[ 3833.965983] [ T120648]  kthread+0x2d1/0x3a0
[ 3833.966935] [ T120648]  ? trace_irq_enable.constprop.0+0xce/0x110
[ 3833.968000] [ T120648]  ? __pfx_kthread+0x10/0x10
[ 3833.968956] [ T120648]  ret_from_fork+0x30/0x70
[ 3833.969890] [ T120648]  ? __pfx_kthread+0x10/0x10
[ 3833.970837] [ T120648]  ret_from_fork_asm+0x1a/0x30
[ 3833.971792] [ T120648]  </TASK>
[ 3833.972609] [ T120648] irq event stamp: 0
[ 3833.973489] [ T120648] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 3833.974605] [ T120648] hardirqs last disabled at (0): [<ffffffff95204727>] copy_process+0x1ef7/0x8480
[ 3833.975860] [ T120648] softirqs last  enabled at (0): [<ffffffff9520478c>] copy_process+0x1f5c/0x8480
[ 3833.977096] [ T120648] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 3833.978192] [ T120648] ---[ end trace 0000000000000000 ]---

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-03  8:02 blktests failures with v6.12-rc1 kernel Shinichiro Kawasaki
@ 2024-10-03 20:56 ` Bart Van Assche
  2024-10-04  2:35   ` Zhu Yanjun
  2024-10-04  2:40   ` Shinichiro Kawasaki
  0 siblings, 2 replies; 11+ messages in thread
From: Bart Van Assche @ 2024-10-03 20:56 UTC (permalink / raw)
  To: Shinichiro Kawasaki, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
	nbd@other.debian.org, linux-rdma@vger.kernel.org

On 10/3/24 1:02 AM, Shinichiro Kawasaki wrote:
> #3: srp/001,002,011,012,013,014,016
> 
>     The seven test cases in the srp test group failed due to the WARN
>     "kmem_cache of name 'srpt-rsp-buf' already exists" [4]. The failures are
>     reproduced reliably. They need further debugging effort.

Does the patch below help?

Thanks,

Bart.


Subject: [PATCH] RDMA/srpt: Make kmem cache names unique

Make sure that the "srpt-rsp-buf" cache names are unique. An example of
a unique name generated by this patch:

srpt-rsp-buf-fe80:0000:0000:0000:5054:00ff:fe5e:4708-enp1s0_siw-1

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9632afbd727b..c4feb39b3106 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2164,6 +2164,7 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 	u32 it_iu_len;
 	int i, tag_num, tag_size, ret;
 	struct srpt_tpg *stpg;
+	char *cache_name;

 	WARN_ON_ONCE(irqs_disabled());

@@ -2245,8 +2246,13 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 	INIT_LIST_HEAD(&ch->cmd_wait_list);
 	ch->max_rsp_size = ch->sport->port_attrib.srp_max_rsp_size;

-	ch->rsp_buf_cache = kmem_cache_create("srpt-rsp-buf", ch->max_rsp_size,
+	cache_name = kasprintf(GFP_KERNEL, "srpt-rsp-buf-%s-%s-%d", src_addr,
+			       dev_name(&sport->sdev->device->dev), port_num);
+	if (!cache_name)
+		goto free_ch;
+	ch->rsp_buf_cache = kmem_cache_create(cache_name, ch->max_rsp_size,
 					      512, 0, NULL);
+	kfree(cache_name);
 	if (!ch->rsp_buf_cache)
 		goto free_ch;



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-03 20:56 ` Bart Van Assche
@ 2024-10-04  2:35   ` Zhu Yanjun
  2024-10-04  2:40   ` Shinichiro Kawasaki
  1 sibling, 0 replies; 11+ messages in thread
From: Zhu Yanjun @ 2024-10-04  2:35 UTC (permalink / raw)
  To: Bart Van Assche, Shinichiro Kawasaki, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
	nbd@other.debian.org, linux-rdma@vger.kernel.org

On 2024/10/4 4:56, Bart Van Assche wrote:
> On 10/3/24 1:02 AM, Shinichiro Kawasaki wrote:
>> #3: srp/001,002,011,012,013,014,016
>>
>>     The seven test cases in the srp test group failed due to the WARN
>>     "kmem_cache of name 'srpt-rsp-buf' already exists" [4]. The failures
>>     are reproduced reliably. They need further debugging effort.
> 
> Does the patch below help?

Hi, Bart

What is the root cause of this problem?

The patch below just allocates new memory for a unique name. Can we make
sure that the allocated memory is freed?

Will this cause a memory leak?

Thanks,
Zhu Yanjun

> 
> Thanks,
> 
> Bart.
> 
> 
> Subject: [PATCH] RDMA/srpt: Make kmem cache names unique
> 
> Make sure that the "srpt-rsp-buf" cache names are unique. An example of
> a unique name generated by this patch:
> 
> srpt-rsp-buf-fe80:0000:0000:0000:5054:00ff:fe5e:4708-enp1s0_siw-1
> 
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Fixes: 5dabcd0456d7 ("RDMA/srpt: Add support for immediate data")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/infiniband/ulp/srpt/ib_srpt.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
> index 9632afbd727b..c4feb39b3106 100644
> --- a/drivers/infiniband/ulp/srpt/ib_srpt.c
> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
> @@ -2164,6 +2164,7 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
>       u32 it_iu_len;
>       int i, tag_num, tag_size, ret;
>       struct srpt_tpg *stpg;
> +    char *cache_name;
> 
>       WARN_ON_ONCE(irqs_disabled());
> 
> @@ -2245,8 +2246,13 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
>       INIT_LIST_HEAD(&ch->cmd_wait_list);
>       ch->max_rsp_size = ch->sport->port_attrib.srp_max_rsp_size;
> 
> -    ch->rsp_buf_cache = kmem_cache_create("srpt-rsp-buf", ch->max_rsp_size,
> +    cache_name = kasprintf(GFP_KERNEL, "srpt-rsp-buf-%s-%s-%d", src_addr,
> +                   dev_name(&sport->sdev->device->dev), port_num);
> +    if (!cache_name)
> +        goto free_ch;
> +    ch->rsp_buf_cache = kmem_cache_create(cache_name, ch->max_rsp_size,
>                             512, 0, NULL);
> +    kfree(cache_name);
>       if (!ch->rsp_buf_cache)
>           goto free_ch;
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-03 20:56 ` Bart Van Assche
  2024-10-04  2:35   ` Zhu Yanjun
@ 2024-10-04  2:40   ` Shinichiro Kawasaki
  2024-10-04 12:40     ` Zhu Yanjun
  1 sibling, 1 reply; 11+ messages in thread
From: Shinichiro Kawasaki @ 2024-10-04  2:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On Oct 03, 2024 / 13:56, Bart Van Assche wrote:
> On 10/3/24 1:02 AM, Shinichiro Kawasaki wrote:
> > #3: srp/001,002,011,012,013,014,016
> > 
> >     The seven test cases in srp test group failed due to the WARN
> >     "kmem_cache of name 'srpt-rsp-buf' already exists" [4]. The failures are
> >     recreated in stable manner. They need further debug effort.
> 
> Does the patch below help?

Thanks Bart, but unfortunately the test cases still fail with the message
below. I also noticed that a similar WARN for 'srpt-req-buf' is observed.
This problem probably applies to both 'srpt-rsp-buf' and 'srpt-req-buf'.

------------[ cut here ]------------
kmem_cache of name 'srpt-rsp-buf-fec0:0000:0000:0000:5054:00ff:fe12:3456-ens3_siw-1' already exists
WARNING: CPU: 0 PID: 47 at mm/slab_common.c:107 __kmem_cache_create_args+0xa3/0x300
Modules linked in: ib_srp scsi_transport_srp target_core_user target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm ib_umad scsi_debug dm_service_time siw ib_uverbs null_blk ib_core nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc 9pnet_virtio ppdev 9pnet netfs e1000 i2c_piix4 parport_pc pcspkr parport i2c_smbus fuse loop nfnetlink zram bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper xfs nvme drm floppy nvme_core sym53c8xx scsi_transport_spi nvme_auth serio_raw ata_generic pata_acpi dm_multipath qemu_fw_cfg
CPU: 0 UID: 0 PID: 47 Comm: kworker/u16:2 Not tainted 6.12.0-rc1+ #335
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
RIP: 0010:__kmem_cache_create_args+0xa3/0x300
Code: 8d 58 98 48 3d d0 a7 25 b2 74 21 48 8b 7b 60 48 89 ee e8 30 cd 06 02 85 c0 75 e0 48 89 ee 48 c7 c7 d0 db b0 b1 e8 dd 92 82 ff <0f> 0b be 20 00 00 00 48 89 ef e8 8e cd 06 02 48 85 c0 0f 85 02 02
RSP: 0018:ffff88810135f508 EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff888100289400 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffffb11bea60 RDI: 0000000000000001
RBP: ffff8881144bbb00 R08: 0000000000000001 R09: ffffed102026be4b
R10: ffff88810135f25f R11: 0000000000000001 R12: 0000000000000100
R13: ffff88810135f6c8 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8883ae000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f8d878c58 CR3: 00000001376da000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __warn.cold+0x5f/0x1f8
 ? __kmem_cache_create_args+0xa3/0x300
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x58/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? __kmem_cache_create_args+0xa3/0x300
 ? __kmem_cache_create_args+0xa3/0x300
 srpt_cm_req_recv.cold+0x12e0/0x46a4 [ib_srpt]
 ? vsnprintf+0x38b/0x18f0
 ? __pfx_vsnprintf+0x10/0x10
 ? __pfx_srpt_cm_req_recv+0x10/0x10 [ib_srpt]
 ? snprintf+0xa5/0xe0
 ? __pfx_snprintf+0x10/0x10
 ? lock_release+0x460/0x7a0
 srpt_rdma_cm_req_recv+0x35d/0x460 [ib_srpt]
 ? __pfx_srpt_rdma_cm_req_recv+0x10/0x10 [ib_srpt]
 ? rcu_is_watching+0x11/0xb0
 ? trace_cm_event_handler+0xf5/0x140 [rdma_cm]
 cma_cm_event_handler+0x88/0x210 [rdma_cm]
 iw_conn_req_handler+0x7a8/0xf10 [rdma_cm]
 ? __pfx_iw_conn_req_handler+0x10/0x10 [rdma_cm]
 ? alloc_work_entries+0x12f/0x260 [iw_cm]
 cm_work_handler+0x143f/0x1ba0 [iw_cm]
 ? __pfx_cm_work_handler+0x10/0x10 [iw_cm]
 ? process_one_work+0x7de/0x1460
 ? lock_acquire+0x2d/0xc0
 ? process_one_work+0x7de/0x1460
 process_one_work+0x85a/0x1460
 ? __pfx_lock_acquire.part.0+0x10/0x10
 ? __pfx_process_one_work+0x10/0x10
 ? assign_work+0x16c/0x240
 ? lock_is_held_type+0xd5/0x130
 worker_thread+0x5e2/0xfc0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x2d1/0x3a0
 ? _raw_spin_unlock_irq+0x24/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x30/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
irq event stamp: 53809
hardirqs last  enabled at (53823): [<ffffffffae3d59ce>] __up_console_sem+0x5e/0x70
hardirqs last disabled at (53834): [<ffffffffae3d59b3>] __up_console_sem+0x43/0x70
softirqs last  enabled at (53864): [<ffffffffae2277ab>] __irq_exit_rcu+0xbb/0x1c0
softirqs last disabled at (53843): [<ffffffffae2277ab>] __irq_exit_rcu+0xbb/0x1c0
---[ end trace 0000000000000000 ]---
ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68
------------[ cut here ]------------

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-04  2:40   ` Shinichiro Kawasaki
@ 2024-10-04 12:40     ` Zhu Yanjun
  2024-10-04 16:31       ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Zhu Yanjun @ 2024-10-04 12:40 UTC (permalink / raw)
  To: Shinichiro Kawasaki, Bart Van Assche
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On 2024/10/4 10:40, Shinichiro Kawasaki wrote:
> On Oct 03, 2024 / 13:56, Bart Van Assche wrote:
>> On 10/3/24 1:02 AM, Shinichiro Kawasaki wrote:
>>> #3: srp/001,002,011,012,013,014,016
>>>
>>>      The seven test cases in the srp test group failed due to the WARN
>>>      "kmem_cache of name 'srpt-rsp-buf' already exists" [4]. The failures are
>>>      reproduced reliably. They need further debugging effort.
>>
>> Does the patch below help?
> 
> Thanks Bart, but unfortunately the test cases still fail with the message
> below. I also noticed that a similar WARN for 'srpt-req-buf' is observed.
> This problem probably applies to both 'srpt-rsp-buf' and 'srpt-req-buf'.
> 

Hi, Bart

I read the commit at the following link:

https://patchwork.kernel.org/project/linux-rdma/patch/20240920181129.37156-1-sebott@redhat.com/#:~:text=Add%20the%20device%20name%20to%20the%20per%20device

Maybe the root cause of this problem is the same as in the above link,
so I added a jiffies (u64) value to the name.

I hope this can solve the problem.

Hi, Shinichiro

The following is the same as Bart's patch, except that a jiffies value
is added to make the name unique. Could you run the tests to verify
this patch?

Thanks a lot.

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9632afbd727b..ea1f8e6072ac 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2164,6 +2164,7 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 	u32 it_iu_len;
 	int i, tag_num, tag_size, ret;
 	struct srpt_tpg *stpg;
+	char *cache_name;

 	WARN_ON_ONCE(irqs_disabled());

@@ -2245,8 +2246,13 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 	INIT_LIST_HEAD(&ch->cmd_wait_list);
 	ch->max_rsp_size = ch->sport->port_attrib.srp_max_rsp_size;

-	ch->rsp_buf_cache = kmem_cache_create("srpt-rsp-buf", ch->max_rsp_size,
+	cache_name = kasprintf(GFP_KERNEL, "srpt-rsp-buf-%s-%s-%d-%llu", src_addr,
+			       dev_name(&sport->sdev->device->dev), port_num,
+			       get_jiffies_64());
+	if (!cache_name)
+		goto free_ch;
+	ch->rsp_buf_cache = kmem_cache_create(cache_name, ch->max_rsp_size,
 					      512, 0, NULL);
+	kfree(cache_name);
 	if (!ch->rsp_buf_cache)
 		goto free_ch;

Zhu Yanjun

> ------------[ cut here ]------------
> kmem_cache of name 'srpt-rsp-buf-fec0:0000:0000:0000:5054:00ff:fe12:3456-ens3_siw-1' already exists
> WARNING: CPU: 0 PID: 47 at mm/slab_common.c:107 __kmem_cache_create_args+0xa3/0x300
> Modules linked in: ib_srp scsi_transport_srp target_core_user target_core_pscsi target_core_file ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm ib_umad scsi_debug dm_service_time siw ib_uverbs null_blk ib_core nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc 9pnet_virtio ppdev 9pnet netfs e1000 i2c_piix4 parport_pc pcspkr parport i2c_smbus fuse loop nfnetlink zram bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper xfs nvme drm floppy nvme_core sym53c8xx scsi_transport_spi nvme_auth serio_raw ata_generic pata_acpi dm_multipath qemu_fw_cfg
> CPU: 0 UID: 0 PID: 47 Comm: kworker/u16:2 Not tainted 6.12.0-rc1+ #335
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
> Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> RIP: 0010:__kmem_cache_create_args+0xa3/0x300
> Code: 8d 58 98 48 3d d0 a7 25 b2 74 21 48 8b 7b 60 48 89 ee e8 30 cd 06 02 85 c0 75 e0 48 89 ee 48 c7 c7 d0 db b0 b1 e8 dd 92 82 ff <0f> 0b be 20 00 00 00 48 89 ef e8 8e cd 06 02 48 85 c0 0f 85 02 02
> RSP: 0018:ffff88810135f508 EFLAGS: 00010292
> RAX: 0000000000000000 RBX: ffff888100289400 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffffb11bea60 RDI: 0000000000000001
> RBP: ffff8881144bbb00 R08: 0000000000000001 R09: ffffed102026be4b
> R10: ffff88810135f25f R11: 0000000000000001 R12: 0000000000000100
> R13: ffff88810135f6c8 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff8883ae000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4f8d878c58 CR3: 00000001376da000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   <TASK>
>   ? __warn.cold+0x5f/0x1f8
>   ? __kmem_cache_create_args+0xa3/0x300
>   ? report_bug+0x1ec/0x390
>   ? handle_bug+0x58/0x90
>   ? exc_invalid_op+0x13/0x40
>   ? asm_exc_invalid_op+0x16/0x20
>   ? __kmem_cache_create_args+0xa3/0x300
>   ? __kmem_cache_create_args+0xa3/0x300
>   srpt_cm_req_recv.cold+0x12e0/0x46a4 [ib_srpt]
>   ? vsnprintf+0x38b/0x18f0
>   ? __pfx_vsnprintf+0x10/0x10
>   ? __pfx_srpt_cm_req_recv+0x10/0x10 [ib_srpt]
>   ? snprintf+0xa5/0xe0
>   ? __pfx_snprintf+0x10/0x10
>   ? lock_release+0x460/0x7a0
>   srpt_rdma_cm_req_recv+0x35d/0x460 [ib_srpt]
>   ? __pfx_srpt_rdma_cm_req_recv+0x10/0x10 [ib_srpt]
>   ? rcu_is_watching+0x11/0xb0
>   ? trace_cm_event_handler+0xf5/0x140 [rdma_cm]
>   cma_cm_event_handler+0x88/0x210 [rdma_cm]
>   iw_conn_req_handler+0x7a8/0xf10 [rdma_cm]
>   ? __pfx_iw_conn_req_handler+0x10/0x10 [rdma_cm]
>   ? alloc_work_entries+0x12f/0x260 [iw_cm]
>   cm_work_handler+0x143f/0x1ba0 [iw_cm]
>   ? __pfx_cm_work_handler+0x10/0x10 [iw_cm]
>   ? process_one_work+0x7de/0x1460
>   ? lock_acquire+0x2d/0xc0
>   ? process_one_work+0x7de/0x1460
>   process_one_work+0x85a/0x1460
>   ? __pfx_lock_acquire.part.0+0x10/0x10
>   ? __pfx_process_one_work+0x10/0x10
>   ? assign_work+0x16c/0x240
>   ? lock_is_held_type+0xd5/0x130
>   worker_thread+0x5e2/0xfc0
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0x2d1/0x3a0
>   ? _raw_spin_unlock_irq+0x24/0x50
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x30/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
> irq event stamp: 53809
> hardirqs last  enabled at (53823): [<ffffffffae3d59ce>] __up_console_sem+0x5e/0x70
> hardirqs last disabled at (53834): [<ffffffffae3d59b3>] __up_console_sem+0x43/0x70
> softirqs last  enabled at (53864): [<ffffffffae2277ab>] __irq_exit_rcu+0xbb/0x1c0
> softirqs last disabled at (53843): [<ffffffffae2277ab>] __irq_exit_rcu+0xbb/0x1c0
> ---[ end trace 0000000000000000 ]---
> ib_srpt:srpt_cm_req_recv: ib_srpt imm_data_offset = 68
> ------------[ cut here ]------------


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-04 12:40     ` Zhu Yanjun
@ 2024-10-04 16:31       ` Bart Van Assche
  2024-10-05  1:26         ` Zhu Yanjun
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2024-10-04 16:31 UTC (permalink / raw)
  To: Zhu Yanjun, Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On 10/4/24 5:40 AM, Zhu Yanjun wrote:
> So I add a jiffies (u64) value into the name.

I don't think that embedding the value of the jiffies counter in the 
kmem cache names is sufficient to make cache names unique. That sounds 
like a fragile approach to me.

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-04 16:31       ` Bart Van Assche
@ 2024-10-05  1:26         ` Zhu Yanjun
  2024-10-05  1:41           ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Zhu Yanjun @ 2024-10-05  1:26 UTC (permalink / raw)
  To: Bart Van Assche, Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org


On 2024/10/5 0:31, Bart Van Assche wrote:
> On 10/4/24 5:40 AM, Zhu Yanjun wrote:
>> So I add a jiffies (u64) value into the name.
>
> I don't think that embedding the value of the jiffies counter in the 
> kmem cache names is sufficient to make cache names unique. That sounds 
> like a fragile approach to me.

Sorry, I do not follow. Why is a jiffies counter not sufficient to make
cache names unique? And why is it a fragile approach?

Can you share your advice with us?

I read your latest commit. In your commit, the ida is used to make cache
names unique. It is a good approach if it can fix this problem.

The approach of jiffies seems clumsy, but it seems able to fix this
problem, too. I cannot see any risks with this jiffies approach.

Zhu Yanjun

>
> Bart.

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-05  1:26         ` Zhu Yanjun
@ 2024-10-05  1:41           ` Jens Axboe
  2024-10-05  8:18             ` Zhu Yanjun
  2024-10-05 21:36             ` Bart Van Assche
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2024-10-05  1:41 UTC (permalink / raw)
  To: Zhu Yanjun, Bart Van Assche, Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On 10/4/24 7:26 PM, Zhu Yanjun wrote:
> 
> ? 2024/10/5 0:31, Bart Van Assche ??:
>> On 10/4/24 5:40 AM, Zhu Yanjun wrote:
>>> So I add a jiffies (u64) value into the name.
>>
>> I don't think that embedding the value of the jiffies counter in the kmem cache names is sufficient to make cache names unique. That sounds like a fragile approach to me.
> 
> Sorry, I do not follow. Why is a jiffies counter not sufficient to
> make cache names unique? And why is it a fragile approach?

1 jiffy is an eternity, what happens if someone calls
kmem_cache_create() twice in that window?

> I read your latest commit. In your commit, the ida is used to make
> cache names unique. It is a good approach if it can fix this problem.

That seems over-engineered. Seems to me that either these things should
share a slab cache (why do they need one each, if they are the same
sized object?!). And if they really do need one, surely something ala:

static atomic_long_t slab_index;

sprintf(slab_name, "foo-%ld", atomic_inc_return(&slab_index));

would be all you need.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-05  1:41           ` Jens Axboe
@ 2024-10-05  8:18             ` Zhu Yanjun
  2024-10-05 21:36             ` Bart Van Assche
  1 sibling, 0 replies; 11+ messages in thread
From: Zhu Yanjun @ 2024-10-05  8:18 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org


On 2024/10/5 9:41, Jens Axboe wrote:
> On 10/4/24 7:26 PM, Zhu Yanjun wrote:
>> ? 2024/10/5 0:31, Bart Van Assche ??:
>>> On 10/4/24 5:40 AM, Zhu Yanjun wrote:
>>>> So I add a jiffies (u64) value into the name.
>>> I don't think that embedding the value of the jiffies counter in the kmem cache names is sufficient to make cache names unique. That sounds like a fragile approach to me.
>> Sorry, I do not follow. Why is a jiffies counter not sufficient to
>> make cache names unique? And why is it a fragile approach?
> 1 jiffy is an eternity, what happens if someone calls
> kmem_cache_create() twice in that window?

Got it. Thanks a lot.

Zhu Yanjun

>
>> I read your latest commit. In your commit, the ida is used to make
>> cache names unique. It is a good approach if it can fix this problem.
> That seems over-engineered. Seems to me that either these things should
> share a slab cache (why do they need one each, if they are the same
> sized object?!). And if they really do need one, surely something ala:
>
> static atomic_long_t slab_index;
>
> sprintf(slab_name, "foo-%ld", atomic_inc_return(&slab_index));
>
> would be all you need.
>
-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-05  1:41           ` Jens Axboe
  2024-10-05  8:18             ` Zhu Yanjun
@ 2024-10-05 21:36             ` Bart Van Assche
  2024-10-05 21:45               ` Jens Axboe
  1 sibling, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2024-10-05 21:36 UTC (permalink / raw)
  To: Jens Axboe, Zhu Yanjun, Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On 10/4/24 6:41 PM, Jens Axboe wrote:
> That seems over-engineered. Seems to me that either these things should
> share a slab cache (why do they need one each, if they are the same
> sized object?!).

The size of two of the three slab caches is variable.

> And if they really do need one, surely something ala:
> 
> static atomic_long_t slab_index;
> 
> sprintf(slab_name, "foo-%ld", atomic_inc_return(&slab_index));
> 
> would be all you need.

A 32-bit counter wraps around after about 4 billion iterations, doesn't
it?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: blktests failures with v6.12-rc1 kernel
  2024-10-05 21:36             ` Bart Van Assche
@ 2024-10-05 21:45               ` Jens Axboe
  0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2024-10-05 21:45 UTC (permalink / raw)
  To: Bart Van Assche, Zhu Yanjun, Shinichiro Kawasaki
  Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org, nbd@other.debian.org,
	linux-rdma@vger.kernel.org

On 10/5/24 3:36 PM, Bart Van Assche wrote:
> On 10/4/24 6:41 PM, Jens Axboe wrote:
>> That seems over-engineered. Seems to me that either these things should
>> share a slab cache (why do they need one each, if they are the same
>> sized object?!).
> 
> The size of two of the three slab caches is variable.
> 
>> And if they really do need one, surely something ala:
>>
>> static atomic_long_t slab_index;
>>
>> sprintf(slab_name, "foo-%ld", atomic_inc_return(&slab_index));
>>
>> would be all you need.
> 
> A 32-bit counter wraps around after about 4 billion iterations, doesn't
> it?

I did use an atomic_long_t, just forgot to use that for the pseudo
code inc and return. Though I highly doubt it matters in practice...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-10-05 21:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-03  8:02 blktests failures with v6.12-rc1 kernel Shinichiro Kawasaki
2024-10-03 20:56 ` Bart Van Assche
2024-10-04  2:35   ` Zhu Yanjun
2024-10-04  2:40   ` Shinichiro Kawasaki
2024-10-04 12:40     ` Zhu Yanjun
2024-10-04 16:31       ` Bart Van Assche
2024-10-05  1:26         ` Zhu Yanjun
2024-10-05  1:41           ` Jens Axboe
2024-10-05  8:18             ` Zhu Yanjun
2024-10-05 21:36             ` Bart Van Assche
2024-10-05 21:45               ` Jens Axboe
