* Kernel warning at drivers/infiniband/core/rw.c:349
@ 2021-10-13 0:07 Bart Van Assche
2021-10-13 0:30 ` Logan Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2021-10-13 0:07 UTC (permalink / raw)
To: Logan Gunthorpe; +Cc: linux-rdma@vger.kernel.org, Jason Gunthorpe
Hi,
If I run the SRP tests against the for-next branch of the RDMA git tree
then the following warning appears (commit 2a152512a155 ("RDMA/efa: CQ
notifications")):
------------[ cut here ]------------
WARNING: CPU: 69 PID: 838 at drivers/infiniband/core/rw.c:349
rdma_rw_ctx_init+0x63b/0x690 [ib_core]
CPU: 69 PID: 838 Comm: kworker/69:1H Tainted: G E 5.15.0-rc4-dbg+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
RIP: 0010:rdma_rw_ctx_init+0x63b/0x690 [ib_core]
Code: 8b 45 10 49 8d 7e 48 49 89 46 40 e8 cf 32 ca e0 8b 45 18 49 8d 7e
04 41 89 46 48 e8 df 30 ca e0 41 c6 46 04 00 e9 61 fe ff ff <0f> 0b 41
bc fb ff ff ff e9 3e fe ff ff 48 8b 9d 70 ff ff ff 48 8d
RSP: 0018:ffff88810b867968 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000024 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff888169ee9a40 RDI: ffff888169ee9a58
RBP: ffff88810b867a20 R08: ffffffffa081b01b R09: 0000000000000000
R10: ffffed1085d2e3f1 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: ffff888169ee9a58 R15: ffff888169ee9a40
FS: 0000000000000000(0000) GS:ffff88842e940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4720169e88 CR3: 00000001895d9006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
srpt_alloc_rw_ctxs+0x2f2/0x560 [ib_srpt]
srpt_get_desc_tbl.constprop.0+0x289/0x2e0 [ib_srpt]
srpt_handle_cmd+0x17f/0x2b0 [ib_srpt]
srpt_handle_new_iu+0x27e/0x520 [ib_srpt]
srpt_recv_done+0x9b/0xd0 [ib_srpt]
__ib_process_cq+0x121/0x3d0 [ib_core]
ib_cq_poll_work+0x37/0xb0 [ib_core]
process_one_work+0x585/0xae0
worker_thread+0x2e7/0x700
kthread+0x1f6/0x220
ret_from_fork+0x1f/0x30
irq event stamp: 1255
hardirqs last enabled at (1263): [<ffffffff811ab2c8>]
__up_console_sem+0x58/0x60
hardirqs last disabled at (1270): [<ffffffff811ab2ad>]
__up_console_sem+0x3d/0x60
softirqs last enabled at (1290): [<ffffffff82200473>]
__do_softirq+0x473/0x6ed
softirqs last disabled at (1279): [<ffffffff810e2152>]
__irq_exit_rcu+0xf2/0x140
---[ end trace 81a8636fba7e1a77 ]---
Does this perhaps indicate a regression in the RDMA rw code?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Kernel warning at drivers/infiniband/core/rw.c:349
2021-10-13 0:07 Kernel warning at drivers/infiniband/core/rw.c:349 Bart Van Assche
@ 2021-10-13 0:30 ` Logan Gunthorpe
2021-10-13 5:34 ` Bart Van Assche
0 siblings, 1 reply; 6+ messages in thread
From: Logan Gunthorpe @ 2021-10-13 0:30 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-rdma@vger.kernel.org, Jason Gunthorpe
On 2021-10-12 6:07 p.m., Bart Van Assche wrote:
> Hi,
>
> If I run the SRP tests against the for-next branch of the RDMA git tree
> then the following warning appears (commit 2a152512a155 ("RDMA/efa: CQ
> notifications")):
>
> ------------[ cut here ]------------
> WARNING: CPU: 69 PID: 838 at drivers/infiniband/core/rw.c:349
> rdma_rw_ctx_init+0x63b/0x690 [ib_core]
> CPU: 69 PID: 838 Comm: kworker/69:1H Tainted: G E 5.15.0-rc4-dbg+ #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> RIP: 0010:rdma_rw_ctx_init+0x63b/0x690 [ib_core]
> Code: 8b 45 10 49 8d 7e 48 49 89 46 40 e8 cf 32 ca e0 8b 45 18 49 8d 7e
> 04 41 89 46 48 e8 df 30 ca e0 41 c6 46 04 00 e9 61 fe ff ff <0f> 0b 41
> bc fb ff ff ff e9 3e fe ff ff 48 8b 9d 70 ff ff ff 48 8d
> RSP: 0018:ffff88810b867968 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000024 RCX: dffffc0000000000
> RDX: 0000000000000000 RSI: ffff888169ee9a40 RDI: ffff888169ee9a58
> RBP: ffff88810b867a20 R08: ffffffffa081b01b R09: 0000000000000000
> R10: ffffed1085d2e3f1 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff888169ee9a58 R15: ffff888169ee9a40
> FS: 0000000000000000(0000) GS:ffff88842e940000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4720169e88 CR3: 00000001895d9006 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> srpt_alloc_rw_ctxs+0x2f2/0x560 [ib_srpt]
> srpt_get_desc_tbl.constprop.0+0x289/0x2e0 [ib_srpt]
> srpt_handle_cmd+0x17f/0x2b0 [ib_srpt]
> srpt_handle_new_iu+0x27e/0x520 [ib_srpt]
> srpt_recv_done+0x9b/0xd0 [ib_srpt]
> __ib_process_cq+0x121/0x3d0 [ib_core]
> ib_cq_poll_work+0x37/0xb0 [ib_core]
> process_one_work+0x585/0xae0
> worker_thread+0x2e7/0x700
> kthread+0x1f6/0x220
> ret_from_fork+0x1f/0x30
> irq event stamp: 1255
> hardirqs last enabled at (1263): [<ffffffff811ab2c8>]
> __up_console_sem+0x58/0x60
> hardirqs last disabled at (1270): [<ffffffff811ab2ad>]
> __up_console_sem+0x3d/0x60
> softirqs last enabled at (1290): [<ffffffff82200473>]
> __do_softirq+0x473/0x6ed
> softirqs last disabled at (1279): [<ffffffff810e2152>]
> __irq_exit_rcu+0xf2/0x140
> ---[ end trace 81a8636fba7e1a77 ]---
>
> Does this perhaps indicate a regression in the RDMA rw code?

Hmm, yes looks like a regression with my recent patch.

Best I can see from the code is that someone is passing an sg_cnt of
zero. Previously that would have returned -ENOMEM, but now it might be
ignored, in which case it would hit that WARNING and return -EIO.

We can try a patch such as below to confirm.

Logan

--

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..4eb9781ccfaf 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -331,6 +331,10 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u3>
 		return ret;
 	sg_cnt = sgt.nents;

+	ret = -EIO;
+	if (!sg_cnt)
+		goto out_unmap_sg;
+
 	/*
 	 * Skip to the S/G entry that sg_offset falls into:
 	 */
^ permalink raw reply related [flat|nested] 6+ messages in thread
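The guard proposed in the patch above can be modelled outside the kernel. The following is only a minimal userspace sketch, assuming a simplified stand-in for the scatter/gather table: `struct demo_sgt` and `demo_ctx_init()` are hypothetical names, and the success return value (the entry count) is a simplification rather than what rdma_rw_ctx_init() actually returns.

```c
#include <errno.h>

/* Hypothetical stand-in for a mapped scatter/gather table. */
struct demo_sgt {
	unsigned int nents;      /* entries reported after mapping */
	unsigned int orig_nents; /* entries handed in by the caller */
};

/*
 * Model of the proposed guard: if mapping produced zero entries, bail out
 * with -EIO before any further work is attempted. The real patch jumps to
 * out_unmap_sg so the mapping is undone; this sketch has nothing to unmap.
 */
int demo_ctx_init(const struct demo_sgt *sgt)
{
	unsigned int sg_cnt = sgt->nents;

	if (!sg_cnt)
		return -EIO;

	/* Simplification: report the entry count on success. */
	return (int)sg_cnt;
}
```

With this guard, a table whose nents is zero is rejected up front instead of reaching the code path that triggers the warning.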
* Re: Kernel warning at drivers/infiniband/core/rw.c:349
2021-10-13 0:30 ` Logan Gunthorpe
@ 2021-10-13 5:34 ` Bart Van Assche
2021-10-13 16:15 ` Logan Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2021-10-13 5:34 UTC (permalink / raw)
To: Logan Gunthorpe; +Cc: linux-rdma@vger.kernel.org, Jason Gunthorpe
On 10/12/21 17:30, Logan Gunthorpe wrote:
> Best I can see from the code is that someone is passing an sg_cnt of
> zero. Previously that would have returned -ENOMEM, but now it might be
> ignored, in which case it would hit that WARNING and return -EIO.

That is not what is happening. The debug patch shown below taught me
the following:
* The sg_cnt argument of rdma_rw_ctx_init() is not zero.
* After the rdma_rw_map_sgtable() call, sgt.nents is zero.

The debug patch that I used is as follows:

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5a3bd41b331c..a6dabea37958 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -326,11 +326,15 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u32 port_num,
 	};
 	int ret;

+	WARN_ON_ONCE(!sg_cnt);
+
 	ret = rdma_rw_map_sgtable(dev, &sgt, dir);
 	if (ret)
 		return ret;
 	sg_cnt = sgt.nents;

+	WARN_ON_ONCE(!sg_cnt);
+
 	/*
 	 * Skip to the S/G entry that sg_offset falls into:
 	 */
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 3cadf1295417..d9e3d52eb952 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -911,11 +911,16 @@ static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
 		u32 size = be32_to_cpu(db->len);
 		u32 rkey = be32_to_cpu(db->key);

+		WARN_ON_ONCE(!size);
+
 		ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size, false,
 				i < nbufs - 1);
 		if (ret)
 			goto unwind;

+		WARN_ONCE(ctx->nents <= 0, "%u bytes -> %d entries\n",
+			  size, ctx->nents);
+
 		ret = rdma_rw_ctx_init(&ctx->rw, ch->qp, ch->sport->port,
 				ctx->sg, ctx->nents, 0, remote_addr, rkey, dir);
 		if (ret < 0) {
^ permalink raw reply related [flat|nested] 6+ messages in thread
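The instrumentation in the debug patch above relies on WARN_ON_ONCE(), which reports a condition only the first time it is true and evaluates to the condition's value. A rough userspace analogue might look like the sketch below. It assumes a caller-supplied flag in place of the kernel's per-call-site static state, and `sketch_warn_once()` with its parameters is a hypothetical name, not a kernel interface.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical warn-once helper: prints a message the first time `cond` is
 * true for a given `warned` flag, stays silent afterwards, and always
 * returns `cond` so the caller can still branch on it.
 */
bool sketch_warn_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}
```

Placing one such check before the mapping call and one after it, as the debug patch does, narrows down which step turns a non-zero entry count into zero.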
* Re: Kernel warning at drivers/infiniband/core/rw.c:349
2021-10-13 5:34 ` Bart Van Assche
@ 2021-10-13 16:15 ` Logan Gunthorpe
2021-10-13 16:20 ` Jason Gunthorpe
2021-10-13 16:38 ` Bart Van Assche
0 siblings, 2 replies; 6+ messages in thread
From: Logan Gunthorpe @ 2021-10-13 16:15 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-rdma@vger.kernel.org, Jason Gunthorpe
On 2021-10-12 11:34 p.m., Bart Van Assche wrote:
> On 10/12/21 17:30, Logan Gunthorpe wrote:
>> Best I can see from the code is that someone is passing an sg_cnt of
>> zero. Previously that would have returned -ENOMEM, but now it might be
>> ignored, in which case it would hit that WARNING and return -EIO.
>
> That is not what is happening. The debug patch shown below taught me
> the following:
> * The sg_cnt argument of rdma_rw_ctx_init() is not zero.
> * After the rdma_rw_map_sgtable() call, sgt.nents is zero.
>
> The debug patch that I used is as follows:

Ah, hmm. Perhaps it's this: the virt path in ib_dma_map_sgtable_attrs()
doesn't set sgt.nents.

Maybe try something like the patch below.

Thanks,
Logan

--

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4b50d9a3018a..4ba642fc8a19 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
 					   enum dma_data_direction direction,
 					   unsigned long dma_attrs)
 {
+	int nents;
+
 	if (ib_uses_virt_dma(dev)) {
-		ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+		nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
+		if (!nents)
+			return -EIO;
+		sgt->nents = nents;
 		return 0;
 	}
 	return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);
^ permalink raw reply related [flat|nested] 6+ messages in thread
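The difference between the old and the patched virt path can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_sgt`, `model_virt_map_sg()`, `model_map_sgtable_buggy()` and `model_map_sgtable_fixed()` are hypothetical names, and the mapping helper is assumed to return the number of entries it mapped, as the patch above implies for ib_dma_virt_map_sg().

```c
#include <errno.h>

/* Hypothetical stand-in for the scatter/gather table. */
struct model_sgt {
	unsigned int nents;      /* entries after mapping (output) */
	unsigned int orig_nents; /* entries supplied by the caller (input) */
};

/* Assumed behaviour of the virt mapping helper: returns the entry count. */
int model_virt_map_sg(unsigned int nents)
{
	return (int)nents;
}

/* Before the patch: the return value is dropped and nents is never set. */
int model_map_sgtable_buggy(struct model_sgt *sgt)
{
	model_virt_map_sg(sgt->orig_nents);
	return 0;
}

/* After the patch: propagate the count, or fail if nothing was mapped. */
int model_map_sgtable_fixed(struct model_sgt *sgt)
{
	int nents = model_virt_map_sg(sgt->orig_nents);

	if (!nents)
		return -EIO;
	sgt->nents = (unsigned int)nents;
	return 0;
}
```

In the buggy variant a caller that starts with nents initialised to zero and orig_nents set to four still sees nents equal to zero after a "successful" mapping, which matches the observation that sgt.nents is zero after rdma_rw_map_sgtable(). In the fixed variant the same input yields nents equal to four.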
* Re: Kernel warning at drivers/infiniband/core/rw.c:349
2021-10-13 16:15 ` Logan Gunthorpe
@ 2021-10-13 16:20 ` Jason Gunthorpe
2021-10-13 16:38 ` Bart Van Assche
1 sibling, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2021-10-13 16:20 UTC (permalink / raw)
To: Logan Gunthorpe; +Cc: Bart Van Assche, linux-rdma@vger.kernel.org
On Wed, Oct 13, 2021 at 10:15:59AM -0600, Logan Gunthorpe wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 4b50d9a3018a..4ba642fc8a19 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
>  					   enum dma_data_direction direction,
>  					   unsigned long dma_attrs)
>  {
> +	int nents;
> +
>  	if (ib_uses_virt_dma(dev)) {
> -		ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		if (!nents)
> +			return -EIO;
> +		sgt->nents = nents;
>  		return 0;
>  	}

Oh yes, that definitely looks needed.

Jason
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Kernel warning at drivers/infiniband/core/rw.c:349
2021-10-13 16:15 ` Logan Gunthorpe
2021-10-13 16:20 ` Jason Gunthorpe
@ 2021-10-13 16:38 ` Bart Van Assche
1 sibling, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2021-10-13 16:38 UTC (permalink / raw)
To: Logan Gunthorpe; +Cc: linux-rdma@vger.kernel.org, Jason Gunthorpe
On 10/13/21 9:15 AM, Logan Gunthorpe wrote:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 4b50d9a3018a..4ba642fc8a19 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -4097,8 +4097,13 @@ static inline int ib_dma_map_sgtable_attrs(struct ib_dev>
>  					   enum dma_data_direction direction,
>  					   unsigned long dma_attrs)
>  {
> +	int nents;
> +
>  	if (ib_uses_virt_dma(dev)) {
> -		ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
> +		if (!nents)
> +			return -EIO;
> +		sgt->nents = nents;
>  		return 0;
>  	}
>  	return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);

Thanks!

Tested-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 6+ messages in thread