From: Dan Aloni <dan@kernelim.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>, linux-rdma@vger.kernel.org
Subject: RDMA/addr: NULL dereference in process_one_req
Date: Tue, 22 Sep 2020 18:13:48 +0300
Message-ID: <20200922151348.GA4103095@gmail.com>
The Oops below [1] is quite rare and occurs after a while when kernel
code repeatedly tries to resolve addresses. According to my analysis, the
work item is executed twice, and the second time a NULL value of
`req->callback` triggers this Oops.
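To make the suspected sequence concrete, here is a minimal userspace sketch of it. It is not the kernel code: `struct model_req` and `model_process_one_req` are hypothetical stand-ins, and the sketch assumes the handler clears `callback` after invoking it, which is what would leave a NULL pointer behind for a second execution. The explicit NULL check exists only so the model can report the problem instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request; only the fields needed here. */
struct model_req {
	int status;
	void (*callback)(int status, void *context);
	void *context;
	int calls;	/* how many times the callback actually ran */
};

/* Example callback that only counts its invocations. */
static void count_cb(int status, void *context)
{
	struct model_req *req = context;

	(void)status;
	req->calls++;
}

/*
 * Models the tail of the work handler: invoke the callback, then clear it.
 * Returns 0 on success, or -1 at the point where an unchecked call through
 * a NULL pointer would jump to address 0 (compare "RIP: 0010:0x0" in [1]).
 */
static int model_process_one_req(struct model_req *req)
{
	assert(req != NULL);

	if (!req->callback)
		return -1;	/* assumption: the modeled handler has no such check */

	req->callback(req->status, req->context);
	req->callback = NULL;
	return 0;
}
```

Calling `model_process_one_req()` twice on the same request returns 0 the first time and -1 the second time, with the callback having run exactly once.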
After many run iterations, I did manage to reproduce this issue once
with isolated sample kernel code that I posted at this address:
https://github.com/kernelim/ibaddr-null-deref-repro
The sample code works similarly to the client code in the rpcrdma kernel
module.
Is it possible that, while a work item is executing, the netevent-based
side call that requeues it via `set_timeout` puts it on another CPU while
it is still running? Otherwise it is hard to explain what I'm seeing.
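The question can be illustrated with a small single-threaded model. It is only a sketch under stated assumptions: all `model_*` names are hypothetical, and it assumes the general workqueue behaviour that an item stops being "pending" once its function starts running, so a queue request arriving during execution is accepted and leads to a second execution. It does not model CPUs or real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a work item; not the kernel's work_struct. */
struct model_work {
	bool pending;		/* models the "queued, not yet running" state */
	int runs;		/* completed executions */
	int requeues_left;	/* simulated external requeue events */
	void (*fn)(struct model_work *w);
};

/* Queue the item unless it is already pending; returns true if queued. */
static bool model_queue(struct model_work *w)
{
	assert(w != NULL);
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Work function: while it runs, a simulated event handler requeues it. */
static void model_fn(struct model_work *w)
{
	if (w->requeues_left > 0) {
		w->requeues_left--;
		model_queue(w);	/* accepted: pending was cleared before fn ran */
	}
}

/* Worker loop: clear pending first, then run the function. */
static void model_worker(struct model_work *w)
{
	while (w->pending) {
		w->pending = false;
		w->fn(w);
		w->runs++;
	}
}
```

With `requeues_left` set to 1, queueing the item once and then running `model_worker()` executes the function twice, which is the kind of double execution the analysis above points to.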
My sample code also attempts to inject a NETEVENT_NEIGH_UPDATE notifier
event to trigger this, but that did not increase the frequency of
reproduction.
I'm experimenting with a fix [2], but I'm not yet sure it solves this
issue. I'm hoping for more suggestions and insight.
Thanks
[1]
[165371.631784] Workqueue: ib_addr process_one_req [ib_core]
[165371.637268] RIP: 0010:0x0
[165371.640066] Code: Bad RIP value.
[165371.643468] RSP: 0018:ffffb484cfd87e60 EFLAGS: 00010297
[165371.648870] RAX: 0000000000000000 RBX: ffff94ef2e027130 RCX: ffff94eee8271800
[165371.656196] RDX: ffff94eee8271920 RSI: ffff94ef2e027010 RDI: 00000000ffffff92
[165371.663518] RBP: ffffb484cfd87e80 R08: 00726464615f6269 R09: 8080808080808080
[165371.670839] R10: ffffb484cfd87c68 R11: fefefefefefefeff R12: ffff94ef2e027000
[165371.678162] R13: ffff94ef2e027010 R14: ffff94ef2e027130 R15: 0ffff951f2c624a0
[165371.685485] FS: 0000000000000000(0000) GS:ffff94ef40e80000(0000) knlGS:0000000000000000
[165371.693762] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[165371.699681] CR2: ffffffffffffffd6 CR3: 0000005eca20a002 CR4: 00000000007606e0
[165371.707001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[165371.714325] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[165371.721647] PKRU: 55555554
[165371.724526] Call Trace:
[165371.727170] process_one_req+0x39/0x150 [ib_core]
[165371.732051] process_one_work+0x20f/0x400
[165371.736242] worker_thread+0x34/0x410
[165371.740082] kthread+0x121/0x140
[165371.743484] ? process_one_work+0x400/0x400
[165371.747844] ? kthread_park+0x90/0x90
[165371.751681] ret_from_fork+0x1f/0x40
[2]
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 3a98439bba83..6d7c325cb8e6 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -636,7 +636,8 @@ static void process_one_req(struct work_struct *_work)
 			/* requeue the work for retrying again */
 			spin_lock_bh(&lock);
 			if (!list_empty(&req->list))
-				set_timeout(req, req->timeout);
+				if (delayed_work_pending(&req->work))
+					set_timeout(req, req->timeout);
 			spin_unlock_bh(&lock);
 			return;
 		}
Signed-off-by: Dan Aloni <dan@kernelim.com>
--
Dan Aloni