From: Jason Gunthorpe <jgg@nvidia.com>
To: Dan Aloni <dan@kernelim.com>
Cc: Leon Romanovsky <leonro@nvidia.com>, <linux-rdma@vger.kernel.org>
Subject: Re: RDMA/addr: NULL dereference in process_one_req
Date: Tue, 22 Sep 2020 14:09:51 -0300 [thread overview]
Message-ID: <20200922170951.GE3699@nvidia.com> (raw)
In-Reply-To: <20200922151348.GA4103095@gmail.com>
On Tue, Sep 22, 2020 at 06:13:48PM +0300, Dan Aloni wrote:
> The Oops below [1], is quite rare, and occurs after awhile when kernel
> code repeatedly tries to resolve addresses. According to my analysis the
> work item is executed twice, and in the second time a NULL value of
> `req->callback` triggers this Oops.
Hum I think the race is rdma_addr_cancel(), process_one_req() and
netevent_callback() running concurrently
It is very narrow but it looks like netevent_callback() could cause
the work to become running such that rdma_addr_cancel() has already
done the list_del_init() which causes the cancel_delayed_work() to be
skipped, and the work re-run before rdma_addr_cancel() hits its
cancel_work_sync()
Please try this:
From fac94acc7a6fb4d78ddd06c51674110937442d15 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@nvidia.com>
Date: Tue, 22 Sep 2020 13:54:17 -0300
Subject: [PATCH] RDMA/addr: Fix race with
netevent_callback()/rdma_addr_cancel()
This three thread race can result in the work being run once the callback
becomes NULL:
CPU1 CPU2 CPU3
netevent_callback()
process_one_req() rdma_addr_cancel()
[..]
spin_lock_bh()
set_timeout()
spin_unlock_bh()
spin_lock_bh()
list_del_init(&req->list);
spin_unlock_bh()
req->callback = NULL
spin_lock_bh()
if (!list_empty(&req->list))
// Skipped!
// cancel_delayed_work(&req->work);
spin_unlock_bh()
process_one_req() // again
req->callback() // BOOM
cancel_delayed_work_sync()
The solution is to always cancel the work once it is completed so any
inbetween set_timeout() does not result in it running again.
Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/infiniband/core/addr.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 3a98439bba832b..0abce004a9591e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -647,13 +647,12 @@ static void process_one_req(struct work_struct *_work)
req->callback = NULL;
spin_lock_bh(&lock);
+ /*
+ * Although the work will normally have been canceled by the workqueue,
+ * it can still be requeued as long as it is on the req_list.
+ */
+ cancel_delayed_work(&req->work);
if (!list_empty(&req->list)) {
- /*
- * Although the work will normally have been canceled by the
- * workqueue, it can still be requeued as long as it is on the
- * req_list.
- */
- cancel_delayed_work(&req->work);
list_del_init(&req->list);
kfree(req);
}
--
2.28.0
next prev parent reply other threads:[~2020-09-22 17:10 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-22 15:13 RDMA/addr: NULL dereference in process_one_req Dan Aloni
2020-09-22 17:09 ` Jason Gunthorpe [this message]
2020-09-23 4:45 ` Dan Aloni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200922170951.GE3699@nvidia.com \
--to=jgg@nvidia.com \
--cc=dan@kernelim.com \
--cc=leonro@nvidia.com \
--cc=linux-rdma@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox