All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Dan Aloni <dan@kernelim.com>
Cc: Leon Romanovsky <leonro@nvidia.com>, <linux-rdma@vger.kernel.org>
Subject: Re: RDMA/addr: NULL dereference in process_one_req
Date: Tue, 22 Sep 2020 14:09:51 -0300	[thread overview]
Message-ID: <20200922170951.GE3699@nvidia.com> (raw)
In-Reply-To: <20200922151348.GA4103095@gmail.com>

On Tue, Sep 22, 2020 at 06:13:48PM +0300, Dan Aloni wrote:
> The Oops below [1], is quite rare, and occurs after awhile when kernel
> code repeatedly tries to resolve addresses. According to my analysis the
> work item is executed twice, and in the second time a NULL value of
> `req->callback` triggers this Oops.

Hum I think the race is rdma_addr_cancel(), process_one_req() and 
netevent_callback() running concurrently

It is very narrow but it looks like netevent_callback() could cause
the work to become running such that rdma_addr_cancel() has already
done the list_del_init() which causes the cancel_delayed_work() to be
skipped, and the work re-run before rdma_addr_cancel() hits its
cancel_work_sync()

Please try this:

From fac94acc7a6fb4d78ddd06c51674110937442d15 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@nvidia.com>
Date: Tue, 22 Sep 2020 13:54:17 -0300
Subject: [PATCH] RDMA/addr: Fix race with
 netevent_callback()/rdma_addr_cancel()

This three thread race can result in the work being run once the callback
becomes NULL:

       CPU1                 CPU2                   CPU3
 netevent_callback()
                     process_one_req()       rdma_addr_cancel()
                      [..]
     spin_lock_bh()
  	set_timeout()
     spin_unlock_bh()

						spin_lock_bh()
						list_del_init(&req->list);
						spin_unlock_bh()

		     req->callback = NULL
		     spin_lock_bh()
		       if (!list_empty(&req->list))
                         // Skipped!
		         // cancel_delayed_work(&req->work);
		     spin_unlock_bh()

		    process_one_req() // again
		     req->callback() // BOOM
						cancel_delayed_work_sync()

The solution is to always cancel the work once it is completed so any
inbetween set_timeout() does not result in it running again.

Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/infiniband/core/addr.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 3a98439bba832b..0abce004a9591e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -647,13 +647,12 @@ static void process_one_req(struct work_struct *_work)
 	req->callback = NULL;
 
 	spin_lock_bh(&lock);
+	/*
+	 * Although the work will normally have been canceled by the workqueue,
+	 * it can still be requeued as long as it is on the req_list.
+	 */
+	cancel_delayed_work(&req->work);
 	if (!list_empty(&req->list)) {
-		/*
-		 * Although the work will normally have been canceled by the
-		 * workqueue, it can still be requeued as long as it is on the
-		 * req_list.
-		 */
-		cancel_delayed_work(&req->work);
 		list_del_init(&req->list);
 		kfree(req);
 	}
-- 
2.28.0


  reply	other threads:[~2020-09-22 17:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-22 15:13 RDMA/addr: NULL dereference in process_one_req Dan Aloni
2020-09-22 17:09 ` Jason Gunthorpe [this message]
2020-09-23  4:45   ` Dan Aloni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200922170951.GE3699@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=dan@kernelim.com \
    --cc=leonro@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.