All of lore.kernel.org
 help / color / mirror / Atom feed
From: yanjun.zhu@linux.dev
To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org,
	yanjun.zhu@linux.dev
Cc: Yi Zhang <yi.zhang@redhat.com>
Subject: [PATCHv6 1/4] RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool interrupted by rxe_pool_get_index
Date: Fri, 22 Apr 2022 15:44:13 -0400	[thread overview]
Message-ID: <20220422194416.983549-1-yanjun.zhu@linux.dev> (raw)

From: Zhu Yanjun <yanjun.zhu@linux.dev>

This is a dead lock problem.
The ah_pool xa_lock first is acquired in this:

{SOFTIRQ-ON-W} state was registered at:

  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock+0x33/0x80
  __rxe_add_to_pool+0x183/0x230 [rdma_rxe]

Then ah_pool xa_lock is acquired in this:

{IN-SOFTIRQ-W}:

Call Trace:
 <TASK>
  dump_stack_lvl+0x44/0x57
  mark_lock.part.52.cold.79+0x3c/0x46
  __lock_acquire+0x1565/0x34a0
  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock_irqsave+0x42/0x90
  rxe_pool_get_index+0x72/0x1d0 [rdma_rxe]
  rxe_get_av+0x168/0x2a0 [rdma_rxe]
</TASK>

From the above, in the function __rxe_add_to_pool,
xa_lock is acquired. Then the function __rxe_add_to_pool
is interrupted by softirq. The function
rxe_pool_get_index will also acquire xa_lock.

Finally, the dead lock appears.

        CPU0
        ----
   lock(&xa->xa_lock#15);  <----- __rxe_add_to_pool
   <Interrupt>
     lock(&xa->xa_lock#15); <---- rxe_pool_get_index

                 *** DEADLOCK ***

Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by carrays")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
V5->V6: One dead lock fix in one commit
V4->V5: Commit logs are changed.
V3->V4: xa_lock_irq locks are used.
V2->V3: __rxe_add_to_pool is between spin_lock and spin_unlock, so
        GFP_ATOMIC is used in __rxe_add_to_pool.
V1->V2: Replace GFP_KERNEL with GFP_ATOMIC
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 87066d04ed18..67f1d4733682 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 
 	atomic_set(&pool->num_elem, 0);
 
-	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	pool->limit.min = info->min_index;
 	pool->limit.max = info->max_index;
 }
@@ -155,6 +155,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	int err;
+	unsigned long flags;
 
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
 		return -EINVAL;
@@ -166,8 +167,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	xa_lock_irqsave(&pool->xa, flags);
+	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+				&pool->next, GFP_ATOMIC);
+	xa_unlock_irqrestore(&pool->xa, flags);
 	if (err)
 		goto err_cnt;
 
@@ -201,7 +204,7 @@ static void rxe_elem_release(struct kref *kref)
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
 
-	xa_erase(&pool->xa, elem->index);
+	xa_erase_irq(&pool->xa, elem->index);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
-- 
2.27.0


             reply	other threads:[~2022-04-22  3:17 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-22 19:44 yanjun.zhu [this message]
2022-04-22 15:57 ` [PATCHv6 1/4] RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool interrupted by rxe_pool_get_index Pearson, Robert B
2022-04-24 23:47   ` Yanjun Zhu
2022-04-25 17:32     ` Bob Pearson
2022-04-25 19:02     ` Jason Gunthorpe
2022-04-25 22:01       ` Yanjun Zhu
2022-04-25 23:16         ` Jason Gunthorpe
2022-04-22 19:44 ` [PATCH 2/4] RDMA/rxe: Fix dead lock caused by rxe_alloc " yanjun.zhu
2022-04-22 19:44 ` [PATCH 3/4] RDMA/rxe: Use different xa locks on different path yanjun.zhu
2022-04-22 19:44 ` [PATCH 4/4] RDMA/rxe: Check RDMA_CREATE_AH_SLEEPABLE in creating AH yanjun.zhu
2022-04-22 16:49   ` Jason Gunthorpe
2022-04-22 23:26     ` Yanjun Zhu
2022-04-23 18:17     ` [PATCHv2 " yanjun.zhu
2022-07-22  6:51 ` [PATCHv6 1/4] RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool interrupted by rxe_pool_get_index yangx.jy
2022-07-22 13:43   ` Yanjun Zhu
2022-07-22 15:14     ` yangx.jy
2022-07-22 15:20       ` Jason Gunthorpe
2022-07-23  0:35       ` Yanjun Zhu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220422194416.983549-1-yanjun.zhu@linux.dev \
    --to=yanjun.zhu@linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=yi.zhang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.