From: Yanjun Zhu <yanjun.zhu@linux.dev>
To: Jason Gunthorpe <jgg@ziepe.ca>, Leon Romanovsky <leon@kernel.org>
Cc: yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
Yi Zhang <yi.zhang@redhat.com>
Subject: Re: [PATCHv5 1/2] RDMA/rxe: Fix a dead lock problem
Date: Thu, 21 Apr 2022 20:49:14 +0800 [thread overview]
Message-ID: <540b594e-dcc1-0d8a-a579-c5638d8919d1@linux.dev> (raw)
In-Reply-To: <20220420163249.GQ64706@ziepe.ca>
在 2022/4/21 0:32, Jason Gunthorpe 写道:
> On Wed, Apr 20, 2022 at 09:42:23AM +0300, Leon Romanovsky wrote:
>> On Sat, Apr 16, 2022 at 10:43:42PM -0400, yanjun.zhu@linux.dev wrote:
>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>
>>> This is a dead lock problem.
>>> The ah_pool xa_lock first is acquired in this:
>>>
>>> {SOFTIRQ-ON-W} state was registered at:
>>>
>>> lock_acquire+0x1d2/0x5a0
>>> _raw_spin_lock+0x33/0x80
>>> __rxe_add_to_pool+0x183/0x230 [rdma_rxe]
>>>
>>> Then ah_pool xa_lock is acquired in this:
>>>
>>> {IN-SOFTIRQ-W}:
>>>
>>> Call Trace:
>>> <TASK>
>>> dump_stack_lvl+0x44/0x57
>>> mark_lock.part.52.cold.79+0x3c/0x46
>>> __lock_acquire+0x1565/0x34a0
>>> lock_acquire+0x1d2/0x5a0
>>> _raw_spin_lock_irqsave+0x42/0x90
>>> rxe_pool_get_index+0x72/0x1d0 [rdma_rxe]
>>> rxe_get_av+0x168/0x2a0 [rdma_rxe]
>>> </TASK>
>>>
>>> From the above, in the function __rxe_add_to_pool,
>>> xa_lock is acquired. Then the function __rxe_add_to_pool
>>> is interrupted by softirq. The function
>>> rxe_pool_get_index will also acquire xa_lock.
>>>
>>> Finally, the dead lock appears.
>>>
>>> [ 296.806097] CPU0
>>> [ 296.808550] ----
>>> [ 296.811003] lock(&xa->xa_lock#15); <----- __rxe_add_to_pool
>>> [ 296.814583] <Interrupt>
>>> [ 296.817209] lock(&xa->xa_lock#15); <---- rxe_pool_get_index
>>> [ 296.820961]
>>> *** DEADLOCK ***
>>>
>>> Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by carrays")
>>> Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
>>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>>> V4->V5: Commit logs are changed to avoid confusion.
>>> V3->V4: xa_lock_irq locks are used.
>>> V2->V3: __rxe_add_to_pool is between spin_lock and spin_unlock, so
>>> GFP_ATOMIC is used in __rxe_add_to_pool.
>>> V1->V2: Replace GFP_KERNEL with GFP_ATOMIC
>>> drivers/infiniband/sw/rxe/rxe_pool.c | 20 ++++++++++++++------
>>> 1 file changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
>>> index 87066d04ed18..f1f06dc7e64f 100644
>>> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
>>> @@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
>>>
>>> atomic_set(&pool->num_elem, 0);
>>>
>>> - xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
>>> + xa_init_flags(&pool->xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
>>> pool->limit.min = info->min_index;
>>> pool->limit.max = info->max_index;
>>> }
>>> @@ -138,8 +138,10 @@ void *rxe_alloc(struct rxe_pool *pool)
>>> elem->obj = obj;
>>> kref_init(&elem->ref_cnt);
>>>
>>> - err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>>> - &pool->next, GFP_KERNEL);
>>> + xa_lock_irq(&pool->xa);
>>> + err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
>>> + &pool->next, GFP_KERNEL);
>>> + xa_unlock_irq(&pool->xa);
>
> It should just use xa_alloc_cyclic_irq() and xa_erase_irq(). Don't
> open code the lock.
Got it. I will use the above functions.
>
>> I may admit that I didn't follow your previous discussions, so maybe you
>> already explained it. But why do you need xa_lock_irq() here?
>
> The spinlock type needs to be consistent in all users.
>
> You can only use the naked version if the spinlock is always obtained
> from a process context.
>
> You can only use bh version if the spinlock is always obtained from a
> process context or bh/softirq
>
> You can always use the irq version
>
> What I don't understand is why IRQ and not BH? AFAIK there is no case
> where rxe is called from a real IRQ, right? Or is it because you can't
> nest BH under the IRQ spinlock from CM?
Sure. I will use IRQ spinlock. The reason is as below:
1.
https://patchwork.kernel.org/project/linux-rdma/patch/20220210073655.42281-2-guoqing.jiang@linux.dev/
2.
https://patchwork.kernel.org/project/linux-rdma/patch/20220215194448.44369-1-rpearsonhpe@gmail.com/
The above 2 links are why I used IRQ.
Zhu Yanjun
>
> Jason
prev parent reply other threads:[~2022-04-21 12:49 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-17 2:43 [PATCHv5 1/2] RDMA/rxe: Fix a dead lock problem yanjun.zhu
2022-04-17 2:43 ` [PATCHv2 2/2] RDMA/rxe: Use different xa locks on different path yanjun.zhu
2022-04-20 16:34 ` Jason Gunthorpe
2022-04-20 6:42 ` [PATCHv5 1/2] RDMA/rxe: Fix a dead lock problem Leon Romanovsky
2022-04-20 16:32 ` Jason Gunthorpe
2022-04-21 12:49 ` Yanjun Zhu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=540b594e-dcc1-0d8a-a579-c5638d8919d1@linux.dev \
--to=yanjun.zhu@linux.dev \
--cc=jgg@ziepe.ca \
--cc=leon@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=yi.zhang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.