public inbox for linux-rdma@vger.kernel.org
From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: zyjzyj2000@gmail.com, frank.zago@hpe.com, ian.ziemba@hpe.com,
	jhack@hpe.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index
Date: Mon, 10 Jul 2023 13:47:49 -0300	[thread overview]
Message-ID: <ZKw2NbcUhCo5F2+g@nvidia.com> (raw)
In-Reply-To: <f48d9b89-d80a-c191-9618-102957868429@gmail.com>

On Fri, Jun 30, 2023 at 10:33:38AM -0500, Bob Pearson wrote:
> On 6/29/23 18:18, Jason Gunthorpe wrote:
> > On Thu, Jun 29, 2023 at 05:30:24PM -0500, Bob Pearson wrote:
> >> Currently, looking up an object by its index and taking a
> >> reference to it are protected only by rcu_read_lock(), which
> >> does not make the xa_load() and kref_get_unless_zero()
> >> combination atomic with respect to object teardown.
> >>
> >> The various write operations need to share the xarray state from
> >> a mixture of process, soft IRQ and hard IRQ contexts, so the
> >> xarray locking must be safe in all of them.
> >>
> >> This patch replaces the xa locks with xa_lock_irqsave.
> >>
> >> Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_pool.c | 24 ++++++++++++++++++------
> >>  1 file changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> index 6215c6de3a84..f2b586249793 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> @@ -119,8 +119,10 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
> >>  int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>  				bool sleepable)
> >>  {
> >> -	int err;
> >> +	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	gfp_t gfp_flags;
> >> +	int err;
> >>  
> >>  	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> >>  		goto err_cnt;
> >> @@ -138,8 +140,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>  
> >>  	if (sleepable)
> >>  		might_sleep();
> >> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
> >> +	xa_lock_irqsave(xa, flags);
> >> +	err = __xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
> >>  			      &pool->next, gfp_flags);
> >> +	xa_unlock_irqrestore(xa, flags);
> > 
> > This doesn't make sense; the non-__ versions already take the
> > xa_lock internally.
> > 
> > Or is this because you need the save/restore version for some reason?
> > But that seems unrelated and there should be a lockdep oops to go
> > along with it showing the backtrace??
> 
> The background here is that we are testing a 256-node system running
> the Lustre file system under very high-stress failover/failback
> testing. This has uncovered several bugs, of which this is just one.

> The logic is: first we need to lock the lookup in
> rxe_pool_get_index(); then, when we tried running with ordinary
> spin_locks, we hit lots of deadlocks caused by taking the locks
> while already in (soft IRQ) interrupt context. In theory we can
> also be called in hard IRQ context, so we might as well convert
> the locks to spin_lock_irqsave(), which is safe in all cases.
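[For readers following along: the locked-lookup pattern the patch moves to can be sketched in user-space C. A pthread mutex stands in for xa_lock_irqsave() (user space has no IRQ contexts to mask), and all the names below (pool, elem, pool_get_index) are illustrative stand-ins, not the actual rxe code.]

```c
/* Userspace sketch of the locked-lookup pattern from the patch.
 * A pthread mutex stands in for xa_lock_irqsave(); the types and
 * function names here are hypothetical, not the rxe ones. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct elem {
	atomic_int ref_cnt;   /* plays the role of the kref */
	void *obj;
};

struct pool {
	pthread_mutex_t lock; /* stands in for the xa_lock */
	struct elem *slots[16];
};

/* Model of kref_get_unless_zero(): take a reference only if at
 * least one is still held, i.e. the object is not being torn down. */
static int get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0)
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return 1;
	return 0;
}

/* The lookup and the refcount bump happen under one lock, so a
 * concurrent remove-and-free cannot slip in between them. */
static void *pool_get_index(struct pool *p, unsigned int idx)
{
	struct elem *e;
	void *obj = NULL;

	pthread_mutex_lock(&p->lock);
	e = p->slots[idx];
	if (e && get_unless_zero(&e->ref_cnt))
		obj = e->obj;
	pthread_mutex_unlock(&p->lock);
	return obj;
}

static void pool_remove(struct pool *p, unsigned int idx)
{
	pthread_mutex_lock(&p->lock);
	p->slots[idx] = NULL;  /* the actual free is left out of the sketch */
	pthread_mutex_unlock(&p->lock);
}
```

[Because removal takes the same lock, a lookup either sees the element while its refcount is still nonzero, or sees NULL; the window the RCU-only version left open is closed.]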

That should be its own patch with justification..
 
> >> @@ -154,15 +158,16 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
> >>  {
> >>  	struct rxe_pool_elem *elem;
> >>  	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	void *obj;
> >>  
> >> -	rcu_read_lock();
> >> +	xa_lock_irqsave(xa, flags);
> >>  	elem = xa_load(xa, index);
> >>  	if (elem && kref_get_unless_zero(&elem->ref_cnt))
> >>  		obj = elem->obj;
> >>  	else
> >>  		obj = NULL;
> >> -	rcu_read_unlock();
> >> +	xa_unlock_irqrestore(xa, flags);
> > 
> > And this should be safe as long as the object is freed via RCU, so
> > what are you trying to fix?
> 
> The problem here is that rcu_read_lock() only helps us if the object
> is freed with kfree_rcu(). But we have no control over what rdma-core
> does, and it does *not* do that for e.g. QPs.

Oh, yes that does sound right. This is another patch with this
explanation.

Jason


Thread overview: 6+ messages
2023-06-29 22:30 [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index Bob Pearson
2023-06-29 23:18 ` Jason Gunthorpe
2023-06-30 15:33   ` Bob Pearson
2023-07-10 16:47     ` Jason Gunthorpe [this message]
2023-07-10 18:11       ` Bob Pearson
2023-07-10 18:15         ` Jason Gunthorpe
