Re: [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: zyjzyj2000@gmail.com, frank.zago@hpe.com, ian.ziemba@hpe.com,
	jhack@hpe.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index
Date: Mon, 10 Jul 2023 13:47:49 -0300	[thread overview]
Message-ID: <ZKw2NbcUhCo5F2+g@nvidia.com> (raw)
In-Reply-To: <f48d9b89-d80a-c191-9618-102957868429@gmail.com>

On Fri, Jun 30, 2023 at 10:33:38AM -0500, Bob Pearson wrote:
> On 6/29/23 18:18, Jason Gunthorpe wrote:
> > On Thu, Jun 29, 2023 at 05:30:24PM -0500, Bob Pearson wrote:
> >> Currently the lookup of an object from its index and taking a
> >> reference to the object are incorrectly protected by an rcu_read_lock
> >> but this does not make the xa_load and the kref_get combination an
> >> atomic operation.
> >>
> >> The various write operations need to share the xarray state in a
> >> mixture of user, soft irq and hard irq contexts so the xa_locking
> >> must support that.
> >>
> >> This patch replaces the xa locks with xa_lock_irqsave.
> >>
> >> Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >>  drivers/infiniband/sw/rxe/rxe_pool.c | 24 ++++++++++++++++++------
> >>  1 file changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> index 6215c6de3a84..f2b586249793 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> @@ -119,8 +119,10 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
> >>  int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>  				bool sleepable)
> >>  {
> >> -	int err;
> >> +	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	gfp_t gfp_flags;
> >> +	int err;
> >>  
> >>  	if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> >>  		goto err_cnt;
> >> @@ -138,8 +140,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>  
> >>  	if (sleepable)
> >>  		might_sleep();
> >> -	err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
> >> +	xa_lock_irqsave(xa, flags);
> >> +	err = __xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
> >>  			      &pool->next, gfp_flags);
> >> +	xa_unlock_irqrestore(xa, flags);
> > 
> > This stuff doesn't make any sense, the non __ versions already take
> > the xa_lock internally.
> > 
> > Or is this because you need the save/restore version for some reason?
> > But that seems unrelated and there should be a lockdep oops to go
> > along with it showing the backtrace??
> 
> The background here is that we are testing a 256 node system with
> the Lustre file system and doing fail-over-fail-back testing which
> is very high stress. This has uncovered several bugs where this is
> just one.

> The logic is 1st need to lock the lookup in rxe_pool_get_index()
> then when we tried to run with ordinary spin_locks we got lots of
> deadlocks. These are related to taking spin locks while in (soft
> irq) interrupt mode. In theory we could also get called in hard irq
> mode so might as well convert the locks to spin_lock_irqsave() which
> is safe in all cases.

That should be its own patch with justification..
 
> >> @@ -154,15 +158,16 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
> >>  {
> >>  	struct rxe_pool_elem *elem;
> >>  	struct xarray *xa = &pool->xa;
> >> +	unsigned long flags;
> >>  	void *obj;
> >>  
> >> -	rcu_read_lock();
> >> +	xa_lock_irqsave(xa, flags);
> >>  	elem = xa_load(xa, index);
> >>  	if (elem && kref_get_unless_zero(&elem->ref_cnt))
> >>  		obj = elem->obj;
> >>  	else
> >>  		obj = NULL;
> >> -	rcu_read_unlock();
> >> +	xa_unlock_irqrestore(xa, flags);
> > 
> > And this should be safe as long as the object is freed via RCU, so
> > what are you trying to fix?
> 
> The problem here is that rcu_read_lock only helps us if the object is freed with kfree_rcu.
> But we have no control over what rdma-core does and it does *not* do
> that for e.g. qp's.

Oh, yes that does sound right. This is another patch with this
explanation.

Jason

next prev parent reply	other threads:[~2023-07-10 16:48 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-29 22:30 [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index Bob Pearson
2023-06-29 23:18 ` Jason Gunthorpe
2023-06-30 15:33   ` Bob Pearson
2023-07-10 16:47     ` Jason Gunthorpe [this message]
2023-07-10 18:11       ` Bob Pearson
2023-07-10 18:15         ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZKw2NbcUhCo5F2+g@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=frank.zago@hpe.com \
    --cc=ian.ziemba@hpe.com \
    --cc=jhack@hpe.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=rpearsonhpe@gmail.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.