From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: zyjzyj2000@gmail.com, frank.zago@hpe.com, ian.ziemba@hpe.com,
jhack@hpe.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index
Date: Mon, 10 Jul 2023 13:47:49 -0300
Message-ID: <ZKw2NbcUhCo5F2+g@nvidia.com>
In-Reply-To: <f48d9b89-d80a-c191-9618-102957868429@gmail.com>
On Fri, Jun 30, 2023 at 10:33:38AM -0500, Bob Pearson wrote:
> On 6/29/23 18:18, Jason Gunthorpe wrote:
> > On Thu, Jun 29, 2023 at 05:30:24PM -0500, Bob Pearson wrote:
> >> Currently the lookup of an object from its index and the taking of a
> >> reference to the object are protected only by rcu_read_lock(), which
> >> does not make the xa_load() and kref_get() combination an atomic
> >> operation.
> >>
> >> The various write operations need to share the xarray state in a
> >> mixture of user, soft irq and hard irq contexts so the xa_locking
> >> must support that.
> >>
> >> This patch replaces the xa locks with xa_lock_irqsave.
> >>
> >> Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
> >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> >> ---
> >> drivers/infiniband/sw/rxe/rxe_pool.c | 24 ++++++++++++++++++------
> >> 1 file changed, 18 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> index 6215c6de3a84..f2b586249793 100644
> >> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> >> @@ -119,8 +119,10 @@ void rxe_pool_cleanup(struct rxe_pool *pool)
> >> int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >> bool sleepable)
> >> {
> >> - int err;
> >> + struct xarray *xa = &pool->xa;
> >> + unsigned long flags;
> >> gfp_t gfp_flags;
> >> + int err;
> >>
> >> if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> >> goto err_cnt;
> >> @@ -138,8 +140,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem,
> >>
> >> if (sleepable)
> >> might_sleep();
> >> - err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
> >> + xa_lock_irqsave(xa, flags);
> >> + err = __xa_alloc_cyclic(xa, &elem->index, NULL, pool->limit,
> >> &pool->next, gfp_flags);
> >> + xa_unlock_irqrestore(xa, flags);
> >
> > This stuff doesn't make any sense; the non-__ versions already take
> > the xa_lock internally.
> >
> > Or is this because you need the save/restore version for some reason?
> > But that seems unrelated and there should be a lockdep oops to go
> > along with it showing the backtrace??
>
> The background here is that we are testing a 256 node system with
> the Lustre file system and doing fail-over/fail-back testing, which
> is very high stress. This has uncovered several bugs, of which this
> is just one.
> The logic is: first, we need to lock the lookup in
> rxe_pool_get_index(). When we then tried to run with ordinary
> spin_locks we got lots of deadlocks, related to taking spin locks
> while in (soft irq) interrupt context. In theory we could also get
> called in hard irq context, so we might as well convert the locks to
> spin_lock_irqsave(), which is safe in all cases.
That should be its own patch, with its own justification.
> >> @@ -154,15 +158,16 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
> >> {
> >> struct rxe_pool_elem *elem;
> >> struct xarray *xa = &pool->xa;
> >> + unsigned long flags;
> >> void *obj;
> >>
> >> - rcu_read_lock();
> >> + xa_lock_irqsave(xa, flags);
> >> elem = xa_load(xa, index);
> >> if (elem && kref_get_unless_zero(&elem->ref_cnt))
> >> obj = elem->obj;
> >> else
> >> obj = NULL;
> >> - rcu_read_unlock();
> >> + xa_unlock_irqrestore(xa, flags);
> >
> > And this should be safe as long as the object is freed via RCU, so
> > what are you trying to fix?
>
> The problem here is that rcu_read_lock() only helps us if the object
> is freed with kfree_rcu(). But we have no control over what rdma-core
> does, and it does *not* do that for e.g. QPs.
Oh, yes, that does sound right. This should be its own patch, with this
explanation.
Jason
Thread overview: 6 messages
2023-06-29 22:30 [PATCH for-next] RDMA/rxe: Fix potential race in rxe_pool_get_index Bob Pearson
2023-06-29 23:18 ` Jason Gunthorpe
2023-06-30 15:33 ` Bob Pearson
2023-07-10 16:47 ` Jason Gunthorpe [this message]
2023-07-10 18:11 ` Bob Pearson
2023-07-10 18:15 ` Jason Gunthorpe