From: Robert Pearson <rpearsonhpe@gmail.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Re: [PATCH for-next 1/6] RDMA/rxe: Make rxe_alloc() take pool lock
Date: Mon, 25 Oct 2021 13:48:58 -0500	[thread overview]
Message-ID: <CAFc_bgY3e7Zy6kN2PyZN74dvPWda94-tjRorA52yVPBS9gkH7A@mail.gmail.com> (raw)
In-Reply-To: <20211025124357.GS2744544@nvidia.com>

This looks useful, but I had given up on using xarrays to hold multicast
groups because MGIDs are 128 bits. The ib_attach_mcast() verb just
passes in the QP and the MGID, and I couldn't think of an efficient way
to reduce the MGID to a 32-bit non-sparse index. The xarray documentation
advises against hashing the object to get an index. If Linux hands out
mcast group IDs sequentially, it may be OK to use the flag and scope
bits concatenated with the low-order 24 bits of the group number, but
one would still have to deal with (perhaps never occurring) collisions.
If this worked I could get rid of all the RB trees and make everything
an xarray.
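
As an illustration of that scheme, a 32-bit index might be derived from
the MGID roughly as in the sketch below. mgid_to_index() is a
hypothetical helper, not existing rxe code, and the collision concern
above still applies:

#include <linux/types.h>
#include <rdma/ib_verbs.h>

/*
 * Minimal sketch, assuming the MGID is available as the 16 raw octets of
 * a union ib_gid: concatenate the flag/scope octet (octet 1) with the
 * low-order 24 bits of the group ID (octets 13-15) to form a 32-bit
 * index. Distinct MGIDs that agree in these 32 bits would still collide
 * and would need separate handling.
 */
static u32 mgid_to_index(const union ib_gid *mgid)
{
	const u8 *raw = mgid->raw;

	return ((u32)raw[1] << 24) |	/* flags (high nibble) | scope (low nibble) */
	       ((u32)raw[13] << 16) |
	       ((u32)raw[14] << 8) |
		(u32)raw[15];
}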

Bob

On Mon, Oct 25, 2021 at 7:43 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Oct 21, 2021 at 12:46:50PM -0500, Bob Pearson wrote:
> > On 10/20/21 6:16 PM, Jason Gunthorpe wrote:
> > > On Sun, Oct 10, 2021 at 06:59:26PM -0500, Bob Pearson wrote:
> > >> In rxe there are two separate pool APIs for creating a new object
> > >> rxe_alloc() and rxe_alloc_locked(). Currently they are identical.
> > >> Make rxe_alloc() take the pool lock which is in line with the other
> > >> APIs in the library.
> > >>
> > >> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> > >>  drivers/infiniband/sw/rxe/rxe_pool.c | 21 ++++-----------------
> > >>  1 file changed, 4 insertions(+), 17 deletions(-)
> > >>
> > >> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> > >> index ffa8420b4765..7a288ebacceb 100644
> > >> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> > >> @@ -352,27 +352,14 @@ void *rxe_alloc_locked(struct rxe_pool *pool)
> > >>
> > >>  void *rxe_alloc(struct rxe_pool *pool)
> > >>  {
> > >> -  struct rxe_type_info *info = &rxe_type_info[pool->type];
> > >> -  struct rxe_pool_entry *elem;
> > >> +  unsigned long flags;
> > >>    u8 *obj;
> > >>
> > >> -  if (atomic_inc_return(&pool->num_elem) > pool->max_elem)
> > >> -          goto out_cnt;
> > >> -
> > >> -  obj = kzalloc(info->size, GFP_KERNEL);
> > >> -  if (!obj)
> > >> -          goto out_cnt;
> > >> -
> > >> -  elem = (struct rxe_pool_entry *)(obj + info->elem_offset);
> > >> -
> > >> -  elem->pool = pool;
> > >> -  kref_init(&elem->ref_cnt);
> > >> +  write_lock_irqsave(&pool->pool_lock, flags);
> > >> +  obj = rxe_alloc_locked(pool);
> > >> +  write_unlock_irqrestore(&pool->pool_lock, flags);
> > >
> > > But why? This just makes a GFP_KERNEL allocation into a GFP_ATOMIC
> > > allocation, which is bad.
> > >
> > > Jason
> > >
> > how bad?
>
> Quite bad..
>
> > It only has to happen once in the driver for mcast group elements
> > where currently I have (to avoid the race when two QPs try to join
> > the same mcast grp on different CPUs at the same time)
> >
> >       spin_lock()
> >       grp = rxe_get_key_locked(pool, mgid)
> >       if !grp
> >               grp = rxe_alloc_locked(pool)
> >       spin_unlock()
>
> When you have xarray the general pattern is:
>
> old = xa_load(mgid)
> if (old)
>    return old
>
> new = kzalloc()
> old = xa_cmpxchg(mgid, NULL, new)
> if (old)
>      kfree(new)
>      return old;
> return new;
>
> There are several examples of this with various locking patterns
>
> Jason
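
Spelled out against the xarray API, that lookup-or-allocate pattern
might look roughly like the sketch below. The rxe_mcg structure, the
helper name, and the 32-bit index are assumptions for illustration, not
existing rxe code:

#include <linux/err.h>
#include <linux/slab.h>
#include <linux/xarray.h>
#include <rdma/ib_verbs.h>

/* Hypothetical multicast group object, for illustration only. */
struct rxe_mcg {
	union ib_gid mgid;
};

/*
 * Look up a group by index, allocating it if it does not exist yet.
 * The allocation happens outside any lock so GFP_KERNEL can be used;
 * xa_cmpxchg() resolves the race when two threads try to create the
 * same group concurrently.
 */
static struct rxe_mcg *rxe_get_mcg(struct xarray *mcg_xa, u32 index)
{
	struct rxe_mcg *mcg, *old;

	/* Fast path: the group may already exist. */
	mcg = xa_load(mcg_xa, index);
	if (mcg)
		return mcg;

	mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
	if (!mcg)
		return ERR_PTR(-ENOMEM);

	/* Install only if the slot is still empty; otherwise another
	 * thread won the race and its entry is used instead.
	 */
	old = xa_cmpxchg(mcg_xa, index, NULL, mcg, GFP_KERNEL);
	if (old) {
		kfree(mcg);
		if (xa_is_err(old))
			return ERR_PTR(xa_err(old));
		return old;
	}

	return mcg;
}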

Thread overview: 21+ messages
2021-10-10 23:59 [PATCH for-next 0/6] RDMA/rxe: Fix potential races Bob Pearson
2021-10-10 23:59 ` [PATCH for-next 1/6] RDMA/rxe: Make rxe_alloc() take pool lock Bob Pearson
2021-10-20 23:16   ` Jason Gunthorpe
2021-10-21 17:46     ` Bob Pearson
2021-10-25 12:43       ` Jason Gunthorpe
2021-10-25 18:48         ` Robert Pearson [this message]
2021-10-10 23:59 ` [PATCH for-next 2/6] RDMA/rxe: Copy setup parameters into rxe_pool Bob Pearson
2021-10-10 23:59 ` [PATCH for-next 3/6] RDMA/rxe: Save object pointer in pool element Bob Pearson
2021-10-20 23:20   ` Jason Gunthorpe
2021-10-21 17:21     ` Bob Pearson
2021-10-25 15:40       ` Jason Gunthorpe
2021-10-10 23:59 ` [PATCH for-next 4/6] RDMA/rxe: Combine rxe_add_index with rxe_alloc Bob Pearson
2021-10-10 23:59 ` [PATCH for-next 5/6] RDMA/rxe: Combine rxe_add_key " Bob Pearson
2021-10-10 23:59 ` [PATCH for-next 6/6] RDMA/rxe: Fix potential race condition in rxe_pool Bob Pearson
2021-10-20 23:23   ` Jason Gunthorpe
2021-10-12  6:34 ` [PATCH for-next 0/6] RDMA/rxe: Fix potential races Leon Romanovsky
2021-10-12 20:19   ` Bob Pearson
2021-10-19 13:07     ` Leon Romanovsky
2021-10-19 16:35       ` Bob Pearson
2021-10-19 18:43         ` Jason Gunthorpe
2021-10-19 22:51           ` Bob Pearson
