From: Leon Romanovsky <leon@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Doug Ledford <dledford@redhat.com>,
	Aharon Landau <aharonl@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
Date: Wed, 6 Oct 2021 12:30:41 +0300
Message-ID: <YV1swbl1VMQqoR1x@unreal>
In-Reply-To: <20211004230003.GA2602856@nvidia.com>

On Mon, Oct 04, 2021 at 08:00:03PM -0300, Jason Gunthorpe wrote:
> On Sun, Sep 26, 2021 at 11:31:43AM +0300, Leon Romanovsky wrote:
> > From: Aharon Landau <aharonl@nvidia.com>
> > 
> > Currently, if a cache entry is empty, the driver will try to take MRs
> > from larger cache entries. This behavior consumes a lot of memory.
> > In addition, when searching for an mkey in an entry, the entry is locked.
> > When using a multithreaded application with the old behavior, the threads
> > will block each other more often, which can hurt performance as can be
> > seen in the table below.
> > 
> > Therefore, avoid it by creating a new mkey when the requested cache entry
> > is empty.
> > 
> > The test was performed on a machine with
> > Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 44 cores.
> > 
> > Here are the time measurements for allocating MRs of 2^6 pages. The search in
> > the cache started from entry 6.
> > 
> > +------------+---------------------+---------------------+
> > |            |     Old behavior    |     New behavior    |
> > |            +----------+----------+----------+----------+
> > |            | 1 thread | 5 thread | 1 thread | 5 thread |
> > +============+==========+==========+==========+==========+
> > |  1,000 MRs |   14 ms  |   30 ms  |   14 ms  |   80 ms  |
> > +------------+----------+----------+----------+----------+
> > | 10,000 MRs |  135 ms  |   6 sec  |  173 ms  |  880 ms  |
> > +------------+----------+----------+----------+----------+
> > |100,000 MRs | 11.2 sec |  57 sec  | 1.74 sec |  8.8 sec |
> > +------------+----------+----------+----------+----------+
> > 
> > Signed-off-by: Aharon Landau <aharonl@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/mlx5/mr.c | 26 +++++++++-----------------
> >  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> I'm surprised the cost is so high. I assume this has a lot to do with
> repeated calls to queue_adjust_cache_locked()? Maybe this should be
> further investigated?

I don't think so. Most of the overhead comes from the entry lock, which
effectively stops any other change to that shared entry.
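
To illustrate the locking difference, here is a minimal userspace sketch
of the two allocation strategies. It is a simplified model, not the mlx5
code: struct cache_entry, struct alloc_result, alloc_old(), alloc_new(),
NUM_ENTRIES and the lock_taken counter are all hypothetical names, and
the counter only stands in for acquiring the per-entry lock.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ENTRIES 4	/* hypothetical number of cache entries (pools) */

/* Hypothetical stand-in for one per-size pool of pre-created MRs. */
struct cache_entry {
	unsigned int available;		/* MRs currently stored in this entry */
	unsigned int lock_taken;	/* times this entry's lock was acquired */
};

struct alloc_result {
	int from_entry;	/* entry that supplied the MR, or -1 if a new one is needed */
};

/*
 * Old behavior (simplified): start at the requested entry and walk towards
 * larger entries, taking each entry's lock on the way, until an MR is found.
 */
static struct alloc_result alloc_old(struct cache_entry *cache, size_t order)
{
	struct alloc_result res = { .from_entry = -1 };

	assert(order < NUM_ENTRIES);
	for (size_t i = order; i < NUM_ENTRIES; i++) {
		cache[i].lock_taken++;	/* models taking the entry lock */
		if (cache[i].available > 0) {
			cache[i].available--;
			res.from_entry = (int)i;
			break;
		}
	}
	return res;
}

/*
 * New behavior (simplified): only the requested entry is locked. On a miss
 * the caller is expected to create a new mkey instead of borrowing a larger MR.
 */
static struct alloc_result alloc_new(struct cache_entry *cache, size_t order)
{
	struct alloc_result res = { .from_entry = -1 };

	assert(order < NUM_ENTRIES);
	cache[order].lock_taken++;	/* models taking the entry lock */
	if (cache[order].available > 0) {
		cache[order].available--;
		res.from_entry = (int)order;
	}
	return res;
}
```

In this model, a miss on entry 0 with the old strategy touches the locks
of every entry up to the first non-empty one and drains a larger pool,
while the new strategy touches only entry 0 and leaves the larger pools
intact.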

> 
> Anyhow, applied to for-next, thanks

Thanks

> 
> Jason
