From: Jason Gunthorpe <jgg@ziepe.ca>
To: "Håkon Bugge" <haakon.bugge@oracle.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	rds-devel@oss.oracle.com, linux-kernel@vger.kernel.org,
	Jack Morgenstein <jackm@dev.mellanox.co.il>
Subject: Re: [PATCH] mlx4_ib: Increase the timeout for CM cache
Date: Tue, 5 Feb 2019 15:36:08 -0700
Message-ID: <20190205223608.GA23110@ziepe.ca>
In-Reply-To: <20190131170951.178676-1-haakon.bugge@oracle.com>

On Thu, Jan 31, 2019 at 06:09:51PM +0100, Håkon Bugge wrote:
> With CX-3 virtual functions, used either from a bare-metal machine or
> via pass-through from a VM, MAD packets are proxied through the PF
> driver.
> 
> Since the VMs have separate name spaces for MAD Transaction Ids
> (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
> in a cache.
> 
> Following the RDMA CM protocol, it is clear when an entry has to be
> evicted from the cache. But life is not perfect; remote peers may die
> or be rebooted. Hence, there is a timeout to wipe out a cache entry
> when the PF driver assumes the remote peer has gone.
> 
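
To make the caching scheme described above concrete, here is a minimal
user-space sketch in C. It is not the code in cm.c: the names
tid_map_entry, map_insert, map_schedule_cleanup and map_lookup are
hypothetical, the fixed-size array stands in for the driver's rb-tree,
and expiry is checked lazily at lookup time against a caller-supplied
clock in seconds rather than with a kernel timer. It only illustrates
that a lookup arriving after the cleanup timeout finds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS         8   /* hypothetical, tiny on purpose */
#define CLEANUP_TIMEOUT_SEC 30  /* mirrors the proposed (30 * HZ) */

/* Hypothetical entry; NOT the driver's struct id_map_entry. */
struct tid_map_entry {
	bool     used;
	int      slave;     /* which VF sent the MAD */
	uint32_t sl_cm_id;  /* id in the VF's own name space */
	uint32_t pv_cm_id;  /* re-mapped id used by the PF */
	bool     closing;   /* cleanup timeout has been armed */
	uint64_t expires;   /* seconds; only valid when closing */
};

static struct tid_map_entry cache[CACHE_SLOTS];

/* Add a mapping; returns NULL when the table is full. */
static struct tid_map_entry *map_insert(int slave, uint32_t sl_cm_id,
					uint32_t pv_cm_id)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		if (!cache[i].used) {
			cache[i] = (struct tid_map_entry){
				.used = true,
				.slave = slave,
				.sl_cm_id = sl_cm_id,
				.pv_cm_id = pv_cm_id,
			};
			return &cache[i];
		}
	}
	return NULL;
}

/* Arm the cleanup timeout, e.g. when teardown of the connection starts. */
static void map_schedule_cleanup(struct tid_map_entry *e, uint64_t now)
{
	assert(e != NULL);
	e->closing = true;
	e->expires = now + CLEANUP_TIMEOUT_SEC;
}

/* Find a mapping; an entry whose timeout has passed is evicted. */
static struct tid_map_entry *map_lookup(int slave, uint32_t sl_cm_id,
					uint64_t now)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		struct tid_map_entry *e = &cache[i];

		if (!e->used || e->slave != slave || e->sl_cm_id != sl_cm_id)
			continue;
		if (e->closing && now >= e->expires) {
			e->used = false;  /* timed out: drop the mapping */
			return NULL;
		}
		return e;
	}
	return NULL;
}
```

With CLEANUP_TIMEOUT_SEC at 30, a retransmitted message that shows up
10 seconds after teardown started still finds its mapping; with 5 it
would not.
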
> We have experienced an excessive number of DREQ retries during
> fail-over testing, when running with eight VMs per database server.
> 
> The problem has been reproduced in a bare-metal system using one VM
> per physical node. In this environment, running 256 processes in each
> VM, each process uses RDMA CM to create an RC QP between itself and
> all (256) remote processes. All in all 16K QPs.
> 
> When tearing down these 16K QPs, excessive DREQ retries (and
> duplicates) are observed. With some cat/paste/awk wizardry on the
> infiniband_cm sysfs, we observe:
> 
>       dreq:       5007
> cm_rx_msgs:
>       drep:       3838
>       dreq:      13018
>        rep:       8128
>        req:       8256
>        rtu:       8256
> cm_tx_msgs:
>       drep:       8011
>       dreq:      68856
>        rep:       8256
>        req:       8128
>        rtu:       8128
> cm_tx_retries:
>       dreq:      60483
> 
> Note that the active and passive roles are distributed across both
> sides.
> 
> Enabling pr_debug in cm.c gives tons of:
> 
> [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL!
> 
> By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
> tear-down phase of the application is reduced from 113 to 67
> seconds. Retries/duplicates are also significantly reduced:
> 
> cm_rx_duplicates:
>       dreq:       7726
> []
> cm_tx_retries:
>       drep:          1
>       dreq:       7779
> 
> Increasing the timeout further didn't help, as these duplicates and
> retries stem from too short a CMA timeout, which was 20 (~4 seconds)
> on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
> numbers dropped to about one hundred for both of them.
> 
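
For reference, the "20 (~4 seconds)" and "22 (~17 seconds)" figures
above are consistent with the usual InfiniBand timeout encoding, where
a value n means 4.096 us * 2^n. A small C sketch of that arithmetic
follows; the helper names ib_timeout_ns and ib_timeout_sec are made up
for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode an IB-style timeout exponent into nanoseconds:
 * 4.096 us * 2^exp == 4096 ns << exp.
 * The encoded field is 5 bits wide, so exp is at most 31.
 */
static uint64_t ib_timeout_ns(unsigned int exp)
{
	assert(exp <= 31);
	return 4096ULL << exp;
}

/* Same value in seconds, for comparison with the figures quoted above. */
static double ib_timeout_sec(unsigned int exp)
{
	return (double)ib_timeout_ns(exp) / 1e9;
}
```

ib_timeout_sec(20) comes out just under 4.3 seconds and
ib_timeout_sec(22) just under 17.2 seconds, so each step of one in the
encoded value doubles the timeout.
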
> Adjustment of the CMA timeout is _not_ part of this commit.
> 
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> ---
>  drivers/infiniband/hw/mlx4/cm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Jack? What do you think?

> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index fedaf8260105..8c79a480f2b7 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -39,7 +39,7 @@
>  
>  #include "mlx4_ib.h"
>  
> -#define CM_CLEANUP_CACHE_TIMEOUT  (5 * HZ)
> +#define CM_CLEANUP_CACHE_TIMEOUT  (30 * HZ)
>  
>  struct id_map_entry {
>  	struct rb_node node;
> -- 
> 2.20.1
> 
