Linux RDMA and InfiniBand development


* Re: [PATCH for-next 0/8] Some fixes from hns
From: Jason Gunthorpe @ 2019-07-05 17:23 UTC (permalink / raw)
  To: Lijun Ou; +Cc: dledford, leon, linux-rdma, linuxarm
In-Reply-To: <1561376872-111496-1-git-send-email-oulijun@huawei.com>

On Mon, Jun 24, 2019 at 07:47:44PM +0800, Lijun Ou wrote:
> Here are some bug fixes as well as code optimization.
> 
> Lang Cheng (3):
>   RDMA/hns: Set reset flag when hw resetting
>   RDMA/hns: Use %pK format pointer print
>   RDMA/hns: Clean up unnecessary variable initialization
> 
> Lijun Ou (1):
>   RDMA/hns: Bugfix for cleaning mtr
> 
> Xi Wang (1):
>   RDMA/hns: Fixs hw access invalid dma memory error
> 
> Yangyang Li (1):
>   RDMA/hns: Modify ba page size for cqe
> 
> chenglang (1):
>   RDMA/hns: Fixup qp release bug
> 
> o00290482 (1):
>   RDMA/hns: Bugfix for calculating qp buffer size
> 
>  drivers/infiniband/hw/hns/hns_roce_cmd.c   |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c |  4 +++-
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c |  9 +++++----
>  drivers/infiniband/hw/hns/hns_roce_main.c  |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_pd.c    |  2 +-
>  drivers/infiniband/hw/hns/hns_roce_qp.c    | 13 ++++---------
>  6 files changed, 15 insertions(+), 17 deletions(-)
>

Applied to for-next, thanks

Jason


* Re: [rdma 14/16] RDMA/irdma: Add ABI definitions
From: Jason Gunthorpe @ 2019-07-05 17:16 UTC (permalink / raw)
  To: Saleem, Shiraz
  Cc: Leon Romanovsky, Kirsher, Jeffrey T, dledford@redhat.com,
	davem@davemloft.net, Ismail, Mustafa, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
	poswald@suse.com, Ertman, David M
In-Reply-To: <9DD61F30A802C4429A01CA4200E302A7A684DAAA@fmsmsx124.amr.corp.intel.com>

On Fri, Jul 05, 2019 at 04:42:19PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [rdma 14/16] RDMA/irdma: Add ABI definitions
> > 
> > On Thu, Jul 04, 2019 at 10:40:21AM +0300, Leon Romanovsky wrote:
> > > On Wed, Jul 03, 2019 at 07:12:57PM -0700, Jeff Kirsher wrote:
> > > > From: Mustafa Ismail <mustafa.ismail@intel.com>
> > > >
> > > > Add ABI definitions for irdma.
> > > >
> > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
> > > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > >  include/uapi/rdma/irdma-abi.h | 130 ++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 130 insertions(+)
> > > > >  create mode 100644 include/uapi/rdma/irdma-abi.h
> > > > >
> > > > > diff --git a/include/uapi/rdma/irdma-abi.h b/include/uapi/rdma/irdma-abi.h
> > > > > new file mode 100644
> > > > > index 000000000000..bdfbda4c829e
> > > > +++ b/include/uapi/rdma/irdma-abi.h
> > > > @@ -0,0 +1,130 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
> > > > +/* Copyright (c) 2006 - 2019 Intel Corporation.  All rights reserved.
> > > > + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> > > > + * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> > > > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
> > > > + */
> > > > +
> > > > +#ifndef IRDMA_ABI_H
> > > > +#define IRDMA_ABI_H
> > > > +
> > > > +#include <linux/types.h>
> > > > +
> > > > > +/* irdma must support legacy GEN_1 i40iw kernel
> > > > > + * and user-space whose last ABI ver is 5  */
> > > > > +#define IRDMA_ABI_VER 6
> > >
> > > Can you please elaborate about it more?
> > > There is no irdma code in RDMA yet, so it makes me wonder why new
> > > define shouldn't start from 1.
> > 
> > It is because they are ABI compatible with the current user space, which raises the
> > question why we even have this confusing header file..
> 
> It is because we need to support current providers/i40iw user-space.
> Our user-space patch series will introduce a new provider (irdma) whose ABI
> ver. is also 6 (capable of supporting X722 and which will work with i40iw driver
> on older kernels) and removes providers/i40iw from rdma-core.

Why on earth would we do that?

Jason 


* Re: [PATCH rdma-next] IB/mlx5: Report correctly tag matching rendezvous capability
From: Jason Gunthorpe @ 2019-07-05 17:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Danit Goldberg, RDMA mailing list, Artemy Kovalyov,
	Yishai Hadas, Leon Romanovsky
In-Reply-To: <20190705162157.17336-1-leon@kernel.org>

On Fri, Jul 05, 2019 at 07:21:57PM +0300, Leon Romanovsky wrote:
> From: Danit Goldberg <danitg@mellanox.com>
> 
> Tag matching with rendezvous offload for RC transport is controlled
> by FW and before this change, it was advertised to user as supported
> without any relation to FW.
> 
> Separate tag matching for rendezvous and eager protocols, so users
> will see real capabilities.
> 
> Cc: <stable@vger.kernel.org> # 4.13
> Fixes: eb761894351d ("IB/mlx5: Fill XRQ capabilities")
> Signed-off-by: Danit Goldberg <danitg@mellanox.com>
> Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx5/main.c | 8 ++++++--
>  include/rdma/ib_verbs.h           | 4 ++--
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 07a05b0b9e42..c2a5780cb394 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -1046,15 +1046,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
>  	}
>  
>  	if (MLX5_CAP_GEN(mdev, tag_matching)) {
> -		props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
>  		props->tm_caps.max_num_tags =
>  			(1 << MLX5_CAP_GEN(mdev, log_tag_matching_list_sz)) - 1;
> -		props->tm_caps.flags = IB_TM_CAP_RC;
>  		props->tm_caps.max_ops =
>  			1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
>  		props->tm_caps.max_sge = MLX5_TM_MAX_SGE;
>  	}
>  
> +	if (MLX5_CAP_GEN(mdev, tag_matching) &&
> +	    MLX5_CAP_GEN(mdev, rndv_offload_rc)) {
> +		props->tm_caps.flags = IB_TM_CAP_RNDV_RC;
> +		props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
> +	}
> +
>  	if (MLX5_CAP_GEN(dev->mdev, cq_moderation)) {
>  		props->cq_caps.max_cq_moderation_count =
>  						MLX5_MAX_CQ_COUNT;
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 30eb68f36109..c5f8a9f17063 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -308,8 +308,8 @@ struct ib_rss_caps {
>  };
>  
>  enum ib_tm_cap_flags {
> -	/*  Support tag matching on RC transport */
> -	IB_TM_CAP_RC		    = 1 << 0,
> +	/*  Support tag matching with rendezvous offload for RC transport */
> +	IB_TM_CAP_RNDV_RC = 1 << 0,
>  };

This is in the wrong header, right?

Jason


* RE: [rdma 14/16] RDMA/irdma: Add ABI definitions
From: Saleem, Shiraz @ 2019-07-05 16:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Kirsher, Jeffrey T, dledford@redhat.com, davem@davemloft.net,
	Ismail, Mustafa, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
	poswald@suse.com, Ertman, David M
In-Reply-To: <20190704121933.GD3401@mellanox.com>

> Subject: Re: [rdma 14/16] RDMA/irdma: Add ABI definitions
> 
> On Thu, Jul 04, 2019 at 10:40:21AM +0300, Leon Romanovsky wrote:
> > On Wed, Jul 03, 2019 at 07:12:57PM -0700, Jeff Kirsher wrote:
> > > From: Mustafa Ismail <mustafa.ismail@intel.com>
> > >
> > > Add ABI definitions for irdma.
> > >
> > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
> > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > >  include/uapi/rdma/irdma-abi.h | 130 ++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 130 insertions(+)
> > > >  create mode 100644 include/uapi/rdma/irdma-abi.h
> > > >
> > > > diff --git a/include/uapi/rdma/irdma-abi.h b/include/uapi/rdma/irdma-abi.h
> > > > new file mode 100644
> > > > index 000000000000..bdfbda4c829e
> > > +++ b/include/uapi/rdma/irdma-abi.h
> > > @@ -0,0 +1,130 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
> > > +/* Copyright (c) 2006 - 2019 Intel Corporation.  All rights reserved.
> > > + * Copyright (c) 2005 Topspin Communications.  All rights reserved.
> > > + * Copyright (c) 2005 Cisco Systems.  All rights reserved.
> > > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
> > > + */
> > > +
> > > +#ifndef IRDMA_ABI_H
> > > +#define IRDMA_ABI_H
> > > +
> > > +#include <linux/types.h>
> > > +
> > > > +/* irdma must support legacy GEN_1 i40iw kernel
> > > > + * and user-space whose last ABI ver is 5  */
> > > > +#define IRDMA_ABI_VER 6
> >
> > Can you please elaborate about it more?
> > There is no irdma code in RDMA yet, so it makes me wonder why new
> > define shouldn't start from 1.
> 
> It is because they are ABI compatible with the current user space, which raises the
> question why we even have this confusing header file..

It is because we need to support current providers/i40iw user-space.
Our user-space patch series will introduce a new provider (irdma) whose ABI
ver. is also 6 (capable of supporting X722 and which will work with i40iw driver
on older kernels) and removes providers/i40iw from rdma-core.



* RE: [net-next 1/3] ice: Initialize and register platform device to provide RDMA
From: Saleem, Shiraz @ 2019-07-05 16:33 UTC (permalink / raw)
  To: Greg KH, Jason Gunthorpe
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, dledford@redhat.com,
	Nguyen, Anthony L, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, nhorman@redhat.com,
	sassmann@redhat.com, poswald@suse.com, Ismail, Mustafa,
	Ertman, David M, Bowers, AndrewX
In-Reply-To: <20190704134612.GB10963@kroah.com>

> Subject: Re: [net-next 1/3] ice: Initialize and register platform device to provide
> RDMA
> 
> On Thu, Jul 04, 2019 at 12:48:29PM +0000, Jason Gunthorpe wrote:
> > On Thu, Jul 04, 2019 at 02:42:47PM +0200, Greg KH wrote:
> > > On Thu, Jul 04, 2019 at 12:37:33PM +0000, Jason Gunthorpe wrote:
> > > > On Thu, Jul 04, 2019 at 02:29:50PM +0200, Greg KH wrote:
> > > > > On Thu, Jul 04, 2019 at 12:16:41PM +0000, Jason Gunthorpe wrote:
> > > > > > On Wed, Jul 03, 2019 at 07:12:50PM -0700, Jeff Kirsher wrote:
> > > > > > > From: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > > > > >
> > > > > > > The RDMA block does not advertise on the PCI bus or any other bus.
> > > > > > > Thus the ice driver needs to provide access to the RDMA
> > > > > > > > hardware block via a virtual bus; utilize the platform bus to
> > > > > > > > provide this access.
> > > > > > >
> > > > > > > This patch initializes the driver to support RDMA as well as
> > > > > > > creates and registers a platform device for the RDMA driver
> > > > > > > to register to. At this point the driver is fully
> > > > > > > initialized to register a platform driver, however, can not
> > > > > > > yet register as the ops have not been implemented.
> > > > > >
> > > > > > I think you need Greg's ack on all this driver stuff -
> > > > > > particularly that a platform_device is OK.
> > > > >
> > > > > A platform_device is almost NEVER ok.
> > > > >
> > > > > Don't abuse it, make a real device on a real bus.  If you don't
> > > > > have a real bus and just need to create a device to hang other
> > > > > things off of, then use the virtual one, that's what it is there for.
> > > >
> > > > Ideally I'd like to see all the RDMA drivers that connect to
> > > > ethernet drivers use some similar scheme.
> > >
> > > Why?  They should be attached to a "real" device, why make any up?
> >
> > ? A "real" device, like struct pci_device, can only bind to one
> > driver. How can we bind it concurrently to net, rdma, scsi, etc?
> 
> MFD was designed for this very problem.
> 
> > > > This is for a PCI device that plugs into multiple subsystems in
> > > > the kernel, ie it has net driver functionality, rdma
> > > > functionality, some even have SCSI functionality
> > >
> > > Sounds like a MFD device, why aren't you using that functionality
> > > instead?
> >
> > This was also my advice, but in another email Jeff says:
> >
> >   MFD architecture was also considered, and we selected the simpler
> >   platform model. Supporting a MFD architecture would require an
> >   additional MFD core driver, individual platform netdev, RDMA function
> >   drivers, and stripping a large portion of the netdev drivers into
> >   MFD core. The sub-devices registered by MFD core for function
> >   drivers are indeed platform devices.
> 
> So, "mfd is too hard, let's abuse a platform device" is ok?
> 
> People have been wanting to do MFD drivers for PCI devices for a long time, it's
> about time someone actually did the work for it, I bet it will not be all that complex
> if tiny embedded drivers can do it :)
> 
Hi Greg - Thanks for your feedback!

We currently have 2 PCI function netdev drivers in the kernel (i40e & ice) that
support devices (x722 & e810) which are RDMA capable. Our objective is to add a
single unified RDMA driver (as this is a subsystem-specific requirement) which
needs to access HW resources from the netdev PF drivers. Attaching platform
devices from the netdev drivers to the platform bus and having a single RDMA
platform driver bind to them and access these resources seemed like a simple
approach to realize our objective. But it seems that attaching platform devices
is wrong. I would like to understand why.

Are platform sub-devices only to be added from an MFD core driver? I am also
wondering whether the MFD architecture would allow for a single RDMA driver, and
whether we need an MFD core driver for each device (x722 & e810) or whether it
can be a single driver.

Shiraz


* [PATCH rdma-next] IB/mlx5: Report correctly tag matching rendezvous capability
From: Leon Romanovsky @ 2019-07-05 16:21 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Danit Goldberg, RDMA mailing list, Artemy Kovalyov, Yishai Hadas,
	Leon Romanovsky

From: Danit Goldberg <danitg@mellanox.com>

Tag matching with rendezvous offload for RC transport is controlled
by FW and before this change, it was advertised to user as supported
without any relation to FW.

Separate tag matching for rendezvous and eager protocols, so users
will see real capabilities.

Cc: <stable@vger.kernel.org> # 4.13
Fixes: eb761894351d ("IB/mlx5: Fill XRQ capabilities")
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 8 ++++++--
 include/rdma/ib_verbs.h           | 4 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 07a05b0b9e42..c2a5780cb394 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1046,15 +1046,19 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	}
 
 	if (MLX5_CAP_GEN(mdev, tag_matching)) {
-		props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
 		props->tm_caps.max_num_tags =
 			(1 << MLX5_CAP_GEN(mdev, log_tag_matching_list_sz)) - 1;
-		props->tm_caps.flags = IB_TM_CAP_RC;
 		props->tm_caps.max_ops =
 			1 << MLX5_CAP_GEN(mdev, log_max_qp_sz);
 		props->tm_caps.max_sge = MLX5_TM_MAX_SGE;
 	}
 
+	if (MLX5_CAP_GEN(mdev, tag_matching) &&
+	    MLX5_CAP_GEN(mdev, rndv_offload_rc)) {
+		props->tm_caps.flags = IB_TM_CAP_RNDV_RC;
+		props->tm_caps.max_rndv_hdr_size = MLX5_TM_MAX_RNDV_MSG_SIZE;
+	}
+
 	if (MLX5_CAP_GEN(dev->mdev, cq_moderation)) {
 		props->cq_caps.max_cq_moderation_count =
 						MLX5_MAX_CQ_COUNT;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 30eb68f36109..c5f8a9f17063 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -308,8 +308,8 @@ struct ib_rss_caps {
 };
 
 enum ib_tm_cap_flags {
-	/*  Support tag matching on RC transport */
-	IB_TM_CAP_RC		    = 1 << 0,
+	/*  Support tag matching with rendezvous offload for RC transport */
+	IB_TM_CAP_RNDV_RC = 1 << 0,
 };
 
 struct ib_tm_caps {
-- 
2.20.1
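
The capability split described in the commit message can be sketched in
plain user-space C. This is only an illustration, not the mlx5 code: the
struct names, the fill_tm_caps() helper and the TM_MAX_RNDV_MSG_SIZE value
are assumptions made for the example. The general tag matching limits
depend on one firmware bit, while the rendezvous flag and header size are
reported only when the second bit is also set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the firmware capability bits (not the mlx5 layout). */
struct fw_caps {
	bool tag_matching;			/* FW supports tag matching at all */
	bool rndv_offload_rc;			/* FW supports rendezvous offload on RC */
	unsigned int log_tag_matching_list_sz;
};

enum tm_cap_flags {
	TM_CAP_RNDV_RC = 1 << 0,		/* rendezvous offload for RC transport */
};

/* Hypothetical stand-in for the capabilities reported to the user. */
struct tm_caps {
	uint32_t max_num_tags;
	uint32_t flags;
	uint32_t max_rndv_hdr_size;
};

#define TM_MAX_RNDV_MSG_SIZE 64			/* illustrative value only */

static void fill_tm_caps(const struct fw_caps *fw, struct tm_caps *caps)
{
	*caps = (struct tm_caps){ 0 };

	/* General tag matching limits depend only on the tag_matching bit. */
	if (fw->tag_matching)
		caps->max_num_tags = (1u << fw->log_tag_matching_list_sz) - 1;

	/* Rendezvous is advertised only when the FW reports both bits. */
	if (fw->tag_matching && fw->rndv_offload_rc) {
		caps->flags = TM_CAP_RNDV_RC;
		caps->max_rndv_hdr_size = TM_MAX_RNDV_MSG_SIZE;
	}
}
```

With tag_matching set and rndv_offload_rc clear, a caller of this sketch
sees a non-zero max_num_tags but flags == 0, which mirrors the "real
capabilities" behaviour the commit message asks for.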



* Re: [PATCH rdma-next v5 00/17] Statistics counter support
From: Jason Gunthorpe @ 2019-07-05 15:50 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev
In-Reply-To: <20190702100246.17382-1-leon@kernel.org>

On Tue, Jul 02, 2019 at 01:02:29PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Changelog:
>  v4 -> v5:
>  * Patch #6 and #14 - consolidated many counter release functions,
>    removed mutex lock protection from dealloc_counter() call
>    and simplified kref_put/kref_get operations.
>  * Added Saeed's ACK tags.
>  v3 -> v4:
>  * Add counter_dealloc() callback function
>  * Moved to kref implementation
>  * Fixed lock during spinlock
>  v2 -> v3:
>  * We didn't change use of atomics over kref for management of unbind
>    counter from QP. The reason to it that bind and unbind are non-symmetric
>    in regards of put and get, so we need to count differently memory
>    release flows of HW objects (restrack) and SW bind operations.
>  * Everything else was addressed.
>  v1 -> v2:
>  * Rebased to latest rdma-next
>  v0 -> v1:
>  * Changed wording of counter comment
>  * Removed unneeded assignments
>  * Added extra patch to present global counters
> 
> 
> Hi,
> 
> This series from Mark provides dynamic statistics infrastructure.
> He uses netlink interface to configure and retrieve those counters.
> 
> This infrastructure allows users to monitor various objects by binding
> counters to them. As a beginning, we used the QP object as the target for
> those counters, but future patches will include ODP MR information too.
> 
> Two binding modes are supported:
>  - Auto: This allows a user to build automatic set of objects to a counter
>    according to common criteria. For example in a per-type scheme, where in
>    one process all QPs with same QP type are bound automatically to a single
>    counter.
>  - Manual: This allows a user to manually bind objects to a counter.
> 
> Those two modes are mutually exclusive, with separation between processes:
> objects created by different processes cannot be bound to the same counter.
> 
> For objects which don't support counter binding, we will return
> pre-allocated counters.
> 
> $ rdma statistic qp set link mlx5_2/1 auto type on
> $ rdma statistic qp set link mlx5_2/1 auto off
> $ rdma statistic qp bind link mlx5_2/1 lqpn 178
> $ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
> $ rdma statistic show
> $ rdma statistic qp mode
> 
> Thanks
> 
> 
> Mark Zhang (17):
>   net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
>   RDMA/restrack: Introduce statistic counter
>   RDMA/restrack: Add an API to attach a task to a resource
>   RDMA/restrack: Make is_visible_in_pid_ns() as an API
>   RDMA/counter: Add set/clear per-port auto mode support
>   RDMA/counter: Add "auto" configuration mode support
>   IB/mlx5: Support set qp counter
>   IB/mlx5: Add counter set id as a parameter for
>     mlx5_ib_query_q_counters()
>   IB/mlx5: Support statistic q counter configuration
>   RDMA/nldev: Allow counter auto mode configration through RDMA netlink
>   RDMA/netlink: Implement counter dumpit calback
>   IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
>   RDMA/core: Get sum value of all counters when perform a sysfs stat
>     read
>   RDMA/counter: Allow manual mode configuration support
>   RDMA/nldev: Allow counter manual mode configration through RDMA
>     netlink
>   RDMA/nldev: Allow get counter mode through RDMA netlink
>   RDMA/nldev: Allow get default counter statistics through RDMA netlink

Okay, applied to for-next

Thanks,
Jason
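
The "auto" binding mode from the cover letter can be illustrated with a
small user-space sketch. It is an assumption-laden model, not the
RDMA/counter implementation: struct counter, struct counter_pool and
auto_bind_qp() are hypothetical names. It shows only the rule quoted above,
that QPs of the same type in the same process share one counter and that
QPs from different processes never do.

```c
#define MAX_COUNTERS 8

/* Hypothetical counter object: one per (process, QP type) pair in auto mode. */
struct counter {
	int in_use;
	int pid;			/* owning process */
	int qp_type;			/* criterion used for automatic binding */
	unsigned int bound_qps;		/* number of QPs bound to this counter */
};

struct counter_pool {
	struct counter c[MAX_COUNTERS];
};

/*
 * Bind a new QP in auto mode: reuse the counter that already matches
 * (pid, qp_type), otherwise take the first free slot.
 * Returns the counter index, or -1 when the pool is exhausted.
 */
static int auto_bind_qp(struct counter_pool *p, int pid, int qp_type)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_COUNTERS; i++) {
		struct counter *c = &p->c[i];

		if (c->in_use && c->pid == pid && c->qp_type == qp_type) {
			c->bound_qps++;
			return i;
		}
		if (!c->in_use && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;

	p->c[free_slot] = (struct counter){
		.in_use = 1, .pid = pid, .qp_type = qp_type, .bound_qps = 1,
	};
	return free_slot;
}
```

Two calls with the same pid and QP type return the same index; changing
either the pid or the type yields a different counter.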


* Re: [EXT] Re: [RFC rdma 1/3] RDMA/core: Create a common mmap function
From: Jason Gunthorpe @ 2019-07-05 15:32 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Gal Pressman, dledford@redhat.com, leon@kernel.org,
	sleybo@amazon.com, Ariel Elior, linux-rdma@vger.kernel.org
In-Reply-To: <MN2PR18MB318240185BE80841F1265D2FA1F50@MN2PR18MB3182.namprd18.prod.outlook.com>

On Fri, Jul 05, 2019 at 03:29:03PM +0000, Michal Kalderon wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Thursday, July 4, 2019 3:35 PM
> > 
> > External Email
> > 
> > On Wed, Jul 03, 2019 at 11:19:34AM +0300, Gal Pressman wrote:
> > > On 03/07/2019 1:31, Jason Gunthorpe wrote:
> > > >> Seems except Mellanox + hns the mmap flags aren't ABI.
> > > >> Also, current Mellanox code seems like it won't benefit from mmap
> > > >> cookie helper functions in any case as the mmap function is very
> > > >> specific and the flags used indicate the address and not just how
> > > >> to map it.
> > > >
> > > > IMHO, mlx5 has a goofy implementation here as it codes all of the
> > > > object type, handle and cachability flags in one thing.
> > >
> > > Do we need object type flags as well in the generic mmap code?
> > 
> > At the end of the day the driver needs to know what page to map during the
> > mmap syscall.
> > 
> > mlx5 does this by encoding the page type in the address, and then many
> > types have separate lookups based on the offset for the actual page.
> > 
> > IMHO the single lookup and opaque offset is generally better..
> > 
> > Since the mlx5 scheme is ABI it can't be changed unfortunately.
> > 
> > If you want to do user controlled cachability flags, or not, is a fair question,
> > but they still become ABI..
> > 
> > I'm wondering if it really makes sense to do that during the mmap, or if the
> > cachability should be set as part of creating the cookie?
> > 
> > > Another issue is that these flags aren't exposed in an ABI file, so a
> > > userspace library can't really make use of it in current state.
> > 
> > Woops.
> > 
> > Ah, this is all ABI so you need to dig out of this hole ASAP :)
> >
> Jason, I didn't follow - what is all ABI?
> Currently the EFA implementation encodes the cachability inside the key.
> It's not exposed in an ABI file and is opaque to user-space. The kernel decides on the cachability
> and gets it back in the key when mmap is called. It seems good
> enough for the current cases.

Then the key 'offset' should not include cachability information at
all.

Jason
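
One way to read the suggestion above is that the key handed to user space
is only an opaque, page-aligned index, and the cachability is recorded on
the kernel side when the entry is created. The following user-space sketch
illustrates that idea under stated assumptions: mmap_table,
mmap_entry_insert() and mmap_entry_lookup() are hypothetical names, not an
existing RDMA core API, and the 4 KiB page size is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumes 4 KiB pages */
#define MAX_ENTRIES 16

enum map_cache { MAP_CACHED, MAP_UNCACHED, MAP_WRITE_COMBINE };

/* Everything needed at mmap time lives here, not in the key. */
struct mmap_entry {
	uint64_t phys_addr;
	size_t length;
	enum map_cache cache;
	int in_use;
};

struct mmap_table {
	struct mmap_entry entries[MAX_ENTRIES];
};

/*
 * Record a mapping and return an opaque, page-aligned key for user space.
 * The key carries no cachability bits. Returns UINT64_MAX when full.
 */
static uint64_t mmap_entry_insert(struct mmap_table *t, uint64_t phys,
				  size_t len, enum map_cache cache)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (!t->entries[i].in_use) {
			t->entries[i] = (struct mmap_entry){
				.phys_addr = phys, .length = len,
				.cache = cache, .in_use = 1,
			};
			return (uint64_t)i << SKETCH_PAGE_SHIFT;
		}
	}
	return UINT64_MAX;
}

/* Single lookup at mmap time; rejects unaligned keys and length mismatches. */
static const struct mmap_entry *mmap_entry_lookup(const struct mmap_table *t,
						  uint64_t key, size_t len)
{
	uint64_t idx = key >> SKETCH_PAGE_SHIFT;

	if ((key & ((1u << SKETCH_PAGE_SHIFT) - 1)) || idx >= MAX_ENTRIES)
		return NULL;
	if (!t->entries[idx].in_use || t->entries[idx].length != len)
		return NULL;
	return &t->entries[idx];
}
```

Because the cachability is looked up rather than decoded from the offset,
the key layout never has to be treated as ABI in this model.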


* RE: [EXT] Re: [RFC rdma 1/3] RDMA/core: Create a common mmap function
From: Michal Kalderon @ 2019-07-05 15:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Gal Pressman
  Cc: dledford@redhat.com, leon@kernel.org, sleybo@amazon.com,
	Ariel Elior, linux-rdma@vger.kernel.org
In-Reply-To: <20190704123511.GA3447@ziepe.ca>

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Thursday, July 4, 2019 3:35 PM
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Wed, Jul 03, 2019 at 11:19:34AM +0300, Gal Pressman wrote:
> > On 03/07/2019 1:31, Jason Gunthorpe wrote:
> > >> Seems except Mellanox + hns the mmap flags aren't ABI.
> > >> Also, current Mellanox code seems like it won't benefit from mmap
> > >> cookie helper functions in any case as the mmap function is very
> > >> specific and the flags used indicate the address and not just how
> > >> to map it.
> > >
> > > IMHO, mlx5 has a goofy implementation here as it codes all of the
> > > object type, handle and cachability flags in one thing.
> >
> > Do we need object type flags as well in the generic mmap code?
> 
> At the end of the day the driver needs to know what page to map during the
> mmap syscall.
> 
> mlx5 does this by encoding the page type in the address, and then many
> types have separate lookups based on the offset for the actual page.
> 
> IMHO the single lookup and opaque offset is generally better..
> 
> Since the mlx5 scheme is ABI it can't be changed unfortunately.
> 
> If you want to do user controlled cachability flags, or not, is a fair question,
> but they still become ABI..
> 
> I'm wondering if it really makes sense to do that during the mmap, or if the
> cachability should be set as part of creating the cookie?
> 
> > Another issue is that these flags aren't exposed in an ABI file, so a
> > userspace library can't really make use of it in current state.
> 
> Woops.
> 
> Ah, this is all ABI so you need to dig out of this hole ASAP :)
>
Jason, I didn't follow - what is all ABI?
Currently the EFA implementation encodes the cachability inside the key.
It's not exposed in an ABI file and is opaque to user-space. The kernel decides on the cachability
and gets it back in the key when mmap is called. It seems good enough for the current cases.
Thanks,
Michal
 
> Jason


* Re: [PATCH v2] RDMA/core: Fix race when resolving IP address
From: Dag Moxnes @ 2019-07-05  9:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Parav Pandit,
	linux-rdma, Linux Kernel Mailing List
In-Reply-To: <CAG53R5VQqqr0S6OU+13tcuxcvz922iuqoP-mWbaQERPc48964A@mail.gmail.com>


On 05.07.2019 04:19, Parav Pandit wrote:
> On Fri, Jun 28, 2019 at 2:20 PM Dag Moxnes <dag.moxnes@oracle.com> wrote:
>> Use neighbour lock when copying MAC address from neighbour data struct
>> in dst_fetch_ha.
>>
>> When not using the lock, it is possible for the function to race with
>> neigh_update, causing it to copy an invalid MAC address.
>>
>> It is possible to provoke this error by calling rdma_resolve_addr in a
>> tight loop, while deleting the corresponding ARP entry in another tight
>> loop.
>>
>> Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>
>> ---
>> v1 -> v2:
>>     * Modified implementation to improve readability
>> ---
>>   drivers/infiniband/core/addr.c | 9 ++++++---
>>   1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
>> index 2f7d141598..51323ffbc5 100644
>> --- a/drivers/infiniband/core/addr.c
>> +++ b/drivers/infiniband/core/addr.c
>> @@ -333,11 +333,14 @@ static int dst_fetch_ha(const struct dst_entry *dst,
>>          if (!n)
>>                  return -ENODATA;
>>
>> -       if (!(n->nud_state & NUD_VALID)) {
>> +       read_lock_bh(&n->lock);
>> +       if (n->nud_state & NUD_VALID) {
>> +               memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
>> +               read_unlock_bh(&n->lock);
>> +       } else {
>> +               read_unlock_bh(&n->lock);
>>                  neigh_event_send(n, NULL);
>>                  ret = -ENODATA;
>> -       } else {
>> -               memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
>>          }
>>
>>          neigh_release(n);
>> --
>> 2.20.1
>>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
>
> A sample trace such as below in commit message would be good to have.
> Or the similar one that you noticed with ARP delete sequence.
>
> neigh_changeaddr()
>    neigh_flush_dev()
>     n->nud_state = NUD_NOARP;
>
> Having some issues with office outlook, so replying via gmail.

Hi Parav,

Thanks for your review. I'll add a sample trace to the commit message as
you suggest.

Regards,
-Dag



* Re: [PATCH v2] RDMA/core: Fix race when resolving IP address
From: Leon Romanovsky @ 2019-07-05  4:09 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Dag Moxnes, Doug Ledford, Jason Gunthorpe, Parav Pandit,
	linux-rdma, Linux Kernel Mailing List
In-Reply-To: <CAG53R5VQqqr0S6OU+13tcuxcvz922iuqoP-mWbaQERPc48964A@mail.gmail.com>

On Fri, Jul 05, 2019 at 07:49:06AM +0530, Parav Pandit wrote:
> On Fri, Jun 28, 2019 at 2:20 PM Dag Moxnes <dag.moxnes@oracle.com> wrote:
> >
> > Use neighbour lock when copying MAC address from neighbour data struct
> > in dst_fetch_ha.
> >
> > When not using the lock, it is possible for the function to race with
> > neigh_update, causing it to copy an invalid MAC address.
> >
> > It is possible to provoke this error by calling rdma_resolve_addr in a
> > tight loop, while deleting the corresponding ARP entry in another tight
> > loop.
> >
> > Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
> > Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >
> > ---
> > v1 -> v2:
> >    * Modified implementation to improve readability
> > ---
> >  drivers/infiniband/core/addr.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> > index 2f7d141598..51323ffbc5 100644
> > --- a/drivers/infiniband/core/addr.c
> > +++ b/drivers/infiniband/core/addr.c
> > @@ -333,11 +333,14 @@ static int dst_fetch_ha(const struct dst_entry *dst,
> >         if (!n)
> >                 return -ENODATA;
> >
> > -       if (!(n->nud_state & NUD_VALID)) {
> > +       read_lock_bh(&n->lock);
> > +       if (n->nud_state & NUD_VALID) {
> > +               memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
> > +               read_unlock_bh(&n->lock);
> > +       } else {
> > +               read_unlock_bh(&n->lock);
> >                 neigh_event_send(n, NULL);
> >                 ret = -ENODATA;
> > -       } else {
> > -               memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
> >         }
> >
> >         neigh_release(n);
> > --
> > 2.20.1
> >
> Reviewed-by: Parav Pandit <parav@mellanox.com>
>
> A sample trace such as below in commit message would be good to have.
> Or the similar one that you noticed with ARP delete sequence.
>
> neigh_changeaddr()
>   neigh_flush_dev()
>    n->nud_state = NUD_NOARP;
>
> Having some issues with office outlook, so replying via gmail.

Your replies from Gmail look much better than when you used Outlook - proper
spacing between quoted text and your reply.

Thanks


* Re: [PATCH v2] RDMA/core: Fix race when resolving IP address
From: Parav Pandit @ 2019-07-05  2:19 UTC (permalink / raw)
  To: Dag Moxnes
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Parav Pandit,
	linux-rdma, Linux Kernel Mailing List
In-Reply-To: <1561711763-24705-1-git-send-email-dag.moxnes@oracle.com>

On Fri, Jun 28, 2019 at 2:20 PM Dag Moxnes <dag.moxnes@oracle.com> wrote:
>
> Use neighbour lock when copying MAC address from neighbour data struct
> in dst_fetch_ha.
>
> When not using the lock, it is possible for the function to race with
> neigh_update, causing it to copy an invalid MAC address.
>
> It is possible to provoke this error by calling rdma_resolve_addr in a
> tight loop, while deleting the corresponding ARP entry in another tight
> loop.
>
> Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>
> ---
> v1 -> v2:
>    * Modified implementation to improve readability
> ---
>  drivers/infiniband/core/addr.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index 2f7d141598..51323ffbc5 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -333,11 +333,14 @@ static int dst_fetch_ha(const struct dst_entry *dst,
>         if (!n)
>                 return -ENODATA;
>
> -       if (!(n->nud_state & NUD_VALID)) {
> +       read_lock_bh(&n->lock);
> +       if (n->nud_state & NUD_VALID) {
> +               memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
> +               read_unlock_bh(&n->lock);
> +       } else {
> +               read_unlock_bh(&n->lock);
>                 neigh_event_send(n, NULL);
>                 ret = -ENODATA;
> -       } else {
> -               memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN);
>         }
>
>         neigh_release(n);
> --
> 2.20.1
>
Reviewed-by: Parav Pandit <parav@mellanox.com>

A sample trace such as below in commit message would be good to have.
Or the similar one that you noticed with ARP delete sequence.

neigh_changeaddr()
  neigh_flush_dev()
   n->nud_state = NUD_NOARP;

Having some issues with office outlook, so replying via gmail.

^ permalink raw reply

* Re: [PATCH rdma-next 0/2] Allow netlink commands in non init_net net namespace
From: Parav Pandit @ 2019-07-05  2:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Parav Pandit, Steve Wise
In-Reply-To: <20190704173403.GA20921@ziepe.ca>

On Thu, Jul 4, 2019 at 11:46 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Jul 04, 2019 at 04:04:00PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Now that RDMA devices can be attached to specific net namespace,
> > allow netlink commands in non init_net namespace.
> >
> > Parav Pandit (2):
> >   IB/core: Work on the caller socket net namespace in nldev_newlink()
> >   IB: Support netlink commands in non init_net net namespaces
>
> Could someone please confirm that all the new libibverbs stuff works
> properly in a container after this series?
>
Hi Jason,

I tested rping using rdma-core from [1] where I had to skip siw
compilation due to a compile error.
I used kernel from [2].

[1] https://github.com/jgunthorpe/rdma-plumbing.git branch netlink
[2] https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=rdma-next

Tests were done in a non-init_net net namespace.
It looks good.
A sample kernel ftrace to double-confirm is below.

Because our internal net-next-mlx5 is behind the nldev patches, I couldn't
test mdev devices with this series yet.

19)               |  rdma_nl_rcv [ib_core]() {
19)               |    rdma_nl_rcv_msg [ib_core]() {
19)               |      nldev_get_chardev [ib_core]() {
19)   2.220 us    |        ib_device_get_by_index [ib_core]();
19)               |        ib_get_client_nl_info [ib_core]() {
19)               |          __ib_get_client_nl_info [ib_core]() {
19)   1.697 us    |            xan_find_marked.constprop.26 [ib_core]();
19)   1.024 us    |            xan_find_marked.constprop.26 [ib_core]();
19)   1.023 us    |            xan_find_marked.constprop.26 [ib_core]();
19)   1.153 us    |            xan_find_marked.constprop.26 [ib_core]();
19)   0.574 us    |            ib_uverbs_get_nl_info [ib_uverbs]();
19) + 12.507 us   |          }
19) + 13.533 us   |        }
19)   0.380 us    |        ib_device_put [ib_core]();
19)   4.600 us    |        rdma_nl_unicast [ib_core]();
19) + 26.533 us   |      }
19) + 27.457 us   |    }

I have an issue with my Outlook email, so I am replying from Gmail.
Thanks.

^ permalink raw reply

* Re: [PATCH rdma-next v5 00/17] Statistics counter support
From: Leon Romanovsky @ 2019-07-04 18:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190704182529.GA20631@ziepe.ca>

On Thu, Jul 04, 2019 at 03:25:29PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 02, 2019 at 01:02:29PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Changelog:
> >  v4 -> v5:
> >  * Patch #6 and #14 - consolidated many counter release functions,
> >    removed mutex lock protection from dealloc_counter() call
> >    and simplified kref_put/kref_get operations.
> >  * Added Saeed's ACK tags.
> >  v3 -> v4:
> >  * Add counter_dealloc() callback function
> >  * Moved to kref implementation
> >  * Fixed lock during spinlock
> >  v2 -> v3:
> >  * We didn't change use of atomics over kref for management of unbind
> >    counter from QP. The reason to it that bind and unbind are non-symmetric
> >    in regards of put and get, so we need to count differently memory
> >    release flows of HW objects (restrack) and SW bind operations.
> >  * Everything else was addressed.
> >  v1 -> v2:
> >  * Rebased to latest rdma-next
> >  v0 -> v1:
> >  * Changed wording of counter comment
> >  * Removed unneeded assignments
> >  * Added extra patch to present global counters
> >
> >
> > Hi,
> >
> > This series from Mark provides dynamic statistics infrastructure.
> > He uses netlink interface to configure and retrieve those counters.
> >
> > This infrastructure allows to users monitor various objects by binding
> > to them counters. As the beginning, we used QP object as target for
> > those counters, but future patches will include ODP MR information too.
> >
> > Two binding modes are supported:
> >  - Auto: This allows a user to build automatic set of objects to a counter
> >    according to common criteria. For example in a per-type scheme, where in
> >    one process all QPs with same QP type are bound automatically to a single
> >    counter.
> >  - Manual: This allows a user to manually bind objects on a counter.
> >
> > Those two modes are mutual-exclusive with separation between processes,
> > objects created by different processes cannot be bound to a same counter.
> >
> > For objects which don't support counter binding, we will return
> > pre-allocated counters.
> >
> > $ rdma statistic qp set link mlx5_2/1 auto type on
> > $ rdma statistic qp set link mlx5_2/1 auto off
> > $ rdma statistic qp bind link mlx5_2/1 lqpn 178
> > $ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
> > $ rdma statistic show
> > $ rdma statistic qp mode
> >
> > Thanks
> >
> >
> > Mark Zhang (17):
> >   net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
> >   RDMA/restrack: Introduce statistic counter
> >   RDMA/restrack: Add an API to attach a task to a resource
> >   RDMA/restrack: Make is_visible_in_pid_ns() as an API
> >   RDMA/counter: Add set/clear per-port auto mode support
> >   RDMA/counter: Add "auto" configuration mode support
> >   IB/mlx5: Support set qp counter
> >   IB/mlx5: Add counter set id as a parameter for
> >     mlx5_ib_query_q_counters()
> >   IB/mlx5: Support statistic q counter configuration
> >   RDMA/nldev: Allow counter auto mode configration through RDMA netlink
> >   RDMA/netlink: Implement counter dumpit calback
> >   IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
> >   RDMA/core: Get sum value of all counters when perform a sysfs stat
> >     read
> >   RDMA/counter: Allow manual mode configuration support
> >   RDMA/nldev: Allow counter manual mode configration through RDMA
> >     netlink
> >   RDMA/nldev: Allow get counter mode through RDMA netlink
> >   RDMA/nldev: Allow get default counter statistics through RDMA netlink
>
> Well, I can make the needed edits. Can you apply the first patch
> to the shared branch?

Thanks, pushed
f8efee08dd9d net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap

>
> Thanks,
> Jason

^ permalink raw reply

* Re: [PATCH rdma-next v5 00/17] Statistics counter support
From: Jason Gunthorpe @ 2019-07-04 18:25 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev
In-Reply-To: <20190702100246.17382-1-leon@kernel.org>

On Tue, Jul 02, 2019 at 01:02:29PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Changelog:
>  v4 -> v5:
>  * Patch #6 and #14 - consolidated many counter release functions,
>    removed mutex lock protection from dealloc_counter() call
>    and simplified kref_put/kref_get operations.
>  * Added Saeed's ACK tags.
>  v3 -> v4:
>  * Add counter_dealloc() callback function
>  * Moved to kref implementation
>  * Fixed lock during spinlock
>  v2 -> v3:
>  * We didn't change use of atomics over kref for management of unbind
>    counter from QP. The reason to it that bind and unbind are non-symmetric
>    in regards of put and get, so we need to count differently memory
>    release flows of HW objects (restrack) and SW bind operations.
>  * Everything else was addressed.
>  v1 -> v2:
>  * Rebased to latest rdma-next
>  v0 -> v1:
>  * Changed wording of counter comment
>  * Removed unneeded assignments
>  * Added extra patch to present global counters
> 
> 
> Hi,
> 
> This series from Mark provides dynamic statistics infrastructure.
> He uses netlink interface to configure and retrieve those counters.
> 
> This infrastructure allows to users monitor various objects by binding
> to them counters. As the beginning, we used QP object as target for
> those counters, but future patches will include ODP MR information too.
> 
> Two binding modes are supported:
>  - Auto: This allows a user to build automatic set of objects to a counter
>    according to common criteria. For example in a per-type scheme, where in
>    one process all QPs with same QP type are bound automatically to a single
>    counter.
>  - Manual: This allows a user to manually bind objects on a counter.
> 
> Those two modes are mutual-exclusive with separation between processes,
> objects created by different processes cannot be bound to a same counter.
> 
> For objects which don't support counter binding, we will return
> pre-allocated counters.
> 
> $ rdma statistic qp set link mlx5_2/1 auto type on
> $ rdma statistic qp set link mlx5_2/1 auto off
> $ rdma statistic qp bind link mlx5_2/1 lqpn 178
> $ rdma statistic qp unbind link mlx5_2/1 cntn 4 lqpn 178
> $ rdma statistic show
> $ rdma statistic qp mode
> 
> Thanks
> 
> 
> Mark Zhang (17):
>   net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
>   RDMA/restrack: Introduce statistic counter
>   RDMA/restrack: Add an API to attach a task to a resource
>   RDMA/restrack: Make is_visible_in_pid_ns() as an API
>   RDMA/counter: Add set/clear per-port auto mode support
>   RDMA/counter: Add "auto" configuration mode support
>   IB/mlx5: Support set qp counter
>   IB/mlx5: Add counter set id as a parameter for
>     mlx5_ib_query_q_counters()
>   IB/mlx5: Support statistic q counter configuration
>   RDMA/nldev: Allow counter auto mode configration through RDMA netlink
>   RDMA/netlink: Implement counter dumpit calback
>   IB/mlx5: Add counter_alloc_stats() and counter_update_stats() support
>   RDMA/core: Get sum value of all counters when perform a sysfs stat
>     read
>   RDMA/counter: Allow manual mode configuration support
>   RDMA/nldev: Allow counter manual mode configration through RDMA
>     netlink
>   RDMA/nldev: Allow get counter mode through RDMA netlink
>   RDMA/nldev: Allow get default counter statistics through RDMA netlink

Well, I can make the needed edits. Can you apply the first patch
to the shared branch?

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH rdma-next v5 11/17] RDMA/netlink: Implement counter dumpit calback
From: Leon Romanovsky @ 2019-07-04 18:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Majd Dibbiny, Mark Zhang,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190704180716.GA2583@ziepe.ca>

On Thu, Jul 04, 2019 at 03:07:16PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 02, 2019 at 01:02:40PM +0300, Leon Romanovsky wrote:
> > diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> > index 0cb47d23fd86..22c5bc7a82dd 100644
> > +++ b/include/uapi/rdma/rdma_netlink.h
> > @@ -283,6 +283,8 @@ enum rdma_nldev_command {
> >
> >  	RDMA_NLDEV_CMD_STAT_SET,
> >
> > +	RDMA_NLDEV_CMD_STAT_GET, /* can dump */
> > +
> >  	RDMA_NLDEV_NUM_OPS
> >  };
> >
> > @@ -496,7 +498,13 @@ enum rdma_nldev_attr {
> >  	RDMA_NLDEV_ATTR_STAT_MODE,		/* u32 */
> >  	RDMA_NLDEV_ATTR_STAT_RES,		/* u32 */
> >  	RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK,	/* u32 */
> > -
> > +	RDMA_NLDEV_ATTR_STAT_COUNTER,		/* nested table */
> > +	RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY,	/* nested table */
> > +	RDMA_NLDEV_ATTR_STAT_COUNTER_ID,	/* u32 */
> > +	RDMA_NLDEV_ATTR_STAT_HWCOUNTERS,	/* nested table */
> > +	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY,	/* nested table */
> > +	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME,	/* string */
> > +	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE,	/* u64 */
> >  	/*
> >  	 * Information about a chardev.
> >  	 * CHARDEV_TYPE is the name of the chardev ABI (ie uverbs, umad, etc)
>
> This is in the wrong place, needs to be at the end.

Yes, it is a rebase error.

Thanks

>
> Jason

^ permalink raw reply

* Re: [PATCH mlx5-next 4/5] net/mlx5: Introduce TLS TX offload hardware bits and structures
From: Leon Romanovsky @ 2019-07-04 18:21 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <CALzJLG--k3z2HuV09tivJuOtU-BFAyCEV1vJbPqYX+OyskggmQ@mail.gmail.com>

On Thu, Jul 04, 2019 at 01:21:04PM -0400, Saeed Mahameed wrote:
> On Thu, Jul 4, 2019 at 1:15 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Thu, Jul 04, 2019 at 01:06:58PM -0400, Saeed Mahameed wrote:
> > > On Wed, Jul 3, 2019 at 5:27 AM <leon@kernel.org> wrote:
> > > >
> > > > On Wed, Jul 03, 2019 at 07:39:32AM +0000, Saeed Mahameed wrote:
> > > > > From: Eran Ben Elisha <eranbe@mellanox.com>
> > > > >
> > > > > Add TLS offload related IFC structs, layouts and enumerations.
> > > > >
> > > > > Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> > > > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > > > ---
> > > > >  include/linux/mlx5/device.h   |  14 +++++
> > > > >  include/linux/mlx5/mlx5_ifc.h | 104 ++++++++++++++++++++++++++++++++--
> > > > >  2 files changed, 114 insertions(+), 4 deletions(-)
> > > >
> > > > <...>
> > > >
> > > > > @@ -2725,7 +2739,8 @@ struct mlx5_ifc_traffic_counter_bits {
> > > > >
> > > > >  struct mlx5_ifc_tisc_bits {
> > > > >       u8         strict_lag_tx_port_affinity[0x1];
> > > > > -     u8         reserved_at_1[0x3];
> > > > > +     u8         tls_en[0x1];
> > > > > +     u8         reserved_at_1[0x2];
> > > >
> > > > It should be reserved_at_2.
> > > >
> > >
> > > it should be at_1.
> >
> > Why? See mlx5_ifc_flow_table_prop_layout_bits, mlx5_ifc_roce_cap_bits, etc.
> >
>
> They are all at_1, so I don't really understand what you want from me.
> Leon, the code is good; please double-check your comments.

Saeed,

reserved_at_1 should be renamed to be reserved_at_2.

strict_lag_tx_port_affinity[0x1] + tls_en[0x1] = 0x2
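
The arithmetic above can be sketched as a small helper. This assumes the
naming rule implied in this thread, that a reserved field is named after the
bit offset (in hex) at which it starts. The struct `ifc_field_sketch`, the
helpers and the table below are hypothetical and only model the first three
fields of mlx5_ifc_tisc_bits as shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one IFC field: its name and width in bits. */
struct ifc_field_sketch {
	const char *name;
	unsigned int bits;
};

/* Bit offset at which fields[idx] starts: the sum of all earlier widths. */
static unsigned int ifc_field_offset(const struct ifc_field_sketch *fields,
				     size_t idx)
{
	unsigned int off = 0;

	assert(fields != NULL);
	for (size_t i = 0; i < idx; i++)
		off += fields[i].bits;
	return off;
}

/* Writes "reserved_at_<hex offset>" for fields[idx] into buf. */
static int ifc_reserved_name(const struct ifc_field_sketch *fields, size_t idx,
			     char *buf, size_t len)
{
	return snprintf(buf, len, "reserved_at_%x",
			ifc_field_offset(fields, idx));
}

/* Head of the TISC layout after the patch adds tls_en. */
static const struct ifc_field_sketch tisc_head_sketch[] = {
	{ "strict_lag_tx_port_affinity", 0x1 },
	{ "tls_en",                      0x1 },
	{ "reserved",                    0x2 },
};
```

Calling ifc_reserved_name(tisc_head_sketch, 2, ...) gives the name the
reserved field would carry under that rule once tls_en is placed in front
of it.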

>
> > Thanks
> >
> > >
> > > > Thanks

^ permalink raw reply

* Re: [PATCH rdma-next v5 06/17] RDMA/counter: Add "auto" configuration mode support
From: Jason Gunthorpe @ 2019-07-04 18:09 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev
In-Reply-To: <20190702100246.17382-7-leon@kernel.org>

On Tue, Jul 02, 2019 at 01:02:35PM +0300, Leon Romanovsky wrote:
> From: Mark Zhang <markz@mellanox.com>
> 
> In auto mode all QPs belong to one category are bind automatically to
> a single counter set. Currently only "qp type" is supported.
> 
> In this mode the qp counter is set in RST2INIT modification, and when
> a qp is destroyed the counter is unbound.
> 
> Signed-off-by: Mark Zhang <markz@mellanox.com>
> Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/core/counters.c | 221 +++++++++++++++++++++++++++++
>  drivers/infiniband/core/device.c   |   3 +
>  drivers/infiniband/core/verbs.c    |   9 ++
>  include/rdma/ib_verbs.h            |  18 +++
>  include/rdma/rdma_counter.h        |   8 ++
>  5 files changed, 259 insertions(+)
> 
> diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
> index 6167914fba06..60639452669c 100644
> +++ b/drivers/infiniband/core/counters.c
> @@ -54,6 +54,227 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u8 port,
>  	return ret;
>  }
>  
> +static struct rdma_counter *rdma_counter_alloc(struct ib_device *dev, u8 port,
> +					       enum rdma_nl_counter_mode mode)
> +{
> +	struct rdma_counter *counter;
> +
> +	if (!dev->ops.counter_dealloc)
> +		return NULL;
> +
> +	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
> +	if (!counter)
> +		return NULL;
> +
> +	counter->device    = dev;
> +	counter->port      = port;
> +	counter->res.type  = RDMA_RESTRACK_COUNTER;
> +	counter->mode.mode = mode;
> +	kref_init(&counter->kref);
> +	mutex_init(&counter->lock);
> +
> +	return counter;
> +}
> +
> +static void rdma_counter_free(struct rdma_counter *counter)
> +{
> +	rdma_restrack_del(&counter->res);
> +	kfree(counter);
> +}
> +
> +static void auto_mode_init_counter(struct rdma_counter *counter,
> +				   const struct ib_qp *qp,
> +				   enum rdma_nl_counter_mask new_mask)
> +{
> +	struct auto_mode_param *param = &counter->mode.param;
> +
> +	counter->mode.mode = RDMA_COUNTER_MODE_AUTO;
> +	counter->mode.mask = new_mask;
> +
> +	if (new_mask & RDMA_COUNTER_MASK_QP_TYPE)
> +		param->qp_type = qp->qp_type;
> +}
> +
> +static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
> +			    enum rdma_nl_counter_mask auto_mask)
> +{
> +	struct auto_mode_param *param = &counter->mode.param;
> +	bool match = true;
> +
> +	if (rdma_is_kernel_res(&counter->res) != rdma_is_kernel_res(&qp->res))
> +		return false;
> +
> +	/* Ensure that counter belong to right PID */
> +	if (!rdma_is_kernel_res(&counter->res) &&
> +	    !rdma_is_kernel_res(&qp->res) &&
> +	    (task_pid_vnr(counter->res.task) != current->pid))
> +		return false;
> +
> +	if (auto_mask & RDMA_COUNTER_MASK_QP_TYPE)
> +		match &= (param->qp_type == qp->qp_type);
> +
> +	return match;
> +}
> +
> +static int __rdma_counter_bind_qp(struct rdma_counter *counter,
> +				  struct ib_qp *qp)
> +{
> +	int ret;
> +
> +	if (qp->counter)
> +		return -EINVAL;
> +
> +	if (!qp->device->ops.counter_bind_qp)
> +		return -EOPNOTSUPP;
> +
> +	mutex_lock(&counter->lock);
> +	ret = qp->device->ops.counter_bind_qp(counter, qp);
> +	mutex_unlock(&counter->lock);
> +
> +	return ret;
> +}
> +
> +static int __rdma_counter_unbind_qp(struct ib_qp *qp)
> +{
> +	struct rdma_counter *counter = qp->counter;
> +	int ret;
> +
> +	if (!qp->device->ops.counter_unbind_qp)
> +		return -EOPNOTSUPP;
> +
> +	mutex_lock(&counter->lock);
> +	ret = qp->device->ops.counter_unbind_qp(qp);
> +	mutex_unlock(&counter->lock);
> +
> +	return ret;
> +}
> +
> +/**
> + * rdma_get_counter_auto_mode - Find the counter that @qp should be bound
> + *     with in auto mode
> + *
> + * Return: The counter (with ref-count increased) if found
> + */
> +static struct rdma_counter *rdma_get_counter_auto_mode(struct ib_qp *qp,
> +						       u8 port)
> +{
> +	struct rdma_port_counter *port_counter;
> +	struct rdma_counter *counter = NULL;
> +	struct ib_device *dev = qp->device;
> +	struct rdma_restrack_entry *res;
> +	struct rdma_restrack_root *rt;
> +	unsigned long id = 0;
> +
> +	port_counter = &dev->port_data[port].port_counter;
> +	rt = &dev->res[RDMA_RESTRACK_COUNTER];
> +	xa_lock(&rt->xa);
> +	xa_for_each(&rt->xa, id, res) {
> +		if (!rdma_is_visible_in_pid_ns(res))
> +			continue;
> +
> +		counter = container_of(res, struct rdma_counter, res);
> +		if ((counter->device != qp->device) || (counter->port != port))
> +			goto next;
> +
> +		if (auto_mode_match(qp, counter, port_counter->mode.mask))
> +			break;
> +next:
> +		counter = NULL;
> +	}
> +
> +	if (counter)
> +		kref_get(&counter->kref);

This still needs to be kref_get_unless_zero:

	if (counter && !kref_get_unless_zero(&counter->kref))
		counter = NULL;
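
For reference, a userspace sketch of the "get unless zero" semantics, written
with C11 atomics. It is not the kernel's kref implementation, and the names
`ref_sketch`, `ref_get_unless_zero` and `ref_put` are hypothetical. The
assumption behind the comment above is that a counter can still be found by
the lookup after its last reference was dropped, so a plain get could revive
an object that is already being freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object embedding a reference count. */
struct ref_sketch {
	atomic_uint refcount;
};

static void ref_init(struct ref_sketch *r)
{
	atomic_init(&r->refcount, 1);	/* creator holds the first reference */
}

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_get_unless_zero(struct ref_sketch *r)
{
	unsigned int old = atomic_load(&r->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return true;
	}
	return false;	/* object is on its way out; caller must not use it */
}

/* Drop a reference; returns true when the caller released the last one. */
static bool ref_put(struct ref_sketch *r)
{
	unsigned int old = atomic_fetch_sub(&r->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A lookup that fails ref_get_unless_zero() treats the entry as not found,
which is what setting counter to NULL does in the snippet above.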

Jason

^ permalink raw reply

* Re: [PATCH rdma-next v5 11/17] RDMA/netlink: Implement counter dumpit calback
From: Jason Gunthorpe @ 2019-07-04 18:07 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Majd Dibbiny,
	Mark Zhang, Saeed Mahameed, linux-netdev
In-Reply-To: <20190702100246.17382-12-leon@kernel.org>

On Tue, Jul 02, 2019 at 01:02:40PM +0300, Leon Romanovsky wrote:
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index 0cb47d23fd86..22c5bc7a82dd 100644
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -283,6 +283,8 @@ enum rdma_nldev_command {
>  
>  	RDMA_NLDEV_CMD_STAT_SET,
>  
> +	RDMA_NLDEV_CMD_STAT_GET, /* can dump */
> +
>  	RDMA_NLDEV_NUM_OPS
>  };
>  
> @@ -496,7 +498,13 @@ enum rdma_nldev_attr {
>  	RDMA_NLDEV_ATTR_STAT_MODE,		/* u32 */
>  	RDMA_NLDEV_ATTR_STAT_RES,		/* u32 */
>  	RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK,	/* u32 */
> -
> +	RDMA_NLDEV_ATTR_STAT_COUNTER,		/* nested table */
> +	RDMA_NLDEV_ATTR_STAT_COUNTER_ENTRY,	/* nested table */
> +	RDMA_NLDEV_ATTR_STAT_COUNTER_ID,	/* u32 */
> +	RDMA_NLDEV_ATTR_STAT_HWCOUNTERS,	/* nested table */
> +	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY,	/* nested table */
> +	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_NAME,	/* string */
> +	RDMA_NLDEV_ATTR_STAT_HWCOUNTER_ENTRY_VALUE,	/* u64 */
>  	/*
>  	 * Information about a chardev.
>  	 * CHARDEV_TYPE is the name of the chardev ABI (ie uverbs, umad, etc)

This is in the wrong place, needs to be at the end.
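
A small sketch of why placement matters in a uapi enum. The enums below are
hypothetical and much shorter than rdma_nldev_attr. They only show that
inserting a new value in the middle renumbers everything after it, while
appending leaves the existing values unchanged for userspace that was already
built against them.

```c
#include <assert.h>

/* Hypothetical attribute list as existing userspace was compiled against. */
enum attr_v1 {
	V1_ATTR_STAT_MODE,
	V1_ATTR_STAT_RES,
	V1_ATTR_CHARDEV_TYPE,
	V1_ATTR_MAX
};

/* New attribute inserted in the middle: CHARDEV_TYPE gets a new number. */
enum attr_mid {
	MID_ATTR_STAT_MODE,
	MID_ATTR_STAT_RES,
	MID_ATTR_STAT_COUNTER,
	MID_ATTR_CHARDEV_TYPE,
	MID_ATTR_MAX
};

/* New attribute appended before MAX: existing numbers are preserved. */
enum attr_end {
	END_ATTR_STAT_MODE,
	END_ATTR_STAT_RES,
	END_ATTR_CHARDEV_TYPE,
	END_ATTR_STAT_COUNTER,
	END_ATTR_MAX
};

_Static_assert((int)END_ATTR_CHARDEV_TYPE == (int)V1_ATTR_CHARDEV_TYPE,
	       "appending keeps existing attribute values");
_Static_assert((int)MID_ATTR_CHARDEV_TYPE != (int)V1_ATTR_CHARDEV_TYPE,
	       "inserting in the middle renumbers later attributes");

/* Returns nonzero when an attribute kept its number across the change. */
static int attr_value_is_stable(int old_value, int new_value)
{
	return old_value == new_value;
}
```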

Jason

^ permalink raw reply

* Re: [PATCH rdma-next 0/2] Allow netlink commands in non init_net net namespace
From: Jason Gunthorpe @ 2019-07-04 17:34 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Parav Pandit,
	Steve Wise
In-Reply-To: <20190704130402.8431-1-leon@kernel.org>

On Thu, Jul 04, 2019 at 04:04:00PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Now that RDMA devices can be attached to specific net namespace,
> allow netlink commands in non init_net namespace.
> 
> Parav Pandit (2):
>   IB/core: Work on the caller socket net namespace in nldev_newlink()
>   IB: Support netlink commands in non init_net net namespaces

Could someone please confirm that all the new libibverbs stuff works
properly in a container after this series?

Jason

^ permalink raw reply

* Re: [PATCH for-next] RDMA/efa: Entropy in admin commands id
From: Jason Gunthorpe @ 2019-07-04 17:32 UTC (permalink / raw)
  To: Gal Pressman
  Cc: Doug Ledford, linux-rdma, Daniel Kranzdorf, Firas JahJah,
	Yossi Leybovich
In-Reply-To: <20190630145302.13603-1-galpress@amazon.com>

On Sun, Jun 30, 2019 at 05:53:02PM +0300, Gal Pressman wrote:
> From: Daniel Kranzdorf <dkkranzd@amazon.com>
> 
> Make admin commands id easier to distinguish by using relevant bits from
> the producer counter.
> This allows us to differentiate admin commands with the same producer
> index (happens after admin queue overlap), which is helpful when
> debugging.
> 
> Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com>
> Reviewed-by: Firas JahJah <firasj@amazon.com>
> Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
> Signed-off-by: Gal Pressman <galpress@amazon.com>
> ---
>  drivers/infiniband/hw/efa/efa_com.c | 44 +++++++++++++++--------------
>  1 file changed, 23 insertions(+), 21 deletions(-)

Applied to for-next, thanks

Jason

^ permalink raw reply

* Re: [PATCH rdma-next] IB/ipoib: Add child to parent list only if device initialized
From: Jason Gunthorpe @ 2019-07-04 17:30 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Valentine Fatiev, RDMA mailing list, Feras Daoud,
	Leon Romanovsky
In-Reply-To: <20190630134841.19413-1-leon@kernel.org>

On Sun, Jun 30, 2019 at 04:48:41PM +0300, Leon Romanovsky wrote:
> From: Valentine Fatiev <valentinef@mellanox.com>
> 
> Despite failure in ipoib_dev_init() we continue with initialization
> flow and creation of child device. It causes to the situation
> where this child device is added too early to parent device list.
> 
> change the logic, so in case of failure we properly return error
> from ipoib_dev_init() and add child only in success path.
> 
> Fixes: eaeb39842508 ("IB/ipoib: Move init code to ndo_init")
> Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
> Reviewed-by: Feras Daoud <ferasda@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 34 +++++++++++++----------
>  1 file changed, 20 insertions(+), 14 deletions(-)

Applied to for-next, thanks

Jason

^ permalink raw reply

* Re: [PATCH mlx5-next 4/5] net/mlx5: Introduce TLS TX offload hardware bits and structures
From: Saeed Mahameed @ 2019-07-04 17:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Saeed Mahameed, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <20190704171519.GE7212@mtr-leonro.mtl.com>

On Thu, Jul 4, 2019 at 1:15 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Thu, Jul 04, 2019 at 01:06:58PM -0400, Saeed Mahameed wrote:
> > On Wed, Jul 3, 2019 at 5:27 AM <leon@kernel.org> wrote:
> > >
> > > On Wed, Jul 03, 2019 at 07:39:32AM +0000, Saeed Mahameed wrote:
> > > > From: Eran Ben Elisha <eranbe@mellanox.com>
> > > >
> > > > Add TLS offload related IFC structs, layouts and enumerations.
> > > >
> > > > Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> > > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > > ---
> > > >  include/linux/mlx5/device.h   |  14 +++++
> > > >  include/linux/mlx5/mlx5_ifc.h | 104 ++++++++++++++++++++++++++++++++--
> > > >  2 files changed, 114 insertions(+), 4 deletions(-)
> > >
> > > <...>
> > >
> > > > @@ -2725,7 +2739,8 @@ struct mlx5_ifc_traffic_counter_bits {
> > > >
> > > >  struct mlx5_ifc_tisc_bits {
> > > >       u8         strict_lag_tx_port_affinity[0x1];
> > > > -     u8         reserved_at_1[0x3];
> > > > +     u8         tls_en[0x1];
> > > > +     u8         reserved_at_1[0x2];
> > >
> > > It should be reserved_at_2.
> > >
> >
> > it should be at_1.
>
> Why? See mlx5_ifc_flow_table_prop_layout_bits, mlx5_ifc_roce_cap_bits, etc.
>

They are all at_1, so I don't really understand what you want from me.
Leon, the code is good; please double-check your comments.

> Thanks
>
> >
> > > Thanks

^ permalink raw reply

* Re: [PATCH mlx5-next 0/5] Mellanox, mlx5 low level updates 2019-07-02
From: Leon Romanovsky @ 2019-07-04 17:16 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <5c85e7cd688cc8727f421e4592304e66ccd018c7.camel@mellanox.com>

On Thu, Jul 04, 2019 at 05:10:25PM +0000, Saeed Mahameed wrote:
> On Wed, 2019-07-03 at 07:39 +0000, Saeed Mahameed wrote:
> > Hi All,
> >
> > This series includes some low level updates to mlx5 driver, required
> > for
> > shared mlx5-next branch.
> >
> > Tariq extends the WQE control fields names.
> > Eran adds the required HW definitions and structures for upcoming TLS
> > support.
> > Parav improves and refactors the E-Switch "function changed" handler.
> >
> > In case of no objections these patches will be applied to mlx5-next
> > and
> > will be sent later as pull request to both rdma-next and net-next
> > trees.
> >
> > Thanks,
> > Saeed.
>
> Applied to mlx5-next.

Saeed,

Please fix the IFC before you push it out.

Thanks

>

^ permalink raw reply

* Re: [PATCH mlx5-next 4/5] net/mlx5: Introduce TLS TX offload hardware bits and structures
From: Leon Romanovsky @ 2019-07-04 17:15 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Saeed Mahameed, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Eran Ben Elisha, Tariq Toukan
In-Reply-To: <CALzJLG-em1w+Lgf2UutbG2Lzq8bx3zUqoLGx26H2_EXOuuk+jg@mail.gmail.com>

On Thu, Jul 04, 2019 at 01:06:58PM -0400, Saeed Mahameed wrote:
> On Wed, Jul 3, 2019 at 5:27 AM <leon@kernel.org> wrote:
> >
> > On Wed, Jul 03, 2019 at 07:39:32AM +0000, Saeed Mahameed wrote:
> > > From: Eran Ben Elisha <eranbe@mellanox.com>
> > >
> > > Add TLS offload related IFC structs, layouts and enumerations.
> > >
> > > Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > ---
> > >  include/linux/mlx5/device.h   |  14 +++++
> > >  include/linux/mlx5/mlx5_ifc.h | 104 ++++++++++++++++++++++++++++++++--
> > >  2 files changed, 114 insertions(+), 4 deletions(-)
> >
> > <...>
> >
> > > @@ -2725,7 +2739,8 @@ struct mlx5_ifc_traffic_counter_bits {
> > >
> > >  struct mlx5_ifc_tisc_bits {
> > >       u8         strict_lag_tx_port_affinity[0x1];
> > > -     u8         reserved_at_1[0x3];
> > > +     u8         tls_en[0x1];
> > > +     u8         reserved_at_1[0x2];
> >
> > It should be reserved_at_2.
> >
>
> it should be at_1.

Why? See mlx5_ifc_flow_table_prop_layout_bits, mlx5_ifc_roce_cap_bits, etc.

Thanks

>
> > Thanks

^ permalink raw reply
