Linux RDMA and InfiniBand development

* Re: rdma-core example spec file is broken
From: Leon Romanovsky @ 2016-11-09  5:59 UTC (permalink / raw)
  To: Alaa Hleihel
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	yishaih-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <f807db01-c4c7-0f64-fe6b-476d02b686b3-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 374 bytes --]

On Tue, Nov 08, 2016 at 04:47:21PM +0200, Alaa Hleihel wrote:
> I tested this patch.
> It resolves the issue.

Thanks, applied.

>
> Thanks,
> Alaa
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-next 0/4] Add packet pacing support for IB verbs
From: Leon Romanovsky @ 2016-11-09  6:40 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373AB0A7B31-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

On Tue, Nov 08, 2016 at 05:49:26PM +0000, Hefty, Sean wrote:
> > When sending from a 10G host to a 1G host, it is easy to overrun the
> > receiver, leading to packet loss and traffic backing off. Similar
> > problems occur when a 10G host sends data to a sub-10G virtual circuit,
> > or a 40G host sends to a 10G host. Packet pacing can control the packet
> > injection rate and reduce network congestion to maximize throughput and
> > minimize network latency.
>
> Why isn't the path record data and existing mechanisms sufficient to handle this?
>

Packet pacing allows several kinds of traffic shaping (per-CPU, per-flow,
and combinations of the two) with better and steadier QoS, without
involving subnet management.
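
As a rough illustration of what pacing means at the sender, the sketch below
computes how long one packet occupies the wire at a target rate. The function
name pacing_interval_ns() and its parameters are assumptions made for this
example and are not part of the proposed verbs interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time one packet occupies at the target rate, in nanoseconds.
 * A sender that releases at most one packet per interval stays at or
 * below rate_bps. Hypothetical helper, for illustration only.
 */
static uint64_t pacing_interval_ns(uint32_t packet_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	assert(packet_bytes <= (1u << 20));	/* keeps the product within 64 bits */
	return ((uint64_t)packet_bytes * 8u * 1000000000ull) / rate_bps;
}
```

With these assumptions, a 1500-byte packet paced to 1 Gb/s is released every
12000 ns, regardless of how fast the local link is.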

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v4 9/9] selinux: Add a cache for quicker retreival of PKey SIDs
From: Leon Romanovsky @ 2016-11-09  7:04 UTC (permalink / raw)
  To: Dan Jurgens
  Cc: chrisw, paul, sds, eparis, dledford, sean.hefty, hal.rosenstock,
	selinux, linux-security-module, linux-rdma, yevgenyp, liranl
In-Reply-To: <1478639185-47521-10-git-send-email-danielj@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

On Tue, Nov 08, 2016 at 11:06:25PM +0200, Dan Jurgens wrote:
> From: Daniel Jurgens <danielj@mellanox.com>
>
> It is likely that the SID for the same PKey will be requested many
> times. To reduce the time to modify QPs and process MADs use a cache to
> store PKey SIDs.
>
> This code is heavily based on the "netif" and "netport" concept
> originally developed by James Morris <jmorris@redhat.com> and Paul Moore
> <paul@paul-moore.com> (see security/selinux/netif.c and
> security/selinux/netport.c for more information)
>
> issue: 736423
> Change-Id: I176c3079d5d84d06839b4f750100ac47a6081e94

This doesn't belong in the commit message.

> Signed-off-by: Daniel Jurgens <danielj@mellanox.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Leon Romanovsky @ 2016-11-09  7:21 UTC (permalink / raw)
  To: Salil Mehta
  Cc: dledford, xavier.huwei, oulijun, mehta.salil.lnk, linux-rdma,
	netdev, linux-kernel, linuxarm, Ping Zhang
In-Reply-To: <20161104163633.141880-4-salil.mehta@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 1501 bytes --]

On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
>
> This patch modifies the memory allocation logic in the
> hns RoCE driver. We use kcalloc instead of kmalloc_array and
> bitmap_zero, and when kcalloc fails, we call vzalloc to allocate
> the memory.
>
> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> Signed-off-by: Ping Zhang <zhangping5@huawei.com>
> Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> index fb87883..d3dfb5f 100644
> --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct hns_roce_buddy *buddy, int max_order)
>
>  	for (i = 0; i <= buddy->max_order; ++i) {
>  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> -		buddy->bits[i] = kmalloc_array(s, sizeof(long), GFP_KERNEL);
> -		if (!buddy->bits[i])
> -			goto err_out_free;
> -
> -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
> +		buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
> +		if (!buddy->bits[i]) {
> +			buddy->bits[i] = vzalloc(s * sizeof(long));

I wonder, why not use vzalloc directly instead of as a fallback for kcalloc?

> +			if (!buddy->bits[i])
> +				goto err_out_free;
> +		}
>  	}
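
For readers following the question above, here is a minimal userspace sketch
of the "try one allocator, then fall back to another" pattern in the quoted
hunk. calloc() stands in for both kcalloc() and vzalloc(); the names
alloc_contig_zeroed(), alloc_fallback_zeroed(), buddy_bits_init() and the _U
macros are assumptions of this sketch, not driver code.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's BITS_TO_LONGS(): longs needed for nbits. */
#define BITS_PER_LONG_U (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_U(nbits) (((nbits) + BITS_PER_LONG_U - 1) / BITS_PER_LONG_U)

/* Hypothetical allocator pair: preferred allocator first, then a fallback. */
static void *alloc_contig_zeroed(size_t n, size_t size) { return calloc(n, size); }
static void *alloc_fallback_zeroed(size_t n, size_t size) { return calloc(n, size); }

/*
 * Allocate one zeroed bitmap per buddy order, as in the quoted hunk.
 * Order i tracks (1 << (max_order - i)) blocks.
 * Returns 0 on success, -1 on failure (bitmaps allocated so far are freed).
 */
static int buddy_bits_init(unsigned long **bits, int max_order)
{
	int i;

	assert(max_order >= 0 && max_order < 31);
	for (i = 0; i <= max_order; ++i) {
		size_t s = BITS_TO_LONGS_U((size_t)1 << (max_order - i));

		bits[i] = alloc_contig_zeroed(s, sizeof(long));
		if (!bits[i])
			bits[i] = alloc_fallback_zeroed(s, sizeof(long));
		if (!bits[i])
			goto err_free;
	}
	return 0;

err_free:
	while (--i >= 0)
		free(bits[i]);
	return -1;
}
```

In the kernel the two allocators differ on the free side as well: a pointer
that may have come from either kcalloc() or vzalloc() has to be released with
kvfree(), which is part of why a single allocator may be the simpler choice.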

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH for-next 09/11] IB/hns: Change qpn allocation to round-robin mode.
From: Leon Romanovsky @ 2016-11-09  7:24 UTC (permalink / raw)
  To: Salil Mehta
  Cc: dledford, xavier.huwei, oulijun, mehta.salil.lnk, linux-rdma,
	netdev, linux-kernel, linuxarm
In-Reply-To: <20161104163633.141880-10-salil.mehta@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 650 bytes --]

On Fri, Nov 04, 2016 at 04:36:31PM +0000, Salil Mehta wrote:
> From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
>
> When using CM to establish connections, a qp number that was just freed
> will be rejected by the ib core. To fix this problem, we
> change qpn allocation to round-robin mode. We added a round-robin
> mode for allocating resources using a bitmap. We use round-robin mode
> for qp numbers and non-round-robin mode for other resources like
> cq numbers, pd numbers, etc.
>
> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>

Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
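
The difference between the two allocation modes described in the commit
message can be sketched in userspace as follows. This is not the hns driver
code: struct id_bitmap, id_alloc(), id_free() and NUM_IDS are hypothetical
names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_IDS 8	/* illustrative table size, not a hardware limit */

struct id_bitmap {
	bool used[NUM_IDS];
	unsigned int last;	/* where the next search starts */
};

/* Return a free id, or -1 if none. The search starts at 'last' and wraps once. */
static int id_alloc(struct id_bitmap *b)
{
	unsigned int n;

	for (n = 0; n < NUM_IDS; n++) {
		unsigned int id = (b->last + n) % NUM_IDS;

		if (!b->used[id]) {
			b->used[id] = true;
			b->last = (id + 1) % NUM_IDS;
			return (int)id;
		}
	}
	return -1;
}

/*
 * Free an id. In round-robin mode 'last' is left alone, so the freed id is
 * handed out again only after the search wraps around. Otherwise 'last' is
 * pulled back so that the freed id can be reused by the next allocation.
 */
static void id_free(struct id_bitmap *b, unsigned int id, bool round_robin)
{
	assert(id < NUM_IDS && b->used[id]);
	b->used[id] = false;
	if (!round_robin && id < b->last)
		b->last = id;
}
```

In round-robin mode a number that was just freed is not handed out again
immediately, which is the property the commit message relies on for qp
numbers.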

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH rdma-rc 2/9] IB/mlx4: Check gid_index return value
From: Leon Romanovsky @ 2016-11-09  7:26 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens
In-Reply-To: <20161106072503.GB3799-Hxa29pjIrETwm8eLU6eYyt+IiqhCXseY@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 219 bytes --]

On Sun, Nov 06, 2016 at 09:25:04AM +0200, Yuval Shaia wrote:
> FWIW
> Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Thanks Yuval,
As I wrote earlier, we will address all your comments.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH] build: Fix build script to use correct cmake cmd
From: Leon Romanovsky @ 2016-11-09  7:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Dennis Dalessandro,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161107235709.GF7002-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 611 bytes --]

On Mon, Nov 07, 2016 at 04:57:09PM -0700, Jason Gunthorpe wrote:
> On Tue, Oct 25, 2016 at 01:12:00PM +0300, Leon Romanovsky wrote:
> > > stuff I have - eg should I make it pushable? It is easy to use, but
> > > you need to have docker installed.
> >
> > I would be happy to get it and be more confident in my local tests.
>
> You can test it out with this commit:
>
> https://github.com/jgunthorpe/rdma-plumbing/commit/ef24b991c949ad4f50614bf6bf549e1cdf841358
>
> It will need some tidying before it can merged, but let me know if it
> is useful as-is.

Thanks, I'll do my best to try it next week.

>
> Jason

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2 rdma-core 3/7] libhns: Add verbs of pd and mr support
From: Leon Romanovsky @ 2016-11-09  7:34 UTC (permalink / raw)
  To: Lijun Ou
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1477731826-10787-4-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

On Sat, Oct 29, 2016 at 05:03:42PM +0800, Lijun Ou wrote:
> This patch mainly introduces the verbs for pd and mr,
> including alloc_pd, dealloc_pd, reg_mr and dereg_mr.
>
> Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> ---
> v2:
> - No change over v1
>
> v1:
> - The initial submit
> ---
>  providers/hns/hns_roce_u.c       |  4 ++
>  providers/hns/hns_roce_u.h       | 18 +++++++++
>  providers/hns/hns_roce_u_abi.h   |  6 +++
>  providers/hns/hns_roce_u_verbs.c | 79 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 107 insertions(+)

<....>

> +struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
> +				 int access)
> +{
> +	int ret;
> +	struct ibv_mr *mr;
> +	struct ibv_reg_mr cmd;
> +	struct ibv_reg_mr_resp resp;
> +
> +	if (addr == NULL) {

It would be great if you used one style throughout your code, e.g. if (!addr) ...

> +		fprintf(stderr, "2nd parm addr is NULL!\n");
> +		return NULL;
> +	}
> +
> +	if (length == 0) {
> +		fprintf(stderr, "3st parm length is 0!\n");
> +		return NULL;
> +	}
> +
> +	mr = malloc(sizeof(*mr));
> +	if (mr)
> +		return NULL;

This looks like a bug: you wanted if (!mr), not if (mr).
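
A minimal userspace sketch of the intended checks, written in one consistent
style. struct region and region_create() are hypothetical stand-ins for
struct ibv_mr and the provider's reg_mr hook; the real function also issues
the registration command, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ibv_mr; only what the sketch needs. */
struct region {
	void *addr;
	size_t length;
};

/* Returns a new region, or NULL on bad arguments or allocation failure. */
static struct region *region_create(void *addr, size_t length)
{
	struct region *mr;

	if (!addr || !length)
		return NULL;

	mr = malloc(sizeof(*mr));
	if (!mr)		/* bail out only when malloc() failed */
		return NULL;

	mr->addr = addr;
	mr->length = length;
	return mr;
}
```

With the inverted test from the patch, every successful allocation would be
leaked and the function would return NULL, while a failed allocation would go
on to dereference a NULL pointer.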


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [PATCH] net/mlx4_en: Fix bpf_prog_add ref_cnt in mlx4
From: Zhiyi Sun @ 2016-11-09  7:35 UTC (permalink / raw)
  To: bblanco, Tariq Toukan, Yishai Hadas, netdev, linux-rdma,
	linux-kernel
  Cc: zhiyisun

There are rx_ring_num queues. Each queue will load xdp prog. So
bpf_prog_add() should add rx_ring_num to ref_cnt.

Signed-off-by: Zhiyi Sun <zhiyisun@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 12c99a2..d25e150 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2650,7 +2650,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	 */
 	if (priv->xdp_ring_num == xdp_ring_num) {
 		if (prog) {
-			prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
+			prog = bpf_prog_add(prog, priv->rx_ring_num);
 			if (IS_ERR(prog))
 				return PTR_ERR(prog);
 		}
@@ -2680,7 +2680,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	}
 
 	if (prog) {
-		prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
+		prog = bpf_prog_add(prog, priv->rx_ring_num);
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 	}
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH v2 rdma-core 3/7] libhns: Add verbs of pd and mr support
From: oulijun @ 2016-11-09  8:01 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161109073426.GL27883-2ukJVAZIZ/Y@public.gmane.org>

On 2016/11/9 15:34, Leon Romanovsky wrote:
> On Sat, Oct 29, 2016 at 05:03:42PM +0800, Lijun Ou wrote:
>> This patch mainly introduces the verbs for pd and mr,
>> including alloc_pd, dealloc_pd, reg_mr and dereg_mr.
>>
>> Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> ---
>> v2:
>> - No change over v1
>>
>> v1:
>> - The initial submit
>> ---
>>  providers/hns/hns_roce_u.c       |  4 ++
>>  providers/hns/hns_roce_u.h       | 18 +++++++++
>>  providers/hns/hns_roce_u_abi.h   |  6 +++
>>  providers/hns/hns_roce_u_verbs.c | 79 ++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 107 insertions(+)
> 
> <....>
> 
>> +struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
>> +				 int access)
>> +{
>> +	int ret;
>> +	struct ibv_mr *mr;
>> +	struct ibv_reg_mr cmd;
>> +	struct ibv_reg_mr_resp resp;
>> +
>> +	if (addr == NULL) {
> 
> It would be great if you used one style throughout your code, e.g. if (!addr) ...
> 
OK, thanks for your advice. I will consider fixing it.
>> +		fprintf(stderr, "2nd parm addr is NULL!\n");
>> +		return NULL;
>> +	}
>> +
>> +	if (length == 0) {
>> +		fprintf(stderr, "3st parm length is 0!\n");
>> +		return NULL;
>> +	}
>> +
>> +	mr = malloc(sizeof(*mr));
>> +	if (mr)
>> +		return NULL;
> 
> This looks like a bug: you wanted if (!mr), not if (mr).
> 
Yes, this was my carelessness when generating the patch. The code on my local server is if (!mr).
I will fix it.

Lijun Ou

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net/mlx4_en: Fix bpf_prog_add ref_cnt in mlx4
From: Daniel Borkmann @ 2016-11-09  9:05 UTC (permalink / raw)
  To: Zhiyi Sun
  Cc: bblanco-uqk4Ao+rVK5Wk0Htik3J/w, Tariq Toukan, Yishai Hadas,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20161109073544.jbufjqn7y7oa6ptg@ubuntu>

On 11/09/2016 08:35 AM, Zhiyi Sun wrote:
> There are rx_ring_num queues. Each queue will load xdp prog. So
> bpf_prog_add() should add rx_ring_num to ref_cnt.
>
> Signed-off-by: Zhiyi Sun <zhiyisun-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Your analysis looks incorrect to me. Please elaborate in more detail why
you think current code is buggy ...

Call path is dev_change_xdp_fd(), which does bpf_prog_get_type() on the
fd. This already takes a ref and only drops it in case of error. Thus
in mlx4_xdp_set(), you only need priv->rx_ring_num - 1 refs for the rest
of the rings, so that dropping refs from old_prog makes sure we release
it again. Looks correct to me (maybe a comment would have helped there).
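
The accounting described above can be checked with a toy model. The struct
and helpers below are simplified, non-atomic stand-ins invented for this
sketch; they are not the kernel's struct bpf_prog or its API.

```c
#include <assert.h>

/* Toy model of a refcounted program; not the kernel's struct bpf_prog. */
struct prog {
	int refcnt;
};

static void prog_get(struct prog *p)        { p->refcnt += 1; }
static void prog_add(struct prog *p, int n) { p->refcnt += n; }
static void prog_put(struct prog *p)        { assert(p->refcnt > 0); p->refcnt -= 1; }

/*
 * Attach 'p' to 'nrings' rings. The caller already holds one reference
 * (taken when the fd was resolved), so only nrings - 1 more are needed to
 * end up with exactly one reference per ring.
 */
static void attach_to_rings(struct prog *p, int nrings)
{
	prog_add(p, nrings - 1);
}

/* Detach: every ring drops the reference it held. */
static void detach_from_rings(struct prog *p, int nrings)
{
	int i;

	for (i = 0; i < nrings; i++)
		prog_put(p);
}
```

In this model, taking nrings references instead of nrings - 1 leaves one
reference behind after all rings have detached, so the program would never be
released.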

>   drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 12c99a2..d25e150 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2650,7 +2650,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   	 */
>   	if (priv->xdp_ring_num == xdp_ring_num) {
>   		if (prog) {
> -			prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> +			prog = bpf_prog_add(prog, priv->rx_ring_num);
>   			if (IS_ERR(prog))
>   				return PTR_ERR(prog);
>   		}
> @@ -2680,7 +2680,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   	}
>
>   	if (prog) {
> -		prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> +		prog = bpf_prog_add(prog, priv->rx_ring_num);
>   		if (IS_ERR(prog))
>   			return PTR_ERR(prog);
>   	}
>

^ permalink raw reply

* Re: [RFC ABI V5 01/10] RDMA/core: Refactor IDR to be per-device
From: Matan Barak @ 2016-11-09  9:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Hefty, Sean
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	Christoph Lameter, Liran Liss, Haggai Eran, Majd Dibbiny,
	Tal Alon, Leon Romanovsky
In-Reply-To: <20161107235516.GE7002-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On 08/11/2016 01:55, Jason Gunthorpe wrote:
> On Fri, Oct 28, 2016 at 10:53:13PM +0000, Hefty, Sean wrote:
>>> The current code creates an IDR per type. Since types are currently
>>> common for all vendors and known in advance, this was good enough.
>>> However, the proposed ioctl based infrastructure allows each vendor
>>> to declare only some of the common types and declare its own specific
>>> types.
>>>
>>> Thus, we decided to implement IDR to be per device and refactor it to
>>> use a new file.
>>
>> I think this needs to be more abstract.  I would consider
>> introducing the concept of an 'ioctl provider', with the idr per
>> ioctl provider.  You could then make each ib_device an ioctl
>> provider.  (Just embed the structure).  I believe this will be
>> necessary to support the rdma_cm, ib_cm, as well as devices that
>> export different sets of ioctls, where an ib_device isn't
>> necessarily available.
>>
>> Essentially, I would treat plugging into the uABI independent from
>> plugging into the kernel verbs API.  Otherwise, I think we'll end up
>> with multiple ioctl 'frameworks'.
>
> Matan,
>
> I think you should change things so that all the *general* code uses
> 'urdma_' as a prefix instead of uverbs_. Only use uverbs_ on things
> that truely only apply to uverbs. This will make things much
> clearer how the code sharing is expected to work with rdma_cm
>

Yeah, I'll change the general infrastructure to be urdma.

> Sean is right, this shows why having the IDR be per device does not
> work, rdma-cm really does need a per-file or global IDR - both
> approaches should really be the same, and I think per-file has better
> locking characteristics, so I'd recommend that.
>

Eventually, I think ending up with an ioctl_provider and an ioctl_context
is the way to go here. The IDR and locks should be per ioctl_provider.
In the ib_device world, an ioctl_provider is indeed an ib_device. In the
rdma_cm world, the ioctl_provider is the rdma_cm global file.
However, since this is such a large amount of change, let's push
things incrementally. We could start with the current scheme, where it's
ib_device specific, lay the foundations and then refactor this to be
more abstract when adding rdma_cm. We could even do that refactoring
before enabling the ioctl interface, so if we see that something in the
model is broken, we could still back off.
Sounds reasonable?
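
To make the layering concrete, here is a small userspace sketch of an
embedded provider structure. Every name in it (struct ioctl_provider's
fields, fake_ib_device, fake_cm_file, container_of_u, provider_alloc_id) is
hypothetical and only illustrates the idea; none of it is taken from the RFC.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; they only illustrate the layering. */
struct ioctl_provider {
	int next_id;	/* stands in for the per-provider IDR */
};

/* A device is one kind of provider: the provider is embedded in it. */
struct fake_ib_device {
	const char *name;
	struct ioctl_provider provider;
};

/* A connection-manager file is another kind, with no device behind it. */
struct fake_cm_file {
	struct ioctl_provider provider;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of_u(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code sees only the provider, never the concrete owner. */
static int provider_alloc_id(struct ioctl_provider *p)
{
	return p->next_id++;
}

/* Device-specific code can get back to the device when it needs to. */
static struct fake_ib_device *provider_to_device(struct ioctl_provider *p)
{
	return container_of_u(p, struct fake_ib_device, provider);
}
```

Each provider keeps its own id space, so the device and the
connection-manager file in this sketch hand out ids independently.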

> Jason
>

Matan

^ permalink raw reply

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
From: Matan Barak @ 2016-11-09  9:45 UTC (permalink / raw)
  To: Jason Gunthorpe, Matan Barak
  Cc: Leon Romanovsky, Christoph Hellwig, linux-rdma, Doug Ledford,
	Sean Hefty, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon
In-Reply-To: <20161108004351.GA32444-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On 08/11/2016 02:43, Jason Gunthorpe wrote:
> On Sun, Oct 30, 2016 at 10:48:39AM +0200, Matan Barak wrote:
>> On Fri, Oct 28, 2016 at 5:46 PM, Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>>> On Fri, Oct 28, 2016 at 08:37:25AM -0700, Christoph Hellwig wrote:
>>>> On Fri, Oct 28, 2016 at 06:33:06PM +0300, Leon Romanovsky wrote:
>>>>> Just to summarize, to be sure that I understood you correctly.
>>>>>
>>>>> | write | -> | conversion logic | ---
>>>>> | ioctl | ---------------------------
>>>>>
>>>>> Am I right?
>>>>
>>>> Yes, as long as the write and ioctl boxes do the copy_{from,to}_user.
>
>> If we accept the limitations here (i.e. all command attributes
>> come either from the kernel or from user space, but you can't mix them;
>> that means the write compatibility layer either needs to copy all
>> attributes or use a direct mapping for all of them), I could
>> either break ib_uverbs_cmd_verbs into a few functions or just pass a
>> callback that wraps the descriptor copy.
>
> From what I saw in the series, this looks easy enough to fix..
>
> Just lightly refactor things so that the write() compat layer can call
> into the ioctl processor with an already prepared tlv list in kernel
> memory and form such a list on the stack when doing the compat stuff.
>

Yeah, it's just an easy refactor of ib_uverbs_cmd_verbs and there are
multiple ways of doing that :)

> The bigger problem is the tlv list pointers themselves, they have to
> point to user memory so the compat layer can only do so much of a
> transformation.
>
> I guess another flag in the copy_from_user wrapper would do the trick
> if we need it.
>

Currently we assume the payload itself is in user space only, so direct
mapping is mandatory.
If we ever need to do something other than (a bunch of consecutive write
ABI struct fields) -> (an attribute in the ioctl world), we'll have to wrap
these copy macros/functions with copy_from_attr and copy_to_attr calls.
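
The direct mapping discussed here can be sketched in userspace as a legacy
command whose fields are each described by one attribute, with the attribute
list built on the stack and handed to a common handler. The structures, the
attribute ids and both functions are assumptions made for this example; they
are not the proposed uABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute descriptor: id, length, pointer to the payload. */
struct attr {
	uint16_t id;
	uint16_t len;
	const void *data;
};

enum { ATTR_ADDR = 1, ATTR_LENGTH = 2 };

/* Hypothetical legacy write()-style command: fields packed back to back. */
struct legacy_reg_cmd {
	uint64_t addr;
	uint64_t length;
};

/* Common handler: the only code that interprets attributes. */
static int handle_reg(const struct attr *attrs, size_t nattrs, uint64_t *out_end)
{
	uint64_t addr = 0, length = 0;
	size_t i;

	for (i = 0; i < nattrs; i++) {
		if (attrs[i].len != sizeof(uint64_t))
			return -1;
		if (attrs[i].id == ATTR_ADDR)
			memcpy(&addr, attrs[i].data, sizeof(addr));
		else if (attrs[i].id == ATTR_LENGTH)
			memcpy(&length, attrs[i].data, sizeof(length));
		else
			return -1;
	}
	if (!length)
		return -1;
	*out_end = addr + length;	/* placeholder "work": end address */
	return 0;
}

/* Compat path: map each legacy field to one attribute, built on the stack. */
static int legacy_write_reg(const struct legacy_reg_cmd *cmd, uint64_t *out_end)
{
	struct attr attrs[2] = {
		{ ATTR_ADDR,   sizeof(cmd->addr),   &cmd->addr },
		{ ATTR_LENGTH, sizeof(cmd->length), &cmd->length },
	};

	return handle_reg(attrs, 2, out_end);
}
```

In the sketch the attribute pointers refer to the legacy command itself, so
nothing is copied on the compat path. A transformation that is not
field-to-attribute would need a copy step at that point.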

> Jason

Matan

^ permalink raw reply

* Re: [PATCH] net/mlx4_en: Fix bpf_prog_add ref_cnt in mlx4
From: Zhiyi Sun @ 2016-11-09  9:45 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bblanco-uqk4Ao+rVK5Wk0Htik3J/w, Tariq Toukan, Yishai Hadas,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <5822E6DB.40204-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>

On Wed, Nov 09, 2016 at 10:05:31AM +0100, Daniel Borkmann wrote:
> On 11/09/2016 08:35 AM, Zhiyi Sun wrote:
> > There are rx_ring_num queues. Each queue will load xdp prog. So
> > bpf_prog_add() should add rx_ring_num to ref_cnt.
> > 
> > Signed-off-by: Zhiyi Sun <zhiyisun-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> Your analysis looks incorrect to me. Please elaborate in more detail why
> you think current code is buggy ...
> 

Yes, you are correct. My patch is incorrect. It is not a bug.

> Call path is dev_change_xdp_fd(), which does bpf_prog_get_type() on the
> fd. This already takes a ref and only drops it in case of error. Thus
> in mlx4_xdp_set(), you only need priv->rx_ring_num - 1 refs for the rest
> of the rings, so that dropping refs from old_prog makes sure we release
> it again. Looks correct to me (maybe a comment would have helped there).
> 

I thought mlx4's code was incorrect because in the mlx5 driver, the function
mlx5e_xdp_set() calls a pair of bpf_prog_add/put, so the number of adds and
puts on the refs is the same. I didn't notice that one "add" had already been
done in its caller. So it seems that mlx5's code is incorrect, right?

> >   drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > index 12c99a2..d25e150 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> > @@ -2650,7 +2650,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> >   	 */
> >   	if (priv->xdp_ring_num == xdp_ring_num) {
> >   		if (prog) {
> > -			prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> > +			prog = bpf_prog_add(prog, priv->rx_ring_num);
> >   			if (IS_ERR(prog))
> >   				return PTR_ERR(prog);
> >   		}
> > @@ -2680,7 +2680,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> >   	}
> > 
> >   	if (prog) {
> > -		prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> > +		prog = bpf_prog_add(prog, priv->rx_ring_num);
> >   		if (IS_ERR(prog))
> >   			return PTR_ERR(prog);
> >   	}
> > 
> 

^ permalink raw reply

* Re: [PATCH] net/mlx4_en: Fix bpf_prog_add ref_cnt in mlx4
From: Daniel Borkmann @ 2016-11-09  9:57 UTC (permalink / raw)
  To: Zhiyi Sun
  Cc: bblanco, Tariq Toukan, Yishai Hadas, netdev, linux-rdma,
	linux-kernel, alexei.starovoitov
In-Reply-To: <20161109094546.jtmzc4xwtaavzcnt@ubuntu>

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

On 11/09/2016 10:45 AM, Zhiyi Sun wrote:
> On Wed, Nov 09, 2016 at 10:05:31AM +0100, Daniel Borkmann wrote:
>> On 11/09/2016 08:35 AM, Zhiyi Sun wrote:
>>> There are rx_ring_num queues. Each queue will load xdp prog. So
>>> bpf_prog_add() should add rx_ring_num to ref_cnt.
>>>
>>> Signed-off-by: Zhiyi Sun <zhiyisun@gmail.com>
>>
>> Your analysis looks incorrect to me. Please elaborate in more detail why
>> you think current code is buggy ...
>
> Yes, you are correct. My patch is incorrect. It is not a bug.
>
>> Call path is dev_change_xdp_fd(), which does bpf_prog_get_type() on the
>> fd. This already takes a ref and only drops it in case of error. Thus
>> in mlx4_xdp_set(), you only need priv->rx_ring_num - 1 refs for the rest
>> of the rings, so that dropping refs from old_prog makes sure we release
>> it again. Looks correct to me (maybe a comment would have helped there).
>
> I thought mlx4's code was incorrect because in the mlx5 driver, the function
> mlx5e_xdp_set() calls a pair of bpf_prog_add/put, so the number of adds and
> puts on the refs is the same. I didn't notice that one "add" had already been
> done in its caller. So it seems that mlx5's code is incorrect, right?

Yep, I think the two attached patches are needed.

The other thing I noticed in mlx5e_create_rq() is that it calls
bpf_prog_add(rq->xdp_prog, 1) without actually checking for errors.
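
A toy model of the two points above: an add that can fail has to be checked,
and references taken before a later failure have to be given back. The
helpers are simplified, non-atomic stand-ins written for this sketch;
MAX_REFCNT, the NULL-on-failure convention and setup_rings() are assumptions
of the model (the kernel function reports failure through an error pointer).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFCNT 32768	/* illustrative cap; an assumption of this sketch */

struct prog {
	int refcnt;
};

/* Take n references; returns NULL (and takes none) if the cap would be exceeded. */
static struct prog *prog_add(struct prog *p, int n)
{
	if (p->refcnt + n > MAX_REFCNT)
		return NULL;
	p->refcnt += n;
	return p;
}

/* Undo a previous successful prog_add(); the caller must still hold a ref. */
static void prog_add_undo(struct prog *p, int n)
{
	assert(p->refcnt > n);
	p->refcnt -= n;
}

/*
 * Take one ref per extra ring, then run a setup step that may fail
 * (alloc_ok == 0 simulates that). On failure the extra refs are returned
 * so that only the caller's own reference remains.
 */
static int setup_rings(struct prog *p, int nrings, int alloc_ok)
{
	if (!prog_add(p, nrings - 1))
		return -1;
	if (!alloc_ok) {
		prog_add_undo(p, nrings - 1);
		return -1;
	}
	return 0;
}
```

The undo helper uses a plain subtraction rather than a put because, as in the
first attached patch, the caller is known to hold its own reference, so the
count cannot reach zero there.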

[-- Attachment #2: 0001-bpf-mlx4-fix-prog-refcount-in-mlx4_en_try_alloc_reso.patch --]
[-- Type: text/x-patch, Size: 3022 bytes --]

>From d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0 Mon Sep 17 00:00:00 2001
Message-Id: <d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0.1478685278.git.daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 9 Nov 2016 10:31:19 +0100
Subject: [PATCH net-next 1/2] bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path

Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() fails.

We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), which we
need to release again. Earlier in the call path, dev_change_xdp_fd()
itself holds a ref to the prog as well, which is then released through
bpf_prog_put() due to the propagated error.

Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  5 ++++-
 include/linux/bpf.h                            |  1 +
 kernel/bpf/syscall.c                           | 11 +++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0f6225c..4104aec 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2747,8 +2747,11 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	}
 
 	err = mlx4_en_try_alloc_resources(priv, tmp, &new_prof);
-	if (err)
+	if (err) {
+		if (prog)
+			bpf_prog_add_undo(prog, priv->rx_ring_num - 1);
 		goto unlock_out;
+	}
 
 	if (priv->port_up) {
 		port_up = 1;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index edcd96d..4f6a4f1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -234,6 +234,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
 struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
+void bpf_prog_add_undo(struct bpf_prog *prog, int i);
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 228f962..a6e4dd8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -680,6 +680,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_add);
 
+void bpf_prog_add_undo(struct bpf_prog *prog, int i)
+{
+	/* Only to be used for undoing previous bpf_prog_add() in some
+	 * error path. We still know that another entity in our call
+	 * path holds a reference to the program, thus atomic_sub() can
+	 * be safely used here!
+	 */
+	atomic_sub(i, &prog->aux->refcnt);
+}
+EXPORT_SYMBOL_GPL(bpf_prog_add_undo);
+
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
 {
 	return bpf_prog_add(prog, 1);
-- 
1.9.3


[-- Attachment #3: 0002-bpf-mlx5-fix-prog-refcount-in-mlx5e_xdp_set.patch --]
[-- Type: text/x-patch, Size: 1474 bytes --]

>From f0789544432bbb89c53c3b8ac6575d48fed97786 Mon Sep 17 00:00:00 2001
Message-Id: <f0789544432bbb89c53c3b8ac6575d48fed97786.1478685278.git.daniel@iogearbox.net>
In-Reply-To: <d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0.1478685278.git.daniel@iogearbox.net>
References: <d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0.1478685278.git.daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 9 Nov 2016 10:51:26 +0100
Subject: [PATCH net-next 2/2] bpf, mlx5: fix prog refcount in mlx5e_xdp_set

dev_change_xdp_fd() already holds a reference, so bpf_prog_add(prog, 1)
is not correct as it takes one reference too many and will thus leak
the prog eventually. Also, bpf_prog_add() can fail and is not checked
for errors here.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index ba0c774..63309dd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3121,8 +3121,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 
 	/* exchange programs */
 	old_prog = xchg(&priv->xdp_prog, prog);
-	if (prog)
-		bpf_prog_add(prog, 1);
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
-- 
1.9.3


^ permalink raw reply related

* Re: [PATCH net-next 1/2] bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path
From: kbuild test robot @ 2016-11-09 10:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: kbuild-all-JC7UmRfGjtg, Zhiyi Sun, bblanco-uqk4Ao+rVK5Wk0Htik3J/w,
	Tariq Toukan, Yishai Hadas, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <5822F30C.1050900-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Hi Daniel,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Daniel-Borkmann/bpf-mlx4-fix-prog-refcount-in-mlx4_en_try_alloc_resources-error-path/20161109-182712
config: x86_64-acpi-redef (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function 'mlx4_xdp_set':
>> drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2752:4: error: implicit declaration of function 'bpf_prog_add_undo' [-Werror=implicit-function-declaration]
       bpf_prog_add_undo(prog, priv->rx_ring_num - 1);
       ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/bpf_prog_add_undo +2752 drivers/net/ethernet/mellanox/mlx4/en_netdev.c

  2746			en_warn(priv, "Reducing the number of TX rings, to not exceed the max total rings number.\n");
  2747		}
  2748	
  2749		err = mlx4_en_try_alloc_resources(priv, tmp, &new_prof);
  2750		if (err) {
  2751			if (prog)
> 2752				bpf_prog_add_undo(prog, priv->rx_ring_num - 1);
  2753			goto unlock_out;
  2754		}
  2755	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28646 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 1/2] bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path
From: Daniel Borkmann @ 2016-11-09 11:04 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Zhiyi Sun, bblanco, Tariq Toukan, Yishai Hadas,
	netdev, linux-rdma, linux-kernel, alexei.starovoitov
In-Reply-To: <201611091853.HAp072gP%fengguang.wu@intel.com>

On 11/09/2016 11:58 AM, kbuild test robot wrote:
[...]
> All errors (new ones prefixed by >>):
>
>     drivers/net/ethernet/mellanox/mlx4/en_netdev.c: In function 'mlx4_xdp_set':
>>> drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2752:4: error: implicit declaration of function 'bpf_prog_add_undo' [-Werror=implicit-function-declaration]
>         bpf_prog_add_undo(prog, priv->rx_ring_num - 1);
>         ^~~~~~~~~~~~~~~~~
>     cc1: some warnings being treated as errors

Ahh right, needs an empty variant for !CONFIG_BPF_SYSCALL. I'll fix that up
before sending an official patch.
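
A minimal sketch of what such an empty variant could look like, written as a
standalone toy rather than the actual include/linux/bpf.h change. The macro
EXAMPLE_BPF_SYSCALL is a hypothetical stand-in for CONFIG_BPF_SYSCALL, and
struct bpf_prog is reduced to a plain counter for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct bpf_prog; only a counter is kept. */
struct bpf_prog {
	int refcnt;
};

/* EXAMPLE_BPF_SYSCALL is a hypothetical stand-in for CONFIG_BPF_SYSCALL. */
#ifdef EXAMPLE_BPF_SYSCALL
/* Real variant: give back i references taken earlier by bpf_prog_add(). */
static inline void bpf_prog_add_undo(struct bpf_prog *prog, int i)
{
	prog->refcnt -= i;
}
#else
/* Empty variant: lets callers such as mlx4_xdp_set() compile and link
 * when the BPF syscall code is not built; it deliberately does nothing. */
static inline void bpf_prog_add_undo(struct bpf_prog *prog, int i)
{
	(void)prog;
	(void)i;
}
#endif
```

With the macro undefined, the call compiles to nothing, which is all the
driver needs to build in that configuration.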

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-11-09 13:10 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161108125441.GB27883-2ukJVAZIZ/Y@public.gmane.org>

On 2016/11/8 20:54, Leon Romanovsky wrote:
> On Mon, Nov 07, 2016 at 04:15:32PM -0700, Jason Gunthorpe wrote:
>> On Sat, Oct 29, 2016 at 09:16:25AM +0800, oulijun wrote:
>>
>>> We hope that the only one userspace library file named
>>> libhns-rdmav2.so will be used for the different hardware
>>> version(hip06, hip07, ...), because there are only little change
>>> between their userspace drivers. So we need to distinguish hardware
>>> version.
>>
>> I guess that makes sense, but you still need to be able to parse dt
>> compatible strings that are lists.
> 
> IMHO, it can be easily done as follow up patches.
> 
Hi, Leon & Jason
   We hope that a single userspace library file, libhns-rdmav2.so, can be used for the different hardware versions (hip06, hip07, ...),
because there are only small changes between their userspace drivers. So we need to distinguish the hardware version.
We can't distinguish them by matching only the driver name "hns_roce".

When a second hardware version appears, the matching code will be changed as follows.
First, we will add an hca_table structure:
static struct {
	unsigned int		vendor;
	unsigned int		device;
	void			*data;
	int			version;
} hca_table[] = {
	{PCI_VENDOR_ID_HISILICON, 0xA223, &hns_roce_u_hw_v2, HNS_ROCE_HW_VER2},
	{PCI_VENDOR_ID_HISILICON, 0xA224, &hns_roce_u_hw_v2, HNS_ROCE_HW_VER2},
	{PCI_VENDOR_ID_HISILICON, 0xA225, &hns_roce_u_hw_v2, HNS_ROCE_HW_VER2},
	{PCI_VENDOR_ID_HISILICON, 0xA226, &hns_roce_u_hw_v2, HNS_ROCE_HW_VER2},
	{PCI_VENDOR_ID_HISILICON, 0xA227, &hns_roce_u_hw_v2, HNS_ROCE_HW_VER2},
	{PCI_VENDOR_ID_HISILICON, 0xA22F, &hns_roce_u_hw_v2, HNS_ROCE_HW_VER2},
};

Second, we will distinguish the hardware version using hca_table[]:
	if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
				value, sizeof(value)) > 0)
		for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
			if (!strcmp(value, acpi_table[i].hid)) {
				u_hw = acpi_table[i].data;
				hw_version = acpi_table[i].version;
				goto found;
			}

	if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
				value, sizeof(value)) > 0)
		for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
			if (!strcmp(value, dt_table[i].compatible)) {
				u_hw = dt_table[i].data;
				hw_version = dt_table[i].version;
				goto found;
			}

	if (ibv_read_sysfs_file(uverbs_sys_path, "device/device", value,
				sizeof(value)) < 0)
		return NULL;

	sscanf(value, "%i", &device);

	if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor", value,
				sizeof(value)) < 0)
		return NULL;

	sscanf(value, "%i", &vendor);

	for (i = 0; i < sizeof(hca_table) / sizeof(hca_table[0]); ++i)
		if (vendor == hca_table[i].vendor &&
		    device == hca_table[i].device)
			goto found;
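
The vendor/device fallback above can be sketched as a small standalone
helper. This is only an illustration under assumptions: struct hca_entry,
parse_sysfs_id() and find_hca() are hypothetical names, not part of libhns or
libibverbs, and the input strings are assumed to look like the sysfs
"device/vendor" and "device/device" files (a hex number with a 0x prefix,
followed by a newline):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry, modelled on the hca_table[] above. */
struct hca_entry {
	unsigned int vendor;
	unsigned int device;
	int version;
};

/* Parse an ID string such as "0xa223\n". Base 0 accepts the 0x prefix.
 * Returns 0 on success, -1 if no digits could be converted. */
static int parse_sysfs_id(const char *value, unsigned int *out)
{
	char *end;
	unsigned long v = strtoul(value, &end, 0);

	if (end == value)
		return -1;
	*out = (unsigned int)v;
	return 0;
}

/* Return the matching table entry, or NULL if parsing fails or no entry
 * has both the same vendor and the same device ID. */
static const struct hca_entry *find_hca(const struct hca_entry *table, size_t n,
					const char *vendor_str,
					const char *device_str)
{
	unsigned int vendor, device;
	size_t i;

	if (parse_sysfs_id(vendor_str, &vendor) ||
	    parse_sysfs_id(device_str, &device))
		return NULL;

	for (i = 0; i < n; i++)
		if (table[i].vendor == vendor && table[i].device == device)
			return &table[i];
	return NULL;
}
```

Returning the entry instead of jumping to a label lets the caller pick up
both the data pointer and the version from the matched row, and checking the
parse result avoids comparing against uninitialized IDs.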

As for the path "device/of_node/compatible": when hip06 is started up by the DT method,
the content of compatible can only match the of_device_id entry in hns-roce.ko:
static const struct of_device_id hns_roce_of_match[] = {
	{ .compatible = "hisilicon,hns-roce-v1", .data = &hns_roce_hw_v1, },
	{},
};
Hence, we think the hardware can be distinguished by finding the string "hisilicon,hns-roce-v1" in
/../../../device/of_node/compatible

When the hns userspace library supports hip07 or hip08, the .so file will still be libhns-rdmav2.so and
will be used for hip06 as well as hip07, hip08, etc.
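
On Jason's point about compatible strings that are lists: a DT "compatible"
property is a sequence of NUL-terminated strings, so a plain strcmp() on the
buffer only ever sees the first entry. A minimal sketch of a list-aware match
follows, assuming the raw bytes and their length have already been read from
the sysfs file. The helper name compat_list_contains() is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the NUL-separated string list in buf[0..len) contains an
 * entry exactly equal to want, 0 otherwise. */
static int compat_list_contains(const char *buf, size_t len, const char *want)
{
	size_t want_len = strlen(want);
	size_t off = 0;

	while (off < len) {
		/* Find the end of the current entry without reading past len. */
		const char *end = memchr(buf + off, '\0', len - off);
		size_t n = end ? (size_t)(end - (buf + off)) : len - off;

		if (n == want_len && memcmp(buf + off, want, n) == 0)
			return 1;
		off += n + 1;	/* skip the entry and its terminating NUL */
	}
	return 0;
}
```

Each dt_table[] entry could then be tested with this helper instead of
strcmp(), so that a node listing a newer compatible string first and
"hisilicon,hns-roce-v1" second would still be matched.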

Lijun Ou
>>
>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 9/9] selinux: Add a cache for quicker retreival of PKey SIDs
From: Daniel Jurgens @ 2016-11-09 14:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: chrisw-69jw2NvuJkxg9hUCZPvPmw@public.gmane.org,
	paul-r2n+y4ga6xFZroRs9YW3xA@public.gmane.org,
	sds-+05T5uksL2qpZYMLLGbcSA@public.gmane.org,
	eparis-FjpueFixGhCM4zKIHC2jIg@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	selinux-+05T5uksL2qpZYMLLGbcSA@public.gmane.org,
	linux-security-module-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Yevgeny Petrilin, Liran Liss
In-Reply-To: <20161109070455.GF27883@leon.nu>

On 11/9/2016 1:05 AM, Leon Romanovsky wrote:
> On Tue, Nov 08, 2016 at 11:06:25PM +0200, Dan Jurgens wrote:
>> From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> It is likely that the SID for the same PKey will be requested many
>> times. To reduce the time to modify QPs and process MADs use a cache to
>> store PKey SIDs.
>>
>> This code is heavily based on the "netif" and "netport" concept
>> originally developed by James Morris <jmorris-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> and Paul Moore
>> <paul-r2n+y4ga6xFZroRs9YW3xA@public.gmane.org> (see security/selinux/netif.c and
>> security/selinux/netport.c for more information)
>>
>> issue: 736423
>> Change-Id: I176c3079d5d84d06839b4f750100ac47a6081e94
> It doesn't belong to commit message.
>
>> Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Yes, sorry, that was a silly oversight on my part.  I will address this for all patches in v5.


^ permalink raw reply

* wireshark's RPC-over-RDMA dissector
From: Chuck Lever @ 2016-11-09 16:05 UTC (permalink / raw)
  To: Linux NFS Mailing List, List Linux RDMA Mailing

Hi-

Thanks to Yan Berman, for a couple of years now we've had a basic
RPC-over-RDMA dissector in wireshark that can be used with ibdump
captures. There have been some bugs noted, but no-one has had the
cycles to dig in and address.

Recently Tom Haynes helped me set up my own wireshark build so I
could help address some of the known issues.

http://git.linux-nfs.org/?p=cel/wireshark.git;a=shortlog;h=refs/heads/rpc-rdma-fixes

Posting here for review before I take the next steps to push these
to the wireshark community. Constructive critique and other
suggestions are welcome.

The fixes so far focus on dissecting transport headers correctly.
There continue to be significant open issues with the dissector:
	• There does not appear to be any support for dissecting
	  RPC-over-RDMA on iWARP or RoCE
	• The NFS dissector does not handle portions of the XDR
	  stream that were transmitted via RDMA Read/Write
	• RPC messages conveyed via RDMA_NOMSG are not recognized
	  or dissected
	• There is no association between RDMA Reads and Writes
	  and the RPC-over-RDMA message they go with
	• A CREQ / CREP pair are needed to identify which QP
	  numbers are used for RPC-over-RDMA traffic
	• With TCP, the dissector fully outdents the RPC and NFS
	  dissection results; but with RDMA, the dissector places
	  the results in the tree under the Infiniband header
	• Not enough error detection in the dissector

--
Chuck Lever




^ permalink raw reply

* Re: wireshark's RPC-over-RDMA dissector
From: Parav Pandit @ 2016-11-09 16:41 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List, List Linux RDMA Mailing
In-Reply-To: <2975B49B-2696-4E2A-B9D0-2D6CB607EC59-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Hi Chuck,

Just FYI.
Last month I committed a few fixes to wireshark trunk for the RoCE and IB
dissectors, which will in general benefit other ULPs as well (primarily
for statefulness of ULPs).

I have a few patches pending in my sandbox in the area of RoCE and for other
ULPs that I will push in the coming days, likely next week.
I am currently actively testing them and have made steady progress so far.

I will try to find some time to review your fixes next week.

Regards,
Parav Pandit


On Wed, Nov 9, 2016 at 9:35 PM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> Hi-
>
> Thanks to Yan Berman, for a couple of years now we've had a basic
> RPC-over-RDMA dissector in wireshark that can be used with ibdump
> captures. There have been some bugs noted, but no-one has had the
> cycles to dig in and address.
>
> Recently Tom Haynes helped me set up my own wireshark build so I
> could help address some of the known issues.
>
> http://git.linux-nfs.org/?p=cel/wireshark.git;a=shortlog;h=refs/heads/rpc-rdma-fixes
>
> Posting here for review before I take the next steps to push these
> to the wireshark community. Constructive critique and other
> suggestions are welcome.
>
> The fixes so far focus on dissecting transport headers correctly.
> There continue to be significant open issues with the dissector:
>         • There does not appear to be any support for dissecting
>           RPC-over-RDMA on iWARP or RoCE
>         • The NFS dissector does not handle portions of the XDR
>           stream that were transmitted via RDMA Read/Write
>         • RPC messages conveyed via RDMA_NOMSG are not recognized
>           or dissected
>         • There is no association between RDMA Reads and Writes
>           and the RPC-over-RDMA message they go with
>         • A CREQ / CREP pair are needed to identify which QP
>           numbers are used for RPC-over-RDMA traffic
>         • With TCP, the dissector fully outdents the RPC and NFS
>           dissection results; but with RDMA, the dissector places
>           the results in the tree under the Infiniband header
>         • Not enough error detection in the dissector
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply

* Re: [PATCH] net/mlx4_en: Fix bpf_prog_add ref_cnt in mlx4
From: Brenden Blanco @ 2016-11-09 17:06 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Zhiyi Sun, Tariq Toukan, Yishai Hadas, netdev, linux-rdma,
	linux-kernel, alexei.starovoitov
In-Reply-To: <5822F30C.1050900@iogearbox.net>

On Wed, Nov 09, 2016 at 10:57:32AM +0100, Daniel Borkmann wrote:
> On 11/09/2016 10:45 AM, Zhiyi Sun wrote:
> >On Wed, Nov 09, 2016 at 10:05:31AM +0100, Daniel Borkmann wrote:
> >>On 11/09/2016 08:35 AM, Zhiyi Sun wrote:
> >>>There are rx_ring_num queues. Each queue will load xdp prog. So
> >>>bpf_prog_add() should add rx_ring_num to ref_cnt.
> >>>
> >>>Signed-off-by: Zhiyi Sun <zhiyisun@gmail.com>
> >>
> >>Your analysis looks incorrect to me. Please elaborate in more detail why
> >>you think current code is buggy ...
> >
> >Yes, you are correct. My patch is incorrect. It is not a bug.
> >
> >>Call path is dev_change_xdp_fd(), which does bpf_prog_get_type() on the
> >>fd. This already takes a ref and only drops it in case of error. Thus
> >>in mlx4_xdp_set(), you only need priv->rx_ring_num - 1 refs for the rest
> >>of the rings, so that dropping refs from old_prog makes sure we release
> >>it again. Looks correct to me (maybe a comment would have helped there).
> >
> >I thought mlx4's code is incorrect because in mlx5's driver, function
> >mlx5e_xdp_set() calls a pair of bpf_prog_add/put, the number of add and
> >put to the refs are same. I didn't notice that one "add" has been called in its
> >caller. So, it seems that mlx5's code is incorrect, right?
> 
> Yep, I think the two attached patches are needed.
> 
> The other thing I noticed in mlx5e_create_rq() is that it calls
> bpf_prog_add(rq->xdp_prog, 1) without actually checking for errors.

> From d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0 Mon Sep 17 00:00:00 2001
> Message-Id: <d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0.1478685278.git.daniel@iogearbox.net>
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Wed, 9 Nov 2016 10:31:19 +0100
> Subject: [PATCH net-next 1/2] bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path
> 
> Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
> scheme") added a bug in that the prog's reference count is not dropped
> in the error path when mlx4_en_try_alloc_resources() is failing.
> 
> We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
> need to release again. Earlier in the call-path, dev_change_xdp_fd()
> itself holds a ref to the prog as well, which is then released though
> bpf_prog_put() due to the propagated error, so "released though" above reads "released through".
> 
> Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  5 ++++-
>  include/linux/bpf.h                            |  1 +
>  kernel/bpf/syscall.c                           | 11 +++++++++++
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 0f6225c..4104aec 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2747,8 +2747,11 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  	}
>  
>  	err = mlx4_en_try_alloc_resources(priv, tmp, &new_prof);
> -	if (err)
> +	if (err) {
> +		if (prog)
> +			bpf_prog_add_undo(prog, priv->rx_ring_num - 1);
Why not just move the above bpf_prog_add to be below the try_alloc?
Nobody needs those references until all of the resources have been
allocated, and then we can remove the need for bpf_prog_add_undo.
>  		goto unlock_out;
> +	}
>  
>  	if (priv->port_up) {
>  		port_up = 1;
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index edcd96d..4f6a4f1 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -234,6 +234,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
>  struct bpf_prog *bpf_prog_get(u32 ufd);
>  struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
>  struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
> +void bpf_prog_add_undo(struct bpf_prog *prog, int i);
>  struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
>  void bpf_prog_put(struct bpf_prog *prog);
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 228f962..a6e4dd8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -680,6 +680,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_add);
>  
> +void bpf_prog_add_undo(struct bpf_prog *prog, int i)
> +{
> +	/* Only to be used for undoing previous bpf_prog_add() in some
> +	 * error path. We still know that another entity in our call
> +	 * path holds a reference to the program, thus atomic_sub() can
> +	 * be safely used here!
> +	 */
> +	atomic_sub(i, &prog->aux->refcnt);
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_add_undo);
> +
>  struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
>  {
>  	return bpf_prog_add(prog, 1);
> -- 
> 1.9.3
> 

> From f0789544432bbb89c53c3b8ac6575d48fed97786 Mon Sep 17 00:00:00 2001
> Message-Id: <f0789544432bbb89c53c3b8ac6575d48fed97786.1478685278.git.daniel@iogearbox.net>
> In-Reply-To: <d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0.1478685278.git.daniel@iogearbox.net>
> References: <d2bd6b3cd8636716a06b0ea3b1e041e16f87cce0.1478685278.git.daniel@iogearbox.net>
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Wed, 9 Nov 2016 10:51:26 +0100
> Subject: [PATCH net-next 2/2] bpf, mlx5: fix prog refcount in mlx5e_xdp_set
> 
> dev_change_xdp_fd() already holds a reference, so bpf_prog_add(prog, 1)
> is not correct as it takes one reference too much and will thus leak
> the prog eventually. Also, bpf_prog_add() can fail and is not checked
> for errors here.
> 
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index ba0c774..63309dd 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3121,8 +3121,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
>  
>  	/* exchange programs */
>  	old_prog = xchg(&priv->xdp_prog, prog);
> -	if (prog)
> -		bpf_prog_add(prog, 1);
There is also another use of bpf_prog_add down below, which does not
check the error return. Same in mlx5e_create_rq.
>  	if (old_prog)
>  		bpf_prog_put(old_prog);
>  
> -- 
> 1.9.3
> 
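
The reordering suggested above for mlx4_xdp_set() can be sketched as a toy
example. This is not the driver code: struct prog, try_alloc_resources() and
xdp_set() are hypothetical stand-ins, and the refcount is a plain int rather
than an atomic. The point is only the ordering: allocate first, take the
extra per-ring references afterwards, so the error path has nothing to undo:

```c
#include <assert.h>

/* Toy program object; only the reference count is modelled. */
struct prog {
	int refcnt;
};

/* Hypothetical stand-in for mlx4_en_try_alloc_resources(). */
static int try_alloc_resources(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Take the per-ring references only after allocation has succeeded. */
static int xdp_set(struct prog *prog, int rx_ring_num, int alloc_fails)
{
	int err = try_alloc_resources(alloc_fails);

	if (err)
		return err;	/* no extra references taken yet */

	/* The caller already holds one reference, so only the remaining
	 * rx_ring_num - 1 rings need their own. */
	if (prog)
		prog->refcnt += rx_ring_num - 1;
	return 0;
}
```

Since bpf_prog_add() can itself fail, as noted in the thread, a real version
would still need to check its return value and release the freshly allocated
resources in that case.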

^ permalink raw reply

* RE: [PATCH rdma-next 0/4] Add packet pacing support for IB verbs
From: Hefty, Sean @ 2016-11-09 17:06 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161109064009.GE27883-2ukJVAZIZ/Y@public.gmane.org>

> On Tue, Nov 08, 2016 at 05:49:26PM +0000, Hefty, Sean wrote:
> > > When sending from a 10G host to a 1G host, it is easy to overrun
> the
> > > receiver,
> > > leading to packet loss and traffic backing off. Similar problems
> occur
> > > when
> > > a 10G host sends data to a sub-10G virtual circuit, or a 40G host
> > > sending
> > > to a 10G host. Packet pacing could control packet injection rate
> and
> > > reduces
> > > network congestion to maximize throughput & minimize network
> latency.
> >
> > Why isn't the path record data and existing mechanisms sufficient to
> handle this?
> >
> 
> Packet pacing allows different combinations of traffic shaping: per-
> CPU,
> per-flow and their combinations with better and steady QoS requirements
> without involving subnet management.

The patch adds this as a QP attribute, and we already have a rate for that.  I still don't see why the standard mechanisms are insufficient or couldn't be adapted.

^ permalink raw reply

* RE: [PATCH rdma-next 2/4] IB/core: Support rate limit for packet pacing
From: Hefty, Sean @ 2016-11-09 17:27 UTC (permalink / raw)
  To: Leon Romanovsky, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Bodong Wang
In-Reply-To: <1477909297-14491-3-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

>  enum ib_qp_state {
> @@ -1151,6 +1152,7 @@ struct ib_qp_attr {
>  	u8			rnr_retry;
>  	u8			alt_port_num;
>  	u8			alt_timeout;
> +	u32			rate_limit;
>  };

We already have ib_qp_attr::ib_ah_attr::static_rate, and that accounts for both the primary and alternate paths.  We should not add a conflicting rate_limit field.  Either use static_rate as defined by the spec, or replace/update it, with corresponding changes to how it is used in conjunction with SM data.

^ permalink raw reply

* [PATCH v1 0/7] server-side NFS/RDMA patches proposed for v4.10
From: Chuck Lever @ 2016-11-09 17:33 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

I'd like to propose these server-side changes for v4.10. They
include:

- Drop connection on GSS sequence window overflow
- Remove unnecessary spin lock in the svc_rdma_send path
- A number of minor clean-ups

Available in the "nfsd-rdma-for-4.10" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git


Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.10


Meanwhile, I've been working on converting the server-side RPC/RDMA
transport to use the new generic R/W API. The prototype for the
svc_rdma_sendto path works for some forms of the transport header,
but still has a few bugs. The svc_rdma_recvfrom path will be next,
but is an even larger task.

When this work is further along I will publish a topic branch.

---

Chuck Lever (7):
      svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
      svcauth_gss: Close connection when dropping an incoming message
      svcrdma: Renovate sendto chunk list parsing
      svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
      svcrdma: Remove DMA map accounting
      svcrdma: Remove svc_rdma_op_ctxt::wc_status
      svcrdma: Break up dprintk format in svc_rdma_accept()


 include/linux/sunrpc/svc_rdma.h            |    7 --
 net/sunrpc/auth_gss/svcauth_gss.c          |    2 -
 net/sunrpc/svc.c                           |   14 +++-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    1 
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   19 +++++
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |   99 +++++++++-------------------
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   87 ++++++++-----------------
 7 files changed, 87 insertions(+), 142 deletions(-)

--
Chuck Lever

^ permalink raw reply
