Netdev List


* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Jakub Kicinski @ 2019-08-15 19:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <20190815152100.GN2820@mini-arch>

On Thu, 15 Aug 2019 08:21:00 -0700, Stanislav Fomichev wrote:
> On 08/15, Toshiaki Makita wrote:
> > On 2019/08/15 2:07, Stanislav Fomichev wrote:  
> > > On 08/13, Toshiaki Makita wrote:  
> > > > * Implementation
> > > > 
> > > > xdp_flow makes use of UMH to load an eBPF program for XDP, similar to
> > > > bpfilter. The difference is that xdp_flow does not generate the eBPF
> > > > program dynamically but a prebuilt program is embedded in UMH. This is
> > > > mainly because flow insertion is considerably frequent. If we generate
> > > > and load an eBPF program on each insertion of a flow, the latency of the
> > > > first packet of ping in above test will increase, which I want to avoid.  
> > > Can this be instead implemented with a new hook that will be called
> > > for TC events? This hook can write to perf event buffer and control
> > > plane will insert/remove/modify flow tables in the BPF maps (control
> > > plane will also install xdp program).
> > > 
> > > Why do we need UMH? What am I missing?  
> > 
> > So you suggest doing everything in xdp_flow kmod?  
> You probably don't even need xdp_flow kmod. Add new tc "offload" mode
> (bypass) that dumps every command via netlink (or calls the BPF hook
> where you can dump it into perf event buffer) and then read that info
> from userspace and install xdp programs and modify flow tables.
> I don't think you need any kernel changes besides that stream
> of data from the kernel about qdisc/tc flow creation/removal/etc.

There's a certain allure in bringing the in-kernel BPF translation
infrastructure forward. OTOH from system architecture perspective IMHO
it does seem like a task best handled in user space. bpfilter can replace
iptables completely, here we're looking at an acceleration relatively
loosely coupled with flower.

FWIW Quentin spent some time working on a universal flow rule to BPF
translation library:

https://github.com/Netronome/libkefir

A lot remains to be done there, but flower front end is one of the
targets. A library can be tuned for any application, without a
dependency on flower uAPI.

> But, I haven't looked at the series deeply, so I might be missing
> something :-)

I don't think you are :)


* Re: [PATCH net-next] page_pool: fix logic in __page_pool_get_cached
From: David Miller @ 2019-08-15 19:28 UTC (permalink / raw)
  To: jonathan.lemon
  Cc: netdev, brouer, ilias.apalodimas, saeedm, ttoukan.linux,
	kernel-team
In-Reply-To: <20190813174509.494723-1-jonathan.lemon@gmail.com>

From: Jonathan Lemon <jonathan.lemon@gmail.com>
Date: Tue, 13 Aug 2019 10:45:09 -0700

> __page_pool_get_cached() will return NULL when the ring is
> empty, even if there are pages present in the lookaside cache.
> 
> It is also possible to refill the cache, and then return a
> NULL page.
> 
> Restructure the logic to eliminate both cases.
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

Applied.
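
For readers following along, the two cases in the commit message can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions: struct pool_model, pool_refill() and
pool_get_cached() are hypothetical names, and none of this is the
kernel's page_pool code. It only shows the intended ordering: refill
the lookaside cache from the ring when the cache is empty, then serve
from the cache, so NULL comes back only when both are empty.

```c
#include <stddef.h>

#define CACHE_MAX 4
#define RING_MAX  8

/* Hypothetical user-space model; not the kernel's struct page_pool. */
struct pool_model {
	void *cache[CACHE_MAX];	/* lookaside cache, used as a stack */
	size_t cache_count;
	void *ring[RING_MAX];	/* backing ring, drained from the end */
	size_t ring_count;
};

/* Move up to CACHE_MAX entries from the ring into the cache. */
static void pool_refill(struct pool_model *p)
{
	while (p->cache_count < CACHE_MAX && p->ring_count > 0)
		p->cache[p->cache_count++] = p->ring[--p->ring_count];
}

/* Returns NULL only when both the cache and the ring are empty. */
static void *pool_get_cached(struct pool_model *p)
{
	/* Refill first, so a successful refill is never followed by NULL. */
	if (p->cache_count == 0)
		pool_refill(p);

	/* Serve from the cache even if the ring is empty. */
	if (p->cache_count > 0)
		return p->cache[--p->cache_count];

	return NULL;
}
```

With this shape, a populated cache is used regardless of the ring
state, and a refill that succeeds always yields a page.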


* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Jakub Kicinski @ 2019-08-15 19:28 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: magnus.karlsson, bjorn.topel, netdev, bpf, intel-wired-lan,
	maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

On Wed, 14 Aug 2019 20:46:18 -0700, Sridhar Samudrala wrote:
> This patch series introduces XDP_SKIP_BPF flag that can be specified
> during the bind() call of an AF_XDP socket to skip calling the BPF 
> program in the receive path and pass the buffer directly to the socket.
> 
> When a single AF_XDP socket is associated with a queue and a HW
> filter is used to redirect the packets and the app is interested in
> receiving all the packets on that queue, we don't need an additional 
> BPF program to do further filtering or lookup/redirect to a socket.
> 
> Here are some performance numbers collected on 
>   - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>   - Intel 40Gb Ethernet NIC (i40e)
> 
> All tests use 2 cores and the results are in Mpps.
> 
> turbo on (default)
> ---------------------------------------------	
>                       no-skip-bpf    skip-bpf
> ---------------------------------------------	
> rxdrop zerocopy           21.9         38.5 
> l2fwd  zerocopy           17.0         20.5
> rxdrop copy               11.1         13.3
> l2fwd  copy                1.9          2.0
> 
> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> ---------------------------------------------	
>                       no-skip-bpf    skip-bpf
> ---------------------------------------------	
> rxdrop zerocopy           15.4         29.0
> l2fwd  zerocopy           11.8         18.2
> rxdrop copy                8.2         10.5
> l2fwd  copy                1.7          1.7
> ---------------------------------------------	

Could you include a third column here - namely the in-XDP performance?
AFAIU the way to achieve better performance with AF_XDP is to move the
fast path into the kernel's XDP program.

Maciej's work on batching XDP program's execution should lower the
retpoline overhead, without leaning close to the bypass model.


* Re: [PATCH] net/ncsi: Ensure 32-bit boundary for data cksum
From: David Miller @ 2019-08-15 19:32 UTC (permalink / raw)
  To: terry.s.duncan; +Cc: sam, netdev, linux-kernel, openbmc, wak, joel
In-Reply-To: <20190814011840.9387-1-terry.s.duncan@linux.intel.com>

From: "Terry S. Duncan" <terry.s.duncan@linux.intel.com>
Date: Tue, 13 Aug 2019 18:18:40 -0700

> The NCSI spec indicates that if the data does not end on a 32 bit
> boundary, one to three padding bytes equal to 0x00 shall be present to
> align the checksum field to a 32-bit boundary.
> 
> Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com>
> ---
>  net/ncsi/internal.h |  1 +
>  net/ncsi/ncsi-cmd.c |  2 +-
>  net/ncsi/ncsi-rsp.c | 12 ++++++++----
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
> index 0b3f0673e1a2..468a19fdfd88 100644
> --- a/net/ncsi/internal.h
> +++ b/net/ncsi/internal.h
> @@ -185,6 +185,7 @@ struct ncsi_package;
>  #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
>  #define NCSI_MAX_PACKAGE	8
>  #define NCSI_MAX_CHANNEL	32
> +#define NCSI_ROUND32(x)		(((x) + 3) & ~3) /* Round to 32 bit boundary */

I think we have enough of a proliferation of alignment macros, let's not add more.

Either define this to "ALIGN(x, 4)" or expand that into each of the locations:

>  	pchecksum = (__be32 *)((void *)h + sizeof(struct ncsi_pkt_hdr) +
> -		    nca->payload);
> +		    NCSI_ROUND32(nca->payload));

		     ALIGN(nca->payload, 4)

> -	pchecksum = (__be32 *)((void *)(h + 1) + payload - 4);
> +	pchecksum = (__be32 *)((void *)(h + 1) + NCSI_ROUND32(payload) - 4);

						 ALIGN(payload, 4)


etc.
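
To make the equivalence concrete, here is a minimal user-space sketch.
ALIGN_UP() is a stand-in written for this example (the kernel's own
ALIGN() is not available outside the kernel tree), and
ncsi_cksum_offset() is a hypothetical helper, not a function from
net/ncsi. It shows that rounding the payload length up to a multiple of
4 gives the same result as the proposed NCSI_ROUND32().

```c
#include <stddef.h>

/* User-space stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define ALIGN_UP(x, a)	(((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical helper: byte offset of the checksum after a padded payload. */
static inline size_t ncsi_cksum_offset(size_t hdr_len, size_t payload_len)
{
	return hdr_len + ALIGN_UP(payload_len, 4);
}
```

For example, a 5-byte payload is padded to 8 bytes, so with a 16-byte
header the checksum would start at offset 24 in this model.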


* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Jonathan Lemon @ 2019-08-15 19:32 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko
In-Reply-To: <20190815191456.GA11699@khorivan>



On 15 Aug 2019, at 12:19, Ivan Khoronzhuk wrote:

> On Thu, Aug 15, 2019 at 11:23:16AM -0700, Jonathan Lemon wrote:
>> On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:
>>
>>> For 64-bit there is no reason to use vmap/vunmap, so use page_address
>>> as it was initially. For 32-bit, in some apps, like the xdpsock_user.c
>>> sample, when the number of pages in use is quite big, the kmap memory
>>> can be insufficient. Besides this, kmap appears to be deprecated in
>>> such cases, as it can block and should rather be used for dynamic mm.
>>>
>>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>> ---
>>> net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
>>> 1 file changed, 30 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>>> index a0607969f8c0..d740c4f8810c 100644
>>> --- a/net/xdp/xdp_umem.c
>>> +++ b/net/xdp/xdp_umem.c
>>> @@ -14,7 +14,7 @@
>>> #include <linux/netdevice.h>
>>> #include <linux/rtnetlink.h>
>>> #include <linux/idr.h>
>>> -#include <linux/highmem.h>
>>> +#include <linux/vmalloc.h>
>>>
>>> #include "xdp_umem.h"
>>> #include "xsk_queue.h"
>>> @@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct xdp_umem *umem)
>>> 	unsigned int i;
>>>
>>> 	for (i = 0; i < umem->npgs; i++)
>>> -		kunmap(umem->pgs[i]);
>>> +		if (PageHighMem(umem->pgs[i]))
>>> +			vunmap(umem->pages[i].addr);
>>> +}
>>> +
>>> +static int xdp_umem_map_pages(struct xdp_umem *umem)
>>> +{
>>> +	unsigned int i;
>>> +	void *addr;
>>> +
>>> +	for (i = 0; i < umem->npgs; i++) {
>>> +		if (PageHighMem(umem->pgs[i]))
>>> +			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
>>> +		else
>>> +			addr = page_address(umem->pgs[i]);
>>> +
>>> +		if (!addr) {
>>> +			xdp_umem_unmap_pages(umem);
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		umem->pages[i].addr = addr;
>>> +	}
>>> +
>>> +	return 0;
>>> }
>>
>> You'll want a __xdp_umem_unmap_pages() helper here that takes a
>> count of the number of pages to unmap, so it can be called from
>> xdp_umem_unmap_pages() in the normal case, and xdp_umem_map_pages()
>> in the error case.  Otherwise the error case ends up calling
>> PageHighMem on a null page.
>> -- 
>> Jonathan
>
> Do you mean null address?
> If so, then vunmap does nothing if it's null, and addr is null if it's not
> assigned... and it's not assigned w/o correct mapping...
>
> If you mean null page, then it is not possible after all they are
> pinned above, here: xdp_umem_pin_pages(), thus assigned.
>
> Or I missed smth?

No - I forgot about umem_pin_pages() - feel free to ignore my comments.
--
Jonathan

>
> Despite this, it seems there should be one more patch here, adding page
> unpinning in the error path, but this is not related to this change. I will
> do this in a follow-up fix patch, if there is no objection to my
> explanation, ofc.
>
> -- 
> Regards,
> Ivan Khoronzhuk


* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Gunthorpe @ 2019-08-15 19:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel,
	linux-mm
In-Reply-To: <74838e61-3a5e-0f51-2092-f4a16d144b45@redhat.com>

On Thu, Aug 15, 2019 at 11:26:46AM +0800, Jason Wang wrote:
> 
> On 2019/8/13 下午7:57, Jason Gunthorpe wrote:
> > On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
> > 
> > > What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> > > notifier in this series.
> > A GUP that can't sleep can't pagefault which makes it a really weird
> > pattern
> 
> 
> My understanding is __get_user_pages_fast() assumes caller can fail or have
> fallback. And we have graceful fallback to copy_{to|from}_user().

My point is that if you can fall back to copy_user then it is weird to
call the special non-sleeping GUP under a spinlock.

AFAIK the only reason this is done is because of the way the notifier
is being locked...

Jason


* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Jonathan Lemon @ 2019-08-15 19:33 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko
In-Reply-To: <20190815121356.8848-3-ivan.khoronzhuk@linaro.org>



On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:

> For 64-bit there is no reason to use vmap/vunmap, so use page_address
> as it was initially. For 32-bit, in some apps, like the xdpsock_user.c
> sample, when the number of pages in use is quite big, the kmap memory
> can be insufficient. Besides this, kmap appears to be deprecated in
> such cases, as it can block and should rather be used for dynamic mm.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>


* Re: [PATCH] net: pch_gbe: Fix memory leaks
From: David Miller @ 2019-08-15 19:34 UTC (permalink / raw)
  To: wenwen
  Cc: rfontana, allison, alexios.zavras, gregkh, tglx, netdev,
	linux-kernel
In-Reply-To: <1565746427-5366-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Tue, 13 Aug 2019 20:33:45 -0500

> In pch_gbe_set_ringparam(), if netif_running() returns false, 'tx_old' and
> 'rx_old' are not deallocated, leading to memory leaks. To fix this issue,
> move the free statements after the if branch.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Why would they be "deallocated"?  They are still assigned to
adapter->tx_ring and adapter->rx_ring.


* Re: [PATCH] liquidio: add cleanup in octeon_setup_iq()
From: David Miller @ 2019-08-15 19:37 UTC (permalink / raw)
  To: wenwen; +Cc: dchickles, sburla, fmanlunas, netdev, linux-kernel
In-Reply-To: <1565759689-5941-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Wed, 14 Aug 2019 00:14:49 -0500

> If oct->fn_list.enable_io_queues() fails, no cleanup is executed, leading
> to memory/resource leaks. To fix this issue, invoke
> octeon_delete_instr_queue() before returning from the function.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Applied.


* Re: [PATCH 00/14] ARM: move lpc32xx and dove to multiplatform
From: Arnd Bergmann @ 2019-08-15 19:38 UTC (permalink / raw)
  To: Sylvain Lemieux
  Cc: Russell King - ARM Linux admin, SoC Team, Linux ARM,
	Vladimir Zapolskiy, Gregory Clement, Linus Walleij, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, David S. Miller,
	Greg Kroah-Hartman, Alan Stern, Guenter Roeck,
	open list:GPIO SUBSYSTEM, Networking, linux-serial, USB list,
	LINUXWATCHDOG
In-Reply-To: <b43c3d60-b675-442c-c549-25530cfbffe3@gmail.com>

On Thu, Aug 15, 2019 at 8:32 PM Sylvain Lemieux <slemieux.tyco@gmail.com> wrote:
> On 8/15/19 9:11 AM, Arnd Bergmann wrote:
> > On Thu, Aug 1, 2019 at 9:33 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > I applied patches 12 and 13 into the soc tree now. There are some
> > other pending multiplatform conversions (iop32x, ep93xx, lpc32xx,
> > omap1), but it looks like none of those will be complete for 5.4.
>
> I think the patchset (v2) for the LPC32xx is ready for 5.4
> ([PATCH v2 00/13] v2: ARM: move lpc32xx to multiplatform)

Good point. I've merged these into the arm/soc branch now.

     Arnd


* Re: [net PATCH] net: tls, fix sk_write_space NULL write when tx disabled
From: David Miller @ 2019-08-15 19:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: jakub.kicinski, ying.xue, netdev, andreyknvl
In-Reply-To: <156576071416.1402.5907777786031481705.stgit@ubuntu3-kvm1>

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 14 Aug 2019 05:31:54 +0000

> The ctx->sk_write_space pointer is only set when TLS tx mode is enabled.
> When running without TX mode it's a null pointer but we still set the
> sk->sk_write_space pointer on close().
> 
> Fix the close path to only overwrite sk->sk_write_space when the current
> pointer is to the tls_write_space function indicating the tls module should
> clean it up properly as well.
> 
> Reported-by: Hillf Danton <hdanton@sina.com>
> Cc: Ying Xue <ying.xue@windriver.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Fixes: 57c722e932cfb ("net/tls: swap sk_write_space on close")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Applied, thanks John.

That Fixes tag takes one through an interesting chain of fixes to fixes.
I'll queue this up for -stable and make sure it all ends up in the
proper place.

Thanks.
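
The rule described in the commit message can be sketched in user space
as follows. Everything here is a hypothetical model (struct sock_model,
struct tls_ctx_model and the *_model functions are names invented for
this illustration, not the net/tls code): the close path restores the
saved callback only when the TLS callback is the one currently
installed, so a context whose TX mode was never enabled cannot write a
NULL pointer into the socket.

```c
#include <stddef.h>

struct sock_model;
typedef void (*write_space_fn)(struct sock_model *sk);

/* Hypothetical stand-in for the socket's callback slot. */
struct sock_model {
	write_space_fn sk_write_space;
};

/* Hypothetical stand-in for the TLS context. */
struct tls_ctx_model {
	write_space_fn saved_write_space;	/* NULL unless TX was enabled */
};

static int default_calls;
static int tls_calls;

static void default_write_space(struct sock_model *sk)
{
	(void)sk;
	default_calls++;
}

static void tls_write_space_model(struct sock_model *sk)
{
	(void)sk;
	tls_calls++;
}

/* Enable TX: remember the old callback and install the TLS one. */
static void tls_enable_tx_model(struct sock_model *sk,
				struct tls_ctx_model *ctx)
{
	ctx->saved_write_space = sk->sk_write_space;
	sk->sk_write_space = tls_write_space_model;
}

/* Close: restore only if the TLS callback is currently installed. */
static void tls_close_model(struct sock_model *sk, struct tls_ctx_model *ctx)
{
	if (sk->sk_write_space == tls_write_space_model)
		sk->sk_write_space = ctx->saved_write_space;
}
```

In this model, closing a socket that never enabled TX leaves
sk_write_space untouched, while closing after TX was enabled puts the
original callback back.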


* Re: pull-request: can-next 2019-08-14
From: David Miller @ 2019-08-15 19:43 UTC (permalink / raw)
  To: mkl; +Cc: netdev, kernel, linux-can
In-Reply-To: <f0658ccd-389f-fc60-7538-c512112b9978@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 14 Aug 2019 08:16:24 +0200

> this is a pull request for net-next/master consisting of 41 patches.
 ...

Pulled, thanks Marc.


* Re: [PATCH] net: myri10ge: fix memory leaks
From: David Miller @ 2019-08-15 19:45 UTC (permalink / raw)
  To: wenwen; +Cc: christopher.lee, netdev, linux-kernel
In-Reply-To: <1565764719-6488-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Wed, 14 Aug 2019 01:38:39 -0500

> In myri10ge_probe(), myri10ge_alloc_slices() is invoked to allocate slices
> related structures. Later on, myri10ge_request_irq() is used to get an irq.
> However, if this process fails, the allocated slices related structures are
> not deallocated, leading to memory leaks. To fix this issue, revise the
> target label of the goto statement to 'abort_with_slices'.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Applied, thanks.
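
The fix follows the usual goto-based unwinding pattern. A user-space
sketch of that pattern, under loud assumptions: probe_model(),
alloc_slices_model() and free_slices_model() are hypothetical stand-ins
and do not reproduce myri10ge_probe(). The point is only that a failure
after the slices are allocated must jump to a label that frees them.

```c
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations in this model */

/* Hypothetical stand-in for allocating the slice structures. */
static void *alloc_slices_model(void)
{
	void *p = malloc(64);

	if (p)
		live_allocs++;
	return p;
}

/* Hypothetical stand-in for freeing the slice structures. */
static void free_slices_model(void *p)
{
	free(p);
	live_allocs--;
}

/* irq_ok simulates whether the IRQ request succeeds. */
static int probe_model(int irq_ok, void **slices_out)
{
	void *slices = alloc_slices_model();
	int status = 0;

	if (!slices)
		return -1;

	if (!irq_ok) {
		status = -1;
		/* Jump to the label that releases the slices. */
		goto abort_with_slices;
	}

	*slices_out = slices;
	return 0;

abort_with_slices:
	free_slices_model(slices);
	return status;
}
```

Had the failure path jumped to a later label that skips
free_slices_model(), live_allocs would stay at 1 after a failed probe,
which is the leak the patch describes.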


* [PATCH mlx5-next 0/5] Mellanox, Updates for mlx5-next branch 2019-08-15
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org

Hi All,

This series includes misc updates for mlx5-next shared branch.

mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.

Misc updates:
4) Saeed, two compiler warnings cleanups

In case of no objection this series will be applied to mlx5-next branch
and sent later as pull request to both rdma-next and net-next branches.

Thanks,
Saeed.

---

Aya Levin (1):
  net/mlx5: Expose IP-in-IP capability bit

Maxim Mikityanskiy (1):
  net/mlx5: Add lag_tx_port_affinity capability bit

Moshe Shemesh (1):
  net/mlx5: Add support for VNIC_ENV internal rq counter

Saeed Mahameed (2):
  net/mlx5: Add missing include file to lib/crypto.c
  net/mlx5: Improve functions documentation

 drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 22 +++++++++++--------
 .../ethernet/mellanox/mlx5/core/lib/crypto.c  |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 18 +++++++++++----
 3 files changed, 28 insertions(+), 13 deletions(-)

-- 
2.21.0


* [PATCH mlx5-next 1/5] net/mlx5: Add missing include file to lib/crypto.c
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

Add missing include file to avoid compiler warnings:
drivers/net/ethernet/mellanox/mlx5/core//lib/crypto.c:6:5:
warning: no previous prototype for ‘mlx5_create_encryption_key’
    6 | int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core//lib/crypto.c:60:6:
 warning: no previous prototype for ‘mlx5_destroy_encryption_key’
   60 | void mlx5_destroy_encryption_key(struct mlx5_core_dev *mdev, ...

Fixes: 45d3b55dc665 ("net/mlx5: Add crypto library to support create/destroy encryption key")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
index ea9ee88491e5..22bc45c831d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2019 Mellanox Technologies.
 
 #include "mlx5_core.h"
+#include "lib/mlx5.h"
 
 int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
 			       void *key, u32 sz_bytes,
-- 
2.21.0



* [PATCH mlx5-next 4/5] net/mlx5: Expose IP-in-IP capability bit
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Aya Levin,
	Tariq Toukan
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

From: Aya Levin <ayal@mellanox.com>

Expose FW indication that it supports Stateless Offloads for IP over IP
tunneled packets. The following offloads are supported for the inner
packets: RSS, RX & TX Checksum Offloads, LSO and Flow Steering.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c788f895b350..2837fe4d8901 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -808,7 +808,9 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8         swp_csum[0x1];
 	u8         swp_lso[0x1];
 	u8         cqe_checksum_full[0x1];
-	u8         reserved_at_24[0xc];
+	u8         reserved_at_24[0x5];
+	u8         tunnel_stateless_ip_over_ip[0x1];
+	u8         reserved_at_2a[0x6];
 	u8         max_vxlan_udp_ports[0x8];
 	u8         reserved_at_38[0x6];
 	u8         max_geneve_opt_len[0x1];
-- 
2.21.0
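
As a side note for reviewers, the split of a reserved field in
mlx5_ifc.h can be sanity-checked mechanically. The sketch below is a
hypothetical user-space helper (struct bit_field and
fields_tile_range() are names made up for this example, not part of the
mlx5 headers). It assumes the reserved_at_XX suffix gives the field's
starting bit offset, and verifies that the new fields cover exactly the
range of the old reserved_at_24[0xc] with no gap or overlap.

```c
#include <stddef.h>

/* Hypothetical description of one field: starting bit offset and width. */
struct bit_field {
	size_t offset;
	size_t width;
};

/* Returns 1 if the fields tile [start, end) with no gaps or overlaps. */
static int fields_tile_range(const struct bit_field *f, size_t n,
			     size_t start, size_t end)
{
	size_t pos = start;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Each field must begin where the previous one ended. */
		if (f[i].offset != pos)
			return 0;
		pos += f[i].width;
	}
	return pos == end;
}
```

For the hunk above, the fields (0x24, 0x5), (0x29, 0x1) and (0x2a, 0x6)
tile the range from 0x24 to 0x30, matching the original 0xc-bit
reserved field.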



* [PATCH mlx5-next 5/5] net/mlx5: Add lag_tx_port_affinity capability bit
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	Maxim Mikityanskiy, Tariq Toukan
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

From: Maxim Mikityanskiy <maximmi@mellanox.com>

Add the lag_tx_port_affinity HCA capability bit that indicates that
setting port affinity of TISes is supported.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2837fe4d8901..1e55cf73e88c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1249,7 +1249,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_263[0x8];
 	u8         log_bf_reg_size[0x5];
 
-	u8         reserved_at_270[0xb];
+	u8         reserved_at_270[0x8];
+	u8         lag_tx_port_affinity[0x1];
+	u8         reserved_at_279[0x2];
 	u8         lag_master[0x1];
 	u8         num_lag_ports[0x4];
 
-- 
2.21.0



* [PATCH mlx5-next 2/5] net/mlx5: Improve functions documentation
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

Fix documentation of mlx5_eq_enable/disable to clean up compiler warnings.

drivers/net/ethernet/mellanox/mlx5/core//eq.c:334:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_enable'

drivers/net/ethernet/mellanox/mlx5/core//eq.c:355:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_disable'

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 22 ++++++++++++--------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 2df9aaa421c6..a0e78ab64618 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -324,10 +324,13 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 
 /**
  * mlx5_eq_enable - Enable EQ for receiving EQEs
- * @dev - Device which owns the eq
- * @eq - EQ to enable
- * @nb - notifier call block
- * mlx5_eq_enable - must be called after EQ is created in device.
+ * @dev : Device which owns the eq
+ * @eq  : EQ to enable
+ * @nb  : Notifier call block
+ *
+ * Must be called after EQ is created in device.
+ *
+ * @return: 0 if no error
  */
 int mlx5_eq_enable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 		   struct notifier_block *nb)
@@ -344,11 +347,12 @@ int mlx5_eq_enable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 EXPORT_SYMBOL(mlx5_eq_enable);
 
 /**
- * mlx5_eq_disable - Enable EQ for receiving EQEs
- * @dev - Device which owns the eq
- * @eq - EQ to disable
- * @nb - notifier call block
- * mlx5_eq_disable - must be called before EQ is destroyed.
+ * mlx5_eq_disable - Disable EQ for receiving EQEs
+ * @dev : Device which owns the eq
+ * @eq  : EQ to disable
+ * @nb  : Notifier call block
+ *
+ * Must be called before EQ is destroyed.
  */
 void mlx5_eq_disable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 		     struct notifier_block *nb)
-- 
2.21.0



* [PATCH mlx5-next 3/5] net/mlx5: Add support for VNIC_ENV internal rq counter
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Moshe Shemesh
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

From: Moshe Shemesh <moshe@mellanox.com>

Add mlx5 interface support for reading internal rq out of buffer counter
as part of QUERY_VNIC_ENV command. The command is used by the driver to
query vnic diagnostic statistics from FW.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ab6ae723aae6..c788f895b350 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1116,7 +1116,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         cache_line_128byte[0x1];
 	u8         reserved_at_165[0x4];
 	u8         rts2rts_qp_counters_set_id[0x1];
-	u8         reserved_at_16a[0x5];
+	u8         reserved_at_16a[0x2];
+	u8         vnic_env_int_rq_oob[0x1];
+	u8         reserved_at_16d[0x2];
 	u8         qcam_reg[0x1];
 	u8         gid_table_size[0x10];
 
@@ -2772,7 +2774,11 @@ struct mlx5_ifc_vnic_diagnostic_statistics_bits {
 
 	u8         transmit_discard_vport_down[0x40];
 
-	u8         reserved_at_140[0xec0];
+	u8         reserved_at_140[0xa0];
+
+	u8         internal_rq_out_of_buffer[0x20];
+
+	u8         reserved_at_200[0xe00];
 };
 
 struct mlx5_ifc_traffic_counter_bits {
-- 
2.21.0



* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Kees Cook @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Song Liu, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
	Linux API, LSM List
In-Reply-To: <20190813215823.3sfbakzzjjykyng2@ast-mbp>

On Tue, Aug 13, 2019 at 02:58:25PM -0700, Alexei Starovoitov wrote:
> agree that containers (namespaces) reduce amount of trust necessary
> for apps to run, but the end goal is not security though.

Unsurprisingly, I totally disagree: this is the very definition of
improved "security": reduced attack surface, confined trust, etc.

> Linux has become a single user system.

I hope this is just hyperbole, because it's not true in reality. I agree
that the vast majority of Linux devices are single-user-at-a-time
systems now (rather than the "shell servers" of yore), but the system
still has to be expected to confine users from each other, root, and the
hardware. Switching users on Chrome OS or a distro laptop, etc is still
very much expected to _mean_ something.

> If user can ssh into the host they can become root.
> If arbitrary code can run on the host it will break out of any sandbox.
> Containers are not providing the level of security that is enough
> to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.

I'm not sure why you draw the line for VMs -- they're just as buggy
as anything else. Regardless, I reject this line of thinking: yes,
all software is buggy, but that isn't a reason to give up. In fact,
we should be trying very hard to create safe code (*insert arguments
for sane languages and toolchains here*).

If you look at software safety as a binary, you will always be
disappointed. If you look at it as it manifests in the real world,
then there is some perspective to be had. Reachability of flaws becomes
a major factor; exploit chain length becomes a factor. There are very
real impacts to be had from security hardening, sandboxing, etc. Of
course nothing is perfect, but the current state of the world isn't
as you describe. (And I say this with the knowledge of how long
the lifetime of bugs is in the kernel.)

> Containers are used to make production systems safer.

Yes.

> Some people call it more 'secure', but it's clearly not secure for
> arbitrary code

Perhaps it's just a language issue. "More secure" and "safer" mean
mostly the same thing to me. I tend to think "safer" is actually
a superset that includes things that wreck the user experience but
aren't actually in the privilege manipulation realm. In the traditional
"security" triad of confidentiality, integrity, and availability, I tend
to weigh availability less highly, but a bug that stops someone from
doing their work but doesn't wreck data, let them switch users, etc,
is still considered a "security" issue by many folks. Being exposed to
fewer bugs improves someone's security, safety, whatever. The
easiest way to do that is confinement and its associated attack surface
reduction. tl;dr: security and safety are a very use-case-specific
continuum, not a binary state.

> When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> It's been a constant source of pain. The constant blinding, randomization,
> verifier speculative analysis, all spectre v1, v2, v4 mitigations
> are simply not worth it. It's a lot of complex kernel code without users.
> There is not a single use case to allow arbitrary malicious bpf
> program to be loaded and executed.

The world isn't binary (safe code/malicious code), and we need to build
systems that can be used safely even when things go wrong. Yes, probably
no one has a system that _intentionally_ feeds eBPF into the kernel from
a web form. But there is probably someone who does it unintentionally,
or has a user login exposed on a system where unpriv BPF is enabled. The
point is to create primitives as safely as possible so when things DO
go wrong, they fail safe instead of making things worse.

I'm all for a "less privileged than root" API for eBPF, but I get worried
when I see "security" being treated as a binary state. Especially when
it is considered an always-failed state. :)

-- 
Kees Cook


* [PATCH] net: cavium: fix driver name
From: Stephen Hemminger @ 2019-08-15 19:49 UTC (permalink / raw)
  To: yuehaibing; +Cc: netdev, Stephen Hemminger

The driver name gets exposed in sysfs under /sys/bus/pci/drivers
so it should look like other devices. Change it to the common
format (instead of "Cavium PTP").

This is a trivial fix that was observed by accident because
Debian kernels were building this driver into the kernel (bug).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/ethernet/cavium/common/cavium_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/common/cavium_ptp.c b/drivers/net/ethernet/cavium/common/cavium_ptp.c
index 73632b843749..b821c9e1604c 100644
--- a/drivers/net/ethernet/cavium/common/cavium_ptp.c
+++ b/drivers/net/ethernet/cavium/common/cavium_ptp.c
@@ -10,7 +10,7 @@
 
 #include "cavium_ptp.h"
 
-#define DRV_NAME	"Cavium PTP Driver"
+#define DRV_NAME "cavium_ptp"
 
 #define PCI_DEVICE_ID_CAVIUM_PTP	0xA00C
 #define PCI_DEVICE_ID_CAVIUM_RST	0xA00E
-- 
2.20.1
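
As a rough illustration of what "common format" means here, the sketch
below checks whether a name uses only lowercase letters, digits, '_' or
'-'. The helper looks_like_sysfs_name() is hypothetical and is not a
kernel API; the character set is an assumption based on how most entries
under /sys/bus/pci/drivers are named, not an enforced rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: returns true when a driver
 * name is non-empty and uses only lowercase letters, digits, '_' or '-'.
 */
static bool looks_like_sysfs_name(const char *name)
{
	assert(name != NULL);

	if (*name == '\0')
		return false;

	for (const char *p = name; *p != '\0'; p++) {
		unsigned char c = (unsigned char)*p;

		if (!(islower(c) || isdigit(c) || c == '_' || c == '-'))
			return false;
	}
	return true;
}
```

With this check, "cavium_ptp" passes while "Cavium PTP Driver" is
rejected because of the uppercase letters and the spaces.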



* Re: [PATCH net-next v4 0/2] qed*: Support for NVM config attributes.
From: David Miller @ 2019-08-15 19:54 UTC (permalink / raw)
  To: skalluru; +Cc: netdev, mkalderon, aelior
In-Reply-To: <20190814081153.18889-1-skalluru@marvell.com>

From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Date: Wed, 14 Aug 2019 01:11:51 -0700

> The patch series adds support for managing the NVM config attributes.
> Patch (1) adds functionality to update config attributes via MFW.
> Patch (2) adds driver interface for updating the config attributes.
> 
> Changes from previous versions:
> -------------------------------
> v4: Added more details on the functionality and its usage.
> v3: Removed unused variable.
> v2: Removed unused API.
> 
> Please consider applying this series to "net-next".

Series applied.


* Re: [PATCH] net: pch_gbe: Fix memory leaks
From: Wenwen Wang @ 2019-08-15 20:03 UTC (permalink / raw)
  To: David Miller
  Cc: Richard Fontana, Allison Randal, Alexios Zavras,
	Greg Kroah-Hartman, Thomas Gleixner,
	open list:NETWORKING [GENERAL], open list, Wenwen Wang
In-Reply-To: <20190815.123430.831231953098536795.davem@davemloft.net>

On Thu, Aug 15, 2019 at 3:34 PM David Miller <davem@davemloft.net> wrote:
>
> From: Wenwen Wang <wenwen@cs.uga.edu>
> Date: Tue, 13 Aug 2019 20:33:45 -0500
>
> > In pch_gbe_set_ringparam(), if netif_running() returns false, 'tx_old' and
> > 'rx_old' are not deallocated, leading to memory leaks. To fix this issue,
> > move the free statements after the if branch.
> >
> > Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
>
> Why would they be "deallocated"?  They are still assigned to
> adapter->tx_ring and adapter->rx_ring.

'adapter->tx_ring' and 'adapter->rx_ring' have already been overwritten
with the newly allocated 'txdr' and 'rxdr', respectively, before this if
statement.
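
A minimal user-space sketch of the pattern described above, assuming the
description is accurate. The types struct ring and struct adapter, the
live_rings counter, and the helpers ring_alloc(), ring_free() and
set_ringparam() are hypothetical stand-ins, not the driver's code. The
point is only that once the adapter points at the new rings, the old
ones have to be released on both the running and the not-running path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's ring and adapter structures. */
struct ring {
	unsigned int count;
};

struct adapter {
	struct ring *tx_ring;
	struct ring *rx_ring;
	bool running;		/* models netif_running() */
};

static int live_rings;		/* allocation counter used to spot leaks */

static struct ring *ring_alloc(unsigned int count)
{
	struct ring *r = malloc(sizeof(*r));

	if (r != NULL) {
		r->count = count;
		live_rings++;
	}
	return r;
}

static void ring_free(struct ring *r)
{
	if (r != NULL) {
		free(r);
		live_rings--;
	}
}

/* Swap in new rings; the old ones are freed after the running check. */
static int set_ringparam(struct adapter *a, unsigned int tx, unsigned int rx)
{
	struct ring *tx_old, *rx_old, *txdr, *rxdr;

	assert(a != NULL);
	tx_old = a->tx_ring;
	rx_old = a->rx_ring;
	txdr = ring_alloc(tx);
	rxdr = ring_alloc(rx);
	if (txdr == NULL || rxdr == NULL) {
		ring_free(txdr);
		ring_free(rxdr);
		return -1;
	}

	/* From here on the adapter no longer references the old rings. */
	a->tx_ring = txdr;
	a->rx_ring = rxdr;

	if (a->running) {
		/* A real driver would reprogram the hardware here. */
	}

	/* Released on both paths, so nothing leaks when not running. */
	ring_free(tx_old);
	ring_free(rx_old);
	return 0;
}
```

Moving the two ring_free() calls back inside the if block would leave
live_rings two higher after a call made while the interface is down.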

Wenwen


* Re: [PATCH] wimax/i2400m: fix a memory leak bug
From: Wenwen Wang @ 2019-08-15 20:05 UTC (permalink / raw)
  To: Wenwen Wang, Inaky Perez-Gonzalez,
	supporter:INTEL WIRELESS WIMAX CONNECTION 2400, David S. Miller,
	open list:NETWORKING DRIVERS, open list
In-Reply-To: <20190815184505.o7o2ojt7ag4shh7u@oracle.com>

On Thu, Aug 15, 2019 at 2:45 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Wenwen Wang <wenwen@cs.uga.edu> [190815 14:05]:
> > In i2400m_barker_db_init(), 'options_orig' is allocated through kstrdup()
> > to hold the original command line options. Then, the options are parsed.
> > However, if an error occurs during the parsing process, 'options_orig' is
> > not deallocated, leading to a memory leak bug. To fix this issue, free
> > 'options_orig' before returning the error.
> >
> > Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
> > ---
> >  drivers/net/wimax/i2400m/fw.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/wimax/i2400m/fw.c b/drivers/net/wimax/i2400m/fw.c
> > index e9fc168..6b36f6d 100644
> > --- a/drivers/net/wimax/i2400m/fw.c
> > +++ b/drivers/net/wimax/i2400m/fw.c
> > @@ -342,6 +342,7 @@ int i2400m_barker_db_init(const char *_options)
> >                                      "a 32-bit number\n",
> >                                      __func__, token);
> >                               result = -EINVAL;
> > +                             kfree(options_orig);
> >                               goto error_parse;
> >                       }
> >                       if (barker == 0) {
> > @@ -350,8 +351,10 @@ int i2400m_barker_db_init(const char *_options)
> >                               continue;
> >                       }
> >                       result = i2400m_barker_db_add(barker);
> > -                     if (result < 0)
> > +                     if (result < 0) {
> > +                             kfree(options_orig);
> >                               goto error_add;
>
> I know that you didn't add this error_add label, but it seems like the
> incorrect goto label.  Although looking at the caller indicates an add
> failed, this label is used prior to and after the memory leak you are
> trying to fix.  It might be better to change this label to something
> like error_parse_add and move the kfree to the unwinding.  If a new
> label is used, it becomes more clear as to what is being undone and
> there aren't two jumps into an unwind from two very different stages of
> the function.  Adding a new label also has the benefit of moving the
> kfree to the unwind of error_parse.

Thanks for your suggestion! I will rework the patch.
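
A rough user-space sketch of the unwind style suggested above, not the
reworked patch itself. The function parse_options() and the label
out_free are hypothetical, malloc()/free() stand in for
kstrdup()/kfree(), and summing the tokens stands in for
i2400m_barker_db_add(). This variant uses one shared exit label rather
than a separate error label, so the copy is freed in exactly one place.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: accepts a comma-separated list of hex numbers
 * and sums them. Every exit after the allocation goes through out_free.
 */
static int parse_options(const char *options, unsigned long *sum)
{
	char *options_orig, *token;
	size_t len;
	int result = 0;

	assert(options != NULL && sum != NULL);
	*sum = 0;

	/* Private copy, because strtok() modifies the string it parses. */
	len = strlen(options) + 1;
	options_orig = malloc(len);
	if (options_orig == NULL)
		return -ENOMEM;
	memcpy(options_orig, options, len);

	for (token = strtok(options_orig, ","); token != NULL;
	     token = strtok(NULL, ",")) {
		char *end;
		unsigned long value = strtoul(token, &end, 16);

		if (end == token || *end != '\0') {
			result = -EINVAL;	/* not a hex number */
			goto out_free;
		}
		*sum += value;
	}

out_free:
	free(options_orig);	/* single place where the copy is released */
	return result;
}
```

With a single unwind point, a new error path added inside the loop only
needs to set result and jump; it cannot forget the free.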

Wenwen


* Re: [PATCH bpf] tools: bpftool: close prog FD before exit on showing a single program
From: Andrii Nakryiko @ 2019-08-15 20:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann, bpf,
	Networking, oss-drivers
In-Reply-To: <20190815110917.657de4e3@cakuba.netronome.com>

On Thu, Aug 15, 2019 at 11:09 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 15 Aug 2019 11:05:16 -0700, Andrii Nakryiko wrote:
> > > > Would it be better to make show_prog(fd) close provided fd instead or
> > > > is it used in some other context where FD should live longer (I
> > > > haven't checked, sorry)?
> > >
> > > I think it used to close that's how the bug crept in. Other than the bug
> > > it's fine the way it is.
> >
> > So are you saying that show_prog() should or should not close FD?
>
> Yup, it we'd have to rename it to indicate it closes the fd, and it's
> only called in two places. Not worth the churn.

OK, I'm fine with that.
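
For reference, the ownership rule agreed on here (the helper only
borrows the FD, the caller that obtained it closes it) could be sketched
as follows. show_fd() and show_one() are hypothetical names, not the
bpftool functions, and open() on a path stands in for however the
program FD is actually obtained.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for show_prog(): it only borrows the
 * descriptor and never closes it.
 */
static int show_fd(int fd)
{
	assert(fd >= 0);
	return printf("showing object behind fd %d\n", fd) < 0 ? -1 : 0;
}

/*
 * Hypothetical caller for the single-object case: it opened the
 * descriptor, so it closes it once show_fd() returns.
 */
static int show_one(const char *path)
{
	int fd, err;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	err = show_fd(fd);
	close(fd);	/* the owner releases the descriptor */
	return err;
}
```

Keeping the close in the caller means the helper needs no rename and
stays safe to call from loops that manage their own descriptors.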

Acked-by: Andrii Nakryiko <andriin@fb.com>

