Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next v4 0/2] qed*: Support for NVM config attributes.
From: David Miller @ 2019-08-15 19:54 UTC (permalink / raw)
  To: skalluru; +Cc: netdev, mkalderon, aelior
In-Reply-To: <20190814081153.18889-1-skalluru@marvell.com>

From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Date: Wed, 14 Aug 2019 01:11:51 -0700

> The patch series adds support for managing the NVM config attributes.
> Patch (1) adds functionality to update config attributes via MFW.
> Patch (2) adds driver interface for updating the config attributes.
> 
> Changes from previous versions:
> -------------------------------
> v4: Added more details on the functionality and its usage.
> v3: Removed unused variable.
> v2: Removed unused API.
> 
> Please consider applying this series to "net-next".

Series applied.

^ permalink raw reply

* [PATCH] net: cavium: fix driver name
From: Stephen Hemminger @ 2019-08-15 19:49 UTC (permalink / raw)
  To: yuehaibing; +Cc: netdev, Stephen Hemminger

The driver name gets exposed in sysfs under /sys/bus/pci/drivers
so it should look like other devices. Change it to be common
format (instead of "Cavium PTP").

This is a trivial fix that was observed by accident because
Debian kernels were building this driver into kernel (bug).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/ethernet/cavium/common/cavium_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/common/cavium_ptp.c b/drivers/net/ethernet/cavium/common/cavium_ptp.c
index 73632b843749..b821c9e1604c 100644
--- a/drivers/net/ethernet/cavium/common/cavium_ptp.c
+++ b/drivers/net/ethernet/cavium/common/cavium_ptp.c
@@ -10,7 +10,7 @@
 
 #include "cavium_ptp.h"
 
-#define DRV_NAME	"Cavium PTP Driver"
+#define DRV_NAME "cavium_ptp"
 
 #define PCI_DEVICE_ID_CAVIUM_PTP	0xA00C
 #define PCI_DEVICE_ID_CAVIUM_RST	0xA00E
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Kees Cook @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Song Liu, Networking, bpf, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH,
	Linux API, LSM List
In-Reply-To: <20190813215823.3sfbakzzjjykyng2@ast-mbp>

On Tue, Aug 13, 2019 at 02:58:25PM -0700, Alexei Starovoitov wrote:
> agree that containers (namespaces) reduce amount of trust necessary
> for apps to run, but the end goal is not security though.

Unsurprisingly, I totally disagree: this is the very definition of
improved "security": reduced attack surface, confined trust, etc.

> Linux has become a single user system.

I hope this is just hyperbole, because it's not true in reality. I agree
that the vast majority of Linux devices are single-user-at-a-time
systems now (rather than the "shell servers" of yore), but the system
still has to be expected to confine users from each other, root, and the
hardware. Switching users on Chrome OS or a distro laptop, etc is still
very much expected to _mean_ something.

> If user can ssh into the host they can become root.
> If arbitrary code can run on the host it will be break out of any sandbox.
> Containers are not providing the level of security that is enough
> to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.

I'm not sure why you draw the line for VMs -- they're just as buggy
as anything else. Regardless, I reject this line of thinking: yes,
all software is buggy, but that isn't a reason to give up. In fact,
we should be trying very hard to create safe code (*insert arguments
for sane languages and toolchains here*).

If you look at software safety as a binary, you will always be
disappointed. If you look at it as it manifests in the real world,
then there is some perspective to be had. Reachability of flaws becomes
a major factor; exploit chain length becomes a factor. There are very
real impacts to be had from security hardening, sandboxing, etc. Of
course nothing is perfect, but the current state of the world isn't
as you describe. (And I say this with the knowledge of how long
the lifetime of bugs are in the kernel.)

> Containers are used to make production systems safer.

Yes.

> Some people call it more 'secure', but it's clearly not secure for
> arbitrary code

Perhaps it's just a language issue. "More secure" and "safer" mean
mostly the same thing to me. I tend to think "safer" is actually
a superset that includes things that wreck the user experience but
aren't actually in the privilege manipulation realm. In the traditional
"security" triad of confidentiality, integrity, and availability, I tend
to weigh availability less highly, but a bug that stops someone from
doing their work but doesn't wreck data, let them switch users, etc,
is still considered a "security" issue by many folks. The fewer bugs
someone is exposed to improves their security, safety, whatever. The
easiest way to do that is confinement and its associated attack surface
reduction. tl;dr: security and safety are very use-case-specific
continuum, not a binary state.

> When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> It's been a constant source of pain. The constant blinding, randomization,
> verifier speculative analysis, all spectre v1, v2, v4 mitigations
> are simply not worth it. It's a lot of complex kernel code without users.
> There is not a single use case to allow arbitrary malicious bpf
> program to be loaded and executed.

The world isn't binary (safe code/malicious code), and we need to build
systems that can be used safely even when things go wrong. Yes, probably
no one has a system that _intentionally_ feeds eBPF into the kernel from
a web form. But there is probably someone who does it unintentionally,
or has a user login exposed on a system where unpriv BPF is enabled. The
point is to create primitives as safely as possible so when things DO
go wrong, they fail safe instead of making things worse.

I'm all for a "less privileged than root" API for eBPF, but I get worried
when I see "security" being treated as a binary state. Especially when
it is considered an always-failed state. :)

-- 
Kees Cook

^ permalink raw reply

* [PATCH mlx5-next 3/5] net/mlx5: Add support for VNIC_ENV internal rq counter
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Moshe Shemesh
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

From: Moshe Shemesh <moshe@mellanox.com>

Add mlx5 interface support for reading internal rq out of buffer counter
as part of QUERY_VNIC_ENV command. The command is used by the driver to
query vnic diagnostic statistics from FW.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ab6ae723aae6..c788f895b350 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1116,7 +1116,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         cache_line_128byte[0x1];
 	u8         reserved_at_165[0x4];
 	u8         rts2rts_qp_counters_set_id[0x1];
-	u8         reserved_at_16a[0x5];
+	u8         reserved_at_16a[0x2];
+	u8         vnic_env_int_rq_oob[0x1];
+	u8         reserved_at_16d[0x2];
 	u8         qcam_reg[0x1];
 	u8         gid_table_size[0x10];
 
@@ -2772,7 +2774,11 @@ struct mlx5_ifc_vnic_diagnostic_statistics_bits {
 
 	u8         transmit_discard_vport_down[0x40];
 
-	u8         reserved_at_140[0xec0];
+	u8         reserved_at_140[0xa0];
+
+	u8         internal_rq_out_of_buffer[0x20];
+
+	u8         reserved_at_200[0xe00];
 };
 
 struct mlx5_ifc_traffic_counter_bits {
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 2/5] net/mlx5: Improve functions documentation
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

Fix documentation of mlx5_eq_enable/disable to cleanup compiler warnings.

drivers/net/ethernet/mellanox/mlx5/core//eq.c:334:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_enable'

drivers/net/ethernet/mellanox/mlx5/core//eq.c:355:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_disable'

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 22 ++++++++++++--------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 2df9aaa421c6..a0e78ab64618 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -324,10 +324,13 @@ create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 
 /**
  * mlx5_eq_enable - Enable EQ for receiving EQEs
- * @dev - Device which owns the eq
- * @eq - EQ to enable
- * @nb - notifier call block
- * mlx5_eq_enable - must be called after EQ is created in device.
+ * @dev : Device which owns the eq
+ * @eq  : EQ to enable
+ * @nb  : Notifier call block
+ *
+ * Must be called after EQ is created in device.
+ *
+ * @return: 0 if no error
  */
 int mlx5_eq_enable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 		   struct notifier_block *nb)
@@ -344,11 +347,12 @@ int mlx5_eq_enable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 EXPORT_SYMBOL(mlx5_eq_enable);
 
 /**
- * mlx5_eq_disable - Enable EQ for receiving EQEs
- * @dev - Device which owns the eq
- * @eq - EQ to disable
- * @nb - notifier call block
- * mlx5_eq_disable - must be called before EQ is destroyed.
+ * mlx5_eq_disable - Disable EQ for receiving EQEs
+ * @dev : Device which owns the eq
+ * @eq  : EQ to disable
+ * @nb  : Notifier call block
+ *
+ * Must be called before EQ is destroyed.
  */
 void mlx5_eq_disable(struct mlx5_core_dev *dev, struct mlx5_eq *eq,
 		     struct notifier_block *nb)
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 5/5] net/mlx5: Add lag_tx_port_affinity capability bit
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	Maxim Mikityanskiy, Tariq Toukan
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

From: Maxim Mikityanskiy <maximmi@mellanox.com>

Add the lag_tx_port_affinity HCA capability bit that indicates that
setting port affinity of TISes is supported.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2837fe4d8901..1e55cf73e88c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1249,7 +1249,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8         reserved_at_263[0x8];
 	u8         log_bf_reg_size[0x5];
 
-	u8         reserved_at_270[0xb];
+	u8         reserved_at_270[0x8];
+	u8         lag_tx_port_affinity[0x1];
+	u8         reserved_at_279[0x2];
 	u8         lag_master[0x1];
 	u8         num_lag_ports[0x4];
 
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 4/5] net/mlx5: Expose IP-in-IP capability bit
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Aya Levin,
	Tariq Toukan
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

From: Aya Levin <ayal@mellanox.com>

Expose Fw indication that it supports Stateless Offloads for IP over IP
tunneled packets. The following offloads are supported for the inner
packets: RSS, RX & TX Checksum Offloads, LSO and Flow Steering.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/mlx5_ifc.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c788f895b350..2837fe4d8901 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -808,7 +808,9 @@ struct mlx5_ifc_per_protocol_networking_offload_caps_bits {
 	u8         swp_csum[0x1];
 	u8         swp_lso[0x1];
 	u8         cqe_checksum_full[0x1];
-	u8         reserved_at_24[0xc];
+	u8         reserved_at_24[0x5];
+	u8         tunnel_stateless_ip_over_ip[0x1];
+	u8         reserved_at_2a[0x6];
 	u8         max_vxlan_udp_ports[0x8];
 	u8         reserved_at_38[0x6];
 	u8         max_geneve_opt_len[0x1];
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 1/5] net/mlx5: Add missing include file to lib/crypto.c
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <20190815194543.14369-1-saeedm@mellanox.com>

Add missing include file to avoid compiler warnings:
drivers/net/ethernet/mellanox/mlx5/core//lib/crypto.c:6:5:
warning: no previous prototype for ‘mlx5_create_encryption_key’
    6 | int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core//lib/crypto.c:60:6:
 warning: no previous prototype for ‘mlx5_destroy_encryption_key’
   60 | void mlx5_destroy_encryption_key(struct mlx5_core_dev *mdev, ...

Fixes: 45d3b55dc665 ("net/mlx5: Add crypto library to support create/destroy encryption key")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
index ea9ee88491e5..22bc45c831d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/crypto.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2019 Mellanox Technologies.
 
 #include "mlx5_core.h"
+#include "lib/mlx5.h"
 
 int mlx5_create_encryption_key(struct mlx5_core_dev *mdev,
 			       void *key, u32 sz_bytes,
-- 
2.21.0


^ permalink raw reply related

* [PATCH mlx5-next 0/5] Mellanox, Updates for mlx5-next branch 2019-08-15
From: Saeed Mahameed @ 2019-08-15 19:46 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org

Hi All,

This series includes misc updates for mlx5-next shared branch.

mlx5 HW spec and bits updates:
1) Aya exposes IP-in-IP capability in mlx5_core.
2) Maxim exposes lag tx port affinity capabilities.
3) Moshe adds VNIC_ENV internal rq counter bits.

Misc updates:
4) Saeed, two compiler warnings cleanups

In case of no objection this series will be applied to mlx5-next branch
and sent later as pull request to both rdma-next and net-next branches.

Thanks,
Saeed.

---

Aya Levin (1):
  net/mlx5: Expose IP-in-IP capability bit

Maxim Mikityanskiy (1):
  net/mlx5: Add lag_tx_port_affinity capability bit

Moshe Shemesh (1):
  net/mlx5: Add support for VNIC_ENV internal rq counter

Saeed Mahameed (2):
  net/mlx5: Add missing include file to lib/crypto.c
  net/mlx5: Improve functions documentation

 drivers/net/ethernet/mellanox/mlx5/core/eq.c  | 22 +++++++++++--------
 .../ethernet/mellanox/mlx5/core/lib/crypto.c  |  1 +
 include/linux/mlx5/mlx5_ifc.h                 | 18 +++++++++++----
 3 files changed, 28 insertions(+), 13 deletions(-)

-- 
2.21.0

^ permalink raw reply

* Re: [PATCH] net: myri10ge: fix memory leaks
From: David Miller @ 2019-08-15 19:45 UTC (permalink / raw)
  To: wenwen; +Cc: christopher.lee, netdev, linux-kernel
In-Reply-To: <1565764719-6488-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Wed, 14 Aug 2019 01:38:39 -0500

> In myri10ge_probe(), myri10ge_alloc_slices() is invoked to allocate slices
> related structures. Later on, myri10ge_request_irq() is used to get an irq.
> However, if this process fails, the allocated slices related structures are
> not deallocated, leading to memory leaks. To fix this issue, revise the
> target label of the goto statement to 'abort_with_slices'.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Applied, thanks.

^ permalink raw reply

* Re: pull-request: can-next 2019-08-14,pull-request: can-next 2019-08-14
From: David Miller @ 2019-08-15 19:43 UTC (permalink / raw)
  To: mkl; +Cc: netdev, kernel, linux-can
In-Reply-To: <f0658ccd-389f-fc60-7538-c512112b9978@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 14 Aug 2019 08:16:24 +0200

> this is a pull request for net-next/master consisting of 41 patches.
 ...

Pulled, thanks Marc.

^ permalink raw reply

* Re: [net PATCH] net: tls, fix sk_write_space NULL write when tx disabled
From: David Miller @ 2019-08-15 19:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: jakub.kicinski, ying.xue, netdev, andreyknvl
In-Reply-To: <156576071416.1402.5907777786031481705.stgit@ubuntu3-kvm1>

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 14 Aug 2019 05:31:54 +0000

> The ctx->sk_write_space pointer is only set when TLS tx mode is enabled.
> When running without TX mode its a null pointer but we still set the
> sk sk_write_space pointer on close().
> 
> Fix the close path to only overwrite sk->sk_write_space when the current
> pointer is to the tls_write_space function indicating the tls module should
> clean it up properly as well.
> 
> Reported-by: Hillf Danton <hdanton@sina.com>
> Cc: Ying Xue <ying.xue@windriver.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Fixes: 57c722e932cfb ("net/tls: swap sk_write_space on close")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Applied, thanks John.

That Fixes tag takes one through an interesting chain of fixes to fixes.
I'll queue this up for -stable and make sure it all ends up in the
proper place.

Thanks.

^ permalink raw reply

* Re: [PATCH 00/14] ARM: move lpc32xx and dove to multiplatform
From: Arnd Bergmann @ 2019-08-15 19:38 UTC (permalink / raw)
  To: Sylvain Lemieux
  Cc: Russell King - ARM Linux admin, SoC Team, Linux ARM,
	Vladimir Zapolskiy, Gregory Clement, Linus Walleij, Jason Cooper,
	Andrew Lunn, Sebastian Hesselbarth, David S. Miller,
	Greg Kroah-Hartman, Alan Stern, Guenter Roeck,
	open list:GPIO SUBSYSTEM, Networking, linux-serial, USB list,
	LINUXWATCHDOG
In-Reply-To: <b43c3d60-b675-442c-c549-25530cfbffe3@gmail.com>

On Thu, Aug 15, 2019 at 8:32 PM Sylvain Lemieux <slemieux.tyco@gmail.com> wrote:
> On 8/15/19 9:11 AM, Arnd Bergmann wrote:
> > On Thu, Aug 1, 2019 at 9:33 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > I applied patches 12 and 13 into the soc tree now. There are some
> > other pending multiplatform conversions (iop32x, ep93xx, lpc32xx,
> > omap1), but it looks like none of those will be complete for 5.4.
>
> I think the patchset (v2) for the LPC32xx is ready for 5.4
> ([PATCH v2 00/13] v2: ARM: move lpc32xx to multiplatform)

Good point. I've merged these into the arm/soc branch now.

     Arnd

^ permalink raw reply

* Re: [PATCH] liquidio: add cleanup in octeon_setup_iq()
From: David Miller @ 2019-08-15 19:37 UTC (permalink / raw)
  To: wenwen; +Cc: dchickles, sburla, fmanlunas, netdev, linux-kernel
In-Reply-To: <1565759689-5941-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Wed, 14 Aug 2019 00:14:49 -0500

> If oct->fn_list.enable_io_queues() fails, no cleanup is executed, leading
> to memory/resource leaks. To fix this issue, invoke
> octeon_delete_instr_queue() before returning from the function.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Applied.

^ permalink raw reply

* Re: [PATCH] net: pch_gbe: Fix memory leaks
From: David Miller @ 2019-08-15 19:34 UTC (permalink / raw)
  To: wenwen
  Cc: rfontana, allison, alexios.zavras, gregkh, tglx, netdev,
	linux-kernel
In-Reply-To: <1565746427-5366-1-git-send-email-wenwen@cs.uga.edu>

From: Wenwen Wang <wenwen@cs.uga.edu>
Date: Tue, 13 Aug 2019 20:33:45 -0500

> In pch_gbe_set_ringparam(), if netif_running() returns false, 'tx_old' and
> 'rx_old' are not deallocated, leading to memory leaks. To fix this issue,
> move the free statements after the if branch.
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>

Why would they be "deallocated"?  They are still assigned to
adapter->tx_ring and adapter->rx_ring.

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Jonathan Lemon @ 2019-08-15 19:33 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko
In-Reply-To: <20190815121356.8848-3-ivan.khoronzhuk@linaro.org>



On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:

> For 64-bit there is no reason to use vmap/vunmap, so use page_address
> as it was initially. For 32 bits, in some apps, like in samples
> xdpsock_user.c when number of pgs in use is quite big, the kmap
> memory can be not enough, despite on this, kmap looks like is
> deprecated in such cases as it can block and should be used rather
> for dynamic mm.
>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

^ permalink raw reply

* Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
From: Jason Gunthorpe @ 2019-08-15 19:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, kvm, virtualization, netdev, linux-kernel,
	linux-mm
In-Reply-To: <74838e61-3a5e-0f51-2092-f4a16d144b45@redhat.com>

On Thu, Aug 15, 2019 at 11:26:46AM +0800, Jason Wang wrote:
> 
> On 2019/8/13 下午7:57, Jason Gunthorpe wrote:
> > On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
> > 
> > > What kind of issues do you see? Spinlock is to synchronize GUP with MMU
> > > notifier in this series.
> > A GUP that can't sleep can't pagefault which makes it a really weird
> > pattern
> 
> 
> My understanding is __get_user_pages_fast() assumes caller can fail or have
> fallback. And we have graceful fallback to copy_{to|from}_user().

My point is that if you can fall back to copy_user then it is weird to
call the special non-sleeping GUP under a spinlock.

AFAIK the only reason this is done is because of the way the notifier
is being locked...

Jason

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Jonathan Lemon @ 2019-08-15 19:32 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko
In-Reply-To: <20190815191456.GA11699@khorivan>



On 15 Aug 2019, at 12:19, Ivan Khoronzhuk wrote:

> On Thu, Aug 15, 2019 at 11:23:16AM -0700, Jonathan Lemon wrote:
>> On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:
>>
>>> For 64-bit there is no reason to use vmap/vunmap, so use 
>>> page_address
>>> as it was initially. For 32 bits, in some apps, like in samples
>>> xdpsock_user.c when number of pgs in use is quite big, the kmap
>>> memory can be not enough, despite on this, kmap looks like is
>>> deprecated in such cases as it can block and should be used rather
>>> for dynamic mm.
>>>
>>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>> ---
>>> net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
>>> 1 file changed, 30 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>>> index a0607969f8c0..d740c4f8810c 100644
>>> --- a/net/xdp/xdp_umem.c
>>> +++ b/net/xdp/xdp_umem.c
>>> @@ -14,7 +14,7 @@
>>> #include <linux/netdevice.h>
>>> #include <linux/rtnetlink.h>
>>> #include <linux/idr.h>
>>> -#include <linux/highmem.h>
>>> +#include <linux/vmalloc.h>
>>>
>>> #include "xdp_umem.h"
>>> #include "xsk_queue.h"
>>> @@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct 
>>> xdp_umem *umem)
>>> 	unsigned int i;
>>>
>>> 	for (i = 0; i < umem->npgs; i++)
>>> -		kunmap(umem->pgs[i]);
>>> +		if (PageHighMem(umem->pgs[i]))
>>> +			vunmap(umem->pages[i].addr);
>>> +}
>>> +
>>> +static int xdp_umem_map_pages(struct xdp_umem *umem)
>>> +{
>>> +	unsigned int i;
>>> +	void *addr;
>>> +
>>> +	for (i = 0; i < umem->npgs; i++) {
>>> +		if (PageHighMem(umem->pgs[i]))
>>> +			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
>>> +		else
>>> +			addr = page_address(umem->pgs[i]);
>>> +
>>> +		if (!addr) {
>>> +			xdp_umem_unmap_pages(umem);
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		umem->pages[i].addr = addr;
>>> +	}
>>> +
>>> +	return 0;
>>> }
>>
>> You'll want a __xdp_umem_unmap_pages() helper here that takes an
>> count of the number of pages to unmap, so it can be called from
>> xdp_umem_unmap_pages() in the normal case, and xdp_umem_map_pages()
>> in the error case.  Otherwise the error case ends up calling
>> PageHighMem on a null page.
>> -- 
>> Jonathan
>
> Do you mean null address?
> If so, then vunmap do nothing if it's null, and addr is null if it's 
> not
> assigned... and it's not assigned w/o correct mapping...
>
> If you mean null page, then it is not possible after all they are
> pinned above, here: xdp_umem_pin_pages(), thus assigned.
>
> Or I missed smth?

No - I forgot about umem_pin_pages() - feel free to ignore my comments.
--
Jonathan

>
> Despite of this, seems like here should be one more patch, adding 
> unpinning page
> in error path, but this not related to this change. Will do this in 
> follow up
> fix patch, if no objection to my explanation, ofc.
>
> -- 
> Regards,
> Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH] net/ncsi: Ensure 32-bit boundary for data cksum
From: David Miller @ 2019-08-15 19:32 UTC (permalink / raw)
  To: terry.s.duncan; +Cc: sam, netdev, linux-kernel, openbmc, wak, joel
In-Reply-To: <20190814011840.9387-1-terry.s.duncan@linux.intel.com>

From: "Terry S. Duncan" <terry.s.duncan@linux.intel.com>
Date: Tue, 13 Aug 2019 18:18:40 -0700

> The NCSI spec indicates that if the data does not end on a 32 bit
> boundary, one to three padding bytes equal to 0x00 shall be present to
> align the checksum field to a 32-bit boundary.
> 
> Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com>
> ---
>  net/ncsi/internal.h |  1 +
>  net/ncsi/ncsi-cmd.c |  2 +-
>  net/ncsi/ncsi-rsp.c | 12 ++++++++----
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
> index 0b3f0673e1a2..468a19fdfd88 100644
> --- a/net/ncsi/internal.h
> +++ b/net/ncsi/internal.h
> @@ -185,6 +185,7 @@ struct ncsi_package;
>  #define NCSI_TO_CHANNEL(p, c)	(((p) << NCSI_PACKAGE_SHIFT) | (c))
>  #define NCSI_MAX_PACKAGE	8
>  #define NCSI_MAX_CHANNEL	32
> +#define NCSI_ROUND32(x)		(((x) + 3) & ~3) /* Round to 32 bit boundary */

I think we have enough of a proliferation of alignment macros, let's not add more.

Either define this to "ALIGN(x, 4)" or expand that into each of the locations:

>  	pchecksum = (__be32 *)((void *)h + sizeof(struct ncsi_pkt_hdr) +
> -		    nca->payload);
> +		    NCSI_ROUND32(nca->payload));

		     ALIGN(nca->payload, 4)

> -	pchecksum = (__be32 *)((void *)(h + 1) + payload - 4);
> +	pchecksum = (__be32 *)((void *)(h + 1) + NCSI_ROUND32(payload) - 4);

						 ALIGN(payload, 4)


etc.

^ permalink raw reply

* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Jakub Kicinski @ 2019-08-15 19:28 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: magnus.karlsson, bjorn.topel, netdev, bpf, intel-wired-lan,
	maciej.fijalkowski, tom.herbert
In-Reply-To: <1565840783-8269-1-git-send-email-sridhar.samudrala@intel.com>

On Wed, 14 Aug 2019 20:46:18 -0700, Sridhar Samudrala wrote:
> This patch series introduces XDP_SKIP_BPF flag that can be specified
> during the bind() call of an AF_XDP socket to skip calling the BPF 
> program in the receive path and pass the buffer directly to the socket.
> 
> When a single AF_XDP socket is associated with a queue and a HW
> filter is used to redirect the packets and the app is interested in
> receiving all the packets on that queue, we don't need an additional 
> BPF program to do further filtering or lookup/redirect to a socket.
> 
> Here are some performance numbers collected on 
>   - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>   - Intel 40Gb Ethernet NIC (i40e)
> 
> All tests use 2 cores and the results are in Mpps.
> 
> turbo on (default)
> ---------------------------------------------	
>                       no-skip-bpf    skip-bpf
> ---------------------------------------------	
> rxdrop zerocopy           21.9         38.5 
> l2fwd  zerocopy           17.0         20.5
> rxdrop copy               11.1         13.3
> l2fwd  copy                1.9          2.0
> 
> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> ---------------------------------------------	
>                       no-skip-bpf    skip-bpf
> ---------------------------------------------	
> rxdrop zerocopy           15.4         29.0
> l2fwd  zerocopy           11.8         18.2
> rxdrop copy                8.2         10.5
> l2fwd  copy                1.7          1.7
> ---------------------------------------------	

Could you include a third column here - namely the in-XDP performance?
AFAIU the way to achieve better performance with AF_XDP is to move the
fast path into the kernel's XDP program..

Maciej's work on batching XDP program's execution should lower the
retpoline overhead, without leaning close to the bypass model.

^ permalink raw reply

* Re: [PATCH net-next] page_pool: fix logic in __page_pool_get_cached
From: David Miller @ 2019-08-15 19:28 UTC (permalink / raw)
  To: jonathan.lemon
  Cc: netdev, brouer, ilias.apalodimas, saeedm, ttoukan.linux,
	kernel-team
In-Reply-To: <20190813174509.494723-1-jonathan.lemon@gmail.com>

From: Jonathan Lemon <jonathan.lemon@gmail.com>
Date: Tue, 13 Aug 2019 10:45:09 -0700

> __page_pool_get_cached() will return NULL when the ring is
> empty, even if there are pages present in the lookaside cache.
> 
> It is also possible to refill the cache, and then return a
> NULL page.
> 
> Restructure the logic so eliminate both cases.
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>

Applied.

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Jakub Kicinski @ 2019-08-15 19:22 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <20190815152100.GN2820@mini-arch>

On Thu, 15 Aug 2019 08:21:00 -0700, Stanislav Fomichev wrote:
> On 08/15, Toshiaki Makita wrote:
> > On 2019/08/15 2:07, Stanislav Fomichev wrote:  
> > > On 08/13, Toshiaki Makita wrote:  
> > > > * Implementation
> > > > 
> > > > xdp_flow makes use of UMH to load an eBPF program for XDP, similar to
> > > > bpfilter. The difference is that xdp_flow does not generate the eBPF
> > > > program dynamically but a prebuilt program is embedded in UMH. This is
> > > > mainly because flow insertion is considerably frequent. If we generate
> > > > and load an eBPF program on each insertion of a flow, the latency of the
> > > > first packet of ping in above test will incease, which I want to avoid.  
> > > Can this be instead implemented with a new hook that will be called
> > > for TC events? This hook can write to perf event buffer and control
> > > plane will insert/remove/modify flow tables in the BPF maps (contol
> > > plane will also install xdp program).
> > > 
> > > Why do we need UMH? What am I missing?  
> > 
> > So you suggest doing everything in xdp_flow kmod?  
> You probably don't even need xdp_flow kmod. Add new tc "offload" mode
> (bypass) that dumps every command via netlink (or calls the BPF hook
> where you can dump it into perf event buffer) and then read that info
> from userspace and install xdp programs and modify flow tables.
> I don't think you need any kernel changes besides that stream
> of data from the kernel about qdisc/tc flow creation/removal/etc.

There's a certain allure in bringing the in-kernel BPF translation
infrastructure forward. OTOH from system architecture perspective IMHO
it does seem like a task best handed in user space. bpfilter can replace
iptables completely, here we're looking at an acceleration relatively
loosely coupled with flower.

FWIW Quentin spent some time working on a universal flow rule to BPF
translation library:

https://github.com/Netronome/libkefir

A lot remains to be done there, but flower front end is one of the
targets. A library can be tuned for any application, without a
dependency on flower uAPI.

> But, I haven't looked at the series deeply, so I might be missing
> something :-)

I don't think you are :)

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/3] xdp: xdp_umem: replace kmap on vmap for umem map
From: Ivan Khoronzhuk @ 2019-08-15 19:19 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: magnus.karlsson, bjorn.topel, davem, hawk, john.fastabend,
	jakub.kicinski, daniel, netdev, bpf, xdp-newbies, linux-kernel,
	yhs, andrii.nakryiko
In-Reply-To: <5B58D364-609F-498E-B7DF-4457D454A14D@gmail.com>

On Thu, Aug 15, 2019 at 11:23:16AM -0700, Jonathan Lemon wrote:
>On 15 Aug 2019, at 5:13, Ivan Khoronzhuk wrote:
>
>>For 64-bit there is no reason to use vmap/vunmap, so use page_address
>>as it was initially. For 32 bits, in some apps, like in samples
>>xdpsock_user.c when number of pgs in use is quite big, the kmap
>>memory can be not enough, despite on this, kmap looks like is
>>deprecated in such cases as it can block and should be used rather
>>for dynamic mm.
>>
>>Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>---
>> net/xdp/xdp_umem.c | 36 ++++++++++++++++++++++++++++++------
>> 1 file changed, 30 insertions(+), 6 deletions(-)
>>
>>diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
>>index a0607969f8c0..d740c4f8810c 100644
>>--- a/net/xdp/xdp_umem.c
>>+++ b/net/xdp/xdp_umem.c
>>@@ -14,7 +14,7 @@
>> #include <linux/netdevice.h>
>> #include <linux/rtnetlink.h>
>> #include <linux/idr.h>
>>-#include <linux/highmem.h>
>>+#include <linux/vmalloc.h>
>>
>> #include "xdp_umem.h"
>> #include "xsk_queue.h"
>>@@ -170,7 +170,30 @@ static void xdp_umem_unmap_pages(struct 
>>xdp_umem *umem)
>> 	unsigned int i;
>>
>> 	for (i = 0; i < umem->npgs; i++)
>>-		kunmap(umem->pgs[i]);
>>+		if (PageHighMem(umem->pgs[i]))
>>+			vunmap(umem->pages[i].addr);
>>+}
>>+
>>+static int xdp_umem_map_pages(struct xdp_umem *umem)
>>+{
>>+	unsigned int i;
>>+	void *addr;
>>+
>>+	for (i = 0; i < umem->npgs; i++) {
>>+		if (PageHighMem(umem->pgs[i]))
>>+			addr = vmap(&umem->pgs[i], 1, VM_MAP, PAGE_KERNEL);
>>+		else
>>+			addr = page_address(umem->pgs[i]);
>>+
>>+		if (!addr) {
>>+			xdp_umem_unmap_pages(umem);
>>+			return -ENOMEM;
>>+		}
>>+
>>+		umem->pages[i].addr = addr;
>>+	}
>>+
>>+	return 0;
>> }
>
>You'll want a __xdp_umem_unmap_pages() helper here that takes an
>count of the number of pages to unmap, so it can be called from
>xdp_umem_unmap_pages() in the normal case, and xdp_umem_map_pages()
>in the error case.  Otherwise the error case ends up calling
>PageHighMem on a null page.
>-- 
>Jonathan

Do you mean null address?
If so, then vunmap do nothing if it's null, and addr is null if it's not
assigned... and it's not assigned w/o correct mapping...

If you mean null page, then it is not possible after all they are
pinned above, here: xdp_umem_pin_pages(), thus assigned.

Or I missed smth?

Despite of this, seems like here should be one more patch, adding unpinning page
in error path, but this not related to this change. Will do this in follow up
fix patch, if no objection to my explanation, ofc.

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* [net-next 15/16] net/mlx5e: Fix deallocation of non-fully init encap entries
From: Saeed Mahameed @ 2019-08-15 19:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Vlad Buslov, Saeed Mahameed
In-Reply-To: <20190815190911.12050-1-saeedm@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>

Recent rtnl lock dependency refactoring changed encap entry attach code to
insert encap entry to hash table before it was fully initialized in order
to allow concurrent tc users to wait on completion for encap entry to
finish initialization. That change required all the users of encap entry to
obtain reference to it first and for caller that creates encap to put
reference to it on error, instead of freeing the entry memory directly.
However, releasing reference to such encap entry that wasn't fully
initialized causes NULL pointer dereference in
mlx5e_rep_encap_entry_detach() which expects e->out_dev to be set and encap
to be attached to nhe:

[ 1092.454517] BUG: unable to handle page fault for address: 00000000000420e8
[ 1092.454571] #PF: supervisor read access in kernel mode
[ 1092.454602] #PF: error_code(0x0000) - not-present page
[ 1092.454632] PGD 800000083032c067 P4D 800000083032c067 PUD 84107d067 PMD 0
[ 1092.454673] Oops: 0000 [#1] SMP PTI
[ 1092.454697] CPU: 20 PID: 22393 Comm: tc Not tainted 5.3.0-rc3+ #589
[ 1092.454733] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 1092.454806] RIP: 0010:mlx5e_rep_encap_entry_detach+0x1c/0x630 [mlx5_core]
[ 1092.454845] Code: be f4 ff ff ff e9 11 ff ff ff 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 30 <48> 8b 87 28 16 04 00 48 89 f7 48 05 d0 03 00 00 48 89
 45 c8 e8 cb
[ 1092.454942] RSP: 0018:ffffb6f08421f5a0 EFLAGS: 00010286
[ 1092.454974] RAX: 0000000000000000 RBX: ffff8ab668644e00 RCX: ffffb6f08421f56c
[ 1092.455013] RDX: ffff8ab668644e40 RSI: ffff8ab668644e00 RDI: 0000000000000ac0
[ 1092.455053] RBP: ffffb6f08421f5f8 R08: 0000000000000001 R09: 0000000000000000
[ 1092.455092] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000ac0
[ 1092.455131] R13: 00000000ffffff9b R14: ffff8ab63f200ac0 R15: ffff8ab668644e40
[ 1092.455171] FS:  00007fa195bdc480(0000) GS:ffff8ab66fa00000(0000) knlGS:0000000000000000
[ 1092.455216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1092.455249] CR2: 00000000000420e8 CR3: 0000000867522001 CR4: 00000000001606e0
[ 1092.455288] Call Trace:
[ 1092.455315]  ? __mutex_unlock_slowpath+0x4d/0x2a0
[ 1092.455365]  mlx5e_encap_dealloc.isra.0+0x31/0x60 [mlx5_core]
[ 1092.455424]  mlx5e_tc_add_fdb_flow+0x596/0x750 [mlx5_core]
[ 1092.455484]  __mlx5e_add_fdb_flow+0x152/0x210 [mlx5_core]
[ 1092.455534]  mlx5e_configure_flower+0x4d5/0xe30 [mlx5_core]
[ 1092.455574]  tc_setup_cb_call+0x67/0xb0
[ 1092.455601]  fl_hw_replace_filter+0x142/0x300 [cls_flower]
[ 1092.455639]  fl_change+0xd24/0x1bdb [cls_flower]
[ 1092.455675]  tc_new_tfilter+0x3e0/0x970
[ 1092.455709]  ? tc_del_tfilter+0x720/0x720
[ 1092.455735]  rtnetlink_rcv_msg+0x389/0x4b0
[ 1092.455763]  ? netlink_deliver_tap+0x95/0x400
[ 1092.455791]  ? rtnl_dellink+0x2d0/0x2d0
[ 1092.455817]  netlink_rcv_skb+0x49/0x110
[ 1092.455844]  netlink_unicast+0x171/0x200
[ 1092.455872]  netlink_sendmsg+0x224/0x3f0
[ 1092.455901]  sock_sendmsg+0x5e/0x60
[ 1092.455924]  ___sys_sendmsg+0x2ae/0x330
[ 1092.455950]  ? task_work_add+0x43/0x50
[ 1092.455976]  ? fput_many+0x45/0x80
[ 1092.456004]  ? __lock_acquire+0x248/0x18e0
[ 1092.456033]  ? find_held_lock+0x2b/0x80
[ 1092.456058]  ? task_work_run+0x7b/0xd0
[ 1092.456085]  __sys_sendmsg+0x59/0xa0
[ 1092.457013]  do_syscall_64+0x5c/0xb0
[ 1092.457924]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1092.458842] RIP: 0033:0x7fa195da27b8
[ 1092.459918] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83
 ec 28 89 54
[ 1092.462634] RSP: 002b:00007fff94409298 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1092.464011] RAX: ffffffffffffffda RBX: 000000005d515b0e RCX: 00007fa195da27b8
[ 1092.465391] RDX: 0000000000000000 RSI: 00007fff94409300 RDI: 0000000000000003
[ 1092.466761] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006
[ 1092.468121] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001
[ 1092.469456] R13: 0000000000480640 R14: 0000000000000016 R15: 0000000000000001
[ 1092.470766] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm
iw_cm ib_cm mlx5_ib ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel kvm irqbypass crct10dif_pclmul mei_me crc32_pclmul crc32
c_intel igb iTCO_wdt ghash_clmulni_intel ses mlxfw intel_cstate iTCO_vendor_support ptp intel_uncore lpc_ich pps_core mei i2c_i801 joydev intel_rapl_perf ioatdma enclosure ipmi_ssif pcspkr dca wmi ipmi_
si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[ 1092.479618] CR2: 00000000000420e8
[ 1092.481214] ---[ end trace ce2e0f4d9a67f604 ]---

To fix the issue, set e->compl_result to positive value after encap was
initialized successfully. Check e->compl_result value in
mlx5e_encap_dealloc() and only detach and dealloc encap if the value is
positive.

Fixes: d589e785baf5 ("net/mlx5e: Allow concurrent creation of encap entries")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 5be3da621499..2a4de37fd210 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1481,10 +1481,13 @@ void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe)
 static void mlx5e_encap_dealloc(struct mlx5e_priv *priv, struct mlx5e_encap_entry *e)
 {
 	WARN_ON(!list_empty(&e->flows));
-	mlx5e_rep_encap_entry_detach(netdev_priv(e->out_dev), e);
 
-	if (e->flags & MLX5_ENCAP_ENTRY_VALID)
-		mlx5_packet_reformat_dealloc(priv->mdev, e->encap_id);
+	if (e->compl_result > 0) {
+		mlx5e_rep_encap_entry_detach(netdev_priv(e->out_dev), e);
+
+		if (e->flags & MLX5_ENCAP_ENTRY_VALID)
+			mlx5_packet_reformat_dealloc(priv->mdev, e->encap_id);
+	}
 
 	kfree(e->encap_header);
 	kfree(e);
@@ -2910,7 +2913,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 
 		/* Protect against concurrent neigh update. */
 		mutex_lock(&esw->offloads.encap_tbl_lock);
-		if (e->compl_result) {
+		if (e->compl_result < 0) {
 			err = -EREMOTEIO;
 			goto out_err;
 		}
@@ -2950,6 +2953,7 @@ static int mlx5e_attach_encap(struct mlx5e_priv *priv,
 		e->compl_result = err;
 		goto out_err;
 	}
+	e->compl_result = 1;
 
 attach_flow:
 	flow->encaps[out_index].e = e;
-- 
2.21.0


^ permalink raw reply related

* [net-next 14/16] Documentation: net: mlx5: Devlink health documentation updates
From: Saeed Mahameed @ 2019-08-15 19:10 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev@vger.kernel.org, Aya Levin, Saeed Mahameed
In-Reply-To: <20190815190911.12050-1-saeedm@mellanox.com>

From: Aya Levin <ayal@mellanox.com>

Add documentation for devlink health rx reporter supported by mlx5.
Update tx reporter documentation.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../device_drivers/mellanox/mlx5.rst          | 33 +++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst
index 214325897732..cfda464e52de 100644
--- a/Documentation/networking/device_drivers/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst
@@ -126,7 +126,7 @@ Devlink health reporters
 
 tx reporter
 -----------
-The tx reporter is responsible of two error scenarios:
+The tx reporter is responsible for reporting and recovering of the following two error scenarios:
 
 - TX timeout
     Report on kernel tx timeout detection.
@@ -135,7 +135,7 @@ The tx reporter is responsible of two error scenarios:
     Report on error tx completion.
     Recover by flushing the TX queue and reset it.
 
-TX reporter also support Diagnose callback, on which it provides
+TX reporter also support on demand diagnose callback, on which it provides
 real time information of its send queues status.
 
 User commands examples:
@@ -144,11 +144,40 @@ User commands examples:
 
     $ devlink health diagnose pci/0000:82:00.0 reporter tx
 
+NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+
 - Show number of tx errors indicated, number of recover flows ended successfully,
   is autorecover enabled and graceful period from last recover::
 
     $ devlink health show pci/0000:82:00.0 reporter tx
 
+rx reporter
+-----------
+The rx reporter is responsible for reporting and recovering of the following two error scenarios:
+
+- RX queues initialization (population) timeout
+    RX queues descriptors population on ring initialization is done in
+    napi context via triggering an irq, in case of a failure to get
+    the minimum amount of descriptors, a timeout would occur and it
+    could be recoverable by polling the EQ (Event Queue).
+- RX completions with errors (reported by HW on interrupt context)
+    Report on rx completion error.
+    Recover (if needed) by flushing the related queue and reset it.
+
+RX reporter also supports on demand diagnose callback, on which it
+provides real time information of its receive queues status.
+
+- Diagnose rx queues status, and corresponding completion queue::
+
+    $ devlink health diagnose pci/0000:82:00.0 reporter rx
+
+NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
+
+- Show number of rx errors indicated, number of recover flows ended successfully,
+  is autorecover enabled and graceful period from last recover::
+
+    $ devlink health show pci/0000:82:00.0 reporter rx
+
 fw reporter
 -----------
 The fw reporter implements diagnose and dump callbacks.
-- 
2.21.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox