Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v3 00/12] net: stmmac: Improvements for -next
From: David Miller @ 2019-08-17 19:44 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: netdev, Joao.Pinto, jakub.kicinski, peppe.cavallaro,
	alexandre.torgue, mcoquelin.stm32, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <cover.1566067802.git.joabreu@synopsys.com>

From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Sat, 17 Aug 2019 20:54:39 +0200

> Couple of improvements for -next tree. More info in commit logs.

Series applied.

^ permalink raw reply

* Re: [net-next 11/16] net/mlx5e: Report and recover from rx timeout
From: David Miller @ 2019-08-17 19:48 UTC (permalink / raw)
  To: saeedm; +Cc: netdev, ayal, tariqt, jiri
In-Reply-To: <20190815190911.12050-12-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 15 Aug 2019 19:10:07 +0000

> +static int mlx5e_rx_reporter_timeout_recover(void *ctx)
> +{
> +	struct mlx5e_rq *rq = ctx;
> +	struct mlx5e_icosq *icosq = &rq->channel->icosq;
> +	struct mlx5_eq_comp *eq = rq->cq.mcq.eq;
> +	int err;

In this and several further patches, this non-reverse-christmas tree
sequence appears.  Please fix it.

Put the variable assignments into the body of the function if you have
to in order to make this styled correctly.

Thanks.

^ permalink raw reply

* Re: [PATCH RFC v2 net-next 0/4] mv88e6xxx: setting 2500base-x mode for CPU/DSA port in dts
From: Vivien Didelot @ 2019-08-17 19:50 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190817191452.16716-1-marek.behun@nic.cz>

Hi Marek,

On Sat, 17 Aug 2019 21:14:48 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> Hi,
> 
> here is another proposal for supporting setting 2500base-x mode for
> CPU/DSA ports in device tree correctly.
> 
> The changes from v1 are that instead of adding .port_setup() and
> .port_teardown() methods to the DSA operations struct we instead, for
> CPU/DSA ports, call dsa_port_enable() from dsa_port_setup(), but only
> after the port is registered (and required phylink/devlink structures
> exist).
> 
> The .port_enable/.port_disable methods are now only meant to be used
> for user ports, when the slave interface is brought up/down. This
> proposal changes that in such a way that these methods are also called
> for CPU/DSA ports, but only just after the switch is set up (and just
> before the switch is tore down).
> 
> If we went this way, we would have to patch the other DSA drivers to
> check if user port is being given in their respective .port_enable
> and .port_disable implmentations.
> 
> What do you think about this?

This looks much better. Let me pass through all patches of this RFC so that
I can include bits I would like to see in your next series.


Thanks,

	Vivien

^ permalink raw reply

* [PATCH net v2 0/6] bnxt_en: Bug fixes.
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev

2 Bug fixes related to 57500 shutdown sequence and doorbell sequence,
2 TC Flower bug fixes related to the setting of the flow direction,
1 NVRAM update bug fix, and a minor fix to suppress an unnecessary
error message.  Please queue for -stable as well.  Thanks.

v2: Fix compile warning reported by kbuild test robot on patch #6.

Michael Chan (2):
  bnxt_en: Fix VNIC clearing logic for 57500 chips.
  bnxt_en: Improve RX doorbell sequence.

Somnath Kotur (1):
  bnxt_en: Fix to include flow direction in L2 key

Vasundhara Volam (2):
  bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
  bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command

Venkat Duvvuru (1):
  bnxt_en: Use correct src_fid to determine direction of the flow

 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 36 ++++++++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c |  9 ++++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 ++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c      |  8 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h      |  6 ++--
 5 files changed, 42 insertions(+), 29 deletions(-)

-- 
2.5.1


^ permalink raw reply

* [PATCH net v2 1/6] bnxt_en: Fix VNIC clearing logic for 57500 chips.
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>

During device shutdown, the VNIC clearing sequence needs to be modified
to free the VNIC first before freeing the RSS contexts.  The current
code is doing the reverse and we can get mis-directed RX completions
to CP ring ID 0 when the RSS contexts are freed and zeroed.  The clearing
of RSS contexts is not required with the new sequence.

Refactor the VNIC clearing logic into a new function bnxt_clear_vnic()
and do the chip specific VNIC clearing sequence.

Fixes: 7b3af4f75b81 ("bnxt_en: Add RSS support for 57500 chips.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7070349..1ef224f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -7016,19 +7016,29 @@ static void bnxt_hwrm_clear_vnic_rss(struct bnxt *bp)
 		bnxt_hwrm_vnic_set_rss(bp, i, false);
 }
 
-static void bnxt_hwrm_resource_free(struct bnxt *bp, bool close_path,
-				    bool irq_re_init)
+static void bnxt_clear_vnic(struct bnxt *bp)
 {
-	if (bp->vnic_info) {
-		bnxt_hwrm_clear_vnic_filter(bp);
+	if (!bp->vnic_info)
+		return;
+
+	bnxt_hwrm_clear_vnic_filter(bp);
+	if (!(bp->flags & BNXT_FLAG_CHIP_P5)) {
 		/* clear all RSS setting before free vnic ctx */
 		bnxt_hwrm_clear_vnic_rss(bp);
 		bnxt_hwrm_vnic_ctx_free(bp);
-		/* before free the vnic, undo the vnic tpa settings */
-		if (bp->flags & BNXT_FLAG_TPA)
-			bnxt_set_tpa(bp, false);
-		bnxt_hwrm_vnic_free(bp);
 	}
+	/* before free the vnic, undo the vnic tpa settings */
+	if (bp->flags & BNXT_FLAG_TPA)
+		bnxt_set_tpa(bp, false);
+	bnxt_hwrm_vnic_free(bp);
+	if (bp->flags & BNXT_FLAG_CHIP_P5)
+		bnxt_hwrm_vnic_ctx_free(bp);
+}
+
+static void bnxt_hwrm_resource_free(struct bnxt *bp, bool close_path,
+				    bool irq_re_init)
+{
+	bnxt_clear_vnic(bp);
 	bnxt_hwrm_ring_free(bp, close_path);
 	bnxt_hwrm_ring_grp_free(bp);
 	if (irq_re_init) {
-- 
2.5.1


^ permalink raw reply related

* [PATCH net v2 2/6] bnxt_en: Improve RX doorbell sequence.
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>

When both RX buffers and RX aggregation buffers have to be
replenished at the end of NAPI, post the RX aggregation buffers first
before RX buffers.  Otherwise, we may run into a situation where
there are only RX buffers without RX aggregation buffers for a split
second.  This will cause the hardware to abort the RX packet and
report buffer errors, which will cause unnecessary cleanup by the
driver.

Ringing the Aggregation ring doorbell first before the RX ring doorbell
will prevent some of these buffer errors.  Use the same sequence during
ring initialization as well.

Fixes: 697197e5a173 ("bnxt_en: Re-structure doorbells.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1ef224f..8dce406 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2021,9 +2021,9 @@ static void __bnxt_poll_work_done(struct bnxt *bp, struct bnxt_napi *bnapi)
 	if (bnapi->events & BNXT_RX_EVENT) {
 		struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
 
-		bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
 		if (bnapi->events & BNXT_AGG_EVENT)
 			bnxt_db_write(bp, &rxr->rx_agg_db, rxr->rx_agg_prod);
+		bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
 	}
 	bnapi->events = 0;
 }
@@ -5064,6 +5064,7 @@ static void bnxt_set_db(struct bnxt *bp, struct bnxt_db_info *db, u32 ring_type,
 
 static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
 {
+	bool agg_rings = !!(bp->flags & BNXT_FLAG_AGG_RINGS);
 	int i, rc = 0;
 	u32 type;
 
@@ -5139,7 +5140,9 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
 		if (rc)
 			goto err_out;
 		bnxt_set_db(bp, &rxr->rx_db, type, map_idx, ring->fw_ring_id);
-		bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
+		/* If we have agg rings, post agg buffers first. */
+		if (!agg_rings)
+			bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
 		bp->grp_info[map_idx].rx_fw_ring_id = ring->fw_ring_id;
 		if (bp->flags & BNXT_FLAG_CHIP_P5) {
 			struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
@@ -5158,7 +5161,7 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
 		}
 	}
 
-	if (bp->flags & BNXT_FLAG_AGG_RINGS) {
+	if (agg_rings) {
 		type = HWRM_RING_ALLOC_AGG;
 		for (i = 0; i < bp->rx_nr_rings; i++) {
 			struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
@@ -5174,6 +5177,7 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
 			bnxt_set_db(bp, &rxr->rx_agg_db, type, map_idx,
 				    ring->fw_ring_id);
 			bnxt_db_write(bp, &rxr->rx_agg_db, rxr->rx_agg_prod);
+			bnxt_db_write(bp, &rxr->rx_db, rxr->rx_prod);
 			bp->grp_info[grp_idx].agg_fw_ring_id = ring->fw_ring_id;
 		}
 	}
-- 
2.5.1


^ permalink raw reply related

* [PATCH net v2 4/6] bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vasundhara Volam
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

For newly added NVM parameters, older firmware may not have the support.
Suppress the error message to avoid the unncessary error message which is
triggered when devlink calls the driver during initialization.

Fixes: 782a624d00fa ("bnxt_en: Add bnxt_en initial params table and register it.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 549c90d3..c05d663 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -98,10 +98,13 @@ static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
 	if (idx)
 		req->dimensions = cpu_to_le16(1);
 
-	if (req->req_type == cpu_to_le16(HWRM_NVM_SET_VARIABLE))
+	if (req->req_type == cpu_to_le16(HWRM_NVM_SET_VARIABLE)) {
 		memcpy(data_addr, buf, bytesize);
-
-	rc = hwrm_send_message(bp, msg, msg_len, HWRM_CMD_TIMEOUT);
+		rc = hwrm_send_message(bp, msg, msg_len, HWRM_CMD_TIMEOUT);
+	} else {
+		rc = hwrm_send_message_silent(bp, msg, msg_len,
+					      HWRM_CMD_TIMEOUT);
+	}
 	if (!rc && req->req_type == cpu_to_le16(HWRM_NVM_GET_VARIABLE))
 		memcpy(buf, data_addr, bytesize);
 
-- 
2.5.1


^ permalink raw reply related

* [PATCH net v2 3/6] bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Vasundhara Volam
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

If FW returns FRAG_ERR in response error code, driver is resending the
command only when HWRM command returns success. Fix the code to resend
NVM_INSTALL_UPDATE command with DEFRAG install flags, if FW returns
FRAG_ERR in its response error code.

Fixes: cb4d1d626145 ("bnxt_en: Retry failed NVM_INSTALL_UPDATE with defragmentation flag enabled.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index c7ee63d..8445a0c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -2016,21 +2016,19 @@ static int bnxt_flash_package_from_file(struct net_device *dev,
 	mutex_lock(&bp->hwrm_cmd_lock);
 	hwrm_err = _hwrm_send_message(bp, &install, sizeof(install),
 				      INSTALL_PACKAGE_TIMEOUT);
-	if (hwrm_err)
-		goto flash_pkg_exit;
-
-	if (resp->error_code) {
+	if (hwrm_err) {
 		u8 error_code = ((struct hwrm_err_output *)resp)->cmd_err;
 
-		if (error_code == NVM_INSTALL_UPDATE_CMD_ERR_CODE_FRAG_ERR) {
+		if (resp->error_code && error_code ==
+		    NVM_INSTALL_UPDATE_CMD_ERR_CODE_FRAG_ERR) {
 			install.flags |= cpu_to_le16(
 			       NVM_INSTALL_UPDATE_REQ_FLAGS_ALLOWED_TO_DEFRAG);
 			hwrm_err = _hwrm_send_message(bp, &install,
 						      sizeof(install),
 						      INSTALL_PACKAGE_TIMEOUT);
-			if (hwrm_err)
-				goto flash_pkg_exit;
 		}
+		if (hwrm_err)
+			goto flash_pkg_exit;
 	}
 
 	if (resp->result) {
-- 
2.5.1


^ permalink raw reply related

* [PATCH net v2 5/6] bnxt_en: Use correct src_fid to determine direction of the flow
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Venkat Duvvuru
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>

From: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>

Direction of the flow is determined using src_fid. For an RX flow,
src_fid is PF's fid and for TX flow, src_fid is VF's fid. Direction
of the flow must be specified, when getting statistics for that flow.
Currently, for DECAP flow, direction is determined incorrectly, i.e.,
direction is initialized as TX for DECAP flow, instead of RX. Because
of which, stats are not reported for this DECAP flow, though it is
offloaded and there is traffic for that flow, resulting in flow age out.

This patch fixes the problem by determining the DECAP flow's direction
using correct fid.  Set the flow direction in all cases for consistency
even if 64-bit flow handle is not used.

Fixes: abd43a13525d ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 6fe4a71..6224c30 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1285,9 +1285,7 @@ static int bnxt_tc_add_flow(struct bnxt *bp, u16 src_fid,
 		goto free_node;
 
 	bnxt_tc_set_src_fid(bp, flow, src_fid);
-
-	if (bp->fw_cap & BNXT_FW_CAP_OVS_64BIT_HANDLE)
-		bnxt_tc_set_flow_dir(bp, flow, src_fid);
+	bnxt_tc_set_flow_dir(bp, flow, flow->src_fid);
 
 	if (!bnxt_tc_can_offload(bp, flow)) {
 		rc = -EOPNOTSUPP;
-- 
2.5.1


^ permalink raw reply related

* [PATCH net v2 6/6] bnxt_en: Fix to include flow direction in L2 key
From: Michael Chan @ 2019-08-17 21:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, Somnath Kotur
In-Reply-To: <1566075892-30064-1-git-send-email-michael.chan@broadcom.com>

From: Somnath Kotur <somnath.kotur@broadcom.com>

FW expects the driver to provide unique flow reference handles
for Tx or Rx flows. When a Tx flow and an Rx flow end up sharing
a reference handle, flow offload does not seem to work.
This could happen in the case of 2 flows having their L2 fields
wildcarded but in different direction.
Fix to incorporate the flow direction as part of the L2 key

v2: Move the dir field to the end of the bnxt_tc_l2_key struct to
fix the warning reported by kbuild test robot <lkp@intel.com>.
There is existing code that initializes the structure using
nested initializer and will warn with the new u8 field added to
the beginning.  The structure also packs nicer when this new u8 is
added to the end of the structure [MChan].

Fixes: abd43a13525d ("bnxt_en: Support for 64-bit flow handle.")
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 4 ++--
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 6224c30..dd621f6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1236,7 +1236,7 @@ static int __bnxt_tc_del_flow(struct bnxt *bp,
 static void bnxt_tc_set_flow_dir(struct bnxt *bp, struct bnxt_tc_flow *flow,
 				 u16 src_fid)
 {
-	flow->dir = (bp->pf.fw_fid == src_fid) ? BNXT_DIR_RX : BNXT_DIR_TX;
+	flow->l2_key.dir = (bp->pf.fw_fid == src_fid) ? BNXT_DIR_RX : BNXT_DIR_TX;
 }
 
 static void bnxt_tc_set_src_fid(struct bnxt *bp, struct bnxt_tc_flow *flow,
@@ -1405,7 +1405,7 @@ static void bnxt_fill_cfa_stats_req(struct bnxt *bp,
 		 * 2. 15th bit of flow_handle must specify the flow
 		 *    direction (TX/RX).
 		 */
-		if (flow_node->flow.dir == BNXT_DIR_RX)
+		if (flow_node->flow.l2_key.dir == BNXT_DIR_RX)
 			handle = CFA_FLOW_INFO_REQ_FLOW_HANDLE_DIR_RX |
 				 CFA_FLOW_INFO_REQ_FLOW_HANDLE_MAX_MASK;
 		else
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h
index ffec57d..4f05305 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.h
@@ -23,6 +23,9 @@ struct bnxt_tc_l2_key {
 	__be16		inner_vlan_tci;
 	__be16		ether_type;
 	u8		num_vlans;
+	u8		dir;
+#define BNXT_DIR_RX	1
+#define BNXT_DIR_TX	0
 };
 
 struct bnxt_tc_l3_key {
@@ -98,9 +101,6 @@ struct bnxt_tc_flow {
 
 	/* flow applicable to pkts ingressing on this fid */
 	u16				src_fid;
-	u8				dir;
-#define BNXT_DIR_RX	1
-#define BNXT_DIR_TX	0
 	struct bnxt_tc_l2_key		l2_key;
 	struct bnxt_tc_l2_key		l2_mask;
 	struct bnxt_tc_l3_key		l3_key;
-- 
2.5.1


^ permalink raw reply related

* Re: [PATCH bpf-next v5 0/2] net: xdp: XSKMAP improvements
From: Daniel Borkmann @ 2019-08-17 21:28 UTC (permalink / raw)
  To: Björn Töpel, ast, netdev
  Cc: magnus.karlsson, jonathan.lemon, bjorn.topel, bruce.richardson,
	songliubraving, bpf
In-Reply-To: <20190815093014.31174-1-bjorn.topel@gmail.com>

On 8/15/19 11:30 AM, Björn Töpel wrote:
> This series (v5 and counting) add two improvements for the XSKMAP,
> used by AF_XDP sockets.
> 
> 1. Automatic cleanup when an AF_XDP socket goes out of scope/is
>     released. Instead of require that the user manually clears the
>     "released" state socket from the map, this is done
>     automatically. Each socket tracks which maps it resides in, and
>     remove itself from those maps at relase. A notable implementation
>     change, is that the sockets references the map, instead of the map
>     referencing the sockets. Which implies that when the XSKMAP is
>     freed, it is by definition cleared of sockets.
> 
> 2. The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flag on insert,
>     which this patch addresses.
> 
> 
> Thanks,
> Björn
> 
> v1->v2: Fixed deadlock and broken cleanup. (Daniel)
> v2->v3: Rebased onto bpf-next
> v3->v4: {READ, WRITE}_ONCE consistency. (Daniel)
>          Socket release/map update race. (Daniel)
> v4->v5: Avoid use-after-free on XSKMAP self-assignment [1]. (Daniel)
>          Removed redundant assignment in xsk_map_update_elem().
>          Variable name consistency; Use map_entry everywhere.
> 
> [1] https://lore.kernel.org/bpf/20190802081154.30962-1-bjorn.topel@gmail.com/T/#mc68439e97bc07fa301dad9fc4850ed5aa392f385
> 
> Björn Töpel (2):
>    xsk: remove AF_XDP socket from map when the socket is released
>    xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP
> 
>   include/net/xdp_sock.h |  18 ++++++
>   kernel/bpf/xskmap.c    | 133 ++++++++++++++++++++++++++++++++++-------
>   net/xdp/xsk.c          |  50 ++++++++++++++++
>   3 files changed, 179 insertions(+), 22 deletions(-)
> 

Looks better, applied thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v4 0/8] add need_wakeup flag to the AF_XDP rings
From: Daniel Borkmann @ 2019-08-17 21:29 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, brouer, maximmi
  Cc: bpf, bruce.richardson, ciara.loftus, jakub.kicinski, xiaolong.ye,
	qi.z.zhang, sridhar.samudrala, kevin.laatz, ilias.apalodimas,
	jonathan.lemon, kiran.patil, axboe, maciej.fijalkowski,
	maciejromanfijalkowski, intel-wired-lan
In-Reply-To: <1565767643-4908-1-git-send-email-magnus.karlsson@intel.com>

On 8/14/19 9:27 AM, Magnus Karlsson wrote:
> This patch set adds support for a new flag called need_wakeup in the
> AF_XDP Tx and fill rings. When this flag is set by the driver, it
> means that the application has to explicitly wake up the kernel Rx
> (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring)
> processing by issuing a syscall. Poll() can wake up both and sendto()
> will wake up Tx processing only.
> 
> The main reason for introducing this new flag is to be able to
> efficiently support the case when application and driver is executing
> on the same core. Previously, the driver was just busy-spinning on the
> fill ring if it ran out of buffers in the HW and there were none to
> get from the fill ring. This approach works when the application and
> driver is running on different cores as the application can replenish
> the fill ring while the driver is busy-spinning. Though, this is a
> lousy approach if both of them are running on the same core as the
> probability of the fill ring getting more entries when the driver is
> busy-spinning is zero. With this new feature the driver now sets the
> need_wakeup flag and returns to the application. The application can
> then replenish the fill queue and then explicitly wake up the Rx
> processing in the kernel using the syscall poll(). For Tx, the flag is
> only set to one if the driver has no outstanding Tx completion
> interrupts. If it has some, the flag is zero as it will be woken up by
> a completion interrupt anyway. This flag can also be used in other
> situations where the driver needs to be woken up explicitly.
> 
> As a nice side effect, this new flag also improves the Tx performance
> of the case where application and driver are running on two different
> cores as it reduces the number of syscalls to the kernel. The kernel
> tells user space if it needs to be woken up by a syscall, and this
> eliminates many of the syscalls. The Rx performance of the 2-core case
> is on the other hand slightly worse, since there is a need to use a
> syscall now to wake up the driver, instead of the driver
> busy-spinning. It does waste less CPU cycles though, which might lead
> to better overall system performance.
> 
> This new flag needs some simple driver support. If the driver does not
> support it, the Rx flag is always zero and the Tx flag is always
> one. This makes any application relying on this feature default to the
> old behavior of not requiring any syscalls in the Rx path and always
> having to call sendto() in the Tx path.
> 
> For backwards compatibility reasons, this feature has to be explicitly
> turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
> that you always turn it on as it has a large positive performance
> impact for the one core case and does not degrade 2 core performance
> and actually improves it for Tx heavy workloads.
> 
> Here are some performance numbers measured on my local,
> non-performance optimized development system. That is why you are
> seeing numbers lower than the ones from Björn and Jesper. 64 byte
> packets at 40Gbit/s line rate. All results in Mpps. Cores == 1 means
> that both application and driver is executing on the same core. Cores
> == 2 that they are on different cores.
> 
>                                Applications
> need_wakeup  cores    txpush    rxdrop      l2fwd
> ---------------------------------------------------------------
>       n         1       0.07      0.06        0.03
>       y         1       21.6      8.2         6.5
>       n         2       32.3      11.7        8.7
>       y         2       33.1      11.7        8.7
> 
> Overall, the need_wakeup flag provides the same or better performance
> in all the micro-benchmarks. The reduction of sendto() calls in txpush
> is large. Only a few per second is needed. For l2fwd, the drop is 50%
> for the 1 core case and more than 99.9% for the 2 core case. Do not
> know why I am not seeing the same drop for the 1 core case yet.
> 
> The name and inspiration of the flag has been taken from io_uring by
> Jens Axboe. Details about this feature in io_uring can be found in
> http://kernel.dk/io_uring.pdf, section 8.3. It also addresses most of
> the denial of service and sendto() concerns raised by Maxim
> Mikityanskiy in https://www.spinics.net/lists/netdev/msg554657.html.
> 
> The typical Tx part of an application will have to change from:
> 
> ret = sendto(fd,....)
> 
> to:
> 
> if (xsk_ring_prod__needs_wakeup(&xsk->tx))
>         ret = sendto(fd,....)
> 
> and th Rx part from:
> 
> rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> if (!rcvd)
>         return;
> 
> to:
> 
> rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> if (!rcvd) {
>         if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
>                ret = poll(fd,.....);
>         return;
> }
> 
> v3 -> v4:
> * Maxim found a possible race in the Tx part of the driver. The
>    setting of the flag needs to happen before the sending, otherwise it
>    might trigger this race. Fixed in ixgbe and i40e driver.
> * Mellanox support contributed by Maxim
> * Removed the XSK_DRV_CAN_SLEEP flag as it was not used
>    anymore. Thanks to Sridhar for discovering this.
> * For consistency the feature is now always called need_wakeup. There
>    were some places where it was referred to as might_sleep, but they
>    have been removed. Thanks to Sridhar for spotting.
> * Fixed some typos in the commit messages
> 
> v2 -> v3:
> * Converted the Mellanox driver to the new ndo in patch 1 as pointed out
>    by Maxim
> * Fixed the compatibility code of XDP_MMAP_OFFSETS so it now works.
> 
> v1 -> v2:
> * Fixed bisectability problem pointed out by Jakub
> * Added missing initiliztion of the Tx need_wakeup flag to 1
> 
> This patch has been applied against commit b753c5a7f99f ("Merge branch 'r8152-RX-improve'")
> 
> Structure of the patch set:
> 
> Patch 1: Replaces the ndo_xsk_async_xmit with ndo_xsk_wakeup to
>           support waking up both Rx and Tx processing
> Patch 2: Implements the need_wakeup functionality in common code
> Patch 3-4: Add need_wakeup support to the i40e and ixgbe drivers
> Patch 5: Add need_wakeup support to libbpf
> Patch 6: Add need_wakeup support to the xdpsock sample application
> Patch 7-8: Add need_wakeup support to the Mellanox mlx5 driver
> 
> Thanks: Magnus
> 
> Magnus Karlsson (6):
>    xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup
>    xsk: add support for need_wakeup flag in AF_XDP rings
>    i40e: add support for AF_XDP need_wakeup feature
>    ixgbe: add support for AF_XDP need_wakeup feature
>    libbpf: add support for need_wakeup flag in AF_XDP part
>    samples/bpf: add use of need_wakeup flag in xdpsock
> 
> Maxim Mikityanskiy (2):
>    net/mlx5e: Move the SW XSK code from NAPI poll to a separate function
>    net/mlx5e: Add AF_XDP need_wakeup support
> 
>   drivers/net/ethernet/intel/i40e/i40e_main.c        |   5 +-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c         |  25 ++-
>   drivers/net/ethernet/intel/i40e/i40e_xsk.h         |   2 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   5 +-
>   .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h   |   2 +-
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |  22 ++-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h    |  14 ++
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |   2 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h    |  14 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |   7 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  27 ++-
>   include/linux/netdevice.h                          |  14 +-
>   include/net/xdp_sock.h                             |  33 +++-
>   include/uapi/linux/if_xdp.h                        |  13 ++
>   net/xdp/xdp_umem.c                                 |  12 +-
>   net/xdp/xsk.c                                      | 149 +++++++++++++---
>   net/xdp/xsk.h                                      |  13 ++
>   net/xdp/xsk_queue.h                                |   1 +
>   samples/bpf/xdpsock_user.c                         | 192 +++++++++++++--------
>   tools/include/uapi/linux/if_xdp.h                  |  13 ++
>   tools/lib/bpf/xsk.c                                |   4 +
>   tools/lib/bpf/xsk.h                                |   6 +
>   23 files changed, 462 insertions(+), 115 deletions(-)
> 
> --
> 2.7.4
> 

Series applied, thanks everyone!

^ permalink raw reply

* Re: [PATCH bpf-next v2] net: Don't call XDP_SETUP_PROG when nothing is changed
From: Daniel Borkmann @ 2019-08-17 21:29 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Alexei Starovoitov, Jakub Kicinski
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, David S. Miller,
	Björn Töpel, Saeed Mahameed, Jesper Dangaard Brouer,
	John Fastabend, Martin KaFai Lau, Song Liu, Yonghong Song
In-Reply-To: <20190814143352.3759-1-maximmi@mellanox.com>

On 8/14/19 4:34 PM, Maxim Mikityanskiy wrote:
> Don't uninstall an XDP program when none is installed, and don't install
> an XDP program that has the same ID as the one already installed.
> 
> dev_change_xdp_fd doesn't perform any checks in case it uninstalls an
> XDP program. It means that the driver's ndo_bpf can be called with
> XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This
> case happens if the user runs `ip link set eth0 xdp off` when there is
> no XDP program attached.
> 
> The symmetrical case is possible when the user tries to set the program
> that is already set.
> 
> The drivers typically perform some heavy operations on XDP_SETUP_PROG,
> so they all have to handle these cases internally to return early if
> they happen. This patch puts this check into the kernel code, so that
> all drivers will benefit from it.
> 
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Daniel Borkmann @ 2019-08-17 21:30 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: andrii.nakryiko, kernel-team, Michael Holzheu, Naveen N . Rao,
	David S . Miller, Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>

On 8/16/19 7:45 AM, Andrii Nakryiko wrote:
> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.
> 
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Hechao Li <hechaol@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Acked-by: Andrey Ignatov <rdna@fb.com>
> Acked-by: Yonghong Song <yhs@fb.com>
> Acked-by: Lawrence Brakmo <brakmo@fb.com>
> Acked-by: Adam Barth <arb@fb.com>
> Acked-by: Roman Gushchin <guro@fb.com>
> Acked-by: Josef Bacik <jbacik@fb.com>
> Acked-by: Joe Stringer <joe@wand.net.nz>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Acked-by: David Ahern <dsahern@gmail.com>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Lorenz Bauer <lmb@cloudflare.com>
> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Petar Penkov <ppenkov@google.com>
> Acked-by: Teng Qin <palmtenor@gmail.com>
> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Michal Rostecki <mrostecki@opensuse.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Applied, thanks!

^ permalink raw reply

* Re: [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Daniel Borkmann @ 2019-08-17 21:31 UTC (permalink / raw)
  To: Petar Penkov, netdev, bpf; +Cc: davem, ast, sdf, Petar Penkov
In-Reply-To: <20190816170825.22500-1-ppenkov.kernel@gmail.com>

On 8/16/19 7:08 PM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> There is a race in this test between receiving the ACK for the
> single-byte packet sent in the test, and reading the values from the
> map.
> 
> This patch fixes this by having the client wait until there are no more
> unacknowledged packets.
> 
> Before:
> for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> < trimmed error messages >
> 993
> 
> After:
> for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> 10000
> 
> Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> Signed-off-by: Petar Penkov <ppenkov@google.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v4 0/4] bpf: support cloning sk storage on accept()
From: Daniel Borkmann @ 2019-08-17 21:32 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf
  Cc: davem, ast, Martin KaFai Lau, Yonghong Song
In-Reply-To: <20190814173751.31806-1-sdf@google.com>

On 8/14/19 7:37 PM, Stanislav Fomichev wrote:
> Currently there is no way to propagate sk storage from the listener
> socket to a newly accepted one. Consider the following use case:
> 
>          fd = socket();
>          setsockopt(fd, SOL_IP, IP_TOS,...);
>          /* ^^^ setsockopt BPF program triggers here and saves something
>           * into sk storage of the listener.
>           */
>          listen(fd, ...);
>          while (client = accept(fd)) {
>                  /* At this point all association between listener
>                   * socket and newly accepted one is gone. New
>                   * socket will not have any sk storage attached.
>                   */
>          }
> 
> Let's add new BPF_F_CLONE flag that can be specified when creating
> a socket storage map. This new flag indicates that map contents
> should be cloned when the socket is cloned.
> 
> v4:
> * drop 'goto err' in bpf_sk_storage_clone (Yonghong Song)
> * add comment about race with bpf_sk_storage_map_free to the
>    bpf_sk_storage_clone side as well (Daniel Borkmann)
> 
> v3:
> * make sure BPF_F_NO_PREALLOC is always present when creating
>    a map (Martin KaFai Lau)
> * don't call bpf_sk_storage_free explicitly, rely on
>    sk_free_unlock_clone to do the cleanup (Martin KaFai Lau)
> 
> v2:
> * remove spinlocks around selem_link_map/sk (Martin KaFai Lau)
> * BPF_F_CLONE on a map, not selem (Martin KaFai Lau)
> * hold a map while cloning (Martin KaFai Lau)
> * use BTF maps in selftests (Yonghong Song)
> * do proper cleanup selftests; don't call close(-1) (Yonghong Song)
> * export bpf_map_inc_not_zero
> 
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> 
> Stanislav Fomichev (4):
>    bpf: export bpf_map_inc_not_zero
>    bpf: support cloning sk storage on accept()
>    bpf: sync bpf.h to tools/
>    selftests/bpf: add sockopt clone/inheritance test
> 
>   include/linux/bpf.h                           |   2 +
>   include/net/bpf_sk_storage.h                  |  10 +
>   include/uapi/linux/bpf.h                      |   3 +
>   kernel/bpf/syscall.c                          |  16 +-
>   net/core/bpf_sk_storage.c                     | 104 ++++++-
>   net/core/sock.c                               |   9 +-
>   tools/include/uapi/linux/bpf.h                |   3 +
>   tools/testing/selftests/bpf/.gitignore        |   1 +
>   tools/testing/selftests/bpf/Makefile          |   3 +-
>   .../selftests/bpf/progs/sockopt_inherit.c     |  97 +++++++
>   .../selftests/bpf/test_sockopt_inherit.c      | 253 ++++++++++++++++++
>   11 files changed, 491 insertions(+), 10 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
>   create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c
> 

Applied, thanks!

^ permalink raw reply

* Re: [PATCH v2] virtio-net: lower min ring num_free for efficiency
From: Michael S. Tsirkin @ 2019-08-18  7:10 UTC (permalink / raw)
  To: ? jiang
  Cc: jasowang@redhat.com, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
	bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB32058F4B2AD162F5421BB9B4A6AC0@BYAPR14MB3205.namprd14.prod.outlook.com>

On Thu, Aug 15, 2019 at 09:42:40AM +0000, ? jiang wrote:
> This change lowers ring buffer reclaim threshold from 1/2*queue to budget
> for better performance. According to our test with qemu + dpdk, packet
> dropping happens when the guest is not able to provide free buffer in
> avail ring timely with default 1/2*queue. The value in the patch has been
> tested and does show better performance.
> 
> Test setup: iperf3 to generate packets to guest (total 30mins, pps 400k, UDP)
> avg packets drop before: 2842
> avg packets drop after: 360(-87.3%)
> 
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>

To add to that:

Further, current code suffers from a starvation problem: the amount of
work done by try_fill_recv is not bounded by the budget parameter, thus
(with large queues) once in a while userspace gets blocked for a long
time while queue is being refilled. Trigger refills earlier to make sure
the amount of work to do is limited.

With this addition to the log:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc08be7925eb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		}
>  	}
>  
> -	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> +	if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
>  		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>  			schedule_delayed_work(&vi->refill, 0);
>  	}
> -- 
> 2.11.0

^ permalink raw reply

* Question on xen-netfront code to fix a potential ring buffer corruption
From: Dongli Zhang @ 2019-08-18  8:31 UTC (permalink / raw)
  To: xen-devel; +Cc: netdev, jgross, Joe Jin

Hi,

Would you please help confirm why the condition at line 908 is ">="?

In my opinion, we would only hit "skb_shinfo(skb)->nr_frag == MAX_SKB_FRAGS" at
line 908.

890 static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
891                                   struct sk_buff *skb,
892                                   struct sk_buff_head *list)
893 {
894         RING_IDX cons = queue->rx.rsp_cons;
895         struct sk_buff *nskb;
896 
897         while ((nskb = __skb_dequeue(list))) {
898                 struct xen_netif_rx_response *rx =
899                         RING_GET_RESPONSE(&queue->rx, ++cons);
900                 skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
901 
902                 if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
903                         unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
904 
905                         BUG_ON(pull_to < skb_headlen(skb));
906                         __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
907                 }
908                 if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
909                         queue->rx.rsp_cons = ++cons;
910                         kfree_skb(nskb);
911                         return ~0U;
912                 }
913 
914                 skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
915                                 skb_frag_page(nfrag),
916                                 rx->offset, rx->status, PAGE_SIZE);
917 
918                 skb_shinfo(nskb)->nr_frags = 0;
919                 kfree_skb(nskb);
920         }
921 
922         return cons;
923 }


The reason that I ask about this is because I am considering below patch to
avoid a potential xen-netfront ring buffer corruption.

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8d33970..48a2162 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -906,7 +906,7 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
                        __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
                }
                if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
-                       queue->rx.rsp_cons = ++cons;
+                       queue->rx.rsp_cons = cons + skb_queue_len(list) + 1;
                        kfree_skb(nskb);
                        return ~0U;
                }


If there is skb left in list when we return ~0U, queue->rx.rsp_cons may be set
incorrectly.

While I am trying to make up a case that would hit the corruption, I could not
explain why (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)), but not
just "==". Perhaps __pskb_pull_tail() may fail although pull_to is less than
skb_headlen(skb).

Thank you very much!

Dongli Zhang

^ permalink raw reply related

* Re: [PATCH 1/3] nvmem: mxs-ocotp: update MODULE_AUTHOR() email address
From: Srinivas Kandagatla @ 2019-08-18  9:08 UTC (permalink / raw)
  To: Stefan Wahren, Jean Delvare, Guenter Roeck, David S. Miller,
	Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team
  Cc: linux-hwmon, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <1565720249-6549-1-git-send-email-wahrenst@gmx.net>



On 13/08/2019 19:17, Stefan Wahren wrote:
> The email address listed in MODULE_AUTHOR() will be disabled in the
> near future. Replace it with my private one.
> 
> Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
> =2D--
>   drivers/nvmem/mxs-ocotp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 

Applied thanks.

--srini

^ permalink raw reply

* Re: kernel BUG at include/linux/skbuff.h:LINE! (2)
From: Xin Long @ 2019-08-18 14:06 UTC (permalink / raw)
  To: syzbot
  Cc: davem, LKML, linux-sctp, Marcelo Ricardo Leitner, network dev,
	Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <0000000000008182a50590404a02@google.com>

On Sat, Aug 17, 2019 at 2:38 AM syzbot
<syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    459c5fb4 Merge branch 'mscc-PTP-support'
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> kernel BUG at include/linux/skbuff.h:2225!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
> Call Trace:
>   sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
>   sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
>   sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
>   sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
>   sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
>   ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
>   ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
>   NF_HOOK include/linux/netfilter.h:305 [inline]
>   NF_HOOK include/linux/netfilter.h:299 [inline]
>   ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
>   dst_input include/net/dst.h:442 [inline]
>   ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
Looks skb_list_del_init() should be called in ip6_sublist_rcv_finish,
as does in ip_sublist_rcv_finish().

>   ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
>   ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
>   ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
>   __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
>   __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
>   __netif_receive_skb_list net/core/dev.c:5149 [inline]
>   netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
>   gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
>   gro_normal_list net/core/dev.c:5755 [inline]
>   gro_normal_one net/core/dev.c:5769 [inline]
>   napi_frags_finish net/core/dev.c:5782 [inline]
>   napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
>   tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
>   tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
>   call_write_iter include/linux/fs.h:1870 [inline]
>   do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
>   do_iter_write fs/read_write.c:970 [inline]
>   do_iter_write+0x184/0x610 fs/read_write.c:951
>   vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
>   do_writev+0x15b/0x330 fs/read_write.c:1058
>   __do_sys_writev fs/read_write.c:1131 [inline]
>   __se_sys_writev fs/read_write.c:1128 [inline]
>   __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
>   do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x441b10
> Code: 05 48 3d 01 f0 ff ff 0f 83 5d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> 00 66 90 83 3d 01 95 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 34 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
> RSP: 002b:00007ffe63706b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> RAX: ffffffffffffffda RBX: 00007ffe63706ba0 RCX: 0000000000441b10
> RDX: 0000000000000001 RSI: 00007ffe63706bd0 RDI: 00000000000000f0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000122cb
> R13: 0000000000402960 R14: 0000000000000000 R15: 0000000000000000
> Modules linked in:
> ---[ end trace c37566c1c02066db ]---
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: kernel BUG at include/linux/skbuff.h:LINE! (2)
From: Dmitry Vyukov @ 2019-08-18 14:13 UTC (permalink / raw)
  To: Xin Long
  Cc: syzbot, davem, LKML, linux-sctp, Marcelo Ricardo Leitner,
	network dev, Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CADvbK_e+em+LiQOttfA9nckA4EPFuW_Q-cBmXx3S5pw5X+tQfw@mail.gmail.com>

On Sun, Aug 18, 2019 at 7:07 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Sat, Aug 17, 2019 at 2:38 AM syzbot
> <syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    459c5fb4 Merge branch 'mscc-PTP-support'
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > kernel BUG at include/linux/skbuff.h:2225!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> > RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> > RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> > RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> > Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> > 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> > 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> > RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> > RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> > RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> > RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> > R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> > R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> > FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
> > Call Trace:
> >   sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
> >   sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
> >   sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
> >   sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
> >   sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
> >   ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> >   ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> >   NF_HOOK include/linux/netfilter.h:305 [inline]
> >   NF_HOOK include/linux/netfilter.h:299 [inline]
> >   ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> >   dst_input include/net/dst.h:442 [inline]
> >   ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
> Looks skb_list_del_init() should be called in ip6_sublist_rcv_finish,
> as does in ip_sublist_rcv_finish().

This was recently introduced, right? Only in net-next and linux-next.
Otherwise, is it a remote DoS? If so and if it's present in any
releases, may need a CVE.

^ permalink raw reply

* [PATCH net 1/1] bnx2x: Fix VF's VLAN reconfiguration in reload.
From: Manish Chopra @ 2019-08-18 14:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, aelior, skalluru

Commit 04f05230c5c13 ("bnx2x: Remove configured vlans as
part of unload sequence."), introduced a regression in driver
that as a part of VF's reload flow, VLANs created on the VF
doesn't get re-configured in hardware as vlan metadata/info
was not getting cleared for the VFs which causes vlan PING to stop.

This patch clears the vlan metadata/info so that VLANs gets
re-configured back in the hardware in VF's reload flow and
PING/traffic continues for VLANs created over the VFs.

Fixes: 04f05230c5c13 ("bnx2x: Remove configured vlans as part of unload sequence.")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |  7 ++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |  2 ++
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c    | 17 ++++++++++++-----
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index e47ea92e2ae3..d10b421ed1f1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3057,12 +3057,13 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
 	/* if VF indicate to PF this function is going down (PF will delete sp
 	 * elements and clear initializations
 	 */
-	if (IS_VF(bp))
+	if (IS_VF(bp)) {
+		bnx2x_clear_vlan_info(bp);
 		bnx2x_vfpf_close_vf(bp);
-	else if (unload_mode != UNLOAD_RECOVERY)
+	} else if (unload_mode != UNLOAD_RECOVERY) {
 		/* if this is a normal/close unload need to clean up chip*/
 		bnx2x_chip_cleanup(bp, unload_mode, keep_link);
-	else {
+	} else {
 		/* Send the UNLOAD_REQUEST to the MCP */
 		bnx2x_send_unload_req(bp, unload_mode);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index c2f6e44e9a3f..8b08cb18e363 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -425,6 +425,8 @@ void bnx2x_set_reset_global(struct bnx2x *bp);
 void bnx2x_disable_close_the_gate(struct bnx2x *bp);
 int bnx2x_init_hw_func_cnic(struct bnx2x *bp);
 
+void bnx2x_clear_vlan_info(struct bnx2x *bp);
+
 /**
  * bnx2x_sp_event - handle ramrods completion.
  *
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2cc14db8f0ec..192ff8d5da32 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8482,11 +8482,21 @@ int bnx2x_set_vlan_one(struct bnx2x *bp, u16 vlan,
 	return rc;
 }
 
+void bnx2x_clear_vlan_info(struct bnx2x *bp)
+{
+	struct bnx2x_vlan_entry *vlan;
+
+	/* Mark that hw forgot all entries */
+	list_for_each_entry(vlan, &bp->vlan_reg, link)
+		vlan->hw = false;
+
+	bp->vlan_cnt = 0;
+}
+
 static int bnx2x_del_all_vlans(struct bnx2x *bp)
 {
 	struct bnx2x_vlan_mac_obj *vlan_obj = &bp->sp_objs[0].vlan_obj;
 	unsigned long ramrod_flags = 0, vlan_flags = 0;
-	struct bnx2x_vlan_entry *vlan;
 	int rc;
 
 	__set_bit(RAMROD_COMP_WAIT, &ramrod_flags);
@@ -8495,10 +8505,7 @@ static int bnx2x_del_all_vlans(struct bnx2x *bp)
 	if (rc)
 		return rc;
 
-	/* Mark that hw forgot all entries */
-	list_for_each_entry(vlan, &bp->vlan_reg, link)
-		vlan->hw = false;
-	bp->vlan_cnt = 0;
+	bnx2x_clear_vlan_info(bp);
 
 	return 0;
 }
-- 
2.18.1


^ permalink raw reply related

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain
From: Paul Blakey @ 2019-08-18 16:00 UTC (permalink / raw)
  To: Pravin B Shelar, netdev, David S. Miller, Justin Pettit,
	Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov, Paul Blakey
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo

What do you guys say about the following diff on top of the last one?
Use static key, and also have OVS_DP_CMD_SET command probe/enable the feature.

This will allow userspace to probe the feature, and selectivly enable it via the
OVS_DP_CMD_SET command.

Thansk,
Paul.


---
 include/uapi/linux/openvswitch.h |  3 +++
 net/openvswitch/datapath.c       | 29 +++++++++++++++++++++++++----
 net/openvswitch/datapath.h       |  2 ++
 net/openvswitch/flow.c           |  6 ++++--
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index f271f1e..1887a45 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -123,6 +123,9 @@ struct ovs_vport_stats {
 /* Allow datapath to associate multiple Netlink PIDs to each vport */
 #define OVS_DP_F_VPORT_PIDS	(1 << 1)
 
+/* Allow tc offload recirc sharing */
+#define OVS_DP_F_TC_RECIRC_SHARING	(1 << 2)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL      ((__u32)0)
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 892287d..589b4f1 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1541,10 +1541,27 @@ static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *in
 	dp->user_features = 0;
 }
 
-static void ovs_dp_change(struct datapath *dp, struct nlattr *a[])
+DEFINE_STATIC_KEY_FALSE(tc_recirc_sharing_support);
+
+static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 {
+	u32 user_features;
+
 	if (a[OVS_DP_ATTR_USER_FEATURES])
-		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+		user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+
+#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	if (user_features & OVS_DP_F_TC_RECIRC_SHARING)
+		return -EOPNOTSUPP;
+#endif
+	dp->user_features = user_features;
+
+	if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING)
+		static_branch_enable(&tc_recirc_sharing_support);
+	else
+		static_branch_disable(&tc_recirc_sharing_support);
+
+	return 0;
 }
 
 static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
@@ -1606,7 +1623,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
 
-	ovs_dp_change(dp, a);
+	err = ovs_dp_change(dp, a);
+	if (err)
+		goto err_destroy_meters;
 
 	/* So far only local changes have been made, now need the lock. */
 	ovs_lock();
@@ -1732,7 +1751,9 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto err_unlock_free;
 
-	ovs_dp_change(dp, info->attrs);
+	err = ovs_dp_change(dp, info->attrs);
+	if (err)
+		goto err_unlock_free;
 
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
 				   info->snd_seq, 0, OVS_DP_CMD_SET);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 751d34a..81e85dd 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -218,6 +218,8 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 extern struct notifier_block ovs_dp_device_notifier;
 extern struct genl_family dp_vport_genl_family;
 
+DECLARE_STATIC_KEY_FALSE(tc_recirc_sharing_support);
+
 void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key);
 void ovs_dp_detach_port(struct vport *);
 int ovs_dp_upcall(struct datapath *, struct sk_buff *,
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 0287ead..c0ac7c9 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -853,8 +853,10 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->mac_proto = res;
 
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
-	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
-	key->recirc_id = tc_ext ? tc_ext->chain : 0;
+	if (static_branch_unlikely(&tc_recirc_sharing_support)) {
+		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
+		key->recirc_id = tc_ext ? tc_ext->chain : 0;
+	}
 #else
 	key->recirc_id = 0;
 #endif
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net-next 0/6] net: dsa: enable and disable all ports
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot

The DSA stack currently calls the .port_enable and .port_disable switch
callbacks for slave ports only. However, it is useful to call them for all
port types. For example this allows some drivers to delay the optimization
of power consumption after the switch is setup. This can also help reducing
the setup code of drivers a bit.

The first DSA core patches enable and disable all ports of a switch, regardless
their type. The last mv88e6xxx patches remove redundant code from the driver
setup and the said callbacks, now that they handle SERDES power for all ports.

Vivien Didelot (6):
  net: dsa: use a single switch statement for port setup
  net: dsa: do not enable or disable non user ports
  net: dsa: enable and disable all ports
  net: dsa: mv88e6xxx: do not change STP state on port disabling
  net: dsa: mv88e6xxx: enable SERDES after setup
  net: dsa: mv88e6xxx: wrap SERDES IRQ in power function

 drivers/net/dsa/b53/b53_common.c       | 10 ++-
 drivers/net/dsa/bcm_sf2.c              |  6 ++
 drivers/net/dsa/lan9303-core.c         |  6 ++
 drivers/net/dsa/lantiq_gswip.c         |  6 ++
 drivers/net/dsa/microchip/ksz_common.c |  6 ++
 drivers/net/dsa/mt7530.c               |  6 ++
 drivers/net/dsa/mv88e6xxx/chip.c       | 64 ++++++-----------
 net/dsa/dsa2.c                         | 98 +++++++++++++-------------
 8 files changed, 112 insertions(+), 90 deletions(-)

-- 
2.22.0


^ permalink raw reply

* [PATCH net-next 1/6] net: dsa: use a single switch statement for port setup
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

It is currently difficult to read the different steps involved in the
setup and teardown of ports in the DSA code. Keep it simple with a
single switch statement for each port type: UNUSED, CPU, DSA, or USER.

Also no need to call devlink_port_unregister from within dsa_port_setup
as this step is inconditionally handled by dsa_port_teardown on error.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/dsa2.c | 87 ++++++++++++++++++++++----------------------------
 1 file changed, 39 insertions(+), 48 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 3abd173ebacb..405552ac4c08 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -254,88 +254,79 @@ static void dsa_tree_teardown_default_cpu(struct dsa_switch_tree *dst)
 
 static int dsa_port_setup(struct dsa_port *dp)
 {
-	enum devlink_port_flavour flavour;
 	struct dsa_switch *ds = dp->ds;
 	struct dsa_switch_tree *dst = ds->dst;
-	int err = 0;
-
-	if (dp->type == DSA_PORT_TYPE_UNUSED)
-		return 0;
-
-	memset(&dp->devlink_port, 0, sizeof(dp->devlink_port));
-	dp->mac = of_get_mac_address(dp->dn);
-
-	switch (dp->type) {
-	case DSA_PORT_TYPE_CPU:
-		flavour = DEVLINK_PORT_FLAVOUR_CPU;
-		break;
-	case DSA_PORT_TYPE_DSA:
-		flavour = DEVLINK_PORT_FLAVOUR_DSA;
-		break;
-	case DSA_PORT_TYPE_USER: /* fall-through */
-	default:
-		flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
-		break;
-	}
-
-	/* dp->index is used now as port_number. However
-	 * CPU and DSA ports should have separate numbering
-	 * independent from front panel port numbers.
-	 */
-	devlink_port_attrs_set(&dp->devlink_port, flavour,
-			       dp->index, false, 0,
-			       (const char *) &dst->index, sizeof(dst->index));
-	err = devlink_port_register(ds->devlink, &dp->devlink_port,
-				    dp->index);
-	if (err)
-		return err;
+	const unsigned char *id = (const unsigned char *)&dst->index;
+	const unsigned char len = sizeof(dst->index);
+	struct devlink_port *dlp = &dp->devlink_port;
+	struct devlink *dl = ds->devlink;
+	int err;
 
 	switch (dp->type) {
 	case DSA_PORT_TYPE_UNUSED:
 		break;
 	case DSA_PORT_TYPE_CPU:
+		memset(dlp, 0, sizeof(*dlp));
+		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_CPU,
+				       dp->index, false, 0, id, len);
+		err = devlink_port_register(dl, dlp, dp->index);
+		if (err)
+			return err;
+
 		err = dsa_port_link_register_of(dp);
 		if (err)
-			dev_err(ds->dev, "failed to setup link for port %d.%d\n",
-				ds->index, dp->index);
+			return err;
 		break;
 	case DSA_PORT_TYPE_DSA:
+		memset(dlp, 0, sizeof(*dlp));
+		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_DSA,
+				       dp->index, false, 0, id, len);
+		err = devlink_port_register(dl, dlp, dp->index);
+		if (err)
+			return err;
+
 		err = dsa_port_link_register_of(dp);
 		if (err)
-			dev_err(ds->dev, "failed to setup link for port %d.%d\n",
-				ds->index, dp->index);
+			return err;
 		break;
 	case DSA_PORT_TYPE_USER:
+		memset(dlp, 0, sizeof(*dlp));
+		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+				       dp->index, false, 0, id, len);
+		err = devlink_port_register(dl, dlp, dp->index);
+		if (err)
+			return err;
+
+		dp->mac = of_get_mac_address(dp->dn);
 		err = dsa_slave_create(dp);
 		if (err)
-			dev_err(ds->dev, "failed to create slave for port %d.%d\n",
-				ds->index, dp->index);
-		else
-			devlink_port_type_eth_set(&dp->devlink_port, dp->slave);
+			return err;
+
+		devlink_port_type_eth_set(dlp, dp->slave);
 		break;
 	}
 
-	if (err)
-		devlink_port_unregister(&dp->devlink_port);
-
-	return err;
+	return 0;
 }
 
 static void dsa_port_teardown(struct dsa_port *dp)
 {
-	if (dp->type != DSA_PORT_TYPE_UNUSED)
-		devlink_port_unregister(&dp->devlink_port);
+	struct devlink_port *dlp = &dp->devlink_port;
 
 	switch (dp->type) {
 	case DSA_PORT_TYPE_UNUSED:
 		break;
 	case DSA_PORT_TYPE_CPU:
 		dsa_tag_driver_put(dp->tag_ops);
-		/* fall-through */
+		devlink_port_unregister(dlp);
+		dsa_port_link_unregister_of(dp);
+		break;
 	case DSA_PORT_TYPE_DSA:
+		devlink_port_unregister(dlp);
 		dsa_port_link_unregister_of(dp);
 		break;
 	case DSA_PORT_TYPE_USER:
+		devlink_port_unregister(dlp);
 		if (dp->slave) {
 			dsa_slave_destroy(dp->slave);
 			dp->slave = NULL;
-- 
2.22.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox