Netdev List
* [net-next 09/12] bpf, i40e: add meta data support
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: Daniel Borkmann, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: Daniel Borkmann <daniel@iogearbox.net>

Add support for XDP meta data when using build skb variant of
the i40e driver. Implementation is analogous to the existing
ixgbe and ixgbevf support for meta data from 366a88fe2f40 ("bpf,
ixgbe: add meta data support") and be8333322eff ("ixgbevf: Add
support for meta data"). With the build skb variant we get
192 bytes of extra headroom which can be used for encaps or
meta data.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 39 ++++++++++++++++-----
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 9b698c5acd05..105a26f447c0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2032,6 +2032,21 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
 #if L1_CACHE_BYTES < 128
 	prefetch(xdp->data + L1_CACHE_BYTES);
 #endif
+	/* Note, we get here by enabling legacy-rx via:
+	 *
+	 *    ethtool --set-priv-flags <dev> legacy-rx on
+	 *
+	 * In this mode, we currently get 0 extra XDP headroom as
+	 * opposed to having legacy-rx off, where we process XDP
+	 * packets going to stack via i40e_build_skb(). The latter
+	 * provides us currently with 192 bytes of headroom.
+	 *
+	 * For i40e_construct_skb() mode it means that the
+	 * xdp->data_meta will always point to xdp->data, since
+	 * the helper cannot expand the head. Should this ever
+	 * change in future for legacy-rx mode on, then lets also
+	 * add xdp->data_meta handling here.
+	 */
 
 	/* allocate a skb to store the frags */
 	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
@@ -2083,19 +2098,25 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 				      struct i40e_rx_buffer *rx_buffer,
 				      struct xdp_buff *xdp)
 {
-	unsigned int size = xdp->data_end - xdp->data;
+	unsigned int metasize = xdp->data - xdp->data_meta;
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
 	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
-				SKB_DATA_ALIGN(I40E_SKB_PAD + size);
+				SKB_DATA_ALIGN(I40E_SKB_PAD +
+					       (xdp->data_end -
+						xdp->data_hard_start));
 #endif
 	struct sk_buff *skb;
 
-	/* prefetch first cache line of first page */
-	prefetch(xdp->data);
+	/* Prefetch first cache line of first page. If xdp->data_meta
+	 * is unused, this points exactly as xdp->data, otherwise we
+	 * likely have a consumer accessing first few bytes of meta
+	 * data, and then actual data.
+	 */
+	prefetch(xdp->data_meta);
 #if L1_CACHE_BYTES < 128
-	prefetch(xdp->data + L1_CACHE_BYTES);
+	prefetch(xdp->data_meta + L1_CACHE_BYTES);
 #endif
 	/* build an skb around the page buffer */
 	skb = build_skb(xdp->data_hard_start, truesize);
@@ -2103,8 +2124,10 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 		return NULL;
 
 	/* update pointers within the skb to store the data */
-	skb_reserve(skb, I40E_SKB_PAD);
-	__skb_put(skb, size);
+	skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start));
+	__skb_put(skb, xdp->data_end - xdp->data);
+	if (metasize)
+		skb_metadata_set(skb, metasize);
 
 	/* buffer is used by skb, update page_offset */
 #if (PAGE_SIZE < 8192)
@@ -2341,7 +2364,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		if (!skb) {
 			xdp.data = page_address(rx_buffer->page) +
 				   rx_buffer->page_offset;
-			xdp_set_data_meta_invalid(&xdp);
+			xdp.data_meta = xdp.data;
 			xdp.data_hard_start = xdp.data -
 					      i40e_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-- 
2.17.1
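
As a rough illustration of the pointer bookkeeping this patch relies on, here
is a minimal userspace sketch. The names xdp_buff_model, xdp_sizes and
xdp_compute_sizes are hypothetical stand-ins, not driver code; only the two
subtractions mirror what the patch does in i40e_build_skb().

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the four xdp_buff pointers. */
struct xdp_buff_model {
	unsigned char *data_hard_start;	/* start of the headroom */
	unsigned char *data_meta;	/* start of meta data, == data if unused */
	unsigned char *data;		/* start of the packet */
	unsigned char *data_end;	/* end of the packet */
};

struct xdp_sizes {
	size_t metasize;	/* bytes of meta data in front of the packet */
	size_t len;		/* packet length */
};

/* Mirrors "data - data_meta" and "data_end - data" from the patch. */
static struct xdp_sizes xdp_compute_sizes(const struct xdp_buff_model *xdp)
{
	struct xdp_sizes s;

	s.metasize = (size_t)(xdp->data - xdp->data_meta);
	s.len = (size_t)(xdp->data_end - xdp->data);
	return s;
}
```

When data_meta equals data, which is how the Rx path initialises it after
this patch, metasize is 0 and the skb_metadata_set() call is skipped.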


* [net-next 06/12] ixgbevf: Fix coexistence of malicious driver detection with XDP
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

In the case of the VF driver it is supposed to provide a context descriptor
that allows us to provide information about the header offsets inside of
the frame. However in the case of XDP we don't really have any of that
information since the data is minimally processed. As a result we were
seeing malicious driver detection (MDD) events being triggered when the PF
had that functionality enabled.

To address this I have added a bit of new code that will "prime" the XDP
ring by providing one context descriptor that assumes the minimal setup of
an Ethernet frame which is an L2 header length of 14. With just that we can
provide enough information to make the hardware happy so that we don't
trigger MDD events.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  1 +
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 36 +++++++++++++++----
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 70c75681495f..56a1031dcc07 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -76,6 +76,7 @@ enum ixgbevf_ring_state_t {
 	__IXGBEVF_TX_DETECT_HANG,
 	__IXGBEVF_HANG_CHECK_ARMED,
 	__IXGBEVF_TX_XDP_RING,
+	__IXGBEVF_TX_XDP_RING_PRIMED,
 };
 
 #define ring_is_xdp(ring) \
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 083041129539..2d5a706c3c29 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -991,24 +991,45 @@ static int ixgbevf_xmit_xdp_ring(struct ixgbevf_ring *ring,
 		return IXGBEVF_XDP_CONSUMED;
 
 	/* record the location of the first descriptor for this packet */
-	tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
-	tx_buffer->bytecount = len;
-	tx_buffer->gso_segs = 1;
-	tx_buffer->protocol = 0;
-
 	i = ring->next_to_use;
-	tx_desc = IXGBEVF_TX_DESC(ring, i);
+	tx_buffer = &ring->tx_buffer_info[i];
 
 	dma_unmap_len_set(tx_buffer, len, len);
 	dma_unmap_addr_set(tx_buffer, dma, dma);
 	tx_buffer->data = xdp->data;
-	tx_desc->read.buffer_addr = cpu_to_le64(dma);
+	tx_buffer->bytecount = len;
+	tx_buffer->gso_segs = 1;
+	tx_buffer->protocol = 0;
+
+	/* Populate minimal context descriptor that will provide for the
+	 * fact that we are expected to process Ethernet frames.
+	 */
+	if (!test_bit(__IXGBEVF_TX_XDP_RING_PRIMED, &ring->state)) {
+		struct ixgbe_adv_tx_context_desc *context_desc;
+
+		set_bit(__IXGBEVF_TX_XDP_RING_PRIMED, &ring->state);
+
+		context_desc = IXGBEVF_TX_CTXTDESC(ring, 0);
+		context_desc->vlan_macip_lens	=
+			cpu_to_le32(ETH_HLEN << IXGBE_ADVTXD_MACLEN_SHIFT);
+		context_desc->seqnum_seed	= 0;
+		context_desc->type_tucmd_mlhl	=
+			cpu_to_le32(IXGBE_TXD_CMD_DEXT |
+				    IXGBE_ADVTXD_DTYP_CTXT);
+		context_desc->mss_l4len_idx	= 0;
+
+		i = 1;
+	}
 
 	/* put descriptor type bits */
 	cmd_type = IXGBE_ADVTXD_DTYP_DATA |
 		   IXGBE_ADVTXD_DCMD_DEXT |
 		   IXGBE_ADVTXD_DCMD_IFCS;
 	cmd_type |= len | IXGBE_TXD_CMD;
+
+	tx_desc = IXGBEVF_TX_DESC(ring, i);
+	tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
 	tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
 	tx_desc->read.olinfo_status =
 			cpu_to_le32((len << IXGBE_ADVTXD_PAYLEN_SHIFT) |
@@ -1688,6 +1709,7 @@ static void ixgbevf_configure_tx_ring(struct ixgbevf_adapter *adapter,
 	       sizeof(struct ixgbevf_tx_buffer) * ring->count);
 
 	clear_bit(__IXGBEVF_HANG_CHECK_ARMED, &ring->state);
+	clear_bit(__IXGBEVF_TX_XDP_RING_PRIMED, &ring->state);
 
 	IXGBE_WRITE_REG(hw, IXGBE_VFTXDCTL(reg_idx), txdctl);
 
-- 
2.17.1
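
The "prime once" idea can be modelled in a few lines of userspace C. This is
only a sketch under stated assumptions: ring_model, desc_kind and xmit_one are
hypothetical names, the ring has eight slots, and the ring is assumed to be
freshly configured (next_to_use == 0) on first use.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 8

enum desc_kind { DESC_EMPTY, DESC_CONTEXT, DESC_DATA };

/* Hypothetical model of an XDP Tx ring with a one-shot "primed" flag. */
struct ring_model {
	enum desc_kind desc[MODEL_RING_SIZE];
	size_t next_to_use;
	bool primed;
};

/* Queue one frame; returns the slot that received the data descriptor. */
static size_t xmit_one(struct ring_model *r)
{
	size_t i = r->next_to_use;

	if (!r->primed) {
		/* First frame: slot 0 carries the context descriptor. */
		r->primed = true;
		r->desc[0] = DESC_CONTEXT;
		i = 1;
	}

	r->desc[i] = DESC_DATA;
	r->next_to_use = (i + 1) % MODEL_RING_SIZE;
	return i;
}
```

Clearing the flag when the ring is reconfigured, as the last hunk does, makes
the next transmit write a fresh context descriptor.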


* [net-next 07/12] ixgbevf: fix possible race in the reset subtask
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

Extend the RTNL lock in ixgbevf_reset_subtask() to protect the state bits
check in addition to the call to ixgbevf_reinit_locked().

This is to make sure that we get the most up-to-date values for the bits
and avoid a possible race when going down.

Suggested-by: Zhiping du <zhipingdu@tencent.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 2d5a706c3c29..59416eddd840 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3141,15 +3141,17 @@ static void ixgbevf_reset_subtask(struct ixgbevf_adapter *adapter)
 	if (!test_and_clear_bit(__IXGBEVF_RESET_REQUESTED, &adapter->state))
 		return;
 
+	rtnl_lock();
 	/* If we're already down or resetting, just bail */
 	if (test_bit(__IXGBEVF_DOWN, &adapter->state) ||
 	    test_bit(__IXGBEVF_REMOVING, &adapter->state) ||
-	    test_bit(__IXGBEVF_RESETTING, &adapter->state))
+	    test_bit(__IXGBEVF_RESETTING, &adapter->state)) {
+		rtnl_unlock();
 		return;
+	}
 
 	adapter->tx_timeout_count++;
 
-	rtnl_lock();
 	ixgbevf_reinit_locked(adapter);
 	rtnl_unlock();
 }
-- 
2.17.1
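
The shape of the fix, checking state only while the lock is held and
releasing it on every exit path, can be sketched in userspace as follows. The
adapter_model struct and the model_lock()/model_unlock() helpers are
hypothetical; a depth counter stands in for the RTNL lock so that balanced
locking is easy to verify.

```c
#include <stdbool.h>

/* Hypothetical model: a depth counter stands in for the RTNL lock. */
struct adapter_model {
	bool reset_requested;
	bool down, removing, resetting;
	int lock_depth;		/* > 0 while the model lock is held */
	int reinit_count;	/* how often the reinit step ran */
};

static void model_lock(struct adapter_model *a)   { a->lock_depth++; }
static void model_unlock(struct adapter_model *a) { a->lock_depth--; }

static void reset_subtask_model(struct adapter_model *a)
{
	if (!a->reset_requested)
		return;
	a->reset_requested = false;

	model_lock(a);
	/* State bits are sampled with the lock held. */
	if (a->down || a->removing || a->resetting) {
		model_unlock(a);	/* early exit must drop the lock too */
		return;
	}

	a->reinit_count++;	/* stands in for the reinit call */
	model_unlock(a);
}
```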


* [net-next 08/12] ixgbe: introduce a helper to simplify code
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: YueHaibing, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: YueHaibing <yuehaibing@huawei.com>

ixgbe_dbg_reg_ops_read and ixgbe_dbg_netdev_ops_read duplicate the same
code, differing only in the buffer they print (ixgbe_dbg_reg_ops_buf vs.
ixgbe_dbg_netdev_ops_buf), so introduce a helper, ixgbe_dbg_common_ops_read,
to remove the redundant code.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 .../net/ethernet/intel/ixgbe/ixgbe_debugfs.c  | 57 +++++++------------
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
index 55fe8114fe99..50dfb02fa34c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
@@ -10,15 +10,9 @@ static struct dentry *ixgbe_dbg_root;
 
 static char ixgbe_dbg_reg_ops_buf[256] = "";
 
-/**
- * ixgbe_dbg_reg_ops_read - read for reg_ops datum
- * @filp: the opened file
- * @buffer: where to write the data for the user to read
- * @count: the size of the user's buffer
- * @ppos: file position offset
- **/
-static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
-				    size_t count, loff_t *ppos)
+static ssize_t ixgbe_dbg_common_ops_read(struct file *filp, char __user *buffer,
+					 size_t count, loff_t *ppos,
+					 char *dbg_buf)
 {
 	struct ixgbe_adapter *adapter = filp->private_data;
 	char *buf;
@@ -29,8 +23,7 @@ static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
 		return 0;
 
 	buf = kasprintf(GFP_KERNEL, "%s: %s\n",
-			adapter->netdev->name,
-			ixgbe_dbg_reg_ops_buf);
+			adapter->netdev->name, dbg_buf);
 	if (!buf)
 		return -ENOMEM;
 
@@ -45,6 +38,20 @@ static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
 	return len;
 }
 
+/**
+ * ixgbe_dbg_reg_ops_read - read for reg_ops datum
+ * @filp: the opened file
+ * @buffer: where to write the data for the user to read
+ * @count: the size of the user's buffer
+ * @ppos: file position offset
+ **/
+static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
+				      size_t count, loff_t *ppos)
+{
+	return ixgbe_dbg_common_ops_read(filp, buffer, count, ppos,
+					 ixgbe_dbg_reg_ops_buf);
+}
+
 /**
  * ixgbe_dbg_reg_ops_write - write into reg_ops datum
  * @filp: the opened file
@@ -121,33 +128,11 @@ static char ixgbe_dbg_netdev_ops_buf[256] = "";
  * @count: the size of the user's buffer
  * @ppos: file position offset
  **/
-static ssize_t ixgbe_dbg_netdev_ops_read(struct file *filp,
-					 char __user *buffer,
+static ssize_t ixgbe_dbg_netdev_ops_read(struct file *filp, char __user *buffer,
 					 size_t count, loff_t *ppos)
 {
-	struct ixgbe_adapter *adapter = filp->private_data;
-	char *buf;
-	int len;
-
-	/* don't allow partial reads */
-	if (*ppos != 0)
-		return 0;
-
-	buf = kasprintf(GFP_KERNEL, "%s: %s\n",
-			adapter->netdev->name,
-			ixgbe_dbg_netdev_ops_buf);
-	if (!buf)
-		return -ENOMEM;
-
-	if (count < strlen(buf)) {
-		kfree(buf);
-		return -ENOSPC;
-	}
-
-	len = simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
-
-	kfree(buf);
-	return len;
+	return ixgbe_dbg_common_ops_read(filp, buffer, count, ppos,
+					 ixgbe_dbg_netdev_ops_buf);
 }
 
 /**
-- 
2.17.1
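
The refactoring pattern, one shared helper that takes the buffer as a
parameter plus two thin wrappers, looks roughly like this in userspace. All
names here (common_ops_read, reg_ops_read, netdev_ops_read and the two
buffers) are hypothetical analogues, and snprintf() into a stack buffer is
assumed in place of kasprintf() and simple_read_from_buffer().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

static char reg_ops_buf[256] = "";
static char netdev_ops_buf[256] = "";

/* Shared formatter: writes "<ifname>: <dbg_buf>\n" into out. */
static long common_ops_read(char *out, size_t count,
			    const char *ifname, const char *dbg_buf)
{
	char tmp[512];
	int len;

	len = snprintf(tmp, sizeof(tmp), "%s: %s\n", ifname, dbg_buf);
	if (len < 0 || (size_t)len >= sizeof(tmp))
		return -ENOMEM;	/* models the allocation failure path */

	/* Like the driver, refuse to return partial output. */
	if (count < (size_t)len)
		return -ENOSPC;

	memcpy(out, tmp, (size_t)len);
	return len;
}

/* Thin wrappers: the only difference is the buffer handed to the helper. */
static long reg_ops_read(char *out, size_t count, const char *ifname)
{
	return common_ops_read(out, count, ifname, reg_ops_buf);
}

static long netdev_ops_read(char *out, size_t count, const char *ifname)
{
	return common_ops_read(out, count, ifname, netdev_ops_buf);
}
```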


* [net-next 10/12] ixgbe: fix possible race in reset subtask
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

As in ixgbevf, the same possibility for a race exists in ixgbe. Extend the
RTNL lock in ixgbe_reset_subtask() to cover the state bits check; this makes
sure that we get the most up-to-date values for the bits and avoids a
possible race when going down.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ba3035c08572..dd8a3a037c2f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7621,17 +7621,19 @@ static void ixgbe_reset_subtask(struct ixgbe_adapter *adapter)
 	if (!test_and_clear_bit(__IXGBE_RESET_REQUESTED, &adapter->state))
 		return;
 
+	rtnl_lock();
 	/* If we're already down, removing or resetting, just bail */
 	if (test_bit(__IXGBE_DOWN, &adapter->state) ||
 	    test_bit(__IXGBE_REMOVING, &adapter->state) ||
-	    test_bit(__IXGBE_RESETTING, &adapter->state))
+	    test_bit(__IXGBE_RESETTING, &adapter->state)) {
+		rtnl_unlock();
 		return;
+	}
 
 	ixgbe_dump(adapter);
 	netdev_err(adapter->netdev, "Reset adapter\n");
 	adapter->tx_timeout_count++;
 
-	rtnl_lock();
 	ixgbe_reinit_locked(adapter);
 	rtnl_unlock();
 }
-- 
2.17.1


* [net-next 11/12] ixgbe: check ipsec ip addr against mgmt filters
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

Make sure we don't try to offload the decryption of an incoming
packet that should get delivered to the management engine.  This
is a corner case that will likely be very seldom seen, but could
really confuse someone if they were to hit it.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c    | 88 +++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 99b170f1efd1..e1c976271bbd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -444,6 +444,89 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
 	return 0;
 }
 
+/**
+ * ixgbe_ipsec_check_mgmt_ip - make sure there is no clash with mgmt IP filters
+ * @xs: pointer to transformer state struct
+ **/
+static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
+{
+	struct net_device *dev = xs->xso.dev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 mfval, manc, reg;
+	int num_filters = 4;
+	bool manc_ipv4;
+	u32 bmcipval;
+	int i, j;
+
+#define MANC_EN_IPV4_FILTER      BIT(24)
+#define MFVAL_IPV4_FILTER_SHIFT  16
+#define MFVAL_IPV6_FILTER_SHIFT  24
+#define MIPAF_ARR(_m, _n)        (IXGBE_MIPAF + ((_m) * 0x10) + ((_n) * 4))
+
+#define IXGBE_BMCIP(_n)          (0x5050 + ((_n) * 4))
+#define IXGBE_BMCIPVAL           0x5060
+#define BMCIP_V4                 0x2
+#define BMCIP_V6                 0x3
+#define BMCIP_MASK               0x3
+
+	manc = IXGBE_READ_REG(hw, IXGBE_MANC);
+	manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
+	mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
+	bmcipval = IXGBE_READ_REG(hw, IXGBE_BMCIPVAL);
+
+	if (xs->props.family == AF_INET) {
+		/* are there any IPv4 filters to check? */
+		if (manc_ipv4) {
+			/* the 4 ipv4 filters are all in MIPAF(3, i) */
+			for (i = 0; i < num_filters; i++) {
+				if (!(mfval & BIT(MFVAL_IPV4_FILTER_SHIFT + i)))
+					continue;
+
+				reg = IXGBE_READ_REG(hw, MIPAF_ARR(3, i));
+				if (reg == xs->id.daddr.a4)
+					return 1;
+			}
+		}
+
+		if ((bmcipval & BMCIP_MASK) == BMCIP_V4) {
+			reg = IXGBE_READ_REG(hw, IXGBE_BMCIP(3));
+			if (reg == xs->id.daddr.a4)
+				return 1;
+		}
+
+	} else {
+		/* if there are ipv4 filters, they are in the last ipv6 slot */
+		if (manc_ipv4)
+			num_filters = 3;
+
+		for (i = 0; i < num_filters; i++) {
+			if (!(mfval & BIT(MFVAL_IPV6_FILTER_SHIFT + i)))
+				continue;
+
+			for (j = 0; j < 4; j++) {
+				reg = IXGBE_READ_REG(hw, MIPAF_ARR(i, j));
+				if (reg != xs->id.daddr.a6[j])
+					break;
+			}
+			if (j == 4)   /* did we match all 4 words? */
+				return 1;
+		}
+
+		if ((bmcipval & BMCIP_MASK) == BMCIP_V6) {
+			for (j = 0; j < 4; j++) {
+				reg = IXGBE_READ_REG(hw, IXGBE_BMCIP(j));
+				if (reg != xs->id.daddr.a6[j])
+					break;
+			}
+			if (j == 4)   /* did we match all 4 words? */
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
 /**
  * ixgbe_ipsec_add_sa - program device with a security association
  * @xs: pointer to transformer state struct
@@ -465,6 +548,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 		return -EINVAL;
 	}
 
+	if (ixgbe_ipsec_check_mgmt_ip(xs)) {
+		netdev_err(dev, "IPsec IP addr clash with mgmt filters\n");
+		return -EINVAL;
+	}
+
 	if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
 		struct rx_sa rsa;
 
-- 
2.17.1
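
The IPv6 branch of the check boils down to a word-by-word comparison against
each enabled filter. Below is a minimal userspace sketch of that loop; the
mgmt_filter_model struct and ipv6_clashes() are hypothetical, and the register
reads are replaced by plain arrays.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_FILTERS 4

/* Hypothetical snapshot of the management filter state. */
struct mgmt_filter_model {
	uint32_t enabled_mask;			/* bit i set: filter i is valid */
	uint32_t addr[MODEL_NUM_FILTERS][4];	/* four 32-bit words per address */
};

/* True if daddr matches all four words of any enabled filter. */
static bool ipv6_clashes(const struct mgmt_filter_model *f,
			 const uint32_t daddr[4])
{
	int i, j;

	for (i = 0; i < MODEL_NUM_FILTERS; i++) {
		if (!(f->enabled_mask & (1u << i)))
			continue;

		for (j = 0; j < 4; j++) {
			if (f->addr[i][j] != daddr[j])
				break;
		}
		if (j == 4)	/* did we match all 4 words? */
			return true;
	}

	return false;
}
```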


* [net-next 12/12] ixgbe: fix broken ipsec Rx with proper cast on spi
From: Jeff Kirsher @ 2018-06-04 17:56 UTC
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

Fix up a cast problem introduced by a sparse cleanup patch.  This fixes
a problem where the encrypted packets were not recognized on Rx and
subsequently dropped.

Fixes: 9cfbfa701b55 ("ixgbe: cleanup sparse warnings")
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index e1c976271bbd..344a1f213a5f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -663,7 +663,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 
 		/* hash the new entry for faster search in Rx path */
 		hash_add_rcu(ipsec->rx_sa_list, &ipsec->rx_tbl[sa_idx].hlist,
-			     (__force u64)rsa.xs->id.spi);
+			     (__force u32)rsa.xs->id.spi);
 	} else {
 		struct tx_sa tsa;
 
-- 
2.17.1
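
The commit message does not spell out why the width of the cast matters. One
plausible reading, stated here as an assumption rather than something the
patch confirms, is that the hashtable helpers choose a 32-bit or a 64-bit
hash based on the size of the key's type, so an entry added with a u64 key
and looked up with a u32 key can land in different buckets. The userspace
sketch below models two such multiplicative hashes; the function names are
hypothetical and the constants are assumed to match the kernel's golden-ratio
values.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's golden-ratio hash constants. */
#define MODEL_GOLDEN_RATIO_32 0x61C88647u
#define MODEL_GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Multiplicative hash on a 32-bit key, keeping the top "bits" bits. */
static uint32_t model_hash_32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * MODEL_GOLDEN_RATIO_32) >> (32 - bits);
}

/* Same idea on a 64-bit key; the wider multiply gives different top bits. */
static uint32_t model_hash_64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * MODEL_GOLDEN_RATIO_64) >> (64 - bits));
}
```

In this model the two functions agree for small keys such as 1, but a key of
0x01000000 (what a big-endian SPI of 1 looks like when read as a raw value on
a little-endian host) hashes to different 8-bit buckets.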


* [Patch net-next] netdev-FAQ: clarify DaveM's position for stable backports
From: Cong Wang @ 2018-06-04 18:07 UTC
  To: netdev; +Cc: Cong Wang, stable, Greg Kroah-Hartman

Per discussion with David at netconf 2018, let's clarify
DaveM's position on handling stable backports in netdev-FAQ.

This is important for people relying on upstream -stable
releases.

Cc: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 Documentation/networking/netdev-FAQ.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt
index 2a3278d5cf35..6dde6686c870 100644
--- a/Documentation/networking/netdev-FAQ.txt
+++ b/Documentation/networking/netdev-FAQ.txt
@@ -179,6 +179,15 @@ A: No.  See above answer.  In short, if you think it really belongs in
    dash marker line as described in Documentation/process/submitting-patches.rst to
    temporarily embed that information into the patch that you send.
 
+Q: Are all networking bug fixes backported to all stable releases?
+
+A: Due to capacity, Dave could only take care of the backports for the last
+   3 stable releases. For earlier stable releases, each stable branch maintainer
+   is supposed to take care of them. If you find any patch is missing from an
+   earlier stable branch, please notify stable@vger.kernel.org with either a
+   commit ID or a formal patch backported, and CC Dave and other relevant
+   networking developers.
+
 Q: Someone said that the comment style and coding convention is different
    for the networking content.  Is this true?
 
-- 
2.13.0


* Re: [PATCH 15a/18] rhashtables: add lockdep tracking to bucket bit-spin-locks.
From: Simon Horman @ 2018-06-04 18:16 UTC
  To: NeilBrown
  Cc: Eric Dumazet, Herbert Xu, Thomas Graf, netdev, linux-kernel,
	David S. Miller
In-Reply-To: <87po17p8jd.fsf@notabene.neil.brown.name>

On Mon, Jun 04, 2018 at 12:52:54PM +1000, NeilBrown wrote:
> 
> Native bit_spin_locks are not tracked by lockdep.
> 
> The bit_spin_locks used for rhashtable buckets are local
> to the rhashtable implementation, so there is little opportunity
> for the sort of misuse that lockdep might detect.
> However locks are held while a hash function or compare
> function is called, and if one of these took a lock,
> a misbehaviour is possible.
> 
> As it is quite easy to add lockdep support this unlikely
> possibility see to be enough justification.

nit: s/see/seems/

> 
> So create a lockdep class for bucket bit_spin_lock and attach it
> through a lockdep_map in each bucket_table.
> 
> With the 'nested' annotation in rhashtable_rehash_one(), lockdep
> correctly reports a possible problem as this lock is taken
> while another bucket lock (in another table) is held.  This
> confirms that the added support works.
> With the correct nested annotation in place, lockdep reports
> no problems.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>


* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: David Ahern @ 2018-06-04 18:17 UTC
  To: AMG Zollner Robert, ganeshgr; +Cc: netdev
In-Reply-To: <b515a1c7-2cf8-1af8-372d-393420d298b8@cloudmedia.eu>

On 6/4/18 8:03 AM, AMG Zollner Robert wrote:
> I have noticed that vrf is not working with kernel v4.15.0 but was
> working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)
> 
> Setup:
> Two metal servers with a T520-cr card each, directly connected without a
> switch in between.
> 
>        SVR1  only ipfwd                 SVR2     with vrf
> .----------------------------.         .----------------------------------.
> |                            |         |                                  |
> |    192.168.8.1 [  ens2f4]--|---------|--[ens1f4] 192.168.8.2            |
> |    192.168.9.1 [ens2f4d1]--|---------|--<ens1f4d1> 192.168.9.2 VRF=10   |
> `----------------------------'         `----------------------------------'
> 
> When vrf is not working there are no error messages (dmesg or iproute
> commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
> shows packets(arp req/reply) coming in and going out, but outgoing
> packets(arp reply) do not reach the other server SVR1.ens2f4d1
> 
> 
> Bisect:
> Found this commit to be the problem after doing a git bisect between
> v4.13..v4.15:
> 
> commit ba581f77df23c8ee70b372966e69cf10bc5453d8
> Author: Ganesh Goudar <ganeshgr@chelsio.com>
> Date:   Sat Sep 23 16:07:28 2017 +0530
> 
>     cxgb4: do DCB state reset in couple of places
> 
>     reset the driver's DCB state in couple of places
>     where it was missing.
> 
> 
> A bisect step was considered good when:
> - successful ping from SVR1 to SVR2.ens1f4d1 vrf interface
> - successful ping from SVR2 global to SVR2 vrf interface through SVR1 (l3
> forwarding) (this check was redundant, both tests fail or pass simultaneously)
> 
> The problem is still present on recent kernels as well; checked v4.16.0 and
> v4.17-rc7.
> 
> Disabling DCB support for the card fixes the problem (compiling the kernel
> with "CONFIG_CHELSIO_T4_DCB=n").
> 

Are you doing the VRF enslave while it is up?

If so, does it work ok if you change the sequence:

ip li set ens1f4d1 down
ip li set ens1f4d1 master <VRF>
ip li set ens1f4d1 up


* Re: [PATCH] net/dns_resolver: dns_query Modify parameter checking to avoid dead code
From: Simon Horman @ 2018-06-04 18:25 UTC
  To: nixiaoming
  Cc: davem, dhowells, manuel.schoelling, wang840925, linux-kernel,
	netdev
In-Reply-To: <20180604064031.116472-1-nixiaoming@huawei.com>

On Mon, Jun 04, 2018 at 02:40:31PM +0800, nixiaoming wrote:
> After commit 1a4240f4764a ("DNS: Separate out CIFS DNS Resolver code"),
> dead code exists in the function dns_query.
> 
> The code is shown below:
> 	if (!name || namelen == 0)
> 		return -EINVAL;
> 	/* Now the value of "namelen" cannot be equal to 0 */
> 	....
> 	if (!namelen) /* The condition "!namelen" cannot be true */
> 		namelen = strnlen(name, 256); /* dead code */
> 
> Modify parameter checking to avoid dead code
> 
> Signed-off-by: nixiaoming <nixiaoming@huawei.com>
> ---
>  net/dns_resolver/dns_query.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/net/dns_resolver/dns_query.c b/net/dns_resolver/dns_query.c
> index 49da670..f2acee2 100644
> --- a/net/dns_resolver/dns_query.c
> +++ b/net/dns_resolver/dns_query.c
> @@ -81,7 +81,9 @@ int dns_query(const char *type, const char *name, size_t namelen,
>  	kenter("%s,%*.*s,%zu,%s",
>  	       type, (int)namelen, (int)namelen, name, namelen, options);
>  
> -	if (!name || namelen == 0)
> +	if (!name || namelen < 3 || namelen > 255)
> +		return -EINVAL;
> +	if (namelen > strnlen(name, 256)) /*maybe only need part of name*/

The line above seems to change the behaviour of this function.
I think it and the previous line can be dropped.

>  		return -EINVAL;
>  
>  	/* construct the query key description as "[<type>:]<name>" */
> @@ -94,10 +96,6 @@ int dns_query(const char *type, const char *name, size_t namelen,
>  		desclen += typelen + 1;
>  	}
>  
> -	if (!namelen)
> -		namelen = strnlen(name, 256);
> -	if (namelen < 3 || namelen > 255)
> -		return -EINVAL;
>  	desclen += namelen + 1;

I think the initialisation of desclen can be changed
to include namelen + 1 without changing the behaviour of this function.

>  
>  	desc = kmalloc(desclen, GFP_KERNEL);
> -- 
> 2.10.1
> 
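
To make the two review comments concrete, here is a minimal userspace sketch
of what the length handling could look like with the strnlen() comparison
dropped and namelen + 1 folded into the initialisation of desclen. The
function name query_desclen() is hypothetical and this is not the dns_query()
code itself; it only models the checks quoted above, assuming every rejected
input returns -EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the length checks in dns_query().
 * Returns the description length for "[<type>:]<name>" plus the
 * terminating NUL, or -EINVAL on bad input.
 */
static long query_desclen(const char *type, const char *name, size_t namelen)
{
	size_t typelen, desclen;

	/* Single range check; namelen == 0 is covered by namelen < 3. */
	if (!name || namelen < 3 || namelen > 255)
		return -EINVAL;

	desclen = namelen + 1;
	if (type) {
		typelen = strlen(type);
		if (typelen < 1)
			return -EINVAL;
		desclen += typelen + 1;
	}

	return (long)desclen;
}
```

Because all failing inputs yield the same error code, moving the range check
ahead of the type check does not change what callers observe in this model.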


* Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps
From: Jakub Kicinski @ 2018-06-04 18:25 UTC
  To: Phil Sutter
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, alexei.starovoitov,
	netdev, Jakub Kicinski, Quentin Monnet
In-Reply-To: <20180604110225.GX11363@tatos.vnet>

On Mon, 4 Jun 2018 13:02:25 +0200, Phil Sutter wrote:
> On Sun, Jun 03, 2018 at 07:08:55PM +0200, Jesper Dangaard Brouer wrote:
> > Secondly I personally *hate* how 'ip' does its short options
> > parsing and especially order/precedence ambiguity.  Phil Sutter
> > (Fedora/RHEL iproute2 maintainer) has a funny quiz illustrating the
> > ambiguity issues.  
> 
> Hehe, yes. It's a classical case of something smart evolving into a
> pain: At first there's only 'ip link', so you allow 'ip l' as a
> shortcut. Then someone implements 'ip l2tp' - so what do you do?

Good example, I like that "ip l" shows me the links because that's what
99.99% of people want when they type that command ;)

> Establish a policy of abbreviation having to be unique and break
> existing behaviour or accept the mess and head on.

Commands are tested in order of addition so older ones take precedence.

The iproute2 behaviour was replicated in bpftool on purpose, because
it should be very familiar to people.  It is to me at least.  And IMHO
it's better to be consistent with a well known tool than have our own
quirks and rules...

> My suggestion would be to not get into the abbreviated subcommands
> business at all but instead ship and maintain a bash-completion script.

We prefer to have both :)  Those of us who like to abbreviate can do
that, and others can use completions.  I personally think Quentin did
an awesome job on the completions, they cover the entire syntax unlike
the iproute2 ones and we intend to keep them that way!
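
The "older commands take precedence" rule described above amounts to a
first-match prefix search over a table kept in order of addition. A minimal
sketch, assuming a hypothetical match_cmd() helper and a two-entry table; it
is not the iproute2 or bpftool implementation:

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the first command in cmds[] that starts with arg, or NULL.
 * Table order decides ties, so entries added earlier win.
 */
static const char *match_cmd(const char *arg, const char *const cmds[],
			     size_t n)
{
	size_t i, len = strlen(arg);

	if (len == 0)
		return NULL;

	for (i = 0; i < n; i++) {
		if (strncmp(cmds[i], arg, len) == 0)
			return cmds[i];
	}

	return NULL;
}
```

With a table of { "link", "l2tp" }, the abbreviation "l" resolves to "link"
because it was added first, while "l2" is unambiguous and resolves to "l2tp".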


* Re: [PATCH 0/3] sh_eth: fix & clean up sh_eth_soft_swap()
From: David Miller @ 2018-06-04 19:24 UTC
  To: sergei.shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <9027499a-0e19-7721-a17f-26e86885da3f@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Sat, 2 Jun 2018 22:32:48 +0300

> Here's a set of 3 patches against DaveM's 'net-next.git' repo. The first one
> fixes an old buffer endianness issue (luckily, the ARM SoCs are smart enough
> to not actually care), plus a couple of clean-ups around sh_eth_soft_swap()...
> 
> [1/3] sh_eth: make sh_eth_soft_swap() work on ARM
> [2/3] sh_eth: uninline sh_eth_soft_swap()
> [3/3] sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap()

Series applied, thank you.


* Re: [PATCH net-next 0/2] mlxsw: Fixes in offloading of mirror-to-gretap
From: David Miller @ 2018-06-04 19:34 UTC
  To: idosch; +Cc: netdev, jiri, petrm, mlxsw
In-Reply-To: <20180602180935.24544-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Sat,  2 Jun 2018 21:09:33 +0300

> Petr says:
> 
> These two patches fix issues in offloading of mirror-to-gretap when
> bridge is present in the underlay.
> 
> In patch #1, reconsideration of SPAN configuration is not done right at
> the point that SWITCHDEV_OBJ_ID_PORT_VLAN deletion notification is
> distributed, but is postponed, because the notifications are actually
> distributed before the relevant change is implemented in the bridge.
> 
> Patch #2 fixes a problem in configuring VLAN tagging in situations where a
> VLAN device is on top of an 802.1Q bridge whose egress port is marked as
> "egress untagged". In that case, mlxsw would neglect to suppress the
> tagging implicitly assumed after the VLAN device was seen.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next 0/2] net: phy: improve PM handling of PHY/MDIO
From: David Miller @ 2018-06-04 19:41 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <b8f7b42e-791a-5997-d5eb-16f649738421@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Sat, 2 Jun 2018 22:33:36 +0200

> Current implementation of MDIO bus PM ops doesn't actually implement
> bus-specific PM ops but just calls PM ops defined on a device level
> which doesn't seem to be fully in line with the core PM model.
> 
> When looking e.g. at __device_suspend() the PM core looks for PM ops
> of a device in a specific order:
> 1. device PM domain
> 2. device type
> 3. device class
> 4. device bus
> 
> I think it has good reason that there's no PM ops on device level.
> The situation can be improved by modeling PHY's as device type of
> a MDIO device. If for some other type of MDIO device PM ops are
> needed, it could be modeled as struct device_type as well.

Andrew and Florian, it would be nice if one of you would review this
patch series.

Thank you.
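
The lookup order quoted above can be sketched as a first-match search.
This is a simplified model, not the PM core: struct dev_model, struct
pm_ops and pick_pm_ops() are hypothetical stand-ins, and the real code
deals with more than a single ops pointer per level.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a set of PM callbacks. */
struct pm_ops {
	const char *name;
};

/* Hypothetical device model holding one optional ops set per level. */
struct dev_model {
	const struct pm_ops *pm_domain_ops;
	const struct pm_ops *type_ops;
	const struct pm_ops *class_ops;
	const struct pm_ops *bus_ops;
};

/* Return the first ops set found: PM domain, type, class, then bus. */
static const struct pm_ops *pick_pm_ops(const struct dev_model *dev)
{
	if (dev->pm_domain_ops)
		return dev->pm_domain_ops;
	if (dev->type_ops)
		return dev->type_ops;
	if (dev->class_ops)
		return dev->class_ops;
	return dev->bus_ops; /* may be NULL if nothing is registered */
}
```

In this model, attaching PM ops to a device type takes precedence over
bus-level ops, which is the property the cover letter relies on.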

^ permalink raw reply

* Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps
From: Daniel Borkmann @ 2018-06-04 19:45 UTC (permalink / raw)
  To: Jakub Kicinski, Phil Sutter
  Cc: Jesper Dangaard Brouer, alexei.starovoitov, netdev,
	Jakub Kicinski, Quentin Monnet
In-Reply-To: <20180604112524.1d5732e8@cakuba.netronome.com>

On 06/04/2018 08:25 PM, Jakub Kicinski wrote:
[...]
> We prefer to have both :)  Those of us who like to abbreviate can do
> that, and others can use completions.  I personally think Quentin did
> an awesome job on the completions, they cover the entire syntax unlike
> the iproute2 ones and we intend to keep them that way!

Fully agree, both make sense. Personally, I only use abbreviations on
bpftool so far. :)

^ permalink raw reply

* Re: [PATCH bpf-next 0/5] AF_XDP: bug fixes and descriptor changes
From: Daniel Borkmann @ 2018-06-04 19:51 UTC (permalink / raw)
  To: Alexei Starovoitov, Björn Töpel
  Cc: magnus.karlsson, magnus.karlsson, alexander.h.duyck,
	alexander.duyck, ast, brouer, netdev, mykyta.iziumtsev,
	Björn Töpel, john.fastabend, willemdebruijn.kernel, mst,
	michael.lundkvist, jesse.brandeburg, anjali.singhai, qi.z.zhang,
	francois.ozog, ilias.apalodimas, brian.brooks, andy, michael.chan
In-Reply-To: <20180604162429.zu4uno6fviz4pfte@ast-mbp>

On 06/04/2018 06:24 PM, Alexei Starovoitov wrote:
> On Mon, Jun 04, 2018 at 01:57:10PM +0200, Björn Töpel wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> An issue with the current AF_XDP uapi raised by Mykyta Iziumtsev (see
>> https://www.spinics.net/lists/netdev/msg503664.html) is that it does
>> not support NICs that have a "type-writer" model in an efficient
>> way. In this model, a memory window is passed to the hardware and
>> multiple frames might be filled into that window, instead of just one
>> that we have in the current fixed frame-size model.
>>
>> This patch set fixes two bugs in the current implementation and then
>> changes the uapi so that the type-writer model can be supported
>> efficiently by a possible future extension of AF_XDP.
>>
>> These are the uapi changes in this patch:
>>
>> * Change the "u32 idx" in the descriptors to "u64 addr". The current
>>   idx based format does NOT work for the type-writer model (as packets
>>   can start anywhere within a frame), whereas a relative address
>>   pointer (the u64 addr) works well for both models in the prototype
>>   code we have that supports both models. We increased it from u32 to
>>   u64 to support umems larger than 4G. We have also removed the u16
>>   offset when having a "u64 addr" since that information is already
>>   carried in the least significant bits of the address.
>>
>> * We want to use "u8 padding[5]" for something useful in the future
>>   (since we are not allowed to change its name), so we now call it
>>   just options so it can be extended for various purposes in the
>>   future. It is a u32 as that is what is left of the 16 byte
>>   descriptor.
>>
>> * We changed the name of frame_size in the UMEM_REG setsockopt to
>>   chunk_size since this naming also makes sense to the type-writer
>>   model.
>>
>> With these changes to the uapi, we believe the type-writer model can
>> be supported without having to resort to a new descriptor format. The
>> type-writer model could then be supported, from the uapi point of
>> view, by setting a flag at bind time and providing a new flag bit in
>> the options field of the descriptor that signals to user space that
>> all packets have been written in a chunk. Or with a new chunk
>> completion queue as suggested by Mykyta in his latest feedback mail on
>> the list.
> 
> for the set:
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Thank you for these fixes.
> According to unofficial feedback from brcm and netronome folks
> the descriptor format should work for these nics too.
> At some point we may consider second format, but I think SW
> should drive HW requirements and not the other way around.

LGTM as well, applied to bpf-next, thanks!
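
To illustrate the point about the offset being carried in the least
significant bits of the address, here is a minimal sketch. It assumes the
chunk size is a power of two, which the cover letter does not state;
chunk_base() and chunk_offset() are hypothetical helper names, not part
of the uapi.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, assuming chunk_size is a power of two.
 * addr is a relative address into the umem, as in the "u64 addr"
 * descriptor field described above.
 */
static uint64_t chunk_base(uint64_t addr, uint64_t chunk_size)
{
	/* Clear the low bits to get the start of the enclosing chunk. */
	return addr & ~(chunk_size - 1);
}

static uint64_t chunk_offset(uint64_t addr, uint64_t chunk_size)
{
	/* The low bits are the offset of the packet within its chunk. */
	return addr & (chunk_size - 1);
}
```

Because addr is 64 bits wide, the same arithmetic keeps working for
addresses beyond 4G, which a u32 index could not express as a byte
address.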

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: guard bpf_get_current_cgroup_id() with CONFIG_CGROUPS
From: Daniel Borkmann @ 2018-06-04 19:53 UTC (permalink / raw)
  To: Yonghong Song, ast, netdev; +Cc: kernel-team
In-Reply-To: <20180604155341.1517003-1-yhs@fb.com>

On 06/04/2018 05:53 PM, Yonghong Song wrote:
> Commit bf6fa2c893c5 ("bpf: implement bpf_get_current_cgroup_id()
> helper") introduced a new helper bpf_get_current_cgroup_id().
> The helper has a dependency on CONFIG_CGROUPS.
> 
> When CONFIG_CGROUPS is not defined, using the helper will result in
> the following verifier error:
>   kernel subsystem misconfigured func bpf_get_current_cgroup_id#80
> which is hard for users to interpret.
> Guarding the reference to bpf_get_current_cgroup_id_proto with
> CONFIG_CGROUPS will result in the better message below:
>   unknown func bpf_get_current_cgroup_id#80
> 
> Fixes: bf6fa2c893c5 ("bpf: implement bpf_get_current_cgroup_id() helper")
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Applied to bpf-next, thanks Yonghong!
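
The guarding technique described in the commit message can be sketched in
isolation as follows. This is a standalone model, not the kernel's helper
table: lookup_func_proto(), struct func_proto and the local
CONFIG_CGROUPS define are stand-ins for illustration. The id 80 is taken
from the error message quoted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for the Kconfig symbol. Remove this define to model a
 * kernel built without cgroup support.
 */
#define CONFIG_CGROUPS 1

/* Helper id 80, as shown in the verifier message quoted above. */
enum func_id {
	FUNC_GET_CURRENT_CGROUP_ID = 80
};

/* Simplified stand-in for a helper prototype descriptor. */
struct func_proto {
	const char *name;
};

#ifdef CONFIG_CGROUPS
static const struct func_proto get_current_cgroup_id_proto = {
	"bpf_get_current_cgroup_id"
};
#endif

/* Hypothetical lookup: NULL means the caller reports "unknown func". */
static const struct func_proto *lookup_func_proto(int id)
{
	switch (id) {
#ifdef CONFIG_CGROUPS
	case FUNC_GET_CURRENT_CGROUP_ID:
		return &get_current_cgroup_id_proto;
#endif
	default:
		return NULL;
	}
}
```

With the case label compiled out, the id falls through to the default
branch, so the lookup never hands back a prototype whose implementation
is missing.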

^ permalink raw reply

* Re: [bpf-next PATCH] bpf: sockmap, fix crash when ipv6 sock is added
From: Daniel Borkmann @ 2018-06-04 19:59 UTC (permalink / raw)
  To: John Fastabend, edumazet, ast; +Cc: netdev
In-Reply-To: <20180604152125.6930.88723.stgit@john-Precision-Tower-5810>

On 06/04/2018 05:21 PM, John Fastabend wrote:
> This fixes a crash where we assign tcp_prot to IPv6 sockets instead
> of tcpv6_prot.
> 
> Previously we overwrote the sk->prot field with tcp_prot even in the
> AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
> are used. Further, only allow ESTABLISHED connections to join the
> map per note in TLS ULP,
> 
>    /* The TLS ulp is currently supported only for TCP sockets
>     * in ESTABLISHED state.
>     * Supporting sockets in LISTEN state will require us
>     * to modify the accept implementation to clone rather than
>     * share the ulp context.
>     */
> 
> Also tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
> 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
> crashing case here.
> 
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Wei Wang <weiwan@google.com>

Applied to bpf-next, thanks everyone!
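
The two checks described in the commit message (pick the protocol ops by
address family, and admit only ESTABLISHED sockets) can be modelled as
below. This is a simplified sketch, not the sockmap code: the enums,
struct proto_ops and select_base_proto() are hypothetical stand-ins for
the kernel's types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the socket family and TCP state. */
enum family { FAM_INET, FAM_INET6 };
enum tcp_state { ST_ESTABLISHED, ST_LISTEN };

/* Stand-in for a protocol operations table. */
struct proto_ops {
	const char *name;
};

static const struct proto_ops v4_ops = { "tcp_prot" };
static const struct proto_ops v6_ops = { "tcpv6_prot" };

/*
 * Hypothetical selector: refuse sockets that are not ESTABLISHED,
 * otherwise return the ops table matching the address family.
 */
static const struct proto_ops *select_base_proto(enum family fam,
						 enum tcp_state st)
{
	if (st != ST_ESTABLISHED)
		return NULL; /* caller rejects the map update */

	return fam == FAM_INET6 ? &v6_ops : &v4_ops;
}
```

The crash described above corresponds to returning v4_ops for a
FAM_INET6 socket in this model.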

^ permalink raw reply

* Re: [PATCH net-next] Allow ethtool to change tun link settings
From: David Miller @ 2018-06-04 20:05 UTC (permalink / raw)
  To: 3chas3; +Cc: netdev
In-Reply-To: <20180602214953.22866-1-3chas3@gmail.com>

From: Chas Williams <3chas3@gmail.com>
Date: Sat,  2 Jun 2018 17:49:53 -0400

> Let user space set whatever it would like to advertise for the
> tun interface.  Preserve the existing defaults.
> 
> Signed-off-by: Chas Williams <3chas3@gmail.com>

This looks fine, applied.

^ permalink raw reply

* Re: [PATCH net] rxrpc: Fix handling of call quietly cancelled out on server
From: David Miller @ 2018-06-04 20:06 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel
In-Reply-To: <152798865948.5814.10196893914153717731.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Sun, 03 Jun 2018 02:17:39 +0100

> Sometimes an in-progress call will stop responding on the fileserver when
> the fileserver quietly cancels the call with an internally marked abort
> (RX_CALL_DEAD), without sending an ABORT to the client.
> 
> This causes the client's call to eventually expire from lack of incoming
> packets directed its way, which currently leads to it being cancelled
> locally with ETIME.  Note that it's not currently clear as to why this
> happens as it's really hard to reproduce.
> 
> The rotation policy implemented by kAFS, however, doesn't differentiate
> between ETIME meaning we didn't get any response from the server and ETIME
> meaning the call got cancelled mid-flow.  The latter leads to an oops when
> fetching data as the rotation partially resets the afs_read descriptor,
> which can result in a cleared page pointer being dereferenced because that
> page has already been filled.
 ...
> Signed-off-by: David Howells <dhowells@redhat.com>

Applied, thanks David.

^ permalink raw reply

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: AMG Zollner Robert @ 2018-06-04 20:14 UTC (permalink / raw)
  To: David Ahern, ganeshgr; +Cc: netdev
In-Reply-To: <8073c78c-3243-d7f3-55c3-2cc1a2153366@cumulusnetworks.com>

Yes, I was enslaving while the interface was up.

Just tested some of the builds that were not working earlier and they 
are working if I keep the interface down when enslaving as you suggested.

Is this the expected behavior?

Thank you,
Zollner Robert


On 04.06.2018 21:17, David Ahern wrote:
> On 6/4/18 8:03 AM, AMG Zollner Robert wrote:
>> I have noticed that vrf is not working with kernel v4.15.0 but was
>> working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)
>>
>> Setup:
>> Two metal servers with a T520-cr card each, directly connected without a
>> switch in between.
>>
>>         SVR1  only ipfwd                       SVR2     with vrf
>> .----------------------------.         .------------------------------------.
>> |                            |         |                                    |
>> |    192.168.8.1 [  ens2f4]--|---------|--[ens1f4]   192.168.8.2            |
>> |    192.168.9.1 [ens2f4d1]--|---------|--<ens1f4d1> 192.168.9.2 VRF=10     |
>> `----------------------------'         `------------------------------------'
>>
>> When vrf is not working there are no error messages (dmesg or iproute
>> commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
>> shows packets(arp req/reply) coming in and going out, but outgoing
>> packets(arp reply) do not reach the other server SVR1.ens2f4d1
>>
>>
>> Bisect:
>> Found this commit to be the problem after doing a git bisect between
>> v4.13..v4.15:
>>
>> commit ba581f77df23c8ee70b372966e69cf10bc5453d8
>> Author: Ganesh Goudar <ganeshgr@chelsio.com>
>> Date:   Sat Sep 23 16:07:28 2017 +0530
>>
>>      cxgb4: do DCB state reset in couple of places
>>
>>      reset the driver's DCB state in couple of places
>>      where it was missing.
>>
>>
>> A bisect step was considered good when:
>> - successful ping from SVR1 to SVR2.ens1f4d1 vrf interface
>> - successful ping from SVR2 global to SVR2 vrf interface through SVR1 (l3
>> forwarding) (this check was redundant, both tests fail or pass simultaneously)
>>
>> The problem is still present on recent kernels also, checked v4.16.0 and
>> v4.17-rc7
>>
>> Disabling DCB for the card support fixes the problem ( Compiling kernel
>> with "CONFIG_CHELSIO_T4_DCB=n")
>>
> Are you doing the VRF enslave while it is up?
>
> If so, does it work ok if you change the sequence:
>
> ip li set ens1f4d1 down
> ip li set ens1f4d1 master <VRF>
> ip li set ens1f4d1 up

^ permalink raw reply

* [PATCH net-next] net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default.
From: Kun Yi @ 2018-06-04 20:17 UTC (permalink / raw)
  To: davem, kunyi
  Cc: netdev, Avi.Fishman, tali.perry, tomer.maimon, benjaminfair,
	rlippert, f.fainelli

BCM54612E has 4 multi-functional LED pins that can be configured
through register setting; the LED4 pin can be configured to a 125MHz
reference clock output by setting the spare register. Since the dedicated
CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
pin is the only pin to provide such function in this package, and therefore
it is beneficial to just enable the reference clock by default.

Signed-off-by: Kun Yi <kunyi@google.com>
---
 drivers/net/phy/broadcom.c | 16 ++++++++++++++--
 include/linux/brcmphy.h    |  4 ++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index f9c25912eb98..e86ea105c802 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -54,6 +54,8 @@ static int bcm54210e_config_init(struct phy_device *phydev)
 
 static int bcm54612e_config_init(struct phy_device *phydev)
 {
+	int reg;
+
 	/* Clear TX internal delay unless requested. */
 	if ((phydev->interface != PHY_INTERFACE_MODE_RGMII_ID) &&
 	    (phydev->interface != PHY_INTERFACE_MODE_RGMII_TXID)) {
@@ -65,8 +67,6 @@ static int bcm54612e_config_init(struct phy_device *phydev)
 	/* Clear RX internal delay unless requested. */
 	if ((phydev->interface != PHY_INTERFACE_MODE_RGMII_ID) &&
 	    (phydev->interface != PHY_INTERFACE_MODE_RGMII_RXID)) {
-		u16 reg;
-
 		reg = bcm54xx_auxctl_read(phydev,
 					  MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
 		/* Disable RXD to RXC delay (default set) */
@@ -77,6 +77,18 @@ static int bcm54612e_config_init(struct phy_device *phydev)
 				     MII_BCM54XX_AUXCTL_MISC_WREN | reg);
 	}
 
+	/* Enable CLK125 MUX on LED4 if ref clock is enabled. */
+	if (!(phydev->dev_flags & PHY_BRCM_RX_REFCLK_UNUSED)) {
+		int err;
+
+		reg = bcm_phy_read_exp(phydev, BCM54612E_EXP_SPARE0);
+		err = bcm_phy_write_exp(phydev, BCM54612E_EXP_SPARE0,
+					BCM54612E_LED4_CLK125OUT_EN | reg);
+
+		if (err < 0)
+			return err;
+	}
+
 	return 0;
 }
 
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index b324e01ccf2d..daa9234a9baf 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -85,6 +85,7 @@
 #define MII_BCM54XX_EXP_SEL	0x17	/* Expansion register select */
 #define MII_BCM54XX_EXP_SEL_SSD	0x0e00	/* Secondary SerDes select */
 #define MII_BCM54XX_EXP_SEL_ER	0x0f00	/* Expansion register select */
+#define MII_BCM54XX_EXP_SEL_ETC	0x0d00	/* Expansion register spare + 2k mem */
 
 #define MII_BCM54XX_AUX_CTL	0x18	/* Auxiliary control register */
 #define MII_BCM54XX_ISR		0x1a	/* BCM54xx interrupt status register */
@@ -219,6 +220,9 @@
 #define BCM54810_SHD_CLK_CTL			0x3
 #define BCM54810_SHD_CLK_CTL_GTXCLK_EN		(1 << 9)
 
+/* BCM54612E Registers */
+#define BCM54612E_EXP_SPARE0		(MII_BCM54XX_EXP_SEL_ETC + 0x34)
+#define BCM54612E_LED4_CLK125OUT_EN	(1 << 1)
 
 /*****************************************************************************/
 /* Fast Ethernet Transceiver definitions. */
-- 
2.17.1.1185.g55be947832-goog
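
The last hunk above ORs the value returned by bcm_phy_read_exp() into the
write without first checking it. Assuming the read helper can return a
negative error code, as MDIO accessors generally do, a read-modify-write
that checks the read could look like the sketch below. It is a userspace
model and not a proposed replacement for the patch: struct fake_phy,
fake_read(), fake_write() and enable_clk125() are hypothetical, and only
the bit value mirrors BCM54612E_LED4_CLK125OUT_EN.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same bit as BCM54612E_LED4_CLK125OUT_EN in the patch above. */
#define LED4_CLK125OUT_EN (1 << 1)

/* Hypothetical in-memory model of one PHY expansion register. */
struct fake_phy {
	int fail_read;   /* non-zero: simulate a failing bus read */
	uint16_t spare0; /* register contents */
};

/* Returns the register value, or a negative errno on failure. */
static int fake_read(struct fake_phy *phy)
{
	return phy->fail_read ? -EIO : phy->spare0;
}

static int fake_write(struct fake_phy *phy, uint16_t val)
{
	phy->spare0 = val;
	return 0;
}

/* Read-modify-write that bails out before writing if the read failed. */
static int enable_clk125(struct fake_phy *phy)
{
	int reg = fake_read(phy);

	if (reg < 0)
		return reg;

	return fake_write(phy, (uint16_t)(reg | LED4_CLK125OUT_EN));
}
```

The early return keeps a negative error value from being ORed with the
enable bit and written back to the register.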

^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH bpf-next 00/11] AF_XDP: introducing zero-copy support
From: Jeff Kirsher @ 2018-06-04 20:29 UTC (permalink / raw)
  To: Alexei Starovoitov, Björn Töpel
  Cc: mykyta.iziumtsev, mst, brian.brooks, magnus.karlsson, andy,
	francois.ozog, willemdebruijn.kernel, daniel, ast,
	intel-wired-lan, brouer, Björn Töpel, michael.lundkvist,
	qi.z.zhang, michael.chan, magnus.karlsson, netdev,
	ilias.apalodimas
In-Reply-To: <20180604163838.5pzojvzrxd2cusny@ast-mbp>

[-- Attachment #1: Type: text/plain, Size: 5343 bytes --]

On Mon, 2018-06-04 at 09:38 -0700, Alexei Starovoitov wrote:
> On Mon, Jun 04, 2018 at 02:05:50PM +0200, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> > 
> > This patch series introduces zerocopy (ZC) support for
> > AF_XDP. Programs using AF_XDP sockets will now receive RX packets
> > without any copies and can also transmit packets without incurring
> > any
> > copies. No modifications to the application are needed, but the NIC
> > driver needs to be modified to support ZC. If ZC is not supported
> > by
> > the driver, the modes introduced in the AF_XDP patch will be
> > used. Using ZC in our micro benchmarks results in significantly
> > improved performance as can be seen in the performance section
> > later
> > in this cover letter.
> > 
> > Note that for an untrusted application, HW packet steering to a
> > specific queue pair (the one associated with the application) is a
> > requirement when using ZC, as the application would otherwise be
> > able
> > to see other user space processes' packets. If the HW cannot
> > support
> > the required packet steering you need to use the XDP_SKB mode or
> > the
> > XDP_DRV mode without ZC turned on. The XSKMAP introduced in the
> > AF_XDP
> > patch set can be used to do load balancing in that case.
> > 
> > For benchmarking, you can use the xdpsock application from the
> > AF_XDP
> > patch set without any modifications. Say that you would like your
> > UDP
> > traffic from port 4242 to end up in queue 16, that we will enable
> > AF_XDP on. Here, we use ethtool for this:
> > 
> >       ethtool -N p3p2 rx-flow-hash udp4 fn
> >       ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
> >           action 16
> > 
> > Running the rxdrop benchmark in XDP_DRV mode with zerocopy can then
> > be
> > done using:
> > 
> >       samples/bpf/xdpsock -i p3p2 -q 16 -r -N
> > 
> > We have run some benchmarks on a dual socket system with two
> > Broadwell
> > E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has
> > 14
> > cores which gives a total of 28, but only two cores are used in
> > these
> > experiments. One for TX/RX and one for the user space application.
> > The
> > memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> > 8192MB and with 8 of those DIMMs in the system we have 64 GB of
> > total
> > memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0.
> > The
> > NIC is Intel I40E 40Gbit/s using the i40e driver.
> > 
> > Below are the results in Mpps of the I40E NIC benchmark runs for 64
> > and 1500 byte packets, generated by a commercial packet generator
> > HW
> > outputting packets at full 40 Gbit/s line rate. The results are
> > without
> > retpoline so that we can compare against previous numbers. 
> > 
> > AF_XDP performance 64 byte packets. Results from the AF_XDP V3
> > patch
> > set are also reported for ease of reference. The numbers within
> > parentheses are from the RFC V1 ZC patch set.
> > Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
> > rxdrop       2.9*       9.6*       21.1(21.5)
> > txpush       2.6*       -          22.0(21.6)
> > l2fwd        1.9*       2.5*       15.3(15.0)
> > 
> > AF_XDP performance 1500 byte packets:
> > Benchmark   XDP_SKB   XDP_DRV     XDP_DRV with zerocopy
> > rxdrop       2.1*       3.3*       3.3(3.3)
> > l2fwd        1.4*       1.8*       3.1(3.1)
> > 
> > * From AF_XDP V3 patch set and cover letter.
> > 
> > So why do we not get higher values for RX similar to the 34 Mpps we
> > had in AF_PACKET V4? We made an experiment running the rxdrop
> > benchmark without using the xdp_do_redirect/flush infrastructure
> > nor
> > using an XDP program (all traffic on a queue goes to one
> > socket). Instead the driver acts directly on the AF_XDP socket.
> > With
> > this we got 36.9 Mpps, a significant improvement without any change
> > to
> > the uapi. So not forcing users to have an XDP program if they do
> > not
> > need it, might be a good idea. This measurement is actually higher
> > than what we got with AF_PACKET V4.
> > 
> > XDP performance on our system as a base line:
> > 
> > 64 byte packets:
> > XDP stats       CPU     pps         issue-pps
> > XDP-RX CPU      16      32.3M       0
> > 
> > 1500 byte packets:
> > XDP stats       CPU     pps         issue-pps
> > XDP-RX CPU      16      3.3M        0
> > 
> > The structure of the patch set is as follows:
> > 
> > Patches 1-3: Plumbing for AF_XDP ZC support
> > Patches 4-5: AF_XDP ZC for RX
> > Patches 6-7: AF_XDP ZC for TX
> 
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> for above patches
> 
> > Patch 8-10: ZC support for i40e.
> 
> these also look good to me.
> would be great if i40e experts take a look at them asap.
> 
> If there are no major objections we'd like to merge all of it
> for this merge window.

We would like a bit more time to review and test the changes, I
understand your eagerness for wanting this to get into 4.18 but this
change is large enough that a 24-48 hour review time is not prudent,
IMHO.

Alex has also requested more time so that he can review the changes
as well.  I will go ahead and put the entire series in my tree so that
our validation team can start to "kick the tires".
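
As context for the Mpps figures quoted above, the theoretical packet rate
of a 40 Gbit/s Ethernet link can be worked out as in the sketch below. It
assumes the quoted packet sizes are on-wire frame sizes and adds the
standard 20 bytes of per-frame overhead (preamble, start-of-frame
delimiter and inter-frame gap); line_rate_mpps() is a hypothetical helper
name.

```c
#include <assert.h>

/* Preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD_BYTES 20

/*
 * Hypothetical helper: maximum packets per second, in millions, for a
 * given link speed and frame size, counting the per-frame overhead.
 */
static double line_rate_mpps(double gbit_per_s, unsigned int frame_bytes)
{
	double bits_per_frame = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0;

	return gbit_per_s * 1e9 / bits_per_frame / 1e6;
}
```

Under those assumptions the limit is about 59.5 Mpps for 64 byte frames
and about 3.29 Mpps for 1500 byte frames, so the 3.3 Mpps results in the
1500 byte tables are consistent with the link itself being the
bottleneck.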

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04
From: Or Gerlitz @ 2018-06-04 20:27 UTC (permalink / raw)
  To: Jeff Kirsher, David Miller; +Cc: Linux Netdev List
In-Reply-To: <20180604175644.24293-1-jeffrey.t.kirsher@intel.com>

On Mon, Jun 4, 2018 at 8:56 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> This series contains a smorgasbord of updates to documentation, e1000e,
> igb, ixgbe, ixgbevf and i40e.

Dave,

Did you forget to flip the sign on the shop's door [1]?

Or.

[1] http://vger.kernel.org/~davem/net-next.html

^ permalink raw reply

