Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net 10/11] ixgbe: Fix cls_u32 offload support for fields with masks
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Sridhar Samudrala, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Sridhar Samudrala <sridhar.samudrala@intel.com>

Remove the incorrect check for mask in ixgbe_configure_clsu32 and
drop the 'mask' field that is not required in struct ixgbe_mat_field

Verified with the following filters:

 #tc qdisc add dev p4p1 ingress
 #tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
	handle 800:0:1 u32 ht 800: \
	match ip dst 10.0.0.1/8 match ip src 10.0.0.2/8 action drop
 #tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
	handle 800:0:2 u32 ht 800: \
	match ip dst 11.0.0.1/16 match ip src 11.0.0.2/16 action drop
 #tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
	handle 800:0:3 u32 ht 800: \
	match ip dst 12.0.0.1/24 match ip src 12.0.0.2/24 action drop
 #tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
	handle 800:0:4 u32 ht 800: \
	match ip dst 13.0.0.1/32 match ip src 13.0.0.2/32 action drop

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 3 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h | 9 ++++-----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ca9c543..7df3fe2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8331,8 +8331,7 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 		int j;
 
 		for (j = 0; field_ptr[j].val; j++) {
-			if (field_ptr[j].off == off &&
-			    field_ptr[j].mask == m) {
+			if (field_ptr[j].off == off) {
 				field_ptr[j].val(input, &mask, val, m);
 				input->filter.formatted.flow_type |=
 					field_ptr[j].type;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
index ce48872..61f7290 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -32,7 +32,6 @@
 
 struct ixgbe_mat_field {
 	unsigned int off;
-	unsigned int mask;
 	int (*val)(struct ixgbe_fdir_filter *input,
 		   union ixgbe_atr_input *mask,
 		   u32 val, u32 m);
@@ -58,9 +57,9 @@ static inline int ixgbe_mat_prgm_dip(struct ixgbe_fdir_filter *input,
 }
 
 static struct ixgbe_mat_field ixgbe_ipv4_fields[] = {
-	{ .off = 12, .mask = -1, .val = ixgbe_mat_prgm_sip,
+	{ .off = 12, .val = ixgbe_mat_prgm_sip,
 	  .type = IXGBE_ATR_FLOW_TYPE_IPV4},
-	{ .off = 16, .mask = -1, .val = ixgbe_mat_prgm_dip,
+	{ .off = 16, .val = ixgbe_mat_prgm_dip,
 	  .type = IXGBE_ATR_FLOW_TYPE_IPV4},
 	{ .val = NULL } /* terminal node */
 };
@@ -84,9 +83,9 @@ static inline int ixgbe_mat_prgm_dport(struct ixgbe_fdir_filter *input,
 };
 
 static struct ixgbe_mat_field ixgbe_tcp_fields[] = {
-	{.off = 0, .mask = 0xffff, .val = ixgbe_mat_prgm_sport,
+	{.off = 0, .val = ixgbe_mat_prgm_sport,
 	 .type = IXGBE_ATR_FLOW_TYPE_TCPV4},
-	{.off = 2, .mask = 0xffff, .val = ixgbe_mat_prgm_dport,
+	{.off = 2, .val = ixgbe_mat_prgm_dport,
 	 .type = IXGBE_ATR_FLOW_TYPE_TCPV4},
 	{ .val = NULL } /* terminal node */
 };
-- 
2.5.5

^ permalink raw reply related

* [net 08/11] ixgbe: make __ixgbe_setup_tc static
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

This function is only used in ixgbe_main.c
Resolves a "missing prototype" warning when building the driver with W=1

Reported-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 267a507..f5736a3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8376,8 +8376,8 @@ err_out:
 	return -EINVAL;
 }
 
-int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
-		     struct tc_to_netdev *tc)
+static int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			    struct tc_to_netdev *tc)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 
-- 
2.5.5

^ permalink raw reply related

* [net 09/11] ixgbe: fix error handling in TC cls_u32 offload routines
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Sridhar Samudrala, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Sridhar Samudrala <sridhar.samudrala@intel.com>

Check for handle ids when adding/deleting hash nodes OR adding/deleting
filter entries and limit them to max number of links or header nodes
supported(IXGBE_MAX_LINK_HANDLE).

Start from bit 0 when setting hash table bit-map.(adapter->tables)

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 52 +++++++++++++++++----------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f5736a3..ca9c543 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8192,10 +8192,17 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
 			       struct tc_cls_u32_offload *cls)
 {
+	u32 uhtid = TC_U32_USERHTID(cls->knode.handle);
+	u32 loc;
 	int err;
 
+	if ((uhtid != 0x800) && (uhtid >= IXGBE_MAX_LINK_HANDLE))
+		return -EINVAL;
+
+	loc = cls->knode.handle & 0xfffff;
+
 	spin_lock(&adapter->fdir_perfect_lock);
-	err = ixgbe_update_ethtool_fdir_entry(adapter, NULL, cls->knode.handle);
+	err = ixgbe_update_ethtool_fdir_entry(adapter, NULL, loc);
 	spin_unlock(&adapter->fdir_perfect_lock);
 	return err;
 }
@@ -8204,20 +8211,30 @@ static int ixgbe_configure_clsu32_add_hnode(struct ixgbe_adapter *adapter,
 					    __be16 protocol,
 					    struct tc_cls_u32_offload *cls)
 {
+	u32 uhtid = TC_U32_USERHTID(cls->hnode.handle);
+
+	if (uhtid >= IXGBE_MAX_LINK_HANDLE)
+		return -EINVAL;
+
 	/* This ixgbe devices do not support hash tables at the moment
 	 * so abort when given hash tables.
 	 */
 	if (cls->hnode.divisor > 0)
 		return -EINVAL;
 
-	set_bit(TC_U32_USERHTID(cls->hnode.handle), &adapter->tables);
+	set_bit(uhtid - 1, &adapter->tables);
 	return 0;
 }
 
 static int ixgbe_configure_clsu32_del_hnode(struct ixgbe_adapter *adapter,
 					    struct tc_cls_u32_offload *cls)
 {
-	clear_bit(TC_U32_USERHTID(cls->hnode.handle), &adapter->tables);
+	u32 uhtid = TC_U32_USERHTID(cls->hnode.handle);
+
+	if (uhtid >= IXGBE_MAX_LINK_HANDLE)
+		return -EINVAL;
+
+	clear_bit(uhtid - 1, &adapter->tables);
 	return 0;
 }
 
@@ -8235,27 +8252,29 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 #endif
 	int i, err = 0;
 	u8 queue;
-	u32 handle;
+	u32 uhtid, link_uhtid;
 
 	memset(&mask, 0, sizeof(union ixgbe_atr_input));
-	handle = cls->knode.handle;
+	uhtid = TC_U32_USERHTID(cls->knode.handle);
+	link_uhtid = TC_U32_USERHTID(cls->knode.link_handle);
 
-	/* At the moment cls_u32 jumps to transport layer and skips past
+	/* At the moment cls_u32 jumps to network layer and skips past
 	 * L2 headers. The canonical method to match L2 frames is to use
 	 * negative values. However this is error prone at best but really
 	 * just broken because there is no way to "know" what sort of hdr
-	 * is in front of the transport layer. Fix cls_u32 to support L2
+	 * is in front of the network layer. Fix cls_u32 to support L2
 	 * headers when needed.
 	 */
 	if (protocol != htons(ETH_P_IP))
 		return -EINVAL;
 
-	if (cls->knode.link_handle ||
-	    cls->knode.link_handle >= IXGBE_MAX_LINK_HANDLE) {
+	if (link_uhtid) {
 		struct ixgbe_nexthdr *nexthdr = ixgbe_ipv4_jumps;
-		u32 uhtid = TC_U32_USERHTID(cls->knode.link_handle);
 
-		if (!test_bit(uhtid, &adapter->tables))
+		if (link_uhtid >= IXGBE_MAX_LINK_HANDLE)
+			return -EINVAL;
+
+		if (!test_bit(link_uhtid - 1, &adapter->tables))
 			return -EINVAL;
 
 		for (i = 0; nexthdr[i].jump; i++) {
@@ -8271,10 +8290,7 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 			    nexthdr->mask != cls->knode.sel->keys[0].mask)
 				return -EINVAL;
 
-			if (uhtid >= IXGBE_MAX_LINK_HANDLE)
-				return -EINVAL;
-
-			adapter->jump_tables[uhtid] = nexthdr->jump;
+			adapter->jump_tables[link_uhtid] = nexthdr->jump;
 		}
 		return 0;
 	}
@@ -8291,13 +8307,13 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 	 * To add support for new nodes update ixgbe_model.h parse structures
 	 * this function _should_ be generic try not to hardcode values here.
 	 */
-	if (TC_U32_USERHTID(handle) == 0x800) {
+	if (uhtid == 0x800) {
 		field_ptr = adapter->jump_tables[0];
 	} else {
-		if (TC_U32_USERHTID(handle) >= ARRAY_SIZE(adapter->jump_tables))
+		if (uhtid >= IXGBE_MAX_LINK_HANDLE)
 			return -EINVAL;
 
-		field_ptr = adapter->jump_tables[TC_U32_USERHTID(handle)];
+		field_ptr = adapter->jump_tables[uhtid];
 	}
 
 	if (!field_ptr)
-- 
2.5.5

^ permalink raw reply related

* [net 00/11][pull request] Intel Wired LAN Driver Updates 2016-03-29
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene, john.ronciak

This series contains fixes to ixgbe and ixgbevf.

Tushar fixes an issue which was introduced with an earlier commit, where
hardware register RAR0 default MAC address does not get set properly.

Alex fixes two issues, first being the VXLAN port number should be stored
in network order instead of in host order.  The second fix corrects the ATR
code to handle IPv6 extension headers.  The issue was ATR code was assuming
that it would be able to use tcp_hdr for every TCP frame that came through,
but that is not the case, which resulted in bad filters being setup.

Mark fixes a use of usleep_range() to udelay() in the case where a lock
is being held.

Stefan fixes the offline self tests where ndo_stop() should be used instead
of ndo_close(), which causes IFF_UP to be cleared and interface routes get
removed.

Emil fixes the error case where we need to return an error when a MAC
address change is rejected by the PF.  This helps prevent the user from
modifying the MAC address when the operation is not permitted.

Sridhar provides three fixes for ixgbe, all dealing with traffic class
offload handling.

The following are changes since commit e84810c7b85a2d7897797b3ad3e879168a8e032a:
  qmi_wwan: add "D-Link DWM-221 B1" device id
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 10GbE

Alexander Duyck (2):
  ixgbe: Store VXLAN port number in network order
  ixgbe: Fix ATR so that it correctly handles IPv6 extension headers

Emil Tantilov (2):
  ixgbevf: fix error code path when setting MAC address
  ixgbe: make __ixgbe_setup_tc static

Mark Rustad (1):
  ixgbe: Use udelay to avoid sleeping while atomic

Sridhar Samudrala (3):
  ixgbe: fix error handling in TC cls_u32 offload routines
  ixgbe: Fix cls_u32 offload support for fields with masks
  ixgbe: Fix cls_u32 offload support for L4 ports

Stefan Assmann (2):
  ixgbe: call ndo_stop() instead of dev_close() when running offline
    selftest
  ixgbevf: call ndo_stop() instead of dev_close() when running offline
    selftest

Tushar Dave (1):
  ixgbe: Fix for RAR0 not being set to default MAC addr

 drivers/net/ethernet/intel/ixgbe/ixgbe.h          |  10 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 165 ++++++++++------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h    |  21 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c     |   2 +-
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |   4 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |   2 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  16 ++-
 drivers/net/ethernet/intel/ixgbevf/vf.c           |   4 +-
 9 files changed, 107 insertions(+), 121 deletions(-)

-- 
2.5.5

^ permalink raw reply

* [net 02/11] ixgbe: Store VXLAN port number in network order
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <aduyck@mirantis.com>

The VXLAN port number should be stored in network order instead of in host
order as it is accessed from the hot-path in ATR.  This way we can avoid
having to do any byte swaps in order to validate the port number.

I moved the vxlan_port value into a hole in the read-mostly region of the
adapter struct.  This way it should be in a warm cache-line instead of in
some isolated region in memory when it needs to be accessed.

In addition I went through and stripped a bunch of unneeded ifdef flags
since having an extra variable present doesn't really hurt anything and
makes the code easier to read.  I also went through and dropped the
NETIF_F_RXCSUM flag which was being set in hw_encap_features but provides
no value as the flag is not evaluated in the Rx path.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  8 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 61 ++++++++-------------------
 2 files changed, 20 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 84fa28c..458549c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -661,9 +661,7 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG2_RSS_FIELD_IPV6_UDP		(u32)(1 << 9)
 #define IXGBE_FLAG2_PTP_PPS_ENABLED		(u32)(1 << 10)
 #define IXGBE_FLAG2_PHY_INTERRUPT		(u32)(1 << 11)
-#ifdef CONFIG_IXGBE_VXLAN
 #define IXGBE_FLAG2_VXLAN_REREG_NEEDED		BIT(12)
-#endif
 #define IXGBE_FLAG2_VLAN_PROMISC		BIT(13)
 
 	/* Tx fast path data */
@@ -675,6 +673,9 @@ struct ixgbe_adapter {
 	int num_rx_queues;
 	u16 rx_itr_setting;
 
+	/* Port number used to identify VXLAN traffic */
+	__be16 vxlan_port;
+
 	/* TX */
 	struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
 
@@ -782,9 +783,6 @@ struct ixgbe_adapter {
 	u32 timer_event_accumulator;
 	u32 vferr_refcount;
 	struct ixgbe_mac_addr *mac_table;
-#ifdef CONFIG_IXGBE_VXLAN
-	u16 vxlan_port;
-#endif
 	struct kobject *info_kobj;
 #ifdef CONFIG_IXGBE_HWMON
 	struct hwmon_buff *ixgbe_hwmon_buff;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f193273..f67c9a6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4531,9 +4531,7 @@ static void ixgbe_clear_vxlan_port(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_X550:
 	case ixgbe_mac_X550EM_x:
 		IXGBE_WRITE_REG(&adapter->hw, IXGBE_VXLANCTRL, 0);
-#ifdef CONFIG_IXGBE_VXLAN
 		adapter->vxlan_port = 0;
-#endif
 		break;
 	default:
 		break;
@@ -7561,9 +7559,6 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 	} hdr;
 	struct tcphdr *th;
 	struct sk_buff *skb;
-#ifdef CONFIG_IXGBE_VXLAN
-	u8 encap = false;
-#endif /* CONFIG_IXGBE_VXLAN */
 	__be16 vlan_id;
 
 	/* if ring doesn't have a interrupt vector, cannot perform ATR */
@@ -7579,28 +7574,21 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 	/* snag network header to get L4 type and address */
 	skb = first->skb;
 	hdr.network = skb_network_header(skb);
-	if (!skb->encapsulation) {
-		th = tcp_hdr(skb);
-	} else {
+	th = tcp_hdr(skb);
 #ifdef CONFIG_IXGBE_VXLAN
+	if (skb->encapsulation &&
+	    first->protocol == htons(ETH_P_IP) &&
+	    hdr.ipv4->protocol != IPPROTO_UDP) {
 		struct ixgbe_adapter *adapter = q_vector->adapter;
 
-		if (!adapter->vxlan_port)
-			return;
-		if (first->protocol != htons(ETH_P_IP) ||
-		    hdr.ipv4->version != IPVERSION ||
-		    hdr.ipv4->protocol != IPPROTO_UDP) {
-			return;
+		/* verify the port is recognized as VXLAN */
+		if (adapter->vxlan_port &&
+		    udp_hdr(skb)->dest == adapter->vxlan_port) {
+			hdr.network = skb_inner_network_header(skb);
+			th = inner_tcp_hdr(skb);
 		}
-		if (ntohs(udp_hdr(skb)->dest) != adapter->vxlan_port)
-			return;
-		encap = true;
-		hdr.network = skb_inner_network_header(skb);
-		th = inner_tcp_hdr(skb);
-#else
-		return;
-#endif /* CONFIG_IXGBE_VXLAN */
 	}
+#endif /* CONFIG_IXGBE_VXLAN */
 
 	/* Currently only IPv4/IPv6 with TCP is supported */
 	switch (hdr.ipv4->version) {
@@ -7682,10 +7670,8 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 		break;
 	}
 
-#ifdef CONFIG_IXGBE_VXLAN
-	if (encap)
+	if (hdr.network != skb_network_header(skb))
 		input.formatted.flow_type |= IXGBE_ATR_L4TYPE_TUNNEL_MASK;
-#endif /* CONFIG_IXGBE_VXLAN */
 
 	/* This assumes the Rx queue and Tx queue are bound to the same CPU */
 	ixgbe_fdir_add_signature_filter_82599(&q_vector->adapter->hw,
@@ -8554,7 +8540,6 @@ static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
-	u16 new_port = ntohs(port);
 
 	if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
 		return;
@@ -8562,18 +8547,18 @@ static void ixgbe_add_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 	if (sa_family == AF_INET6)
 		return;
 
-	if (adapter->vxlan_port == new_port)
+	if (adapter->vxlan_port == port)
 		return;
 
 	if (adapter->vxlan_port) {
 		netdev_info(dev,
 			    "Hit Max num of VXLAN ports, not adding port %d\n",
-			    new_port);
+			    ntohs(port));
 		return;
 	}
 
-	adapter->vxlan_port = new_port;
-	IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, new_port);
+	adapter->vxlan_port = port;
+	IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, ntohs(port));
 }
 
 /**
@@ -8586,7 +8571,6 @@ static void ixgbe_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 				 __be16 port)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
-	u16 new_port = ntohs(port);
 
 	if (!(adapter->flags & IXGBE_FLAG_VXLAN_OFFLOAD_CAPABLE))
 		return;
@@ -8594,9 +8578,9 @@ static void ixgbe_del_vxlan_port(struct net_device *dev, sa_family_t sa_family,
 	if (sa_family == AF_INET6)
 		return;
 
-	if (adapter->vxlan_port != new_port) {
+	if (adapter->vxlan_port != port) {
 		netdev_info(dev, "Port %d was not found, not deleting\n",
-			    new_port);
+			    ntohs(port));
 		return;
 	}
 
@@ -9265,17 +9249,6 @@ skip_sriov:
 	netdev->priv_flags |= IFF_UNICAST_FLT;
 	netdev->priv_flags |= IFF_SUPP_NOFCS;
 
-#ifdef CONFIG_IXGBE_VXLAN
-	switch (adapter->hw.mac.type) {
-	case ixgbe_mac_X550:
-	case ixgbe_mac_X550EM_x:
-		netdev->hw_enc_features |= NETIF_F_RXCSUM;
-		break;
-	default:
-		break;
-	}
-#endif /* CONFIG_IXGBE_VXLAN */
-
 #ifdef CONFIG_IXGBE_DCB
 	netdev->dcbnl_ops = &dcbnl_ops;
 #endif
-- 
2.5.5

^ permalink raw reply related

* [net 01/11] ixgbe: Fix for RAR0 not being set to default MAC addr
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Tushar Dave, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Tushar Dave <tushar.n.dave@oracle.com>

commit c9f53e63c208 ("ixgbe: Refactor MAC address configuration code")
introduced code that doesn't set HW register RAR0 to default mac address
but FF:FF:FF:FF:FF:FF. Due to this, ixgbe HW discards all incoming packets
that doesn't have destination mac address equals to FF:FF:FF:FF:FF:FF.

This commit sets RAR0 correctly to default HW mac address.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 569cb07..f193273 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9329,6 +9329,8 @@ skip_sriov:
 		goto err_sw_init;
 	}
 
+	/* Set hw->mac.addr to permanent MAC address */
+	ether_addr_copy(hw->mac.addr, hw->mac.perm_addr);
 	ixgbe_mac_set_default_filter(adapter);
 
 	setup_timer(&adapter->service_timer, &ixgbe_service_timer,
-- 
2.5.5

^ permalink raw reply related

* [net 04/11] ixgbe: Use udelay to avoid sleeping while atomic
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Mark Rustad, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mark Rustad <mark.d.rustad@intel.com>

Use udelay instead of usleep_range because this can be called while
a lock is held.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 87aca3f..68a9c64 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -355,7 +355,7 @@ static s32 ixgbe_iosf_wait(struct ixgbe_hw *hw, u32 *ctrl)
 		command = IXGBE_READ_REG(hw, IXGBE_SB_IOSF_INDIRECT_CTRL);
 		if (!(command & IXGBE_SB_IOSF_CTRL_BUSY))
 			break;
-		usleep_range(10, 20);
+		udelay(10);
 	}
 	if (ctrl)
 		*ctrl = command;
-- 
2.5.5

^ permalink raw reply related

* [net 03/11] ixgbe: Fix ATR so that it correctly handles IPv6 extension headers
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <aduyck@mirantis.com>

The ATR code was assuming that it would be able to use tcp_hdr for
every TCP frame that came through.  However this isn't the case as it
is possible for a frame to arrive that is TCP but sent through something
like a raw socket.  As a result the driver was setting up bad filters in
which tcp_hdr was really pointing to the network header so the data was
all invalid.

In order to correct this I have added a bit of parsing logic that will
determine the TCP header location based off of the network header and
either the offset in the case of the IPv4 header, or a walk through the
IPv6 extension headers until it encounters the header that indicates
IPPROTO_TCP.  In addition I have added checks to verify that the lowest
protocol provided is recognized as IPv4 or IPv6 to help mitigate raw
sockets using ETH_P_ALL from having ATR applied to them.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 45 +++++++++++++--------------
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f67c9a6..ee81618 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7558,8 +7558,10 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 		struct ipv6hdr *ipv6;
 	} hdr;
 	struct tcphdr *th;
+	unsigned int hlen;
 	struct sk_buff *skb;
 	__be16 vlan_id;
+	int l4_proto;
 
 	/* if ring doesn't have a interrupt vector, cannot perform ATR */
 	if (!q_vector)
@@ -7571,10 +7573,14 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 	ring->atr_count++;
 
+	/* currently only IPv4/IPv6 with TCP is supported */
+	if ((first->protocol != htons(ETH_P_IP)) &&
+	    (first->protocol != htons(ETH_P_IPV6)))
+		return;
+
 	/* snag network header to get L4 type and address */
 	skb = first->skb;
 	hdr.network = skb_network_header(skb);
-	th = tcp_hdr(skb);
 #ifdef CONFIG_IXGBE_VXLAN
 	if (skb->encapsulation &&
 	    first->protocol == htons(ETH_P_IP) &&
@@ -7583,43 +7589,34 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 		/* verify the port is recognized as VXLAN */
 		if (adapter->vxlan_port &&
-		    udp_hdr(skb)->dest == adapter->vxlan_port) {
+		    udp_hdr(skb)->dest == adapter->vxlan_port)
 			hdr.network = skb_inner_network_header(skb);
-			th = inner_tcp_hdr(skb);
-		}
 	}
 #endif /* CONFIG_IXGBE_VXLAN */
 
 	/* Currently only IPv4/IPv6 with TCP is supported */
 	switch (hdr.ipv4->version) {
 	case IPVERSION:
-		if (hdr.ipv4->protocol != IPPROTO_TCP)
-			return;
+		/* access ihl as u8 to avoid unaligned access on ia64 */
+		hlen = (hdr.network[0] & 0x0F) << 2;
+		l4_proto = hdr.ipv4->protocol;
 		break;
 	case 6:
-		if (likely((unsigned char *)th - hdr.network ==
-			   sizeof(struct ipv6hdr))) {
-			if (hdr.ipv6->nexthdr != IPPROTO_TCP)
-				return;
-		} else {
-			__be16 frag_off;
-			u8 l4_hdr;
-
-			ipv6_skip_exthdr(skb, hdr.network - skb->data +
-					      sizeof(struct ipv6hdr),
-					 &l4_hdr, &frag_off);
-			if (unlikely(frag_off))
-				return;
-			if (l4_hdr != IPPROTO_TCP)
-				return;
-		}
+		hlen = hdr.network - skb->data;
+		l4_proto = ipv6_find_hdr(skb, &hlen, IPPROTO_TCP, NULL, NULL);
+		hlen -= hdr.network - skb->data;
 		break;
 	default:
 		return;
 	}
 
-	/* skip this packet since it is invalid or the socket is closing */
-	if (!th || th->fin)
+	if (l4_proto != IPPROTO_TCP)
+		return;
+
+	th = (struct tcphdr *)(hdr.network + hlen);
+
+	/* skip this packet since the socket is closing */
+	if (th->fin)
 		return;
 
 	/* sample on all syn packets or once every atr sample count */
-- 
2.5.5

^ permalink raw reply related

* [net 06/11] ixgbevf: call ndo_stop() instead of dev_close() when running offline selftest
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Stefan Assmann, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      | 4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 2 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index c48aef6..d7aa4b2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -680,7 +680,7 @@ static void ixgbevf_diag_test(struct net_device *netdev,
 
 		if (if_running)
 			/* indicate we're in test mode */
-			dev_close(netdev);
+			ixgbevf_close(netdev);
 		else
 			ixgbevf_reset(adapter);
 
@@ -692,7 +692,7 @@ static void ixgbevf_diag_test(struct net_device *netdev,
 
 		clear_bit(__IXGBEVF_TESTING, &adapter->state);
 		if (if_running)
-			dev_open(netdev);
+			ixgbevf_open(netdev);
 	} else {
 		hw_dbg(&adapter->hw, "online testing starting\n");
 		/* Online tests */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 68ec7daa..991eeae 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -486,6 +486,8 @@ extern const struct ixgbe_mbx_operations ixgbevf_mbx_ops;
 extern const char ixgbevf_driver_name[];
 extern const char ixgbevf_driver_version[];
 
+int ixgbevf_open(struct net_device *netdev);
+int ixgbevf_close(struct net_device *netdev);
 void ixgbevf_up(struct ixgbevf_adapter *adapter);
 void ixgbevf_down(struct ixgbevf_adapter *adapter);
 void ixgbevf_reinit_locked(struct ixgbevf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 0ea14c0..6a337bb 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3122,7 +3122,7 @@ static void ixgbevf_free_all_rx_resources(struct ixgbevf_adapter *adapter)
  * handler is registered with the OS, the watchdog timer is started,
  * and the stack is notified that the interface is ready.
  **/
-static int ixgbevf_open(struct net_device *netdev)
+int ixgbevf_open(struct net_device *netdev)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -3205,7 +3205,7 @@ err_setup_reset:
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int ixgbevf_close(struct net_device *netdev)
+int ixgbevf_close(struct net_device *netdev)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 
-- 
2.5.5

^ permalink raw reply related

* [net 05/11] ixgbe: call ndo_stop() instead of dev_close() when running offline selftest
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Stefan Assmann, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Stefan Assmann <sassmann@kpanic.de>

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 458549c..e4949af 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -877,6 +877,8 @@ extern const char ixgbe_driver_version[];
 extern char ixgbe_default_device_descr[];
 #endif /* IXGBE_FCOE */
 
+int ixgbe_open(struct net_device *netdev);
+int ixgbe_close(struct net_device *netdev);
 void ixgbe_up(struct ixgbe_adapter *adapter);
 void ixgbe_down(struct ixgbe_adapter *adapter);
 void ixgbe_reinit_locked(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 726e0ee..b3530e1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2053,7 +2053,7 @@ static void ixgbe_diag_test(struct net_device *netdev,
 
 		if (if_running)
 			/* indicate we're in test mode */
-			dev_close(netdev);
+			ixgbe_close(netdev);
 		else
 			ixgbe_reset(adapter);
 
@@ -2091,7 +2091,7 @@ skip_loopback:
 		/* clear testing bit and return adapter to previous state */
 		clear_bit(__IXGBE_TESTING, &adapter->state);
 		if (if_running)
-			dev_open(netdev);
+			ixgbe_open(netdev);
 		else if (hw->mac.ops.disable_tx_laser)
 			hw->mac.ops.disable_tx_laser(hw);
 	} else {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ee81618..267a507 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5992,7 +5992,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
  * handler is registered with the OS, the watchdog timer is started,
  * and the stack is notified that the interface is ready.
  **/
-static int ixgbe_open(struct net_device *netdev)
+int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
@@ -6094,7 +6094,7 @@ static void ixgbe_close_suspend(struct ixgbe_adapter *adapter)
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int ixgbe_close(struct net_device *netdev)
+int ixgbe_close(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-- 
2.5.5

^ permalink raw reply related

* [net 07/11] ixgbevf: fix error code path when setting MAC address
From: Jeff Kirsher @ 2016-03-30 23:00 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1459378855-139837-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

Return error when a MAC address change is rejected by the PF.

This will prevent the user from modifying the MAC address when
that operation is not permitted.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 12 ++++++++----
 drivers/net/ethernet/intel/ixgbevf/vf.c           |  4 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 6a337bb..b0edae9 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3692,19 +3692,23 @@ static int ixgbevf_set_mac(struct net_device *netdev, void *p)
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct sockaddr *addr = p;
+	int err;
 
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
 
-	ether_addr_copy(netdev->dev_addr, addr->sa_data);
-	ether_addr_copy(hw->mac.addr, addr->sa_data);
-
 	spin_lock_bh(&adapter->mbx_lock);
 
-	hw->mac.ops.set_rar(hw, 0, hw->mac.addr, 0);
+	err = hw->mac.ops.set_rar(hw, 0, addr->sa_data, 0);
 
 	spin_unlock_bh(&adapter->mbx_lock);
 
+	if (err)
+		return -EPERM;
+
+	ether_addr_copy(hw->mac.addr, addr->sa_data);
+	ether_addr_copy(netdev->dev_addr, addr->sa_data);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index 61a98f4..4d613a4 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -408,8 +408,10 @@ static s32 ixgbevf_set_rar_vf(struct ixgbe_hw *hw, u32 index, u8 *addr,
 
 	/* if nacked the address was rejected, use "perm_addr" */
 	if (!ret_val &&
-	    (msgbuf[0] == (IXGBE_VF_SET_MAC_ADDR | IXGBE_VT_MSGTYPE_NACK)))
+	    (msgbuf[0] == (IXGBE_VF_SET_MAC_ADDR | IXGBE_VT_MSGTYPE_NACK))) {
 		ixgbevf_get_mac_addr_vf(hw, hw->mac.addr);
+		return IXGBE_ERR_MBX;
+	}
 
 	return ret_val;
 }
-- 
2.5.5

^ permalink raw reply related

* Re: [net PATCH] i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet
From: Alexander Duyck @ 2016-03-30 23:06 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Sowmini Varadhan, Alexander Duyck, Netdev, intel-wired-lan,
	Jeff Kirsher
In-Reply-To: <CAKgT0Ud59j1NDHD6siPSZL24HJnPOF7m07DKf4U96Tq4dtByvg@mail.gmail.com>

On Wed, Mar 30, 2016 at 2:20 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Mar 30, 2016 at 12:41 PM, Jesse Brandeburg
> <jesse.brandeburg@intel.com> wrote:
>> On Wed, 30 Mar 2016 10:35:55 -0700
>> Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>
>>> On Wed, Mar 30, 2016 at 10:20 AM, Sowmini Varadhan
>>> <sowmini.varadhan@oracle.com> wrote:
>>> > On (03/30/16 10:12), Alexander Duyck wrote:
>>> >> Yeah.  The patch was sort of a knee-jerk reaction to being told that
>>> >> the patch referenced caused a regression.  From what I can tell that
>>> >> is not the case as I am also seeing the Tx hangs when I run the test
>>> >> with the frames being linearized.
>>> >
>>> > I'm not sure how important of a subtlety this is, but the actual
>>> > console log after the patch is the following:
>>> >
>>> >  i40e 0000:82:00.0: TX driver issue detected, PF reset issued
>>> >  i40e 0000:82:00.0 eth2: adding 68:05:ca:30:dd:18 vid=0
>>> >  i40e 0000:82:00.0: TX driver issue detected, PF reset issued
>>> >  i40e 0000:82:00.0 eth2: adding 68:05:ca:30:dd:18 vid=0
>>> >  i40e 0000:82:00.0: TX driver issue detected, PF reset issued
>>> >
>>> > Comparing with what I'd pasted in the sourceforge thread earlier,
>>> > I see that it does not say "Hung Tx queue etc."  any more, though
>>> > it still resets.
>>> >
>>> > Not sure if that changed info is significant?
>>>
>>> It might be.  Right now I am chasing down the Tx driver issue as that
>>> I what I am reproducing in my environment as well.
>>
>> This gets "Even Uglier", I've turned off all offloads at my receiver,
>> enabled calling skb_linearize on *all* frames, which works fine for
>> scp, but the receiver shows > MSS sized frames on the wire for
>> rds-stress traffic.
>
> Are you sure it isn't just GRO reassembling frames on the receive
> side.  I know that one always trips me up when I am using the Rx path
> to validate Tx checksums.
>
>> This implies to me we have some issue with skb_linearize, possibly in
>> how the stack linearizes the data, or how the driver interprets the
>> linearized packets (which should always work)
>>
>> Wheee......
>
> With the descriptor dump code you have you should be able to verify
> what the layout is after the descriptor is linearized.  I would think
> in most cases you would end up with at most something like 4 to maybe
> 5 descriptors for a 64K frame.
>
> - Alex

Actually I think I just found an issue I missed in the patch.  I
didn't update the inline function that was performing the check for 8
descriptors.  As such it was allowing TSO with 8 descriptors to pass
even though the fact that the head and payload in the first descriptor
had pushed it to 9.

I should have a v2 ready in about 20 minutes or so.  In my testing it
fixes the issue.

- Alex

^ permalink raw reply

* [net PATCH v2] i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet
From: Alexander Duyck @ 2016-03-30 23:15 UTC (permalink / raw)
  To: netdev, jesse.brandeburg, alexander.duyck, intel-wired-lan,
	jeffrey.t.kirsher, sowmini.varadhan

This patch addresses a bug introduced based on my interpretation of the
XL710 datasheet.  Specifically section 8.4.1 states that "A single transmit
packet may span up to 8 buffers (up to 8 data descriptors per packet
including both the header and payload buffers)."  It then later goes on to
say that each segment for a TSO obeys the previous rule, however it then
refers to TSO header and the segment payload buffers.

I believe the actual limit for fragments with TSO and a skbuff that has
payload data in the header portion of the buffer is actually only 7
fragments as the skb->data portion counts as 2 buffers, one for the TSO
header, and one for a segment payload buffer.

Fixes: 2d37490b82af ("i40e/i40evf: Rewrite logic for 8 descriptor per packet check")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---

v2: I realized that I overlooked the check in the inline function and as a
    result we were still allowing for cases where 8 descriptors were being
    used per packet and this would result in 9 DMA buffers.  I updated the
    code so that we only allow 8 in the case of a single send, otherwise we
    go into the function that walks the frags to verify each block.

I have tested this using rds-stress and it seems to run traffic without
throwing any errors.

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   49 ++++++++++++-------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |   10 ++++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |   49 ++++++++++++-------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h |   10 ++++-
 4 files changed, 62 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 5d5fa5359a1d..6bf705add321 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2597,35 +2597,34 @@ int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size)
 }
 
 /**
- * __i40e_chk_linearize - Check if there are more than 8 fragments per packet
+ * __i40e_chk_linearize - Check if there are more than 8 buffers per packet
  * @skb:      send buffer
  *
- * Note: Our HW can't scatter-gather more than 8 fragments to build
- * a packet on the wire and so we need to figure out the cases where we
- * need to linearize the skb.
+ * Note: Our HW can't DMA more than 8 buffers to build a packet on the wire
+ * and so we need to figure out the cases where we need to linearize the skb.
+ *
+ * For TSO we need to count the TSO header and segment payload separately.
+ * As such we need to check cases where we have 7 fragments or more as we
+ * can potentially require 9 DMA transactions, 1 for the TSO header, 1 for
+ * the segment payload in the first descriptor, and another 7 for the
+ * fragments.
  **/
 bool __i40e_chk_linearize(struct sk_buff *skb)
 {
 	const struct skb_frag_struct *frag, *stale;
-	int gso_size, nr_frags, sum;
-
-	/* check to see if TSO is enabled, if so we may get a repreive */
-	gso_size = skb_shinfo(skb)->gso_size;
-	if (unlikely(!gso_size))
-		return true;
+	int nr_frags, sum;
 
-	/* no need to check if number of frags is less than 8 */
+	/* no need to check if number of frags is less than 7 */
 	nr_frags = skb_shinfo(skb)->nr_frags;
-	if (nr_frags < I40E_MAX_BUFFER_TXD)
+	if (nr_frags < (I40E_MAX_BUFFER_TXD - 1))
 		return false;
 
 	/* We need to walk through the list and validate that each group
 	 * of 6 fragments totals at least gso_size.  However we don't need
-	 * to perform such validation on the first or last 6 since the first
-	 * 6 cannot inherit any data from a descriptor before them, and the
-	 * last 6 cannot inherit any data from a descriptor after them.
+	 * to perform such validation on the last 6 since the last 6 cannot
+	 * inherit any data from a descriptor after them.
 	 */
-	nr_frags -= I40E_MAX_BUFFER_TXD - 1;
+	nr_frags -= I40E_MAX_BUFFER_TXD - 2;
 	frag = &skb_shinfo(skb)->frags[0];
 
 	/* Initialize size to the negative value of gso_size minus 1.  We
@@ -2634,21 +2633,21 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
 	 * descriptors for a single transmit as the header and previous
 	 * fragment are already consuming 2 descriptors.
 	 */
-	sum = 1 - gso_size;
+	sum = 1 - skb_shinfo(skb)->gso_size;
 
-	/* Add size of frags 1 through 5 to create our initial sum */
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
+	/* Add size of frags 0 through 4 to create our initial sum */
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
 
 	/* Walk through fragments adding latest fragment, testing it, and
 	 * then removing stale fragments from the sum.
 	 */
 	stale = &skb_shinfo(skb)->frags[0];
 	for (;;) {
-		sum += skb_frag_size(++frag);
+		sum += skb_frag_size(frag++);
 
 		/* if sum is negative we failed to make sufficient progress */
 		if (sum < 0)
@@ -2658,7 +2657,7 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
 		if (!--nr_frags)
 			break;
 
-		sum -= skb_frag_size(++stale);
+		sum -= skb_frag_size(stale++);
 	}
 
 	return false;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 681e9bca37db..c864932bd844 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -442,10 +442,14 @@ static inline int i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size)
  **/
 static inline bool i40e_chk_linearize(struct sk_buff *skb, int count)
 {
-	/* we can only support up to 8 data buffers for a single send */
-	if (likely(count <= I40E_MAX_BUFFER_TXD))
+	/* Both TSO and single send will work if count is less than 8 */
+	if (likely(count < I40E_MAX_BUFFER_TXD))
 		return false;
 
-	return __i40e_chk_linearize(skb);
+	if (skb_is_gso(skb))
+		return __i40e_chk_linearize(skb);
+
+	/* we can support up to 8 data buffers for a single send */
+	return count != I40E_MAX_BUFFER_TXD;
 }
 #endif /* _I40E_TXRX_H_ */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 04aabc52ba0d..519256bb63e8 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1799,35 +1799,34 @@ static void i40e_create_tx_ctx(struct i40e_ring *tx_ring,
 }
 
 /**
- * __i40evf_chk_linearize - Check if there are more than 8 fragments per packet
+ * __i40evf_chk_linearize - Check if there are more than 8 buffers per packet
  * @skb:      send buffer
  *
- * Note: Our HW can't scatter-gather more than 8 fragments to build
- * a packet on the wire and so we need to figure out the cases where we
- * need to linearize the skb.
+ * Note: Our HW can't DMA more than 8 buffers to build a packet on the wire
+ * and so we need to figure out the cases where we need to linearize the skb.
+ *
+ * For TSO we need to count the TSO header and segment payload separately.
+ * As such we need to check cases where we have 7 fragments or more as we
+ * can potentially require 9 DMA transactions, 1 for the TSO header, 1 for
+ * the segment payload in the first descriptor, and another 7 for the
+ * fragments.
  **/
 bool __i40evf_chk_linearize(struct sk_buff *skb)
 {
 	const struct skb_frag_struct *frag, *stale;
-	int gso_size, nr_frags, sum;
-
-	/* check to see if TSO is enabled, if so we may get a repreive */
-	gso_size = skb_shinfo(skb)->gso_size;
-	if (unlikely(!gso_size))
-		return true;
+	int nr_frags, sum;
 
-	/* no need to check if number of frags is less than 8 */
+	/* no need to check if number of frags is less than 7 */
 	nr_frags = skb_shinfo(skb)->nr_frags;
-	if (nr_frags < I40E_MAX_BUFFER_TXD)
+	if (nr_frags < (I40E_MAX_BUFFER_TXD - 1))
 		return false;
 
 	/* We need to walk through the list and validate that each group
 	 * of 6 fragments totals at least gso_size.  However we don't need
-	 * to perform such validation on the first or last 6 since the first
-	 * 6 cannot inherit any data from a descriptor before them, and the
-	 * last 6 cannot inherit any data from a descriptor after them.
+	 * to perform such validation on the last 6 since the last 6 cannot
+	 * inherit any data from a descriptor after them.
 	 */
-	nr_frags -= I40E_MAX_BUFFER_TXD - 1;
+	nr_frags -= I40E_MAX_BUFFER_TXD - 2;
 	frag = &skb_shinfo(skb)->frags[0];
 
 	/* Initialize size to the negative value of gso_size minus 1.  We
@@ -1836,21 +1835,21 @@ bool __i40evf_chk_linearize(struct sk_buff *skb)
 	 * descriptors for a single transmit as the header and previous
 	 * fragment are already consuming 2 descriptors.
 	 */
-	sum = 1 - gso_size;
+	sum = 1 - skb_shinfo(skb)->gso_size;
 
-	/* Add size of frags 1 through 5 to create our initial sum */
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
-	sum += skb_frag_size(++frag);
+	/* Add size of frags 0 through 4 to create our initial sum */
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
+	sum += skb_frag_size(frag++);
 
 	/* Walk through fragments adding latest fragment, testing it, and
 	 * then removing stale fragments from the sum.
 	 */
 	stale = &skb_shinfo(skb)->frags[0];
 	for (;;) {
-		sum += skb_frag_size(++frag);
+		sum += skb_frag_size(frag++);
 
 		/* if sum is negative we failed to make sufficient progress */
 		if (sum < 0)
@@ -1860,7 +1859,7 @@ bool __i40evf_chk_linearize(struct sk_buff *skb)
 		if (!--nr_frags)
 			break;
 
-		sum -= skb_frag_size(++stale);
+		sum -= skb_frag_size(stale++);
 	}
 
 	return false;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 6cf116972f62..396c3893ce8a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -424,10 +424,14 @@ static inline int i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size)
  **/
 static inline bool i40e_chk_linearize(struct sk_buff *skb, int count)
 {
-	/* we can only support up to 8 data buffers for a single send */
-	if (likely(count <= I40E_MAX_BUFFER_TXD))
+	/* Both TSO and single send will work if count is less than 8 */
+	if (likely(count < I40E_MAX_BUFFER_TXD))
 		return false;
 
-	return __i40evf_chk_linearize(skb);
+	if (skb_is_gso(skb))
+		return __i40evf_chk_linearize(skb);
+
+	/* we can support up to 8 data buffers for a single send */
+	return count != I40E_MAX_BUFFER_TXD;
 }
 #endif /* _I40E_TXRX_H_ */

^ permalink raw reply related

* Re: [net PATCH v2] i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet
From: Sowmini Varadhan @ 2016-03-30 23:51 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, jesse.brandeburg, alexander.duyck, intel-wired-lan,
	jeffrey.t.kirsher
In-Reply-To: <20160330231116.12643.59554.stgit@localhost.localdomain>


Yes, this fixes it for me too!

Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

^ permalink raw reply

* Re: [PATCH iproute2 v1 1/1] lib/utils: fix get_addr() and get_prefix() error messages
From: Stephen Hemminger @ 2016-03-30 23:54 UTC (permalink / raw)
  To: Varlese, Marco
  Cc: netdev@vger.kernel.org, davem@davemloft.net, Jiri Pirko,
	John Fastabend, jhs@mojatatu.com, Szczerbik, PrzemyslawX
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B11A2146FF@IRSMSX108.ger.corp.intel.com>

On Tue, 22 Mar 2016 13:02:02 +0000
"Varlese, Marco" <marco.varlese@intel.com> wrote:

> An attempt to add invalid address to interface would print "???" string
> instead of the address family name.
> 
> For example:
> $ ip address add 256.10.166.1/24 dev ens8
> Error: ??? prefix is expected rather than "256.10.166.1/24".
> 
> $ ip neighbor add proxy 2001:db8::g dev ens8
> Error: ??? address is expected rather than "2001:db8::g".
> 
> With this patch the output will look like:
> $ ip address add 256.10.166.1/24 dev ens8
> Error: inet prefix is expected rather than "256.10.166.1/24".
> 
> $ ip neighbor add proxy 2001:db8::g dev ens8
> Error: inet6 address is expected rather than "2001:db8::g".
> 
> Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik@intel.com>
> Signed-off-by: Marco Varlese <marco.varlese@intel.com>
> ---
> lib/utils.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

If you look at git, I already applied this by manual fix.

^ permalink raw reply

* [PATCH net] tun, bpf: fix suspicious RCU usage in tun_{attach,detach}_filter
From: Daniel Borkmann @ 2016-03-31  0:13 UTC (permalink / raw)
  To: davem
  Cc: alexei.starovoitov, mkubecek, sasha.levin, jslaby, eric.dumazet,
	mst, netdev, Daniel Borkmann

Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:

  [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
  [   52.765688] other info that might help us debug this:
  [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
  [   52.765701] 1 lock held by a.out/1525:
  [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
  [   52.765721] stack backtrace:
  [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
  [...]
  [   52.765768] Call Trace:
  [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
  [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
  [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
  [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
  [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
  [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
  [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
  [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
  [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
  [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
  [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
  [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25

Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.

Since the fix in f91ff5b9ff52 ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.

Since its introduction in 994051625981 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.

Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 drivers/net/tun.c      |  8 +++++---
 include/linux/filter.h |  4 ++++
 net/core/filter.c      | 33 +++++++++++++++++++++------------
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index afdf950..510e90a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -622,7 +622,8 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
 
 	/* Re-attach the filter to persist device */
 	if (!skip_filter && (tun->filter_attached == true)) {
-		err = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+		err = __sk_attach_filter(&tun->fprog, tfile->socket.sk,
+					 lockdep_rtnl_is_held());
 		if (!err)
 			goto out;
 	}
@@ -1822,7 +1823,7 @@ static void tun_detach_filter(struct tun_struct *tun, int n)
 
 	for (i = 0; i < n; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
-		sk_detach_filter(tfile->socket.sk);
+		__sk_detach_filter(tfile->socket.sk, lockdep_rtnl_is_held());
 	}
 
 	tun->filter_attached = false;
@@ -1835,7 +1836,8 @@ static int tun_attach_filter(struct tun_struct *tun)
 
 	for (i = 0; i < tun->numqueues; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
-		ret = sk_attach_filter(&tun->fprog, tfile->socket.sk);
+		ret = __sk_attach_filter(&tun->fprog, tfile->socket.sk,
+					 lockdep_rtnl_is_held());
 		if (ret) {
 			tun_detach_filter(tun, i);
 			return ret;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 43aa1f8..a51a536 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -465,10 +465,14 @@ int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
 void bpf_prog_destroy(struct bpf_prog *fp);
 
 int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
+int __sk_attach_filter(struct sock_fprog *fprog, struct sock *sk,
+		       bool locked);
 int sk_attach_bpf(u32 ufd, struct sock *sk);
 int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk);
 int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk);
 int sk_detach_filter(struct sock *sk);
+int __sk_detach_filter(struct sock *sk, bool locked);
+
 int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
 		  unsigned int len);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 4b81b71..ca7f832 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1149,7 +1149,8 @@ void bpf_prog_destroy(struct bpf_prog *fp)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_destroy);
 
-static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
+static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk,
+			    bool locked)
 {
 	struct sk_filter *fp, *old_fp;
 
@@ -1165,10 +1166,8 @@ static int __sk_attach_prog(struct bpf_prog *prog, struct sock *sk)
 		return -ENOMEM;
 	}
 
-	old_fp = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+	old_fp = rcu_dereference_protected(sk->sk_filter, locked);
 	rcu_assign_pointer(sk->sk_filter, fp);
-
 	if (old_fp)
 		sk_filter_uncharge(sk, old_fp);
 
@@ -1247,7 +1246,8 @@ struct bpf_prog *__get_filter(struct sock_fprog *fprog, struct sock *sk)
  * occurs or there is insufficient memory for the filter a negative
  * errno code is returned. On success the return is zero.
  */
-int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
+int __sk_attach_filter(struct sock_fprog *fprog, struct sock *sk,
+		       bool locked)
 {
 	struct bpf_prog *prog = __get_filter(fprog, sk);
 	int err;
@@ -1255,7 +1255,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
-	err = __sk_attach_prog(prog, sk);
+	err = __sk_attach_prog(prog, sk, locked);
 	if (err < 0) {
 		__bpf_prog_release(prog);
 		return err;
@@ -1263,7 +1263,12 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(sk_attach_filter);
+EXPORT_SYMBOL_GPL(__sk_attach_filter);
+
+int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
+{
+	return __sk_attach_filter(fprog, sk, sock_owned_by_user(sk));
+}
 
 int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 {
@@ -1309,7 +1314,7 @@ int sk_attach_bpf(u32 ufd, struct sock *sk)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
-	err = __sk_attach_prog(prog, sk);
+	err = __sk_attach_prog(prog, sk, sock_owned_by_user(sk));
 	if (err < 0) {
 		bpf_prog_put(prog);
 		return err;
@@ -2250,7 +2255,7 @@ static int __init register_sk_filter_ops(void)
 }
 late_initcall(register_sk_filter_ops);
 
-int sk_detach_filter(struct sock *sk)
+int __sk_detach_filter(struct sock *sk, bool locked)
 {
 	int ret = -ENOENT;
 	struct sk_filter *filter;
@@ -2258,8 +2263,7 @@ int sk_detach_filter(struct sock *sk)
 	if (sock_flag(sk, SOCK_FILTER_LOCKED))
 		return -EPERM;
 
-	filter = rcu_dereference_protected(sk->sk_filter,
-					   sock_owned_by_user(sk));
+	filter = rcu_dereference_protected(sk->sk_filter, locked);
 	if (filter) {
 		RCU_INIT_POINTER(sk->sk_filter, NULL);
 		sk_filter_uncharge(sk, filter);
@@ -2268,7 +2272,12 @@ int sk_detach_filter(struct sock *sk)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(sk_detach_filter);
+EXPORT_SYMBOL_GPL(__sk_detach_filter);
+
+int sk_detach_filter(struct sock *sk)
+{
+	return __sk_detach_filter(sk, sock_owned_by_user(sk));
+}
 
 int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 		  unsigned int len)
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH net-next 0/8] add TX timestamping via cmsg
From: Martin KaFai Lau @ 2016-03-31  0:38 UTC (permalink / raw)
  To: Soheil Hassas Yeganeh
  Cc: davem, netdev, willemb, edumazet, ycheng, ncardwell,
	Soheil Hassas Yeganeh
In-Reply-To: <1459377448-2239-1-git-send-email-soheil.kdev@gmail.com>

On Wed, Mar 30, 2016 at 06:37:20PM -0400, Soheil Hassas Yeganeh wrote:
> I will follow up with another patch to enable timestamping for
> active TFO (client-side TCP Fast Open) and also setting packet
> mark via cmsgs.
Can you share more details on what 'setting packet mark' does?

Nice patches!

^ permalink raw reply

* Re: [PATCH net-next 0/8] add TX timestamping via cmsg
From: Soheil Hassas Yeganeh @ 2016-03-31  1:08 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev
In-Reply-To: <20160331003744.GA3300@kafai-mba.local>

On Wed, Mar 30, 2016 at 8:38 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Wed, Mar 30, 2016 at 06:37:20PM -0400, Soheil Hassas Yeganeh wrote:
>> I will follow up with another patch to enable timestamping for
>> active TFO (client-side TCP Fast Open) and also setting packet
>> mark via cmsgs.
> Can you share more details on what 'setting packet mark' does?

I meant, since SOL_SOCKET-level csmg's will be processed for all
protocols, users should be able to override sk->sk_mark via command
messages when possible similar to what the following patch did for
packet sockets:
http://patchwork.ozlabs.org/patch/527991/

It'll be a tiny followup patch.

> Nice patches!

Thank you so much!

Cheers,
Soheil

^ permalink raw reply

* Re: [PATCH] net: fec: stop the "rcv is not +last, " error messages
From: Greg Ungerer @ 2016-03-31  1:17 UTC (permalink / raw)
  To: Fabio Estevam; +Cc: Troy Kisky, netdev@vger.kernel.org
In-Reply-To: <CAOMZO5B79aHocuqT4jgEU9gGcFYD5ZSDZh8PrGeXYAcLHW1x5Q@mail.gmail.com>

Hi Fabio,

On 31/03/16 04:37, Fabio Estevam wrote:
> Hi Greg,
> 
> On Wed, Mar 30, 2016 at 12:24 AM, Greg Ungerer <gerg@uclinux.org> wrote:
>> Hi Troy,
>>
>> Commit 55cd48c8 ('net: fec: stop the "rcv is not +last, " error
>> messages') adds a write to a register that is not present in all
>> implementations of the FEC hardware module. None of the ColdFire
>> SoC parts with the FEC module have the FTRL (0x1b0) register.
>>
>> Does this need a quirk flag to key access to this register of?
>> Or can you piggyback on the FEC_QUIRK_HAS_RACC flag?
> 
> Would the change below work on Coldfire?
> 
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -943,8 +943,8 @@ fec_restart(struct net_device *ndev)
>                 else
>                         val &= ~FEC_RACC_OPTIONS;
>                 writel(val, fep->hwp + FEC_RACC);
> -       }
>         writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
> +       }
>  #endif
> 
>         /*

Yes, that fixes it. Will you carry this change?

Regards
Greg

^ permalink raw reply

* Re: [PATCH net] tun, bpf: fix suspicious RCU usage in tun_{attach,detach}_filter
From: Alexei Starovoitov @ 2016-03-31  1:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: davem, mkubecek, sasha.levin, jslaby, eric.dumazet, mst, netdev
In-Reply-To: <755ee9ec1f6d2229be41806964b372548e4b7586.1459382574.git.daniel@iogearbox.net>

On Thu, Mar 31, 2016 at 02:13:18AM +0200, Daniel Borkmann wrote:
> Sasha Levin reported a suspicious rcu_dereference_protected() warning
> found while fuzzing with trinity that is similar to this one:
> 
>   [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
>   [   52.765688] other info that might help us debug this:
>   [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
>   [   52.765701] 1 lock held by a.out/1525:
>   [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
>   [   52.765721] stack backtrace:
>   [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
>   [...]
>   [   52.765768] Call Trace:
>   [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
>   [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
>   [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
>   [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
>   [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
>   [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
>   [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
>   [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
>   [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
>   [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
>   [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
>   [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25
> 
> Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
> from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
> DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.
> 
> Since the fix in f91ff5b9ff52 ("net: sk_{detach|attach}_filter() rcu
> fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
> filter with rcu_dereference_protected(), checking whether socket lock
> is held in control path.
> 
> Since its introduction in 994051625981 ("tun: socket filter support"),
> tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
> sock_owned_by_user(sk) doesn't apply in this specific case and therefore
> triggers the false positive.
> 
> Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
> that is used by tap filters and pass in lockdep_rtnl_is_held() for the
> rcu_dereference_protected() checks instead.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  drivers/net/tun.c      |  8 +++++---
>  include/linux/filter.h |  4 ++++
>  net/core/filter.c      | 33 +++++++++++++++++++++------------
>  3 files changed, 30 insertions(+), 15 deletions(-)

kinda heavy patch to shut up lockdep.
Can we do
old_fp = rcu_dereference_protected(sk->sk_filter,
                                sock_owned_by_user(sk) || lockdep_rtnl_is_held());
and it always be correct?
I think right now tun is the only such user, but if it's correct for tun,
it's correct for future users too. If not correct then not correct for tun either.
Or I'm missing something?

^ permalink raw reply

* Re: [net-next] bond: output message before setting slave to inactive
From: 张胜举 @ 2016-03-31  1:23 UTC (permalink / raw)
  To: 'David Miller'; +Cc: j.vosburgh, vfalico, gospo, netdev

> From: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
> Date: Tue, 29 Mar 2016 06:32:57 +0000
> 
> > This patch moves output message before setting slave to inactive, this
> > will print the correct status of slave device.
> >
> > Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
> 
> I think the message is in the appropriate spot wrt. state, so I will not
apply
> this, thanks.

The message will always print 'backup',  since the code set it to inactive
before this output.
I think this will cause a little confused, because even when I detach an
active slave, the output is always 'backup'.

Thanks.

^ permalink raw reply

* RE: [PATCH 4/5] fsl/qe: Add QE TDM lib
From: Qiang Zhao @ 2016-03-31  1:25 UTC (permalink / raw)
  To: Joakim Tjernlund, davem@davemloft.net
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	Xiaobo Xie, oss@buserror.net, gregkh@linuxfoundation.org,
	akpm@linux-foundation.org, netdev@vger.kernel.org
In-Reply-To: <1459338701.3538.17.camel@infinera.com>

On Wed, 2016-03-30 at 07:50PM, Joakim Tjernlund wrote:
> -----Original Message-----
> From: Joakim Tjernlund [mailto:Joakim.Tjernlund@infinera.com]
> Sent: Wednesday, March 30, 2016 7:50 PM
> To: davem@davemloft.net; Qiang Zhao <qiang.zhao@nxp.com>
> Cc: linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org; Xiaobo Xie
> <xiaobo.xie@nxp.com>; oss@buserror.net; gregkh@linuxfoundation.org;
> akpm@linux-foundation.org; netdev@vger.kernel.org
> Subject: Re: [PATCH 4/5] fsl/qe: Add QE TDM lib
> 
> On Wed, 2016-03-30 at 16:50 +0800, Zhao Qiang wrote:
> > QE has module to support TDM, some other protocols supported by QE are
> > based on TDM.
> > add a qe-tdm lib, this lib provides functions to the protocols using
> > TDM to configurate QE-TDM.
> >
> > Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
> > +	utdm->siram_entry_id = val;
> > +
> > +	set_si_param(utdm, ut_info);
> > +
> > +	np2 = of_find_compatible_node(NULL, NULL, "fsl,t1040-qe-si");
> 
> fsl,t1040-qe-si only? What about mpc83xx?
> I recall QE is a little bit different compared to T1040 or will this work(including
> the hdlc driver) on 83xx as well?

The " fsl,t1040-qe-si " is new added to dts and bindings, it is required to have SoC specific compatible strings.
mpc83xx will not use qe-si node. If there will be other soc useing qe-si, " fsl,t1040-qe-si " will follow the soc specific compatible,
like : 
                si1: si@700 {
                        compatible = "fsl,ls1043-qe-si", "fsl,t1040-qe-si";
                        reg = <0x700 0x80>;
                };

Best Regards
Zhao Qiang

^ permalink raw reply

* RE: [PATCH] net: fec: stop the "rcv is not +last, " error messages
From: Fugang Duan @ 2016-03-31  1:41 UTC (permalink / raw)
  To: Fabio Estevam, Greg Ungerer; +Cc: Troy Kisky, netdev@vger.kernel.org
In-Reply-To: <CAOMZO5B79aHocuqT4jgEU9gGcFYD5ZSDZh8PrGeXYAcLHW1x5Q@mail.gmail.com>

From: Fabio Estevam <festevam@gmail.com> Sent: Thursday, March 31, 2016 2:37 AM
> To: Greg Ungerer <gerg@uclinux.org>
> Cc: Troy Kisky <troy.kisky@boundarydevices.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH] net: fec: stop the "rcv is not +last, " error messages
> 
> Hi Greg,
> 
> On Wed, Mar 30, 2016 at 12:24 AM, Greg Ungerer <gerg@uclinux.org> wrote:
> > Hi Troy,
> >
> > Commit 55cd48c8 ('net: fec: stop the "rcv is not +last, " error
> > messages') adds a write to a register that is not present in all
> > implementations of the FEC hardware module. None of the ColdFire SoC
> > parts with the FEC module have the FTRL (0x1b0) register.
> >
> > Does this need a quirk flag to key access to this register of?
> > Or can you piggyback on the FEC_QUIRK_HAS_RACC flag?
> 
> Would the change below work on Coldfire?
> 
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -943,8 +943,8 @@ fec_restart(struct net_device *ndev)
>                 else
>                         val &= ~FEC_RACC_OPTIONS;
>                 writel(val, fep->hwp + FEC_RACC);
> -       }
>         writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
> +       }
>  #endif
> 
>         /*

Fabio, we cannot do it like this that may cause confused for the quirk flag "FEC_QUIRK_HAS_RACC".


Hi, Greg,

The header file fec.h define the FEC_FTRL as below,  if ColdFire SoC has no this register,  we may remove the define in here and define the register according to SOC type. For example, it is ColdFire Soc, define it as 0xFFF. Is it  feasible ?

#if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
    defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
...
#define FEC_FTRL                0x1b0
...
#else
...
#endif


Regards,
Andy

^ permalink raw reply

* Re: [PATCH] net: fec: stop the "rcv is not +last, " error messages
From: Greg Ungerer @ 2016-03-31  1:59 UTC (permalink / raw)
  To: Fugang Duan, Fabio Estevam; +Cc: Troy Kisky, netdev@vger.kernel.org
In-Reply-To: <VI1PR0401MB1855458E0B31502D1C112AF6FF990@VI1PR0401MB1855.eurprd04.prod.outlook.com>

Hi Andy,

On 31/03/16 11:41, Fugang Duan wrote:
> From: Fabio Estevam <festevam@gmail.com> Sent: Thursday, March 31, 2016 2:37 AM
>> To: Greg Ungerer <gerg@uclinux.org>
>> Cc: Troy Kisky <troy.kisky@boundarydevices.com>; netdev@vger.kernel.org
>> Subject: Re: [PATCH] net: fec: stop the "rcv is not +last, " error messages
>>
>> Hi Greg,
>>
>> On Wed, Mar 30, 2016 at 12:24 AM, Greg Ungerer <gerg@uclinux.org> wrote:
>>> Hi Troy,
>>>
>>> Commit 55cd48c8 ('net: fec: stop the "rcv is not +last, " error
>>> messages') adds a write to a register that is not present in all
>>> implementations of the FEC hardware module. None of the ColdFire SoC
>>> parts with the FEC module have the FTRL (0x1b0) register.
>>>
>>> Does this need a quirk flag to key access to this register of?
>>> Or can you piggyback on the FEC_QUIRK_HAS_RACC flag?
>>
>> Would the change below work on Coldfire?
>>
>> --- a/drivers/net/ethernet/freescale/fec_main.c
>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>> @@ -943,8 +943,8 @@ fec_restart(struct net_device *ndev)
>>                 else
>>                         val &= ~FEC_RACC_OPTIONS;
>>                 writel(val, fep->hwp + FEC_RACC);
>> -       }
>>         writel(PKT_MAXBUF_SIZE, fep->hwp + FEC_FTRL);
>> +       }
>>  #endif
>>
>>         /*
> 
> Fabio, we cannot do it like this that may cause confused for the quirk flag "FEC_QUIRK_HAS_RACC".
> 
> 
> Hi, Greg,
> 
> The header file fec.h define the FEC_FTRL as below,  if ColdFire SoC has no this register,  we may remove the define in here and define the register according to SOC type. For example, it is ColdFire Soc, define it as 0xFFF. Is it  feasible ?
> 
> #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || \
>     defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM)
> ...
> #define FEC_FTRL                0x1b0
> ...
> #else
> ...
> #endif

Sure you could do that. But... you still have to be careful
with references to it in fec_main.c. They will need be conditional
on existence of FEC_FTRL, otherwise you break compilation.

Even if you define it to some bogus value we still don't 
want the code actually writing to it.

Regards
Greg

^ permalink raw reply

* 1oan @ low interest rate
From: Mr. Wolf @ 2016-03-31  1:06 UTC (permalink / raw)
  To: Recipients

Are you interested in starting up a business or do you have financial
difficulties that requires capital....what about you over there looking to
solve that personal need? look no further as we are here to offer you a loan of 3% monthly with no hidden charges. All you need to do is send us an email to pierrewoolff@gmail.com for more information

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox