Netdev List
* [PATCH net-next 14/27] 8021q: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 net/8021q/vlan_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index e2ed698..604a67a 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -50,7 +50,7 @@ bool vlan_do_receive(struct sk_buff **skbp)
 	}
 
 	skb->priority = vlan_get_ingress_priority(vlan_dev, skb->vlan_tci);
-	skb->vlan_tci = 0;
+	__vlan_hwaccel_clear_tag(skb);
 
 	rx_stats = this_cpu_ptr(vlan_dev_priv(vlan_dev)->vlan_pcpu_stats);
 
-- 
2.10.2


* [PATCH net-next 15/27] ipv4/tunnel: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 net/ipv4/ip_tunnel_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index fed3d29..0004a54 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -120,7 +120,7 @@ int __iptunnel_pull_header(struct sk_buff *skb, int hdr_len,
 	}
 
 	skb_clear_hash_if_not_l4(skb);
-	skb->vlan_tci = 0;
+	__vlan_hwaccel_clear_tag(skb);
 	skb_set_queue_mapping(skb, 0);
 	skb_scrub_packet(skb, xnet);
 
-- 
2.10.2


* [PATCH net-next 11/27] sky2: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Mirko Lindner, Stephen Hemminger
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/ethernet/marvell/sky2.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index b60ad0e..bcd20e0 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2485,13 +2485,11 @@ static struct sk_buff *receive_copy(struct sky2_port *sky2,
 		skb->ip_summed = re->skb->ip_summed;
 		skb->csum = re->skb->csum;
 		skb_copy_hash(skb, re->skb);
-		skb->vlan_proto = re->skb->vlan_proto;
-		skb->vlan_tci = re->skb->vlan_tci;
+		__vlan_hwaccel_copy_tag(skb, re->skb);
 
 		pci_dma_sync_single_for_device(sky2->hw->pdev, re->data_addr,
 					       length, PCI_DMA_FROMDEVICE);
-		re->skb->vlan_proto = 0;
-		re->skb->vlan_tci = 0;
+		__vlan_hwaccel_clear_tag(re->skb);
 		skb_clear_hash(re->skb);
 		re->skb->ip_summed = CHECKSUM_NONE;
 		skb_put(skb, length);
-- 
2.10.2


* [PATCH net-next 09/27] cxgb4: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Steve Wise
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

This also initializes the vlan_proto field.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/infiniband/hw/cxgb4/cm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f1510cc..66a3d39 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -3899,7 +3899,7 @@ static int rx_pkt(struct c4iw_dev *dev, struct sk_buff *skb)
 	} else {
 		vlan_eh = (struct vlan_ethhdr *)(req + 1);
 		iph = (struct iphdr *)(vlan_eh + 1);
-		skb->vlan_tci = ntohs(cpl->vlan);
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), ntohs(cpl->vlan));
 	}
 
 	if (iph->version != 0x4)
-- 
2.10.2


* [PATCH net-next 08/27] net/hyperv: remove use of VLAN_TAG_PRESENT
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: K. Y. Srinivasan, Haiyang Zhang,
	open list:Hyper-V CORE AND DRIVERS
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/hyperv/hyperv_net.h   |  2 +-
 drivers/net/hyperv/netvsc_drv.c   | 13 ++++++-------
 drivers/net/hyperv/rndis_filter.c |  4 ++--
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 3958ada..b53729e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -186,7 +186,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 			void **data,
 			struct ndis_tcp_ip_checksum_info *csum_info,
 			struct vmbus_channel *channel,
-			u16 vlan_tci);
+			u16 vlan_tci, bool vlan_present);
 void netvsc_channel_cb(void *context);
 int rndis_filter_open(struct netvsc_device *nvdev);
 int rndis_filter_close(struct netvsc_device *nvdev);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c9414c0..6597d79 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -595,7 +595,7 @@ void netvsc_linkstatus_callback(struct hv_device *device_obj,
 static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 				struct hv_netvsc_packet *packet,
 				struct ndis_tcp_ip_checksum_info *csum_info,
-				void *data, u16 vlan_tci)
+				void *data)
 {
 	struct sk_buff *skb;
 
@@ -625,10 +625,6 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 	}
 
-	if (vlan_tci & VLAN_TAG_PRESENT)
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
-				       vlan_tci);
-
 	return skb;
 }
 
@@ -641,7 +637,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 				void **data,
 				struct ndis_tcp_ip_checksum_info *csum_info,
 				struct vmbus_channel *channel,
-				u16 vlan_tci)
+				u16 vlan_tci, bool vlan_present)
 {
 	struct net_device *net = hv_get_drvdata(device_obj);
 	struct net_device_context *net_device_ctx = netdev_priv(net);
@@ -664,12 +660,15 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 		net = vf_netdev;
 
 	/* Allocate a skb - TODO direct I/O to pages? */
-	skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data, vlan_tci);
+	skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data);
 	if (unlikely(!skb)) {
 		++net->stats.rx_dropped;
 		return NVSP_STAT_FAIL;
 	}
 
+	if (vlan_present)
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tci);
+
 	if (net != vf_netdev)
 		skb_record_rx_queue(skb,
 				    channel->offermsg.offer.sub_channel_index);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 8d90904..7f7b410 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -381,13 +381,13 @@ static int rndis_filter_receive_data(struct rndis_device *dev,
 
 	vlan = rndis_get_ppi(rndis_pkt, IEEE_8021Q_INFO);
 	if (vlan) {
-		vlan_tci = VLAN_TAG_PRESENT | vlan->vlanid |
+		vlan_tci = vlan->vlanid |
 			(vlan->pri << VLAN_PRIO_SHIFT);
 	}
 
 	csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO);
 	return netvsc_recv_callback(net_device_ctx->device_ctx, pkt, data,
-				    csum_info, channel, vlan_tci);
+				    csum_info, channel, vlan_tci, vlan);
 }
 
 int rndis_filter_receive(struct hv_device *dev,
-- 
2.10.2
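
The change above moves the "tag present" signal out of the TCI value and into a separate argument. A minimal userspace sketch of that idea follows. It is not driver code: struct mock_8021q_info, struct rx_vlan and rx_vlan_from_ppi() are hypothetical names invented for illustration, and only VLAN_PRIO_SHIFT (13) is taken from include/linux/if_vlan.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_PRIO_SHIFT 13

/* Hypothetical, simplified stand-in for the 802.1Q per-packet info. */
struct mock_8021q_info {
	unsigned int vlanid;	/* 12-bit VLAN ID */
	unsigned int pri;	/* 3-bit priority */
};

/* TCI and presence travel side by side instead of sharing one field. */
struct rx_vlan {
	uint16_t tci;
	bool present;
};

/* Pass NULL when the packet carried no 802.1Q info. */
static struct rx_vlan rx_vlan_from_ppi(const struct mock_8021q_info *vlan)
{
	struct rx_vlan out = { 0, false };

	if (vlan) {
		assert(vlan->vlanid <= 0x0fff && vlan->pri <= 7);
		out.tci = (uint16_t)(vlan->vlanid | (vlan->pri << VLAN_PRIO_SHIFT));
		out.present = true;
	}
	return out;
}
```

With presence carried separately, a tag with VLAN ID 0 and priority 0 (TCI == 0) stays distinguishable from "no tag" without borrowing a bit of the TCI.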


* [PATCH net-next 05/27] i40iw: remove use of VLAN_TAG_PRESENT
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Faisal Latif
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/infiniband/hw/i40iw/i40iw_cm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.c b/drivers/infiniband/hw/i40iw/i40iw_cm.c
index 8563769..25cf689 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_cm.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_cm.c
@@ -414,7 +414,7 @@ static struct i40iw_puda_buf *i40iw_form_cm_frame(struct i40iw_cm_node *cm_node,
 			pd_len += MPA_ZERO_PAD_LEN;
 	}
 
-	if (cm_node->vlan_id < VLAN_TAG_PRESENT)
+	if (cm_node->vlan_id <= VLAN_VID_MASK)
 		eth_hlen += 4;
 
 	if (cm_node->ipv4)
@@ -443,7 +443,7 @@ static struct i40iw_puda_buf *i40iw_form_cm_frame(struct i40iw_cm_node *cm_node,
 
 		ether_addr_copy(ethh->h_dest, cm_node->rem_mac);
 		ether_addr_copy(ethh->h_source, cm_node->loc_mac);
-		if (cm_node->vlan_id < VLAN_TAG_PRESENT) {
+		if (cm_node->vlan_id <= VLAN_VID_MASK) {
 			((struct vlan_ethhdr *)ethh)->h_vlan_proto = htons(ETH_P_8021Q);
 			((struct vlan_ethhdr *)ethh)->h_vlan_TCI = htons(cm_node->vlan_id);
 
@@ -472,7 +472,7 @@ static struct i40iw_puda_buf *i40iw_form_cm_frame(struct i40iw_cm_node *cm_node,
 
 		ether_addr_copy(ethh->h_dest, cm_node->rem_mac);
 		ether_addr_copy(ethh->h_source, cm_node->loc_mac);
-		if (cm_node->vlan_id < VLAN_TAG_PRESENT) {
+		if (cm_node->vlan_id <= VLAN_VID_MASK) {
 			((struct vlan_ethhdr *)ethh)->h_vlan_proto = htons(ETH_P_8021Q);
 			((struct vlan_ethhdr *)ethh)->h_vlan_TCI = htons(cm_node->vlan_id);
 			((struct vlan_ethhdr *)ethh)->h_vlan_encapsulated_proto = htons(ETH_P_IPV6);
@@ -3235,7 +3235,7 @@ static void i40iw_init_tcp_ctx(struct i40iw_cm_node *cm_node,
 
 	tcp_info->flow_label = 0;
 	tcp_info->snd_mss = cpu_to_le32(((u32)cm_node->tcp_cntxt.mss));
-	if (cm_node->vlan_id < VLAN_TAG_PRESENT) {
+	if (cm_node->vlan_id <= VLAN_VID_MASK) {
 		tcp_info->insert_vlan_tag = true;
 		tcp_info->vlan_tag = cpu_to_le16(cm_node->vlan_id);
 	}
-- 
2.10.2
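
The rewritten comparison is meant to be behavior-preserving: "vlan_id < VLAN_TAG_PRESENT" and "vlan_id <= VLAN_VID_MASK" should accept exactly the same values. A small userspace sketch that checks this exhaustively, assuming vlan_id is a 16-bit field and using the mask values from include/linux/if_vlan.h (the function names are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_VID_MASK    0x0fff
#define VLAN_TAG_PRESENT 0x1000	/* pre-series value, aliasing the CFI bit */

static int old_has_vlan(uint16_t vlan_id)
{
	return vlan_id < VLAN_TAG_PRESENT;
}

static int new_has_vlan(uint16_t vlan_id)
{
	return vlan_id <= VLAN_VID_MASK;
}

/* Count the 16-bit inputs on which the two predicates disagree. */
static unsigned int count_mismatches(void)
{
	unsigned int mismatches = 0;
	uint32_t v;

	/* The rewrite relies on the flag being the first value above the VID range. */
	assert(VLAN_TAG_PRESENT == VLAN_VID_MASK + 1);

	for (v = 0; v <= UINT16_MAX; v++)
		if (old_has_vlan((uint16_t)v) != new_has_vlan((uint16_t)v))
			mismatches++;
	return mismatches;
}
```

A result of zero from count_mismatches() indicates that the patch only drops the dependency on the VLAN_TAG_PRESENT name, not the semantics of the test.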


* [PATCH net-next 06/27] cnic: remove use of VLAN_TAG_PRESENT
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/ethernet/broadcom/cnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index b1d2ac8..6e3c610 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -5734,7 +5734,7 @@ static int cnic_netdev_event(struct notifier_block *this, unsigned long event,
 		if (realdev) {
 			dev = cnic_from_netdev(realdev);
 			if (dev) {
-				vid |= VLAN_TAG_PRESENT;
+				vid |= VLAN_CFI_MASK;	/* make non-zero */
 				cnic_rcv_netevent(dev->cnic_priv, event, vid);
 				cnic_put(dev);
 			}
-- 
2.10.2


* [PATCH net-next 07/27] gianfar: remove use of VLAN_TAG_PRESENT
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Claudiu Manoil
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/ethernet/freescale/gianfar_ethtool.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 56588f2..95fa647 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -1155,11 +1155,9 @@ static int gfar_convert_to_filer(struct ethtool_rx_flow_spec *rule,
 		prio = vlan_tci_prio(rule);
 		prio_mask = vlan_tci_priom(rule);
 
-		if (cfi == VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
-			vlan |= RQFPR_CFI;
-			vlan_mask |= RQFPR_CFI;
-		} else if (cfi != VLAN_TAG_PRESENT &&
-			   cfi_mask == VLAN_TAG_PRESENT) {
+		if (cfi_mask) {
+			if (cfi)
+				vlan |= RQFPR_CFI;
 			vlan_mask |= RQFPR_CFI;
 		}
 	}
-- 
2.10.2
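
The simplified branch structure can be compared against the original one with a short truth-table check. The sketch below is userspace code under two assumptions: cfi and cfi_mask only ever hold 0 or VLAN_CFI_MASK (they are derived by masking the TCI), and RQFPR_CFI is given a placeholder value here, since the real definition lives in gianfar.h. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_CFI_MASK    0x1000
#define VLAN_TAG_PRESENT VLAN_CFI_MASK
#define RQFPR_CFI        0x00001000u	/* placeholder value for this sketch */

struct filer_bits {
	uint32_t vlan;
	uint32_t vlan_mask;
};

/* Branch structure before the patch. */
static struct filer_bits old_cfi_bits(uint32_t cfi, uint32_t cfi_mask)
{
	struct filer_bits b = { 0, 0 };

	if (cfi == VLAN_TAG_PRESENT && cfi_mask == VLAN_TAG_PRESENT) {
		b.vlan |= RQFPR_CFI;
		b.vlan_mask |= RQFPR_CFI;
	} else if (cfi != VLAN_TAG_PRESENT &&
		   cfi_mask == VLAN_TAG_PRESENT) {
		b.vlan_mask |= RQFPR_CFI;
	}
	return b;
}

/* Branch structure after the patch. */
static struct filer_bits new_cfi_bits(uint32_t cfi, uint32_t cfi_mask)
{
	struct filer_bits b = { 0, 0 };

	/* Stated assumption: both inputs are already masked CFI values. */
	assert(cfi == 0 || cfi == VLAN_CFI_MASK);
	assert(cfi_mask == 0 || cfi_mask == VLAN_CFI_MASK);

	if (cfi_mask) {
		if (cfi)
			b.vlan |= RQFPR_CFI;
		b.vlan_mask |= RQFPR_CFI;
	}
	return b;
}

/* Returns 1 when both versions agree on all four input combinations. */
static int cfi_rewrite_is_equivalent(void)
{
	const uint32_t vals[2] = { 0, VLAN_CFI_MASK };
	int i, j;

	for (i = 0; i < 2; i++) {
		for (j = 0; j < 2; j++) {
			struct filer_bits a = old_cfi_bits(vals[i], vals[j]);
			struct filer_bits b = new_cfi_bits(vals[i], vals[j]);

			if (a.vlan != b.vlan || a.vlan_mask != b.vlan_mask)
				return 0;
		}
	}
	return 1;
}
```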


* [PATCH net-next 17/27] OVS: remove assumptions about VLAN_TAG_PRESENT bit
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: open list:OPENVSWITCH
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

This leaves the CFI bit toggled in the API, because userspace might depend
on it being set for normal Ethernet traffic with a tag present.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 Documentation/networking/openvswitch.txt | 14 --------
 net/openvswitch/actions.c                | 13 +++----
 net/openvswitch/flow.c                   |  4 +--
 net/openvswitch/flow.h                   |  4 +--
 net/openvswitch/flow_netlink.c           | 61 ++++++++++----------------------
 5 files changed, 30 insertions(+), 66 deletions(-)

diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt
index b3b9ac6..e7ca27d 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.txt
@@ -219,20 +219,6 @@ this:
 
     eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
 
-As another example, consider a packet with an Ethernet type of 0x8100,
-indicating that a VLAN TCI should follow, but which is truncated just
-after the Ethernet type.  The flow key for this packet would include
-an all-zero-bits vlan and an empty encap attribute, like this:
-
-    eth(...), eth_type(0x8100), vlan(0), encap()
-
-Unlike a TCP packet with source and destination ports 0, an
-all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
-VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
-attribute expressly to allow this situation to be distinguished.
-Thus, the flow key in this second example unambiguously indicates a
-missing or malformed VLAN TCI.
-
 Other rules
 -----------
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 514f7bc..6015bc9 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -277,8 +277,7 @@ static int push_vlan(struct sk_buff *skb, struct sw_flow_key *key,
 		key->eth.vlan.tci = vlan->vlan_tci;
 		key->eth.vlan.tpid = vlan->vlan_tpid;
 	}
-	return skb_vlan_push(skb, vlan->vlan_tpid,
-			     ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
+	return skb_vlan_push(skb, vlan->vlan_tpid, ntohs(vlan->vlan_tci) ^ VLAN_CFI_MASK);
 }
 
 /* 'src' is already properly masked. */
@@ -704,8 +703,10 @@ static int ovs_vport_output(struct net *net, struct sock *sk, struct sk_buff *sk
 	__skb_dst_copy(skb, data->dst);
 	*OVS_CB(skb) = data->cb;
 	skb->inner_protocol = data->inner_protocol;
-	skb->vlan_tci = data->vlan_tci;
-	skb->vlan_proto = data->vlan_proto;
+	if (data->vlan_proto)
+		__vlan_hwaccel_put_tag(skb, data->vlan_proto, data->vlan_tci ^ VLAN_CFI_MASK);
+	else
+		__vlan_hwaccel_clear_tag(skb);
 
 	/* Reconstruct the MAC header.  */
 	skb_push(skb, data->l2_len);
@@ -749,8 +750,8 @@ static void prepare_frag(struct vport *vport, struct sk_buff *skb,
 	data->cb = *OVS_CB(skb);
 	data->inner_protocol = skb->inner_protocol;
 	data->network_offset = orig_network_offset;
-	data->vlan_tci = skb->vlan_tci;
-	data->vlan_proto = skb->vlan_proto;
+	data->vlan_tci = skb_vlan_tag_present(skb) ? skb->vlan_tci ^ VLAN_CFI_MASK : 0;
+	data->vlan_proto = skb_vlan_tag_present(skb) ? skb->vlan_proto : 0;
 	data->mac_proto = mac_proto;
 	data->l2_len = hlen;
 	memcpy(&data->l2_data, skb->data, hlen);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 08aa926..df58cfd 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -327,7 +327,7 @@ static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *key_vh)
 		return -ENOMEM;
 
 	vh = (struct vlan_head *)skb->data;
-	key_vh->tci = vh->tci | htons(VLAN_TAG_PRESENT);
+	key_vh->tci = vh->tci ^ htons(VLAN_CFI_MASK);
 	key_vh->tpid = vh->tpid;
 
 	__skb_pull(skb, sizeof(struct vlan_head));
@@ -347,7 +347,7 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 	int res;
 
 	if (skb_vlan_tag_present(skb)) {
-		key->eth.vlan.tci = htons(skb->vlan_tci);
+		key->eth.vlan.tci = htons(skb->vlan_tci) ^ htons(VLAN_CFI_MASK);
 		key->eth.vlan.tpid = skb->vlan_proto;
 	} else {
 		/* Parse outer vlan tag in the non-accelerated case. */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index f61cae7..f5115ed 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -57,8 +57,8 @@ struct ovs_tunnel_info {
 };
 
 struct vlan_head {
-	__be16 tpid; /* Vlan type. Generally 802.1q or 802.1ad.*/
-	__be16 tci;  /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
+	__be16 tpid; /* Vlan type. Generally 802.1q or 802.1ad. 0 if no VLAN*/
+	__be16 tci;
 };
 
 #define OVS_SW_FLOW_KEY_METADATA_SIZE			\
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d19044f..6ae5218 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -835,8 +835,6 @@ static int validate_vlan_from_nlattrs(const struct sw_flow_match *match,
 				      u64 key_attrs, bool inner,
 				      const struct nlattr **a, bool log)
 {
-	__be16 tci = 0;
-
 	if (!((key_attrs & (1 << OVS_KEY_ATTR_ETHERNET)) &&
 	      (key_attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) &&
 	       eth_type_vlan(nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE])))) {
@@ -850,20 +848,11 @@ static int validate_vlan_from_nlattrs(const struct sw_flow_match *match,
 		return -EINVAL;
 	}
 
-	if (a[OVS_KEY_ATTR_VLAN])
-		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
-
-	if (!(tci & htons(VLAN_TAG_PRESENT))) {
-		if (tci) {
-			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
-				  (inner) ? "C-VLAN" : "VLAN");
-			return -EINVAL;
-		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
-			/* Corner case for truncated VLAN header. */
-			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
-				  (inner) ? "C-VLAN" : "VLAN");
-			return -EINVAL;
-		}
+	if (!a[OVS_KEY_ATTR_VLAN] && nla_len(a[OVS_KEY_ATTR_ENCAP])) {
+		/* Corner case for truncated VLAN header. */
+		OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
+			(inner) ? "C-VLAN" : "VLAN");
+		return -EINVAL;
 	}
 
 	return 1;
@@ -873,12 +862,9 @@ static int validate_vlan_mask_from_nlattrs(const struct sw_flow_match *match,
 					   u64 key_attrs, bool inner,
 					   const struct nlattr **a, bool log)
 {
-	__be16 tci = 0;
 	__be16 tpid = 0;
-	bool encap_valid = !!(match->key->eth.vlan.tci &
-			      htons(VLAN_TAG_PRESENT));
-	bool i_encap_valid = !!(match->key->eth.cvlan.tci &
-				htons(VLAN_TAG_PRESENT));
+	bool encap_valid = !!match->key->eth.vlan.tpid;
+	bool i_encap_valid = !!match->key->eth.cvlan.tpid;
 
 	if (!(key_attrs & (1 << OVS_KEY_ATTR_ENCAP))) {
 		/* Not a VLAN. */
@@ -891,9 +877,6 @@ static int validate_vlan_mask_from_nlattrs(const struct sw_flow_match *match,
 		return -EINVAL;
 	}
 
-	if (a[OVS_KEY_ATTR_VLAN])
-		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
-
 	if (a[OVS_KEY_ATTR_ETHERTYPE])
 		tpid = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
 
@@ -902,11 +885,6 @@ static int validate_vlan_mask_from_nlattrs(const struct sw_flow_match *match,
 			  (inner) ? "C-VLAN" : "VLAN", ntohs(tpid));
 		return -EINVAL;
 	}
-	if (!(tci & htons(VLAN_TAG_PRESENT))) {
-		OVS_NLERR(log, "%s TCI mask does not have exact match for VLAN_TAG_PRESENT bit.",
-			  (inner) ? "C-VLAN" : "VLAN");
-		return -EINVAL;
-	}
 
 	return 1;
 }
@@ -958,7 +936,7 @@ static int parse_vlan_from_nlattrs(struct sw_flow_match *match,
 	if (err)
 		return err;
 
-	encap_valid = !!(match->key->eth.vlan.tci & htons(VLAN_TAG_PRESENT));
+	encap_valid = !!match->key->eth.vlan.tpid;
 	if (encap_valid) {
 		err = __parse_vlan_from_nlattrs(match, key_attrs, true, a,
 						is_mask, log);
@@ -1974,12 +1952,12 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa,
 static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				  const struct sw_flow_key *key,
 				  int depth, struct sw_flow_actions **sfa,
-				  __be16 eth_type, __be16 vlan_tci, bool log);
+				  __be16 eth_type, __be16 vlan_tci, bool has_vlan, bool log);
 
 static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
 				    struct sw_flow_actions **sfa,
-				    __be16 eth_type, __be16 vlan_tci, bool log)
+				    __be16 eth_type, __be16 vlan_tci, bool has_vlan, bool log)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
@@ -2017,7 +1995,7 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 		return st_acts;
 
 	err = __ovs_nla_copy_actions(net, actions, key, depth + 1, sfa,
-				     eth_type, vlan_tci, log);
+				     eth_type, vlan_tci, has_vlan, log);
 	if (err)
 		return err;
 
@@ -2358,7 +2336,7 @@ static int copy_action(const struct nlattr *from,
 static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 				  const struct sw_flow_key *key,
 				  int depth, struct sw_flow_actions **sfa,
-				  __be16 eth_type, __be16 vlan_tci, bool log)
+				  __be16 eth_type, __be16 vlan_tci, bool has_vlan, bool log)
 {
 	u8 mac_proto = ovs_key_mac_proto(key);
 	const struct nlattr *a;
@@ -2436,6 +2414,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			if (mac_proto != MAC_PROTO_ETHERNET)
 				return -EINVAL;
 			vlan_tci = htons(0);
+			has_vlan = 0;
 			break;
 
 		case OVS_ACTION_ATTR_PUSH_VLAN:
@@ -2444,9 +2423,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			vlan = nla_data(a);
 			if (!eth_type_vlan(vlan->vlan_tpid))
 				return -EINVAL;
-			if (!(vlan->vlan_tci & htons(VLAN_TAG_PRESENT)))
-				return -EINVAL;
 			vlan_tci = vlan->vlan_tci;
+			has_vlan = 1;
 			break;
 
 		case OVS_ACTION_ATTR_RECIRC:
@@ -2460,7 +2438,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			/* Prohibit push MPLS other than to a white list
 			 * for packets that have a known tag order.
 			 */
-			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
+			if (has_vlan ||
 			    (eth_type != htons(ETH_P_IP) &&
 			     eth_type != htons(ETH_P_IPV6) &&
 			     eth_type != htons(ETH_P_ARP) &&
@@ -2472,8 +2450,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		}
 
 		case OVS_ACTION_ATTR_POP_MPLS:
-			if (vlan_tci & htons(VLAN_TAG_PRESENT) ||
-			    !eth_p_mpls(eth_type))
+			if (has_vlan || !eth_p_mpls(eth_type))
 				return -EINVAL;
 
 			/* Disallow subsequent L2.5+ set and mpls_pop actions
@@ -2506,7 +2483,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SAMPLE:
 			err = validate_and_copy_sample(net, a, key, depth, sfa,
-						       eth_type, vlan_tci, log);
+						       eth_type, vlan_tci, has_vlan, log);
 			if (err)
 				return err;
 			skip_copy = true;
@@ -2530,7 +2507,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 		case OVS_ACTION_ATTR_POP_ETH:
 			if (mac_proto != MAC_PROTO_ETHERNET)
 				return -EINVAL;
-			if (vlan_tci & htons(VLAN_TAG_PRESENT))
+			if (has_vlan)
 				return -EINVAL;
 			mac_proto = MAC_PROTO_ETHERNET;
 			break;
@@ -2565,7 +2542,7 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 
 	(*sfa)->orig_len = nla_len(attr);
 	err = __ovs_nla_copy_actions(net, attr, key, 0, sfa, key->eth.type,
-				     key->eth.vlan.tci, log);
+				     key->eth.vlan.tci, !!key->eth.vlan.tpid, log);
 	if (err)
 		ovs_nla_free_flow_actions(*sfa);
 
-- 
2.10.2
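
The flow-key handling in this patch keeps the CFI bit toggled on the userspace side, so that ordinary tagged traffic (CFI clear on the wire) still shows up with that bit set, as it did when the bit meant "tag present". A minimal userspace sketch of the mapping follows. The function names are invented for illustration, and the one point it is meant to show is that the XOR has to be applied in the same byte order as the value it is combined with.

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <assert.h>
#include <stdint.h>

#define VLAN_CFI_MASK 0x1000

/* Host-order skb TCI -> network-order flow-key TCI with CFI toggled. */
static uint16_t key_tci_from_skb(uint16_t skb_tci)
{
	return (uint16_t)(htons(skb_tci) ^ htons(VLAN_CFI_MASK));
}

/* Inverse mapping: network-order flow-key TCI -> host-order skb TCI. */
static uint16_t skb_tci_from_key(uint16_t key_tci)
{
	return (uint16_t)(ntohs(key_tci) ^ VLAN_CFI_MASK);
}

/* Aborts via assert() if any 16-bit TCI fails to survive the round trip. */
static void check_roundtrip(void)
{
	uint32_t v;

	for (v = 0; v <= UINT16_MAX; v++)
		assert(skb_tci_from_key(key_tci_from_skb((uint16_t)v)) == v);
}
```

Because the toggle is a plain XOR, the mapping is its own inverse, and no TCI value is lost when a key is converted in either direction.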



* [PATCH net-next 02/27] net/vlan: introduce __vlan_hwaccel_copy_tag() helper
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 include/linux/if_vlan.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 38be904..75e839b 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -393,6 +393,19 @@ static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
 	skb->vlan_tci = 0;
 }
 
+/**
+ * __vlan_hwaccel_copy_tag - copy hardware accelerated VLAN info from another skb
+ * @dst: skbuff to copy to
+ * @src: skbuff to copy from
+ *
+ * Copies VLAN information from @src to @dst (for branchless code)
+ */
+static inline void __vlan_hwaccel_copy_tag(struct sk_buff *dst, const struct sk_buff *src)
+{
+	dst->vlan_proto = src->vlan_proto;
+	dst->vlan_tci = src->vlan_tci;
+}
+
 /*
  * __vlan_hwaccel_push_inside - pushes vlan tag to the payload
  * @skb: skbuff to tag
-- 
2.10.2


* [PATCH net-next 01/27] net/vlan: introduce __vlan_hwaccel_clear_tag() helper
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 include/linux/if_vlan.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 8d5fcd6..38be904 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -382,6 +382,17 @@ static inline struct sk_buff *vlan_insert_tag_set_proto(struct sk_buff *skb,
 	return skb;
 }
 
+/**
+ * __vlan_hwaccel_clear_tag - clear hardware accelerated VLAN info
+ * @skb: skbuff to clear
+ *
+ * Clears the VLAN information from @skb
+ */
+static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
+{
+	skb->vlan_tci = 0;
+}
+
 /*
  * __vlan_hwaccel_push_inside - pushes vlan tag to the payload
  * @skb: skbuff to tag
@@ -396,7 +407,7 @@ static inline struct sk_buff *__vlan_hwaccel_push_inside(struct sk_buff *skb)
 	skb = vlan_insert_tag_set_proto(skb, skb->vlan_proto,
 					skb_vlan_tag_get(skb));
 	if (likely(skb))
-		skb->vlan_tci = 0;
+		__vlan_hwaccel_clear_tag(skb);
 	return skb;
 }
 
-- 
2.10.2
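
How the two new helpers combine in a driver receive path (as in the sky2 conversion) can be sketched in userspace as follows. struct mock_skb and the mock_* functions are hypothetical stand-ins that mirror the helper bodies from patches 01 and 02; they are not kernel code. Note that, at this point in the series, clearing a tag resets only vlan_tci.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_TAG_PRESENT 0x1000	/* pre-series value, aliasing the CFI bit */

/* Hypothetical stand-in for the two VLAN fields of struct sk_buff. */
struct mock_skb {
	uint16_t vlan_proto;	/* network byte order in the kernel */
	uint16_t vlan_tci;
};

/* Mirrors __vlan_hwaccel_clear_tag() from patch 01: only vlan_tci is reset. */
static void mock_clear_tag(struct mock_skb *skb)
{
	assert(skb);
	skb->vlan_tci = 0;
}

/* Mirrors __vlan_hwaccel_copy_tag() from patch 02. */
static void mock_copy_tag(struct mock_skb *dst, const struct mock_skb *src)
{
	assert(dst && src);
	dst->vlan_proto = src->vlan_proto;
	dst->vlan_tci = src->vlan_tci;
}

/* Presence test as it works before the series: the in-band flag bit. */
static bool mock_tag_present(const struct mock_skb *skb)
{
	return (skb->vlan_tci & VLAN_TAG_PRESENT) != 0;
}

/* Hand the tag to a fresh buffer, then scrub the recycled one. */
static void mock_receive_copy(struct mock_skb *fresh, struct mock_skb *recycled)
{
	mock_copy_tag(fresh, recycled);
	mock_clear_tag(recycled);
}
```

Routing every such access through the helpers is what lets a later patch change how presence is stored without touching each caller again.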


* [PATCH net-next 00/27] Remove VLAN CFI bit abuse
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev
In-Reply-To: <cfa1f2efae2217b50cbefccbf9ba7f0d24a23c63.1480755768.git.mirq-linux@rere.qmqm.pl>

Dear NetDevs

This series removes an abuse of the VLAN CFI bit in the Linux networking stack.
Currently Linux always clears the bit on outgoing traffic and presents
it cleared to userspace (even via AF_PACKET/tcpdump when hw-accelerated).

This uses a new vlan_present bit in struct sk_buff, and removes the assumption
that vlan_proto != 0 when a VLAN tag is present.
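
For illustration, the effect of storing "tag present" in the CFI bit can be modeled in a few lines of userspace C. This is a sketch, not kernel code: struct old_skb, struct new_skb and the old_*/new_* functions are invented names, while the two mask values match include/linux/if_vlan.h before this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_CFI_MASK    0x1000
#define VLAN_TAG_PRESENT VLAN_CFI_MASK	/* presence flag aliases the CFI bit */

/* Old scheme: "tag present" is stored in-band, inside vlan_tci. */
struct old_skb {
	uint16_t vlan_tci;
};

static void old_put_tag(struct old_skb *skb, uint16_t tci)
{
	assert(skb);
	skb->vlan_tci = VLAN_TAG_PRESENT | tci;
}

static bool old_tag_present(const struct old_skb *skb)
{
	return (skb->vlan_tci & VLAN_TAG_PRESENT) != 0;
}

static uint16_t old_tag_get(const struct old_skb *skb)
{
	/* The CFI bit is always masked out, so its wire value is lost. */
	return (uint16_t)(skb->vlan_tci & ~VLAN_TAG_PRESENT);
}

/* New scheme (illustrative model): presence is tracked out-of-band. */
struct new_skb {
	uint16_t vlan_tci;
	bool vlan_present;
};

static void new_put_tag(struct new_skb *skb, uint16_t tci)
{
	assert(skb);
	skb->vlan_tci = tci;
	skb->vlan_present = true;
}

static uint16_t new_tag_get(const struct new_skb *skb)
{
	return skb->vlan_tci;	/* all 16 TCI bits survive, CFI included */
}
```

In the old model a frame received with CFI set reads back with CFI clear, which is the behavior described above. In the new model the TCI round-trips unchanged.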

As I can't test most of the driver changes, please look at them carefully.

The series is meant to be bisect-friendly, which requires temporarily
inserting a #define VLAN_TAG_PRESENT into the BPF code so that the JIT
changes can be split per architecture.

Best Regards,
Michał Mirosław

---

Michał Mirosław (27):
  net/vlan: introduce __vlan_hwaccel_clear_tag() helper
  net/vlan: introduce __vlan_hwaccel_copy_tag() helper
  ibmvnic: fix accelerated VLAN handling
  qlcnic: remove assumption that vlan_tci != 0
  i40iw: remove use of VLAN_TAG_PRESENT
  cnic: remove use of VLAN_TAG_PRESENT
  gianfar: remove use of VLAN_TAG_PRESENT
  net/hyperv: remove use of VLAN_TAG_PRESENT
  cxgb4: use __vlan_hwaccel helpers
  benet: use __vlan_hwaccel helpers
  sky2: use __vlan_hwaccel helpers
  net/core: use __vlan_hwaccel helpers
  bridge: use __vlan_hwaccel helpers
  8021q: use __vlan_hwaccel helpers
  ipv4/tunnel: use __vlan_hwaccel helpers
  nfnetlink/queue: use __vlan_hwaccel helpers
  OVS: remove assumptions about VLAN_TAG_PRESENT bit
  net/skbuff: add macros for VLAN_PRESENT bit
  net/bpf_jit: ARM: split VLAN_PRESENT bit handling from VLAN_TCI
  net/bpf_jit: MIPS: split VLAN_PRESENT bit handling from VLAN_TCI
  net/bpf_jit: PPC: split VLAN_PRESENT bit handling from VLAN_TCI
  net/bpf_jit: SPARC: split VLAN_PRESENT bit handling from VLAN_TCI
  net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI
  bpf_test: prepare for VLAN_TAG_PRESENT removal
  net: remove VLAN_TAG_PRESENT
  net/hyperv: enable passing of VLAN.CFI bit
  net/vlan: remove unused #define HAVE_VLAN_GET_TAG

 Documentation/networking/openvswitch.txt         | 14 ------
 arch/arm/net/bpf_jit_32.c                        | 17 ++++---
 arch/mips/net/bpf_jit.c                          | 17 +++----
 arch/powerpc/net/bpf_jit_comp.c                  | 14 +++---
 arch/sparc/net/bpf_jit_comp.c                    | 14 +++---
 drivers/infiniband/hw/cxgb4/cm.c                 |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_cm.c           |  8 ++--
 drivers/net/ethernet/broadcom/cnic.c             |  2 +-
 drivers/net/ethernet/emulex/benet/be_main.c      |  4 +-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  8 ++--
 drivers/net/ethernet/ibm/ibmvnic.c               |  5 +-
 drivers/net/ethernet/marvell/sky2.c              |  6 +--
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c   |  8 ++--
 drivers/net/hyperv/hyperv_net.h                  |  2 +-
 drivers/net/hyperv/netvsc_drv.c                  | 14 +++---
 drivers/net/hyperv/rndis_filter.c                |  5 +-
 include/linux/if_vlan.h                          | 37 +++++++++++---
 include/linux/skbuff.h                           | 10 +++-
 lib/test_bpf.c                                   | 14 +++---
 net/8021q/vlan_core.c                            |  2 +-
 net/bridge/br_netfilter_hooks.c                  | 14 +++---
 net/bridge/br_private.h                          |  2 +-
 net/bridge/br_vlan.c                             |  6 +--
 net/core/dev.c                                   |  8 ++--
 net/core/filter.c                                | 17 +++----
 net/core/skbuff.c                                |  2 +-
 net/ipv4/ip_tunnel_core.c                        |  2 +-
 net/netfilter/nfnetlink_queue.c                  |  5 +-
 net/openvswitch/actions.c                        | 13 ++---
 net/openvswitch/flow.c                           |  4 +-
 net/openvswitch/flow.h                           |  4 +-
 net/openvswitch/flow_netlink.c                   | 61 ++++++++----------------
 net/sched/act_vlan.c                             |  2 +-
 33 files changed, 170 insertions(+), 173 deletions(-)

-- 
2.10.2


* [PATCH net-next 03/27] ibmvnic: fix accelerated VLAN handling
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Falcon, John Allen
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index c125966..c7664db 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -765,7 +765,7 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 	tx_crq.v1.sge_len = cpu_to_be32(skb->len);
 	tx_crq.v1.ioba = cpu_to_be64(data_dma_addr);
 
-	if (adapter->vlan_header_insertion) {
+	if (adapter->vlan_header_insertion && skb_vlan_tag_present(skb)) {
 		tx_crq.v1.flags2 |= IBMVNIC_TX_VLAN_INSERT;
 		tx_crq.v1.vlan_id = cpu_to_be16(skb->vlan_tci);
 	}
@@ -964,7 +964,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
 		skb = rx_buff->skb;
 		skb_copy_to_linear_data(skb, rx_buff->data + offset,
 					length);
-		skb->vlan_tci = be16_to_cpu(next->rx_comp.vlan_tci);
+		if (flags & IBMVNIC_VLAN_STRIPPED)
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), be16_to_cpu(next->rx_comp.vlan_tci));
 		/* free the entry */
 		next->rx_comp.first = 0;
 		remove_buff_from_pool(adapter, rx_buff);
-- 
2.10.2


* [PATCH net-next 13/27] bridge: use __vlan_hwaccel helpers
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev
  Cc: moderated list:ETHERNET BRIDGE,
	open list:NETFILTER {IP, IP6, ARP, EB, NF}TABLES,
	Jozsef Kadlecsik, Patrick McHardy, Pablo Neira Ayuso
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

This removes the assumption that vlan_tci != 0 when a tag is present.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 net/bridge/br_netfilter_hooks.c | 14 ++++++++------
 net/bridge/br_private.h         |  2 +-
 net/bridge/br_vlan.c            |  6 +++---
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index b12501a..2cc0747 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -682,10 +682,8 @@ static int br_nf_push_frag_xmit(struct net *net, struct sock *sk, struct sk_buff
 		return 0;
 	}
 
-	if (data->vlan_tci) {
-		skb->vlan_tci = data->vlan_tci;
-		skb->vlan_proto = data->vlan_proto;
-	}
+	if (data->vlan_proto)
+		__vlan_hwaccel_put_tag(skb, data->vlan_proto, data->vlan_tci);
 
 	skb_copy_to_linear_data_offset(skb, -data->size, data->mac, data->size);
 	__skb_push(skb, data->encap_size);
@@ -749,8 +747,12 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
 
 		data = this_cpu_ptr(&brnf_frag_data_storage);
 
-		data->vlan_tci = skb->vlan_tci;
-		data->vlan_proto = skb->vlan_proto;
+		if (skb_vlan_tag_present(skb)) {
+			data->vlan_tci = skb->vlan_tci;
+			data->vlan_proto = skb->vlan_proto;
+		} else
+			data->vlan_proto = 0;
+
 		data->encap_size = nf_bridge_encap_header_len(skb);
 		data->size = ETH_HLEN + data->encap_size;
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 8ce621e..2efbdaf 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -819,7 +819,7 @@ static inline int br_vlan_get_tag(const struct sk_buff *skb, u16 *vid)
 	int err = 0;
 
 	if (skb_vlan_tag_present(skb)) {
-		*vid = skb_vlan_tag_get(skb) & VLAN_VID_MASK;
+		*vid = skb_vlan_tag_get_id(skb);
 	} else {
 		*vid = 0;
 		err = -EINVAL;
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index b6de4f4..ef94664 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -377,7 +377,7 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
 	}
 
 	if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
-		skb->vlan_tci = 0;
+		__vlan_hwaccel_clear_tag(skb);
 out:
 	return skb;
 }
@@ -444,8 +444,8 @@ static bool __allowed_ingress(const struct net_bridge *br,
 			__vlan_hwaccel_put_tag(skb, br->vlan_proto, pvid);
 		else
 			/* Priority-tagged Frame.
-			 * At this point, We know that skb->vlan_tci had
-			 * VLAN_TAG_PRESENT bit and its VID field was 0x000.
+			 * At this point, We know that skb->vlan_tci VID
+			 * field was 0x000.
 			 * We update only VID field and preserve PCP field.
 			 */
 			skb->vlan_tci |= pvid;
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 04/27] qlcnic: remove assumption that vlan_tci != 0
From: Michał Mirosław @ 2016-12-13  0:12 UTC (permalink / raw)
  To: netdev
  Cc: Harish Patil, Manish Chopra,
	supporter:QLOGIC QLCNIC (1/10)Gb ETHERNET DRIVER
In-Reply-To: <cover.1481586602.git.mirq-linux@rere.qmqm.pl>

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
index fedd736..c3cc707 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c
@@ -459,7 +459,7 @@ static int qlcnic_tx_pkt(struct qlcnic_adapter *adapter,
 			 struct cmd_desc_type0 *first_desc, struct sk_buff *skb,
 			 struct qlcnic_host_tx_ring *tx_ring)
 {
-	u8 l4proto, opcode = 0, hdr_len = 0;
+	u8 l4proto, opcode = 0, hdr_len = 0, tag_vlan = 0;
 	u16 flags = 0, vlan_tci = 0;
 	int copied, offset, copy_len, size;
 	struct cmd_desc_type0 *hwdesc;
@@ -472,14 +472,16 @@ static int qlcnic_tx_pkt(struct qlcnic_adapter *adapter,
 		flags = QLCNIC_FLAGS_VLAN_TAGGED;
 		vlan_tci = ntohs(vh->h_vlan_TCI);
 		protocol = ntohs(vh->h_vlan_encapsulated_proto);
+		tag_vlan = 1;
 	} else if (skb_vlan_tag_present(skb)) {
 		flags = QLCNIC_FLAGS_VLAN_OOB;
 		vlan_tci = skb_vlan_tag_get(skb);
+		tag_vlan = 1;
 	}
 	if (unlikely(adapter->tx_pvid)) {
-		if (vlan_tci && !(adapter->flags & QLCNIC_TAGGING_ENABLED))
+		if (tag_vlan && !(adapter->flags & QLCNIC_TAGGING_ENABLED))
 			return -EIO;
-		if (vlan_tci && (adapter->flags & QLCNIC_TAGGING_ENABLED))
+		if (tag_vlan && (adapter->flags & QLCNIC_TAGGING_ENABLED))
 			goto set_flags;
 
 		flags = QLCNIC_FLAGS_VLAN_OOB;
-- 
2.10.2

^ permalink raw reply related

* Re: netlink: GPF in sock_sndtimeo
From: Cong Wang @ 2016-12-13  0:10 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: linux-audit, Paul Moore, Dmitry Vyukov, David Miller,
	Johannes Berg, Florian Westphal, Eric Dumazet, Herbert Xu, netdev,
	LKML, syzkaller
In-Reply-To: <20161212100215.GA1305@madcap2.tricolour.ca>

On Mon, Dec 12, 2016 at 2:02 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2016-12-09 20:13, Cong Wang wrote:
>> Netlink notifier can safely be converted to blocking one, I will send
>> a patch.
>
> I had a quick look at how that might happen.  The netlink notifier chain
> is atomic.  Would the registered callback function need to spawn a
> one-time thread to avoid blocking?

It is already non-atomic now:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=efa172f42836477bf1ac3c9a3053140df764699c


> I had a look at your patch.  It looks attractively simple.  The audit
> next tree has patches queued that add an audit_reset function that will
> require more work.  I still see some potential gaps.
>
> - If the process messes up (or the sock lookup messes up) it is reset
>   in the kauditd thread under the audit_cmd_mutex.
>
> - If the process exits normally or is replaced due to an audit_replace
>   error, it is reset from audit_receive_skb under the audit_cmd_mutex.
>
> - If the process dies before the kauditd thread notices, either reap it
>   via notifier callback or it needs a check on net exit to reset.  This
>   last one appears necessary to decrement the sock refcount so the sock
>   can be released in netlink_kernel_release().
>
> If we want to be proactive and use the netlink notifier, we assume the
> overhead of adding to the netlink notifier chain and eliminate all the
> other reset calls under the kauditd thread.  If we are ok being
> reactionary, then we'll at least need the net exit check on audit_sock.
>

I don't see why we need to check it in net exit if we use refcnt.
We have two different users of audit_sock: kauditd and netns. If both
take care of refcnt properly, we don't need to worry about who is the
last, no matter what failures occur in what order.
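
As a rough userspace illustration of that point (not the kernel code: struct fake_sock, struct fake_user and all helper names below are hypothetical, and a plain int stands in for the kernel's atomic refcount), two holders can drop their references in either order and the object is released only by whichever put happens last:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock; not the kernel structure. */
struct fake_sock {
	int refcnt;	/* plain int for illustration; the kernel uses atomics */
	bool released;	/* set once the last reference is dropped */
};

/* Hypothetical user of the sock, e.g. kauditd or the netns. */
struct fake_user {
	struct fake_sock *sk;
};

/* Models sock_hold() plus storing the pointer. */
static void fake_user_attach(struct fake_user *u, struct fake_sock *sk)
{
	sk->refcnt++;
	u->sk = sk;
}

/* Models sock_put() plus clearing the pointer; safe to call twice. */
static void fake_user_detach(struct fake_user *u)
{
	if (!u->sk)
		return;
	if (--u->sk->refcnt == 0)
		u->sk->released = true;
	u->sk = NULL;
}

/* Drop the two references in either order; only the last put releases. */
static void fake_refcnt_demo(bool netns_first)
{
	struct fake_sock sk = { 0, false };
	struct fake_user kauditd = { NULL }, netns = { NULL };

	fake_user_attach(&kauditd, &sk);
	fake_user_attach(&netns, &sk);

	fake_user_detach(netns_first ? &netns : &kauditd);
	assert(!sk.released);
	fake_user_detach(netns_first ? &kauditd : &netns);
	assert(sk.released);
}
```

Neither user has to know whether it is the last one; the counter decides.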

^ permalink raw reply

* Re: [PATCH v2] audit: use proper refcount locking on audit_sock
From: Cong Wang @ 2016-12-12 23:58 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Linux Kernel Network Developers, LKML, linux-audit, Dmitry Vyukov,
	Eric Dumazet, Eric Paris, Paul Moore, sgrubb
In-Reply-To: <5714bd7468cfec225407a6c367e658478d590495.1481534171.git.rgb@redhat.com>

On Mon, Dec 12, 2016 at 2:03 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Resetting audit_sock appears to be racy.
>
> audit_sock was being copied and dereferenced without using a refcount on
> the source sock.
>
> Bump the refcount on the underlying sock when we store a reference in
> audit_sock and release it when we reset audit_sock.  audit_sock
> modification needs the audit_cmd_mutex.
>
> See: https://lkml.org/lkml/2016/11/26/232
>
> Thanks to Eric Dumazet <edumazet@google.com> and Cong Wang
> <xiyou.wangcong@gmail.com> on ideas how to fix it.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> There has been a lot of change in the audit code that is about to go
> upstream to address audit queue issues.  This patch is based on the
> source tree: git://git.infradead.org/users/pcmoore/audit#next
> ---
>  kernel/audit.c |   34 ++++++++++++++++++++++++++++------
>  1 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f20eee0..439f7f3 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -452,7 +452,9 @@ static void auditd_reset(void)
>         struct sk_buff *skb;
>
>         /* break the connection */
> +       sock_put(audit_sock);


Why can't audit_sock be NULL here?


>         audit_pid = 0;
> +       audit_nlk_portid = 0;
>         audit_sock = NULL;
>
>         /* flush all of the retry queue to the hold queue */
> @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
>         if (rc >= 0) {
>                 consume_skb(skb);
>                 rc = 0;
> +       } else {
> +               if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {


Are these errnos bit flags??
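
To spell out the concern with a small userspace sketch (assuming Linux errno values, where EPERM is 1; the helper names are hypothetical and the explicit comparison is only one possible alternative, not necessarily what the patch author intends): since -EPERM is -1, OR-ing the negated codes gives a mask with all bits set, so the test is true for any non-zero rc.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The test as written in the patch: treats errno values as bit flags. */
static bool fatal_by_mask(int rc)
{
	return (rc & (-ENOMEM | -EPERM | -ECONNREFUSED)) != 0;
}

/* Explicit comparison against each error code. */
static bool fatal_by_compare(int rc)
{
	return rc == -ENOMEM || rc == -EPERM || rc == -ECONNREFUSED;
}

static void errno_mask_demo(void)
{
	/* -EPERM is -1 on Linux, i.e. all bits set, so the OR is -1 too. */
	assert((-ENOMEM | -EPERM | -ECONNREFUSED) == -1);

	/* The mask version therefore also fires on an unrelated error. */
	assert(fatal_by_mask(-EAGAIN));
	assert(!fatal_by_compare(-EAGAIN));

	/* Both versions agree on the intended codes. */
	assert(fatal_by_mask(-EPERM) && fatal_by_compare(-EPERM));
	assert(fatal_by_mask(-ECONNREFUSED) && fatal_by_compare(-ECONNREFUSED));
}
```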


> +                       mutex_lock(&audit_cmd_mutex);
> +                       auditd_reset();
> +                       mutex_unlock(&audit_cmd_mutex);
> +               }
>         }
>
>         return rc;
> @@ -579,7 +587,9 @@ static int kauditd_thread(void *dummy)
>
>                                 auditd = 0;
>                                 if (AUDITD_BAD(rc, reschedule)) {
> +                                       mutex_lock(&audit_cmd_mutex);
>                                         auditd_reset();
> +                                       mutex_unlock(&audit_cmd_mutex);
>                                         reschedule = 0;
>                                 }
>                         } else
> @@ -594,7 +604,9 @@ static int kauditd_thread(void *dummy)
>                                 auditd = 0;
>                                 if (AUDITD_BAD(rc, reschedule)) {
>                                         kauditd_hold_skb(skb);
> +                                       mutex_lock(&audit_cmd_mutex);
>                                         auditd_reset();
> +                                       mutex_unlock(&audit_cmd_mutex);
>                                         reschedule = 0;
>                                 } else
>                                         /* temporary problem (we hope), queue
> @@ -623,7 +635,9 @@ quick_loop:
>                                 if (rc) {
>                                         auditd = 0;
>                                         if (AUDITD_BAD(rc, reschedule)) {
> +                                               mutex_lock(&audit_cmd_mutex);
>                                                 auditd_reset();
> +                                               mutex_unlock(&audit_cmd_mutex);
>                                                 reschedule = 0;
>                                         }
>
> @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                                 return -EACCES;
>                         }
>                         if (audit_pid && new_pid &&
> -                           audit_replace(requesting_pid) != -ECONNREFUSED) {
> +                           (audit_replace(requesting_pid) & (-ECONNREFUSED|-EPERM|-ENOMEM))) {
>                                 audit_log_config_change("audit_pid", new_pid, audit_pid, 0);
>                                 return -EEXIST;
>                         }
>                         if (audit_enabled != AUDIT_OFF)
>                                 audit_log_config_change("audit_pid", new_pid, audit_pid, 1);
> -                       audit_pid = new_pid;
> -                       audit_nlk_portid = NETLINK_CB(skb).portid;
> -                       audit_sock = skb->sk;
> -                       if (!new_pid)
> +                       if (new_pid) {
> +                               if (audit_sock)
> +                                       sock_put(audit_sock);
> +                               audit_pid = new_pid;
> +                               audit_nlk_portid = NETLINK_CB(skb).portid;
> +                               sock_hold(skb->sk);

Why is refcnt still needed here? I need it because I removed the code
in the net exit path.


> +                               audit_sock = skb->sk;
> +                       } else {
>                                 auditd_reset();
> +                       }
>                         wake_up_interruptible(&kauditd_wait);
>                 }
>                 if (s.mask & AUDIT_STATUS_RATE_LIMIT) {
> @@ -1283,8 +1302,11 @@ static void __net_exit audit_net_exit(struct net *net)
>  {
>         struct audit_net *aunet = net_generic(net, audit_net_id);
>         struct sock *sock = aunet->nlsk;
> -       if (sock == audit_sock)
> +       if (sock == audit_sock) {
> +               mutex_lock(&audit_cmd_mutex);


You need to put the if check inside the mutex too. Again, this could be
removed if you use refcnt.
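
Roughly what I mean by moving the check, as a userspace sketch only (a pthread mutex stands in for audit_cmd_mutex, and every fake_* name is hypothetical): the comparison against the shared pointer happens after the lock is taken, so the pointer cannot change between the check and the reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock. */
struct fake_sock {
	int id;
};

/* Stand-ins for audit_cmd_mutex and audit_sock. */
static pthread_mutex_t fake_cmd_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct fake_sock *fake_audit_sock;

/* Models auditd_reset(); the caller must hold fake_cmd_mutex. */
static void fake_auditd_reset(void)
{
	fake_audit_sock = NULL;
}

/* Models the net exit path with the check done under the mutex. */
static void fake_net_exit(struct fake_sock *sock)
{
	pthread_mutex_lock(&fake_cmd_mutex);
	if (sock == fake_audit_sock)
		fake_auditd_reset();
	pthread_mutex_unlock(&fake_cmd_mutex);
}

static void fake_net_exit_demo(void)
{
	struct fake_sock a = { 1 }, b = { 2 };

	fake_audit_sock = &a;
	fake_net_exit(&b);		/* different sock: nothing happens */
	assert(fake_audit_sock == &a);
	fake_net_exit(&a);		/* matching sock: reset under the lock */
	assert(fake_audit_sock == NULL);
}
```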


>                 auditd_reset();
> +               mutex_unlock(&audit_cmd_mutex);
> +       }
>
>         RCU_INIT_POINTER(aunet->nlsk, NULL);
>         synchronize_net();
> --
> 1.7.1
>

^ permalink raw reply

* Re: [PATCH V2 00/22] Broadcom RoCE Driver (bnxt_re)
From: Doug Ledford @ 2016-12-12 23:52 UTC (permalink / raw)
  To: Selvin Xavier, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481266096-23331-1-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>


On 12/9/2016 1:47 AM, Selvin Xavier wrote:
> This series introduces the RoCE driver for the Broadcom
> NetXtreme-E 10/25/40/50 gigabit RoCE HCAs. 
> This driver is dependent on the bnxt_en NIC driver and is 
> based on the bnxt_re branch in Doug's repository. bnxt_en changes
> required for this patch series are already available in this branch.
> 
> I am preparing a git repository with these changes as per Jason's
> comment and will share the details later today.
> 
> v1-> v2:
>   * The license text in each file updated to reflect Dual license.
>   * Makefile and Kconfig changes are pushed to the last patch
>   * Moved bnxt_re_uverbs_abi.h to include/uapi/rdma folder
>   * Remove duplicate structure definitions from bnxt_re_hsi.h as
>     it is available in the corresponding bnxt_en header file (bnxt_hsi.h)
>   * Removed some unused code reported during code review.
>   * Fixed a few sparse warnings
> 
> Doug,
> Please review and consider applying this to linux-rdma repository.

There are still outstanding review comments to be addressed, and the
v2 patchset doesn't compile for me in 0day testing.  I'm going to bounce
this one to 4.11.


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: 0E572FDD


^ permalink raw reply

* Re: [iproute2 v2 net-next 0/8] Add support for vrf helper
From: Stephen Hemminger @ 2016-12-12 23:43 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <1481401934-4026-1-git-send-email-dsa@cumulusnetworks.com>

On Sat, 10 Dec 2016 12:32:06 -0800
David Ahern <dsa@cumulusnetworks.com> wrote:

> This series adds support to iproute2 to run a command against a specific
> VRF. The user semantics are similar to 'ip netns'.
> 
> The 'ip vrf' subcommand supports 3 usages:
> 
> 1. Run a command against a given vrf:
>        ip vrf exec NAME CMD
> 
>    Uses the recently committed cgroup/sock BPF option. vrf directory
>    is added to cgroup2 mount. Individual vrfs are created under it. BPF
>    filter is attached to vrf/NAME cgroup2 to set sk_bound_dev_if to the
>    device index of the VRF. From there the current process (ip's pid) is
>    added to the cgroup.procs file and the given command is executed. In
>    doing so all AF_INET/AF_INET6 (ipv4/ipv6) sockets are automatically
>    bound to the VRF domain.
> 
>    The association is inherited parent to child allowing the command to
>    be a shell from which other commands are run relative to the VRF.
> 
> 2. Show the VRF a process is bound to:
>        ip vrf id
>    This command essentially looks at /proc/pid/cgroup for a "::/vrf/"
>    entry.
> 
> 3. Show process ids bound to a VRF
>        ip vrf pids NAME
>    This command dumps the file MNT/vrf/NAME/cgroup.procs since that file
>    shows the process ids in the particular vrf cgroup.
> 
> v2
> - updated subject of patch 3 to avoid spam filters on vger
> 
> David Ahern (8):
>   lib bpf: Add support for BPF_PROG_ATTACH and BPF_PROG_DETACH
>   bpf: export bpf_prog_load
>   Add libbpf.h header with BPF_ macros
>   move cmd_exec to lib utils
>   Add filesystem APIs to lib
>   change name_is_vrf to return index
>   libnetlink: Add variant of rtnl_talk that does not display RTNETLINK
>     answers error
>   Introduce ip vrf command
> 
>  include/bpf_util.h   |   6 ++
>  include/libbpf.h     | 184 ++++++++++++++++++++++++++++++++
>  include/libnetlink.h |   3 +
>  include/utils.h      |   4 +
>  ip/Makefile          |   3 +-
>  ip/ip.c              |   4 +-
>  ip/ip_common.h       |   4 +-
>  ip/iplink_vrf.c      |  29 ++++--
>  ip/ipnetns.c         |  34 ------
>  ip/ipvrf.c           | 289 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/Makefile         |   2 +-
>  lib/bpf.c            |  71 ++++++++-----
>  lib/exec.c           |  41 ++++++++
>  lib/fs.c             | 143 +++++++++++++++++++++++++
>  lib/libnetlink.c     |  20 +++-
>  man/man8/ip-vrf.8    |  88 ++++++++++++++++
>  16 files changed, 850 insertions(+), 75 deletions(-)
>  create mode 100644 include/libbpf.h
>  create mode 100644 ip/ipvrf.c
>  create mode 100644 lib/exec.c
>  create mode 100644 lib/fs.c
>  create mode 100644 man/man8/ip-vrf.8
> 

Please use tooling that puts v2 on all the updated patches.
It makes it easier to spot them in patchwork.

^ permalink raw reply

* "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE
From: Theodore Ts'o @ 2016-12-12 23:33 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, mst, nhorman, davem

Hi,

I was doing a last minute regression test of the ext4 tree before
sending a pull request to Linus, which I do using gce-xfstests[1], and
I found that networking was broken on GCE on linux-next.  I was
using next-20161209, and after bisecting things, I narrowed down the
commit which is causing things to break to commit 449000102901:
"virtio-net: enable multiqueue by default".  Reverting this commit on
top of next-20161209 fixed the problem.

[1] http://thunk.org/gce-xfstests

You can reproduce the problem by building the kernel for Google
Compute Engine --- I use a config such as this [2], and then try to
boot a kernel on a VM.  The way I do this involves booting a test
appliance and then kexec'ing into the kernel to be tested[3], using a
2cpu configuration.  (GCE machine type: n1-standard-2)

[2] https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kernel-configs/ext4-x86_64-config-4.9
[3] https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md

You can then take a look at the serial console using a command such as
"gcloud compute instances get-serial-port-output <instance-name>", and
you will get something like this (see attached).  The important bit is
that the dhclient command completely fails to get a response from the
network, from which I deduce that networking send or receive or both
are badly affected by the commit in question.

Please let me know if there's anything I can do to help you debug this
further.

Cheers,

						- Ted

Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] Linux version 4.9.0-rc8-ext4-06387-g03e5cbd (tytso@tytso-ssd) (gcc version 4.9.2 (Debian 4.9.2-10) ) #9 SMP Mon Dec 12 04:50:16 UTC 2016
Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] Command line: root=/dev/sda1 ro console=ttyS0,38400n8 elevator=noop console=ttyS0  fstestcfg=4k fstestset=-g,quick fstestexc= fstestopt=aex fstesttyp=ext4 fstestapi=1.3
Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Dec 11 23:53:20 xfstests-201612120451 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load Kernel Modules.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Apply Kernel Variables...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting Configuration File System...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting FUSE Control File System...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted FUSE Control File System.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted Configuration File System.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Apply Kernel Variables.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Static Device Nodes in /dev.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Kernel Device Manager...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Kernel Device Manager.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Coldplug all Devices.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Wait for Complete Device Initialization...
Dec 11 23:53:20 xfstests-201612120451 systemd-fsck[1659]: xfstests-root: clean, 56268/655360 files, 357439/2620928 blocks
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started File System Check on Root Device.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remount Root and Kernel File Systems...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Remount Root and Kernel File Systems.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Various fixups to make systemd work better on Debian.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Load/Save Random Seed...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems (Pre).
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File Systems (Pre).
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load/Save Random Seed.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Wait for Complete Device Initialization.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Activation of LVM2 logical volumes...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Copy rules generated while the root was ro...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS0.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS1.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Copy rules generated while the root was ro.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS2.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Found device /dev/ttyS3.
Dec 11 23:53:20 xfstests-201612120451 systemd-udevd[2568]: could not open moddep file '/lib/modules/4.9.0-rc8-ext4-06387-g03e5cbd/modules.dep.bin'
Dec 11 23:53:20 xfstests-201612120451 lvm[2579]: No volume groups found
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Activation of LVM2 logical volumes.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Encrypted Volumes.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Encrypted Volumes.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Activation of LVM2 logical volumes...
Dec 11 23:53:20 xfstests-201612120451 lvm[2625]: No volume groups found
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Activation of LVM2 logical volumes.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Dec 11 23:53:20 xfstests-201612120451 lvm[2627]: No volume groups found
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File Systems.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remote File Systems.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Remote File Systems.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Create Volatile Files and Directories...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting LSB: Generate ssh host keys if they do not exist...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting LSB: Raise network interfaces....
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Volatile Files and Directories.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started LSB: Generate ssh host keys if they do not exist.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Update UTMP about System Boot/Shutdown...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Update UTMP about System Boot/Shutdown.
Dec 11 23:53:20 xfstests-201612120451 dhclient: Internet Systems Consortium DHCP Client 4.3.1
Dec 11 23:53:20 xfstests-201612120451 dhclient: Copyright 2004-2014 Internet Systems Consortium.
Dec 11 23:53:20 xfstests-201612120451 dhclient: All rights reserved.
Dec 11 23:53:20 xfstests-201612120451 dhclient: For info, please visit https://www.isc.org/software/dhcp/
Dec 11 23:53:20 xfstests-201612120451 dhclient: 
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Configuring network interfaces...Internet Systems Consortium DHCP Client 4.3.1
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Copyright 2004-2014 Internet Systems Consortium.
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: All rights reserved.
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: For info, please visit https://www.isc.org/software/dhcp/
Dec 11 23:53:20 xfstests-201612120451 dhclient: Listening on LPF/eth0/42:01:0a:f0:00:03
Dec 11 23:53:20 xfstests-201612120451 dhclient: Sending on   LPF/eth0/42:01:0a:f0:00:03
Dec 11 23:53:20 xfstests-201612120451 dhclient: Sending on   Socket/fallback
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Listening on LPF/eth0/42:01:0a:f0:00:03
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Sending on   LPF/eth0/42:01:0a:f0:00:03
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Sending on   Socket/fallback
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCP[^[[32m  OK  ^[[0m] DISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 17
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 17
Dec 11 23:53:20 xfstests-201612120451 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Dec 11 23:53:20 xfstests-201612120451 dhclient: No DHCPOFFERS received.
Dec 11 23:53:20 xfstests-201612120451 dhclient: Trying recorded lease 10.240.0.3
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: No DHCPOFFERS received.
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: Trying recorded lease 10.240.0.3
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: connect: Network is unreachable
Dec 11 23:53:20 xfstests-201612120451 logger: /etc/dhcp/dhclient-exit-hooks returned non-zero exit status 2
Dec 11 23:53:20 xfstests-201612120451 dhclient: bound: renewal in 38598 seconds.
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: bound: renewal in 38598 seconds.
Dec 11 23:53:20 xfstests-201612120451 networking[2633]: done.

^ permalink raw reply

* [ANNOUNCE] iproute2 4.9
From: Stephen Hemminger @ 2016-12-12 23:24 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

Release of iproute2 for Linux 4.9, just in time for your holiday
giving.

Update to iproute2 utility to support new features in Linux 4.9.
Mostly this is refinements to add new flags to tipc, l2tp, ss
and macsec support. There are also a couple of performance
enhancements for handling lots of interfaces and namespaces.

Source:
  https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-4.9.0.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Report problems (or enhancements) to the netdev@vger.kernel.org mailing list.

---
Alexei Starovoitov (1):
      iptnl: add support for collect_md flag in IPv4 and IPv6 tunnels

Anton Aksola (1):
      iproute2: build nsid-name cache only for commands that need it

Asbjørn Sloth Tønnesen (9):
      man: ip-l2tp.8: fix l2spec_type documentation
      man: ip-l2tp.8: remove non-existent tunnel parameter name
      l2tp: fix integers with too few significant bits
      l2tp: fix L2TP_ATTR_{RECV,SEND}_SEQ handling
      l2tp: fix L2TP_ATTR_UDP_CSUM handling
      l2tp: read IPv6 UDP checksum attributes from kernel
      l2tp: support sequence numbering
      l2tp: show tunnel: expose UDP checksum state
      man: ip-l2tp.8: document UDP checksum options

Craig Dillabaugh (1):
      action gact: list pipe as a valid action

Daniel Borkmann (1):
      tc, ipt: don't enforce iproute2 dependency on iptables-devel

Daniel Hopf (1):
      macsec: Nr. of packets and octets for macsec tx stats were swapped

Eric Dumazet (1):
      tc: fq: display unthrottle latency

Hadar Hen Zion (2):
      tc: flower: Introduce vlan support
      tc: m_vlan: Add priority option to push vlan action

Hangbin Liu (4):
      misc/ss: tcp cwnd should be unsigned
      ip rule: merge ip rule flush and list, save together
      ip rule: add selector support
      devlink: Convert conditional in dl_argv_handle_port() to switch()

Isaac Boukris (1):
      iproute2: ss: escape all null bytes in abstract unix domain socket

Jakub Kicinski (1):
      tc: cls_bpf: handle skip_sw and skip_hw flags

Jamal Hadi Salim (4):
      actions ife: Introduce encoding and decoding of tcindex metadata
      actions: add skbmod action
      man pages: Add tc-ife to Makefile
      tc filters: add support to get individual filters by handle

Lorenzo Colitti (1):
      ss: Support displaying and filtering on socket marks.

Lucas Bates (2):
      man pages: update ife action to include tcindex
      man pages: add man page for skbmod action

Mahesh Bandewar (1):
      ip: (ipvlan) introduce L3s mode

Mike Frysinger (1):
      ifstat/nstat: fix help output alignment

Moshe Shemesh (1):
      ip link: Add support to configure SR-IOV VF to vlan protocol 802.1ad (VST QinQ)

Neal Cardwell (1):
      ss: output TCP BBR diag information

Nikolay Aleksandrov (4):
      bridge: vlan: add support to display per-vlan statistics
      ipmroute: add support for age dumping
      bridge: vlan: remove wrong stats help
      bridge: add support for the multicast flood flag

Parthasarathy Bhuvaragan (7):
      tipc: remove dead code
      tipc: add link monitor set threshold
      tipc: add link monitor get threshold
      tipc: add link monitor summary
      tipc: refractor bearer to facilitate link monitor
      tipc: add link monitor list
      tipc: update man page for link monitor

Paul Blakey (1):
      tc: flower: Fix usage message

Phil Sutter (6):
      iproute: fix documentation for ip rule scan order
      include: Add linux/sctp.h
      ss: Add support for SCTP protocol
      ipaddress: Simplify vf_info parsing
      ipaddress: Print IFLA_VF_QUERY_RSS_EN setting
      man: ip-route.8: Add notes about dropped IPv4 route cache

Richard Alpe (3):
      tipc: add peer remove functionality
      tipc: introduce bearer add for remoteip
      tipc: add the ability to get UDP bearer options

Roi Dayan (2):
      devlink: Add usage help for eswitch subcommand
      devlink: Add option to set and show eswitch inline mode

Roman Mashak (7):
      ife action: allow specifying index in hex
      ife: print prio, mark and hash as unsigned
      ife: improve help text
      tc: updated man page to reflect GET command to retrieve a single filter.
      tc: improved usage help for fw classifier.
      tc: print raw qdisc handle.
      tc: distinguish Add/Replace filter operations

Shmulik Ladkani (1):
      tc: m_vlan: Add vlan modify action

Simon Horman (1):
      ss: initialise variables outside of for loop

Stephen Hemminger (24):
      update headers to 4.8-rc2 net-next
      update TIPC headers
      tipc: cleanup style issues
      update kernel headers from net-next
      update bpf.h
      update headers from pre 4.9 (net-next)
      iplink: cleanup style errors
      ip: iprule style cleanup
      tc: skbmod style cleanup
      tc_filter: style cleanup
      ip: macvlan style cleanup
      Revert "iproute2: macvlan: add "source" mode"
      cleanup debris from revert
      ss: break really long lines
      ip: style cleanup
      tc: cleanup style of qdisc code
      update headers based on 4.9-rc7
      libnetlink: style cleanups
      l2tp: style cleanup
      Revert "devlink: Add option to set and show eswitch inline mode"
      Revert "devlink: Add usage help for eswitch subcommand"
      update kernel headers
      update to 4.9 release headers
      v4.9.0

Zhang Shengju (3):
      iproute2: fix the link group name getting error
      libnetlink: reduce size of message sent to kernel
      link: add team and team_slave link type

david decotigny (2):
      iproute2: avoid exit in case of error.
      iproute2: a non-expected rtnl message is an error

michael-dev@fami-braun.de (2):
      iproute2: macvlan: add "source" mode
      iproute2: macvlan: add "source" mode

stefan@datenfreihafen.org (1):
      ip: update link types to show 6lowpan and ieee802.15.4 monitor

^ permalink raw reply

* Re: Soft lockup in tc_classify
From: Cong Wang @ 2016-12-12 22:51 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Daniel Borkmann, Shahar Klein, Linux Netdev List, Roi Dayan,
	David Miller, Jiri Pirko, John Fastabend, Hadar Hen Zion
In-Reply-To: <CAJ3xEMjABmvAMs6h0EqBgPH8QDDwF_x0COx01MkEw2pa+fp7LA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 711 bytes --]

On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
>> Note that there's still the RCU fix missing for the deletion race that
>> Cong will still send out, but you say that the only thing you do is to
>> add a single rule, but no other operation is involved during that test?
>
> What's missing to have the deletion race fixed? Writing a patch, or
> testing a patch that was already sent?

If you think it would help for this problem, here is my patch rebased
on the latest net-next.

Again, I don't yet see how it could help in this case; in particular, I
don't see how we could have a loop in this singly linked list.
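
For readers following the thread, here is a minimal userspace sketch of the
ordering the attached patch aims for: ->delete() reports whether the last
filter is gone, and only then does the caller unlink tp and call ->destroy().
It is not part of the patch. The names struct proto, proto_delete(),
proto_destroy(), chain_delete_filter() and proto_new() are hypothetical
stand-ins, the filter list is reduced to a counter, and there is no RCU here,
so free() stands in for the deferred kfree_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tcf_proto: one node in a singly linked chain. */
struct proto {
	struct proto *next;
	int nfilters;	/* stand-in for the classifier's private filter list */
};

/* Like the reworked ->delete(): remove one filter and report whether it was the last. */
static int proto_delete(struct proto *tp, bool *last)
{
	assert(tp->nfilters > 0);
	tp->nfilters--;
	*last = (tp->nfilters == 0);
	return 0;
}

/* Like the reworked ->destroy(): unconditional teardown, no emptiness check. */
static void proto_destroy(struct proto *tp)
{
	free(tp);	/* the kernel defers this with kfree_rcu() */
}

/*
 * Caller side: unlink tp from the chain only once the delete step says it
 * is empty, and only then destroy it.
 * Returns true if tp was unlinked and destroyed.
 */
static bool chain_delete_filter(struct proto **back, struct proto *tp)
{
	bool last = false;

	if (proto_delete(tp, &last) != 0 || !last)
		return false;

	*back = tp->next;	/* the patch uses RCU_INIT_POINTER(*back, next) */
	proto_destroy(tp);
	return true;
}

/* Helper to build a chain for experiments. */
static struct proto *proto_new(struct proto *next, int nfilters)
{
	struct proto *tp = malloc(sizeof(*tp));

	assert(tp != NULL);
	tp->next = next;
	tp->nfilters = nfilters;
	return tp;
}
```

In this sketch a tp holding two filters stays linked after the first
chain_delete_filter() call and is unlinked and freed by the second, so the
destroy step never has to decide whether tp is empty.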

[-- Attachment #2: tc-filter-destroy.diff --]
[-- Type: text/plain, Size: 21977 bytes --]

commit f6becda1e12fd8ef74e901fe39adb4558ce6c8f9
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Wed Nov 23 14:58:01 2016 -0800

    net_sched: move the empty tp check from ->destroy() to ->delete()
    
    Roi reported a race condition: in the ->classify() path we
    dereference tp->root while a parallel ->destroy() sets it to NULL.
    
    This is possible because ->destroy() can be called when deleting a
    filter, to check whether it was the last one in tp, while tp is still
    linked and visible.
    
    The root cause of this problem is the semantics of ->destroy(): it
    does two things (in the non-force case):
    
    1) check if tp is empty
    2) if tp is empty, really destroy it
    
    and its caller, if it cares, needs to check the return value to see
    whether tp was really destroyed. Therefore we can't unlink tp unless
    we know it is empty.
    
    As suggested by Daniel, we can move the test logic to ->delete() so
    that we can safely unlink tp after ->delete() tells us the last filter
    was just deleted, and before ->destroy().
    
    What's more, even if we unlink it before ->destroy(), it could still
    have readers since we don't wait for a grace period here, so we should
    not modify tp->root in ->destroy() either.
    
    Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
    Reported-by: Roi Dayan <roid@mellanox.com>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 498f81b..b5eda3f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -203,14 +203,14 @@ struct tcf_proto_ops {
 					    const struct tcf_proto *,
 					    struct tcf_result *);
 	int			(*init)(struct tcf_proto*);
-	bool			(*destroy)(struct tcf_proto*, bool);
+	void			(*destroy)(struct tcf_proto*);
 
 	unsigned long		(*get)(struct tcf_proto*, u32 handle);
 	int			(*change)(struct net *net, struct sk_buff *,
 					struct tcf_proto*, unsigned long,
 					u32 handle, struct nlattr **,
 					unsigned long *, bool);
-	int			(*delete)(struct tcf_proto*, unsigned long);
+	int			(*delete)(struct tcf_proto*, unsigned long, bool*);
 	void			(*walk)(struct tcf_proto*, struct tcf_walker *arg);
 
 	/* rtnetlink specific */
@@ -405,7 +405,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 				const struct Qdisc_ops *ops, u32 parentid);
 void __qdisc_calculate_pkt_len(struct sk_buff *skb,
 			       const struct qdisc_size_table *stab);
-bool tcf_destroy(struct tcf_proto *tp, bool force);
+void tcf_destroy(struct tcf_proto *tp);
 void tcf_destroy_chain(struct tcf_proto __rcu **fl);
 int skb_do_redirect(struct sk_buff *);
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3fbba79..f9179e0 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -321,7 +321,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 
 			tfilter_notify(net, skb, n, tp, fh,
 				       RTM_DELTFILTER, false);
-			tcf_destroy(tp, true);
+			tcf_destroy(tp);
 			err = 0;
 			goto errout;
 		}
@@ -331,25 +331,29 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 		    !(n->nlmsg_flags & NLM_F_CREATE))
 			goto errout;
 	} else {
+		bool last;
+
 		switch (n->nlmsg_type) {
 		case RTM_NEWTFILTER:
 			err = -EEXIST;
 			if (n->nlmsg_flags & NLM_F_EXCL) {
 				if (tp_created)
-					tcf_destroy(tp, true);
+					tcf_destroy(tp);
 				goto errout;
 			}
 			break;
 		case RTM_DELTFILTER:
-			err = tp->ops->delete(tp, fh);
+			err = tp->ops->delete(tp, fh, &last);
 			if (err == 0) {
-				struct tcf_proto *next = rtnl_dereference(tp->next);
-
 				tfilter_notify(net, skb, n, tp,
 					       t->tcm_handle,
 					       RTM_DELTFILTER, false);
-				if (tcf_destroy(tp, false))
+				if (last) {
+					struct tcf_proto *next = rtnl_dereference(tp->next);
+
 					RCU_INIT_POINTER(*back, next);
+					tcf_destroy(tp);
+				}
 			}
 			goto errout;
 		case RTM_GETTFILTER:
@@ -372,7 +376,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
 		tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false);
 	} else {
 		if (tp_created)
-			tcf_destroy(tp, true);
+			tcf_destroy(tp);
 	}
 
 errout:
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 5877f60..8d822e5 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -93,30 +93,28 @@ static void basic_delete_filter(struct rcu_head *head)
 	kfree(f);
 }
 
-static bool basic_destroy(struct tcf_proto *tp, bool force)
+static void basic_destroy(struct tcf_proto *tp)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f, *n;
 
-	if (!force && !list_empty(&head->flist))
-		return false;
-
 	list_for_each_entry_safe(f, n, &head->flist, link) {
 		list_del_rcu(&f->link);
 		tcf_unbind_filter(tp, &f->res);
 		call_rcu(&f->rcu, basic_delete_filter);
 	}
 	kfree_rcu(head, rcu);
-	return true;
 }
 
-static int basic_delete(struct tcf_proto *tp, unsigned long arg)
+static int basic_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
+	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f = (struct basic_filter *) arg;
 
 	list_del_rcu(&f->link);
 	tcf_unbind_filter(tp, &f->res);
 	call_rcu(&f->rcu, basic_delete_filter);
+	*last = list_empty(&head->flist);
 	return 0;
 }
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index adc7760..55c9961 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -268,25 +268,24 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 	call_rcu(&prog->rcu, cls_bpf_delete_prog_rcu);
 }
 
-static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
+
 	__cls_bpf_delete(tp, (struct cls_bpf_prog *) arg);
+	*last = list_empty(&head->plist);
 	return 0;
 }
 
-static bool cls_bpf_destroy(struct tcf_proto *tp, bool force)
+static void cls_bpf_destroy(struct tcf_proto *tp)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog, *tmp;
 
-	if (!force && !list_empty(&head->plist))
-		return false;
-
 	list_for_each_entry_safe(prog, tmp, &head->plist, link)
 		__cls_bpf_delete(tp, prog);
 
 	kfree_rcu(head, rcu);
-	return true;
 }
 
 static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index c1f2007..51c822d 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -131,20 +131,16 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static bool cls_cgroup_destroy(struct tcf_proto *tp, bool force)
+static void cls_cgroup_destroy(struct tcf_proto *tp)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
-	if (!force)
-		return false;
 	/* Head can still be NULL due to cls_cgroup_init(). */
 	if (head)
 		call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
-
-	return true;
 }
 
-static int cls_cgroup_delete(struct tcf_proto *tp, unsigned long arg)
+static int cls_cgroup_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 6575aba..ea2be75 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -563,12 +563,14 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static int flow_delete(struct tcf_proto *tp, unsigned long arg)
+static int flow_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
+	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f = (struct flow_filter *)arg;
 
 	list_del_rcu(&f->list);
 	call_rcu(&f->rcu, flow_destroy_filter);
+	*last = list_empty(&head->filters);
 	return 0;
 }
 
@@ -584,20 +586,16 @@ static int flow_init(struct tcf_proto *tp)
 	return 0;
 }
 
-static bool flow_destroy(struct tcf_proto *tp, bool force)
+static void flow_destroy(struct tcf_proto *tp)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f, *next;
 
-	if (!force && !list_empty(&head->filters))
-		return false;
-
 	list_for_each_entry_safe(f, next, &head->filters, list) {
 		list_del_rcu(&f->list);
 		call_rcu(&f->rcu, flow_destroy_filter);
 	}
 	kfree_rcu(head, rcu);
-	return true;
 }
 
 static unsigned long flow_get(struct tcf_proto *tp, u32 handle)
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e040c51..328938b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -312,21 +312,16 @@ static void fl_destroy_rcu(struct rcu_head *rcu)
 	schedule_work(&head->work);
 }
 
-static bool fl_destroy(struct tcf_proto *tp, bool force)
+static void fl_destroy(struct tcf_proto *tp)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *f, *next;
 
-	if (!force && !list_empty(&head->filters))
-		return false;
-
 	list_for_each_entry_safe(f, next, &head->filters, list)
 		__fl_delete(tp, f);
 
 	__module_get(THIS_MODULE);
 	call_rcu(&head->rcu, fl_destroy_rcu);
-
-	return true;
 }
 
 static unsigned long fl_get(struct tcf_proto *tp, u32 handle)
@@ -877,7 +872,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static int fl_delete(struct tcf_proto *tp, unsigned long arg)
+static int fl_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *f = (struct cls_fl_filter *) arg;
@@ -886,6 +881,7 @@ static int fl_delete(struct tcf_proto *tp, unsigned long arg)
 		rhashtable_remove_fast(&head->ht, &f->ht_node,
 				       head->ht_params);
 	__fl_delete(tp, f);
+	*last = list_empty(&head->filters);
 	return 0;
 }
 
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 9dc63d5..bc8ceb7 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -127,20 +127,14 @@ static void fw_delete_filter(struct rcu_head *head)
 	kfree(f);
 }
 
-static bool fw_destroy(struct tcf_proto *tp, bool force)
+static void fw_destroy(struct tcf_proto *tp)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
 	int h;
 
 	if (head == NULL)
-		return true;
-
-	if (!force) {
-		for (h = 0; h < HTSIZE; h++)
-			if (rcu_access_pointer(head->ht[h]))
-				return false;
-	}
+		return;
 
 	for (h = 0; h < HTSIZE; h++) {
 		while ((f = rtnl_dereference(head->ht[h])) != NULL) {
@@ -150,17 +144,17 @@ static bool fw_destroy(struct tcf_proto *tp, bool force)
 			call_rcu(&f->rcu, fw_delete_filter);
 		}
 	}
-	RCU_INIT_POINTER(tp->root, NULL);
 	kfree_rcu(head, rcu);
-	return true;
 }
 
-static int fw_delete(struct tcf_proto *tp, unsigned long arg)
+static int fw_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = (struct fw_filter *)arg;
 	struct fw_filter __rcu **fp;
 	struct fw_filter *pfp;
+	int ret = -EINVAL;
+	int h;
 
 	if (head == NULL || f == NULL)
 		goto out;
@@ -173,11 +167,21 @@ static int fw_delete(struct tcf_proto *tp, unsigned long arg)
 			RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
 			tcf_unbind_filter(tp, &f->res);
 			call_rcu(&f->rcu, fw_delete_filter);
-			return 0;
+			ret = 0;
+			break;
 		}
 	}
+
+	*last = true;
+	for (h = 0; h < HTSIZE; h++) {
+		if (rcu_access_pointer(head->ht[h])) {
+			*last = false;
+			break;
+		}
+	}
+
 out:
-	return -EINVAL;
+	return ret;
 }
 
 static const struct nla_policy fw_policy[TCA_FW_MAX + 1] = {
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index f935429..7d54805 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -99,15 +99,12 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp,
 					     &offload);
 }
 
-static bool mall_destroy(struct tcf_proto *tp, bool force)
+static void mall_destroy(struct tcf_proto *tp)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 	struct net_device *dev = tp->q->dev_queue->dev;
 	struct cls_mall_filter *f = head->filter;
 
-	if (!force && f)
-		return false;
-
 	if (f) {
 		if (tc_should_offload(dev, tp, f->flags))
 			mall_destroy_hw_filter(tp, f, (unsigned long) f);
@@ -115,7 +112,6 @@ static bool mall_destroy(struct tcf_proto *tp, bool force)
 		call_rcu(&f->rcu, mall_destroy_filter);
 	}
 	kfree_rcu(head, rcu);
-	return true;
 }
 
 static unsigned long mall_get(struct tcf_proto *tp, u32 handle)
@@ -224,7 +220,7 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static int mall_delete(struct tcf_proto *tp, unsigned long arg)
+static int mall_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 	struct cls_mall_filter *f = (struct cls_mall_filter *) arg;
@@ -236,6 +232,7 @@ static int mall_delete(struct tcf_proto *tp, unsigned long arg)
 	RCU_INIT_POINTER(head->filter, NULL);
 	tcf_unbind_filter(tp, &f->res);
 	call_rcu(&f->rcu, mall_destroy_filter);
+	*last = true;
 	return 0;
 }
 
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 455fc8f..1a38e41 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -276,20 +276,13 @@ static void route4_delete_filter(struct rcu_head *head)
 	kfree(f);
 }
 
-static bool route4_destroy(struct tcf_proto *tp, bool force)
+static void route4_destroy(struct tcf_proto *tp)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	int h1, h2;
 
 	if (head == NULL)
-		return true;
-
-	if (!force) {
-		for (h1 = 0; h1 <= 256; h1++) {
-			if (rcu_access_pointer(head->table[h1]))
-				return false;
-		}
-	}
+		return;
 
 	for (h1 = 0; h1 <= 256; h1++) {
 		struct route4_bucket *b;
@@ -312,12 +305,10 @@ static bool route4_destroy(struct tcf_proto *tp, bool force)
 			kfree_rcu(b, rcu);
 		}
 	}
-	RCU_INIT_POINTER(tp->root, NULL);
 	kfree_rcu(head, rcu);
-	return true;
 }
 
-static int route4_delete(struct tcf_proto *tp, unsigned long arg)
+static int route4_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_filter *f = (struct route4_filter *)arg;
@@ -325,7 +316,7 @@ static int route4_delete(struct tcf_proto *tp, unsigned long arg)
 	struct route4_filter *nf;
 	struct route4_bucket *b;
 	unsigned int h = 0;
-	int i;
+	int i, h1;
 
 	if (!head || !f)
 		return -EINVAL;
@@ -356,16 +347,25 @@ static int route4_delete(struct tcf_proto *tp, unsigned long arg)
 
 				rt = rtnl_dereference(b->ht[i]);
 				if (rt)
-					return 0;
+					goto out;
 			}
 
 			/* OK, session has no flows */
 			RCU_INIT_POINTER(head->table[to_hash(h)], NULL);
 			kfree_rcu(b, rcu);
+			break;
+		}
+	}
 
-			return 0;
+out:
+	*last = true;
+	for (h1 = 0; h1 <= 256; h1++) {
+		if (rcu_access_pointer(head->table[h1])) {
+			*last = false;
+			break;
 		}
 	}
+
 	return 0;
 }
 
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 322438f..1aaff10 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -302,22 +302,13 @@ static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
 	call_rcu(&f->rcu, rsvp_delete_filter_rcu);
 }
 
-static bool rsvp_destroy(struct tcf_proto *tp, bool force)
+static void rsvp_destroy(struct tcf_proto *tp)
 {
 	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int h1, h2;
 
 	if (data == NULL)
-		return true;
-
-	if (!force) {
-		for (h1 = 0; h1 < 256; h1++) {
-			if (rcu_access_pointer(data->ht[h1]))
-				return false;
-		}
-	}
-
-	RCU_INIT_POINTER(tp->root, NULL);
+		return;
 
 	for (h1 = 0; h1 < 256; h1++) {
 		struct rsvp_session *s;
@@ -337,10 +328,9 @@ static bool rsvp_destroy(struct tcf_proto *tp, bool force)
 		}
 	}
 	kfree_rcu(data, rcu);
-	return true;
 }
 
-static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
+static int rsvp_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct rsvp_head *head = rtnl_dereference(tp->root);
 	struct rsvp_filter *nfp, *f = (struct rsvp_filter *)arg;
@@ -348,7 +338,7 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 	unsigned int h = f->handle;
 	struct rsvp_session __rcu **sp;
 	struct rsvp_session *nsp, *s = f->sess;
-	int i;
+	int i, h1;
 
 	fp = &s->ht[(h >> 8) & 0xFF];
 	for (nfp = rtnl_dereference(*fp); nfp;
@@ -361,7 +351,7 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 
 			for (i = 0; i <= 16; i++)
 				if (s->ht[i])
-					return 0;
+					goto out;
 
 			/* OK, session has no flows */
 			sp = &head->ht[h & 0xFF];
@@ -370,13 +360,23 @@ static int rsvp_delete(struct tcf_proto *tp, unsigned long arg)
 				if (nsp == s) {
 					RCU_INIT_POINTER(*sp, s->next);
 					kfree_rcu(s, rcu);
-					return 0;
+					goto out;
 				}
 			}
 
-			return 0;
+			break;
 		}
 	}
+
+out:
+	*last = true;
+	for (h1 = 0; h1 < 256; h1++) {
+		if (rcu_access_pointer(head->ht[h1])) {
+			*last = false;
+			break;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 0751245..9149a03 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -150,7 +150,7 @@ static void tcindex_destroy_fexts(struct rcu_head *head)
 	kfree(f);
 }
 
-static int tcindex_delete(struct tcf_proto *tp, unsigned long arg)
+static int tcindex_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = (struct tcindex_filter_result *) arg;
@@ -186,6 +186,8 @@ static int tcindex_delete(struct tcf_proto *tp, unsigned long arg)
 		call_rcu(&f->rcu, tcindex_destroy_fexts);
 	else
 		call_rcu(&r->rcu, tcindex_destroy_rexts);
+
+	*last = false;
 	return 0;
 }
 
@@ -193,7 +195,9 @@ static int tcindex_destroy_element(struct tcf_proto *tp,
 				   unsigned long arg,
 				   struct tcf_walker *walker)
 {
-	return tcindex_delete(tp, arg);
+	bool last;
+
+	return tcindex_delete(tp, arg, &last);
 }
 
 static void __tcindex_destroy(struct rcu_head *head)
@@ -529,14 +533,11 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	}
 }
 
-static bool tcindex_destroy(struct tcf_proto *tp, bool force)
+static void tcindex_destroy(struct tcf_proto *tp)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcf_walker walker;
 
-	if (!force)
-		return false;
-
 	pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
 	walker.count = 0;
 	walker.skip = 0;
@@ -544,7 +545,6 @@ static bool tcindex_destroy(struct tcf_proto *tp, bool force)
 	tcindex_walk(tp, &walker);
 
 	call_rcu(&p->rcu, __tcindex_destroy);
-	return true;
 }
 
 
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ae83c3ae..787573b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -582,37 +582,13 @@ static bool ht_empty(struct tc_u_hnode *ht)
 	return true;
 }
 
-static bool u32_destroy(struct tcf_proto *tp, bool force)
+static void u32_destroy(struct tcf_proto *tp)
 {
 	struct tc_u_common *tp_c = tp->data;
 	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
 
 	WARN_ON(root_ht == NULL);
 
-	if (!force) {
-		if (root_ht) {
-			if (root_ht->refcnt > 1)
-				return false;
-			if (root_ht->refcnt == 1) {
-				if (!ht_empty(root_ht))
-					return false;
-			}
-		}
-
-		if (tp_c->refcnt > 1)
-			return false;
-
-		if (tp_c->refcnt == 1) {
-			struct tc_u_hnode *ht;
-
-			for (ht = rtnl_dereference(tp_c->hlist);
-			     ht;
-			     ht = rtnl_dereference(ht->next))
-				if (!ht_empty(ht))
-					return false;
-		}
-	}
-
 	if (root_ht && --root_ht->refcnt == 0)
 		u32_destroy_hnode(tp, root_ht);
 
@@ -637,20 +613,22 @@ static bool u32_destroy(struct tcf_proto *tp, bool force)
 	}
 
 	tp->data = NULL;
-	return true;
 }
 
-static int u32_delete(struct tcf_proto *tp, unsigned long arg)
+static int u32_delete(struct tcf_proto *tp, unsigned long arg, bool *last)
 {
 	struct tc_u_hnode *ht = (struct tc_u_hnode *)arg;
 	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
+	struct tc_u_common *tp_c = tp->data;
+	int ret = 0;
 
 	if (ht == NULL)
-		return 0;
+		goto out;
 
 	if (TC_U32_KEY(ht->handle)) {
 		u32_remove_hw_knode(tp, ht->handle);
-		return u32_delete_key(tp, (struct tc_u_knode *)ht);
+		ret = u32_delete_key(tp, (struct tc_u_knode *)ht);
+		goto out;
 	}
 
 	if (root_ht == ht)
@@ -663,7 +641,40 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg)
 		return -EBUSY;
 	}
 
-	return 0;
+out:
+	*last = true;
+	if (root_ht) {
+		if (root_ht->refcnt > 1) {
+			*last = false;
+			goto ret;
+		}
+		if (root_ht->refcnt == 1) {
+			if (!ht_empty(root_ht)) {
+				*last = false;
+				goto ret;
+			}
+		}
+	}
+
+	if (tp_c->refcnt > 1) {
+		*last = false;
+		goto ret;
+	}
+
+	if (tp_c->refcnt == 1) {
+		struct tc_u_hnode *ht;
+
+		for (ht = rtnl_dereference(tp_c->hlist);
+		     ht;
+		     ht = rtnl_dereference(ht->next))
+			if (!ht_empty(ht)) {
+				*last = false;
+				break;
+			}
+	}
+
+ret:
+	return ret;
 }
 
 #define NR_U32_NODE (1<<12)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d7b9342..20293ee 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1899,15 +1899,11 @@ int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 }
 EXPORT_SYMBOL(tc_classify);
 
-bool tcf_destroy(struct tcf_proto *tp, bool force)
+void tcf_destroy(struct tcf_proto *tp)
 {
-	if (tp->ops->destroy(tp, force)) {
-		module_put(tp->ops->owner);
-		kfree_rcu(tp, rcu);
-		return true;
-	}
-
-	return false;
+	tp->ops->destroy(tp);
+	module_put(tp->ops->owner);
+	kfree_rcu(tp, rcu);
 }
 
 void tcf_destroy_chain(struct tcf_proto __rcu **fl)
@@ -1916,7 +1912,7 @@ void tcf_destroy_chain(struct tcf_proto __rcu **fl)
 
 	while ((tp = rtnl_dereference(*fl)) != NULL) {
 		RCU_INIT_POINTER(*fl, tp->next);
-		tcf_destroy(tp, true);
+		tcf_destroy(tp);
 	}
 }
 EXPORT_SYMBOL(tcf_destroy_chain);

^ permalink raw reply related

* [PATCH] net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2016-12-12 22:28 UTC (permalink / raw)
  To: hsweeten, davem; +Cc: netdev, linux-kernel, Philippe Reynes

The ethtool API {get|set}_settings is deprecated.
Move this driver to the new API {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
 drivers/net/ethernet/cirrus/ep93xx_eth.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/cirrus/ep93xx_eth.c b/drivers/net/ethernet/cirrus/ep93xx_eth.c
index a1de0d1..396c886 100644
--- a/drivers/net/ethernet/cirrus/ep93xx_eth.c
+++ b/drivers/net/ethernet/cirrus/ep93xx_eth.c
@@ -715,16 +715,18 @@ static void ep93xx_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *i
 	strlcpy(info->version, DRV_MODULE_VERSION, sizeof(info->version));
 }
 
-static int ep93xx_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int ep93xx_get_link_ksettings(struct net_device *dev,
+				     struct ethtool_link_ksettings *cmd)
 {
 	struct ep93xx_priv *ep = netdev_priv(dev);
-	return mii_ethtool_gset(&ep->mii, cmd);
+	return mii_ethtool_get_link_ksettings(&ep->mii, cmd);
 }
 
-static int ep93xx_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int ep93xx_set_link_ksettings(struct net_device *dev,
+				     const struct ethtool_link_ksettings *cmd)
 {
 	struct ep93xx_priv *ep = netdev_priv(dev);
-	return mii_ethtool_sset(&ep->mii, cmd);
+	return mii_ethtool_set_link_ksettings(&ep->mii, cmd);
 }
 
 static int ep93xx_nway_reset(struct net_device *dev)
@@ -741,10 +743,10 @@ static u32 ep93xx_get_link(struct net_device *dev)
 
 static const struct ethtool_ops ep93xx_ethtool_ops = {
 	.get_drvinfo		= ep93xx_get_drvinfo,
-	.get_settings		= ep93xx_get_settings,
-	.set_settings		= ep93xx_set_settings,
 	.nway_reset		= ep93xx_nway_reset,
 	.get_link		= ep93xx_get_link,
+	.get_link_ksettings	= ep93xx_get_link_ksettings,
+	.set_link_ksettings	= ep93xx_set_link_ksettings,
 };
 
 static const struct net_device_ops ep93xx_netdev_ops = {
-- 
1.7.4.4

^ permalink raw reply related

* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-12 22:24 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, Tom Herbert, Linux Kernel Network Developers,
	Josef Bacik
In-Reply-To: <3c022731-e703-34ac-55f1-60f5b94b6d62@stressinduktion.org>


On Mon, Dec 12, 2016 at 1:44 PM, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> On 12.12.2016 19:05, Josef Bacik wrote:
>> On Fri, Dec 9, 2016 at 11:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Fri, 2016-12-09 at 19:47 -0800, Eric Dumazet wrote:
>>>
>>>> Hmm... Does your ephemeral port range include the port your load
>>>> balancing app is using?
>>>
>>> I suspect that you might have processes doing bind(port = 0) that are
>>> trapped in the bind_conflict() scan?
>>>
>>> With 100,000+ timewaits there, this possibly hurts.
>>>
>>> Can you try the following loop breaker?
>>
>> It doesn't appear that the app is doing bind(port = 0) during normal
>> operation.  I tested this patch and it made no difference.  I'm going to
>> test simply restarting the app without changing to the SO_REUSEPORT
>> option.  Thanks,
> 
> Would it be possible to trace the time the function uses with trace? If
> we don't see the number growing considerably over time we probably can
> rule out that we loop somewhere in there (I would instrument
> inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port).
> 
> __inet_hash_connect -> __inet_check_established also takes a lock
> (inet_ehash_lockp) which can be locked from inet_diag code path during
> socket diag info dumping.
> 
> Unfortunately we couldn't reproduce it so far. :/

So I had a bcc script running to time how long we spent in
inet_csk_bind_conflict, __inet_hash_connect and inet_csk_get_port, but
of course I'm an idiot and didn't actually separate out the stats so I
could tell _which_ one was taking forever.  But anyway here's the
distribution on the box during normal operation:

     Some shit           : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 74       |                                        |
      2048 -> 4095       : 10537    |****************************************|
      4096 -> 8191       : 8497     |********************************        |
      8192 -> 16383      : 3745     |**************                          |
     16384 -> 32767      : 300      |*                                       |
     32768 -> 65535      : 250      |                                        |
     65536 -> 131071     : 180      |                                        |
    131072 -> 262143     : 71       |                                        |
    262144 -> 524287     : 18       |                                        |
    524288 -> 1048575    : 5        |                                        |

Times are in nanoseconds.  Here's the distribution during the
problem:

     Some shit           : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 21       |                                        |
      2048 -> 4095       : 21820    |****************************************|
      4096 -> 8191       : 11598    |*********************                   |
      8192 -> 16383      : 4337     |*******                                 |
     16384 -> 32767      : 290      |                                        |
     32768 -> 65535      : 59       |                                        |
     65536 -> 131071     : 23       |                                        |
    131072 -> 262143     : 12       |                                        |
    262144 -> 524287     : 6        |                                        |
    524288 -> 1048575    : 19       |                                        |
   1048576 -> 2097151    : 1079     |*                                       |
   2097152 -> 4194303    : 0        |                                        |
   4194304 -> 8388607    : 1        |                                        |
   8388608 -> 16777215   : 0        |                                        |
  16777216 -> 33554431   : 0        |                                        |
  33554432 -> 67108863   : 1192     |**                                      |

               Some shit                     : count     distribution
                   0 -> 1                    : 0        |                    |
                   2 -> 3                    : 0        |                    |
                   4 -> 7                    : 0        |                    |
                   8 -> 15                   : 0        |                    |
                  16 -> 31                   : 0        |                    |
                  32 -> 63                   : 0        |                    |
                  64 -> 127                  : 0        |                    |
                 128 -> 255                  : 0        |                    |
                 256 -> 511                  : 0        |                    |
                 512 -> 1023                 : 0        |                    |
                1024 -> 2047                 : 48       |                    |
                2048 -> 4095                 : 14714    |********************|
                4096 -> 8191                 : 6769     |*********           |
                8192 -> 16383                : 2234     |***                 |
               16384 -> 32767                : 422      |                    |
               32768 -> 65535                : 208      |                    |
               65536 -> 131071               : 61       |                    |
              131072 -> 262143               : 10       |                    |
              262144 -> 524287               : 416      |                    |
              524288 -> 1048575              : 826      |*                   |
             1048576 -> 2097151              : 598      |                    |
             2097152 -> 4194303              : 10       |                    |
             4194304 -> 8388607              : 0        |                    |
             8388608 -> 16777215             : 1        |                    |
            16777216 -> 33554431             : 289      |                    |
            33554432 -> 67108863             : 921      |*                   |
            67108864 -> 134217727            : 74       |                    |
           134217728 -> 268435455            : 75       |                    |
           268435456 -> 536870911            : 48       |                    |
           536870912 -> 1073741823           : 25       |                    |
          1073741824 -> 2147483647           : 3        |                    |
          2147483648 -> 4294967295           : 2        |                    |
          4294967296 -> 8589934591           : 1        |                    |
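
For reference, a minimal sketch of the power-of-two bucketing these
histograms use, kept per function so that slow calls can be attributed
to one of the three functions.  This is plain Python, not the bcc script
that produced the output above; log2_bucket(), record() and the sample
values are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def log2_bucket(ns):
    """Return the (low, high) power-of-two bucket that contains ns."""
    if ns < 2:
        return (0, 1)
    low = 1 << (ns.bit_length() - 1)
    return (low, 2 * low - 1)

# One histogram per function name instead of a single shared one.
hists = defaultdict(lambda: defaultdict(int))

def record(func, ns):
    hists[func][log2_bucket(ns)] += 1

# Hypothetical samples, not measurements from the box.
record("inet_csk_get_port", 3000)
record("__inet_hash_connect", 5_000_000_000)

for func, hist in sorted(hists.items()):
    for (low, high), count in sorted(hist.items()):
        print(f"{func}: {low} -> {high} ns : {count}")
```

A 5 second call lands in the 4294967296 -> 8589934591 bucket, which
covers roughly 4.3 to 8.6 seconds.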

As you can see we start getting tail latencies of up to 4-8 seconds.  
Tomorrow I'll separate out the stats so we can know which function is 
the problem child.  Sorry about not doing that first.  Thanks,

Josef

^ permalink raw reply

* Re: [PATCH for-next 0/6] IB/hns: Bug Fixes for HNS RoCE Driver
From: Doug Ledford @ 2016-12-12 22:09 UTC (permalink / raw)
  To: Salil Mehta
  Cc: xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk, lijun_nudt,
	linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161129231030.1105600-1-salil.mehta@huawei.com>



On 11/29/2016 6:10 PM, Salil Mehta wrote:
> This patch-set contains bug fixes for the HNS RoCE driver.
> 
> Lijun Ou (1):
>   IB/hns: Fix the IB device name
> 
> Shaobo Xu (2):
>   IB/hns: Fix the bug when free mr
>   IB/hns: Fix the bug when free cq
> 
> Wei Hu (Xavier) (3):
>   IB/hns: Fix the bug when destroy qp
>   IB/hns: Fix the bug of setting port mtu
>   IB/hns: Delete the redundant memset operation
> 
>  drivers/infiniband/hw/hns/hns_roce_cmd.h    |    5 -
>  drivers/infiniband/hw/hns/hns_roce_common.h |   42 ++
>  drivers/infiniband/hw/hns/hns_roce_cq.c     |   27 +-
>  drivers/infiniband/hw/hns/hns_roce_device.h |   18 +
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  967 ++++++++++++++++++++++++---
>  drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   57 ++
>  drivers/infiniband/hw/hns/hns_roce_main.c   |   26 +-
>  drivers/infiniband/hw/hns/hns_roce_mr.c     |   21 +-
>  8 files changed, 1026 insertions(+), 137 deletions(-)
> 

Series applied, thanks.

-- 
Doug Ledford <dledford@redhat.com>
    GPG Key ID: 0E572FDD



^ permalink raw reply

