Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v2 0/5] Make tcp-metrics source-address aware
From: David Miller @ 2014-01-10 22:38 UTC (permalink / raw)
  To: christoph.paasch; +Cc: netdev, eric.dumazet, ycheng, ja
In-Reply-To: <1389193559-16756-1-git-send-email-christoph.paasch@uclouvain.be>

From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Wed,  8 Jan 2014 16:05:54 +0100

> Currently tcp-metrics only stores per-destination addresses. This brings
> problems, when a host has multiple interfaces (e.g., a smartphone having
> WiFi/3G):
> 
> For example, a host contacting a server over WiFi will store the tcp-metrics
> per destination IP. If then the host contacts the same server over 3G, the
> same tcp-metrics will be used, although the path-characteristics are completly
> different (e.g., the ssthresh is probably not the same).
> 
> In case of TFO this is not a problem, as the server will provide us a new cookie
> once he saw our SYN+DATA with an incorrect cookie.
> It may be (in case of carrier-grade NAT), that we keep the same public IP but
> have a different private IP. Thus, we better reuse the old cookie even if our
> source-IP has changed. However, this scenario is probably very uncommon, as 
> carriers try to provide the same src-IP to the clients behind their CGN.
> 
> Patches 1 + 2 add the source-IP to the tcp metrics.
> 
> Patches 3 to 5 modify the netlink-api to support the source-IP. From now on,
> when using the command "ip tcp_metrics delete address ADDRESS" all entries
> which match this destination IP will be deleted.
> 
> Today's iproute2 will complain when doing "ip tcp_metrics flush PREFIX" if
> several entries are present for the same destination-IP but with different
> source-IPs:
> 
> root@client:~/test# ip tcp_metrics
> 10.2.1.2 age 3.640sec rtt 16250us rttvar 15000us cwnd 10
> 10.2.1.2 age 4.030sec rtt 18750us rttvar 15000us cwnd 10
> root@client:~/test# ip tcp_metrics flush 10.2.1.2/16
> Failed to send flush request
> : No such process
> 
> 
> Follow-up patches will modify iproute2 to handle this correctly and allow
> specifying the source-IP in the get/del commands.
> 
> 
> v2: Added the patch that allows to selectively get/del of tcp-metrics based
>     on src-IP and moved the patch that adds the new netlink attribute before
>     the other patches.

Looks good, series applied, thanks Christoph.

^ permalink raw reply

* Re: [PATCH net-next v2 2/5] tcp: metrics: Add source-address to tcp-metrics
From: David Miller @ 2014-01-10 22:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: christoph.paasch, netdev, ycheng, ja
In-Reply-To: <1389222826.31367.28.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 08 Jan 2014 15:13:46 -0800

> On Wed, 2014-01-08 at 23:43 +0100, Christoph Paasch wrote:
>> Hello Eric,
>> 
>> On 08/01/14 - 09:55:51, Eric Dumazet wrote:
>> > On Wed, 2014-01-08 at 16:05 +0100, Christoph Paasch wrote:
>> > > We add the source-address to the tcp-metrics, so that different metrics
>> > > will be used per source/destination-pair. We use the destination-hash to
>> > > store the metric inside the hash-table. That way, deleting and dumping
>> > > via "ip tcp_metrics" is easy.
>> > 
>> > Note that this has the following problem :
>> > 
>> > Some applications use a set of source IP addresses to overcome the 64K
>> > port limitation.
>> 
>> Ok, did not know about that.
>> 
>> > tcp_metrics uses a hard-coded TCP_METRICS_RECLAIM_DEPTH value of 5,
>> > meaning that cache wont be able to store more than 5 source IP addresses
>> > (reaching one particular remote IP).
>> 
>> Maybe we could do something like the below (yet untested). That way we allow
>> up to 32 entries with the same destination but different source and still
>> only 5 with different destinations.
>> 
>> I guess 32 * 64K connections is enough. :)
>> We could also make TCP_METRICS_RECLAIM_DEPTH(_DST) a tunable.
> 
> Well, not sure if this is a problem anyway, and if we want extra
> complexity for this rare use case, considering tcp metrics for
> high number of flows sharing a common path is unlikely to be useful
> (with exception of Fast Open, but again it must be rare)

I think we are overthinking this, even for the aforementioned case.

If people report that this is a real problem they are hitting, and
not just with constructed test cases, we can work on a solution.

^ permalink raw reply

* [PATCH v3 net-next] net: vxlan: when lower dev unregisters remove vxlan dev as well
From: Daniel Borkmann @ 2014-01-10 22:35 UTC (permalink / raw)
  To: davem; +Cc: netdev

We can create a vxlan device with an explicit underlying carrier.
In that case, when the carrier link is being deleted from the
system (e.g. due to module unload) we should also clean up all
created vxlan devices on top of it since otherwise we're in an
inconsistent state in vxlan device. In that case, the user needs
to remove all such devices, while in case of other virtual devs
that sit on top of physical ones, it is usually the case that
these devices do unregister automatically as well and do not
leave the burden on the user.

This work is not necessary when vxlan device was not created with
a real underlying device, as connections can resume in that case
when driver is plugged again. But at least for the other cases,
we should go ahead and do the cleanup on removal.

We don't register the notifier during vxlan_newlink() here since
I consider this event rather rare, and therefore we should not
bloat vxlan's core structure unecessary. Also, we can simply make
use of unregister_netdevice_many() to batch that.

E.g. `ip -d link show vxlan13` after carrier removal before
this patch:

5: vxlan13: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default
    link/ether 1e:47:da:6d:4d:99 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 13 group 239.0.0.10 dev 2 port 32768 61000 ageing 300
                                 ^^^^^
While at it, we should also use vxlan_dellink() handler in
vxlan_exit_net(), since i) we're not in hurry and we should better
be consistent, ii) in case future code will kfree() things in
vxlan_dellink(), we would leak it right here unnoticed. Therfore,
do not only half of the cleanup work, but make it properly.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 v1->v2:
  - Removed BUG_ON as it's not needed.
 v2->v3:
  - Removed dev->reg_state check.

 drivers/net/vxlan.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 481f85d..7255527 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2656,6 +2656,45 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.fill_info	= vxlan_fill_info,
 };
 
+static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
+					     struct net_device *dev)
+{
+	struct vxlan_dev *vxlan, *next;
+	LIST_HEAD(list_kill);
+
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next) {
+		struct vxlan_rdst *dst = &vxlan->default_dst;
+
+		/* In case we created vxlan device with carrier
+		 * and we loose the carrier due to module unload
+		 * we also need to remove vxlan device. In other
+		 * cases, it's not necessary and remote_ifindex
+		 * is 0 here, so no matches.
+		 */
+		if (dst->remote_ifindex == dev->ifindex)
+			vxlan_dellink(vxlan->dev, &list_kill);
+	}
+
+	unregister_netdevice_many(&list_kill);
+	list_del(&list_kill);
+}
+
+static int vxlan_lowerdev_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
+
+	if (event == NETDEV_UNREGISTER)
+		vxlan_handle_lowerdev_unregister(vn, dev);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block vxlan_notifier_block __read_mostly = {
+	.notifier_call = vxlan_lowerdev_event,
+};
+
 static __net_init int vxlan_init_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -2673,13 +2712,14 @@ static __net_init int vxlan_init_net(struct net *net)
 static __net_exit void vxlan_exit_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
-	struct vxlan_dev *vxlan;
-	LIST_HEAD(list);
+	struct vxlan_dev *vxlan, *next;
+	LIST_HEAD(list_kill);
 
 	rtnl_lock();
-	list_for_each_entry(vxlan, &vn->vxlan_list, next)
-		unregister_netdevice_queue(vxlan->dev, &list);
-	unregister_netdevice_many(&list);
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next)
+		vxlan_dellink(vxlan->dev, &list_kill);
+	unregister_netdevice_many(&list_kill);
+	list_del(&list_kill);
 	rtnl_unlock();
 }
 
@@ -2704,12 +2744,17 @@ static int __init vxlan_init_module(void)
 	if (rc)
 		goto out1;
 
-	rc = rtnl_link_register(&vxlan_link_ops);
+	rc = register_netdevice_notifier(&vxlan_notifier_block);
 	if (rc)
 		goto out2;
 
-	return 0;
+	rc = rtnl_link_register(&vxlan_link_ops);
+	if (rc)
+		goto out3;
 
+	return 0;
+out3:
+	unregister_netdevice_notifier(&vxlan_notifier_block);
 out2:
 	unregister_pernet_device(&vxlan_net_ops);
 out1:
@@ -2721,6 +2766,7 @@ late_initcall(vxlan_init_module);
 static void __exit vxlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&vxlan_link_ops);
+	unregister_netdevice_notifier(&vxlan_notifier_block);
 	destroy_workqueue(vxlan_wq);
 	unregister_pernet_device(&vxlan_net_ops);
 	rcu_barrier();
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next 3/3] net: Add trace events for all receive entry points, exposing more skb fields
From: Ben Hutchings @ 2014-01-10 22:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1389392139.2025.123.camel@bwh-desktop.uk.level5networks.com>

The existing net/netif_rx and net/netif_receive_skb trace events
provide little information about the skb, nor do they indicate how it
entered the stack.

Add trace events at entry of each of the exported functions, including
most fields that are likely to be interesting for debugging driver
datapath behaviour.  Split netif_rx() and netif_receive_skb() so that
internal calls are not traced.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/trace/events/net.h | 100 +++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             | 100 +++++++++++++++++++++++++++------------------
 2 files changed, 161 insertions(+), 39 deletions(-)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 0b61f2abee4c..731907c35051 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -136,6 +136,106 @@ DEFINE_EVENT(net_dev_template, netif_rx,
 
 	TP_ARGS(skb)
 );
+
+DECLARE_EVENT_CLASS(net_dev_rx_verbose_template,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__string(	name,			skb->dev->name	)
+		__field(	unsigned int,		napi_id		)
+		__field(	u16,			queue_mapping	)
+		__field(	const void *,		skbaddr		)
+		__field(	bool,			vlan_tagged	)
+		__field(	u16,			vlan_proto	)
+		__field(	u16,			vlan_tci	)
+		__field(	u16,			protocol	)
+		__field(	u8,			ip_summed	)
+		__field(	u32,			rxhash		)
+		__field(	bool,			l4_rxhash	)
+		__field(	unsigned int,		len		)
+		__field(	unsigned int,		data_len	)
+		__field(	unsigned int,		truesize	)
+		__field(	bool,			mac_header_valid)
+		__field(	int,			mac_header	)
+		__field(	unsigned char,		nr_frags	)
+		__field(	u16,			gso_size	)
+		__field(	u16,			gso_type	)
+	),
+
+	TP_fast_assign(
+		__assign_str(name, skb->dev->name);
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		__entry->napi_id = skb->napi_id;
+#else
+		__entry->napi_id = 0;
+#endif
+		__entry->queue_mapping = skb->queue_mapping;
+		__entry->skbaddr = skb;
+		__entry->vlan_tagged = vlan_tx_tag_present(skb);
+		__entry->vlan_proto = ntohs(skb->vlan_proto);
+		__entry->vlan_tci = vlan_tx_tag_get(skb);
+		__entry->protocol = ntohs(skb->protocol);
+		__entry->ip_summed = skb->ip_summed;
+		__entry->rxhash = skb->rxhash;
+		__entry->l4_rxhash = skb->l4_rxhash;
+		__entry->len = skb->len;
+		__entry->data_len = skb->data_len;
+		__entry->truesize = skb->truesize;
+		__entry->mac_header_valid = skb_mac_header_was_set(skb);
+		__entry->mac_header = skb_mac_header(skb) - skb->data;
+		__entry->nr_frags = skb_shinfo(skb)->nr_frags;
+		__entry->gso_size = skb_shinfo(skb)->gso_size;
+		__entry->gso_type = skb_shinfo(skb)->gso_type;
+	),
+
+	TP_printk("dev=%s napi_id=%#x queue_mapping=%u skbaddr=%p vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x ip_summed=%d rxhash=0x%08x l4_rxhash=%d len=%u data_len=%u truesize=%u mac_header_valid=%d mac_header=%d nr_frags=%d gso_size=%d gso_type=%#x",
+		  __get_str(name), __entry->napi_id, __entry->queue_mapping,
+		  __entry->skbaddr, __entry->vlan_tagged, __entry->vlan_proto,
+		  __entry->vlan_tci, __entry->protocol, __entry->ip_summed,
+		  __entry->rxhash, __entry->l4_rxhash, __entry->len,
+		  __entry->data_len, __entry->truesize,
+		  __entry->mac_header_valid, __entry->mac_header,
+		  __entry->nr_frags, __entry->gso_size, __entry->gso_type)
+);
+
+DEFINE_EVENT(net_dev_rx_verbose_template, napi_gro_frags_entry,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_rx_verbose_template, napi_gro_receive_entry,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_rx_verbose_template, netif_receive_skb_entry,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_entry,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
+
+	TP_PROTO(const struct sk_buff *skb),
+
+	TP_ARGS(skb)
+);
+
 #endif /* _TRACE_NET_H */
 
 /* This part must be outside protection */
diff --git a/net/core/dev.c b/net/core/dev.c
index 82b3c0fa0ea7..02979938cd42 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -147,6 +147,8 @@ struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;	/* Taps */
 static struct list_head offload_base __read_mostly;
 
+static int netif_rx_internal(struct sk_buff *skb);
+
 /*
  * The @dev_base_head list is protected by @dev_base_lock and the rtnl
  * semaphore.
@@ -1698,7 +1700,7 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 	skb_scrub_packet(skb, true);
 	skb->protocol = eth_type_trans(skb, dev);
 
-	return netif_rx(skb);
+	return netif_rx_internal(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
@@ -3219,22 +3221,7 @@ enqueue:
 	return NET_RX_DROP;
 }
 
-/**
- *	netif_rx	-	post buffer to the network code
- *	@skb: buffer to post
- *
- *	This function receives a packet from a device driver and queues it for
- *	the upper (protocol) levels to process.  It always succeeds. The buffer
- *	may be dropped during processing for congestion control or by the
- *	protocol layers.
- *
- *	return values:
- *	NET_RX_SUCCESS	(no congestion)
- *	NET_RX_DROP     (packet was dropped)
- *
- */
-
-int netif_rx(struct sk_buff *skb)
+static int netif_rx_internal(struct sk_buff *skb)
 {
 	int ret;
 
@@ -3270,14 +3257,38 @@ int netif_rx(struct sk_buff *skb)
 	}
 	return ret;
 }
+
+/**
+ *	netif_rx	-	post buffer to the network code
+ *	@skb: buffer to post
+ *
+ *	This function receives a packet from a device driver and queues it for
+ *	the upper (protocol) levels to process.  It always succeeds. The buffer
+ *	may be dropped during processing for congestion control or by the
+ *	protocol layers.
+ *
+ *	return values:
+ *	NET_RX_SUCCESS	(no congestion)
+ *	NET_RX_DROP     (packet was dropped)
+ *
+ */
+
+int netif_rx(struct sk_buff *skb)
+{
+	trace_netif_rx_entry(skb);
+
+	return netif_rx_internal(skb);
+}
 EXPORT_SYMBOL(netif_rx);
 
 int netif_rx_ni(struct sk_buff *skb)
 {
 	int err;
 
+	trace_netif_rx_ni_entry(skb);
+
 	preempt_disable();
-	err = netif_rx(skb);
+	err = netif_rx_internal(skb);
 	if (local_softirq_pending())
 		do_softirq();
 	preempt_enable();
@@ -3662,22 +3673,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	return ret;
 }
 
-/**
- *	netif_receive_skb - process receive buffer from network
- *	@skb: buffer to process
- *
- *	netif_receive_skb() is the main receive data processing function.
- *	It always succeeds. The buffer may be dropped during processing
- *	for congestion control or by the protocol layers.
- *
- *	This function may only be called from softirq context and interrupts
- *	should be enabled.
- *
- *	Return values (usually ignored):
- *	NET_RX_SUCCESS: no congestion
- *	NET_RX_DROP: packet was dropped
- */
-int netif_receive_skb(struct sk_buff *skb)
+static int netif_receive_skb_internal(struct sk_buff *skb)
 {
 	net_timestamp_check(netdev_tstamp_prequeue, skb);
 
@@ -3703,6 +3699,28 @@ int netif_receive_skb(struct sk_buff *skb)
 #endif
 	return __netif_receive_skb(skb);
 }
+
+/**
+ *	netif_receive_skb - process receive buffer from network
+ *	@skb: buffer to process
+ *
+ *	netif_receive_skb() is the main receive data processing function.
+ *	It always succeeds. The buffer may be dropped during processing
+ *	for congestion control or by the protocol layers.
+ *
+ *	This function may only be called from softirq context and interrupts
+ *	should be enabled.
+ *
+ *	Return values (usually ignored):
+ *	NET_RX_SUCCESS: no congestion
+ *	NET_RX_DROP: packet was dropped
+ */
+int netif_receive_skb(struct sk_buff *skb)
+{
+	trace_netif_receive_skb_entry(skb);
+
+	return netif_receive_skb_internal(skb);
+}
 EXPORT_SYMBOL(netif_receive_skb);
 
 /* Network device is going away, flush any packets still pending
@@ -3764,7 +3782,7 @@ static int napi_gro_complete(struct sk_buff *skb)
 	}
 
 out:
-	return netif_receive_skb(skb);
+	return netif_receive_skb_internal(skb);
 }
 
 /* napi->gro_list contains packets ordered by age.
@@ -3959,7 +3977,7 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 {
 	switch (ret) {
 	case GRO_NORMAL:
-		if (netif_receive_skb(skb))
+		if (netif_receive_skb_internal(skb))
 			ret = GRO_DROP;
 		break;
 
@@ -3984,6 +4002,8 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
+	trace_napi_gro_receive_entry(skb);
+
 	return napi_skb_finish(dev_gro_receive(napi, skb), skb);
 }
 EXPORT_SYMBOL(napi_gro_receive);
@@ -4017,7 +4037,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi, struct sk_buff *
 {
 	switch (ret) {
 	case GRO_NORMAL:
-		if (netif_receive_skb(skb))
+		if (netif_receive_skb_internal(skb))
 			ret = GRO_DROP;
 		break;
 
@@ -4056,6 +4076,8 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)
 	if (!skb)
 		return GRO_DROP;
 
+	trace_napi_gro_frags_entry(skb);
+
 	return napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
 }
 EXPORT_SYMBOL(napi_gro_frags);
@@ -6592,11 +6614,11 @@ static int dev_cpu_callback(struct notifier_block *nfb,
 
 	/* Process offline CPU's input_pkt_queue */
 	while ((skb = __skb_dequeue(&oldsd->process_queue))) {
-		netif_rx(skb);
+		netif_rx_internal(skb);
 		input_queue_head_incr(oldsd);
 	}
 	while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
-		netif_rx(skb);
+		netif_rx_internal(skb);
 		input_queue_head_incr(oldsd);
 	}
 

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 2/3] net: Add net_dev_start_xmit trace event, exposing more skb fields
From: Ben Hutchings @ 2014-01-10 22:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1389392139.2025.123.camel@bwh-desktop.uk.level5networks.com>

The existing net/net_dev_xmit trace event provides little information
about the skb that has been passed to the driver, and it is not
simple to add more since the skb may already have been freed at
the point the event is emitted.

Add a separate trace event before the skb is passed to the driver,
including most fields that are likely to be interesting for debugging
driver datapath behaviour.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/trace/events/net.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |  2 ++
 2 files changed, 60 insertions(+)

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index f99645d05a8f..0b61f2abee4c 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -6,9 +6,67 @@
 
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
+#include <linux/if_vlan.h>
 #include <linux/ip.h>
 #include <linux/tracepoint.h>
 
+TRACE_EVENT(net_dev_start_xmit,
+
+	TP_PROTO(const struct sk_buff *skb, const struct net_device *dev),
+
+	TP_ARGS(skb, dev),
+
+	TP_STRUCT__entry(
+		__string(	name,			dev->name	)
+		__field(	u16,			queue_mapping	)
+		__field(	const void *,		skbaddr		)
+		__field(	bool,			vlan_tagged	)
+		__field(	u16,			vlan_proto	)
+		__field(	u16,			vlan_tci	)
+		__field(	u16,			protocol	)
+		__field(	u8,			ip_summed	)
+		__field(	unsigned int,		len		)
+		__field(	unsigned int,		data_len	)
+		__field(	int,			network_offset	)
+		__field(	bool,			transport_offset_valid)
+		__field(	int,			transport_offset)
+		__field(	u8,			tx_flags	)
+		__field(	u16,			gso_size	)
+		__field(	u16,			gso_segs	)
+		__field(	u16,			gso_type	)
+	),
+
+	TP_fast_assign(
+		__assign_str(name, dev->name);
+		__entry->queue_mapping = skb->queue_mapping;
+		__entry->skbaddr = skb;
+		__entry->vlan_tagged = vlan_tx_tag_present(skb);
+		__entry->vlan_proto = ntohs(skb->vlan_proto);
+		__entry->vlan_tci = vlan_tx_tag_get(skb);
+		__entry->protocol = ntohs(skb->protocol);
+		__entry->ip_summed = skb->ip_summed;
+		__entry->len = skb->len;
+		__entry->data_len = skb->data_len;
+		__entry->network_offset = skb_network_offset(skb);
+		__entry->transport_offset_valid =
+			skb_transport_header_was_set(skb);
+		__entry->transport_offset = skb_transport_offset(skb);
+		__entry->tx_flags = skb_shinfo(skb)->tx_flags;
+		__entry->gso_size = skb_shinfo(skb)->gso_size;
+		__entry->gso_segs = skb_shinfo(skb)->gso_segs;
+		__entry->gso_type = skb_shinfo(skb)->gso_type;
+	),
+
+	TP_printk("dev=%s queue_mapping=%u skbaddr=%p vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x ip_summed=%d len=%u data_len=%u network_offset=%d transport_offset_valid=%d transport_offset=%d tx_flags=%d gso_size=%d gso_segs=%d gso_type=%#x",
+		  __get_str(name), __entry->queue_mapping, __entry->skbaddr,
+		  __entry->vlan_tagged, __entry->vlan_proto, __entry->vlan_tci,
+		  __entry->protocol, __entry->ip_summed, __entry->len,
+		  __entry->data_len, 
+		  __entry->network_offset, __entry->transport_offset_valid,
+		  __entry->transport_offset, __entry->tx_flags,
+		  __entry->gso_size, __entry->gso_segs, __entry->gso_type)
+);
+
 TRACE_EVENT(net_dev_xmit,
 
 	TP_PROTO(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 41bb08f1ec99..82b3c0fa0ea7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2596,6 +2596,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			dev_queue_xmit_nit(skb, dev);
 
 		skb_len = skb->len;
+		trace_net_dev_start_xmit(skb, dev);
 		rc = ops->ndo_start_xmit(skb, dev);
 		trace_net_dev_xmit(skb, rc, dev, skb_len);
 		if (rc == NETDEV_TX_OK)
@@ -2614,6 +2615,7 @@ gso:
 			dev_queue_xmit_nit(nskb, dev);
 
 		skb_len = nskb->len;
+		trace_net_dev_start_xmit(nskb, dev);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		trace_net_dev_xmit(nskb, rc, dev, skb_len);
 		if (unlikely(rc != NETDEV_TX_OK)) {


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 1/3] net: Fix indentation in dev_hard_start_xmit()
From: Ben Hutchings @ 2014-01-10 22:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1389392139.2025.123.camel@bwh-desktop.uk.level5networks.com>


Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 net/core/dev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index daa153738b40..41bb08f1ec99 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2596,8 +2596,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			dev_queue_xmit_nit(skb, dev);
 
 		skb_len = skb->len;
-			rc = ops->ndo_start_xmit(skb, dev);
-
+		rc = ops->ndo_start_xmit(skb, dev);
 		trace_net_dev_xmit(skb, rc, dev, skb_len);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 0/3] Improve tracing at the driver/core boundary
From: Ben Hutchings @ 2014-01-10 22:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

These patches add static tracpeoints at the driver/core boundary which
record various skb fields likely to be useful for datapath debugging.
On the TX side the boundary is where the core calls ndo_start_xmit, and
on the RX side it is where any of the various exported receive functions
is called.

The set of skb fields is mostly based on what I thought would be
interesting for sfc.

These patches are basically the same as what I sent as an RFC in
November, but rebased.  They now depend on 'net: core: explicitly select
a txq before doing l2 forwarding', so please merge net into net-next
before trying to apply them.  The first patch fixes a code formatting
error left behind after that fix.

Ben.

Ben Hutchings (3):
  net: Fix indentation in dev_hard_start_xmit()
  net: Add net_dev_start_xmit trace event, exposing more skb fields
  net: Add trace events for all receive entry points, exposing more skb
    fields

 include/trace/events/net.h | 158 +++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             | 105 ++++++++++++++++++------------
 2 files changed, 222 insertions(+), 41 deletions(-)


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH net,stable] net: usbnet: fix SG initialisation
From: Bjørn Mork @ 2014-01-10 22:10 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Thomas Kear, Ben Hutchings, Bjørn Mork, Ming Lei

Commit 60e453a940ac ("USBNET: fix handling padding packet")
added an extra SG entry in case padding is necessary, but
failed to update the initialisation of the list. This can
cause list traversal to fall off the end of the list,
resulting in an oops.

Fixes: 60e453a940ac ("USBNET: fix handling padding packet")
Reported-by: Thomas Kear <thomas@kear.co.nz>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
I don't have the hardware to verify this fix.  It would be good if
someone could test it before it goes to stable...

But in case this works, it should go into v3.12 stable.


Bjørn

 drivers/net/usb/usbnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 8494bb53ebdc..aba04f561760 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1245,7 +1245,7 @@ static int build_dma_sg(const struct sk_buff *skb, struct urb *urb)
 		return -ENOMEM;
 
 	urb->num_sgs = num_sgs;
-	sg_init_table(urb->sg, urb->num_sgs);
+	sg_init_table(urb->sg, urb->num_sgs + 1);
 
 	sg_set_buf(&urb->sg[s++], skb->data, skb_headlen(skb));
 	total_len += skb_headlen(skb);
-- 
1.8.5.2

^ permalink raw reply related

* Re: PROBLEM: usbnet / ax88179_178a: Panic in usb_hcd_map_urb_for_dma
From: Bjørn Mork @ 2014-01-10 22:09 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Thomas Kear, netdev, linux-usb-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1389385824.2025.95.camel-/LGg1Z1CJKQ+9kgCwbf1HqK4ta4zdZpAajtMo4Cw6ucAvxtiuMwx3w@public.gmane.org>

Ben Hutchings <bhutchings-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org> writes:
> On Sat, 2014-01-11 at 08:56 +1300, Thomas Kear wrote:
>> USB3 gigabit ethernet adapters with the ASIX AX88179 chipset (LevelOne
>> USB0401-V3, Plugable USB3-E1000, SIIG JU-NE0211-S1 and others) are
>> experiencing kernel panics in usb_hcd_map_urb_for_dma since 3.12.
> [...]
>> So far as I can tell, the driver is unaffected as late as 3.11.6, but
>> problematic as of 3.12 (and still affected in 3.13-rc5).  The history
>> of drivers/net/usb/ax88179_178a.c for this time period yields this
>> patch, which at least in my somewhat limited understanding appeared a
>> likely candidate.  I've reverted this on my system - against several
>> linux-next builds from the last 3-4 weeks - and have had no issues
>> with this network controller since.
>> 
>> commit 3804fad45411b48233b48003e33a78f290d227c8
>> Author: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>> Date:   Thu Aug 8 21:48:25 2013 +0800
>> 
>>     USBNET: ax88179_178a: enable tso if usb host supports sg dma
>
> Enabling SG DMA here has unfortunately caused a number of regressions as
> the XHCI (USB 3 controller) hardware is a bit inflexible and the driver
> wasn't quite ready to handle the SG lists generated for net devices.
>
> I don't think I've seen this particular symptom though, and it might
> indicate a bug in usbnet.

Yes, I believe this code in usbnet is still somewhat unmature and not
much tested.  Unfortunately I don't have any PC with xhci, so I can't do
much about that.

But looking at the code I think I found and obvious miss in the SG list
initialisation.  I'll post a proposed fix for that.  Would be good if
someone was able to test it.



Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [net-next v2 14/17] i40e: enable PTP
From: Ben Hutchings @ 2014-01-10 22:08 UTC (permalink / raw)
  To: Keller, Jacob E
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com, Richard Cochran,
	Vick, Matthew, Brandeburg, Jesse
In-Reply-To: <1389390171.8039.21.camel@jekeller-desk1.amr.corp.intel.com>

On Fri, 2014-01-10 at 21:42 +0000, Keller, Jacob E wrote:
> On Fri, 2014-01-10 at 20:37 +0000, Ben Hutchings wrote:
> > On Fri, 2014-01-10 at 12:30 -0800, Jeff Kirsher wrote:
> > > From: Jacob Keller <jacob.e.keller@intel.com>
> > > 
> > > New feature: Enable PTP support in the i40e driver.
> > > 
> > > Change-ID: I6a8e799f582705191f9583afb1b9231a8db96cc8
> > > Cc: Richard Cochran <richardcochran@gmail.com>
> > > Cc: Ben Hutchings <bhutchings@solarflare.com>
> > > Signed-off-by: Matthew Vick <matthew.vick@intel.com>
> > > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> > > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > > ---
> > >  drivers/net/ethernet/intel/Kconfig             |   1 +
> > >  drivers/net/ethernet/intel/i40e/Makefile       |   1 +
> > >  drivers/net/ethernet/intel/i40e/i40e.h         |  26 +
> > >  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  33 +-
> > >  drivers/net/ethernet/intel/i40e/i40e_main.c    |  47 +-
> > >  drivers/net/ethernet/intel/i40e/i40e_ptp.c     | 662 +++++++++++++++++++++++++
> > >  drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  53 ++
> > >  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   3 +
> > >  8 files changed, 824 insertions(+), 2 deletions(-)
> > >  create mode 100644 drivers/net/ethernet/intel/i40e/i40e_ptp.c
> > > 
> > > diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> > > index 9fb2eb8..333bb54 100644
> > > --- a/drivers/net/ethernet/intel/Kconfig
> > > +++ b/drivers/net/ethernet/intel/Kconfig
> > [...]
> > > +void i40e_ptp_init(struct i40e_pf *pf)
> > > +{
> > > +	struct i40e_hw *hw = &pf->hw;
> > > +	struct net_device *netdev = pf->vsi[pf->lan_vsi]->netdev;
> > > +
> > > +	strncpy(pf->ptp_caps.name, netdev->name, sizeof(pf->ptp_caps.name));
> > [...]
> > 
> > I recommended using the *driver* name as the clock name.  The net device
> > doesn't even have a proper name at this point.

Actually, on closer inspection it looks like the net device gets
registered before this is called, so the name will be valid.  However
the net device can (and probably will) be renamed after registration so
this name may become stale.

If the net device is registered first, this raises a different problem:
i40e_get_ts_info() can then race with PTP init and cleanup; it can see a
non-null but invalid value of pf->ptp_clock if it races with this in
i40e_ptp_init():

> +	/* Attempt to register the clock before enabling the hardware. */
> +	pf->ptp_clock = ptp_clock_register(&pf->ptp_caps, &pf->pdev->dev);
> +	if (IS_ERR(pf->ptp_clock)) {

or it can see a non-null value of ptp_clock just before the clock is
freed, then use-after-free.

You could use rtnl_lock() to serialise the registration/unregistration
with i40e_get_ts_info(), or reorder init and cleanup so the clock is
registered before the net device and unregistered after.

> > Ben.
> > 
> 
> We use netdev->name a few lines farther down.. should that be changed as
> well?

I think that's fine if it's done after the net device is registered.
However you would normally use netdev_info() to include the name at the
beginning of the log line.

Ben.

> Sorry about this one. The ixgbe driver initializes clock at open
> instead of at init, so it has a valid clock name already.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] IPv6: enable bind() to assign an anycast address
From: François-Xavier Le Bail @ 2014-01-10 22:01 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki Yoshifuji, Patrick McHardy
In-Reply-To: <20140110205100.GB23647@order.stressinduktion.org>

On Fri, 1/10/14, Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

>On Fri, Jan 10, 2014 at 07:12:24PM +0100, Francois-Xavier Le Bail wrote:
>> - Add ipv6_chk_acast_addr_src() to check if an anycast address is link-local
>>   on given interface or is global (on any interface).
>> - Use it in inet6_bind().
>> 
>> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
>> ---
>> Tested with link-local and global anycast addresses.
>> Tested with SOCK_DGRAM socket, bind and UDP traffic OK.
>>
>> Tested with SOCK_STREAM socket, bind OK, traffic need another change.

> What do you mean with that?

In actual TCP code, if the destination address is not unicast, the request is dropped:

	if (!ipv6_unicast_destination(skb))
		goto drop;

I will send another patch to allow anycat addresses destinations fot TCP. 

bye,
Francois-Xavier

^ permalink raw reply

* [Patch net-next] bridge: move br_net_exit() to br.c
From: Cong Wang @ 2014-01-10 21:58 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang

And it can become static.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

---
diff --git a/net/bridge/br.c b/net/bridge/br.c
index ba780cc..19311aa 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -22,14 +22,29 @@
 
 #include "br_private.h"
 
-static const struct stp_proto br_stp_proto = {
-	.rcv	= br_stp_rcv,
-};
+static void __net_exit br_net_exit(struct net *net)
+{
+	struct net_device *dev;
+	LIST_HEAD(list);
+
+	rtnl_lock();
+	for_each_netdev(net, dev)
+		if (dev->priv_flags & IFF_EBRIDGE)
+			br_dev_delete(dev, &list);
+
+	unregister_netdevice_many(&list);
+	rtnl_unlock();
+
+}
 
 static struct pernet_operations br_net_ops = {
 	.exit	= br_net_exit,
 };
 
+static const struct stp_proto br_stp_proto = {
+	.rcv	= br_stp_rcv,
+};
+
 static int __init br_init(void)
 {
 	int err;
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 1f6bd1e..cffe1d6 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -455,18 +455,3 @@ int br_del_if(struct net_bridge *br, struct net_device *dev)
 
 	return 0;
 }
-
-void __net_exit br_net_exit(struct net *net)
-{
-	struct net_device *dev;
-	LIST_HEAD(list);
-
-	rtnl_lock();
-	for_each_netdev(net, dev)
-		if (dev->priv_flags & IFF_EBRIDGE)
-			br_dev_delete(dev, &list);
-
-	unregister_netdevice_many(&list);
-	rtnl_unlock();
-
-}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3733f15..fcd1233 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -407,7 +407,6 @@ void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
 void br_port_carrier_check(struct net_bridge_port *p);
 int br_add_bridge(struct net *net, const char *name);
 int br_del_bridge(struct net *net, const char *name);
-void br_net_exit(struct net *net);
 int br_add_if(struct net_bridge *br, struct net_device *dev);
 int br_del_if(struct net_bridge *br, struct net_device *dev);
 int br_min_mtu(const struct net_bridge *br);

^ permalink raw reply related

* Re: [net-next v2 14/17] i40e: enable PTP
From: Keller, Jacob E @ 2014-01-10 21:42 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com, Richard Cochran,
	Vick, Matthew, Brandeburg, Jesse
In-Reply-To: <1389386245.2025.98.camel@bwh-desktop.uk.level5networks.com>

On Fri, 2014-01-10 at 20:37 +0000, Ben Hutchings wrote:
> On Fri, 2014-01-10 at 12:30 -0800, Jeff Kirsher wrote:
> > From: Jacob Keller <jacob.e.keller@intel.com>
> > 
> > New feature: Enable PTP support in the i40e driver.
> > 
> > Change-ID: I6a8e799f582705191f9583afb1b9231a8db96cc8
> > Cc: Richard Cochran <richardcochran@gmail.com>
> > Cc: Ben Hutchings <bhutchings@solarflare.com>
> > Signed-off-by: Matthew Vick <matthew.vick@intel.com>
> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> >  drivers/net/ethernet/intel/Kconfig             |   1 +
> >  drivers/net/ethernet/intel/i40e/Makefile       |   1 +
> >  drivers/net/ethernet/intel/i40e/i40e.h         |  26 +
> >  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  33 +-
> >  drivers/net/ethernet/intel/i40e/i40e_main.c    |  47 +-
> >  drivers/net/ethernet/intel/i40e/i40e_ptp.c     | 662 +++++++++++++++++++++++++
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  53 ++
> >  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   3 +
> >  8 files changed, 824 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/net/ethernet/intel/i40e/i40e_ptp.c
> > 
> > diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> > index 9fb2eb8..333bb54 100644
> > --- a/drivers/net/ethernet/intel/Kconfig
> > +++ b/drivers/net/ethernet/intel/Kconfig
> [...]
> > +void i40e_ptp_init(struct i40e_pf *pf)
> > +{
> > +	struct i40e_hw *hw = &pf->hw;
> > +	struct net_device *netdev = pf->vsi[pf->lan_vsi]->netdev;
> > +
> > +	strncpy(pf->ptp_caps.name, netdev->name, sizeof(pf->ptp_caps.name));
> [...]
> 
> I recommended using the *driver* name as the clock name.  The net device
> doesn't even have a proper name at this point.
> 
> Ben.
> 

We use netdev->name a few lines farther down.. should that be changed as
well? Sorry about this one. The ixgbe driver initializes clock at open
instead of at init, so it has a valid clock name already.

Regards,
Jake

^ permalink raw reply

* Re: [PATCH net] inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
From: Eric Dumazet @ 2014-01-10 21:32 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, netdev, Eric Dumazet
In-Reply-To: <1389386085-22659-1-git-send-email-ncardwell@google.com>

On Fri, 2014-01-10 at 15:34 -0500, Neal Cardwell wrote:
> Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
> and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
> (not just TIME_WAIT), and for such sockets the tw_substate field holds
> the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.
> 
> This brings the inet_diag state-matching code in line with the field
> it uses to populate idiag_state. This is also analogous to the info
> exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
> and get_timewait4_sock() exports tw->tw_substate.
> 
> Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
> fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
> state time-wait" would also return sockets in state TCP_FIN_WAIT2.
> 
> This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH 3/3] atl1: update statistics code
From: Sabrina Dubroca @ 2014-01-10 21:30 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem, netdev, jcliburn, chris.snook
In-Reply-To: <1389378566.2025.78.camel@bwh-desktop.uk.level5networks.com>

2014-01-10, 18:29:26 +0000, Ben Hutchings wrote:
> On Fri, 2014-01-10 at 17:08 +0100, Sabrina Dubroca wrote:
> > As Ben Hutchings pointed out for the stats in alx, some
> > hardware-specific stats aren't matched to the right net_device_stats
> > field. Also fix the collision field and include errors in the total
> > number of RX/TX packets.
> > 
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> >  drivers/net/ethernet/atheros/atlx/atl1.c | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/atheros/atlx/atl1.c b/drivers/net/ethernet/atheros/atlx/atl1.c
> > index 538211d..31d460a 100644
> > --- a/drivers/net/ethernet/atheros/atlx/atl1.c
> > +++ b/drivers/net/ethernet/atheros/atlx/atl1.c
> [...]
> > @@ -1718,23 +1718,18 @@ static void atl1_inc_smb(struct atl1_adapter *adapter)
> >  	adapter->soft_stats.tx_trunc += smb->tx_trunc;
> >  	adapter->soft_stats.tx_pause += smb->tx_pause;
> >  
> > -	netdev->stats.rx_packets = adapter->soft_stats.rx_packets;
> > -	netdev->stats.tx_packets = adapter->soft_stats.tx_packets;
> >  	netdev->stats.rx_bytes = adapter->soft_stats.rx_bytes;
> >  	netdev->stats.tx_bytes = adapter->soft_stats.tx_bytes;
> >  	netdev->stats.multicast = adapter->soft_stats.multicast;
> >  	netdev->stats.collisions = adapter->soft_stats.collisions;
> >  	netdev->stats.rx_errors = adapter->soft_stats.rx_errors;
> > -	netdev->stats.rx_over_errors =
> > -		adapter->soft_stats.rx_missed_errors;
> >  	netdev->stats.rx_length_errors =
> >  		adapter->soft_stats.rx_length_errors;
> >  	netdev->stats.rx_crc_errors = adapter->soft_stats.rx_crc_errors;
> >  	netdev->stats.rx_frame_errors =
> >  		adapter->soft_stats.rx_frame_errors;
> >  	netdev->stats.rx_fifo_errors = adapter->soft_stats.rx_fifo_errors;
> > -	netdev->stats.rx_missed_errors =
> > -		adapter->soft_stats.rx_missed_errors;
> 
> So adapter->soft_stats.rx_missed_errors is set (to something silly) and
> then ignored...

Ignored in netdev->stats, but still used in ethtool stats, as both
rx_over_errors and rx_missed_errors. I don't want to rename or remove
ethtool stats entries, so I could leave it set to 0.


> > +	netdev->stats.rx_dropped = adapter->soft_stats.rx_rrd_ov;
> >  	netdev->stats.tx_errors = adapter->soft_stats.tx_errors;
> >  	netdev->stats.tx_fifo_errors = adapter->soft_stats.tx_fifo_errors;
> >  	netdev->stats.tx_aborted_errors =
> > @@ -1743,6 +1738,11 @@ static void atl1_inc_smb(struct atl1_adapter *adapter)
> >  		adapter->soft_stats.tx_window_errors;
> >  	netdev->stats.tx_carrier_errors =
> >  		adapter->soft_stats.tx_carrier_errors;
> > +
> > +	netdev->stats.rx_packets = adapter->soft_stats.rx_packets +
> > +				   netdev->stats.rx_errors;
> > +	netdev->stats.tx_packets = adapter->soft_stats.tx_packets +
> > +				   netdev->stats.tx_errors;
> 
> Given that adapter->soft_stats largely mirrors struct net_device_stats,
> would it not make sense to do these additions there so that
> adapter->soft_stats.{rx,tx}_packets include all packets?
> 
> I.e. you could do something like:
> 
> 	delta_rx_errors = (smb->rx_frag + smb->rx_fcs_err +
> 			   smb->rx_len_err + smb->rx_sz_ov + smb->rx_rxf_ov +
> 			   smb->rx_rrd_ov + smb->rx_align_err);
> 	adapter->soft_stats.rx_errors += delta_rx_errors;
> 	adapter->soft_stats.rx_packets += delta_rx_errors;
> 
> (and similarly for TX).
> 
> Basically I think these changes belong further up the function.

Oops, yeah.


I just noticed this bit in atl1_alloc_rx_buffers:

static u16 atl1_alloc_rx_buffers(struct atl1_adapter *adapter)
{
<...>
		skb = netdev_alloc_skb_ip_align(adapter->netdev,
						adapter->rx_buffer_len);
		if (unlikely(!skb)) {
			/* Better luck next round */
			adapter->netdev->stats.rx_dropped++;
			break;
		}
<...>
}

Now that netdev->stats.rx_dropped gets overwritten, it should be
changed to a field in soft_stats. soft_stats doesn't have a rx_dropped
field, so rx_rrd_ov? Or do I add rx_dropped to soft_stats?



Thanks a lot Ben!

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH v2 net-next] net: vxlan: when lower dev unregisters remove vxlan dev as well
From: Stephen Hemminger @ 2014-01-10 21:01 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev
In-Reply-To: <1389386780-24827-1-git-send-email-dborkman@redhat.com>

On Fri, 10 Jan 2014 21:46:20 +0100
Daniel Borkmann <dborkman@redhat.com> wrote:

> +static int vxlan_lowerdev_event(struct notifier_block *unused,
> +				unsigned long event, void *ptr)
> +{
> +	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
> +
> +	switch (event) {
> +	case NETDEV_UNREGISTER:
> +		/* Twiddle thumbs on netns device moves. */
> +		if (dev->reg_state != NETREG_UNREGISTERING)
> +			break;

No. Don't look at reg_state here. If moved from namespace
then it is a good as gone.

^ permalink raw reply

* Re: [PATCH net-next] IPv6: enable bind() to assign an anycast address
From: Hannes Frederic Sowa @ 2014-01-10 20:51 UTC (permalink / raw)
  To: Francois-Xavier Le Bail
  Cc: netdev, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki Yoshifuji, Patrick McHardy
In-Reply-To: <1389377544-3119-1-git-send-email-fx.lebail@yahoo.com>

Hi!

On Fri, Jan 10, 2014 at 07:12:24PM +0100, Francois-Xavier Le Bail wrote:
> - Add ipv6_chk_acast_addr_src() to check if an anycast address is link-local
>   on given interface or is global (on any interface).
> - Use it in inet6_bind().
> 
> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
> ---
> Tested with link-local and global anycast addresses.
> Tested with SOCK_DGRAM socket, bind and UDP traffic OK.

> Tested with SOCK_STREAM socket, bind OK, traffic need another change.

What do you mean with that?

Bye,

  Hannes

^ permalink raw reply

* Re: [PATCH v2 05/16] blackfin: Update stmmac callback signatures
From: Sergei Shtylyov @ 2014-01-10 20:47 UTC (permalink / raw)
  To: Chen-Yu Tsai, Srinivas Kandagatla, Giuseppe Cavallaro,
	Maxime Ripard, Mike Frysinger
  Cc: Mike Turquette, Emilio Lopez, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Rob Herring,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	uclinux-dist-devel-ZG0+EudsQA8dtHy/vicBwGD2FQJk+8+b,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <1389337217-29032-6-git-send-email-wens-jdAy2FN1RRM@public.gmane.org>

Hello.

On 01/10/2014 10:00 AM, Chen-Yu Tsai wrote:

> stmmac callbacks have been extended for better seperation.

    Only "separation".

> Update them to avoid breakage.

> Signed-off-by: Chen-Yu Tsai <wens-jdAy2FN1RRM@public.gmane.org>

WBR, Sergei

^ permalink raw reply

* [PATCH v2 net-next] net: vxlan: when lower dev unregisters remove vxlan dev as well
From: Daniel Borkmann @ 2014-01-10 20:46 UTC (permalink / raw)
  To: davem; +Cc: netdev

We can create a vxlan device with an explicit underlying carrier.
In that case, when the carrier link is being deleted from the
system (e.g. due to module unload) we should also clean up all
created vxlan devices on top of it since otherwise we're in an
inconsistent state in vxlan device. In that case, the user needs
to remove all such devices, while in case of other virtual devs
that sit on top of physical ones, it is usually the case that
these devices do unregister automatically as well and do not
leave the burden on the user.

This work is not necessary when vxlan device was not created with
a real underlying device, as connections can resume in that case
when driver is plugged again. But at least for the other cases,
we should go ahead and do the cleanup on removal.

We don't register the notifier during vxlan_newlink() here since
I consider this event rather rare, and therefore we should not
bloat vxlan's core structure unecessary. Also, we can simply make
use of unregister_netdevice_many() to batch that.

E.g. `ip -d link show vxlan13` after carrier removal before
this patch:

5: vxlan13: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default
    link/ether 1e:47:da:6d:4d:99 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 13 group 239.0.0.10 dev 2 port 32768 61000 ageing 300
                                 ^^^^^
While at it, we should also use vxlan_dellink() handler in
vxlan_exit_net(), since i) we're not in hurry and we should better
be consistent, ii) in case future code will kfree() things in
vxlan_dellink(), we would leak it right here unnoticed. Therfore,
do not only half of the cleanup work, but make it properly.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 v1->v2:
  - Removed BUG_ON as it's not needed.

 drivers/net/vxlan.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 481f85d..479fe97 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2656,6 +2656,52 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.fill_info	= vxlan_fill_info,
 };
 
+static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
+					     struct net_device *dev)
+{
+	struct vxlan_dev *vxlan, *next;
+	LIST_HEAD(list_kill);
+
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next) {
+		struct vxlan_rdst *dst = &vxlan->default_dst;
+
+		/* In case we created vxlan device with carrier
+		 * and we loose the carrier due to module unload
+		 * we also need to remove vxlan device. In other
+		 * cases, it's not necessary and remote_ifindex
+		 * is 0 here, so no matches.
+		 */
+		if (dst->remote_ifindex == dev->ifindex)
+			vxlan_dellink(vxlan->dev, &list_kill);
+	}
+
+	unregister_netdevice_many(&list_kill);
+	list_del(&list_kill);
+}
+
+static int vxlan_lowerdev_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
+
+	switch (event) {
+	case NETDEV_UNREGISTER:
+		/* Twiddle thumbs on netns device moves. */
+		if (dev->reg_state != NETREG_UNREGISTERING)
+			break;
+
+		vxlan_handle_lowerdev_unregister(vn, dev);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block vxlan_notifier_block __read_mostly = {
+	.notifier_call = vxlan_lowerdev_event,
+};
+
 static __net_init int vxlan_init_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -2673,14 +2719,16 @@ static __net_init int vxlan_init_net(struct net *net)
 static __net_exit void vxlan_exit_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
-	struct vxlan_dev *vxlan;
-	LIST_HEAD(list);
+	struct vxlan_dev *vxlan, *next;
+	LIST_HEAD(list_kill);
 
 	rtnl_lock();
-	list_for_each_entry(vxlan, &vn->vxlan_list, next)
-		unregister_netdevice_queue(vxlan->dev, &list);
-	unregister_netdevice_many(&list);
+	list_for_each_entry_safe(vxlan, next, &vn->vxlan_list, next)
+		vxlan_dellink(vxlan->dev, &list_kill);
+	unregister_netdevice_many(&list_kill);
 	rtnl_unlock();
+
+	list_del(&list_kill);
 }
 
 static struct pernet_operations vxlan_net_ops = {
@@ -2704,12 +2752,17 @@ static int __init vxlan_init_module(void)
 	if (rc)
 		goto out1;
 
-	rc = rtnl_link_register(&vxlan_link_ops);
+	rc = register_netdevice_notifier(&vxlan_notifier_block);
 	if (rc)
 		goto out2;
 
-	return 0;
+	rc = rtnl_link_register(&vxlan_link_ops);
+	if (rc)
+		goto out3;
 
+	return 0;
+out3:
+	unregister_netdevice_notifier(&vxlan_notifier_block);
 out2:
 	unregister_pernet_device(&vxlan_net_ops);
 out1:
@@ -2721,6 +2774,7 @@ late_initcall(vxlan_init_module);
 static void __exit vxlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&vxlan_link_ops);
+	unregister_netdevice_notifier(&vxlan_notifier_block);
 	destroy_workqueue(vxlan_wq);
 	unregister_pernet_device(&vxlan_net_ops);
 	rcu_barrier();
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH] net: Check skb->rxhash in gro_receive
From: Eric Dumazet @ 2014-01-10 20:43 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List
In-Reply-To: <CA+mtBx-OHxHCTQO50F9_aw=T8VSg93E2eCkzdLL3PZapcR-CFQ@mail.gmail.com>

On Fri, 2014-01-10 at 11:42 -0800, Tom Herbert wrote:
> > Really, 99% of the time gro_list contains zero or one single slot, I
> > have hard data saying so.
> >
> Please provide the data.

Based on a Google patch you can run on your lab host. 

lpaa5:~# ethtool -S eth1|egrep "rx_packets|gro"
     rx_packets: 235709959
     gro_complete[0]: 5895809
     gro_overflows[0]: 1495
     gro_nogro[0]: 63584
     gro_msgs[0]: 9791617
     gro_segs[0]: 52290544
     gro_flush[0]: 25
     gro_complete[1]: 6464583
     gro_overflows[1]: 5920
     gro_nogro[1]: 74680
     gro_msgs[1]: 11797481
     gro_segs[1]: 62081299
     gro_flush[1]: 35
     gro_complete[2]: 6289929
     gro_overflows[2]: 4016
     gro_nogro[2]: 71479
     gro_msgs[2]: 11111473
     gro_segs[2]: 58518690
     gro_flush[2]: 42
     gro_complete[3]: 6448928
     gro_overflows[3]: 6781
     gro_nogro[3]: 76417
     gro_msgs[3]: 11845717
     gro_segs[3]: 62730931
     gro_flush[3]: 42
     gro_complete: 25099249
     gro_overflows: 18212
     gro_nogro: 286160
     gro_msgs: 44546288
     gro_segs: 235621464
     gro_flush: 144

The key is the gro_complete thing, meaning that most NAPI poll are
completed and GRO list flushed.

Here the results are from a synthetic bench (400 concurrent TCP_STREAM),
very small number of RX queues (trying to force stress load you want),
increase coalescing parameters on the NIC (to really try to increase
batches load), plenty of ECN marking to force GRO flushes, and even so,
you can see :

Average aggregation is : (235621464-286160)/44546288 = 5.28 frames per
GRO packet.

average NAPI run handles 235709959/25099249 = 9.39 packets

Very few overflows of the gro_list : 18212


> > If you want to optimize the case where list is fully populated (because
> > of yet another synthetic benchmark you use), you really need to build a
> > temporary list so that all layers do not even have to check
> > NAPI_GRO_CB(p)->same_flow
> >
> Well if you prefer, you can think of the "synthetic benchmark" as
> emulating an obvious DOS attack by pumping MSS sized TCP segments with
> random ports to a server. The stack needs to be resilient to such
> things, an O(n*m) algorithm in the data path is a red flag.

SYN floods are way more effective, or sending small packets with
MSS=10 : Even with one flow your host wont be resilient at all.

n is what , and m is what ?

GRO is O(8), or O(1). This is the least of your concerns under attack.

I played last year adding a hash table (based on rxhash), and
my performance tests were not good enough.

profile exactly showed flow dissection being a problem. This is what
your patch is trying to do.

Optimizing GRO stack to better react to attacks, but lowering
performance in normal cases was not a win.

So if you believe your patch is really useful in its current form,
please provide your data.

My opinion is that this one liner is much better.

diff --git a/net/core/dev.c b/net/core/dev.c
index ce01847793c0..be3135d6c12a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3800,6 +3800,7 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 
 		diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
 		diffs |= p->vlan_tci ^ skb->vlan_tci;
+		diffs |= p->rxhash ^ skb->rxhash;
 		if (maclen == ETH_HLEN)
 			diffs |= compare_ether_header(skb_mac_header(p),
 						      skb_gro_mac_header(skb));

^ permalink raw reply related

* Re: [PATCH 1/1] drivers: net: silence compiler warning in smc91x.c
From: Sergei Shtylyov @ 2014-01-10 20:42 UTC (permalink / raw)
  To: Pankaj Dubey, linux-samsung-soc, linux-arm-kernel
  Cc: David S. Miller, Jingoo Han, netdev
In-Reply-To: <1389323046-4521-1-git-send-email-pankaj.dubey@samsung.com>

Hello.

On 01/10/2014 06:04 AM, Pankaj Dubey wrote:

> If used 64 bit compiler GCC warns that:

> drivers/net/ethernet/smsc/smc91x.c:1897:7:
> warning: cast from pointer to integer of different
> size [-Wpointer-to-int-cast]

> This patch fixes this by changing typecase from "unsigned int" to "unsigned long"

    Only "typecast".

> CC: "David S. Miller" <davem@davemloft.net>
> CC: Jingoo Han <jg1.han@samsung.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>

WBR, Sergei

^ permalink raw reply

* [PATCH net] inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
From: Neal Cardwell @ 2014-01-10 20:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Eric Dumazet

Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.

This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.

Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.

This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index a0f52da..e34dccb 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -930,12 +930,15 @@ skip_listen_ht:
 		spin_lock_bh(lock);
 		sk_nulls_for_each(sk, node, &head->chain) {
 			int res;
+			int state;
 
 			if (!net_eq(sock_net(sk), net))
 				continue;
 			if (num < s_num)
 				goto next_normal;
-			if (!(r->idiag_states & (1 << sk->sk_state)))
+			state = (sk->sk_state == TCP_TIME_WAIT) ?
+				inet_twsk(sk)->tw_substate : sk->sk_state;
+			if (!(r->idiag_states & (1 << state)))
 				goto next_normal;
 			if (r->sdiag_family != AF_UNSPEC &&
 			    sk->sk_family != r->sdiag_family)
-- 
1.8.5.1

^ permalink raw reply related

* Re: [PATCH 08/23] netfilter: nft_ct: load both IPv4 and IPv6 conntrack modules for NFPROTO_INET
From: Sergei Shtylyov @ 2014-01-10 20:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1389314142-17969-9-git-send-email-pablo@netfilter.org>

On 01/10/2014 03:35 AM, Pablo Neira Ayuso wrote:

> From: Patrick McHardy <kaber@trash.net>

> The ct expression can currently not be used in the inet family since
> we don't have a conntrack module for NFPROTO_INET, so
> nf_ct_l3proto_try_module_get() fails. Add some manual handling to
> load the modules for both NFPROTO_IPV4 and NFPROTO_IPV6 if the
> ct expression is used in the inet family.

> Signed-off-by: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>   net/netfilter/nft_ct.c |   39 ++++++++++++++++++++++++++++++++++++---
>   1 file changed, 36 insertions(+), 3 deletions(-)

> diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
> index 955f4e6..3727a32 100644
> --- a/net/netfilter/nft_ct.c
> +++ b/net/netfilter/nft_ct.c
[...]
> +static void nft_ct_l3proto_module_put(uint8_t family)
> +{
> +	if (family == NFPROTO_INET) {
> +		nf_ct_l3proto_module_put(NFPROTO_IPV4);
> +		nf_ct_l3proto_module_put(NFPROTO_IPV6);
> +	} else
> +		nf_ct_l3proto_module_put(family);

    According to Documentation/CodingStyle, there should be {} in the *else* 
arm, as the other arm of *if* statement has it.

> +}
> +

WBR, Sergei


^ permalink raw reply

* Re: Layer 2 acceleration vs GSO
From: Ben Hutchings @ 2014-01-10 20:39 UTC (permalink / raw)
  To: Neil Horman; +Cc: John Fastabend, David S. Miller, Andy Gospodarek, netdev
In-Reply-To: <20140110203222.GB2645@hmsreliant.think-freely.org>

On Fri, 2014-01-10 at 15:32 -0500, Neil Horman wrote:
> On Fri, Jan 10, 2014 at 08:05:39PM +0000, Ben Hutchings wrote:
> > What happens when an skb to be sent through ndo_dfwd_start_xmit()
> > requires software GSO?
> > 
> > dev_hard_start_xmit() will segment it and then submit each segment, and
> > then:
> > 
> > > gso:
> > > 	do {
> > [...]
> > > 		if (accel_priv)
> > > 			rc = ops->ndo_dfwd_start_xmit(nskb, dev, accel_priv);
> > > 		else
> > > 			rc = ops->ndo_start_xmit(nskb, dev);
> > > 		trace_net_dev_xmit(nskb, rc, dev, skb_len);
> > [...]
> > > 		txq_trans_update(txq);
> > 
> > Oops, txq is NULL.  And once we add the obvious condition to that,
> > 
> > > 		if (unlikely(netif_xmit_stopped(txq) && skb->next))
> > > 			return NETDEV_TX_BUSY;
> > 
> > How can we tell if the hardware transmit queue filled up?
> > 
> > It seems like this feature currently relies on the driver returning
> > NETDEV_TX_BUSY when the TX queue is already full, like ixgbe does.  But
> > that is exactly what drivers are *not* supposed to do.
> > 
> > Ben.
> > 
> > > 	} while (skb->next);
> > 
> Dave just to a fix to deal with this (among some other issues) here today:
> http://marc.info/?l=linux-kernel&m=138934276507518&w=2
> Neil

Sorry, I didn't think to check the net tree as I this feature was in
net-next only.  Good to know it's fixed.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [net-next v2 14/17] i40e: enable PTP
From: Ben Hutchings @ 2014-01-10 20:37 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, Jacob Keller, netdev, gospo, sassmann, Richard Cochran,
	Matthew Vick, Jesse Brandeburg
In-Reply-To: <1389385839-11996-15-git-send-email-jeffrey.t.kirsher@intel.com>

On Fri, 2014-01-10 at 12:30 -0800, Jeff Kirsher wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> New feature: Enable PTP support in the i40e driver.
> 
> Change-ID: I6a8e799f582705191f9583afb1b9231a8db96cc8
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Ben Hutchings <bhutchings@solarflare.com>
> Signed-off-by: Matthew Vick <matthew.vick@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/net/ethernet/intel/Kconfig             |   1 +
>  drivers/net/ethernet/intel/i40e/Makefile       |   1 +
>  drivers/net/ethernet/intel/i40e/i40e.h         |  26 +
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  33 +-
>  drivers/net/ethernet/intel/i40e/i40e_main.c    |  47 +-
>  drivers/net/ethernet/intel/i40e/i40e_ptp.c     | 662 +++++++++++++++++++++++++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  53 ++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   3 +
>  8 files changed, 824 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/net/ethernet/intel/i40e/i40e_ptp.c
> 
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 9fb2eb8..333bb54 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
[...]
> +void i40e_ptp_init(struct i40e_pf *pf)
> +{
> +	struct i40e_hw *hw = &pf->hw;
> +	struct net_device *netdev = pf->vsi[pf->lan_vsi]->netdev;
> +
> +	strncpy(pf->ptp_caps.name, netdev->name, sizeof(pf->ptp_caps.name));
[...]

I recommended using the *driver* name as the clock name.  The net device
doesn't even have a proper name at this point.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox