Netdev List


* Re: [PATCH 00/13] net_sched: misc cleanups and improvements
From: David Miller @ 2014-11-06 19:03 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiyou.wangcong, netdev, jhs
In-Reply-To: <1415297873.13896.77.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 Nov 2014 10:17:53 -0800

> There is a difference between newbies and you.
> 
> As a community, we welcome newcomers and encourage them,
> but after a while, people sending mostly cleanups are shifting into a
> category which doesn't fit you.
> 
> We expect from you more interesting stuff. You can do it.

+1

> If you want to send cleanups, do this once in a while. Do not send 13
> patches and expect us to be happy with that. We are not.

+1

^ permalink raw reply

* Re: [PATCH net] dcbnl : Fix lock initialization
From: John Fastabend @ 2014-11-06 19:03 UTC (permalink / raw)
  To: Anish Bhatt
  Cc: netdev, davem, john.r.fastabend, ying.xue, jeffrey.t.kirsher,
	ebiederm
In-Reply-To: <1415297355-27282-1-git-send-email-anish@chelsio.com>

On 11/06/2014 10:09 AM, Anish Bhatt wrote:
> dcb_lock was being used uninitialized in dcbnl and is in fact missing
>   initialization code. Fixed.
>

Are you trying to resolve a bug? It is initialized with

static DEFINE_SPINLOCK(dcb_lock);

and if you follow the code far enough you get to this in
spinlock_types.h:


  #ifdef CONFIG_DEBUG_SPINLOCK
  # define SPIN_DEBUG_INIT(lockname)      \
      .magic = SPINLOCK_MAGIC,        \
      .owner_cpu = -1,            \
      .owner = SPINLOCK_OWNER_INIT,
  #else
  # define SPIN_DEBUG_INIT(lockname)
  #endif

  #define __RAW_SPIN_LOCK_INITIALIZER(lockname)   \
      {                   \
      .raw_lock = __ARCH_SPIN_LOCK_UNLOCKED,  \
      SPIN_DEBUG_INIT(lockname)       \
      SPIN_DEP_MAP_INIT(lockname) }

[...]
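
The point above, that a lock defined through a static initializer macro is already fully initialized before any code runs, can be sketched in plain userspace C. Everything below (struct demo_lock, DEMO_LOCK_MAGIC, DEFINE_DEMO_LOCK, demo_lock_init) is a hypothetical stand-in that only imitates the shape of the kernel macros quoted above; it is not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; names and values are illustrative only. */
#define DEMO_LOCK_MAGIC 0xdead4eadu

struct demo_lock {
	unsigned int locked;    /* 0 means unlocked */
	unsigned int magic;     /* debug field, set by the initializer */
	int owner_cpu;          /* debug field, -1 means no owner */
};

/* Compile-time initializer, in the spirit of __RAW_SPIN_LOCK_INITIALIZER. */
#define DEMO_LOCK_INITIALIZER \
	{ .locked = 0, .magic = DEMO_LOCK_MAGIC, .owner_cpu = -1 }

/* Define-and-initialize in one step, in the spirit of DEFINE_SPINLOCK. */
#define DEFINE_DEMO_LOCK(name) struct demo_lock name = DEMO_LOCK_INITIALIZER

/* Already initialized at load time; no init call is needed. */
static DEFINE_DEMO_LOCK(static_lock);

/* Runtime initialization, needed only for locks that were not
 * statically initialized (for example, ones embedded in allocated
 * objects). */
static void demo_lock_init(struct demo_lock *l)
{
	assert(l);
	*l = (struct demo_lock)DEMO_LOCK_INITIALIZER;
}

static bool demo_lock_is_initialized(const struct demo_lock *l)
{
	return l->locked == 0 && l->magic == DEMO_LOCK_MAGIC &&
	       l->owner_cpu == -1;
}
```

In this model, calling demo_lock_init() on static_lock would be harmless but redundant, which is the substance of the question raised about the patch.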



-- 
John Fastabend         Intel Corporation

^ permalink raw reply

* Re: [PATCH 00/13] net_sched: misc cleanups and improvements
From: David Miller @ 2014-11-06 19:02 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: eric.dumazet, netdev, jhs
In-Reply-To: <CAM_iQpWw4UMKZcdZfpp5D-tDfj954fbptyXJUzydgFCero6xNw@mail.gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Thu, 6 Nov 2014 10:05:41 -0800

> On Tue, Nov 4, 2014 at 5:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2014-11-04 at 17:25 -0800, Cong Wang wrote:
>>
>>> Seriously, think about why it should when it's just cleanups, be practical.
>>
>> I seriously ask you to not do cleanups then.
> 
> Apparently you didn't say this when the following commits got accepted:

I very strongly encourage you to not go arguing down this road.

The issues with your submissions are the amount of churn as well as the
terseness of explanations.

^ permalink raw reply

* [PATCH] man: ip-link: fix a typo
From: Masatake YAMATO @ 2014-11-06 18:57 UTC (permalink / raw)
  To: netdev; +Cc: yamato

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 man/man8/ip-link.8.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 4ee1d62..6d32f5e 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -54,7 +54,7 @@ ip-link \- network device configuration
 .ti -8
 .IR TYPE " := [ "
 .BR bridge " | "
-.BR bond " ]"
+.BR bond " | "
 .BR can " | "
 .BR dummy " | "
 .BR hsr " | "
-- 
1.9.3

^ permalink raw reply related

* Re: [patch net-next 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Scott Feldman @ 2014-11-06 18:57 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo
In-Reply-To: <545BA8EF.4060601@gmail.com>



On Thu, 6 Nov 2014, Florian Fainelli wrote:

> On 11/06/2014 01:20 AM, Jiri Pirko wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> To notify switch driver of change in STP state of bridge port, add new
>> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
>> code then.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>
> [snip]
>
>>  #endif /* _LINUX_SWITCHDEV_H_ */
>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index 86c239b..13fecf1 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -17,6 +17,7 @@
>>  #include <net/net_namespace.h>
>>  #include <net/sock.h>
>>  #include <uapi/linux/if_bridge.h>
>> +#include <net/switchdev.h>
>>
>>  #include "br_private.h"
>>  #include "br_private_stp.h"
>> @@ -304,6 +305,7 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
>>
>>  	br_set_state(p, state);
>>  	br_log_state(p);
>> +	netdev_sw_port_stp_update(p->dev, p->state);
>
> Is there a reason netdev_sw_port_stp_update() is not folded in
> br_set_state()? Are we missing calls to br_set_state() in some locations?

I put the netdev_sw call at the same level as br_log_state() and 
br_ifinfo_notify(), but now that you bring up the question, I agree it 
would be cleaner/safer if netdev_sw call was from br_set_state().

> --
> Florian
>

^ permalink raw reply

* Re: [PATCH net 3/5] fm10k: Implement ndo_gso_check()
From: Joe Stringer @ 2014-11-06 18:41 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, sathya.perla, jeffrey.t.kirsher, linux.nics, amirv,
	shahed.shaikh, Dept-GELinuxNICDev, therbert, linux-kernel
In-Reply-To: <545AE2C8.3070705@gmail.com>

On Wed, Nov 05, 2014 at 06:54:00PM -0800, Alexander Duyck wrote:
> On 11/04/2014 01:56 PM, Joe Stringer wrote:
> > ndo_gso_check() was recently introduced to allow NICs to report the
> > offloading support that they have on a per-skb basis. Add an
> > implementation for this driver which checks for something that looks
> > like VXLAN.
> >
> > Implementation shamelessly stolen from Tom Herbert:
> > http://thread.gmane.org/gmane.linux.network/332428/focus=333111
> >
> > Signed-off-by: Joe Stringer <joestringer@nicira.com>
> > ---
> > Should this driver report support for GSO on packets with tunnel headers
> > up to 64B like the i40e driver does?
> > ---
> >  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |   12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> > index 8811364..b9ef622 100644
> > --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> > +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
> > @@ -1350,6 +1350,17 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
> >  	}
> >  }
> >  
> > +static bool fm10k_gso_check(struct sk_buff *skb, struct net_device *dev)
> > +{
> > +	if ((skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) &&
> > +	    (skb->inner_protocol_type != ENCAP_TYPE_ETHER ||
> > +	     skb->inner_protocol != htons(ETH_P_TEB) ||
> > +	     skb_inner_mac_header(skb) - skb_transport_header(skb) != 16))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> >  static const struct net_device_ops fm10k_netdev_ops = {
> >  	.ndo_open		= fm10k_open,
> >  	.ndo_stop		= fm10k_close,
> > @@ -1372,6 +1383,7 @@ static const struct net_device_ops fm10k_netdev_ops = {
> >  	.ndo_do_ioctl		= fm10k_ioctl,
> >  	.ndo_dfwd_add_station	= fm10k_dfwd_add_station,
> >  	.ndo_dfwd_del_station	= fm10k_dfwd_del_station,
> > +	.ndo_gso_check		= fm10k_gso_check,
> >  };
> >  
> >  #define DEFAULT_DEBUG_LEVEL_SHIFT 3
> 
> I'm thinking this check is far too simplistic.  If you look, the fm10k
> driver already has fm10k_tx_encap_offload() in the TSO function for
> verifying if it can support offloading tunnels or not.  I would
> recommend starting there or possibly even just adapting that function to
> suit your purpose.
> 
> Thanks,
> 
> Alex

Would it be enough to just call fm10k_tx_encap_offload() in a way that echoes fm10k_tso()?

+static bool fm10k_gso_check(struct sk_buff *skb, struct net_device *dev)
+{
+       if (skb->encapsulation && !fm10k_tx_encap_offload(skb))
+               return false;
+
+       return true;
+}

Thanks,
Joe

^ permalink raw reply

* [PATCH v4 net-next] udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts
From: Rick Jones @ 2014-11-06 18:37 UTC (permalink / raw)
  To: netdev; +Cc: davem


From: Rick Jones <rick.jones2@hp.com>

As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram.  We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.

This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.

V3 - use bool per David Miller

Signed-off-by: Rick Jones <rick.jones2@hp.com>

---

Noticed __udp4_lib_mcast_deliver showing-up in a perf dropped packet
profile on a system sitting on a network with a bunch of Windows boxes
sending what they are fond of sending.

Verified that the new UDP_MIB_IGNOREDMULTI increments when ignored
datagrams are encountered, but was unable to cross the i's and dot
the t's of perf because the perf built from the tree at the time
wasn't happy in general.  Also hit a test system with some netperf
multicast UDP_STREAM and UDP_RR testing but that is the extent of 
the testing performed.
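
As a reading aid, the accounting described above can be modeled in userspace C: matching sockets are collected in a fixed-size batch, the batch is flushed when full, and the "ignored" counter is bumped only when nothing matched and no earlier batch was flushed. All names here (demo_stats, demo_flush, demo_mcast_deliver, DEMO_STACK_SIZE) are hypothetical, and the sketch deliberately leaves out locking, hashing and skb handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_STACK_SIZE 4	/* hypothetical batch size */

struct demo_stats {
	unsigned long delivered;	/* copies handed to sockets */
	unsigned long ignored_multi;	/* datagrams nobody wanted */
};

/* Stand-in for flush_stack(): deliver to 'count' collected sockets. */
static void demo_flush(size_t count, struct demo_stats *st)
{
	st->delivered += count;
}

/* matches[i] says whether socket i wants this datagram. */
static void demo_mcast_deliver(const bool *matches, size_t nsocks,
			       struct demo_stats *st)
{
	size_t count = 0;
	bool inner_flushed = false;

	assert(st);
	for (size_t i = 0; i < nsocks; i++) {
		if (!matches[i])
			continue;
		if (count == DEMO_STACK_SIZE) {
			/* Batch is full: deliver it and start over. */
			demo_flush(count, st);
			inner_flushed = true;
			count = 0;
		}
		count++;
	}

	if (count) {
		demo_flush(count, st);
	} else if (!inner_flushed) {
		/* No socket wanted it: count it, but this is not a drop. */
		st->ignored_multi++;
	}
}
```

With no matching sockets the model increments ignored_multi once; with six matching sockets and a batch size of four it delivers six copies and leaves the counter alone.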

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index df40137..30f541b 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -156,6 +156,7 @@ enum
 	UDP_MIB_RCVBUFERRORS,			/* RcvbufErrors */
 	UDP_MIB_SNDBUFERRORS,			/* SndbufErrors */
 	UDP_MIB_CSUMERRORS,			/* InCsumErrors */
+	UDP_MIB_IGNOREDMULTI,			/* IgnoredMulti */
 	__UDP_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 8e3eb39..5c5450c 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -181,6 +181,7 @@ static const struct snmp_mib snmp4_udp_list[] = {
 	SNMP_MIB_ITEM("RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("SndbufErrors", UDP_MIB_SNDBUFERRORS),
 	SNMP_MIB_ITEM("InCsumErrors", UDP_MIB_CSUMERRORS),
+	SNMP_MIB_ITEM("IgnoredMulti", UDP_MIB_IGNOREDMULTI),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd0db54..ebee9af 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1647,7 +1647,8 @@ static void udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
-				    struct udp_table *udptable)
+				    struct udp_table *udptable,
+				    int proto)
 {
 	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	struct hlist_nulls_node *node;
@@ -1656,6 +1657,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	int dif = skb->dev->ifindex;
 	unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
 	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
+	bool inner_flushed = false;
 
 	if (use_hash2) {
 		hash2_any = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum) &
@@ -1674,6 +1676,7 @@ start_lookup:
 					dif, hnum)) {
 			if (unlikely(count == ARRAY_SIZE(stack))) {
 				flush_stack(stack, count, skb, ~0);
+				inner_flushed = true;
 				count = 0;
 			}
 			stack[count++] = sk;
@@ -1695,7 +1698,10 @@ start_lookup:
 	if (count) {
 		flush_stack(stack, count, skb, count - 1);
 	} else {
-		kfree_skb(skb);
+		if (!inner_flushed)
+			UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
+					 proto == IPPROTO_UDPLITE);
+		consume_skb(skb);
 	}
 	return 0;
 }
@@ -1780,7 +1786,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	} else {
 		if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 			return __udp4_lib_mcast_deliver(net, skb, uh,
-					saddr, daddr, udptable);
+					saddr, daddr, udptable, proto);
 
 		sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	}
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 1752cd0..679253d0 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -136,6 +136,7 @@ static const struct snmp_mib snmp6_udp6_list[] = {
 	SNMP_MIB_ITEM("Udp6RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("Udp6SndbufErrors", UDP_MIB_SNDBUFERRORS),
 	SNMP_MIB_ITEM("Udp6InCsumErrors", UDP_MIB_CSUMERRORS),
+	SNMP_MIB_ITEM("Udp6IgnoredMulti", UDP_MIB_IGNOREDMULTI),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f6ba535..5bee6d2 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -771,7 +771,7 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
  */
 static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		struct udp_table *udptable)
+		struct udp_table *udptable, int proto)
 {
 	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	const struct udphdr *uh = udp_hdr(skb);
@@ -781,6 +781,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	int dif = inet6_iif(skb);
 	unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
 	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
+	bool inner_flushed = false;
 
 	if (use_hash2) {
 		hash2_any = udp6_portaddr_hash(net, &in6addr_any, hnum) &
@@ -803,6 +804,7 @@ start_lookup:
 		    (uh->check || udp_sk(sk)->no_check6_rx)) {
 			if (unlikely(count == ARRAY_SIZE(stack))) {
 				flush_stack(stack, count, skb, ~0);
+				inner_flushed = true;
 				count = 0;
 			}
 			stack[count++] = sk;
@@ -821,7 +823,10 @@ start_lookup:
 	if (count) {
 		flush_stack(stack, count, skb, count - 1);
 	} else {
-		kfree_skb(skb);
+		if (!inner_flushed)
+			UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
+					 proto == IPPROTO_UDPLITE);
+		consume_skb(skb);
 	}
 	return 0;
 }
@@ -873,7 +878,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	if (ipv6_addr_is_multicast(daddr))
 		return __udp6_lib_mcast_deliver(net, skb,
-				saddr, daddr, udptable);
+				saddr, daddr, udptable, proto);
 
 	/* Unicast */
 

^ permalink raw reply related

* [PATCH v3 net-next] udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts
From: Rick Jones @ 2014-11-06 18:36 UTC (permalink / raw)
  To: netdev; +Cc: davem


From: Rick Jones <rick.jones2@hp.com>

As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram.  We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.

This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.

Signed-off-by: Rick Jones <rick.jones2@hp.com>

---

Noticed __udp4_lib_mcast_deliver showing-up in a perf dropped packet
profile on a system sitting on a network with a bunch of Windows boxes
sending what they are fond of sending.

Verified that the new UDP_MIB_IGNOREDMULTI increments when ignored
datagrams are encountered, but was unable to cross the i's and dot
the t's of perf because the perf built from the tree at the time
wasn't happy in general.  Also hit a test system with some netperf
multicast UDP_STREAM and UDP_RR testing but that is the extent of 
the testing performed.

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index df40137..30f541b 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -156,6 +156,7 @@ enum
 	UDP_MIB_RCVBUFERRORS,			/* RcvbufErrors */
 	UDP_MIB_SNDBUFERRORS,			/* SndbufErrors */
 	UDP_MIB_CSUMERRORS,			/* InCsumErrors */
+	UDP_MIB_IGNOREDMULTI,			/* IgnoredMulti */
 	__UDP_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 8e3eb39..5c5450c 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -181,6 +181,7 @@ static const struct snmp_mib snmp4_udp_list[] = {
 	SNMP_MIB_ITEM("RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("SndbufErrors", UDP_MIB_SNDBUFERRORS),
 	SNMP_MIB_ITEM("InCsumErrors", UDP_MIB_CSUMERRORS),
+	SNMP_MIB_ITEM("IgnoredMulti", UDP_MIB_IGNOREDMULTI),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd0db54..1215f89 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1647,7 +1647,8 @@ static void udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
-				    struct udp_table *udptable)
+				    struct udp_table *udptable,
+				    int proto)
 {
 	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	struct hlist_nulls_node *node;
@@ -1656,6 +1657,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	int dif = skb->dev->ifindex;
 	unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
 	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
+	unsigned int inner_flushed = 0;
 
 	if (use_hash2) {
 		hash2_any = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum) &
@@ -1674,6 +1676,7 @@ start_lookup:
 					dif, hnum)) {
 			if (unlikely(count == ARRAY_SIZE(stack))) {
 				flush_stack(stack, count, skb, ~0);
+				inner_flushed = 1;
 				count = 0;
 			}
 			stack[count++] = sk;
@@ -1695,7 +1698,10 @@ start_lookup:
 	if (count) {
 		flush_stack(stack, count, skb, count - 1);
 	} else {
-		kfree_skb(skb);
+		if (!inner_flushed)
+			UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
+					 proto == IPPROTO_UDPLITE);
+		consume_skb(skb);
 	}
 	return 0;
 }
@@ -1780,7 +1786,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	} else {
 		if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 			return __udp4_lib_mcast_deliver(net, skb, uh,
-					saddr, daddr, udptable);
+					saddr, daddr, udptable, proto);
 
 		sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	}
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 1752cd0..679253d0 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -136,6 +136,7 @@ static const struct snmp_mib snmp6_udp6_list[] = {
 	SNMP_MIB_ITEM("Udp6RcvbufErrors", UDP_MIB_RCVBUFERRORS),
 	SNMP_MIB_ITEM("Udp6SndbufErrors", UDP_MIB_SNDBUFERRORS),
 	SNMP_MIB_ITEM("Udp6InCsumErrors", UDP_MIB_CSUMERRORS),
+	SNMP_MIB_ITEM("Udp6IgnoredMulti", UDP_MIB_IGNOREDMULTI),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f6ba535..d80f21e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -771,7 +771,7 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
  */
 static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		struct udp_table *udptable)
+		struct udp_table *udptable, int proto)
 {
 	struct sock *sk, *stack[256 / sizeof(struct sock *)];
 	const struct udphdr *uh = udp_hdr(skb);
@@ -781,6 +781,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	int dif = inet6_iif(skb);
 	unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
 	unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10);
+	int inner_flushed = 0;
 
 	if (use_hash2) {
 		hash2_any = udp6_portaddr_hash(net, &in6addr_any, hnum) &
@@ -803,6 +804,7 @@ start_lookup:
 		    (uh->check || udp_sk(sk)->no_check6_rx)) {
 			if (unlikely(count == ARRAY_SIZE(stack))) {
 				flush_stack(stack, count, skb, ~0);
+				inner_flushed = 1;
 				count = 0;
 			}
 			stack[count++] = sk;
@@ -821,7 +823,10 @@ start_lookup:
 	if (count) {
 		flush_stack(stack, count, skb, count - 1);
 	} else {
-		kfree_skb(skb);
+		if (!inner_flushed)
+			UDP_INC_STATS_BH(net, UDP_MIB_IGNOREDMULTI,
+					 proto == IPPROTO_UDPLITE);
+		consume_skb(skb);
 	}
 	return 0;
 }
@@ -873,7 +878,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	if (ipv6_addr_is_multicast(daddr))
 		return __udp6_lib_mcast_deliver(net, skb,
-				saddr, daddr, udptable);
+				saddr, daddr, udptable, proto);
 
 	/* Unicast */
 

^ permalink raw reply related

* Re: [PATCH 00/13] net_sched: misc cleanups and improvements
From: Eric Dumazet @ 2014-11-06 18:21 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <CAM_iQpWw4UMKZcdZfpp5D-tDfj954fbptyXJUzydgFCero6xNw@mail.gmail.com>

On Thu, 2014-11-06 at 10:05 -0800, Cong Wang wrote:

> 
> Who works on what? Does he/she at least announce it on netdev?
> (If you meant John, I already waited for his RCU work in the last
> merge window;
> I assumed his work was almost done and therefore sent this patchset.)
> 
> Since when did it become a rule that we should yield to something not merged,
> not even announced? If so, why not add it to netdev-FAQ?

You really don't get it. You can't understand how it really works.

Most probably I am one of the contributors, and my work depends on the
knowledge I got from studying the code. If you constantly change it, my
knowledge is reduced to useless bits.

Clearly you have to understand how _other_ people work, not assume
everybody is as smart as you are.

^ permalink raw reply

* Re: [PATCH 00/13] net_sched: misc cleanups and improvements
From: Eric Dumazet @ 2014-11-06 18:17 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <CAM_iQpWw4UMKZcdZfpp5D-tDfj954fbptyXJUzydgFCero6xNw@mail.gmail.com>

On Thu, 2014-11-06 at 10:05 -0800, Cong Wang wrote:
> On Tue, Nov 4, 2014 at 5:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2014-11-04 at 17:25 -0800, Cong Wang wrote:
> >
> >> Seriously, think about why it should when it's just cleanups, be practical.
> >
> > I seriously ask you to not do cleanups then.
> 
> Apparently you didn't say this when the following commits got accepted:

There is a difference between newbies and you.

As a community, we welcome newcomers and encourage them,
but after a while, people sending mostly cleanups are shifting into a
category which doesn't fit you.

We expect from you more interesting stuff. You can do it.

I understand you want to fully rewrite net/sched to your ideas of
how the code _should_ be.

Doing so forces other people already knowing all this code to spend time
to understand how things changed. And this is really not nice.

If you want to send cleanups, do this once in a while. Do not send 13
patches and expect us to be happy with that. We are not.

^ permalink raw reply

* [PATCH net] dcbnl : Fix lock initialization
From: Anish Bhatt @ 2014-11-06 18:09 UTC (permalink / raw)
  To: netdev
  Cc: davem, john.r.fastabend, ying.xue, jeffrey.t.kirsher, ebiederm,
	Anish Bhatt

dcb_lock was being used uninitialized in dcbnl and is in fact missing
 initialization code. Fixed.

Signed-off-by: Anish Bhatt <anish@chelsio.com>
---
 net/dcb/dcbnl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index ca11d28..7bc44e1 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1914,6 +1914,8 @@ static int __init dcbnl_init(void)
 {
 	INIT_LIST_HEAD(&dcb_app_list);
 
+	spin_lock_init(&dcb_lock);
+
 	rtnl_register(PF_UNSPEC, RTM_GETDCB, dcb_doit, NULL, NULL);
 	rtnl_register(PF_UNSPEC, RTM_SETDCB, dcb_doit, NULL, NULL);
 
-- 
2.1.3

^ permalink raw reply related

* Re: [PATCH 00/13] net_sched: misc cleanups and improvements
From: Cong Wang @ 2014-11-06 18:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <1415152068.1458.2.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Nov 4, 2014 at 5:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-11-04 at 17:25 -0800, Cong Wang wrote:
>
>> Seriously, think about why it should when it's just cleanups, be practical.
>
> I seriously ask you to not do cleanups then.

Apparently you didn't say this when the following commits got accepted:

commit 436f7c206860729d543a457aca5887e52039a5f4
Author: Fabian Frederick <fabf@skynet.be>
Date:   Tue Nov 4 20:52:14 2014 +0100

    igmp: remove camel case definitions

    use standard uppercase for definitions

    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit c18450a52a10a5c4cea3dc426c40447a7152290f
Author: Fabian Frederick <fabf@skynet.be>
Date:   Tue Nov 4 20:48:41 2014 +0100

    udp: remove else after return

commit aa1f731e52807077e9e13a86c0cad12d442c8fd4
Author: Fabian Frederick <fabf@skynet.be>
Date:   Tue Nov 4 20:44:04 2014 +0100

    inet: frags: remove inline on static in c file

    remove __inline__ / inline and let compiler decide what to do
    with static functions

    Inspired-by: "David S. Miller" <davem@davemloft.net>
    Signed-off-by: Fabian Frederick <fabf@skynet.be>
    Signed-off-by: David S. Miller <davem@davemloft.net>

>
> Some people are working adding real stuff here, this code changing every
> month is slowing them a lot.
>

Who works on what? Does he/she at least announce it on netdev?
(If you meant John, I already waited for his RCU work in the last
merge window;
I assumed his work was almost done and therefore sent this patchset.)

Since when did it become a rule that we should yield to something not merged,
not even announced? If so, why not add it to netdev-FAQ?

^ permalink raw reply

* Fw: [Bug 87701] New: hard cpu lockup during pppd initialization of vpn
From: Stephen Hemminger @ 2014-11-04 18:09 UTC (permalink / raw)
  To: netdev

Begin forwarded message:

Date: Tue, 4 Nov 2014 08:35:39 -0800
From: "bugzilla-daemon@bugzilla.kernel.org" <bugzilla-daemon@bugzilla.kernel.org>
To: "stephen@networkplumber.org" <stephen@networkplumber.org>
Subject: [Bug 87701] New: hard cpu lockup during pppd initialization of vpn

https://bugzilla.kernel.org/show_bug.cgi?id=87701

            Bug ID: 87701
           Summary: hard cpu lockup during pppd initialization of vpn
           Product: Networking
           Version: 2.5
    Kernel Version: 3.18.0-rc3
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: Other
          Assignee: shemminger@linux-foundation.org
          Reporter: richcoe2@gmail.com
        Regression: No

I did not experience this issue in 3.16 or before.
I did not try 3.17.

I moved from kernel-3.15 to 3.16, and then to kernel 3.18.
When I start forticlientsslvpn on 3.18, the system locks up hard.  No mouse and
no keyboard. 

forticlient starts pppd to enable a vpn connection.
Since this is a laptop, I don't get a kernel traceback, or OOPS message.

I'm enabling kdump to see if I can get a reliable traceback.
I was first on 3.18.0-rc2, and moved to 3.18.0-rc3 today, and still have the
issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* Re: [PATCH net-next 1/7] bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command
From: Alexei Starovoitov @ 2014-11-06 17:39 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David S. Miller, Ingo Molnar, Andy Lutomirski,
	Hannes Frederic Sowa, Eric Dumazet, Linux API,
	Network Development, LKML
In-Reply-To: <545A3ACC.3080101@redhat.com>

On Wed, Nov 5, 2014 at 6:57 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 11/05/2014 12:04 AM, Alexei Starovoitov wrote:
>>
>> On Tue, Nov 4, 2014 at 1:25 AM, Daniel Borkmann <dborkman@redhat.com>
>> wrote:
>>>
>>> On 11/04/2014 03:54 AM, Alexei Starovoitov wrote:
>>>>
>>>>
>>>> the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
>>>> either update existing map element or create a new one.
>>>> Initially the plan was to add a new command to handle the case of
>>>> 'create new element if it didn't exist', but 'flags' style looks
>>>> cleaner and overall diff is much smaller (more code reused), so add
>>>> 'flags'
>>>> attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
>>>> enum {
>>>>     BPF_MAP_UPDATE_OR_CREATE = 0, /* add new element or update existing
>>>> */
>>>>     BPF_MAP_CREATE_ONLY,          /* add new element if it didn't exist
>>>> */
>>>>     BPF_MAP_UPDATE_ONLY           /* update existing element */
>>>> };
>>>
>>>
>>> From your commit message/code I currently don't see an explanation why
>>> it cannot be done in typical ``flags style'' as various syscalls do,
>>> i.e. BPF_MAP_UPDATE_OR_CREATE rather represented as ...
>>>
>>>    BPF_MAP_CREATE | BPF_MAP_UPDATE
>>>
>>> Do you expect more than 64 different flags to be passed from user space
>>> for BPF_MAP?
>>
>>
>> several reasons:
>> - preserve flags==0 as default behavior
>> - avoid holes and extra checks for invalid combinations, so
>>    if (flags > BPF_MAP_UPDATE_ONLY) goto err, is enough.
>> - it looks much neater when user space uses
>>    BPF_MAP_UPDATE_OR_CREATE instead of ORing bits.
>>
>> Note this choice doesn't prevent adding bit-like flags
>> in the future. Today I cannot think of any new flags
>> for the update() command, but if somebody comes up with
>> a new selector that can apply to all three combinations,
>> we can add it as 3rd bit that can be ORed.
>
>
> Hm, mixing enums together with bitfield-like flags seems
> kind of hacky ... :/ Or, do you mean to say you're adding
> a 2nd flag field, i.e. splitting the 64bits into a 32bit
> ``cmd enum'' and 32bit ``flag section''?

something like this.
or splitting 64-bit into 2 and 62. We'll see.
First two encode this 'type' of update and the rest -
whatever else.

> Hm, my concern is that we start to add many *_OR_* enum
> elements once we find that a flag might be useful in
> combination with many other flags ... even though we
> can only think of 3 flags /right now/.

Agree. Adding many *_OR_* would look bad, that's
why I said that future additions can be bits. Like in
paragraph above.

Also, we don't have 3 flags now. In this patch I'm
showing 3 types and you're suggesting to treat
them as 2 flags. To me that's incorrect, since 'no flags'
becomes an invalid combination, which is logically incorrect.
Therefore I cannot see them as 'flags'. This is a 'type'
or 'style' of update() command.

I think it actually matches how open() defines things
in a similar situation:
#define O_RDONLY        00000000
#define O_WRONLY        00000001
#define O_RDWR          00000002
We used to think of them as flags, but they're not
bit flags, though the rest of open() flags are bit-like.
If we apply your argument to open() then open()
should have defined O_RD as 1 and O_WR as 2
and force everyone to mix and match them, but
then zero would be invalid. So I still think that
what I have is a cleaner API :)
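
One way to picture the "2 and 62" split mentioned above is a 64-bit word whose low two bits hold an enumerated update type while the remaining bits are left for independent bit flags. The sketch below is only an illustration of that idea under assumed names (demo_update_type, DEMO_TYPE_MASK, DEMO_FLAG_FUTURE, demo_parse_flags); none of them come from the patch, and the "future" flag is purely hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Enumerated update type; 0 stays the default behaviour. */
enum demo_update_type {
	DEMO_UPDATE_OR_CREATE = 0,	/* add new element or update existing */
	DEMO_CREATE_ONLY      = 1,	/* add new element if it didn't exist */
	DEMO_UPDATE_ONLY      = 2,	/* update existing element */
};

#define DEMO_TYPE_MASK   UINT64_C(0x3)		/* low 2 bits: type */
#define DEMO_FLAG_FUTURE (UINT64_C(1) << 2)	/* hypothetical bit flag */
#define DEMO_KNOWN_FLAGS DEMO_FLAG_FUTURE

/* Split a flags word into type and bit flags.
 * Returns 0 on success, -1 for an unknown type or unknown flag bit. */
static int demo_parse_flags(uint64_t flags, enum demo_update_type *type,
			    uint64_t *bits)
{
	uint64_t t = flags & DEMO_TYPE_MASK;
	uint64_t rest = flags & ~DEMO_TYPE_MASK;

	assert(type && bits);
	if (t > DEMO_UPDATE_ONLY)	/* a single range check, no holes */
		return -1;
	if (rest & ~DEMO_KNOWN_FLAGS)	/* reject bits nobody defined yet */
		return -1;

	*type = (enum demo_update_type)t;
	*bits = rest;
	return 0;
}
```

In this layout flags == 0 still means "update or create", the type needs only one range check, and a later bit flag can be ORed onto any of the three types without adding more *_OR_* enum values.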

^ permalink raw reply

* Re: [PATCH 3/4] macvtap: Use iovec iterators
From: Al Viro @ 2014-11-06 17:33 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, linux-kernel, bcrl, YOSHIFUJI Hideaki
In-Reply-To: <E1XmIQW-0007mb-J4@gondolin.me.apana.org.au>

On Thu, Nov 06, 2014 at 04:28:20PM +0800, Herbert Xu wrote:

> +		if (copy_to_iter(&vnet_hdr, sizeof(vnet_hdr), iter))
>  			return -EFAULT;

Again, wrong calling conventions.  It returns how much it has copied.

> +		ret = copy_to_iter(&veth, sizeof(veth), iter);
> +		if (ret || !iov_iter_count(iter))
>  			goto done;
Ditto.

^ permalink raw reply

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: Al Viro @ 2014-11-06 17:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, netdev, linux-kernel, bcrl, YOSHIFUJI Hideaki
In-Reply-To: <E1XmIQU-0007m1-2e@gondolin.me.apana.org.au>

On Thu, Nov 06, 2014 at 04:28:18PM +0800, Herbert Xu wrote:
> +		if (copy_to_iter(skb->data + offset, copy, to))
> +			goto fault;

Sorry, no - copy_to_iter() returns the number of bytes copied, not 0 or -EFAULT.

> +			vaddr = kmap(page);
> +			err = copy_to_iter(vaddr + frag->page_offset +
> +					   offset - start, copy, to);
> +			kunmap(page);
> +			if (err)
> +				goto fault;

And that one should be
			copied = copy_page_to_iter(page, frag->page_offset +
					   offset - start, copy, to);
			if (copied != copy)
				goto fault;

Don't bother with kmap(), vaddr and all that shite.  The primitive is
	copy_page_to_iter(page, offset_in_page, nbytes, iter)
it does all needed kmap itself and it's smart enough to use kmap_atomic
when it can get away with that.  Similar for copy_page_from_iter().

Both of those (as well as copy_{to,from}_iter()) advance iov_iter and return
the number of bytes actually copied.  So the check for EFAULT is "it has copied
less than you've asked it to copy *and* you haven't run out of that iov_iter".
The second part is guaranteed to be true in this case - your code makes sure
that 'copy' is no more than the space left in the iterator.

In general, this check would be spelled
			if (copied != copy && iov_iter_count(to))
				goto fault;
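
The convention above can be tried out in userspace with a small model.
This is only a sketch: struct mock_iter, mock_copy_to_iter() and
copy_checked() are hypothetical stand-ins invented for illustration, not
the kernel's iov_iter API. The 'writable' field simulates a destination
that faults partway through.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for an iov_iter: 'count' bytes of space remain, but
 * only the first 'writable' of them can be written without faulting. */
struct mock_iter {
	char *buf;
	size_t count;
	size_t writable;
};

/* Mimics the calling convention described above: copy what it can,
 * advance the iterator, return the number of bytes actually copied. */
size_t mock_copy_to_iter(const void *src, size_t bytes, struct mock_iter *to)
{
	size_t n = bytes;

	if (n > to->count)
		n = to->count;
	if (n > to->writable)
		n = to->writable;
	memcpy(to->buf, src, n);
	to->buf += n;
	to->count -= n;
	to->writable -= n;
	return n;
}

/* Returns 0 on success (including a short copy because the iterator ran
 * out of space) and -1 for the fault case. */
int copy_checked(const void *src, size_t copy, struct mock_iter *to)
{
	size_t copied = mock_copy_to_iter(src, copy, to);

	if (copied != copy && to->count)
		return -1;
	return 0;
}
```

A short copy with space still left in the iterator is reported as a
fault; a short copy that merely exhausted the iterator is not.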

^ permalink raw reply

* Re: am335x: cpsw: phy ignores max-speed setting
From: Eric Dumazet @ 2014-11-06 17:28 UTC (permalink / raw)
  To: Dave Taht
  Cc: Yegor Yefremov, netdev, N, Mugunthan V, mpa, lsorense,
	Daniel Mack
In-Reply-To: <CAA93jw5=LDirktyC+rvpLi-kywUSosj6QV8-na5p3-f=PxKcWQ@mail.gmail.com>

On Thu, 2014-11-06 at 08:51 -0800, Dave Taht wrote:
> ooh! ooh! I have a BQL enablement patch for the cpsw that I have no
> means of testing against multiple phys. Could
> you give the attached very small patch a shot along the way?
> 
> The results I get on the beaglebone vs netperf-wrapper are pretty
> spectacular - huge increase in throughput, big reduction in
> latency.

Please send this patch inline, so that we can comment, and start a new
thread.

@@ -1375,9 +1380,11 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
                skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
        skb_tx_timestamp(skb);
-
+       len = max(skb->len, CPSW_MIN_PACKET_SIZE);
+       netdev_sent_queue(ndev,len);
        ret = cpsw_tx_packet_submit(ndev, priv, skb);
        if (unlikely(ret != 0)) {
+               netdev_completed_queue(ndev,1,len);
<<you cannot do that, it's racy with other netdev_completed_queue() calls from TX completion>>
                cpsw_err(priv, tx_err, "desc submit failed\n");
                goto fail;
        }


You need to call netdev_sent_queue(ndev, len); at the correct place,
because we can not 'undo' it.

^ permalink raw reply

* Re: [PATCH 1/4] inet: Add skb_copy_datagram_iter
From: David Miller @ 2014-11-06 17:25 UTC (permalink / raw)
  To: herbert; +Cc: viro, netdev, linux-kernel, bcrl
In-Reply-To: <20141106082338.GA29800@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 6 Nov 2014 16:23:38 +0800

> On Wed, Nov 05, 2014 at 03:24:10PM -0500, David Miller wrote:
>> 
>> Herbert, please provide a cover letter for this series, and the most recent
>> version of patch #2 gets various rejects when I try to apply it to net-next.
> 
> Sure, I'll regenerate them.  However, while doing so I noticed that
> a number of my patches on tun/macvtap that you have previously set
> as accepted are missing from net-next.  Could this be why you got
> the rejects?

Those were bug fixes so went into plain 'net', they will show up next
time I do a merge and I will deal with the conflicts, if any.

^ permalink raw reply

* Re: [patch net-next 07/10] bridge: call netdev_sw_port_stp_update when bridge port STP status changes
From: Florian Fainelli @ 2014-11-06 16:59 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, jhs, sfeldma, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl
In-Reply-To: <1415265610-9338-8-git-send-email-jiri@resnulli.us>

On 11/06/2014 01:20 AM, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
> 
> To notify switch driver of change in STP state of bridge port, add new
> .ndo op and provide swdev wrapper func to call ndo op. Use it in bridge
> code then.
> 
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---

[snip]

>  #endif /* _LINUX_SWITCHDEV_H_ */
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 86c239b..13fecf1 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -17,6 +17,7 @@
>  #include <net/net_namespace.h>
>  #include <net/sock.h>
>  #include <uapi/linux/if_bridge.h>
> +#include <net/switchdev.h>
>  
>  #include "br_private.h"
>  #include "br_private_stp.h"
> @@ -304,6 +305,7 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
>  
>  	br_set_state(p, state);
>  	br_log_state(p);
> +	netdev_sw_port_stp_update(p->dev, p->state);

Is there a reason netdev_sw_port_stp_update() is not folded in
br_set_state()? Are we missing calls to br_set_state() in some locations?
--
Florian

^ permalink raw reply

* Re: am335x: cpsw: phy ignores max-speed setting
From: Florian Fainelli @ 2014-11-06 16:58 UTC (permalink / raw)
  To: Yegor Yefremov, netdev; +Cc: N, Mugunthan V, mpa, lsorense, Daniel Mack
In-Reply-To: <CAGm1_ktWK5ai85PZJTkq8Q1mAFH6JZ5XM1mDOHO3K_N2iGNLWg@mail.gmail.com>

On 11/06/2014 08:25 AM, Yegor Yefremov wrote:
> I' m trying to override max-speed setting for both CPSW connected
> PHYs. This is my DTS section for configuring CPSW:
> 
> &mac {
>         pinctrl-names = "default", "sleep";
>         pinctrl-0 = <&cpsw_default>;
>         pinctrl-1 = <&cpsw_sleep>;
>         dual_emac = <1>;
> 
>         status = "okay";
> };
> 
> &davinci_mdio {
>         pinctrl-names = "default", "sleep";
>         pinctrl-0 = <&davinci_mdio_default>;
>         pinctrl-1 = <&davinci_mdio_sleep>;
> 
>         status = "okay";
> };
> 
> &cpsw_emac0 {
>         phy_id = <&davinci_mdio>, <0>;
>         phy-mode = "rgmii-id";
>         dual_emac_res_vlan = <1>;
>         max-speed = <100>;
> };
> 
> &cpsw_emac1 {
>         phy_id = <&davinci_mdio>, <1>;
>         phy-mode = "rgmii-id";
>         dual_emac_res_vlan = <2>;
>         max-speed = <100>;
> };
> 
> But in drivers/net/phy/phy_device.c->of_set_phy_supported() routine I
> don't get through node check, i.e. node == NULL. Any idea why?

Yes, because the 'max-speed' property is placed at the Ethernet MAC node
level, not at the PHY node where of_set_phy_supported() expects it.

This driver does not appear to use the standard Ethernet PHY device tree
node, so I am not sure what your options are here.
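
The effect can be modelled in userspace. The sketch below mirrors the
shape of the routine quoted further down; struct mock_node, the MOCK_*
masks and mock_set_supported() are hypothetical and use made-up bit
values, not the real PHY_*_FEATURES definitions. When the lookup finds
no node (as happens here, since the property sits on the MAC node), the
supported mask is left untouched.

```c
#include <stddef.h>

/* Illustrative bit values only; the real PHY_*_FEATURES masks differ. */
#define MOCK_10BT_FEATURES	0x1u
#define MOCK_100BT_FEATURES	0x2u
#define MOCK_1000BT_FEATURES	0x4u

/* Stand-in for a device tree node that may carry a max-speed property. */
struct mock_node {
	int has_max_speed;
	unsigned int max_speed;
};

/* Mirrors the shape of of_set_phy_supported(): with no node or no
 * property the supported mask is returned unchanged. */
unsigned int mock_set_supported(const struct mock_node *node,
				unsigned int supported)
{
	if (!node || !node->has_max_speed)
		return supported;

	/* Drop all speed bits, then add back everything up to max_speed. */
	supported &= ~(MOCK_10BT_FEATURES | MOCK_100BT_FEATURES |
		       MOCK_1000BT_FEATURES);

	switch (node->max_speed) {
	default:
		return supported;
	case 1000:
		supported |= MOCK_1000BT_FEATURES;
		/* fall through */
	case 100:
		supported |= MOCK_100BT_FEATURES;
		/* fall through */
	case 10:
		supported |= MOCK_10BT_FEATURES;
	}
	return supported;
}
```

With a node carrying max-speed = 100 the gigabit bit is cleared; with a
NULL node the PHY keeps advertising whatever its driver set up.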

> 
> static void of_set_phy_supported(struct phy_device *phydev)
> {
>         struct device_node *node = phydev->dev.of_node;
>         u32 max_speed;
> 
>         if (!IS_ENABLED(CONFIG_OF_MDIO))
>                 return;
> 
>         if (!node)
>                 return;
> 
>         if (!of_property_read_u32(node, "max-speed", &max_speed)) {
>                 /* The default values for phydev->supported are
> provided by the PHY
>                  * driver "features" member, we want to reset to sane
> defaults fist
>                  * before supporting higher speeds.
>                  */
>                 phydev->supported &= PHY_DEFAULT_FEATURES;
> 
>                 switch (max_speed) {
>                 default:
>                         return;
> 
>                 case SPEED_1000:
>                         phydev->supported |= PHY_1000BT_FEATURES;
>                 case SPEED_100:
>                         phydev->supported |= PHY_100BT_FEATURES;
>                 case SPEED_10:
>                         phydev->supported |= PHY_10BT_FEATURES;
>                 }
>         }
> }
> 
> Yegor
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: am335x: cpsw: phy ignores max-speed setting
From: Dave Taht @ 2014-11-06 16:51 UTC (permalink / raw)
  To: Yegor Yefremov; +Cc: netdev, N, Mugunthan V, mpa, lsorense, Daniel Mack
In-Reply-To: <CAGm1_ktWK5ai85PZJTkq8Q1mAFH6JZ5XM1mDOHO3K_N2iGNLWg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2996 bytes --]

ooh! ooh! I have a BQL enablement patch for the cpsw that I have no
means of testing against multiple phys. Could
you give the attached very small patch a shot along the way?

The results I get on the beaglebone vs netperf-wrapper are pretty
spectacular - huge increase in throughput, big reduction in
latency.

http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png
http://snapon.lab.bufferbloat.net/~d/beagle_bql/beaglebonewithbql.png

On Thu, Nov 6, 2014 at 8:25 AM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> I' m trying to override max-speed setting for both CPSW connected
> PHYs. This is my DTS section for configuring CPSW:
>
> &mac {
>         pinctrl-names = "default", "sleep";
>         pinctrl-0 = <&cpsw_default>;
>         pinctrl-1 = <&cpsw_sleep>;
>         dual_emac = <1>;
>
>         status = "okay";
> };
>
> &davinci_mdio {
>         pinctrl-names = "default", "sleep";
>         pinctrl-0 = <&davinci_mdio_default>;
>         pinctrl-1 = <&davinci_mdio_sleep>;
>
>         status = "okay";
> };
>
> &cpsw_emac0 {
>         phy_id = <&davinci_mdio>, <0>;
>         phy-mode = "rgmii-id";
>         dual_emac_res_vlan = <1>;
>         max-speed = <100>;
> };
>
> &cpsw_emac1 {
>         phy_id = <&davinci_mdio>, <1>;
>         phy-mode = "rgmii-id";
>         dual_emac_res_vlan = <2>;
>         max-speed = <100>;
> };
>
> But in drivers/net/phy/phy_device.c->of_set_phy_supported() routine I
> don't get through node check, i.e. node == NULL. Any idea why?
>
> static void of_set_phy_supported(struct phy_device *phydev)
> {
>         struct device_node *node = phydev->dev.of_node;
>         u32 max_speed;
>
>         if (!IS_ENABLED(CONFIG_OF_MDIO))
>                 return;
>
>         if (!node)
>                 return;
>
>         if (!of_property_read_u32(node, "max-speed", &max_speed)) {
>                 /* The default values for phydev->supported are
> provided by the PHY
>                  * driver "features" member, we want to reset to sane
> defaults fist
>                  * before supporting higher speeds.
>                  */
>                 phydev->supported &= PHY_DEFAULT_FEATURES;
>
>                 switch (max_speed) {
>                 default:
>                         return;
>
>                 case SPEED_1000:
>                         phydev->supported |= PHY_1000BT_FEATURES;
>                 case SPEED_100:
>                         phydev->supported |= PHY_100BT_FEATURES;
>                 case SPEED_10:
>                         phydev->supported |= PHY_10BT_FEATURES;
>                 }
>         }
> }
>
> Yegor
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

[-- Attachment #2: 0001-Add-BQL-support-to-the-TI-cpsw-driver.patch --]
[-- Type: text/x-patch, Size: 2369 bytes --]

From 7eccb26dc8f6d09660b22fcbd868572d050df26f Mon Sep 17 00:00:00 2001
From: Dave Taht <dave.taht@bufferbloat.net>
Date: Thu, 6 Nov 2014 08:45:30 -0800
Subject: [PATCH] Add BQL support to the TI cpsw driver

Tested on the beaglebone black.

I get a huge improvement in both throughput and latency.

Latency goes from 60ms worst case with pfifo_fast, and 12ms worst case with
sch_fq to 2.5ms with BQL enabled.

Throughput improved also.
---
 drivers/net/ethernet/ti/cpsw.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d879448..5934fbc 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -118,7 +118,7 @@ do {								\
 #define CPDMA_TXCP		0x40
 #define CPDMA_RXCP		0x60
 
-#define CPSW_POLL_WEIGHT	64
+#define CPSW_POLL_WEIGHT	16
 #define CPSW_MIN_PACKET_SIZE	60
 #define CPSW_MAX_PACKET_SIZE	(1500 + 14 + 4 + 4)
 
@@ -693,6 +693,7 @@ static void cpsw_tx_handler(void *token, int len, int status)
 	cpts_tx_timestamp(priv->cpts, skb);
 	ndev->stats.tx_packets++;
 	ndev->stats.tx_bytes += len;
+	netdev_completed_queue(ndev,1,len);
 	dev_kfree_skb_any(skb);
 }
 
@@ -1307,6 +1308,8 @@ static int cpsw_ndo_open(struct net_device *ndev)
 		cpsw_set_coalesce(ndev, &coal);
 	}
 
+	netdev_reset_queue(ndev);
+	dev_info(priv->dev, "BQL enabled\n");
 	napi_enable(&priv->napi);
 	cpdma_ctlr_start(priv->dma);
 	cpsw_intr_enable(priv);
@@ -1341,6 +1344,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 	netif_stop_queue(priv->ndev);
 	napi_disable(&priv->napi);
 	netif_carrier_off(priv->ndev);
+	netdev_reset_queue(priv->ndev);
 
 	if (cpsw_common_res_usage_state(priv) <= 1) {
 		cpts_unregister(priv->cpts);
@@ -1361,6 +1365,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	int ret;
+	int len;
 
 	ndev->trans_start = jiffies;
 
@@ -1375,9 +1380,11 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
 
 	skb_tx_timestamp(skb);
-
+	len = max(skb->len, CPSW_MIN_PACKET_SIZE);
+	netdev_sent_queue(ndev,len);
 	ret = cpsw_tx_packet_submit(ndev, priv, skb);
 	if (unlikely(ret != 0)) {
+		netdev_completed_queue(ndev,1,len);
 		cpsw_err(priv, tx_err, "desc submit failed\n");
 		goto fail;
 	}
-- 
1.9.1


^ permalink raw reply related

* Re: [PATCHv2 net-next 2/2] sunvnet: Use one Tx queue per vnet_port
From: Sowmini Varadhan @ 2014-11-06 16:46 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem, netdev
In-Reply-To: <1415291564.3398.66.camel@decadent.org.uk>

On (11/06/14 16:32), Ben Hutchings wrote:
> 
> OK, then the indentation of the following line is wrong.

you are right, sorry about that. I'll fix that shortly..

--Sowmini

^ permalink raw reply

* Re: [PATCH net-next] net: gro: add a per device gro flush timer
From: Rick Jones @ 2014-11-06 16:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Or Gerlitz, Willem de Bruijn
In-Reply-To: <1415241576.13896.62.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/05/2014 06:39 PM, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 18:14 -0800, Eric Dumazet wrote:
>> On Wed, 2014-11-05 at 17:38 -0800, Rick Jones wrote:
>>
>>> Speaking of QPS, what happens to 200 TCP_RR tests when the feature is
>>> enabled?
>
> The possible reduction of QPS happens when you have a single flow like
> TCP_RR  -- -r 40000,40000
>
> (Because we have one single TCP packet with 40000 bytes of payload,
> application is waked up once when Push flag is received)
>
> So cpu effiency is way better, but application has to copy 40000 bytes
> in one go _after_ Push flag, instead of being able to copy part of the
> data _before_ receiving the Push flag.

Thanks.  That isn't too unlike what I've seen happen in the past with 
say an 8K request size and switching back and forth between a 1500 and 
9000 byte MTU.

happy benchmarking,

rick

^ permalink raw reply

* Re: [PATCHv2 net-next 2/2] sunvnet: Use one Tx queue per vnet_port
From: Ben Hutchings @ 2014-11-06 16:32 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: davem, netdev
In-Reply-To: <20141106162822.GL15665@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 1309 bytes --]

On Thu, 2014-11-06 at 11:28 -0500, Sowmini Varadhan wrote:
> On (11/06/14 16:19), Ben Hutchings wrote:
> > > +	txq = netdev_get_tx_queue(port->vp->dev, port->q_index);
> > > +	__netif_tx_lock(txq, smp_processor_id());
> > > +	if (likely(netif_tx_queue_stopped(txq))) {
> > > +		struct vio_dring_state *dr;
> > > +
> > > +		dr = &port->vio.drings[VIO_DRIVER_TX_RING];
> > 
> > You seem to have dropped the condition for the netif_tx_wake_queue(),
> > which I would guess based on the old code should be:
> > 
> > 		if (vnet_tx_dring_avail(dr) >= VNET_TX_WAKEUP_THRESH(dr))
> > 
> > > +			netif_tx_wake_queue(txq);
> 
> yes, this was deliberate.
> 
> As I indicated in the comments:
> 
> /* Got back a STOPPED LDC message on port. If the queue is stopped,
>  * wake it up so that we'll send out another START message at the
>  * next TX.
>  */
> 
> We only call maybe_tx_wakeup() if the peer has sent us a STOPPED
> ack (meaning that the peer is no longer reading the descriptor rings).
> So if our tx queue is full and stopped, we need to poke the peer
> on the next TX with a start message. (otherwise we'd never wake up!)

OK, then the indentation of the following line is wrong.

Ben.

-- 
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply

* Re: mlx4+vxlan offload breaks gre tunnels
From: Or Gerlitz @ 2014-11-06 16:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Tom Herbert, Jesse Gross, amirv
In-Reply-To: <20141105165351.GA23131@breakpoint.cc>

On 11/5/2014 6:53 PM, Florian Westphal wrote:
> Right, the patch below works in my setup as well (until link-add-vxlan,
> that is;)  )

Good, let me look into that a little further to see what the best approach
is here. Thanks for the report.

Or.

^ permalink raw reply
