Netdev List
* [PATCH net-next 11/13] net/mlx4_en: Call napi_synchronize on stop_port
From: Amir Vadai @ 2014-10-27  9:37 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai, Ido Shamay
In-Reply-To: <1414402667-8841-1-git-send-email-amirv@mellanox.com>

From: Ido Shamay <idos@mellanox.com>

Call napi_synchronize() rather than open-coding its implementation,
for better encapsulation.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c4450be..3c07a75 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1843,8 +1843,7 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
 		}
 		local_bh_enable();
 
-		while (test_bit(NAPI_STATE_SCHED, &cq->napi.state))
-			msleep(1);
+		napi_synchronize(&cq->napi);
 		mlx4_en_deactivate_rx_ring(priv, priv->rx_ring[i]);
 		mlx4_en_deactivate_cq(priv, cq);
 
-- 
1.8.3.4

^ permalink raw reply related

* Re: [PATCH net 0/3] cdc-ether: handle promiscuous mode
From: Oliver Neukum @ 2014-10-27 10:02 UTC (permalink / raw)
  To: Olivier Blin; +Cc: netdev, hayeswang, bjorn, davem
In-Reply-To: <1414172582-30844-1-git-send-email-olivier.blin@softathome.com>

On Fri, 2014-10-24 at 19:42 +0200, Olivier Blin wrote:
> Hi,
> 
> Since kernel 3.16, my Lenovo USB network adapters (RTL8153) using
> cdc-ether are not working anymore in a bridge.
> 
> This is due to commit c472ab68ad67db23c9907a27649b7dc0899b61f9, which
> resets the packet filter when the device is bound.
> 
> The default packet filter set by cdc-ether does not include
> promiscuous, while the adapter seemed to have promiscuous enabled by
> default.
> 
> This patch series adds promiscuous mode support to cdc-ether by
> hooking into set_rx_mode.
> 
> Incidentally, maybe this device should be handled by the r8152 driver,
> but this patch series is still nice for other adapters.

Acked-by: Oliver Neukum <oneukum@suse.de>

	Regards
		Oliver

^ permalink raw reply

* Re: [ovs-dev] [PATCH net-next] datapath: Rename last_action() as nla_is_last() and move to netlink.h
From: Thomas Graf @ 2014-10-27 10:08 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, David S. Miller, Pravin Shelar, dev
In-Reply-To: <1414393936-14463-1-git-send-email-simon.horman@netronome.com>

On 10/27/14 at 04:12pm, Simon Horman wrote:
> The original motivation for this change was to allow the helper to be used
> in files other than actions.c as part of work on an odp select group
> action.
> 
> It was pointed out by Thomas Graf that this helper would be best off
> living in netlink.h. Furthermore, I think that the generic nature of this
> helper means it is best off in netlink.h regardless of whether it is used in
> more than one .c file. Thus, I would like it considered independently of
> the work on an odp select group action.
> 
> Cc: Thomas Graf <tgraf@suug.ch>
> Cc: Pravin Shelar <pshelar@nicira.com>
> Cc: Andy Zhou <azhou@nicira.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>

Acked-by: Thomas Graf <tgraf@noironetworks.com>

^ permalink raw reply

* Poor UDP throughput with virtual devices and UFO
From: Toshiaki Makita @ 2014-10-27 10:29 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, Eric Dumazet

Hi,

I recently noticed sending UDP packets ends up with very poor throughput when
using UFO and virtual devices.

Example configurations are:
- macvlan on vlan
- gre on bridge

With these configurations, the upper virtual devices (macvlan, gre) have the
UFO feature and the lower devices (vlan, bridge) don't have it. UFO packets
will be sent from the upper devices and fragmented on the lower devices.
So, they will be fragmented before entering qdisc.

Since skb_segment() doesn't increase sk_wmem_alloc, the send buffer of a UDP
socket looks almost always empty, and user space can send packets with no limit,
which causes massive drops on qdisc.
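
A minimal user-space sketch of the effect described above, assuming a
simplified admission rule (accept a datagram only while the charged bytes are
below the send buffer size). This is a toy model, not kernel code; the
function name and parameters are hypothetical.

```c
/*
 * Toy model (hypothetical, not kernel code): count how many datagrams a
 * socket accepts while nothing has left the qdisc yet.
 *
 * segments_charged != 0: bytes stay charged to the socket until transmitted.
 * segments_charged == 0: the charge is dropped when the datagram is
 *                        segmented, so the send buffer always looks empty.
 */
static int toy_accepted_datagrams(long sndbuf, long truesize,
				  int attempts, int segments_charged)
{
	long wmem_alloc = 0;	/* stands in for sk_wmem_alloc */
	int accepted = 0;

	for (int i = 0; i < attempts; i++) {
		if (wmem_alloc >= sndbuf)
			break;			/* sender would be throttled here */
		wmem_alloc += truesize;		/* charge the datagram */
		accepted++;
		if (!segments_charged)
			wmem_alloc -= truesize;	/* charge lost at segmentation */
	}
	return accepted;
}
```

With the charge kept, the sender is throttled after a handful of datagrams;
without it, every attempt is accepted and the excess can only be dropped
further down the stack.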

I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
if we can do this change since it has been this way for years and only TCP
handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43
Average:          0      0.29      0.00      0.76      0.29      0.00     98.66
Average:          1      0.21      0.00      0.33      0.00      0.00     99.45
Average:          2      0.05      0.00      0.12      0.07      0.00     99.76
Average:          3      0.36      0.00     99.64      0.00      0.00      0.00

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43
Average:          0      0.17      0.00      3.60      0.00      0.00     96.23
Average:          1      0.13      0.00      6.60      0.00      0.00     93.27
Average:          2      0.23      0.00      5.76      0.07      0.00     93.94
Average:          3      0.17      0.00     17.57      0.00      0.00     82.26


The patch (based on net tree) for the test above:

----
Subject: [PATCH net] gso: Inherit sk_wmem_alloc

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 net/core/skbuff.c      |  6 +++++-
 net/ipv4/tcp_offload.c | 13 ++++---------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615b..29dc763 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 							    len, 0);
 			SKB_GSO_CB(nskb)->csum_start =
 			    skb_headroom(nskb) + doffset;
-			continue;
+			goto set_owner;
 		}
 
 		nskb_frag = skb_shinfo(nskb)->frags;
@@ -3092,6 +3092,10 @@ perform_csum_check:
 			SKB_GSO_CB(nskb)->csum_start =
 			    skb_headroom(nskb) + doffset;
 		}
+
+set_owner:
+		if (head_skb->sk)
+			skb_set_owner_w(nskb, head_skb->sk);
 	} while ((offset += len) < head_skb->len);
 
 	/* Some callers want to get the end of the list.
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f..93758a8 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 			th->check = gso_make_checksum(skb, ~th->check);
 
 		seq += mss;
-		if (copy_destructor) {
+		if (copy_destructor)
 			skb->destructor = gso_skb->destructor;
-			skb->sk = gso_skb->sk;
-			sum_truesize += skb->truesize;
-		}
 		skb = skb->next;
 		th = tcp_hdr(skb);
 
@@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	 * is freed by GSO engine
 	 */
 	if (copy_destructor) {
-		swap(gso_skb->sk, skb->sk);
-		swap(gso_skb->destructor, skb->destructor);
-		sum_truesize += skb->truesize;
-		atomic_add(sum_truesize - gso_skb->truesize,
-			   &skb->sk->sk_wmem_alloc);
+		skb->destructor = gso_skb->destructor;
+		gso_skb->destructor = NULL;
+		atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
 	}
 
 	delta = htonl(oldlen + (skb_tail_pointer(skb) -
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH v2 09/15] net: dsa: Add support for switch EEPROM access
From: Guenter Roeck @ 2014-10-27 13:22 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Andrew Lunn, netdev, David S. Miller, Florian Fainelli,
	linux-kernel
In-Reply-To: <20141027085048.GC4748@netboy>

On 10/27/2014 01:50 AM, Richard Cochran wrote:
> On Sun, Oct 26, 2014 at 08:56:20PM -0700, Guenter Roeck wrote:
>>
>> Also, it seems that you request two separate properties, one for presence
>> and another for length. Is that correct ? Again, I thought that would not
>> provide any value since presence is indicated by length != 0 in the ethtool
>> callback function. No problem, though, I'll be happy to create two separate
>> properties and platform data variables if you think that would be better.
>
> The fewer properties, the better.
>

Right now I have:

Optional properties:
- eeprom-length         : Set to the length of an EEPROM connected to the
                           switch. Must be set if the switch can not detect
                           the presence and/or size of a connected EEPROM,
                           otherwise optional.

and I think Andrew is asking for the following:

Optional properties:
- eeprom-present	: Boolean property indicating that an EEPROM is present.
			  Must be set if an EEPROM is present.
- eeprom-length         : Set to the length of an EEPROM connected to the
                           switch. Must be set if the switch can not detect
                           the size of a connected EEPROM, otherwise optional.

Platform data semantics would be the same.

I can go either way, but I would like to get some kind of agreement before I jump
into writing the code.

Thanks,
Guenter
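
To make the two options concrete, here is a small sketch of how a driver
might interpret each layout. The struct and function names are hypothetical
and only mirror the property names discussed above; this is not the DSA code.

```c
#include <stdbool.h>

/* Option 1 (hypothetical layout): a single length, presence implied by != 0. */
struct toy_eeprom_cfg_single {
	unsigned int eeprom_length;
};

/* Option 2 (hypothetical layout): explicit presence plus optional length. */
struct toy_eeprom_cfg_split {
	bool eeprom_present;
	unsigned int eeprom_length;	/* 0 means "let the switch detect it" */
};

static unsigned int toy_len_single(const struct toy_eeprom_cfg_single *c)
{
	return c->eeprom_length;	/* 0 doubles as "no EEPROM" */
}

static unsigned int toy_len_split(const struct toy_eeprom_cfg_split *c,
				  unsigned int detected_len)
{
	if (!c->eeprom_present)
		return 0;
	return c->eeprom_length ? c->eeprom_length : detected_len;
}
```

In both variants a returned length of 0 is what an ethtool-style callback
would report for "no EEPROM".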

^ permalink raw reply

* Re: [PATCH 1/1 net-next] xfrm: fix set but not used warning in xfrm_policy_queue_process()
From: Steffen Klassert @ 2014-10-27 13:35 UTC (permalink / raw)
  To: Fabian Frederick; +Cc: linux-kernel, Herbert Xu, David S. Miller, netdev
In-Reply-To: <1414250829-17908-1-git-send-email-fabf@skynet.be>

On Sat, Oct 25, 2014 at 05:27:09PM +0200, Fabian Frederick wrote:
> err was set but unused.
> 
> Signed-off-by: Fabian Frederick <fabf@skynet.be>

Applied to ipsec-next, thanks!

^ permalink raw reply

* Re: [PATCH v2 09/15] net: dsa: Add support for switch EEPROM access
From: Andrew Lunn @ 2014-10-27 13:59 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Andrew Lunn, netdev, David S. Miller, Florian Fainelli,
	linux-kernel
In-Reply-To: <544DC264.9020600@roeck-us.net>

> So let's take a step back: For the Marvell chips, I have to provide both length
> and presence in devicetree or platform data. Presence seemed to be implied by
> length, so I used only a single property and variable to indicate both.

Hi Guenter

What I was thinking is that you don't need length in device tree. The
datasheet specifies how big the EEPROM needs to be.

However, I read the datasheet for the 6060, the only public datasheet
from Marvell. It does not work as I expected. Rather than being a
fixed list of register values, it is a variable length list of
command/value pairs.

In this situation, yes, you do need the length in DT.

I had a quick look at some other switch chips, e.g. the RTL8100. It
has a fixed layout of its EEPROM, consisting of 0x80 bytes. In this
case, the switch driver could be hard coded with 0x80, and all DT
needs to indicate is whether the EEPROM is present or not.

So, what you have proposed will work. It is maybe not optimal for an
EEPROM whose fixed size is well defined in the datasheet, but it
still works.

Acked-by: Andrew Lunn <andrew@lunn.ch>

	  Andrew

^ permalink raw reply

* Re: [PATCH] netlink: don't copy over empty attribute data
From: Sasha Levin @ 2014-10-27 14:42 UTC (permalink / raw)
  To: David Miller; +Cc: a.ryabinin, pablo, mschmidt, akpm, linux-kernel, netdev
In-Reply-To: <20141026.220350.2098346782596904995.davem@davemloft.net>

On 10/26/2014 10:03 PM, David Miller wrote:
> From: Sasha Levin <sasha.levin@oracle.com>
> Date: Sun, 26 Oct 2014 19:32:42 -0400
> 
>> How so? GCC states clearly that you should *never* pass a NULL
>> pointer there:
>>
>> "The pointers passed to memmove (and similar functions in <string.h>) must
>> be non-null even when nbytes==0" (https://gcc.gnu.org/gcc-4.9/porting_to.html).
>>
>> Even if it doesn't dereference it, it can break somehow in a subtle way. Leaving
>> the kernel code assuming that gcc (or any other compiler) would always behave
>> the same in a situation that shouldn't occur.
> 
> Show me a legal way in which one could legally dereference the pointer
> when length is zero, and I'll entertain this patch.

The moment you've triggered undefined behaviour, GCC has license to
dereference anything it wants. GCC would be well within its rights
dereferencing a NULL "from".

They even state it clearly in that GCC 4.9 porting guide I've linked above:

"""
Calling copy(p, NULL, 0) can therefore deference a null pointer and crash.

The example above needs to be fixed to avoid the invalid memmove call, for example:


    if (nbytes != 0)
      memmove (dest, src, nbytes);
"""


Thanks,
Sasha
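
As a sketch of the guard quoted above, wrapped in a helper: the name
toy_copy_attr is hypothetical and this is not the netlink patch itself, only
an illustration of skipping the call when there is nothing to copy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy nbytes from src to dest, but never hand a
 * possibly-NULL src to memmove() when the length is zero.
 */
static void *toy_copy_attr(void *dest, const void *src, size_t nbytes)
{
	if (nbytes != 0)
		memmove(dest, src, nbytes);
	return dest;
}
```

A caller with an empty attribute can then pass a NULL source and a zero
length without ever reaching memmove().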

^ permalink raw reply

* Re: Poor UDP throughput with virtual devices and UFO
From: Eric Dumazet @ 2014-10-27 15:01 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: netdev, Herbert Xu, Eric Dumazet
In-Reply-To: <1414405781.4492.38.camel@ubuntu-vm-makita>

On Mon, 2014-10-27 at 19:29 +0900, Toshiaki Makita wrote:
> Hi,

...

> I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
> if we can do this change since it has been this way for years and only TCP
> handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").

That's probably because UFO is kind of strange: no NIC actually does UDP
segmentation.

> ----
> Subject: [PATCH net] gso: Inherit sk_wmem_alloc
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> ---
>  net/core/skbuff.c      |  6 +++++-
>  net/ipv4/tcp_offload.c | 13 ++++---------
>  2 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c16615b..29dc763 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  							    len, 0);
>  			SKB_GSO_CB(nskb)->csum_start =
>  			    skb_headroom(nskb) + doffset;
> -			continue;
> +			goto set_owner;
>  		}
>  
>  		nskb_frag = skb_shinfo(nskb)->frags;
> @@ -3092,6 +3092,10 @@ perform_csum_check:
>  			SKB_GSO_CB(nskb)->csum_start =
>  			    skb_headroom(nskb) + doffset;
>  		}
> +
> +set_owner:
> +		if (head_skb->sk)
> +			skb_set_owner_w(nskb, head_skb->sk);
>  	} while ((offset += len) < head_skb->len);
>  
>  	/* Some callers want to get the end of the list.
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 5b90f2f..93758a8 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  			th->check = gso_make_checksum(skb, ~th->check);
>  
>  		seq += mss;
> -		if (copy_destructor) {
> +		if (copy_destructor)
>  			skb->destructor = gso_skb->destructor;
> -			skb->sk = gso_skb->sk;
> -			sum_truesize += skb->truesize;
> -		}
>  		skb = skb->next;
>  		th = tcp_hdr(skb);
>  
> @@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  	 * is freed by GSO engine
>  	 */
>  	if (copy_destructor) {
> -		swap(gso_skb->sk, skb->sk);
> -		swap(gso_skb->destructor, skb->destructor);
> -		sum_truesize += skb->truesize;
> -		atomic_add(sum_truesize - gso_skb->truesize,
> -			   &skb->sk->sk_wmem_alloc);
> +		skb->destructor = gso_skb->destructor;
> +		gso_skb->destructor = NULL;
> +		atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
>  	}
>  
>  	delta = htonl(oldlen + (skb_tail_pointer(skb) -


Please rewrite your patch to move the code out of tcp_gso_segment() into
skb_segment()

Look how I carefully avoided many atomic operations on
sk->sk_wmem_alloc, but how you removed it. :(

Alternative would be to use a single skb_set_owner_w() on the last
segment, and tweak its truesize to not corrupt sk->sk_wmem_alloc

d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
device, while best performance is reached with TSO=on so that
segmentation is performed later on the slave device.

Thanks

^ permalink raw reply

* Re: [PATCH 2/5] stmmac: pci: use managed resources
From: Giuseppe CAVALLARO @ 2014-10-27 15:28 UTC (permalink / raw)
  To: Andy Shevchenko, Sergei Shtylyov
  Cc: netdev, Kweh Hock Leong, David S. Miller, Vince Bridgers,
	Rayagond K
In-Reply-To: <1413966993.2396.26.camel@linux.intel.com>

On 10/22/2014 10:36 AM, Andy Shevchenko wrote:
> So, I was trying to find any public specification for boards
> that have this IP, no luck so far. I guess that the code was created
> because of XILINX FPGA usage, which can probably provide any BAR the user
> wants. Thus, I assume that in real applications the BAR will most probably
> be 0. However, I left a variable which can be overridden in the future
> (depending on the PCI ID).
>
> It would be nice to hear someone from ST about this. Giuseppe?

Hello Andy

This chip has been on ST SoCs for a long time, but only embedded. I have
no PCI card. I have added Rayagond on copy too.

peppe

^ permalink raw reply

* [PATCH iproute2] man ip: Add missing '-details' option
From: Vadim Kochan @ 2014-10-27 15:22 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 man/man8/ip.8 | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 9065b3a..2d42e98 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -59,6 +59,10 @@ appears twice or more, the amount of information increases.
 As a rule, the information is statistics or some time values.
 
 .TP
+.BR "\-d" , " \-details"
+Output more detailed information.
+
+.TP
 .BR "\-l" , " \-loops " <COUNT>
 Specify maximum number of loops the 'ip addr flush' logic
 will attempt before giving up.  The default is 10.
-- 
2.1.0

^ permalink raw reply related

* Re: Poor UDP throughput with virtual devices and UFO
From: Eric Dumazet @ 2014-10-27 15:39 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: netdev, Herbert Xu, Eric Dumazet
In-Reply-To: <1414422089.16231.8.camel@edumazet-glaptop2.roam.corp.google.com>

On Mon, 2014-10-27 at 08:01 -0700, Eric Dumazet wrote:

> Please rewrite your patch to move the code out of tcp_gso_segment() into
> skb_segment()
> 
> Look how I carefully avoided many atomic operations on
> sk->sk_wmem_alloc, but how you removed it. :(
> 
> Alternative would be to use a single skb_set_owner_w() on the last
> segment, and tweak its truesize to not corrupt sk->sk_wmem_alloc
> 
> d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
> device, while best performance is reached with TSO=on so that
> segmentation is performed later on the slave device.

Hmm.. I meant 6ff50cd55545d92 ("tcp: gso: do not generate out of order
packets")

I think I will test an alternative patch and send it, keeping you as the
author if you do not mind.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 09/15] net: dsa: Add support for switch EEPROM access
From: Guenter Roeck @ 2014-10-27 15:42 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, David S. Miller, Florian Fainelli, linux-kernel
In-Reply-To: <20141027135917.GA12627@lunn.ch>

On Mon, Oct 27, 2014 at 02:59:17PM +0100, Andrew Lunn wrote:
> > So let's take a step back: For the Marvell chips, I have to provide both length
> > and presence in devicetree or platform data. Presence seemed to be implied by
> > length, so I used only a single property and variable to indicate both.
> 
> Hi Guenter
> 
> What I was thinking is that you don't need length in device tree. The
> datasheet specifies how big the EEPROM needs to be.
> 
> However, I read the datasheet for the 6060, the only public datasheet
> from Marvell. It does not work as I expected. Rather than being a
> fixed list of register values, it is a variable length list of
> command/value pairs.
> 
> In this situation, yes, you do need the length in DT.
> 
Correct. The 6352 supports "2K bit (93C56) or 4K bit (93C66) 4-wire EEPROM
devices as well as 1K bit (24C01), 2K bit (24C02) or 4K bit (24C04) 2-wire
EEPROM", so the length can be anything from 128 to 512 bytes.

> I had a quick look at some other switch chips, e.g. the RTL8100. It
> has a fixed layout of its EEPROM, consisting of 0x80 bytes. In this
> case, the switch driver could be hard coded with 0x80, and all DT
> needs to indicate is whether the EEPROM is present or not.
> 
> So, what you have proposed will work. It is maybe not optimal for an
> EEPROM whose fixed size is well defined in the datasheet, but it
> still works.
> 
> Acked-by: Andrew Lunn <andrew@lunn.ch>
> 
Thanks!

Guenter
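
The 128 to 512 byte range follows from converting the quoted part sizes from
Kbit to bytes. A tiny sketch of that arithmetic, with a hypothetical helper
name:

```c
/* Hypothetical helper: EEPROM size in bytes for a part rated in Kbit. */
static unsigned int toy_eeprom_bytes(unsigned int kbits)
{
	return kbits * 1024u / 8u;	/* 1 Kbit = 1024 bits, 8 bits per byte */
}
```

So a 1K bit part (24C01) holds 128 bytes, a 2K bit part 256 bytes, and a
4K bit part 512 bytes.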

^ permalink raw reply

* [PATCH v3] ipv6: notify userspace when we added or changed an ipv6 token
From: Lubomir Rintel @ 2014-10-27 16:39 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Lubomir Rintel, Hannes Frederic Sowa,
	Daniel Borkmann
In-Reply-To: <544D8234.5060504@redhat.com>

NetworkManager might want to know that it changed when the router advertisement
arrives.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
---
Changes since v1:
    - Do not call device notifier chain with netdev_state_change()
Changes since v2:
    - inet6_ifinfo_notify() instead of rtmsg_ifinfo()

 net/ipv6/addrconf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3e118df..d9269ef 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4528,6 +4528,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 	}
 
 	write_unlock_bh(&idev->lock);
+	inet6_ifinfo_notify(RTM_NEWLINK, idev);
 	addrconf_verify_rtnl();
 	return 0;
 }
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH] netlink: don't copy over empty attribute data
From: Andrey Ryabinin @ 2014-10-27 16:46 UTC (permalink / raw)
  To: Sasha Levin, David Miller; +Cc: pablo, mschmidt, akpm, linux-kernel, netdev
In-Reply-To: <544E59DC.3060906@oracle.com>

On 10/27/2014 05:42 PM, Sasha Levin wrote:
> On 10/26/2014 10:03 PM, David Miller wrote:
>> From: Sasha Levin <sasha.levin@oracle.com>
>> Date: Sun, 26 Oct 2014 19:32:42 -0400
>>
>>> How so? GCC states clearly that you should *never* pass a NULL
>>> pointer there:
>>>
>>> "The pointers passed to memmove (and similar functions in <string.h>) must
>>> be non-null even when nbytes==0" (https://gcc.gnu.org/gcc-4.9/porting_to.html).
>>>
>>> Even if it doesn't dereference it, it can break somehow in a subtle way. Leaving
>>> the kernel code assuming that gcc (or any other compiler) would always behave
>>> the same in a situation that shouldn't occur.
>>
>> Show me a legal way in which one could legally dereference the pointer
>> when length is zero, and I'll entertain this patch.
> 
> The moment you've triggered undefined behaviour, GCC has license to
> dereference anything it wants. GCC would be well within its rights
> dereferencing a NULL "from".
> 
> They even state it clearly in that GCC 4.9 porting guide I've linked above:
> 
> """
> Calling copy(p, NULL, 0) can therefore deference a null pointer and crash.
> 
> The example above needs to be fixed to avoid the invalid memmove call, for example:
> 
> 
>     if (nbytes != 0)
>       memmove (dest, src, nbytes);
> """
> 


In the example from that link, the null pointer dereference can happen because GCC
optimizes away the null pointer check after memmove():

int copy (int* dest, int* src, size_t nbytes) {
    memmove (dest, src, nbytes);
    if (src != NULL)  /* GCC may eliminate this check, assuming src can't be null */
      return *src;    /* NULL pointer dereference if src was NULL */
    return 0;
}

Even though GCC and the C standard treat such code ( memmove(dest, NULL, 0); ) as invalid,
it will probably not crash in the Linux kernel, because that kind of optimization is
disabled via the -fno-delete-null-pointer-checks option.


> 
> Thanks,
> Sasha
> 

^ permalink raw reply

* Re: [PATCH] ovs: Turn vports with dependencies into separate modules
From: Pravin Shelar @ 2014-10-27 17:14 UTC (permalink / raw)
  To: Thomas Graf; +Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, netdev
In-Reply-To: <20141024215758.GA25640-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org>

On Fri, Oct 24, 2014 at 2:57 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 10/24/14 at 10:47am, Pravin Shelar wrote:
>> On Wed, Oct 22, 2014 at 8:29 AM, Thomas Graf <tgraf@suug.ch> wrote:
>> > The internal and netdev vport remain part of openvswitch.ko. Encap
>> > vports including vxlan, gre, and geneve can be built as separate
>> > modules and are loaded on demand. Modules can be unloaded after use.
>> > Datapath ports keep a reference to the vport module during their
>> > lifetime.
>> >
>> > This allows removing the error-prone maintenance of the global list
>> > vport_ops_list.
>> >
>> How error prone is this interface? Can you give an example? The set of OVS
>> vport types has been pretty stable, so I am not sure we need loadable
>> module support for vport implementations.
>
> I was referring to how many other kernel APIs have been designed: a
> registration API allowing a vport to be implemented exclusively in the
> scope of a single file tends to be cleaner than having to touch multiple
> files and maintain an init list.
>
This has never been an issue in openvswitch. Plus, we do not need loadable
vport modules to fix this issue.

> It also allows for OVS to be built into vmlinuz while vports can
> remain as modules even if vxlan itself is built as a module.
>

What is the problem with the current OVS built into the kernel?

> As for new vports, GUE and LIS are candidates, encrypted VXLAN might
> look for support and there are several VXLAN extensions currently
> proposed as IETF drafts which might require new vports.

^ permalink raw reply

* [PATCH net-next] net: skb_segment() should preserve backpressure
From: Eric Dumazet @ 2014-10-27 17:30 UTC (permalink / raw)
  To: Toshiaki Makita, David Miller; +Cc: netdev, Herbert Xu
In-Reply-To: <1414424388.16231.13.camel@edumazet-glaptop2.roam.corp.google.com>

From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

This patch generalizes commit d6a4a1041176 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()

TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd55545 ("tcp: gso: do not generate out of
order packets")

This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.

Here are performance test results (macvlan on vlan):

- Before
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      144096 1224195    1258.56
212992           60.00          51              0.45

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.23      0.00     25.26      0.08      0.00     74.43

- After
# netperf -t UDP_STREAM ...
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   60.00      109593      0     957.20
212992           60.00      109593            957.20

Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.18      0.00      8.38      0.02      0.00     91.43

[edumazet] Rewrote patch and changelog.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/skbuff.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c16615bfb61edd2a1dae9ef7935a3153d78dc4df..e48e5c02e877d9a9389ea54f0e015ba041d3f2a7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3099,6 +3099,16 @@ perform_csum_check:
 	 * (see validate_xmit_skb_list() for example)
 	 */
 	segs->prev = tail;
+
+	/* Following permits correct backpressure, for protocols
+	 * using skb_set_owner_w().
+	 * Idea is to transfer ownership from head_skb to last segment.
+	 */
+	if (head_skb->destructor == sock_wfree) {
+		swap(tail->truesize, head_skb->truesize);
+		swap(tail->destructor, head_skb->destructor);
+		swap(tail->sk, head_skb->sk);
+	}
 	return segs;
 
 err:
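
The ownership transfer in the hunk above can be illustrated with a toy
user-space model. Everything below is hypothetical (the toy_* types and
functions only mimic the fields the patch touches) and is a sketch of the
idea, not the kernel implementation: the socket stays charged with the
original skb's truesize until the last segment is freed.

```c
#include <stddef.h>

/* Toy stand-ins for struct sock / struct sk_buff (hypothetical). */
struct toy_sock {
	long wmem_alloc;
};

struct toy_skb {
	struct toy_sock *sk;
	unsigned int truesize;
	void (*destructor)(struct toy_skb *skb);
};

/* Destructor that uncharges the socket, in the spirit of sock_wfree(). */
static void toy_sock_wfree(struct toy_skb *skb)
{
	skb->sk->wmem_alloc -= skb->truesize;
}

/* Charge skb to sk, in the spirit of skb_set_owner_w(). */
static void toy_set_owner_w(struct toy_skb *skb, struct toy_sock *sk)
{
	skb->sk = sk;
	skb->destructor = toy_sock_wfree;
	sk->wmem_alloc += skb->truesize;
}

static void toy_free(struct toy_skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Mirrors the three swap() calls: the tail now carries the whole charge. */
static void toy_transfer_ownership(struct toy_skb *head, struct toy_skb *tail)
{
	struct toy_skb tmp;

	if (head->destructor != toy_sock_wfree)
		return;
	tmp = *tail;
	tail->truesize = head->truesize;
	tail->destructor = head->destructor;
	tail->sk = head->sk;
	head->truesize = tmp.truesize;
	head->destructor = tmp.destructor;
	head->sk = tmp.sk;
}
```

Because only the tail is touched, the charge is moved without any extra
atomic operation per segment, which matches the concern raised in review.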

^ permalink raw reply related

* Re: [PATCH 08/11] ssb: driver_chip_comon_pmu: Fix probable mask then right shift defect
From: Michael Büsch @ 2014-10-27 17:32 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, netdev
In-Reply-To: <e68a49a228d8d2495630b4ef03a1f32b33fce7fe.1414387334.git.joe@perches.com>


On Sun, 26 Oct 2014 22:25:04 -0700
Joe Perches <joe@perches.com> wrote:

> Precedence of & and >> is not the same and is not left to right.
> shift has higher precedence and should be done after the mask.
> 
> Add parentheses around the mask.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Good catch.

Reviewed-by: Michael Büsch <m@bues.ch>

> ---
>  drivers/ssb/driver_chipcommon_pmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ssb/driver_chipcommon_pmu.c b/drivers/ssb/driver_chipcommon_pmu.c
> index 1173a09..bc71583 100644
> --- a/drivers/ssb/driver_chipcommon_pmu.c
> +++ b/drivers/ssb/driver_chipcommon_pmu.c
> @@ -621,8 +621,8 @@ static u32 ssb_pmu_get_alp_clock_clk0(struct ssb_chipcommon *cc)
>  	u32 crystalfreq;
>  	const struct pmu0_plltab_entry *e = NULL;
>  
> -	crystalfreq = chipco_read32(cc, SSB_CHIPCO_PMU_CTL) &
> -		      SSB_CHIPCO_PMU_CTL_XTALFREQ >> SSB_CHIPCO_PMU_CTL_XTALFREQ_SHIFT;
> +	crystalfreq = (chipco_read32(cc, SSB_CHIPCO_PMU_CTL) &
> +		       SSB_CHIPCO_PMU_CTL_XTALFREQ) >> SSB_CHIPCO_PMU_CTL_XTALFREQ_SHIFT;
>  	e = pmu0_plltab_find_entry(crystalfreq);
>  	BUG_ON(!e);
>  	return e->freq * 1000;
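
A small sketch of the precedence issue fixed above. The mask and shift values
are assumptions chosen for illustration, not copied from the driver headers,
and the toy_* names are hypothetical.

```c
#include <stdint.h>

/* Illustrative values, assumed for this sketch. */
#define TOY_XTALFREQ_MASK  0x0000007Cu
#define TOY_XTALFREQ_SHIFT 2

/*
 * How the unparenthesized "reg & MASK >> SHIFT" is parsed: >> binds tighter
 * than &, so the mask gets shifted instead of the masked value.
 */
static uint32_t toy_xtalfreq_buggy(uint32_t reg)
{
	return reg & (TOY_XTALFREQ_MASK >> TOY_XTALFREQ_SHIFT);
}

/* Intended behaviour: mask first, then shift the field down. */
static uint32_t toy_xtalfreq_fixed(uint32_t reg)
{
	return (reg & TOY_XTALFREQ_MASK) >> TOY_XTALFREQ_SHIFT;
}
```

For a register value of 0x34, the first form yields 0x14 while the second
yields the field value 0x0D.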




-- 
Michael


^ permalink raw reply

* Re: localed stuck in recent 3.18 git in copy_net_ns?
From: Paul E. McKenney @ 2014-10-27 17:45 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Yanko Kaneti, Josh Boyer, Eric W. Biederman, Cong Wang,
	Kevin Fenzi, netdev, Linux-Kernel@Vger. Kernel. Org, mroos, tj
In-Reply-To: <20141025181827.GE28247@linux.vnet.ibm.com>

On Sat, Oct 25, 2014 at 11:18:27AM -0700, Paul E. McKenney wrote:
> On Sat, Oct 25, 2014 at 09:38:16AM -0700, Jay Vosburgh wrote:
> > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > >On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> > >> 	Looking at the dmesg, the early boot messages seem to be
> > >> confused as to how many CPUs there are, e.g.,
> > >> 
> > >> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > >> [    0.000000] Hierarchical RCU implementation.
> > >> [    0.000000]  RCU debugfs-based tracing is enabled.
> > >> [    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
> > >> [    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> > >> [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> > >> [    0.000000] NR_IRQS:16640 nr_irqs:456 0
> > >> [    0.000000]  Offload RCU callbacks from all CPUs
> > >> [    0.000000]  Offload RCU callbacks from CPUs: 0-3.
> > >> 
> > >> 	but later shows 2:
> > >> 
> > >> [    0.233703] x86: Booting SMP configuration:
> > >> [    0.236003] .... node  #0, CPUs:      #1
> > >> [    0.255528] x86: Booted up 1 node, 2 CPUs
> > >> 
> > >> 	In any event, the E8400 is a 2 core CPU with no hyperthreading.
> > >
> > >Well, this might explain some of the difficulties.  If RCU decides to wait
> > >on CPUs that don't exist, we will of course get a hang.  And rcu_barrier()
> > >was definitely expecting four CPUs.
> > >
> > >So what happens if you boot with maxcpus=2?  (Or build with
> > >CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang.  If so,
> > >I might have some ideas for a real fix.
> > 
> > 	Booting with maxcpus=2 makes no difference (the dmesg output is
> > the same).
> > 
> > 	Rebuilding with CONFIG_NR_CPUS=2 makes the problem go away, and
> > dmesg has different CPU information at boot:
> > 
> > [    0.000000] smpboot: 4 Processors exceeds NR_CPUS limit of 2
> > [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> >  [...]
> > [    0.000000] setup_percpu: NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
> >  [...]
> > [    0.000000] Hierarchical RCU implementation.
> > [    0.000000] 	RCU debugfs-based tracing is enabled.
> > [    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
> > [    0.000000] NR_IRQS:4352 nr_irqs:440 0
> > [    0.000000] 	Offload RCU callbacks from all CPUs
> > [    0.000000] 	Offload RCU callbacks from CPUs: 0-1.
> 
> Thank you -- this confirms my suspicions on the fix, though I must admit
> to being surprised that maxcpus made no difference.

And here is an alleged fix, lightly tested at this end.  Does this patch
help?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Make rcu_barrier() understand about missing rcuo kthreads

Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online.  This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems lie about the number of CPUs that they have.
Before commit 35ce7f29a44a, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix.  The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks.  It is therefore required to wait even for those callbacks
that cannot possibly be invoked.  Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), it is tempting to report an error
in this case.  Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.

Reported-by: Yanko Kaneti <yaneti@declera.com>
Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Eric B Munson <emunson@akamai.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index aa8e5eea3ab4..c78e88ce5ea3 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -660,18 +660,18 @@ TRACE_EVENT(rcu_torture_read,
 /*
  * Tracepoint for _rcu_barrier() execution.  The string "s" describes
  * the _rcu_barrier phase:
- *	"Begin": rcu_barrier_callback() started.
- *	"Check": rcu_barrier_callback() checking for piggybacking.
- *	"EarlyExit": rcu_barrier_callback() piggybacked, thus early exit.
- *	"Inc1": rcu_barrier_callback() piggyback check counter incremented.
- *	"Offline": rcu_barrier_callback() found offline CPU
- *	"OnlineNoCB": rcu_barrier_callback() found online no-CBs CPU.
- *	"OnlineQ": rcu_barrier_callback() found online CPU with callbacks.
- *	"OnlineNQ": rcu_barrier_callback() found online CPU, no callbacks.
+ *	"Begin": _rcu_barrier() started.
+ *	"Check": _rcu_barrier() checking for piggybacking.
+ *	"EarlyExit": _rcu_barrier() piggybacked, thus early exit.
+ *	"Inc1": _rcu_barrier() piggyback check counter incremented.
+ *	"OfflineNoCB": _rcu_barrier() found callback on never-online CPU
+ *	"OnlineNoCB": _rcu_barrier() found online no-CBs CPU.
+ *	"OnlineQ": _rcu_barrier() found online CPU with callbacks.
+ *	"OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
  *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
  *	"CB": An rcu_barrier_callback() invoked a callback, not the last.
  *	"LastCB": An rcu_barrier_callback() invoked the last callback.
- *	"Inc2": rcu_barrier_callback() piggyback check counter incremented.
+ *	"Inc2": _rcu_barrier() piggyback check counter incremented.
  * The "cpu" argument is the CPU or -1 if meaningless, the "cnt" argument
  * is the count of remaining callbacks, and "done" is the piggybacking count.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f6880052b917..7680fc275036 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3312,11 +3312,16 @@ static void _rcu_barrier(struct rcu_state *rsp)
 			continue;
 		rdp = per_cpu_ptr(rsp->rda, cpu);
 		if (rcu_is_nocb_cpu(cpu)) {
-			_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
-					   rsp->n_barrier_done);
-			atomic_inc(&rsp->barrier_cpu_count);
-			__call_rcu(&rdp->barrier_head, rcu_barrier_callback,
-				   rsp, cpu, 0);
+			if (!rcu_nocb_cpu_needs_barrier(rsp, cpu)) {
+				_rcu_barrier_trace(rsp, "OfflineNoCB", cpu,
+						   rsp->n_barrier_done);
+			} else {
+				_rcu_barrier_trace(rsp, "OnlineNoCB", cpu,
+						   rsp->n_barrier_done);
+				atomic_inc(&rsp->barrier_cpu_count);
+				__call_rcu(&rdp->barrier_head,
+					   rcu_barrier_callback, rsp, cpu, 0);
+			}
 		} else if (ACCESS_ONCE(rdp->qlen)) {
 			_rcu_barrier_trace(rsp, "OnlineQ", cpu,
 					   rsp->n_barrier_done);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 4beab3d2328c..8e7b1843896e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -587,6 +587,7 @@ static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
 static void print_cpu_stall_info_end(void);
 static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static void increment_cpu_stall_ticks(void);
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu);
 static void rcu_nocb_gp_set(struct rcu_node *rnp, int nrq);
 static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 927c17b081c7..68c5b23b7173 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2050,6 +2050,33 @@ static void wake_nocb_leader(struct rcu_data *rdp, bool force)
 }
 
 /*
+ * Does the specified CPU need an RCU callback for the specified flavor
+ * of rcu_barrier()?
+ */
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+	struct rcu_head *rhp;
+
+	/* No-CBs CPUs might have callbacks on any of three lists. */
+	rhp = ACCESS_ONCE(rdp->nocb_head);
+	if (!rhp)
+		rhp = ACCESS_ONCE(rdp->nocb_gp_head);
+	if (!rhp)
+		rhp = ACCESS_ONCE(rdp->nocb_follower_head);
+
+	/* Having no rcuo kthread but CBs after scheduler starts is bad! */
+	if (!ACCESS_ONCE(rdp->nocb_kthread) && rhp) {
+		/* RCU callback enqueued before CPU first came online??? */
+		pr_err("RCU: Never-onlined no-CBs CPU %d has CB %p\n",
+		       cpu, rhp->func);
+		WARN_ON_ONCE(1);
+	}
+
+	return !!rhp;
+}
+
+/*
  * Enqueue the specified string of rcu_head structures onto the specified
  * CPU's no-CBs lists.  The CPU is specified by rdp, the head of the
  * string by rhp, and the tail of the string by rhtp.  The non-lazy/lazy
@@ -2646,6 +2673,11 @@ static bool init_nocb_callback_list(struct rcu_data *rdp)
 
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 
+static bool rcu_nocb_cpu_needs_barrier(struct rcu_state *rsp, int cpu)
+{
+	return false; /* Assumed: dead code without no-CBs CPUs. */
+}
+
 static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
 {
 }
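
The decision that the patch above adds to _rcu_barrier() can be modeled
in plain userspace C. This is only a sketch for discussion, not kernel
code: the struct, enum, and function names are hypothetical, and the
three no-CBs callback lists are collapsed into a single count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one no-CBs CPU's state. */
struct nocb_cpu_model {
	bool has_rcuo_kthread;	/* models rdp->nocb_kthread != NULL */
	size_t pending_cbs;	/* models any no-CBs list being non-empty */
};

enum barrier_action {
	BARRIER_SKIP,		/* "OfflineNoCB": nothing to wait for */
	BARRIER_POST,		/* "OnlineNoCB": post the barrier callback */
	BARRIER_POST_AND_WARN,	/* callbacks but no kthread: warn, may hang */
};

/* Mirror the patch: only pending callbacks force a barrier callback. */
static enum barrier_action nocb_barrier_action(const struct nocb_cpu_model *cpu)
{
	if (cpu->pending_cbs == 0)
		return BARRIER_SKIP;
	if (!cpu->has_rcuo_kthread)
		return BARRIER_POST_AND_WARN;
	return BARRIER_POST;
}
```

Note that, as in the patch, a CPU with no pending callbacks is skipped
whether or not it has an rcuo kthread, since there is nothing for
rcu_barrier() to wait on.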

^ permalink raw reply related

* [PATCH net] cxgb4vf: Replace repetitive pci device ID's with right ones
From: Hariprasad Shenai @ 2014-10-27 17:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, leedom, kumaras, nirranjan, santosh, anish,
	Hariprasad Shenai

Replace the repetitive device IDs that were added in commit b961f9a48844ecf3
("cxgb4vf: Remove superfluous "idx" parameter of CH_DEVICE() macro")
with the correct ones.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c    |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index bfa398d..0b42bdd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -2929,14 +2929,14 @@ static const struct pci_device_id cxgb4vf_pci_tbl[] = {
 	CH_DEVICE(0x480d),	/* T480-cr */
 	CH_DEVICE(0x480e),	/* T440-lp-cr */
 	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
-	CH_DEVICE(0x4880),
+	CH_DEVICE(0x4881),
+	CH_DEVICE(0x4882),
+	CH_DEVICE(0x4883),
+	CH_DEVICE(0x4884),
+	CH_DEVICE(0x4885),
+	CH_DEVICE(0x4886),
+	CH_DEVICE(0x4887),
+	CH_DEVICE(0x4888),
 	CH_DEVICE(0x5801),	/* T520-cr */
 	CH_DEVICE(0x5802),	/* T522-cr */
 	CH_DEVICE(0x5803),	/* T540-cr */
-- 
1.7.1
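
A repeated entry like the ones fixed above can be caught with a simple
scan over the ID list. The helper below is a hypothetical userspace
sketch operating on bare 16-bit device IDs; it is not part of the
driver and does not use struct pci_device_id.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first entry repeating an earlier one, or -1. */
static int first_duplicate_id(const uint16_t *ids, size_t n)
{
	for (size_t i = 1; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (ids[i] == ids[j])
				return (int)i;
	return -1;
}
```

The quadratic scan is fine for a table of this size and keeps the
sketch free of any allocation.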

^ permalink raw reply related

* [PATCH 1/1 net-next] net/irda: include linux/uaccess.h instead of asm/uaccess.h
From: Fabian Frederick @ 2014-10-27 18:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Fabian Frederick, Samuel Ortiz, David S. Miller, netdev

Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
 net/irda/ircomm/ircomm_tty_ioctl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/irda/ircomm/ircomm_tty_ioctl.c b/net/irda/ircomm/ircomm_tty_ioctl.c
index ce94385..2db24bd 100644
--- a/net/irda/ircomm/ircomm_tty_ioctl.c
+++ b/net/irda/ircomm/ircomm_tty_ioctl.c
@@ -31,8 +31,7 @@
 #include <linux/termios.h>
 #include <linux/tty.h>
 #include <linux/serial.h>
-
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 
 #include <net/irda/irda.h>
 #include <net/irda/irmod.h>
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 1/2] udp: Record RPS flow in socket operations
From: Tom Herbert @ 2014-10-27 18:01 UTC (permalink / raw)
  To: davem, netdev

Add calls to sock_rps_record_flow for udp_sendmsg, udp_sendpage
and udp_recvmsg. This enables RFS for connected UDP sockets.

Tested:
  Ran netperf UDP_RR with 200 flows, with and without UDP RSS enabled

Before fix:
  No RSS
    Client (connected UDP)
      36.87% CPU utilization
    Server (unconnected UDP)
      33.64% CPU utilization
    256/440/687 90/95/99% latencies
    727273 tps

  UDP RSS
    Client
      79.59% CPU utilization
    Server
      78.83% CPU utilization
    116/159/226 90/95/99% latencies
    1.60974e+06 tps

After fix:
  No RSS
    Client
      44.38% CPU utilization
    Server
      50.46% CPU utilization
    192/245/343 90/95/99% latencies
    1.01413e+06 tps

  UDP RSS
    Client
      79.98% CPU utilization
    Server
      80.35% CPU utilization
    113/158/230 90/95/99% latencies
    1.60622e+06 tps

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/udp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cd0db54..9a0d346 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -881,6 +881,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	struct ip_options_data opt_copy;
 
+	sock_rps_record_flow(sk);
+
 	if (len > 0xFFFF)
 		return -EMSGSIZE;
 
@@ -1113,6 +1115,8 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 	struct udp_sock *up = udp_sk(sk);
 	int ret;
 
+	sock_rps_record_flow(sk);
+
 	if (flags & MSG_SENDPAGE_NOTLAST)
 		flags |= MSG_MORE;
 
@@ -1253,6 +1257,8 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int is_udplite = IS_UDPLITE(sk);
 	bool slow;
 
+	sock_rps_record_flow(sk);
+
 	if (flags & MSG_ERRQUEUE)
 		return ip_recv_error(sk, msg, len, addr_len);
 
-- 
2.1.0.rc2.206.gedb03e5
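
To put the throughput numbers reported above in relative terms, the
change can be computed as a percentage. The helper name is hypothetical
and the inputs are simply the tps figures from the commit message.

```c
/* Relative change from 'before' to 'after', in percent. */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Applied to the "No RSS" case (727273 tps before, 1.01413e+06 tps
after), this gives an increase of roughly 39%, while the "UDP RSS" case
(1.60974e+06 to 1.60622e+06 tps) changes by well under 1%.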

^ permalink raw reply related

* [PATCH net-next 2/2] udp: Reset flow table for flows over unconnected sockets
From: Tom Herbert @ 2014-10-27 18:01 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414432875-23795-1-git-send-email-therbert@google.com>

When receiving a packet on an unconnected UDP socket, clear the
flow table for the corresponding hash. This is needed so flows over
unconnected UDP sockets will use RPS instead of using what is
present in the flow table. In particular, this avoids having flows
over unconnected sockets be perpetually steered by unrelated
entries in the flow table (idle TCP connections for instance).

Tested:

First filled up the RPS flow tables by creating a bunch of TCP
connections and letting them turn idle. Next, ran netperf UDP_RR
with 200 flows.

Before fix:
  Client (connected UDP)
    81.15% CPU utilization
  Server (unconnected UDP)
    83.63% CPU utilization
  118/167/249 90/95/99% latencies
  1.59215e+06 tps

After fix:
  Client (connected UDP)
    81.13% CPU utilization
  Server (unconnected UDP)
    80.68% CPU utilization
  116/167/248 90/95/99% latencies
  1.61048e+06 tps

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/udp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9a0d346..e58d841 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1451,6 +1451,11 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	if (inet_sk(sk)->inet_daddr) {
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
+	} else {
+		/* For an unconnected socket reset flow hash so that related
+		 * flow will use RPS.
+		 */
+		sock_rps_reset_flow_hash(skb->hash);
 	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH 1/1] ipv4: remove set but unused variable sha
From: Fabian Frederick @ 2014-10-27 18:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Fabian Frederick, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev

unsigned char *sha (source) has been present since the original git
version but was never used.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
 net/ipv4/ipconfig.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 648fa14..a896da5 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -498,7 +498,7 @@ ic_rarp_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 	struct arphdr *rarp;
 	unsigned char *rarp_ptr;
 	__be32 sip, tip;
-	unsigned char *sha, *tha;		/* s for "source", t for "target" */
+	unsigned char *tha;		/* t for "target" */
 	struct ic_device *d;
 
 	if (!net_eq(dev_net(dev), &init_net))
@@ -549,7 +549,6 @@ ic_rarp_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt
 		goto drop_unlock;	/* should never happen */
 
 	/* Extract variable-width fields */
-	sha = rarp_ptr;
 	rarp_ptr += dev->addr_len;
 	memcpy(&sip, rarp_ptr, 4);
 	rarp_ptr += 4;
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/1 net-next] ipx: replace long unsigned int by unsigned long
From: Fabian Frederick @ 2014-10-27 18:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: Fabian Frederick, Arnaldo Carvalho de Melo, David S. Miller,
	netdev

Use standard unsigned long.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
 net/ipx/ipx_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipx/ipx_proc.c b/net/ipx/ipx_proc.c
index e15c16a..c0cb442 100644
--- a/net/ipx/ipx_proc.c
+++ b/net/ipx/ipx_proc.c
@@ -90,7 +90,7 @@ static int ipx_seq_route_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "%08lX   ", (unsigned long int)ntohl(rt->ir_net));
 	if (rt->ir_routed)
 		seq_printf(seq, "%08lX     %02X%02X%02X%02X%02X%02X\n",
-			   (long unsigned int)ntohl(rt->ir_intrfc->if_netnum),
+			   (unsigned long)ntohl(rt->ir_intrfc->if_netnum),
 			   rt->ir_router_node[0], rt->ir_router_node[1],
 			   rt->ir_router_node[2], rt->ir_router_node[3],
 			   rt->ir_router_node[4], rt->ir_router_node[5]);
-- 
1.9.1
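
That the cast change above is purely cosmetic can be checked at compile
time, since "long unsigned int" and "unsigned long" name the same type.
The sketch below assumes a C11 compiler; the macro and function names
are hypothetical.

```c
/* C11: _Generic selects on the exact type of its controlling expression. */
#define IS_UNSIGNED_LONG(x) _Generic((x), unsigned long: 1, default: 0)

/* Fails to compile if the two spellings were ever distinct types. */
_Static_assert(IS_UNSIGNED_LONG((long unsigned int)0),
	       "long unsigned int must be unsigned long");

static int long_unsigned_int_is_unsigned_long(void)
{
	return IS_UNSIGNED_LONG((long unsigned int)0);
}
```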

^ permalink raw reply related

