Netdev List


* Re: TCP transmit performance regression
From: David Miller @ 2012-07-05 10:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tom.leiming, netdev
In-Reply-To: <1341481760.2583.3579.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Jul 2012 11:49:20 +0200

> -			ax_skb->data = packet;

That's really scary.

^ permalink raw reply

* Re: [PATCH next-next] ppp: change default for incoming protocol filter to NPMODE_DROP
From: David Miller @ 2012-07-05 10:00 UTC (permalink / raw)
  To: bcrl; +Cc: netdev, linux-ppp
In-Reply-To: <20120704013258.GA26225@kvack.org>

From: Benjamin LaHaise <bcrl@kvack.org>
Date: Tue, 3 Jul 2012 21:32:58 -0400

> By default, the ppp_generic code initializes the npmode array that filters
> incoming packets to accept packets for all protocols.  This behaviour is
> incorrect, as it results in packets for protocols that an older version
> of a PPP implementation may not be aware of to be incorrectly accepted.
> This behaviour is visible, for example, when sending IPv6 packets across a
> ppp link where pppd has only been configured to use IPv4.
> 
> This change should be safe since pppd will correctly set the protocols it
> negotiates to NPMODE_PASS as the appropriate protocols transition to an Up
> state.
> 
> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>

As far as I can tell, this has been this way for a very long time.

Therefore it is the application's responsibility to adjust the filters
to suit their needs and we really can't make such adjustments to this
behavior.

^ permalink raw reply

* Re: [PATCH 0/19] Disconnect neigh from dst_entry
From: David Miller @ 2012-07-05  9:55 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20120703.024543.1597240990462633709.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Tue, 03 Jul 2012 02:45:43 -0700 (PDT)

> This finally severs neighbour table entries from dst_entry enough that
> we no longer depend upon them outside of the individual protocols.

I'm pushing this now to net-next, with three minor changes.

1) I fubar'd the neigh lookup in the sch_teql changes, I needed to
   add the following code block to __teql_resolve():

        if (dst->dev != dev) {
                struct neighbour *mn;

                mn = __neigh_lookup_errno(n->tbl, n->primary_key, dev);
                neigh_release(n);
                if (IS_ERR(mn))
                        return PTR_ERR(mn);
                n = mn;
        }

2) I adjusted the comment in the neigh backlog handler of
   neigh_update() to read as follows:


	/* Why not just use 'neigh' as-is?  The problem is that
	 * things such as shaper, eql, and sch_teql can end up
	 * using alternative, different, neigh objects to output
	 * the packet in the output path.  So what we need to do
	 * here is re-lookup the top-level neigh in the path so
	 * we can reinject the packet there.
	 */

3) The redirect network event needs to also pass in the path
   destination address so that we can have it available for
   all callers of t3_l2t_get().

^ permalink raw reply

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-05  9:49 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVPTXB7t=zwkm+HTgDaF3bA02bzff_52S+UAr51PfpvpCg@mail.gmail.com>

On Thu, 2012-07-05 at 16:42 +0800, Ming Lei wrote:
> On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
> >
> >> After some investigation, the problem is caused by enabling
> >> DEBUG_SLAB, so it is not a regression.
> >>
> >
> > Strange, unless your machine is a _very_ slow one maybe ?
> 
> It is a beagle-xm board, and its cpu is ARMv7, 1GHz.

OK, the driver seems buggy; please try the following patch (on both sides
if possible)

 drivers/net/usb/smsc95xx.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index b1112e7..0a4ae35 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1084,26 +1084,23 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 			if (skb->len == size) {
 				if (dev->net->features & NETIF_F_RXCSUM)
 					smsc95xx_rx_csum_offload(skb);
-				skb_trim(skb, skb->len - 4); /* remove fcs */
+				__skb_trim(skb, skb->len - 4); /* remove fcs */
 				skb->truesize = size + sizeof(struct sk_buff);
 
 				return 1;
 			}
 
-			ax_skb = skb_clone(skb, GFP_ATOMIC);
+			ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
 			if (unlikely(!ax_skb)) {
 				netdev_warn(dev->net, "Error allocating skb\n");
 				return 0;
 			}
 
-			ax_skb->len = size;
-			ax_skb->data = packet;
-			skb_set_tail_pointer(ax_skb, size);
+			memcpy(skb_put(ax_skb, size), packet, size);
 
 			if (dev->net->features & NETIF_F_RXCSUM)
 				smsc95xx_rx_csum_offload(ax_skb);
-			skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
-			ax_skb->truesize = size + sizeof(struct sk_buff);
+			__skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
 
 			usbnet_skb_return(dev, ax_skb);
 		}
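
The idea behind the patch above can be sketched outside the kernel: rather
than cloning the aggregated buffer and repointing data into it, each frame is
copied into its own allocation and the 4-byte FCS is trimmed. This is only a
userspace illustration; struct frame, frame_copy_out(), frame_free() and
FCS_LEN are hypothetical names, and malloc()/memcpy() merely stand in for
netdev_alloc_skb_ip_align() and skb_put().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FCS_LEN 4 /* trailing frame check sequence, as in the patch comment */

/* Hypothetical stand-in for an skb that owns its own data. */
struct frame {
	unsigned char *data;
	size_t len;
};

/*
 * Copy one frame of 'size' bytes (FCS included) out of an aggregated
 * receive buffer into a private allocation, then drop the FCS.
 * Returns NULL on allocation failure or if the frame is too short.
 */
static struct frame *frame_copy_out(const unsigned char *packet, size_t size)
{
	struct frame *f;

	assert(packet != NULL);
	if (size < FCS_LEN)
		return NULL;

	f = malloc(sizeof(*f));
	if (!f)
		return NULL;

	f->data = malloc(size);
	if (!f->data) {
		free(f);
		return NULL;
	}

	memcpy(f->data, packet, size);	/* private copy, nothing shared */
	f->len = size - FCS_LEN;	/* "trim" the FCS */
	return f;
}

static void frame_free(struct frame *f)
{
	if (f) {
		free(f->data);
		free(f);
	}
}
```

Because the copy owns its memory, later changes to the receive buffer do not
affect a frame that has already been handed up the stack.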

^ permalink raw reply related

* [PATCH v2] cgroup: fix panic in netprio_cgroup
From: Gao feng @ 2012-07-05  9:28 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, nhorman, tj, lizefan, eric.dumazet,
	Gao feng

We set max_prioidx to the first zero bit index of prioidx_map in
function get_prioidx.

So when we delete a low-index netprio cgroup and then add a new
netprio cgroup, max_prioidx is set to that low index.

When we then set the high-index cgroup's net_prio.ifpriomap, the function
write_priomap calls update_netdev_tables to allocate memory whose
size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
so the array that map->priomap points to has max_prioidx + 1 entries,
which is fewer than we actually need.

Fix this by adding a check in get_prioidx: only set max_prioidx when
max_prioidx is lower than the new prioidx.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/core/netprio_cgroup.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 5b8aa2f..aa907ed 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -49,8 +49,9 @@ static int get_prioidx(u32 *prio)
 		return -ENOSPC;
 	}
 	set_bit(prioidx, prioidx_map);
+	if (atomic_read(&max_prioidx) < prioidx)
+		atomic_set(&max_prioidx, prioidx);
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
-	atomic_set(&max_prioidx, prioidx);
 	*prio = prioidx;
 	return 0;
 }
-- 
1.7.7.6
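
The high-water-mark behaviour described in the changelog above can be modelled
in a few lines of userspace C. This is a sketch under assumptions, not the
kernel code: idx_map, max_idx, get_idx() and put_idx() are hypothetical
stand-ins for prioidx_map, max_prioidx and the cgroup create/destroy paths,
and the locking is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDX 64 /* hypothetical bitmap size for this sketch */

static bool idx_map[MAX_IDX];	/* stand-in for prioidx_map */
static int max_idx;		/* stand-in for max_prioidx */

/* Allocate the lowest free index; returns -1 when the map is full. */
static int get_idx(void)
{
	int i;

	/* In the kernel, everything below runs under prioidx_map_lock. */
	for (i = 0; i < MAX_IDX; i++) {
		if (!idx_map[i])
			break;
	}
	if (i == MAX_IDX)
		return -1;

	idx_map[i] = true;
	if (max_idx < i)	/* only ever raise the high-water mark */
		max_idx = i;
	return i;
}

/* Release an index so that it can be handed out again. */
static void put_idx(int i)
{
	assert(i >= 0 && i < MAX_IDX);
	idx_map[i] = false;
}
```

With an unconditional assignment in place of the comparison, freeing index 0
and allocating again would drop max_idx back to 0 while index 2 is still in
use, which is the undersized-array situation the changelog describes.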

^ permalink raw reply related

* Re: [PATCH] cgroup: fix panic in netprio_cgroup
From: Gao feng @ 2012-07-05  9:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, nhorman, tj, lizefan
In-Reply-To: <20120705.015841.2231353345763821829.davem@davemloft.net>

On 2012-07-05 16:58, David Miller wrote:
> 
> Why did you post this twice?

Sorry for the confusion; there was something wrong with my git send-email config.
I sent the first patch but couldn't find it on the mailing list, so I
sent it again.

> 
> Is there a difference between the first patch and the second
> one you posted?  If so, what is that difference?

There is no difference between them.
Sorry again.

Thanks.

^ permalink raw reply

* Re: [PATCH] cgroup: fix panic in netprio_cgroup
From: Gao feng @ 2012-07-05  9:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, linux-kernel, nhorman, tj, lizefan
In-Reply-To: <1341477809.2583.3437.camel@edumazet-glaptop>

On 2012-07-05 16:43, Eric Dumazet wrote:
> On Thu, 2012-07-05 at 16:31 +0800, Gao feng wrote:
>> We set max_prioidx to the first zero bit index of prioidx_map in
>> function get_prioidx.
>>
>> So when we delete a low-index netprio cgroup and then add a new
>> netprio cgroup, max_prioidx is set to that low index.
>>
>> When we then set the high-index cgroup's net_prio.ifpriomap, the function
>> write_priomap calls update_netdev_tables to allocate memory whose
>> size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
>> so the array that map->priomap points to has max_prioidx + 1 entries,
>> which is fewer than we actually need.
>>
>> Fix this by adding a check in get_prioidx: only set max_prioidx when
>> max_prioidx is lower than the new prioidx.
>>
>> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
>> ---
>>  net/core/netprio_cgroup.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
>> index 5b8aa2f..586f7d9 100644
>> --- a/net/core/netprio_cgroup.c
>> +++ b/net/core/netprio_cgroup.c
>> @@ -50,7 +50,8 @@ static int get_prioidx(u32 *prio)
>>  	}
>>  	set_bit(prioidx, prioidx_map);
>>  	spin_unlock_irqrestore(&prioidx_map_lock, flags);
>> -	atomic_set(&max_prioidx, prioidx);
>> +	if (atomic_read(&max_prioidx) < prioidx)
>> +		atomic_set(&max_prioidx, prioidx);
>>  	*prio = prioidx;
>>  	return 0;
>>  }
> 
> This is still racy.
> 
> Please do this before the 
> spin_unlock_irqrestore(&prioidx_map_lock, flags);
> 

Thanks Eric, you are right.
I will fix it and resend.

^ permalink raw reply

* Re: [PATCH] cgroup: fix panic in netprio_cgroup
From: David Miller @ 2012-07-05  8:58 UTC (permalink / raw)
  To: gaofeng; +Cc: netdev, linux-kernel, nhorman, tj, lizefan
In-Reply-To: <1341477102-16988-1-git-send-email-gaofeng@cn.fujitsu.com>

Why did you post this twice?

Is there a difference between the first patch and the second
one you posted?  If so, what is that difference?

^ permalink raw reply

* Re: [PATCH] cgroup: fix panic in netprio_cgroup
From: Eric Dumazet @ 2012-07-05  8:43 UTC (permalink / raw)
  To: Gao feng; +Cc: davem, netdev, linux-kernel, nhorman, tj, lizefan
In-Reply-To: <1341477102-16988-1-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, 2012-07-05 at 16:31 +0800, Gao feng wrote:
> We set max_prioidx to the first zero bit index of prioidx_map in
> function get_prioidx.
> 
> So when we delete a low-index netprio cgroup and then add a new
> netprio cgroup, max_prioidx is set to that low index.
> 
> When we then set the high-index cgroup's net_prio.ifpriomap, the function
> write_priomap calls update_netdev_tables to allocate memory whose
> size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
> so the array that map->priomap points to has max_prioidx + 1 entries,
> which is fewer than we actually need.
> 
> Fix this by adding a check in get_prioidx: only set max_prioidx when
> max_prioidx is lower than the new prioidx.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
> ---
>  net/core/netprio_cgroup.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index 5b8aa2f..586f7d9 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -50,7 +50,8 @@ static int get_prioidx(u32 *prio)
>  	}
>  	set_bit(prioidx, prioidx_map);
>  	spin_unlock_irqrestore(&prioidx_map_lock, flags);
> -	atomic_set(&max_prioidx, prioidx);
> +	if (atomic_read(&max_prioidx) < prioidx)
> +		atomic_set(&max_prioidx, prioidx);
>  	*prio = prioidx;
>  	return 0;
>  }

This is still racy.

Please do this before the 
spin_unlock_irqrestore(&prioidx_map_lock, flags);

^ permalink raw reply

* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-05  8:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341477192.2583.3415.camel@edumazet-glaptop>

[-- Attachment #1: Type: text/plain, Size: 759 bytes --]

On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
>
>> After some investigation, the problem is caused by enabling
>> DEBUG_SLAB, so it is not a regression.
>>
>
> Strange, unless your machine is a _very_ slow one maybe ?

It is a beagle-xm board, and its cpu is ARMv7, 1GHz.

>
>>
>> It shows no improvement. I still don't know why the window size becomes so
>> small even in the good case (DEBUG_SLAB disabled), and the small
>> window size will cause almost every TCP data packet to be ACKed.
>
> You are probably missing the fact that window scaling is enabled.
>
> If you don't post a pcap, I am afraid we can't really help.

See attachment for the pcap trace.


Thanks,
-- 
Ming Lei

[-- Attachment #2: tcp.pcap --]
[-- Type: application/octet-stream, Size: 97922 bytes --]

^ permalink raw reply

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-05  8:33 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVNxcdEYd-KmkUe9=8+x_9s-ZVuoM=FfZ=QXa7w_qRiTnw@mail.gmail.com>

On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:

> After some investigation, the problem is caused by enabling
> DEBUG_SLAB, so it is not a regression.
> 

Strange, unless your machine is a _very_ slow one maybe ?

> 
> It shows no improvement. I still don't know why the window size becomes so
> small even in the good case (DEBUG_SLAB disabled), and the small
> window size will cause almost every TCP data packet to be ACKed.

You are probably missing the fact that window scaling is enabled.

If you don't post a pcap, I am afraid we can't really help.
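
To make the window-scaling remark concrete: the 16-bit window field in each
segment has to be shifted left by the scale factor that the receiver announced
in its SYN. A minimal sketch follows; effective_window() is a hypothetical
helper, and the shift of 7 used in the note below is an assumption for
illustration, not a value taken from the trace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective receive window: the 16-bit window field from the TCP header
 * shifted left by the window scale negotiated in the SYN. The TCP window
 * scale option limits the shift to 14.
 */
static uint32_t effective_window(uint16_t raw_window, unsigned int wscale)
{
	assert(wscale <= 14);
	return (uint32_t)raw_window << wscale;
}
```

With an assumed shift of 7, a raw value of 229 would correspond to
229 * 128 = 29312 bytes, so a small-looking number in a capture does not by
itself indicate a tiny window.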

^ permalink raw reply

* [PATCH] cgroup: fix panic in netprio_cgroup
From: Gao feng @ 2012-07-05  8:31 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, nhorman, tj, lizefan, Gao feng

We set max_prioidx to the first zero bit index of prioidx_map in
function get_prioidx.

So when we delete a low-index netprio cgroup and then add a new
netprio cgroup, max_prioidx is set to that low index.

When we then set the high-index cgroup's net_prio.ifpriomap, the function
write_priomap calls update_netdev_tables to allocate memory whose
size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
so the array that map->priomap points to has max_prioidx + 1 entries,
which is fewer than we actually need.

Fix this by adding a check in get_prioidx: only set max_prioidx when
max_prioidx is lower than the new prioidx.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
---
 net/core/netprio_cgroup.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 5b8aa2f..586f7d9 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -50,7 +50,8 @@ static int get_prioidx(u32 *prio)
 	}
 	set_bit(prioidx, prioidx_map);
 	spin_unlock_irqrestore(&prioidx_map_lock, flags);
-	atomic_set(&max_prioidx, prioidx);
+	if (atomic_read(&max_prioidx) < prioidx)
+		atomic_set(&max_prioidx, prioidx);
 	*prio = prioidx;
 	return 0;
 }
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH net-next v2] ipv4: defer fib_compute_spec_dst() call
From: Eric Dumazet @ 2012-07-05  8:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

ip_options_compile() can avoid calling fib_compute_spec_dst()
by default, and perform the call only if needed.

David suggested to add a helper to make the call only once.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/ip_options.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 1f02251..a19d647 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -242,6 +242,15 @@ void ip_options_fragment(struct sk_buff *skb)
 	opt->ts_needtime = 0;
 }
 
+/* helper used by ip_options_compile() to call fib_compute_spec_dst()
+ * at most one time.
+ */
+static void spec_dst_fill(__be32 *spec_dst, struct sk_buff *skb)
+{
+	if (*spec_dst == htonl(INADDR_ANY))
+		*spec_dst = fib_compute_spec_dst(skb);
+}
+
 /*
  * Verify options and fill pointers in struct options.
  * Caller should clear *opt, and set opt->data.
@@ -251,7 +260,7 @@ void ip_options_fragment(struct sk_buff *skb)
 int ip_options_compile(struct net *net,
 		       struct ip_options *opt, struct sk_buff *skb)
 {
-	__be32 spec_dst = (__force __be32) 0;
+	__be32 spec_dst = htonl(INADDR_ANY);
 	unsigned char *pp_ptr = NULL;
 	struct rtable *rt = NULL;
 	unsigned char *optptr;
@@ -260,8 +269,6 @@ int ip_options_compile(struct net *net,
 
 	if (skb != NULL) {
 		rt = skb_rtable(skb);
-		if (rt)
-			spec_dst = fib_compute_spec_dst(skb);
 		optptr = (unsigned char *)&(ip_hdr(skb)[1]);
 	} else
 		optptr = opt->__data;
@@ -334,6 +341,7 @@ int ip_options_compile(struct net *net,
 					goto error;
 				}
 				if (rt) {
+					spec_dst_fill(&spec_dst, skb);
 					memcpy(&optptr[optptr[2]-1], &spec_dst, 4);
 					opt->is_changed = 1;
 				}
@@ -376,6 +384,7 @@ int ip_options_compile(struct net *net,
 					}
 					opt->ts = optptr - iph;
 					if (rt)  {
+						spec_dst_fill(&spec_dst, skb);
 						memcpy(&optptr[optptr[2]-1], &spec_dst, 4);
 						timeptr = &optptr[optptr[2]+3];
 					}

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Ming Lei @ 2012-07-05  8:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller
In-Reply-To: <1341474192.2583.3299.camel@edumazet-glaptop>

On Thu, Jul 5, 2012 at 3:43 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 09:45 +0800, Ming Lei wrote:
>> Hi,
>>
>> I observed that on both 3.5-rc5 and 3.5-rc5-next, TCP transmit performance
>> degrades a lot, see my below simple test:
>>
>> 1, test box
>> NIC: 100M USB, normally can reach > 90Mbits/sec
>>
>
> What was the last "OK" kernel version ?

After some investigation, the problem is caused by enabling
DEBUG_SLAB, so it is not a regression.

>
> What NIC driver is it ?
>
>> 2, run below command on the box:
>> [root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
>> ------------------------------------------------------------
>> Client connecting to 192.168.0.103, TCP port 5001
>> TCP window size:   256 KByte (WARNING: requested   128 KByte)
>> ------------------------------------------------------------
>> [  3] local 192.168.0.108 port 59315 connected with 192.168.0.103 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  40.4 MBytes  33.9 Mbits/sec
>>
>> note: 192.168.0.103 is another production machine running 'iperf -s -w 131072'
>>
>> 3, from traffic captured in wireshark, the window size of most of tcp packets
>> from the test box to 192.168.0.103 is set as 229, looks very weird and should
>> be the cause of performance regression.
>>
>
> Packets sent to 192.168.0.103 announce the window suitable for packets
> in the other way, so not relevant to your problem.
>
> Could you do
>
> # tcpdump -i eth0 -s 100 -c 1000 -w tcp.pcap host 192.168.0.103 &
> # iperf -c 192.168.0.103 -w 131072 -t 10
>
> and post the tcp.pcap file ?
>
> By the way, if you remove -w 131072 (on both sides), I guess throughput
> will increase.

It shows no improvement. I still don't know why the window size becomes so
small even in the good case (DEBUG_SLAB disabled), and the small
window size will cause almost every TCP data packet to be ACKed.


Thanks,
-- 
Ming Lei

^ permalink raw reply

* [patch] [AX.25]: small cleanup in ax25_addr_parse()
From: Dan Carpenter @ 2012-07-05  8:27 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: David S. Miller, linux-hams, netdev, kernel-janitors

The comments were wrong here because "AX25_MAX_DIGIS" is 8 but the
comments say 6.  Also I've changed the "7" to "AX25_ADDR_LEN".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/net/ax25/ax25_addr.c b/net/ax25/ax25_addr.c
index 9162409..e7c9b0e 100644
--- a/net/ax25/ax25_addr.c
+++ b/net/ax25/ax25_addr.c
@@ -189,8 +189,10 @@ const unsigned char *ax25_addr_parse(const unsigned char *buf, int len,
 	digi->ndigi      = 0;

 	while (!(buf[-1] & AX25_EBIT)) {
-		if (d >= AX25_MAX_DIGIS)  return NULL;	/* Max of 6 digis */
-		if (len < 7) return NULL;	/* Short packet */
+		if (d >= AX25_MAX_DIGIS)
+			return NULL;
+		if (len < AX25_ADDR_LEN)
+			return NULL;

 		memcpy(&digi->calls[d], buf, AX25_ADDR_LEN);
 		digi->ndigi = d + 1;
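
For readers unfamiliar with the loop being cleaned up, the two bounds checks
can be sketched in userspace as follows. This is a simplified model, not the
kernel function: it assumes at least one digipeater address is present, the
names parse_digis(), struct digi_list, ADDR_LEN, MAX_DIGIS and EBIT are
hypothetical, and the EBIT value of 0x01 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_LEN  7	/* mirrors AX25_ADDR_LEN */
#define MAX_DIGIS 8	/* mirrors AX25_MAX_DIGIS */
#define EBIT      0x01	/* assumed: end-of-address bit in the last octet */

struct digi_list {
	unsigned char calls[MAX_DIGIS][ADDR_LEN];
	int ndigi;
};

/*
 * Collect digipeater addresses from buf until one carries the
 * end-of-address bit. Returns the number of bytes consumed, or -1 if
 * the list is too long or the buffer is too short.
 */
static int parse_digis(const unsigned char *buf, size_t len,
		       struct digi_list *out)
{
	size_t used = 0;
	int d = 0;

	assert(buf && out);
	out->ndigi = 0;

	for (;;) {
		if (d >= MAX_DIGIS)		/* too many digipeaters */
			return -1;
		if (len - used < ADDR_LEN)	/* short packet */
			return -1;

		memcpy(out->calls[d], buf + used, ADDR_LEN);
		out->ndigi = ++d;
		used += ADDR_LEN;

		if (buf[used - 1] & EBIT)	/* last address in the list */
			break;
	}
	return (int)used;
}
```

Using the named constants in both checks keeps the limits in one place, which
is the point of the cleanup.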

^ permalink raw reply related

* Re: [PATCH] ipv4: Create and use fib_compute_spec_dst() helper.
From: Eric Dumazet @ 2012-07-05  8:10 UTC (permalink / raw)
  To: David Miller; +Cc: ja, netdev
In-Reply-To: <20120705.005940.1078811938047681715.davem@davemloft.net>

On Thu, 2012-07-05 at 00:59 -0700, David Miller wrote:

> Yes, this is a great idea.  Actually in some obscure cases your
> change can cause us to compute it more than once I think.
> 
> I'd suggest we do something like create a helper function above this
> code in ip_options.c that checks whether spec_dst is INADDR_ANY or
> not, to guard computing it multiple times.
> 
> Could you put together a quick patch like that?

Sure I'll do that.

^ permalink raw reply

* Re: AF_BUS socket address family
From: Linus Walleij @ 2012-07-05  7:59 UTC (permalink / raw)
  To: Vincent Sanders
  Cc: netdev, linux-kernel, David S. Miller, Arve Hjønnevåg,
	Daniel Walker, John Stultz, Anton Vorontsov, Greg Kroah-Hartman
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>

2012/6/29 Vincent Sanders <vincent.sanders@collabora.co.uk>:

> AF_BUS is a message oriented inter process communication system.

We have a huge and important in-kernel IPC message passer
in drivers/staging/android/binder.c

It's deployed in some 400 million devices according to latest reports.
John Stultz & Anton Vorontsov are trying to look after these Android
drivers a bit...

I and others discussed this in the past with the Android folks. Dianne
makes an excellent summary of how it works here:
https://lkml.org/lkml/2009/6/25/3

If we could all be convinced that this thing also fulfills the needs
of what binder does, this is a pretty solid case for it too. I can
sure see that some of the shortcuts that Android is taking with
binder try to address the same issue of high-speed IPC loopholes
through the kernel and some kind of security model.

Whether Android would actually use it (or wrap it) is a totally
different question, but what I think we need to know is whether it
*could*. And staging code has to move forward, maybe this
is the direction it should move?

Yours,
Linus Walleij

^ permalink raw reply

* Re: [PATCH] ipv4: Create and use fib_compute_spec_dst() helper.
From: David Miller @ 2012-07-05  7:59 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ja, netdev
In-Reply-To: <1341474745.2583.3325.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Jul 2012 09:52:25 +0200

> [PATCH] ipv4: defer fib_compute_spec_dst() call
> 
> ip_options_compile() can avoid calling fib_compute_spec_dst()
> by default, and perform the call if needed.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Yes, this is a great idea.  Actually in some obscure cases your
change can cause us to compute it more than once I think.

I'd suggest we do something like create a helper function above this
code in ip_options.c that checks whether spec_dst is INADDR_ANY or
not, to guard computing it multiple times.

Could you put together a quick patch like that?
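
The guard suggested here is a compute-once pattern keyed on a sentinel value.
A minimal userspace sketch, assuming hypothetical names: compute_spec_dst()
stands in for fib_compute_spec_dst(), ADDR_ANY for htonl(INADDR_ANY), and
compute_calls exists only to show how often the slow path runs.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_ANY 0u	/* stand-in for htonl(INADDR_ANY) */

static int compute_calls;	/* counts how often the slow path runs */

/* Hypothetical stand-in for fib_compute_spec_dst(); returns a fixed address. */
static uint32_t compute_spec_dst(void)
{
	compute_calls++;
	return 0xc0a80067u;	/* arbitrary non-zero value for the sketch */
}

/* Fill *spec_dst on first use only; later calls find it already set. */
static void spec_dst_fill(uint32_t *spec_dst)
{
	assert(spec_dst);
	if (*spec_dst == ADDR_ANY)
		*spec_dst = compute_spec_dst();
}
```

One property of this pattern worth noting: if the computed value were itself
equal to the sentinel, every call would recompute it. The result would still
be correct, only without the saving.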

^ permalink raw reply

* Re: [PATCH] ipv4: Create and use fib_compute_spec_dst() helper.
From: Eric Dumazet @ 2012-07-05  7:52 UTC (permalink / raw)
  To: David Miller; +Cc: ja, netdev
In-Reply-To: <20120704.161335.1503971699878518173.davem@davemloft.net>

On Wed, 2012-07-04 at 16:13 -0700, David Miller wrote:

> ====================
> ipv4: Fix crashes in ip_options_compile().
> 
> The spec_dst uses should be guarded by skb_rtable() being non-NULL
> not just the SKB being non-null.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---

Seems good to me thanks.

By the way, maybe we can defer fib_compute_spec_dst() call to the point
we really need it ?

[PATCH] ipv4: defer fib_compute_spec_dst() call

ip_options_compile() can avoid calling fib_compute_spec_dst()
by default, and perform the call if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 1f02251..54ab83f 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -260,8 +260,6 @@ int ip_options_compile(struct net *net,
 
 	if (skb != NULL) {
 		rt = skb_rtable(skb);
-		if (rt)
-			spec_dst = fib_compute_spec_dst(skb);
 		optptr = (unsigned char *)&(ip_hdr(skb)[1]);
 	} else
 		optptr = opt->__data;
@@ -334,6 +332,7 @@ int ip_options_compile(struct net *net,
 					goto error;
 				}
 				if (rt) {
+					spec_dst = fib_compute_spec_dst(skb);
 					memcpy(&optptr[optptr[2]-1], &spec_dst, 4);
 					opt->is_changed = 1;
 				}
@@ -376,6 +375,7 @@ int ip_options_compile(struct net *net,
 					}
 					opt->ts = optptr - iph;
 					if (rt)  {
+						spec_dst = fib_compute_spec_dst(skb);
 						memcpy(&optptr[optptr[2]-1], &spec_dst, 4);
 						timeptr = &optptr[optptr[2]+3];
 					}

^ permalink raw reply related

* Re: TCP transmit performance regression
From: Eric Dumazet @ 2012-07-05  7:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller
In-Reply-To: <CACVXFVNM-Db=_793SVfRj+nxGtNG0pRFrwc_F9TGbU0FfES63A@mail.gmail.com>

On Thu, 2012-07-05 at 09:45 +0800, Ming Lei wrote:
> Hi,
> 
> I observed that on both 3.5-rc5 and 3.5-rc5-next, TCP transmit performance
> degrades a lot, see my below simple test:
> 
> 1, test box
> NIC: 100M USB, normally can reach > 90Mbits/sec
> 

What was the last "OK" kernel version ?

What NIC driver is it ?

> 2, run below command on the box:
> [root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
> ------------------------------------------------------------
> Client connecting to 192.168.0.103, TCP port 5001
> TCP window size:   256 KByte (WARNING: requested   128 KByte)
> ------------------------------------------------------------
> [  3] local 192.168.0.108 port 59315 connected with 192.168.0.103 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  40.4 MBytes  33.9 Mbits/sec
> 
> note: 192.168.0.103 is another production machine running 'iperf -s -w 131072'
> 
> 3, from traffic captured in wireshark, the window size of most of tcp packets
> from the test box to 192.168.0.103 is set as 229, looks very weird and should
> be the cause of performance regression.
> 

Packets sent to 192.168.0.103 announce the window suitable for packets
in the other way, so not relevant to your problem.

Could you do

# tcpdump -i eth0 -s 100 -c 1000 -w tcp.pcap host 192.168.0.103 &
# iperf -c 192.168.0.103 -w 131072 -t 10

and post the tcp.pcap file ?

By the way, if you remove -w 131072 (on both sides), I guess throughput
will increase.

^ permalink raw reply

* Re: [PATCH net 3/7] qlge: Garbage values shown in extra info during selftest.
From: David Miller @ 2012-07-05  7:23 UTC (permalink / raw)
  To: jitendra.kalsaria; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1341272514-5156-4-git-send-email-jitendra.kalsaria@qlogic.com>

Why are you posting an arbitrary patch from a patch series,
yet not the rest of that series?

This needs to be sent alongside the rest of the series.

^ permalink raw reply

* RE: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable plug on Acer Aspire One, no network
From: Marek Szyprowski @ 2012-07-05  6:58 UTC (permalink / raw)
  To: 'Alex Villacís Lasso', 'Francois Romieu',
	netdev
In-Reply-To: <4FF514B2.4050000@palosanto.com>

Hello,

On Thursday, July 05, 2012 6:15 AM Alex Villacís Lasso wrote:

> On 04/07/12 02:02, Marek Szyprowski wrote:
> > Hello,
> >
> > On Tuesday, July 03, 2012 4:27 PM Alex Villacís Lasso wrote:
> >
> >> On 03/07/12 00:40, Marek Szyprowski wrote:
> >>> Hi Alex,
> >>>
> >>> On Tuesday, July 03, 2012 4:45 AM Alex Villacís Lasso wrote:
> >>>
> >>>> -------- Original message --------
> >>>> Subject: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable
> >>>>          plug on Acer Aspire One, no network
> >>>> Date: Mon, 02 Jul 2012 21:33:41 -0500
> >>>> From: Alex Villacís Lasso <a_villacis@palosanto.com>
> >>>> To: Francois Romieu <romieu@fr.zoreil.com>
> >>>> CC: netdev@vger.kernel.org
> >>>> On 01/07/12 08:50, Alex Villacís Lasso wrote:
> >>>>> On 11/06/12 16:38, Francois Romieu wrote:
> >>>>>> Alex Villacís Lasso <a_villacis@palosanto.com> :
> >>>>>> [...]
> >>>>>>> $ grep XID dmesg-3.5.0-rc2.txt
> >>>>>>> [   15.873858] r8169 0000:02:00.0: eth0: RTL8102e at 0xf7c0e000,
> >>>>>>> 00:1e:68:e5:5d:b1, XID 04a00000 IRQ 44
> >>>>>> The 8102e has not been touched by that many suspect patches but I do
> >>>>>> not see where the problem is :o(
> >>>>>>
> >>>>>> Can you peel off the r8169 patches between 3.4.0 and 3.5-rc ?
> >>>>>>
> >>>>> Still present in 3.5-rc5. Bisection still in progress.
> >>>>>
> >>>>> --
> >>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>> My full bisection points to this commit:
> >>>>
> >>>> commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6
> >>>> Author: Marek Szyprowski <m.szyprowski@samsung.com>
> >>>> Date:   Thu Dec 29 13:09:51 2011 +0100
> >>>>
> >>>>       X86: integrate CMA with DMA-mapping subsystem
> >>>>
> >>>>       This patch adds support for CMA to dma-mapping subsystem for x86
> >>>>       architecture that uses common pci-dma/pci-nommu implementation. This
> >>>>       allows to test CMA on KVM/QEMU and a lot of common x86 boxes.
> >>>>
> >>>>       Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> >>>>       Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> >>>>       CC: Michal Nazarewicz <mina86@mina86.com>
> >>>>       Acked-by: Arnd Bergmann <arnd@arndb.de>
> >>>>
> >>>> Is this commit somehow messing with the network card DMA?
> >>> This commit in fact touches DMA-mapping subsystem and introduces a bug,
> >>> which has been finally fixed by commit c080e26edc3a2a3 merged to v3.5-rc3.
> >>> After applying it the DMA-mapping subsystem should work exactly the same way
> >>> as in v3.4. Could you please check if it fixes this issue?
> >>>
> >>> Best regards
> >> No. It still fails in 3.5-rc5, as mentioned before.
> > Hmm. I was a bit confused, because both the subject and git bisect log pointed to v3.5-rc2,
> > which had that bug. Maybe there is some other issue present in v3.5-rc5 not related to
> > my patches?
> >
> > Could you check with v3.5-rc5 if reverting patch c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae
> > and 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6 fixes the problems with rtl driver?
> >
> > Best regards
> Reverting the two patches indeed fixes the bug on -rc5.

That's really strange. Could you check if you have CMA disabled in the config? After preparing
a c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae fixup patch, I was really convinced that there are
no functional changes in x86 dma mapping code when CMA is disabled. I will provide some 
patches to revert different parts of my changes, so we will find which line causes issues.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center

^ permalink raw reply

* Re: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable plug on Acer Aspire One, no network
From: Alex Villacís Lasso @ 2012-07-05  4:14 UTC (permalink / raw)
  To: Marek Szyprowski, Francois Romieu, netdev
In-Reply-To: <000901cd59b2$f2a542e0$d7efc8a0$%szyprowski@samsung.com>

On 04/07/12 02:02, Marek Szyprowski wrote:
> Hello,
>
> On Tuesday, July 03, 2012 4:27 PM Alex Villacís Lasso wrote:
>
>> On 03/07/12 00:40, Marek Szyprowski wrote:
>>> Hi Alex,
>>>
>>> On Tuesday, July 03, 2012 4:45 AM Alex Villacís Lasso wrote:
>>>
>>>> -------- Original message --------
>>>> Subject: BISECTED: Re: REGRESSION: 3.4.0->3.5.0-rc2 kernel WARNING on cable
>>>>          plug on Acer Aspire One, no network
>>>> Date: Mon, 02 Jul 2012 21:33:41 -0500
>>>> From: Alex Villacís Lasso <a_villacis@palosanto.com>
>>>> To: Francois Romieu <romieu@fr.zoreil.com>
>>>> CC: netdev@vger.kernel.org
>>>> On 01/07/12 08:50, Alex Villacís Lasso wrote:
>>>>> On 11/06/12 16:38, Francois Romieu wrote:
>>>>>> Alex Villacís Lasso <a_villacis@palosanto.com> :
>>>>>> [...]
>>>>>>> $ grep XID dmesg-3.5.0-rc2.txt
>>>>>>> [   15.873858] r8169 0000:02:00.0: eth0: RTL8102e at 0xf7c0e000,
>>>>>>> 00:1e:68:e5:5d:b1, XID 04a00000 IRQ 44
>>>>>> The 8102e has not been touched by that many suspect patches but I do
>>>>>> not see where the problem is :o(
>>>>>>
>>>>>> Can you peel off the r8169 patches between 3.4.0 and 3.5-rc ?
>>>>>>
>>>>> Still present in 3.5-rc5. Bisection still in progress.
>>>>>
>>>> My full bisection points to this commit:
>>>>
>>>> commit 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6
>>>> Author: Marek Szyprowski <m.szyprowski@samsung.com>
>>>> Date:   Thu Dec 29 13:09:51 2011 +0100
>>>>
>>>>       X86: integrate CMA with DMA-mapping subsystem
>>>>
>>>>       This patch adds support for CMA to dma-mapping subsystem for x86
>>>>       architecture that uses common pci-dma/pci-nommu implementation. This
>>>>       allows to test CMA on KVM/QEMU and a lot of common x86 boxes.
>>>>
>>>>       Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>       Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>>>       CC: Michal Nazarewicz <mina86@mina86.com>
>>>>       Acked-by: Arnd Bergmann <arnd@arndb.de>
>>>>
>>>> Is this commit somehow messing with the network card DMA?
>>> This commit in fact touches DMA-mapping subsystem and introduces a bug,
>>> which has been finally fixed by commit c080e26edc3a2a3 merged to v3.5-rc3.
>>> After applying it the DMA-mapping subsystem should work exactly the same way
>>> as in v3.4. Could you please check if it fixes this issue?
>>>
>>> Best regards
>> No. It still fails in 3.5-rc5, as mentioned before.
> Hmm. I was a bit confused, because both the subject and git bisect log pointed to v3.5-rc2,
> which had that bug. Maybe there is some other issue present in v3.5-rc5 not related to
> my patches?
>
> Could you check with v3.5-rc5 if reverting patch c080e26edc3a2a3cdfa4c430c663ee1c3bbd8fae
> and 0a2b9a6ea93650b8a00f9fd5ee8fdd25671e2df6 fixes the problems with rtl driver?
>
> Best regards
Reverting the two patches indeed fixes the bug on -rc5.

^ permalink raw reply

* RE: [PATCH 1/2] be2net: Fix Endian
From: Somnath.Kotur @ 2012-07-05  4:00 UTC (permalink / raw)
  To: roy.qing.li, netdev
In-Reply-To: <1341453942-4198-1-git-send-email-roy.qing.li@gmail.com>



> -----Original Message-----
> From: roy.qing.li@gmail.com [mailto:roy.qing.li@gmail.com]
> Sent: Thursday, July 05, 2012 7:36 AM
> To: netdev@vger.kernel.org
> Cc: Kotur, Somnath
> Subject: [PATCH 1/2] be2net: Fix Endian
> 
> From: Li RongQing <roy.qing.li@gmail.com>
> 
> ETH_P_IP is in host endianness while skb->protocol is big endian, so when
> comparing them we should convert ETH_P_IP from host endian to big endian
> with htons, not ntohs.
> 
> CC: Somnath Kotur <somnath.kotur@emulex.com>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Oops!  Unintended...Thanks! 
Acked-by: Somnath Kotur <somnath.kotur@emulex.com>
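
For readers following along, the comparison the patch description asks for
can be sketched in userspace roughly as below. This is not the be2net code:
sketch_is_ipv4() and SKETCH_ETH_P_IP are hypothetical names made up for the
illustration, and protocol_be merely stands in for skb->protocol.

```c
#include <arpa/inet.h>  /* htons() */
#include <stdbool.h>
#include <stdint.h>

/*
 * Same numeric value as the kernel's ETH_P_IP; redefined under a
 * hypothetical name only so this sketch builds outside the kernel tree.
 */
#define SKETCH_ETH_P_IP 0x0800

/*
 * Hypothetical helper: protocol_be stands in for skb->protocol, which
 * carries the EtherType in network (big-endian) byte order.
 */
static bool sketch_is_ipv4(uint16_t protocol_be)
{
	/* Convert the host-order constant to network order, then compare. */
	return protocol_be == htons(SKETCH_ETH_P_IP);
}
```

Since htons() and ntohs() perform the same byte swap (or none at all) on any
given machine, the old code would have compared the same values. The change
is about stating the right direction of conversion, which is presumably also
what lets sparse's __be16 checking pass in the kernel.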

^ permalink raw reply

* Re: [PATCH 4 2/4] NET ethernet introduce mac_platform helper
From: Joe Perches @ 2012-07-05  3:25 UTC (permalink / raw)
  To: Andy Green
  Cc: linux-omap, s-jan, arnd, patches, tony, netdev, linux-kernel,
	rostedt, linux-arm-kernel
In-Reply-To: <4FF507FF.3000604@linaro.org>

On Thu, 2012-07-05 at 11:20 +0800, Andy Green wrote:
> On 05/07/12 11:12, the mail apparently from Joe Perches included:
[]
> >> diff --git a/net/ethernet/mac-platform.c b/net/ethernet/mac-platform.c
> > []
> >> +static int mac_platform_netdev_event(struct notifier_block *this,
> >> +						unsigned long event, void *ptr)
> >
> > alignment to parenthesis please.
> 
> OK.  Although different places in the kernel seem to have different 
> expectations about that.

net and drivers/net are pretty consistent.
Most of the exceptions are old code.
Some of those exceptions are being slowly updated too.

cheers, Joe
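
A minimal sketch of the requested alignment, assuming the usual net/ and
drivers/net convention of tabs followed by spaces so that the continuation
line starts just after the opening parenthesis. The forward declaration,
SKETCH_NOTIFY_DONE and the stub body are stand-ins added only so the
fragment compiles on its own; they are not taken from the patch.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the kernel tree. */
struct notifier_block;
#define SKETCH_NOTIFY_DONE 0

/* Continuation line begins in the column right after the '('. */
static int mac_platform_netdev_event(struct notifier_block *this,
				     unsigned long event, void *ptr)
{
	/* Stub body: the real handler's logic is not shown in the thread. */
	(void)this;
	(void)event;
	(void)ptr;
	return SKETCH_NOTIFY_DONE;
}
```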

^ permalink raw reply
