public inbox for netdev@vger.kernel.org
* TCP transmit performance regression
@ 2012-07-05  1:45 Ming Lei
  2012-07-05  7:43 ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-05  1:45 UTC (permalink / raw)
  To: Network Development, David Miller

Hi,

I observed that on both 3.5-rc5 and 3.5-rc5-next, TCP transmit performance
degrades a lot, see my below simple test:

1, test box
NIC: 100M USB, normally can reach > 90Mbits/sec

2, run below command on the box:
[root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
------------------------------------------------------------
Client connecting to 192.168.0.103, TCP port 5001
TCP window size:   256 KByte (WARNING: requested   128 KByte)
------------------------------------------------------------
[  3] local 192.168.0.108 port 59315 connected with 192.168.0.103 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  40.4 MBytes  33.9 Mbits/sec

note: 192.168.0.103 is another production machine running 'iperf -s -w 131072'

3, from traffic captured in wireshark, the window size of most of tcp packets
from the test box to 192.168.0.103 is set as 229, looks very weird and should
be the cause of performance regression.

4, TCP receive performance is OK.


Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: TCP transmit performance regression
  2012-07-05  1:45 TCP transmit performance regression Ming Lei
@ 2012-07-05  7:43 ` Eric Dumazet
  2012-07-05  8:27   ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-05  7:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Thu, 2012-07-05 at 09:45 +0800, Ming Lei wrote:
> Hi,
> 
> I observed that on both 3.5-rc5 and 3.5-rc5-next, TCP transmit performance
> degrades a lot, see my below simple test:
> 
> 1, test box
> NIC: 100M USB, normally can reach > 90Mbits/sec
> 

What was the last "OK" kernel version ?

What NIC driver is it ?

> 2, run below command on the box:
> [root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
> ------------------------------------------------------------
> Client connecting to 192.168.0.103, TCP port 5001
> TCP window size:   256 KByte (WARNING: requested   128 KByte)
> ------------------------------------------------------------
> [  3] local 192.168.0.108 port 59315 connected with 192.168.0.103 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  40.4 MBytes  33.9 Mbits/sec
> 
> note: 192.168.0.103 is another production machine running 'iperf -s -w 131072'
> 
> 3, from traffic captured in wireshark, the window size of most of tcp packets
> from the test box to 192.168.0.103 is set as 229, looks very weird and should
> be the cause of performance regression.
> 

Packets sent to 192.168.0.103 announce the window suitable for packets
in the other way, so not relevant to your problem.

Could you do

# tcpdump -i eth0 -s 100 -c 1000 -w tcp.pcap host 192.168.0.103 &
# iperf -c 192.168.0.103 -w 131072 -t 10

and post the tcp.pcap file ?

By the way, if you remove -w 131072 (on both sides), I guess throughput
will increase.


* Re: TCP transmit performance regression
  2012-07-05  7:43 ` Eric Dumazet
@ 2012-07-05  8:27   ` Ming Lei
  2012-07-05  8:33     ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-05  8:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Thu, Jul 5, 2012 at 3:43 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 09:45 +0800, Ming Lei wrote:
>> Hi,
>>
>> I observed that on both 3.5-rc5 and 3.5-rc5-next, TCP transmit performance
>> degrades a lot, see my below simple test:
>>
>> 1, test box
>> NIC: 100M USB, normally can reach > 90Mbits/sec
>>
>
> What was the last "OK" kernel version ?

After some investigation, the problem is caused by enabling
DEBUG_SLAB, so it is not a regression.

>
> What NIC driver is it ?
>
>> 2, run below command on the box:
>> [root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
>> ------------------------------------------------------------
>> Client connecting to 192.168.0.103, TCP port 5001
>> TCP window size:   256 KByte (WARNING: requested   128 KByte)
>> ------------------------------------------------------------
>> [  3] local 192.168.0.108 port 59315 connected with 192.168.0.103 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  40.4 MBytes  33.9 Mbits/sec
>>
>> note: 192.168.0.103 is another production machine running 'iperf -s -w 131072'
>>
>> 3, from traffic captured in wireshark, the window size of most of tcp packets
>> from the test box to 192.168.0.103 is set as 229, looks very weird and should
>> be the cause of performance regression.
>>
>
> Packets sent to 192.168.0.103 announce the window suitable for packets
> in the other way, so not relevant to your problem.
>
> Could you do
>
> # tcpdump -i eth0 -s 100 -c 1000 -w tcp.pcap host 192.168.0.103 &
> # iperf -c 192.168.0.103 -w 131072 -t 10
>
> and post the tcp.pcap file ?
>
> By the way, if you remove -w 131072 (on both sides), I guess throughput
> will increase.

Looks no improvement. I still don't know why the window size becomes so
small even in good situation(disabling DEBUG_SLAB), and the small
window size will cause almost every tcp data packet acked.


Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-05  8:27   ` Ming Lei
@ 2012-07-05  8:33     ` Eric Dumazet
  2012-07-05  8:42       ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-05  8:33 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:

> After some investigation, the problem is caused by enabling
> DEBUG_SLAB, so it is not a regression.
> 

Strange, unless your machine is a _very_ slow one maybe ?

> 
> Looks no improvement. I still don't know why the window size becomes so
> small even in good situation(disabling DEBUG_SLAB), and the small
> window size will cause almost every tcp data packet acked.

You are probably missing the fact that window scaling is enabled.

If you dont post a pcap, I am afraid we cant really help.
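
Note on window scaling: when window scaling is negotiated, the 16-bit
window field in each segment is left-shifted by the scale factor that
the sender announced in its SYN, so a raw value such as 229 does not by
itself mean a small window. A minimal sketch of that arithmetic in C,
assuming a shift of 7 (a hypothetical value; the real one has to be read
from the SYN/SYN-ACK in the capture). effective_window() is an
illustrative helper, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (not a kernel API): convert the raw 16-bit window
 * field of a TCP header into a window in bytes, given the window scale
 * shift announced in the sender's SYN. The shift is capped at 14. */
uint32_t effective_window(uint16_t raw_window, unsigned int wscale_shift)
{
	assert(wscale_shift <= 14);
	return (uint32_t)raw_window << wscale_shift;
}
```

Under the assumed shift of 7, effective_window(229, 7) is 29312 bytes.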


* Re: TCP transmit performance regression
  2012-07-05  8:33     ` Eric Dumazet
@ 2012-07-05  8:42       ` Ming Lei
  2012-07-05  9:49         ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-05  8:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

[-- Attachment #1: Type: text/plain, Size: 759 bytes --]

On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
>
>> After some investigation, the problem is caused by enabling
>> DEBUG_SLAB, so it is not a regression.
>>
>
> Strange, unless your machine is a _very_ slow one maybe ?

It is a beagle-xm board, and its cpu is ARMv7, 1GHz.

>
>>
>> Looks no improvement. I still don't know why the window size becomes so
>> small even in good situation(disabling DEBUG_SLAB), and the small
>> window size will cause almost every tcp data packet acked.
>
> You are probably missing the fact that window scaling is enabled.
>
> If you dont post a pcap, I am afraid we cant really help.

See attachment for the pcap trace.


Thanks,
-- 
Ming Lei

[-- Attachment #2: tcp.pcap --]
[-- Type: application/octet-stream, Size: 97922 bytes --]


* Re: TCP transmit performance regression
  2012-07-05  8:42       ` Ming Lei
@ 2012-07-05  9:49         ` Eric Dumazet
  2012-07-05 10:02           ` David Miller
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Eric Dumazet @ 2012-07-05  9:49 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Thu, 2012-07-05 at 16:42 +0800, Ming Lei wrote:
> On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
> >
> >> After some investigation, the problem is caused by enabling
> >> DEBUG_SLAB, so it is not a regression.
> >>
> >
> > Strange, unless your machine is a _very_ slow one maybe ?
> 
> It is a beagle-xm board, and its cpu is ARMv7, 1GHz.

OK, driver seems buggy, please try following patch (on both sides if
possible)

 drivers/net/usb/smsc95xx.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index b1112e7..0a4ae35 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1084,26 +1084,23 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 			if (skb->len == size) {
 				if (dev->net->features & NETIF_F_RXCSUM)
 					smsc95xx_rx_csum_offload(skb);
-				skb_trim(skb, skb->len - 4); /* remove fcs */
+				__skb_trim(skb, skb->len - 4); /* remove fcs */
 				skb->truesize = size + sizeof(struct sk_buff);
 
 				return 1;
 			}
 
-			ax_skb = skb_clone(skb, GFP_ATOMIC);
+			ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
 			if (unlikely(!ax_skb)) {
 				netdev_warn(dev->net, "Error allocating skb\n");
 				return 0;
 			}
 
-			ax_skb->len = size;
-			ax_skb->data = packet;
-			skb_set_tail_pointer(ax_skb, size);
+			memcpy(skb_put(ax_skb, size), packet, size);
 
 			if (dev->net->features & NETIF_F_RXCSUM)
 				smsc95xx_rx_csum_offload(ax_skb);
-			skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
-			ax_skb->truesize = size + sizeof(struct sk_buff);
+			__skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
 
 			usbnet_skb_return(dev, ax_skb);
 		}


* Re: TCP transmit performance regression
  2012-07-05  9:49         ` Eric Dumazet
@ 2012-07-05 10:02           ` David Miller
  2012-07-05 10:32           ` Ming Lei
  2012-07-09 13:23           ` Ming Lei
  2 siblings, 0 replies; 24+ messages in thread
From: David Miller @ 2012-07-05 10:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: tom.leiming, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 05 Jul 2012 11:49:20 +0200

> -			ax_skb->data = packet;

That's really scary.


* Re: TCP transmit performance regression
  2012-07-05  9:49         ` Eric Dumazet
  2012-07-05 10:02           ` David Miller
@ 2012-07-05 10:32           ` Ming Lei
  2012-07-05 10:41             ` Eric Dumazet
  2012-07-09 13:23           ` Ming Lei
  2 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-05 10:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Thu, Jul 5, 2012 at 5:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 16:42 +0800, Ming Lei wrote:
>> On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
>> >
>> >> After some investigation, the problem is caused by enabling
>> >> DEBUG_SLAB, so it is not a regression.
>> >>
>> >
>> > Strange, unless your machine is a _very_ slow one maybe ?
>>
>> It is a beagle-xm board, and its cpu is ARMv7, 1GHz.
>
> OK, driver seems buggy, please try following patch (on both sides if
> possible)
>
>  drivers/net/usb/smsc95xx.c |   11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index b1112e7..0a4ae35 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1084,26 +1084,23 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>                         if (skb->len == size) {
>                                 if (dev->net->features & NETIF_F_RXCSUM)
>                                         smsc95xx_rx_csum_offload(skb);
> -                               skb_trim(skb, skb->len - 4); /* remove fcs */
> +                               __skb_trim(skb, skb->len - 4); /* remove fcs */
>                                 skb->truesize = size + sizeof(struct sk_buff);
>
>                                 return 1;
>                         }
>
> -                       ax_skb = skb_clone(skb, GFP_ATOMIC);
> +                       ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
>                         if (unlikely(!ax_skb)) {
>                                 netdev_warn(dev->net, "Error allocating skb\n");
>                                 return 0;
>                         }
>
> -                       ax_skb->len = size;
> -                       ax_skb->data = packet;
> -                       skb_set_tail_pointer(ax_skb, size);
> +                       memcpy(skb_put(ax_skb, size), packet, size);
>
>                         if (dev->net->features & NETIF_F_RXCSUM)
>                                 smsc95xx_rx_csum_offload(ax_skb);
> -                       skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
> -                       ax_skb->truesize = size + sizeof(struct sk_buff);
> +                       __skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
>
>                         usbnet_skb_return(dev, ax_skb);
>                 }
>
>

After testing on beagle-xm, the patch is good and network is OK, but
iperf performance is still no improvement, see below:

[root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
------------------------------------------------------------
Client connecting to 192.168.0.103, TCP port 5001
TCP window size:   256 KByte (WARNING: requested   128 KByte)
------------------------------------------------------------
[  3] local 192.168.0.119 port 46776 connected with 192.168.0.103 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  41.4 MBytes  34.7 Mbits/sec



Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-05 10:32           ` Ming Lei
@ 2012-07-05 10:41             ` Eric Dumazet
  2012-07-05 14:01               ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-05 10:41 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Thu, 2012-07-05 at 18:32 +0800, Ming Lei wrote:

> After testing on beagle-xm, the patch is good and network is OK, but
> iperf performance is still no improvement, see below:
> 
> [root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
> ------------------------------------------------------------
> Client connecting to 192.168.0.103, TCP port 5001
> TCP window size:   256 KByte (WARNING: requested   128 KByte)
> ------------------------------------------------------------
> [  3] local 192.168.0.119 port 46776 connected with 192.168.0.103 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  41.4 MBytes  34.7 Mbits/sec
> 

I fear there are copies in the tx path as well, in smsc95xx_tx_fixup()

Could you add traces in this function to check if skb_copy_expand() is
called ?


* Re: TCP transmit performance regression
  2012-07-05 10:41             ` Eric Dumazet
@ 2012-07-05 14:01               ` Ming Lei
  2012-07-05 14:28                 ` Eric Dumazet
  2012-07-05 14:56                 ` Eric Dumazet
  0 siblings, 2 replies; 24+ messages in thread
From: Ming Lei @ 2012-07-05 14:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Thu, Jul 5, 2012 at 6:41 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 18:32 +0800, Ming Lei wrote:
>
>> After testing on beagle-xm, the patch is good and network is OK, but
>> iperf performance is still no improvement, see below:
>>
>> [root@root]#iperf -c 192.168.0.103 -w 131072 -t 10
>> ------------------------------------------------------------
>> Client connecting to 192.168.0.103, TCP port 5001
>> TCP window size:   256 KByte (WARNING: requested   128 KByte)
>> ------------------------------------------------------------
>> [  3] local 192.168.0.119 port 46776 connected with 192.168.0.103 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  41.4 MBytes  34.7 Mbits/sec
>>
>
> I fear there are copies in the tx path as well, in smsc95xx_tx_fixup()

Basically, copy path will be bypassed since hard_header_len
has included the 'overhead' already.

> (SLAB debug is going to cost a lot with big buffers)

At default SMSC95xx turbo mode is true, rx buffer will be very big
(17.5K). Or the large rx buffer size puts limit on concurrent URBs/SKBs
count.  Both two may cause the problem.

Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-05 14:01               ` Ming Lei
@ 2012-07-05 14:28                 ` Eric Dumazet
  2012-07-05 14:56                 ` Eric Dumazet
  1 sibling, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2012-07-05 14:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Thu, 2012-07-05 at 22:01 +0800, Ming Lei wrote:

> Basically, copy path will be bypassed since hard_header_len
> has included the 'overhead' already.

Unfortunately this is not done correctly.

needed_headroom should be set to SMSC95XX_TX_OVERHEAD_CSUM


* Re: TCP transmit performance regression
  2012-07-05 14:01               ` Ming Lei
  2012-07-05 14:28                 ` Eric Dumazet
@ 2012-07-05 14:56                 ` Eric Dumazet
  2012-07-06  0:45                   ` Ming Lei
  1 sibling, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-05 14:56 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Thu, 2012-07-05 at 22:01 +0800, Ming Lei wrote:

> At default SMSC95xx turbo mode is true, rx buffer will be very big
> (17.5K). Or the large rx buffer size puts limit on concurrent URBs/SKBs
> count.  Both two may cause the problem.

I see. So we should try to recycle these large rx buffers in usbnet
instead of allocating/freeing them for each incoming packet.

Following patch does the copybreak of all incoming frames.

It has nice property of not lying anymore on skb truesize ;)

It should be applied on both sender and receiver

 drivers/net/usb/smsc95xx.c |   19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index b1112e7..3d9566f 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1080,30 +1080,17 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 				return 0;
 			}
 
-			/* last frame in this batch */
-			if (skb->len == size) {
-				if (dev->net->features & NETIF_F_RXCSUM)
-					smsc95xx_rx_csum_offload(skb);
-				skb_trim(skb, skb->len - 4); /* remove fcs */
-				skb->truesize = size + sizeof(struct sk_buff);
-
-				return 1;
-			}
-
-			ax_skb = skb_clone(skb, GFP_ATOMIC);
+			ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
 			if (unlikely(!ax_skb)) {
 				netdev_warn(dev->net, "Error allocating skb\n");
 				return 0;
 			}
 
-			ax_skb->len = size;
-			ax_skb->data = packet;
-			skb_set_tail_pointer(ax_skb, size);
+			memcpy(skb_put(ax_skb, size), packet, size);
 
 			if (dev->net->features & NETIF_F_RXCSUM)
 				smsc95xx_rx_csum_offload(ax_skb);
-			skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
-			ax_skb->truesize = size + sizeof(struct sk_buff);
+			__skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
 
 			usbnet_skb_return(dev, ax_skb);
 		}
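
Note on copybreak: the idea in the patch above can be sketched outside
the kernel. Rather than passing a reference into the large URB buffer up
the stack, each frame is copied into a buffer sized for that frame and
the 4-byte FCS is trimmed. A minimal userspace sketch in C, assuming a
frame is given as a pointer plus length; struct frame and
copybreak_frame() are hypothetical names, and malloc() stands in for
netdev_alloc_skb_ip_align():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FCS_LEN 4 /* trailing Ethernet frame check sequence */

/* Hypothetical stand-in for a right-sized skb. */
struct frame {
	uint8_t *data;
	size_t len;      /* payload length after the FCS is trimmed */
	size_t capacity; /* bytes actually allocated for this frame */
};

/* Copy one received frame out of the big rx buffer into its own
 * allocation, then drop the FCS. Returns 0 on success, -1 on failure. */
int copybreak_frame(const uint8_t *packet, size_t size, struct frame *out)
{
	assert(packet && out);

	if (size < FCS_LEN)
		return -1;

	out->data = malloc(size);
	if (!out->data)
		return -1;

	memcpy(out->data, packet, size);
	out->len = size - FCS_LEN;
	out->capacity = size;
	return 0;
}
```

In this sketch the memory held per frame is its own capacity, not the
size of the whole rx buffer, which is why the copy can pay for itself.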


* Re: TCP transmit performance regression
  2012-07-05 14:56                 ` Eric Dumazet
@ 2012-07-06  0:45                   ` Ming Lei
  2012-07-06  4:58                     ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-06  0:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Thu, Jul 5, 2012 at 10:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 22:01 +0800, Ming Lei wrote:
>
>> At default SMSC95xx turbo mode is true, rx buffer will be very big
>> (17.5K). Or the large rx buffer size puts limit on concurrent URBs/SKBs
>> count.  Both two may cause the problem.
>
> I see. So we should try to recycle these large rx buffers in usbnet
> instead of allocating/freeing them for each incoming packet.
>
> Following patch does the copybreak of all incoming frames.
>
> It has nice property of not lying anymore on skb truesize ;)
>
> It should be applied on both sender and receiver

In fact, I ran the command below on the beagle-xm test box with the
SMSC95xx NIC:

             iperf -c 192.168.0.103 -w 131072 -t 10

and ran the command below on an x86 production machine (e1000e NIC)
running Ubuntu 12.04:

            iperf -s -w 131072

The current problem is that transmit performance on the beagle-xm is
poor in the above iperf test if DEBUG_SLAB is enabled. But if I set
dev->rx_urb_size to 2048, transmit performance doubles, so it looks
like the problem is caused by the large rx buffer.

>
>  drivers/net/usb/smsc95xx.c |   19 +++----------------
>  1 file changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index b1112e7..3d9566f 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1080,30 +1080,17 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>                                 return 0;
>                         }
>
> -                       /* last frame in this batch */
> -                       if (skb->len == size) {
> -                               if (dev->net->features & NETIF_F_RXCSUM)
> -                                       smsc95xx_rx_csum_offload(skb);
> -                               skb_trim(skb, skb->len - 4); /* remove fcs */
> -                               skb->truesize = size + sizeof(struct sk_buff);
> -
> -                               return 1;
> -                       }
> -
> -                       ax_skb = skb_clone(skb, GFP_ATOMIC);
> +                       ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
>                         if (unlikely(!ax_skb)) {
>                                 netdev_warn(dev->net, "Error allocating skb\n");
>                                 return 0;
>                         }
>
> -                       ax_skb->len = size;
> -                       ax_skb->data = packet;
> -                       skb_set_tail_pointer(ax_skb, size);
> +                       memcpy(skb_put(ax_skb, size), packet, size);
>
>                         if (dev->net->features & NETIF_F_RXCSUM)
>                                 smsc95xx_rx_csum_offload(ax_skb);
> -                       skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
> -                       ax_skb->truesize = size + sizeof(struct sk_buff);
> +                       __skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
>
>                         usbnet_skb_return(dev, ax_skb);
>                 }
>
>

Unfortunately, the patch still hasn't any improvement on the transmit
performance of beagle-xm.

Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-06  0:45                   ` Ming Lei
@ 2012-07-06  4:58                     ` Eric Dumazet
  2012-07-06  5:16                       ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-06  4:58 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Fri, 2012-07-06 at 08:45 +0800, Ming Lei wrote:

> Unfortunately, the patch still hasn't any improvement on the transmit
> performance of beagle-xm.

Ah yes, I need to change usbnet as well to be able to fully recycle the
big skbs allocated in turbo mode.

Right now they are constantly allocated/freed and this sucks if SLAB
wants to check poison bytes in debug mode.

Thanks


* Re: TCP transmit performance regression
  2012-07-06  4:58                     ` Eric Dumazet
@ 2012-07-06  5:16                       ` Eric Dumazet
  2012-07-09  5:13                         ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-06  5:16 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Fri, 2012-07-06 at 06:58 +0200, Eric Dumazet wrote:
> On Fri, 2012-07-06 at 08:45 +0800, Ming Lei wrote:
> 
> > Unfortunately, the patch still hasn't any improvement on the transmit
> > performance of beagle-xm.
> 
> Ah yes, I need to change usbnet as well to be able to fully recycle the
> big skbs allocated in turbo mode.
> 
> Right now they are constantly allocated/freed and this sucks if SLAB
> wants to check poison bytes in debug mode.

In the mean time, you also can use the following patch I have to polish,
but this should give you a nice boost, since the big skb skb->head wont
be checked by SLAB debug :



diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..d31efa2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -296,9 +296,18 @@ EXPORT_SYMBOL(build_skb);
 struct netdev_alloc_cache {
 	struct page *page;
 	unsigned int offset;
+	unsigned int pagecnt_bias;
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
 
+#if PAGE_SIZE > 32768
+#define MAX_NETDEV_FRAGSIZE	PAGE_SIZE
+#else
+#define MAX_NETDEV_FRAGSIZE	32768
+#endif
+
+#define NETDEV_PAGECNT_BIAS	(MAX_NETDEV_FRAGSIZE /		\
+				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 /**
  * netdev_alloc_frag - allocate a page fragment
  * @fragsz: fragment size
@@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
 	nc = &__get_cpu_var(netdev_alloc_cache);
 	if (unlikely(!nc->page)) {
 refill:
-		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
+				       get_order(MAX_NETDEV_FRAGSIZE));
+		if (unlikely(!nc->page))
+			goto end;
+recycle:
+		atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
+		nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
 		nc->offset = 0;
 	}
-	if (likely(nc->page)) {
-		if (nc->offset + fragsz > PAGE_SIZE) {
-			put_page(nc->page);
-			goto refill;
-		}
-		data = page_address(nc->page) + nc->offset;
-		nc->offset += fragsz;
-		get_page(nc->page);
+	if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
+		if (!atomic_sub_return(nc->pagecnt_bias,
+				       &nc->page->_count))
+			goto recycle;
+		goto refill;
 	}
+	data = page_address(nc->page) + nc->offset;
+	nc->offset += fragsz;
+	nc->pagecnt_bias--; /* avoid get_page()/get_page() false sharing */
+end:
 	local_irq_restore(flags);
 	return data;
 }
@@ -353,7 +369,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
+	if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
 		void *data = netdev_alloc_frag(fragsz);
 
 		if (likely(data)) {


* Re: TCP transmit performance regression
  2012-07-06  5:16                       ` Eric Dumazet
@ 2012-07-09  5:13                         ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2012-07-09  5:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Fri, Jul 6, 2012 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-07-06 at 06:58 +0200, Eric Dumazet wrote:
>> On Fri, 2012-07-06 at 08:45 +0800, Ming Lei wrote:
>>
>> > Unfortunately, the patch still hasn't any improvement on the transmit
>> > performance of beagle-xm.
>>
>> Ah yes, I need to change usbnet as well to be able to fully recycle the
>> big skbs allocated in turbo mode.
>>
>> Right now they are constantly allocated/freed and this sucks if SLAB
>> wants to check poison bytes in debug mode.
>
> In the mean time, you also can use the following patch I have to polish,
> but this should give you a nice boost, since the big skb skb->head wont
> be checked by SLAB debug :

Unfortunately, the patch makes the result of the same test worse than
without the patch. :-(


Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-05  9:49         ` Eric Dumazet
  2012-07-05 10:02           ` David Miller
  2012-07-05 10:32           ` Ming Lei
@ 2012-07-09 13:23           ` Ming Lei
  2012-07-09 13:54             ` Eric Dumazet
  2 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-09 13:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Thu, Jul 5, 2012 at 5:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-07-05 at 16:42 +0800, Ming Lei wrote:
>> On Thu, Jul 5, 2012 at 4:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2012-07-05 at 16:27 +0800, Ming Lei wrote:
>> >
>> >> After some investigation, the problem is caused by enabling
>> >> DEBUG_SLAB, so it is not a regression.
>> >>
>> >
>> > Strange, unless your machine is a _very_ slow one maybe ?
>>
>> It is a beagle-xm board, and its cpu is ARMv7, 1GHz.
>
> OK, driver seems buggy, please try following patch (on both sides if
> possible)
>
>  drivers/net/usb/smsc95xx.c |   11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
> index b1112e7..0a4ae35 100644
> --- a/drivers/net/usb/smsc95xx.c
> +++ b/drivers/net/usb/smsc95xx.c
> @@ -1084,26 +1084,23 @@ static int smsc95xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>                         if (skb->len == size) {
>                                 if (dev->net->features & NETIF_F_RXCSUM)
>                                         smsc95xx_rx_csum_offload(skb);
> -                               skb_trim(skb, skb->len - 4); /* remove fcs */
> +                               __skb_trim(skb, skb->len - 4); /* remove fcs */
>                                 skb->truesize = size + sizeof(struct sk_buff);
>
>                                 return 1;
>                         }
>
> -                       ax_skb = skb_clone(skb, GFP_ATOMIC);
> +                       ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
>                         if (unlikely(!ax_skb)) {
>                                 netdev_warn(dev->net, "Error allocating skb\n");
>                                 return 0;
>                         }
>
> -                       ax_skb->len = size;
> -                       ax_skb->data = packet;
> -                       skb_set_tail_pointer(ax_skb, size);
> +                       memcpy(skb_put(ax_skb, size), packet, size);
>
>                         if (dev->net->features & NETIF_F_RXCSUM)
>                                 smsc95xx_rx_csum_offload(ax_skb);
> -                       skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
> -                       ax_skb->truesize = size + sizeof(struct sk_buff);
> +                       __skb_trim(ax_skb, ax_skb->len - 4); /* remove fcs */
>
>                         usbnet_skb_return(dev, ax_skb);
>                 }
>
>

Looks the patch replaces skb_clone with netdev_alloc_skb_ip_align and
introduces extra copies on incoming data, so would you mind explaining
it in a bit detail? And why is skb_clone not OK for the purpose?


Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-09 13:23           ` Ming Lei
@ 2012-07-09 13:54             ` Eric Dumazet
       [not found]               ` <CACVXFVNdiwVn1Mo--N4N0HdYrEJizExtd_cppT4tS=mjog2PKw@mail.gmail.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-09 13:54 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Mon, 2012-07-09 at 21:23 +0800, Ming Lei wrote:

> It looks like the patch replaces skb_clone with netdev_alloc_skb_ip_align and
> introduces an extra copy of the incoming data, so would you mind explaining
> it in a bit more detail? And why is skb_clone not OK for this purpose?

The problem with cloning is that some paths will have to make a private copy
of the skb.

So you don't see the cost here in the driver, but later in the upper stacks.

Since this driver defaults to a huge RX area of more than 16 Kbytes,
a copy to a much smaller skb (we call this 'copybreak' in our jargon)
is more than welcome to avoid OOM problems anyway.

TCP coalescing (skb_try_coalesce), for example, won't work for cloned skbs,
so the TCP receive window will close pretty fast, and performance sucks in
lossy environments (like the Internet).

Actually, since this driver lies about skb->truesize, a single UDP frame
consumes 32 Kbytes of memory, escaping the normal memory limits we have in
the kernel by a factor of 64. That's pretty bad, especially for a beagle
board.
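
The copybreak idea described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not the driver code: struct
frame and copybreak_frame() are hypothetical names, and malloc()/memcpy()
stand in for the netdev_alloc_skb_ip_align() + memcpy() pair used in the
patch. The point is that each frame gets its own right-sized buffer, so the
large RX buffer no longer has to stay alive for as long as any one frame is
queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one received frame; not a kernel type. */
struct frame {
	unsigned char *data;	/* private, right-sized copy of the frame */
	size_t len;		/* frame length in bytes */
};

/*
 * Copy 'len' bytes starting at 'packet' (a position inside a large RX
 * buffer) into a freshly allocated buffer of exactly that size.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int copybreak_frame(struct frame *out, const unsigned char *packet,
			   size_t len)
{
	assert(out != NULL && packet != NULL && len > 0);

	out->data = malloc(len);
	if (!out->data)
		return -1;

	memcpy(out->data, packet, len);
	out->len = len;
	return 0;
}
```

After copybreak_frame() succeeds, the caller can free or reuse the large RX
buffer immediately; the copied frame owns only 'len' bytes.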


* Re: TCP transmit performance regression
       [not found]                 ` <1341895143.3265.4049.camel@edumazet-glaptop>
@ 2012-07-10  7:22                   ` Ming Lei
  2012-07-10  8:28                     ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-10  7:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Tue, Jul 10, 2012 at 12:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Please dont send private messages for discussing general linux stuff.
>
> Next time I wont reply.
>
> On Tue, 2012-07-10 at 12:00 +0800, Ming Lei wrote:
>> On Mon, Jul 9, 2012 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2012-07-09 at 21:23 +0800, Ming Lei wrote:
>> >
>> >> It looks like the patch replaces skb_clone with netdev_alloc_skb_ip_align and
>> >> introduces an extra copy of the incoming data, so would you mind explaining
>> >> it in a bit more detail? And why is skb_clone not OK for this purpose?
>> >
>> > Problem with cloning is that some paths will have to make a private copy
>> > of the skb.
>>
>> Looks you convert some private copy into all copy in rx path, :-)
>
> For small speed device, a copy is probably unnoticed.

The copy still has some effect on low-speed devices; for example, your recent
patch to the asix driver can improve tx performance from ~75M to ~92M.

>
> rtl8169 does that (copybreak) for security issues on Gbps link speed,
> and I get Gbps link speed on an old AMD host with no problem.
>
> As you discovered, the slowdown comes from SLAB debug on the 30K huge
> skb. To recover from this we must patch usbnet to not constantly
> allocate/free such big RX skb but recycle them. Once we do that, you'll
> find out that copybreak improves general performance on low ram devices
> by an order of magnitude.

It looks like your copybreak patch doesn't improve tx performance on smsc95xx.

>> >
>> > So you dont see the cost here in the driver, but later in upper stacks.
>> >
>> > Since this driver defaults to a huge RX area of more than 16Kbytes,
>> > a copy to a much smaller skb (we call this 'copybreak' in our jargon )
>> > is more than welcome to avoid OOM problems anyway.
>>
>> Looks 'memory compaction' has been implemented already to address
>> the big buffer allocation problem.
>
> Usually its too late (not enough ram to perform the compaction), and
> a collapse having to compact 3MB is very expensive and blows cpu caches.
>
> I noticed that on machines with 1GB or 2GB ram. These machines are
> called ChromeBooks and every lost network frame is analyzed in Google.
> And we had problems because some wifi adapters use 8KB skbs for incoming
> frames.

Kernel stack size is 8KB or more, so did you also see process creation
failures on your ChromeBook machines at the same time?

> (Not even 32KB !!! This is just crazy !!)
>
> Relying on TCP collapsing is just very lazy. What about other
> protocols ?
>
> I guess that on beagle this can happen very fast.

Previously I only found that there were usbnet OOMs triggered by
kmalloc(GFP_ATOMIC), while kmalloc(GFP_KERNEL) could succeed.
Some time later, the problem disappeared.

>>
>> Also the allocated huge RX SKB buffer will be freed after all cloned buffers
>> are consumed, so I still don't know what is the real problem with cloned buffer.
>>
>
> IF they are consumed.
>
> But IF they arent because application is not fast enough to drain, you
> end with sockets storing huge amount of data in their receive buffer.
>
> So a single 100 bytes payload holds the 32KB block.
>
> If you allowed your UDP socket to store 130,000 bytes of payload, you
> can consume 1,300 * 32KB = ~40 MB

It looks like that is one advantage of copybreak.
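
The arithmetic in the quoted paragraph can be checked with a small userspace
sketch. It assumes the worst case in which every queued datagram pins its own
32KB block, and it uses the 100-byte payload mentioned above; the function
names are hypothetical. With 100-byte payloads, 130,000 bytes of payload is
1,300 datagrams, and 1,300 blocks of 32KB is roughly 40 MB.

```c
#include <assert.h>
#include <stddef.h>

/* Number of datagrams needed to carry 'payload_total' bytes, rounded up. */
static size_t frames_for_payload(size_t payload_total, size_t payload_per_frame)
{
	assert(payload_per_frame > 0);
	return (payload_total + payload_per_frame - 1) / payload_per_frame;
}

/*
 * Memory actually held if each queued datagram keeps one whole block of
 * 'block_size' bytes alive (worst case: one datagram per block).
 */
static size_t pinned_bytes(size_t payload_total, size_t payload_per_frame,
			   size_t block_size)
{
	return frames_for_payload(payload_total, payload_per_frame) * block_size;
}
```

For example, pinned_bytes(130000, 100, 32768) evaluates to 42598400 bytes,
which is about 40 MB held for 130,000 bytes of useful payload.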

>
>
>> >
>> > TCP coalescing (skb_try_coalesce) for example wont work for cloned skbs,
>> > so TCP receive window will close pretty fast, and performance sucks in
>> > lossy environments (like the Internet)
>>
>> I didn't observe the above thing, so could you provide a way to reproduce it?
>>
>
> netstat -s can show you interesting TCP counters. But as driver lies on
> skb->truesize, you can also have unexpected crashes with malicious
> senders. With a 64 ratio, its easy to consume all ram.
>
> TCP coalescing is great as soon as you have Out Of Order queueing
> because of packet losses. You avoid expensive collapses and
> dropping/purge of OFO queue. Sender has to resend previously sent data.
>
>> Suppose the above is true, looks skb_clone is useless, isn't it?
>
> cloning has some uses, for example if you dont need to touch packet
> content, only mess with skb->data, skb->len, skb->tail.
>
> But if you need to change a single bit in the payload, or play with skb
> fragments (struct skb_shared_info), you have to make a full copy of the
> 30KB buffer, even if the skb contained only 10 bytes of payload.

So netdev_alloc_skb_ip_align() can be replaced with skb_clone()
in the asix driver, since no bits are touched in asix_rx_fixup? The default MTU is
1500 and rx_urb_size is 2048.

If so, could we use copybreak only when rx_urb_size > 4096?
And for ax88172, dev->rx_urb_size is always 2048, so it looks like the copy
is not needed at all.

> I would just switch off turbo mode by default, I doubt it has any
> advantage.

At least for smsc95xx, I think the feature is not worth a 32K buffer.

>
> Coalescing up to 16K of incoming frames adds latency for no performance
> gain, once you do it the right way (that is without OOM risks).
> Currently, skb->truesize lie is very bad.
>



Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-10  7:22                   ` Ming Lei
@ 2012-07-10  8:28                     ` Eric Dumazet
  2012-07-10 13:37                       ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-10  8:28 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Tue, 2012-07-10 at 15:22 +0800, Ming Lei wrote:

> Kernel stack size is 8KB or more, so could you find process creation failure
> in your ChromeBooks machine at the same time?

I believe you are mixing up a lot of things.

Have you ever heard of socket limits?

All available RAM on a machine is not for whoever wants it, thank God.

No: the TCP stack was dropping frames because of socket limits.

Only because skbs were fat (8KB allocated/truesize, for a single
1500-byte frame).

If the application is fast and reads skbs as soon as they arrive, no problem
is detected.

But if the application is slow, or a TCP packet is lost on the network,
many packets are queued into the ofo queue. And eventually not enough room
is available -> we drop incoming frames, and the sender has to retransmit
them.

So instead of loading your web pages as fast as possible, you have to
wait for retransmits.

So you see nothing at all: no kernel logs, no failed memory allocations.

It is only slower than necessary.
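
A rough way to see why fat skbs hit the socket limit early is to divide the
limit by the per-frame truesize. The sketch below is a deliberately
simplified model, not the kernel's accounting: the 131072-byte limit is a
hypothetical value chosen for illustration, and frames_before_drop() is a
hypothetical helper. The 8192 and 2560 truesize values are the ones mentioned
in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model: how many frames of a given truesize fit under a
 * receive-buffer limit before further frames would have to be dropped.
 */
static size_t frames_before_drop(size_t rcvbuf_limit, size_t truesize)
{
	assert(truesize > 0);
	return rcvbuf_limit / truesize;
}
```

Under this model, frames_before_drop(131072, 8192) is 16 and
frames_before_drop(131072, 2560) is 51, so the same limit holds about three
times as many 1500-byte frames when each one is charged 2560 bytes instead
of 8KB.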


* Re: TCP transmit performance regression
  2012-07-10  8:28                     ` Eric Dumazet
@ 2012-07-10 13:37                       ` Ming Lei
  2012-07-10 14:02                         ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-10 13:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Tue, Jul 10, 2012 at 4:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-07-10 at 15:22 +0800, Ming Lei wrote:
>
>> Kernel stack size is 8KB or more, so could you find process creation failure
>> in your ChromeBooks machine at the same time?
>
> I believe you mix a lot of things.
>
> Have you ever heard of sockets limits ?
>
> All available ram on a machine is not for whoever wants it, thanks God.
>
> No : TCP stack was dropping frames, because of socket limits.
>
> Only because skbs were fat (8KB allocated/truesize, for a single 1500
> bytes frame)

Could you explain why the truesize of an SKB is 8KB for a single
1500-byte frame?

I observed it is 2560 bytes for RX SKBs inside asix_rx_fixup with an
rx_urb_size of 2048 on beagle-xm.

>
> If the application is fast and reads skbs as soon as they arrive, no problem
> is detected.
>
> But if the application is slow, or a TCP packet is lost on the network,
> many packets are queued into the ofo queue. And eventually not enough room
> is available -> we drop incoming frames, and the sender has to retransmit them.
>
> So instead of loading your web pages as fast as possible, you have to
> wait for retransmits.
>
> So you see nothing at all, no kernel logs, no failed memory attempts.
>
> Only its slower than necessary
>
>
>


Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-10 13:37                       ` Ming Lei
@ 2012-07-10 14:02                         ` Eric Dumazet
  2012-07-10 14:22                           ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2012-07-10 14:02 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

I am kind of annoyed that you sent a copy of a _private_ mail to netdev.

Next time, make sure you don't do that without my consent.

On Tue, 2012-07-10 at 21:37 +0800, Ming Lei wrote:

> Could you explain why the truesize of SKB is 8KB for single
> 1500bytes frame?
> 

Because the driver uses skb_alloc(4096), for example?

I don't know, you didn't tell us which driver.


The goal is to have skb->head point to a 2048-byte area, so truesize
should be 2048 + sizeof(struct sk_buff) (including struct skb_shared_info)

> I observed it is 2560bytes for RX SKBs inside asix_rx_fixup with
> rx_urb_size of 2048 on beagle-xm.
> 

That's because using 2048 bytes for the urb buffer (excluding
skb_shared_info) means you need:

2048 + sizeof(struct skb_shared_info) + sizeof(struct sk_buff) = 2560

In fact, 2048 + sizeof(struct skb_shared_info) means a full 4096-byte area
is used.

You have 2560 on recent kernels because of the way netdev_alloc_frag()
works.

That's why copybreak can actually save RAM. Since it adds a copy,
we try to use it only on slow devices.
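
The size arithmetic above can be sketched as follows. This is a simplified
userspace model: it assumes allocations of this size are rounded up to the
next power of two, and the two structure sizes are illustrative placeholders
chosen only so that they sum to the 512 bytes implied by the 2560 figure
(the real sizes depend on kernel version and architecture). All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative placeholders, NOT real kernel sizes; they sum to 512. */
#define ASSUMED_SHINFO_SIZE	320	/* stands in for sizeof(struct skb_shared_info) */
#define ASSUMED_SKB_SIZE	192	/* stands in for sizeof(struct sk_buff) */

/* Round up to the next power of two (simplified allocator size classes). */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes really allocated for the head area: data plus shared info, rounded. */
static size_t head_alloc_size(size_t data_len)
{
	return round_up_pow2(data_len + ASSUMED_SHINFO_SIZE);
}

/* The sum quoted in the mail: data + shared info + sk_buff, no rounding. */
static size_t unrounded_truesize(size_t data_len)
{
	return data_len + ASSUMED_SHINFO_SIZE + ASSUMED_SKB_SIZE;
}
```

With these assumptions, unrounded_truesize(2048) gives 2560, while
head_alloc_size(2048) gives 4096 and head_alloc_size(4096) gives 8192, which
matches the "full 4096 area" and the 8KB-per-frame cases discussed in this
thread.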


* Re: TCP transmit performance regression
  2012-07-10 14:02                         ` Eric Dumazet
@ 2012-07-10 14:22                           ` Ming Lei
  2012-07-10 14:45                             ` Eric Dumazet
  0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2012-07-10 14:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Network Development, David Miller

On Tue, Jul 10, 2012 at 10:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I am kind of annoyed you sent on netdev a copy of a _private_ mail.

I am sure that your reply, which includes the text below, was not a private mail:

       Only because skbs were fat (8KB allocated/truesize, for a single
       1500 bytes frame)

>
> Next time, make sure you dont do that without my consent.

OK

> On Tue, 2012-07-10 at 21:37 +0800, Ming Lei wrote:
>
>> Could you explain why the truesize of SKB is 8KB for single
>> 1500bytes frame?
>>
>
> Because the driver uses skb_alloc(4096) for example ?
>
> I don't know, you don't tell us the driver.
>
>
> Goal is to have skb->head points to a 2048 bytes area, so truesize
> should be 2048 + sizeof(sk_buff)  (including struct shared_info)
>
>> I observed it is 2560bytes for RX SKBs inside asix_rx_fixup with
>> rx_urb_size of 2048 on beagle-xm.
>>
>
> Thats because using 2048 bytes for the urb buffer (excluding
> shared_info) means you need :
>
> 2048 + sizeof(struct shared_info) + sizeof(sk_buff) = 2560
>
> In fact 2048 + sizeof(struct shared_info) means a full 4096 area is
> used.
>
> You have 2560 on recent kernels because the way netdev_alloc_frag()
> works.
>
> Thats why copybreak can actually saves ram. Since it is adding a copy,
> we try to use it only on slow devices.

It looks like a single-page allocation won't put too much pressure on MM, which
is why I suggested avoiding the copy if the skb buffer size is less than or equal
to one page. Anyway, an unnecessary copy will increase computation and consume power.


Thanks,
-- 
Ming Lei


* Re: TCP transmit performance regression
  2012-07-10 14:22                           ` Ming Lei
@ 2012-07-10 14:45                             ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2012-07-10 14:45 UTC (permalink / raw)
  To: Ming Lei; +Cc: Network Development, David Miller

On Tue, 2012-07-10 at 22:22 +0800, Ming Lei wrote:

> Looks single page allocation won't put too much pressure on MM, that is
> why I suggested to avoid copy if the skb buffer size is less or equal one
> page. Anyway, unnecessary copy will increase computation and consume power.

AFAIK this long thread started with drivers/net/usb/smsc95xx.c using
32KB buffers; that's an order-3 allocation, not a 'single page'.

Definitely very wrong. You can try to claim the contrary, but it won't be
wise.
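
For reference, the "order-3" figure follows from the buffer size and the page
size. The sketch below assumes 4KB pages; page_order() is a hypothetical
userspace helper that mirrors the idea of an allocation order (the smallest n
such that page_size * 2^n covers the request), not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest order n such that page_size << n is at least 'size' bytes. */
static unsigned int page_order(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(size > 0 && page_size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

With 4KB pages, page_order(32768, 4096) is 3 (eight contiguous pages), while
a 2048-byte or 4096-byte buffer is order 0, i.e. a single page.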


Thread overview: 24+ messages
2012-07-05  1:45 TCP transmit performance regression Ming Lei
2012-07-05  7:43 ` Eric Dumazet
2012-07-05  8:27   ` Ming Lei
2012-07-05  8:33     ` Eric Dumazet
2012-07-05  8:42       ` Ming Lei
2012-07-05  9:49         ` Eric Dumazet
2012-07-05 10:02           ` David Miller
2012-07-05 10:32           ` Ming Lei
2012-07-05 10:41             ` Eric Dumazet
2012-07-05 14:01               ` Ming Lei
2012-07-05 14:28                 ` Eric Dumazet
2012-07-05 14:56                 ` Eric Dumazet
2012-07-06  0:45                   ` Ming Lei
2012-07-06  4:58                     ` Eric Dumazet
2012-07-06  5:16                       ` Eric Dumazet
2012-07-09  5:13                         ` Ming Lei
2012-07-09 13:23           ` Ming Lei
2012-07-09 13:54             ` Eric Dumazet
     [not found]               ` <CACVXFVNdiwVn1Mo--N4N0HdYrEJizExtd_cppT4tS=mjog2PKw@mail.gmail.com>
     [not found]                 ` <1341895143.3265.4049.camel@edumazet-glaptop>
2012-07-10  7:22                   ` Ming Lei
2012-07-10  8:28                     ` Eric Dumazet
2012-07-10 13:37                       ` Ming Lei
2012-07-10 14:02                         ` Eric Dumazet
2012-07-10 14:22                           ` Ming Lei
2012-07-10 14:45                             ` Eric Dumazet
