netdev.vger.kernel.org archive mirror
* [PATCH net-next] net/mlx4: use one page fragment per incoming frame
@ 2013-06-03 17:54 Eric Dumazet
  2013-06-03 18:05 ` Rick Jones
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Eric Dumazet @ 2013-06-03 17:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Amir Vadai

From: Eric Dumazet <edumazet@google.com>

The mlx4 driver has a suboptimal memory allocation strategy for regular
MTU=1500 frames, as it uses two page fragments per frame:

one of 512 bytes and one of 1024 bytes.

This makes GRO less effective, as each GSO packet contains 8 MSS instead
of 16 MSS.

A single TCP flow gains a 25% throughput increase with the following
patch.

Before patch:

A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      13798.47   3.06     4.20     0.436   0.598  

After patch:

A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      17273.80   3.44     4.19     0.391   0.477  

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index b1d7657..b1f51c1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -98,11 +98,11 @@
 #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
 #define MLX4_EN_ALLOC_ORDER	get_order(MLX4_EN_ALLOC_SIZE)
 
-/* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
+/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
  * and 4K allocations) */
 enum {
-	FRAG_SZ0 = 512 - NET_IP_ALIGN,
-	FRAG_SZ1 = 1024,
+	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
+	FRAG_SZ1 = 4096,
 	FRAG_SZ2 = 4096,
 	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
 };


* Re: [PATCH net-next] net/mlx4: use one page fragment per incoming frame
  2013-06-03 17:54 [PATCH net-next] net/mlx4: use one page fragment per incoming frame Eric Dumazet
@ 2013-06-03 18:05 ` Rick Jones
  2013-06-03 18:12   ` Eric Dumazet
  2013-06-04  8:40 ` Amir Vadai
  2013-06-05  0:28 ` David Miller
  2 siblings, 1 reply; 7+ messages in thread
From: Rick Jones @ 2013-06-03 18:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai

On 06/03/2013 10:54 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> The mlx4 driver has a suboptimal memory allocation strategy for regular
> MTU=1500 frames, as it uses two page fragments per frame:
>
> one of 512 bytes and one of 1024 bytes.
>
> This makes GRO less effective, as each GSO packet contains 8 MSS instead
> of 16 MSS.
>
> A single TCP flow gains a 25% throughput increase with the following
> patch.
>
> Before patch:
>
> A:~# netperf -H 192.168.0.2 -Cc
> MIGRATED TCP STREAM TEST ...
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>   87380  16384  16384    10.00      13798.47   3.06     4.20     0.436   0.598
>
> After patch:
>
> A:~# netperf -H 192.168.0.2 -Cc
> MIGRATED TCP STREAM TEST ...
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>   87380  16384  16384    10.00      17273.80   3.44     4.19     0.391   0.477


I take it this is a > 10 Gbit/s NIC?

What, if any, is the downside for an incoming stream of small packets?

rick jones

> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Amir Vadai <amirv@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index b1d7657..b1f51c1 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -98,11 +98,11 @@
>   #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
>   #define MLX4_EN_ALLOC_ORDER	get_order(MLX4_EN_ALLOC_SIZE)
>
> -/* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
> +/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
>    * and 4K allocations) */
>   enum {
> -	FRAG_SZ0 = 512 - NET_IP_ALIGN,
> -	FRAG_SZ1 = 1024,
> +	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
> +	FRAG_SZ1 = 4096,
>   	FRAG_SZ2 = 4096,
>   	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
>   };
>
>


* Re: [PATCH net-next] net/mlx4: use one page fragment per incoming frame
  2013-06-03 18:05 ` Rick Jones
@ 2013-06-03 18:12   ` Eric Dumazet
  2013-06-03 18:24     ` Rick Jones
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2013-06-03 18:12 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev, Amir Vadai

On Mon, 2013-06-03 at 11:05 -0700, Rick Jones wrote:

> I take it this is a > 10 Gbit/s NIC?
> 

Well, I guess so!

Or something is really strange with this NIC!

> What, if any, is the downside for an incoming stream of small packets?

These kinds of NICs are aimed at performance, I was told.

1) It would be strange to use them on memory-constrained machines.

2) Using 1536-byte fragments is better than what most other NIC drivers
do, as they usually use 2048- or 4096-byte frags.

3) Current memory allocations done in mlx4 use order-2 pages, and
nobody has yet complained that these allocations might fail...


* Re: [PATCH net-next] net/mlx4: use one page fragment per incoming frame
  2013-06-03 18:12   ` Eric Dumazet
@ 2013-06-03 18:24     ` Rick Jones
  2013-06-03 18:26       ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Rick Jones @ 2013-06-03 18:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Amir Vadai

On 06/03/2013 11:12 AM, Eric Dumazet wrote:
> On Mon, 2013-06-03 at 11:05 -0700, Rick Jones wrote:
>
>> I take it this is a > 10 Gbit/s NIC?
>>
>
> Well, I guess so!
>
> Or something is really strange with this NIC!

Indeed.

>> What, if any, is the downside for an incoming stream of small packets?
>
> These kinds of NICs are aimed at performance, I was told.
>
> 1) It would be strange to use them on memory-constrained machines.
>
> 2) Using 1536-byte fragments is better than what most other NIC drivers
> do, as they usually use 2048- or 4096-byte frags.
>
> 3) Current memory allocations done in mlx4 use order-2 pages, and
> nobody has yet complained that these allocations might fail...

I was thinking more about a packet getting up to a socket and finding
insufficient buffer space remaining to hold it, given the larger
per-packet overhead.

rick


* Re: [PATCH net-next] net/mlx4: use one page fragment per incoming frame
  2013-06-03 18:24     ` Rick Jones
@ 2013-06-03 18:26       ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2013-06-03 18:26 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev, Amir Vadai

On Mon, 2013-06-03 at 11:24 -0700, Rick Jones wrote:

> I was thinking more about a packet getting up to a socket and finding
> insufficient buffer space remaining to hold it, given the larger
> per-packet overhead.

Yes, I got the point, and I answered the question ;)


* Re: [PATCH net-next] net/mlx4: use one page fragment per incoming frame
  2013-06-03 17:54 [PATCH net-next] net/mlx4: use one page fragment per incoming frame Eric Dumazet
  2013-06-03 18:05 ` Rick Jones
@ 2013-06-04  8:40 ` Amir Vadai
  2013-06-05  0:28 ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: Amir Vadai @ 2013-06-04  8:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 03/06/2013 20:54, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> The mlx4 driver has a suboptimal memory allocation strategy for regular
> MTU=1500 frames, as it uses two page fragments per frame:
>
> one of 512 bytes and one of 1024 bytes.
> 
> This makes GRO less effective, as each GSO packet contains 8 MSS instead
> of 16 MSS.
> 
> A single TCP flow gains a 25% throughput increase with the following
> patch.
> 
> Before patch:
> 
> A:~# netperf -H 192.168.0.2 -Cc
> MIGRATED TCP STREAM TEST ...
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384  16384    10.00      13798.47   3.06     4.20     0.436   0.598  
> 
> After patch:
> 
> A:~# netperf -H 192.168.0.2 -Cc
> MIGRATED TCP STREAM TEST ...
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
>  87380  16384  16384    10.00      17273.80   3.44     4.19     0.391   0.477  
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Amir Vadai <amirv@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index b1d7657..b1f51c1 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -98,11 +98,11 @@
>  #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
>  #define MLX4_EN_ALLOC_ORDER	get_order(MLX4_EN_ALLOC_SIZE)
>  
> -/* Receive fragment sizes; we use at most 4 fragments (for 9600 byte MTU
> +/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
>   * and 4K allocations) */
>  enum {
> -	FRAG_SZ0 = 512 - NET_IP_ALIGN,
> -	FRAG_SZ1 = 1024,
> +	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
> +	FRAG_SZ1 = 4096,
>  	FRAG_SZ2 = 4096,
>  	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
>  };
> 
> 

Acked-By: Amir Vadai <amirv@mellanox.com>

We are currently working on a patch to change the skb allocation scheme
on the RX side, because the current mlx4_en architecture behaves very
badly when an IOMMU is enabled.
After this change, the driver will use one fragment, among other
improvements. The most important will be fragment recycling, to save
alloc/free and dma_map/unmap calls.

But I think it would be a good idea to apply your fix now, in case there
are delays.

Thanks,
Amir


* Re: [PATCH net-next] net/mlx4: use one page fragment per incoming frame
  2013-06-03 17:54 [PATCH net-next] net/mlx4: use one page fragment per incoming frame Eric Dumazet
  2013-06-03 18:05 ` Rick Jones
  2013-06-04  8:40 ` Amir Vadai
@ 2013-06-05  0:28 ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2013-06-05  0:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, amirv

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 03 Jun 2013 10:54:55 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> The mlx4 driver has a suboptimal memory allocation strategy for regular
> MTU=1500 frames, as it uses two page fragments per frame:
>
> one of 512 bytes and one of 1024 bytes.
> 
> This makes GRO less effective, as each GSO packet contains 8 MSS instead
> of 16 MSS.
> 
> A single TCP flow gains a 25% throughput increase with the following
> patch.
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

