From: David Miller <davem@redhat.com>
To: eric.dumazet@gmail.com
Cc: netdev@vger.kernel.org, ogerlitz@mellanox.com,
	willemb@google.com, amirv@mellanox.com
Subject: Re: [PATCH v2 net-next 1/2] net: gro: add a per device gro flush timer
Date: Fri, 07 Nov 2014 17:00:44 -0500 (EST)
Message-ID: <20141107.170044.1376374292241401593.davem@redhat.com>
In-Reply-To: <1415336984.13896.102.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 Nov 2014 21:09:44 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Tuning coalescing parameters on a NIC can be really hard.
> 
> Servers can handle both bulk and RPC-like traffic, with conflicting
> goals: bulk flows want GRO packets as big as possible, RPC wants
> minimal latencies.
> 
> To reach big GRO packets on a 10Gbe NIC, one can use:
> 
> ethtool -C eth0 rx-usecs 4 rx-frames 44
> 
> But this penalizes RPC sessions, increasing latencies by up to
> 50% in some cases, as NICs generally do not force an interrupt when
> a packet with the TCP Push flag is received.
> 
> Some NICs do not have an absolute timer, only a timer rearmed for every
> incoming packet.
> 
> This patch uses a different strategy: let the GRO stack decide what
> to do, based on the traffic pattern.
> 
> Packets with the Push flag won't be delayed.
> Packets without the Push flag might be held in the GRO engine, if we
> keep receiving data.
> 
> This new mechanism is off by default, and is enabled by setting
> /sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds.
> 
> To fully enable this mechanism, drivers should use napi_complete_done()
> instead of napi_complete().
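
A minimal sketch of how the sysfs knob described above could be scripted, in Python. The helper names set_gro_flush_timeout() and get_gro_flush_timeout() and the sysfs_root parameter are illustrative assumptions, not part of the patch; only the path /sys/class/net/ethX/gro_flush_timeout and the nanosecond unit come from the commit message. Writing the real file needs root privileges and a kernel that carries this patch.

```python
from pathlib import Path


def set_gro_flush_timeout(dev: str, nsec: int,
                          sysfs_root: str = "/sys/class/net") -> Path:
    """Write a GRO flush timeout (in nanoseconds) for one network device.

    sysfs_root is a hypothetical parameter added so the helper can be
    exercised against a scratch directory instead of the real sysfs tree.
    """
    if nsec < 0:
        raise ValueError("timeout must be a non-negative number of nanoseconds")
    path = Path(sysfs_root) / dev / "gro_flush_timeout"
    # Same effect as: echo <nsec> > /sys/class/net/<dev>/gro_flush_timeout
    path.write_text(f"{nsec}\n")
    return path


def get_gro_flush_timeout(dev: str,
                          sysfs_root: str = "/sys/class/net") -> int:
    """Read back the current timeout, in nanoseconds, as an integer."""
    path = Path(sysfs_root) / dev / "gro_flush_timeout"
    return int(path.read_text().strip())
```

For example, set_gro_flush_timeout("eth0", 2000) would correspond to the 2000 nsec setting used in the test below.
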
> 
> Tested:
>  Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
> 
> Without this feature, we send back about 305,000 ACKs per second.
> 
> GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
> 
> Setting a timer of 2000 nsec is enough to increase GRO packet sizes
> and reduce the number of ACK packets. (811/19.2 = 42)
> 
> The receiver performs fewer calls to upper stacks and fewer wakeups.
> This also reduces cpu usage on the sender, as it receives fewer ACK
> packets.
> 
> Note that reducing the number of wakeups increases cpu efficiency, but
> can decrease QPS, as applications won't have the chance to warm up cpu
> caches doing a partial read of RPC requests/answers if they fit in one
> skb.
> 
> B:~# sar -n DEV 1 10 | grep eth0 | tail -1
> Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
> 
> B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
> 
> B:~# sar -n DEV 1 10 | grep eth0 | tail -1
> Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
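
The aggregation ratios quoted above can be rechecked from the two sar averages. This is only a worked-arithmetic sketch in Python: it assumes the first two numeric columns are sar's usual rxpck/s and txpck/s, and, like the commit message, it treats the number of ACKs sent back as a stand-in for the number of GRO packets delivered. The dictionary keys and the function name are made up for illustration.

```python
# Averages copied from the two `sar -n DEV 1 10` runs quoted above.
# Column meaning (rxpck/s, txpck/s) is assumed from sar's usual layout.
BEFORE = {"rxpck_s": 811269.80, "txpck_s": 305732.30}  # timer off
AFTER = {"rxpck_s": 811577.30, "txpck_s": 19230.80}    # 2000 nsec timer


def segments_per_ack(sample: dict) -> float:
    """Received segments per transmitted ACK, used as a proxy for the
    number of segments merged into each GRO packet."""
    return sample["rxpck_s"] / sample["txpck_s"]


ratio_before = segments_per_ack(BEFORE)
ratio_after = segments_per_ack(AFTER)
# How many times fewer ACKs the receiver sends once the timer is set.
ack_reduction = BEFORE["txpck_s"] / AFTER["txpck_s"]

print(f"segments per ACK without timer: {ratio_before:.2f}")
print(f"segments per ACK with timer:    {ratio_after:.1f}")
print(f"ACK rate reduction factor:      {ack_reduction:.1f}")
```

The first two figures line up with the 2.65 and 42 given in the commit message; the third is simply the ratio of the two ACK rates.
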
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> v2: As requested by David, drivers should use napi_complete_done()
>     instead of napi_complete() so that we do not have to track if
>     a packet was received during last NAPI poll.

Applied, thanks.

I do think this looks a lot nicer.


Thread overview: 11+ messages
2014-11-06  0:55 [PATCH net-next] net: gro: add a per device gro flush timer Eric Dumazet
2014-11-06  1:38 ` Rick Jones
2014-11-06  2:14   ` Eric Dumazet
2014-11-06  2:39     ` Eric Dumazet
2014-11-06 16:42       ` Rick Jones
2014-11-06 21:25 ` David Miller
2014-11-06 22:11   ` Eric Dumazet
2014-11-07  3:36     ` David Miller
2014-11-07  4:15       ` Eric Dumazet
2014-11-07  5:09       ` [PATCH v2 net-next 1/2] " Eric Dumazet
2014-11-07 22:00         ` David Miller [this message]
