From: David Miller <davem@redhat.com>
To: eric.dumazet@gmail.com
Cc: netdev@vger.kernel.org, ogerlitz@mellanox.com,
willemb@google.com, amirv@mellanox.com
Subject: Re: [PATCH v2 net-next 1/2] net: gro: add a per device gro flush timer
Date: Fri, 07 Nov 2014 17:00:44 -0500 (EST)
Message-ID: <20141107.170044.1376374292241401593.davem@redhat.com>
In-Reply-To: <1415336984.13896.102.camel@edumazet-glaptop2.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 Nov 2014 21:09:44 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> Tuning coalescing parameters on a NIC can be really hard.
>
> Servers can handle both bulk and RPC-like traffic, with conflicting
> goals: bulk flows want GRO packets as big as possible, while RPC flows
> want minimal latency.
>
> To reach big GRO packets on a 10GbE NIC, one can use:
>
> ethtool -C eth0 rx-usecs 4 rx-frames 44
>
> But this penalizes RPC sessions, increasing latencies by up to
> 50% in some cases, as NICs generally do not force an interrupt when
> a packet with the TCP Push flag is received.
>
> Some NICs do not have an absolute timer, only a timer rearmed for every
> incoming packet.
>
> This patch uses a different strategy: let the GRO stack decide what to
> do, based on the traffic pattern.
>
> Packets with the Push flag won't be delayed.
> Packets without the Push flag might be held in the GRO engine, if we keep
> receiving data.
>
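To make the hold/flush rule above concrete, here is a toy userspace model. It is only a sketch and not the kernel code: it assumes every segment belongs to one flow, treats each arrival as its own poll, and flushes either on a Push flag or when the gap since the previous segment exceeds the timeout. The names Segment and aggregate are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    t_ns: int          # arrival time in nanoseconds
    psh: bool = False  # True if the TCP Push flag is set


def aggregate(segments, flush_timeout_ns):
    """Return the size (in segments) of each packet handed to the stack.

    Toy model only: held segments are flushed when a Push flag arrives,
    or when no new data showed up within flush_timeout_ns.
    """
    packets = []
    held = 0
    last_t = None
    for seg in segments:
        # No data arrived within the timeout: the held segments go up first.
        if held and seg.t_ns - last_t > flush_timeout_ns:
            packets.append(held)
            held = 0
        held += 1
        last_t = seg.t_ns
        # A Push flag is never delayed: flush immediately.
        if seg.psh:
            packets.append(held)
            held = 0
    if held:
        packets.append(held)
    return packets


if __name__ == "__main__":
    flow = [Segment(0), Segment(1000), Segment(2000), Segment(3000, psh=True),
            Segment(10000), Segment(11000)]
    print(aggregate(flow, flush_timeout_ns=2000))
    print(aggregate(flow, flush_timeout_ns=0))
```

In this model a larger timeout only helps while data keeps arriving; a Push flag still ends the aggregate at once.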
> This new mechanism is off by default, and is enabled by setting
> /sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds.
>
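For scripted setups, the sysfs knob named above could be driven with a small helper along these lines. This is a minimal sketch: the function names and the root parameter are assumptions added for illustration (root exists only so the helper can be tried against a scratch directory), and writing the real file needs root privileges.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def gro_flush_timeout_path(dev, root=SYSFS_NET):
    """Build the path of the per-device gro_flush_timeout attribute."""
    return Path(root) / dev / "gro_flush_timeout"


def set_gro_flush_timeout(dev, nsec, root=SYSFS_NET):
    """Write a timeout in nanoseconds; 0 leaves the mechanism off."""
    if nsec < 0:
        raise ValueError("timeout must be >= 0 nanoseconds")
    gro_flush_timeout_path(dev, root).write_text(f"{nsec}\n")


def get_gro_flush_timeout(dev, root=SYSFS_NET):
    """Read back the current timeout in nanoseconds."""
    return int(gro_flush_timeout_path(dev, root).read_text().strip())


# Example (as root, on a kernel that has this attribute):
#   set_gro_flush_timeout("eth0", 2000)
```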
> To fully enable this mechanism, drivers should use napi_complete_done()
> instead of napi_complete().
>
> Tested:
> Ran 200 netperf TCP_STREAM flows from A to B (10GbE mlx4 link, 8 RX queues)
>
> Without this feature, we send back about 305,000 ACKs per second.
>
> GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
>
> Setting a timer of 2000 nsec is enough to increase GRO packet sizes
> and reduce the number of ACK packets. (811/19.2 = 42)
>
> The receiver makes fewer calls to upper stacks and has fewer wakeups.
> This also reduces CPU usage on the sender, as it receives fewer ACK
> packets.
>
> Note that reducing the number of wakeups increases CPU efficiency, but can
> decrease QPS, as applications won't have the chance to warm up CPU caches
> doing a partial read of RPC requests/answers if they fit in one skb.
>
> B:~# sar -n DEV 1 10 | grep eth0 | tail -1
> Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00 0.00 0.50
>
> B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
>
> B:~# sar -n DEV 1 10 | grep eth0 | tail -1
> Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00 0.00 0.50
>
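The aggregation ratios quoted earlier can be rechecked from the two sar averages. A small sketch, assuming the first two numeric columns are received and transmitted packets per second, that nearly every transmitted packet is an ACK, and that roughly one ACK goes out per GRO packet; the function name is made up for illustration.

```python
def segments_per_gro_packet(rx_pps, tx_pps):
    """Approximate GRO aggregation ratio.

    Assumption: transmitted packets are almost all ACKs, about one per
    GRO packet, so rx/tx estimates segments per GRO packet.
    """
    if tx_pps <= 0:
        raise ValueError("tx_pps must be positive")
    return rx_pps / tx_pps


# Packet rates taken from the two sar "Average" lines in this message.
before = segments_per_gro_packet(811269.80, 305732.30)
after = segments_per_gro_packet(811577.30, 19230.80)

if __name__ == "__main__":
    print(f"before: {before:.2f} segments per GRO packet")
    print(f"after:  {after:.1f} segments per GRO packet")
```

This gives about 2.65 before and about 42 after, matching the figures in the description.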
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> v2: As requested by David, drivers should use napi_complete_done()
> instead of napi_complete() so that we do not have to track if
> a packet was received during last NAPI poll.
Applied, thanks.
I do think this looks a lot nicer.