From: Rick Jones <rick.jones2@hp.com>
To: netdev@vger.kernel.org
Subject: Just one more byte, it is wafer thin...
Date: Wed, 20 Jul 2011 16:28:32 -0700
Message-ID: <4E2764A0.90003@hp.com>
One of the netperf scripts I run from time to time is the
packet_byte_script (doc/examples/packet_byte_script in the netperf
source tree, though I tweaked it locally to use omni output selectors).
The goal of that script is to measure the incremental cost of sending
another byte and/or another TCP segment. Among other things, it runs RR
tests where the request or response size is incremented. It starts at 1
byte, doubles until it would exceed the MSS, then does 1MSS, 1MSS+1,
2MSS, 2MSS+1 and 3MSS, 3MSS+1.
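For readers who do not have the script at hand, the size progression described above can be sketched roughly as follows. This is not the packet_byte_script itself; the function name `rr_sizes` is hypothetical, and the MSS of 1448 bytes is an assumption inferred from the 4344-byte (3MSS) request size used later in this message.

```python
def rr_sizes(mss, max_multiple=3):
    """Return the request sizes as described in the text (illustrative only)."""
    sizes = []
    size = 1
    # Start at 1 byte and double until the next value would exceed the MSS.
    while size <= mss:
        sizes.append(size)
        size *= 2
    # Then step across each MSS boundary: k*MSS followed by k*MSS + 1.
    for k in range(1, max_multiple + 1):
        sizes.extend([k * mss, k * mss + 1])
    return sizes


if __name__ == "__main__":
    MSS = 1448  # assumption: inferred from 3MSS == 4344 bytes
    print(rr_sizes(MSS))
```

The k*MSS and k*MSS+1 pairs are the interesting points, since each pair differs by one byte but by a whole extra segment on the wire.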
I recently ran it between a pair of dual-processor X5650 based systems
with 10GbE NICs based on Mellanox MT26438 running as a 10GbE interface.
The kernel is 2.6.38-8-server (maverick) and the driver info is:
# ethtool -i eth2
driver: mlx4_en (HP_0200000003)
version: 1.5.1.6 (August 2010)
firmware-version: 2.7.9294
bus-info: 0000:05:00.0
(Yes, that HP_mumble does broach the possibility of a local fubar. I'd
try a pure upstream myself, but the systems at my disposal are somewhat
locked-down; I'm hoping someone with a "pure" environment can reproduce
the result, or not.)
The full output can be seen at:
ftp://ftp.netperf.org/netperf/misc/sl390_NC543i_mlx4_en_1.5.1.6_Ubuntu_11.04_A5800_56C_to_same_pab_1500mtu_20110719.csv
I wasn't entirely sure what TSO and LRO/GRO would mean for the script.
At first I thought I wouldn't get the +1 trip down the stack, but the
transaction rates all looked reasonably "sane" until the 3MSS to 3MSS+1
transition, when the transaction rate dropped by something like 70%. It
stayed there as the request size was increased further in other testing.
I looked at a tcpdump trace on the sending and receiving side - LRO/GRO
had coalesced segments into the full request size. On the sending side
though, I was seeing one segment of 3MSS and one of one byte. At first
I thought that perhaps something was fubar with cwnd, but looking at
traces for 2MSS(+1) and 1MSS(+1) I saw that is just what TSO does - only
send integer multiples of the MSS as TSO. So, while that does
interesting things to the service demand for a given transaction size,
it probably wasn't the culprit.
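The split seen in the sender-side traces can be modelled in a few lines. This is only a sketch of the observed behaviour (one chunk holding a whole number of MSS, then the remainder), not the kernel's TSO logic; it ignores cwnd, Nagle and the maximum TSO size. The name `tso_split` is hypothetical and the 1448-byte MSS is assumed from 3MSS being 4344 bytes.

```python
def tso_split(request_bytes, mss):
    """Model the sender-side trace: full-MSS multiple first, then the tail."""
    full_segments, remainder = divmod(request_bytes, mss)
    chunks = []
    if full_segments:
        # Handed to the NIC as one large send covering whole segments only.
        chunks.append(full_segments * mss)
    if remainder:
        # The sub-MSS tail goes down the stack separately.
        chunks.append(remainder)
    return chunks


if __name__ == "__main__":
    MSS = 1448  # assumption: inferred from 3MSS == 4344 bytes
    for size in (1448, 1449, 2896, 2897, 4344, 4345):
        print(size, tso_split(size, MSS))
```

Under this model every +1 size costs one extra pass down the stack for a single byte, which is the effect on service demand mentioned above.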
It would seem that the adaptive-rx was. Previously, the coalescing
settings on the receiver (netserver side) were:
# ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: on TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 16
rx-frames: 44
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 128
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
and netperf would look like:
# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  4344     1       10.00    10030.37
16384  87380
16384  87380  4345     1       10.00    3406.62
16384  87380
When I switched adaptive rx off via ethtool, the drop largely went away:
# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  4344     1       10.00    11167.48
16384  87380
16384  87380  4345     1       10.00    10460.02
16384  87380
Now, at 11000 transactions per second, even with the request being 4
packets, that is still < 55000 packets per second, well under the
pkt-rate-low of 400000, so presumably everything should have stayed at
"_low", right? Just for grins, I put adaptive coalescing on again, set
rx-usecs-high to 64 and ran those two points again:
# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  4344     1       10.00    11143.07
16384  87380
16384  87380  4345     1       10.00    5790.48
16384  87380
and just to be completely pedantic about it, set rx-usecs-high to 0:
# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  4344     1       10.00    14274.03
16384  87380
16384  87380  4345     1       10.00    13697.11
16384  87380
and got a somewhat unexpected result - I've no idea why they both
went up - perhaps it was sensing "high" occasionally even in the 4344
byte request case. Still, is this suggesting that perhaps the adaptive
bits are being a bit too aggressive about sensing high? Over what
interval is that measurement supposed to be happening?
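As a back-of-the-envelope check on the packet-rate argument, the sketch below compares a rough inbound segment rate at the receiver with the pkt-rate thresholds from the ethtool -c output. It makes several assumptions: the MSS is 1448 bytes (from 3MSS being 4344), only request data segments are counted (ACKs are ignored), and the transaction rates are the ones measured with adaptive rx off. The function names are hypothetical, and nothing here reflects how the driver actually samples the rate.

```python
# Thresholds copied from the "ethtool -c eth2" output in this message.
PKT_RATE_LOW = 400_000
PKT_RATE_HIGH = 450_000


def segments_per_request(request_bytes, mss):
    """Number of wire segments needed for one request (integer ceiling)."""
    return (request_bytes + mss - 1) // mss


def rx_packet_rate(trans_per_sec, request_bytes, mss):
    """Rough inbound data-segment rate at the receiver; ignores ACKs."""
    return trans_per_sec * segments_per_request(request_bytes, mss)


if __name__ == "__main__":
    MSS = 1448  # assumption: inferred from 3MSS == 4344 bytes
    # (request size, transactions/s) measured with adaptive rx off
    for size, rate in [(4344, 11167.48), (4345, 10460.02)]:
        pps = rx_packet_rate(rate, size, MSS)
        share = pps / PKT_RATE_LOW
        print(f"{size} bytes: ~{pps:.0f} pkts/s, {share:.0%} of pkt-rate-low")
```

On these numbers both cases sit at a small fraction of pkt-rate-low when averaged over the run, which is why the sampling interval matters: a short enough window could still see a burst of back-to-back segments as a much higher rate.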
rick jones