Just one more byte, it is wafer thin...

From: Rick Jones <rick.jones2@hp.com>
To: netdev@vger.kernel.org
Subject: Just one more byte, it is wafer thin...
Date: Wed, 20 Jul 2011 16:28:32 -0700
Message-ID: <4E2764A0.90003@hp.com>

One of the netperf scripts I run from time to time is the 
packet_byte_script (doc/examples/packet_byte_script in the netperf 
source tree, though I tweaked it locally to use omni output selectors).
The goal of that script is to measure the incremental cost of sending 
another byte and/or another TCP segment.  Among other things, it runs RR 
tests where the request or response size is incremented.  It starts at 1 
byte, doubles until it would exceed the MSS, then does 1MSS, 1MSS+1, 
2MSS, 2MSS+1 and 3MSS, 3MSS+1.

I recently ran it between a pair of dual-processor X5650 based systems 
with 10GbE NICs based on Mellanox MT26438 running as a 10GbE interface. 
The kernel is 2.6.38-8-server (maverick) and the driver info is:

# ethtool -i eth2
driver: mlx4_en (HP_0200000003)
version: 1.5.1.6 (August 2010)
firmware-version: 2.7.9294
bus-info: 0000:05:00.0

(Yes, that HP_mumble does broach the possibility of a local fubar.  I'd 
try a pure upstream myself, but the systems at my disposal are somewhat 
locked down.  I'm hoping someone with a "pure" environment can either 
reproduce the result or not.)

The full output can be seen at:

ftp://ftp.netperf.org/netperf/misc/sl390_NC543i_mlx4_en_1.5.1.6_Ubuntu_11.04_A5800_56C_to_same_pab_1500mtu_20110719.csv

I wasn't entirely sure what TSO and LRO/GRO would mean for the script.  
At first I thought I wouldn't get the +1 trip down the stack, but the 
transaction rates all looked reasonably "sane" until the 3MSS to 3MSS+1 
transition, when the transaction rate dropped by something like 70%.  
It stayed there as the request size was increased further in other 
testing.  I looked at tcpdump traces on the sending and receiving 
sides.  On the receiving side, LRO/GRO had coalesced the segments into 
the full request size.  On the sending side, though, I was seeing one 
segment of 3MSS and one of one byte.  At first I thought that perhaps 
something was fubar with cwnd, but looking at traces for 2MSS(+1) and 
1MSS(+1) I saw that this is just what TSO does: it only sends integer 
multiples of the MSS as TSO.  So, while that does interesting things to 
the service demand for a given transaction size, it probably wasn't the 
culprit.
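
As a rough illustration of the split seen in the sender-side traces, the arithmetic looks like the sketch below. It is a simplified model of the observed behaviour only (it ignores cwnd, Nagle and anything else the stack might do); the function name tso_split and the MSS of 1448 are assumptions, the latter inferred from 4344 = 3 x 1448.

```shell
# Split a request into the part sent as whole-MSS TSO data and the leftover
# bytes that go out as a separate small segment.
# tso_split is a hypothetical helper for illustration.
tso_split() {
    local req=$1 mss=$2
    local full=$(( req / mss * mss ))   # largest integer multiple of the MSS
    local rest=$(( req - full ))        # remaining bytes, sent separately
    echo "$full $rest"
}

tso_split 4344 1448   # 3MSS:   prints "4344 0"
tso_split 4345 1448   # 3MSS+1: prints "4344 1"
```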

It would seem that the adaptive-rx was.  Previously, the coalescing 
settings on the receiver (netserver side) were:

# ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: on  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 400000
pkt-rate-high: 450000

rx-usecs: 16
rx-frames: 44
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 128
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

and the netperf results looked like this:
# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  4344     1       10.00    10030.37
16384  87380
16384  87380  4345     1       10.00    3406.62
16384  87380

When I switched adaptive rx off via ethtool, the drop largely went away:

# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  4344     1       10.00    11167.48
16384  87380
16384  87380  4345     1       10.00    10460.02
16384  87380
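
The exact command used to switch adaptive RX off is not shown above.  A minimal sketch, assuming the standard `ethtool -C` coalescing option `adaptive-rx on|off` and the eth2 interface name from the earlier output, might look like the following.  The `run` wrapper is a hypothetical helper that only prints each command, since the real ones need root and the NIC in question.

```shell
# Hypothetical dry-run wrapper: print the command instead of executing it.
# Drop the wrapper and run the commands directly as root to apply them.
run() { echo "+ $*"; }

DEV=eth2   # interface name taken from the ethtool output in this message

run ethtool -C "$DEV" adaptive-rx off   # disable adaptive RX coalescing
run ethtool -c "$DEV"                   # show the resulting settings
```

The rx-usecs-high experiments further down would presumably use the same option syntax, for example `adaptive-rx on rx-usecs-high 64`.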

Now, at 11000 transactions per second, even with the request being 4 
packets, that is still < 55000 packets per second, well under the 
pkt-rate-low of 400000, so presumably everything should have stayed at 
"_low", right?  Just for grins, I turned adaptive coalescing on again, 
set rx-usecs-high to 64 and ran those two points again:

# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  4344     1       10.00    11143.07
16384  87380
16384  87380  4345     1       10.00    5790.48
16384  87380
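
The packet-rate estimate made before this run (11000 transactions per second staying under 55000 packets per second) can be checked with a little shell arithmetic.  This is only a rough upper bound under stated assumptions: each request is counted as ceil(request/MSS) separate packets plus one packet for the response, ACKs are ignored, and the MSS of 1448 is inferred from 4344 = 3 x 1448.  The function name est_pps is made up for this sketch.

```shell
# Rough packets-per-second estimate for a TCP_RR run.
# est_pps is a hypothetical helper; it counts ceil(req/mss) request packets
# plus one response packet per transaction and ignores ACKs.
est_pps() {
    local tps=$1 req=$2 mss=$3
    local pkts=$(( (req + mss - 1) / mss + 1 ))
    echo $(( tps * pkts ))
}

PKT_RATE_LOW=400000   # pkt-rate-low from the ethtool -c output

pps=$(est_pps 11000 4345 1448)
echo "estimated pps: $pps (pkt-rate-low: $PKT_RATE_LOW)"
```

Even with the extra one-byte segment, the estimate stays roughly a factor of seven below pkt-rate-low.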

And just to be completely pedantic about it, I set rx-usecs-high to 0:

# HDR="-P 1";for r in 4344 4345; do netperf -H mumble.3.21 -t TCP_RR $HDR -- -r ${r},1; HDR="-P 0"; done
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to mumble.3.21 (mumble.3.21) port 0 AF_INET : histogram : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  4344     1       10.00    14274.03
16384  87380
16384  87380  4345     1       10.00    13697.11
16384  87380

That gave a somewhat unexpected result: both rates went up, and I've no 
idea why.  Perhaps it was sensing "high" occasionally even in the 
4344-byte request case.  Still, does this suggest that the adaptive 
bits are being a bit too aggressive about sensing high?  Over what 
interval is that measurement supposed to be happening?
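
To put the four runs side by side, the drop from the 3MSS to the 3MSS+1 request can be computed from the transaction rates reported above.  This is only a convenience sketch: the labels and the pct_drop helper are made up here, and awk is used because plain shell arithmetic is integer-only.

```shell
# Percentage drop in transaction rate between two measurements.
# pct_drop is a hypothetical helper for illustration.
pct_drop() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Columns: label, rate at 4344 bytes, rate at 4345 bytes,
# copied from the netperf output in this message.
while read -r label r3 r3p1; do
    echo "$label: $(pct_drop "$r3" "$r3p1")% drop"
done <<'EOF'
adaptive-rx-on,rx-usecs-high=128 10030.37 3406.62
adaptive-rx-off 11167.48 10460.02
adaptive-rx-on,rx-usecs-high=64 11143.07 5790.48
adaptive-rx-on,rx-usecs-high=0 14274.03 13697.11
EOF
```

With these numbers the drop comes out at roughly 66%, 6%, 48% and 4% respectively, so it shrinks as rx-usecs-high is lowered and nearly disappears with adaptive RX off or rx-usecs-high at 0.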

rick jones
