public inbox for linux-kernel@vger.kernel.org
* [-RT] multiple streams have degraded performance
@ 2007-06-19  5:12 Vernon Mauery
  2007-06-19  6:51 ` Peter Zijlstra
  2007-06-20  3:38 ` Vernon Mauery
  0 siblings, 2 replies; 7+ messages in thread
From: Vernon Mauery @ 2007-06-19  5:12 UTC (permalink / raw)
  To: Vernon Mauery, LKML; +Cc: Ingo Molnar, Jeff Garzik

In looking at the performance characteristics of my network I found that 
2.6.21.5-rt15 suffers from degraded throughput with multiple threads.  The 
test that I did this with is simply invoking 1, 2, 4, and 8 instances of 
netperf at a time and measuring the total throughput.  I have two 4-way 
machines connected with 10GbE cards.  I tested several kernels (some older 
and some newer) and found that the only thing in common was that with -RT 
kernels the performance went down with concurrent streams.

While the test was showing the numbers for receiving as well as sending, the 
receiving numbers are not reliable because that machine was running a -RT 
kernel for these tests.

I was just wondering if anyone had seen this problem before or would have any 
idea on where to start hunting for the solution.

--Vernon

The key for this: 'default' was invoked like:
netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472
and '1Msock' was invoked like:
netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472 -s 1M -S 1M

2.6.21
==============
default: 1 streams: Send at 2844.2 Mb/s, Receive at 2840.1 Mb/s
default: 2 streams: Send at 3927.9 Mb/s, Receive at 3603.9 Mb/s
default: 4 streams: Send at 4197.4 Mb/s, Receive at 3776.3 Mb/s
default: 8 streams: Send at 4223.9 Mb/s, Receive at 3848.9 Mb/s
1Msock: 1 streams: Send at 4232.3 Mb/s, Receive at 3914.4 Mb/s
1Msock: 2 streams: Send at 5428.8 Mb/s, Receive at 3853.2 Mb/s
1Msock: 4 streams: Send at 6202.1 Mb/s, Receive at 3774.8 Mb/s
1Msock: 8 streams: Send at 6225.1 Mb/s, Receive at 3754.7 Mb/s

2.6.21.5-rt15
===============
default: 1 streams: Send at 3091.6 Mb/s, Receive at 3048.1 Mb/s
default: 2 streams: Send at 3768.8 Mb/s, Receive at 3714.2 Mb/s
default: 4 streams: Send at 1873.6 Mb/s, Receive at 1825.9 Mb/s
default: 8 streams: Send at 1806.5 Mb/s, Receive at 1792.7 Mb/s
1Msock: 1 streams: Send at 3680.4 Mb/s, Receive at 3255.6 Mb/s
1Msock: 2 streams: Send at 4129.8 Mb/s, Receive at 3991.5 Mb/s
1Msock: 4 streams: Send at 1862.1 Mb/s, Receive at 1787.1 Mb/s
1Msock: 8 streams: Send at 1790.2 Mb/s, Receive at 1556.8 Mb/s

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [-RT] multiple streams have degraded performance
  2007-06-19  5:12 [-RT] multiple streams have degraded performance Vernon Mauery
@ 2007-06-19  6:51 ` Peter Zijlstra
  2007-06-19 14:25   ` Vernon Mauery
  2007-06-20  3:38 ` Vernon Mauery
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2007-06-19  6:51 UTC (permalink / raw)
  To: Vernon Mauery; +Cc: LKML, Ingo Molnar, Jeff Garzik

On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> In looking at the performance characteristics of my network I found that 
> 2.6.21.5-rt15 suffers from degraded throughput with multiple threads.  The 
> test that I did this with is simply invoking 1, 2, 4, and 8 instances of 
> netperf at a time and measuring the total throughput.  I have two 4-way 
> machines connected with 10GbE cards.  I tested several kernels (some older 
> and some newer) and found that the only thing in common was that with -RT 
> kernels the performance went down with concurrent streams.
> 
> While the test was showing the numbers for receiving as well as sending, the 
> receiving numbers are not reliable because that machine was running a -RT 
> kernel for these tests.
> 
> I was just wondering if anyone had seen this problem before or would have any 
> idea on where to start hunting for the solution.

could you enable CONFIG_LOCK_STAT

echo 0 > /proc/lock_stat

<run your test>

and report the output of (preferably not 80 column wrapped):

grep : /proc/lock_stat | head

or otherwise if there are any highly contended network locks listed?

 - Peter



* Re: [-RT] multiple streams have degraded performance
  2007-06-19  6:51 ` Peter Zijlstra
@ 2007-06-19 14:25   ` Vernon Mauery
  2007-06-19 15:38     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Vernon Mauery @ 2007-06-19 14:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Jeff Garzik

On Monday 18 June 2007 11:51:38 pm Peter Zijlstra wrote:
> On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> > In looking at the performance characteristics of my network I found that
> > 2.6.21.5-rt15 suffers from degraded throughput with multiple threads.  The
> > test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> > netperf at a time and measuring the total throughput.  I have two 4-way
> > machines connected with 10GbE cards.  I tested several kernels (some
> > older and some newer) and found that the only thing in common was that
> > with -RT kernels the performance went down with concurrent streams.
> >
> > While the test was showing the numbers for receiving as well as sending,
> > the receiving numbers are not reliable because that machine was running a
> > -RT kernel for these tests.
> >
> > I was just wondering if anyone had seen this problem before or would have
> > any idea on where to start hunting for the solution.
>
> could you enable CONFIG_LOCK_STAT
>
> echo 0 > /proc/lock_stat
>
> <run your test>
>
> and report the output of (preferably not 80 column wrapped):
>
> grep : /proc/lock_stat | head

/proc/lock_stat stayed empty for the duration of the test.  I am guessing this 
means there was no lock contention.

I do see this on the console:
BUG: scheduling with irqs disabled: IRQ-8414/0x00000000/9494
caller is wait_for_completion+0x85/0xc4

Call Trace:
 [<ffffffff8106f3b2>] dump_trace+0xaa/0x32a
 [<ffffffff8106f673>] show_trace+0x41/0x64
 [<ffffffff8106f6ab>] dump_stack+0x15/0x17
 [<ffffffff8106566f>] schedule+0x82/0x102
 [<ffffffff81065774>] wait_for_completion+0x85/0xc4
 [<ffffffff81092043>] set_cpus_allowed+0xa1/0xc8
 [<ffffffff810986e2>] do_softirq_from_hardirq+0x105/0x12d
 [<ffffffff810ca6cc>] do_irqd+0x2a8/0x32f
 [<ffffffff8103469d>] kthread+0xf5/0x128
 [<ffffffff81060f68>] child_rip+0xa/0x12

INFO: lockdep is turned off.

I hadn't seen this until I enabled lock_stat and ran the test.

> or otherwise if there are any highly contended network locks listed?

Any other ideas for debugging this?

--Vernon


* Re: [-RT] multiple streams have degraded performance
  2007-06-19 14:25   ` Vernon Mauery
@ 2007-06-19 15:38     ` Peter Zijlstra
  2007-06-19 17:19       ` Vernon Mauery
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2007-06-19 15:38 UTC (permalink / raw)
  To: Vernon Mauery; +Cc: LKML, Jeff Garzik, Ingo Molnar

On Tue, 2007-06-19 at 07:25 -0700, Vernon Mauery wrote:
> On Monday 18 June 2007 11:51:38 pm Peter Zijlstra wrote:
> > On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> > > In looking at the performance characteristics of my network I found that
> > > 2.6.21.5-rt15 suffers from degraded throughput with multiple threads.  The
> > > test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> > > netperf at a time and measuring the total throughput.  I have two 4-way
> > > machines connected with 10GbE cards.  I tested several kernels (some
> > > older and some newer) and found that the only thing in common was that
> > > with -RT kernels the performance went down with concurrent streams.
> > >
> > > While the test was showing the numbers for receiving as well as sending,
> > > the receiving numbers are not reliable because that machine was running a
> > > -RT kernel for these tests.
> > >
> > > I was just wondering if anyone had seen this problem before or would have
> > > any idea on where to start hunting for the solution.
> >
> > could you enable CONFIG_LOCK_STAT
> >
> > echo 0 > /proc/lock_stat
> >
> > <run your test>
> >
> > and report the output of (preferably not 80 column wrapped):
> >
> > grep : /proc/lock_stat | head
> 
> /proc/lock_stat stayed empty for the duration of the test.  I am guessing this 
> means there was no lock contention.

Most likely caused by the issue below.

> I do see this on the console:
> BUG: scheduling with irqs disabled: IRQ-8414/0x00000000/9494
> caller is wait_for_completion+0x85/0xc4
> 
> Call Trace:
>  [<ffffffff8106f3b2>] dump_trace+0xaa/0x32a
>  [<ffffffff8106f673>] show_trace+0x41/0x64
>  [<ffffffff8106f6ab>] dump_stack+0x15/0x17
>  [<ffffffff8106566f>] schedule+0x82/0x102
>  [<ffffffff81065774>] wait_for_completion+0x85/0xc4
>  [<ffffffff81092043>] set_cpus_allowed+0xa1/0xc8
>  [<ffffffff810986e2>] do_softirq_from_hardirq+0x105/0x12d
>  [<ffffffff810ca6cc>] do_irqd+0x2a8/0x32f
>  [<ffffffff8103469d>] kthread+0xf5/0x128
>  [<ffffffff81060f68>] child_rip+0xa/0x12
> 
> INFO: lockdep is turned off.
> 
> > I hadn't seen this until I enabled lock_stat and ran the test.

I think this is what causes the lack of output. It disables all lock
tracking features...

> > or otherwise if there are any highly contended network locks listed?
> 
> Any other ideas for debugging this?

fixing the above bug would help :-)

Ingo says that should be fixed in -rt17, so if you could give that a
spin...



* Re: [-RT] multiple streams have degraded performance
  2007-06-19 15:38     ` Peter Zijlstra
@ 2007-06-19 17:19       ` Vernon Mauery
  0 siblings, 0 replies; 7+ messages in thread
From: Vernon Mauery @ 2007-06-19 17:19 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Jeff Garzik, Ingo Molnar

On Tuesday 19 June 2007 8:38:50 am Peter Zijlstra wrote:
> On Tue, 2007-06-19 at 07:25 -0700, Vernon Mauery wrote:
> > On Monday 18 June 2007 11:51:38 pm Peter Zijlstra wrote:
> > > On Mon, 2007-06-18 at 22:12 -0700, Vernon Mauery wrote:
> > > > In looking at the performance characteristics of my network I found
> > > > that 2.6.21.5-rt15 suffers from degraded throughput with multiple
> > > > threads.  The test that I did this with is simply invoking 1, 2, 4,
> > > > and 8 instances of netperf at a time and measuring the total
> > > > throughput.  I have two 4-way machines connected with 10GbE cards.  I
> > > > tested several kernels (some older and some newer) and found that the
> > > > only thing in common was that with -RT kernels the performance went
> > > > down with concurrent streams.
> > > >
> > > > While the test was showing the numbers for receiving as well as
> > > > sending, the receiving numbers are not reliable because that machine
> > > > was running a -RT kernel for these tests.
> > > >
> > > > I was just wondering if anyone had seen this problem before or would
> > > > have any idea on where to start hunting for the solution.
> > >
> > > could you enable CONFIG_LOCK_STAT
> > >
> > > echo 0 > /proc/lock_stat
> > >
> > > <run your test>
> > >
> > > and report the output of (preferably not 80 column wrapped):
> > >
> > > grep : /proc/lock_stat | head
> >
> > /proc/lock_stat stayed empty for the duration of the test.  I am guessing
> > this means there was no lock contention.
>
> Most likely caused by the issue below.
>
> > I do see this on the console:
> > BUG: scheduling with irqs disabled: IRQ-8414/0x00000000/9494
> > caller is wait_for_completion+0x85/0xc4
> >
> > Call Trace:
> >  [<ffffffff8106f3b2>] dump_trace+0xaa/0x32a
> >  [<ffffffff8106f673>] show_trace+0x41/0x64
> >  [<ffffffff8106f6ab>] dump_stack+0x15/0x17
> >  [<ffffffff8106566f>] schedule+0x82/0x102
> >  [<ffffffff81065774>] wait_for_completion+0x85/0xc4
> >  [<ffffffff81092043>] set_cpus_allowed+0xa1/0xc8
> >  [<ffffffff810986e2>] do_softirq_from_hardirq+0x105/0x12d
> >  [<ffffffff810ca6cc>] do_irqd+0x2a8/0x32f
> >  [<ffffffff8103469d>] kthread+0xf5/0x128
> >  [<ffffffff81060f68>] child_rip+0xa/0x12
> >
> > INFO: lockdep is turned off.
> >
> > I hadn't seen this until I enabled lock_stat and ran the test.
>
> I think this is what causes the lack of output. It disables all lock
> tracking features...
>
> > > or otherwise if there are any highly contended network locks listed?
> >
> > Any other ideas for debugging this?
>
> fixing the above bug would help :-)
>
> Ingo says that should be fixed in -rt17, so if you could give that a
> spin...

I just tested with -rt17 and the BUG message is gone, but I still don't see 
any entries in /proc/lock_stat.  I don't see any other suspicious messages, so I 
think it is really strange that it is not listing any locks at all.

--Vernon


* Re: [-RT] multiple streams have degraded performance
  2007-06-19  5:12 [-RT] multiple streams have degraded performance Vernon Mauery
  2007-06-19  6:51 ` Peter Zijlstra
@ 2007-06-20  3:38 ` Vernon Mauery
  2007-06-20  7:42   ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Vernon Mauery @ 2007-06-20  3:38 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Jeff Garzik, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 3687 bytes --]

On Monday 18 June 2007 10:12:21 pm Vernon Mauery wrote:
> In looking at the performance characteristics of my network I found that
> 2.6.21.5-rt15 suffers from degraded throughput with multiple threads.  The
> test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> netperf at a time and measuring the total throughput.  I have two 4-way
> machines connected with 10GbE cards.  I tested several kernels (some older
> and some newer) and found that the only thing in common was that with -RT
> kernels the performance went down with concurrent streams.

I just tested this using lo instead of the 10GbE adapter.  I found similar 
results.  Since this makes it reproducible by just about anybody (maybe the 
only factor now is the number of CPUs), I have attached the script that I 
test things with.

So with the script run like ./stream_test 127.0.0.1 on 2.6.21 and 
2.6.21.5-rt17 I got the following:

2.6.21
=======================
default: 1 streams: Send at 2790.3 Mb/s, Receive at 2790.3 Mb/s
default: 2 streams: Send at 4129.4 Mb/s, Receive at 4128.7 Mb/s
default: 4 streams: Send at 7949.6 Mb/s, Receive at 7735.5 Mb/s
default: 8 streams: Send at 7930.7 Mb/s, Receive at 7910.1 Mb/s
1Msock: 1 streams: Send at 2810.7 Mb/s, Receive at 2810.7 Mb/s
1Msock: 2 streams: Send at 4093.4 Mb/s, Receive at 4092.6 Mb/s
1Msock: 4 streams: Send at 7887.8 Mb/s, Receive at 7880.4 Mb/s
1Msock: 8 streams: Send at 8091.7 Mb/s, Receive at 8082.2 Mb/s

2.6.21.5-rt17
======================
default: 1 streams: Send at 938.2 Mb/s, Receive at 938.2 Mb/s
default: 2 streams: Send at 1476.3 Mb/s, Receive at 1436.9 Mb/s
default: 4 streams: Send at 1489.8 Mb/s, Receive at 1145.0 Mb/s
default: 8 streams: Send at 1099.8 Mb/s, Receive at 1079.1 Mb/s
1Msock: 1 streams: Send at 921.4 Mb/s, Receive at 920.4 Mb/s
1Msock: 2 streams: Send at 1332.2 Mb/s, Receive at 1311.5 Mb/s
1Msock: 4 streams: Send at 1483.0 Mb/s, Receive at 1137.8 Mb/s
1Msock: 8 streams: Send at 1446.2 Mb/s, Receive at 1135.6 Mb/s

--Vernon

> While the test was showing the numbers for receiving as well as sending,
> the receiving numbers are not reliable because that machine was running a
> -RT kernel for these tests.
>
> I was just wondering if anyone had seen this problem before or would have
> any idea on where to start hunting for the solution.
>
> --Vernon
>
> The key for this: 'default' was invoked like:
> netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472
> and '1Msock' was invoked like:
> netperf -c -C -l 60 -H 10.2.2.4 -t UDP_STREAM -- -m 1472 -M 1472 -s 1M -S
> 1M
>
> 2.6.21
> ==============
> default: 1 streams: Send at 2844.2 Mb/s, Receive at 2840.1 Mb/s
> default: 2 streams: Send at 3927.9 Mb/s, Receive at 3603.9 Mb/s
> default: 4 streams: Send at 4197.4 Mb/s, Receive at 3776.3 Mb/s
> default: 8 streams: Send at 4223.9 Mb/s, Receive at 3848.9 Mb/s
> 1Msock: 1 streams: Send at 4232.3 Mb/s, Receive at 3914.4 Mb/s
> 1Msock: 2 streams: Send at 5428.8 Mb/s, Receive at 3853.2 Mb/s
> 1Msock: 4 streams: Send at 6202.1 Mb/s, Receive at 3774.8 Mb/s
> 1Msock: 8 streams: Send at 6225.1 Mb/s, Receive at 3754.7 Mb/s
>
> 2.6.21.5-rt15
> ===============
> default: 1 streams: Send at 3091.6 Mb/s, Receive at 3048.1 Mb/s
> default: 2 streams: Send at 3768.8 Mb/s, Receive at 3714.2 Mb/s
> default: 4 streams: Send at 1873.6 Mb/s, Receive at 1825.9 Mb/s
> default: 8 streams: Send at 1806.5 Mb/s, Receive at 1792.7 Mb/s
> 1Msock: 1 streams: Send at 3680.4 Mb/s, Receive at 3255.6 Mb/s
> 1Msock: 2 streams: Send at 4129.8 Mb/s, Receive at 3991.5 Mb/s
> 1Msock: 4 streams: Send at 1862.1 Mb/s, Receive at 1787.1 Mb/s
> 1Msock: 8 streams: Send at 1790.2 Mb/s, Receive at 1556.8 Mb/s



[-- Attachment #2: stream_test --]
[-- Type: text/plain, Size: 2830 bytes --]

#!/bin/bash
# File: stream_test
# Description: test network throughput with a varying number of streams
# Author: Vernon Mauery <vernux@us.ibm.com>
# Copyright: IBM Corporation (C) 2007


instances="1 2 4 8"
msg_size=
time=60
TEST=UDP_STREAM
sock_size=

function usage() {
	echo "usage: $0 [-t|-u] [-c N] [-m N] [-s N] [-T N] [-x ...] <target ip addr>"
	echo "	-t, -u	run TCP or UDP tests respectively (default: UDP)"
	echo "	-c N	run N concurrent instances (default \"1 2 4 8\")"
	echo "	-m N	set message size to N bytes (default: 256k TCP, 1472 UDP)"
	echo "	-s N	set socket buffer size (default: 2M TCP, 1M UDP)"
	echo "	-T N	run test for N seconds (default 60)"
	echo "	-x '...'	pass extra args to netperf (included after -- )"
	echo "	-h		display this message"
}

while [ $# -gt 0 ]; do
	case $1 in
		-t)
		TEST=TCP_STREAM
		;;
		-u)
		TEST=UDP_STREAM
		;;
		-c)
		shift
		instances=$1
		;;
		-m)
		shift
		msg_size=$1
		;;
		-s)
		shift
		sock_size=$1
		;;
		-T)
		shift
		time=$1
		;;
		-x)
		shift
		extra_args=$1;
		;;
		-h)
		usage
		exit 0
		;;
		*)
		TARGET=$1
		;;
	esac
	shift
done

if [ -z "$TARGET" ]; then
	usage
	exit 1
fi
if ! ping -c 1 $TARGET >&/dev/null; then
	echo "could not connect to $TARGET"
	usage
	exit 1
fi

if [ -z "$msg_size" ]; then
	case $TEST in
		TCP_STREAM)
		msg_size=256k
		;;
		UDP_STREAM)
		msg_size=1472
		;;
	esac
fi
if [ -z "$sock_size" ]; then
	case $TEST in
		TCP_STREAM)
		sock_size=2M
		;;
		UDP_STREAM)
		sock_size=1M
		;;
	esac
fi

function execute() {
	ITERS=$1
	NAME=$2
	ARGS=$3
	for ((i=0; i<$ITERS; i++)); do
		command="./netperf -c -C -l $time -H $TARGET -t $TEST -- $ARGS $extra_args"
		echo $command
		echo $command > out.$$.$i
		$command >> out.$$.$i &
	done
	for ((i=1; i<=$ITERS; i++)); do
		wait %$i
	done
	date > results-$NAME.$$.$ITERS
	uname -a >> results-$NAME.$$.$ITERS
	cat out.$$.* >> results-$NAME.$$.$ITERS
	rm -f out.$$.*
}

function parse_TCP_STREAM() {
	ITER=$1
	NAME=$2
	SEND=`( cat results-$NAME.$$.$ITER | grep "^[ 0-9]" | awk '{printf "%s + ", \$5 }';  echo "0 " ) | bc`
	echo "** $NAME: $ITER streams: Send at $SEND Mb/s"
}

function parse_UDP_STREAM() {
	ITER=$1
	NAME=$2
	SEND=`( cat results-$NAME.$$.$ITER | grep "^[ 0-9]" | awk 'NR % 2 == 1 {printf "%s + ", \$6 }';  echo "0 " ) | bc`
	RECV=`( cat results-$NAME.$$.$ITER | grep "^[ 0-9]" | awk 'NR % 2 == 0 {printf "%s + ", \$4 }';  echo "0 " ) | bc`
	echo "** $NAME: $ITER streams: Send at $SEND Mb/s, Receive at $RECV Mb/s"
}

function cleanup() {
	for ((i=1; i<=$ITERS; i++)); do
		kill -9 %$i
	done
	rm -f out.$$.*
	exit
}

trap cleanup SIGINT SIGQUIT

rm -f out.$$.*
echo "results can be found in results-$$.*"
for C in $instances; do
	execute $C default "-m $msg_size -M $msg_size"
	parse_$TEST $C default
	execute $C ${sock_size}sock "-m $msg_size -M $msg_size -s $sock_size -S $sock_size"
	parse_$TEST $C ${sock_size}sock
done



* Re: [-RT] multiple streams have degraded performance
  2007-06-20  3:38 ` Vernon Mauery
@ 2007-06-20  7:42   ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2007-06-20  7:42 UTC (permalink / raw)
  To: Vernon Mauery; +Cc: LKML, Jeff Garzik, Ingo Molnar, netdev

On Tue, 2007-06-19 at 20:38 -0700, Vernon Mauery wrote:
> On Monday 18 June 2007 10:12:21 pm Vernon Mauery wrote:
> > In looking at the performance characteristics of my network I found that
> > 2.6.21.5-rt15 suffers from degraded throughput with multiple threads.  The
> > test that I did this with is simply invoking 1, 2, 4, and 8 instances of
> > netperf at a time and measuring the total throughput.  I have two 4-way
> > machines connected with 10GbE cards.  I tested several kernels (some older
> > and some newer) and found that the only thing in common was that with -RT
> > kernels the performance went down with concurrent streams.
> 
> I just tested this using lo instead of the 10GbE adapter.  I found similar 
> results.  Since this makes it reproducible by just about anybody (maybe the 
> only factor now is the number of CPUs), I have attached the script that I 
> test things with.
> 
> So with the script run like ./stream_test 127.0.0.1 on 2.6.21 and 
> 2.6.21.5-rt17 I got the following:
> 
> 2.6.21
> =======================
> default: 1 streams: Send at 2790.3 Mb/s, Receive at 2790.3 Mb/s
> default: 2 streams: Send at 4129.4 Mb/s, Receive at 4128.7 Mb/s
> default: 4 streams: Send at 7949.6 Mb/s, Receive at 7735.5 Mb/s
> default: 8 streams: Send at 7930.7 Mb/s, Receive at 7910.1 Mb/s
> 1Msock: 1 streams: Send at 2810.7 Mb/s, Receive at 2810.7 Mb/s
> 1Msock: 2 streams: Send at 4093.4 Mb/s, Receive at 4092.6 Mb/s
> 1Msock: 4 streams: Send at 7887.8 Mb/s, Receive at 7880.4 Mb/s
> 1Msock: 8 streams: Send at 8091.7 Mb/s, Receive at 8082.2 Mb/s
> 
> 2.6.21.5-rt17
> ======================
> default: 1 streams: Send at 938.2 Mb/s, Receive at 938.2 Mb/s
> default: 2 streams: Send at 1476.3 Mb/s, Receive at 1436.9 Mb/s
> default: 4 streams: Send at 1489.8 Mb/s, Receive at 1145.0 Mb/s
> default: 8 streams: Send at 1099.8 Mb/s, Receive at 1079.1 Mb/s
> 1Msock: 1 streams: Send at 921.4 Mb/s, Receive at 920.4 Mb/s
> 1Msock: 2 streams: Send at 1332.2 Mb/s, Receive at 1311.5 Mb/s
> 1Msock: 4 streams: Send at 1483.0 Mb/s, Receive at 1137.8 Mb/s
> 1Msock: 8 streams: Send at 1446.2 Mb/s, Receive at 1135.6 Mb/s

Unfortunately I do not have a 4-way machine, but a dual core was enough
to show the problem:

./stream_test 127.0.0.1

** default: 1 streams: Send at 268.9 Mb/s, Receive at 268.0 Mb/s
** default: 2 streams: Send at 448.7 Mb/s, Receive at 448.4 Mb/s
** default: 4 streams: Send at 438.7 Mb/s, Receive at 438.3 Mb/s
** default: 8 streams: Send at 352.7 Mb/s, Receive at 351.2 Mb/s

** 1Msock: 1 streams: Send at 207.4 Mb/s, Receive at 207.4 Mb/s
** 1Msock: 2 streams: Send at 450.0 Mb/s, Receive at 450.0 Mb/s
** 1Msock: 4 streams: Send at 444.9 Mb/s, Receive at 444.9 Mb/s
** 1Msock: 8 streams: Send at 351.4 Mb/s, Receive at 351.4 Mb/s


and lock_stat shows the following offender:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    contentions   waittime-min   waittime-max waittime-total    con-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total    acq-bounces
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                         udp_hash_lock-W:             1           1.58           1.58           1.58              1            163          0.100           4.38         184.44            110
                         udp_hash_lock-R:        539603           1.30        4848.34     7756821.91         539131       15095136           0.36         520.09    20404391.93       11527348
                         ---------------
                           udp_hash_lock         539603          [<ffffffff80468d63>] __udp4_lib_lookup+0x3c/0x10d
                           udp_hash_lock              1          [<ffffffff804686fe>] __udp_lib_get_port+0x2c/0x21a


Which makes perfect sense, in that r/w locks are degraded to regular
exclusive locks on -rt.

So I guess you're wanting to go help out with the RCU-ification of
whatever it is that this lock protects. :-)


