From: Ingo Molnar <mingo@elte.hu>
To: David Schwartz <davids@webmaster.com>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
Mike Galbraith <efault@gmx.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Martin Michlmayr <tbm@cyrius.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Stephen Hemminger <shemminger@linux-foundation.org>
Subject: Re: Network slowdown due to CFS
Date: Wed, 26 Sep 2007 15:31:38 +0200
Message-ID: <20070926133138.GA23187@elte.hu>
In-Reply-To: <MDEHLPKNGKAHNMBLJOLKCEPKHAAC.davids@webmaster.com>

* David Schwartz <davids@webmaster.com> wrote:
> > > I think the real fix would be for iperf to use blocking network IO
> > > though, or maybe to use a POSIX mutex or POSIX semaphores.
> >
> > So it's definitely not a bug in the kernel, only in iperf?
>
> Martin:
>
> Actually, in this case I think iperf is doing the right thing (though not
> the best thing) and the kernel is doing the wrong thing. [...]

it's not doing the right thing at all. I had a quick look at the source
code, and the reason for that weird yield usage was that there's a
locking bug in iperf's "Reporter thread" abstraction and apparently,
instead of fixing the bug, it was worked around via a horrible
yield()-based user-space lock.

the (small) patch below fixes the iperf locking bug and removes the
yield() use. There are numerous immediate benefits of this patch:

 - iperf uses _much_ less CPU time. On my Core2Duo test system, before
   the patch it used up 100% of CPU time to saturate 1 gigabit of
   network traffic to another box. With the patch applied it now uses
   9% of CPU time.

 - sys_sched_yield() is removed altogether.

 - I was able to measure much higher bandwidth over localhost, for
   example. This is the case for over-the-network measurements as well.

 - the results are also more consistent and more deterministic, hence
   iperf is more reliable as a benchmarking tool. (the reason for that
   is that more CPU time is spent on actually delivering packets,
   instead of mindlessly polling on the user-space "lock", so we
   actually max out the CPU on useful work, instead of depending on
   whatever random fraction of the time the workload happened to make
   progress rather than waste CPU time on polling.)

sched_yield() is almost always a symptom of broken locking or some
other bug. In that sense CFS does the right thing by exposing such
bugs =B-)
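
For illustration, here is a minimal sketch of the pattern, not iperf's actual code: a producer (the "agent") that must not overrun a small ring sleeps on a condition variable until the consumer (the "reporter") has made room, instead of spinning on yield. Every name in it (run_demo, agent, reporter, SLOTS, ITEMS, not_empty, not_full) is hypothetical. not_empty and not_full only loosely correspond to ReportCond and ReportDoneCond, and plain pthreads calls stand in for iperf's Condition_* wrappers.

```c
#include <pthread.h>
#include <stddef.h>

#define SLOTS 4      /* hypothetical ring capacity */
#define ITEMS 1000   /* hypothetical number of reports to hand over */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER; /* "work is queued" */
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;  /* "a slot was freed" */
static int queued;    /* reports currently waiting in the ring */
static int consumed;  /* reports the reporter has processed so far */

/* Consumer: sleeps until a report is queued, then frees the slot. */
static void *reporter(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (queued == 0)                  /* loop guards against spurious wakeups */
            pthread_cond_wait(&not_empty, &lock);
        queued--;
        consumed++;
        pthread_cond_signal(&not_full);      /* wake a producer blocked on a full ring */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Producer: blocks while the ring is full instead of polling with yield. */
static void *agent(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (queued == SLOTS)              /* would "lap" the reporter: sleep */
            pthread_cond_wait(&not_full, &lock);
        queued++;
        pthread_cond_signal(&not_empty);     /* wake the reporter if it is idle */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs the agent in a new thread and the reporter in the calling thread.
 * Returns the number of reports consumed, or -1 if the thread could not start. */
int run_demo(void)
{
    pthread_t a;

    queued = 0;
    consumed = 0;
    if (pthread_create(&a, NULL, agent, NULL) != 0)
        return -1;
    reporter(NULL);
    pthread_join(a, NULL);
    return consumed;
}
```

Built with cc -pthread and called from main(), run_demo() returns once all ITEMS reports have been handed over. Whenever either side has nothing to do it is blocked inside pthread_cond_wait() and uses no CPU time, which is the property the yield-based loop lacked.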
Ingo

------------------------->
Subject: iperf: fix locking
From: Ingo Molnar <mingo@elte.hu>

fix iperf locking - it was burning CPU time while polling
unnecessarily, instead of using the proper wait primitives.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 compat/Thread.c |    3 ---
 src/Reporter.c  |   13 +++++++++----
 src/main.cpp    |    2 ++
 3 files changed, 11 insertions(+), 7 deletions(-)

Index: iperf-2.0.2/compat/Thread.c
===================================================================
--- iperf-2.0.2.orig/compat/Thread.c
+++ iperf-2.0.2/compat/Thread.c
@@ -405,9 +405,6 @@ int thread_numuserthreads( void ) {
void thread_rest ( void ) {
#if defined( HAVE_THREAD )
#if defined( HAVE_POSIX_THREAD )
- // TODO add checks for sched_yield or pthread_yield and call that
- // if available
- usleep( 0 );
#else // Win32
SwitchToThread( );
#endif
Index: iperf-2.0.2/src/Reporter.c
===================================================================
--- iperf-2.0.2.orig/src/Reporter.c
+++ iperf-2.0.2/src/Reporter.c
@@ -111,6 +111,7 @@ report_statistics multiple_reports[kRepo
char buffer[64]; // Buffer for printing
ReportHeader *ReportRoot = NULL;
extern Condition ReportCond;
+extern Condition ReportDoneCond;
int reporter_process_report ( ReportHeader *report );
void process_report ( ReportHeader *report );
int reporter_handle_packet( ReportHeader *report );
@@ -338,7 +339,7 @@ void ReportPacket( ReportHeader* agent,
// item
while ( index == 0 ) {
Condition_Signal( &ReportCond );
- thread_rest();
+ Condition_Wait( &ReportDoneCond );
index = agent->reporterindex;
}
agent->agentindex = 0;
@@ -346,7 +347,7 @@ void ReportPacket( ReportHeader* agent,
// Need to make sure that reporter is not about to be "lapped"
while ( index - 1 == agent->agentindex ) {
Condition_Signal( &ReportCond );
- thread_rest();
+ Condition_Wait( &ReportDoneCond );
index = agent->reporterindex;
}
@@ -553,6 +554,7 @@ void reporter_spawn( thread_Settings *th
}
Condition_Unlock ( ReportCond );
+again:
if ( ReportRoot != NULL ) {
ReportHeader *temp = ReportRoot;
//Condition_Unlock ( ReportCond );
@@ -575,9 +577,12 @@ void reporter_spawn( thread_Settings *th
// finished with report so free it
free( temp );
Condition_Unlock ( ReportCond );
+ Condition_Signal( &ReportDoneCond );
+ if (ReportRoot)
+ goto again;
}
- // yield control of CPU is another thread is waiting
- thread_rest();
+ Condition_Signal( &ReportDoneCond );
+ usleep(10000);
} else {
//Condition_Unlock ( ReportCond );
}
Index: iperf-2.0.2/src/main.cpp
===================================================================
--- iperf-2.0.2.orig/src/main.cpp
+++ iperf-2.0.2/src/main.cpp
@@ -96,6 +96,7 @@ extern "C" {
// records being accessed in a report and also to
// serialize modification of the report list
Condition ReportCond;
+ Condition ReportDoneCond;
}
// global variables only accessed within this file
@@ -141,6 +142,7 @@ int main( int argc, char **argv ) {
// Initialize global mutexes and conditions
Condition_Initialize ( &ReportCond );
+ Condition_Initialize ( &ReportDoneCond );
Mutex_Initialize( &groupCond );
Mutex_Initialize( &clients_mutex );