From: Ingo Molnar <mingo@elte.hu>
To: David Schwartz <davids@webmaster.com>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
Mike Galbraith <efault@gmx.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Martin Michlmayr <tbm@cyrius.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Stephen Hemminger <shemminger@linux-foundation.org>
Subject: Re: Network slowdown due to CFS
Date: Wed, 26 Sep 2007 15:31:38 +0200
Message-ID: <20070926133138.GA23187@elte.hu>
In-Reply-To: <MDEHLPKNGKAHNMBLJOLKCEPKHAAC.davids@webmaster.com>
* David Schwartz <davids@webmaster.com> wrote:
> > > I think the real fix would be for iperf to use blocking network IO
> > > though, or maybe to use a POSIX mutex or POSIX semaphores.
> >
> > So it's definitely not a bug in the kernel, only in iperf?
>
> Martin:
>
> Actually, in this case I think iperf is doing the right thing (though not
> the best thing) and the kernel is doing the wrong thing. [...]

it's not doing the right thing at all. I had a quick look at the source
code, and the reason for that weird yield usage is a locking bug in
iperf's "Reporter thread" abstraction: apparently, instead of fixing
the bug, someone worked around it with a horrible yield()-based
user-space lock.

the (small) patch below fixes the iperf locking bug and removes the
yield() use. There are numerous immediate benefits of this patch:

 - iperf uses _much_ less CPU time. On my Core2Duo test system, before
   the patch it used up 100% CPU time to saturate 1 gigabit of network
   traffic to another box. With the patch applied it now uses 9% of
   CPU time.
 - sys_sched_yield() is removed altogether
 - I was able to measure much higher bandwidth over localhost, for
   example. This is the case for over-the-network measurements as well.
 - the results are also more consistent and more deterministic, which
   makes iperf more reliable as a benchmarking tool. (the reason is
   that more CPU time is spent on actually delivering packets instead
   of mindlessly polling the user-space "lock", so we actually max out
   the CPU, rather than depending on the random ratio between time
   spent making progress and time wasted on polling.)

sched_yield() is almost always a symptom of broken locking or some
other bug. In that sense CFS does the right thing by exposing such
bugs =B-)
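
to illustrate the difference between polling with yield and blocking on
a proper wait primitive, here is a minimal standalone sketch. It is not
iperf code and does not use iperf's Condition_* wrappers: it uses plain
POSIX threads, and every name in it (RING_SIZE, N_ITEMS, not_empty,
not_full, run_handshake_demo) is made up for the illustration. The two
condition variables only loosely play the roles of ReportCond and
ReportDoneCond. Note that, unlike a yield loop, each side sleeps in the
kernel until the other side signals, and the predicate is re-checked
under the mutex after every wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 4     /* hypothetical ring capacity */
#define N_ITEMS   1000  /* hypothetical number of items to hand over */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
/* roughly the role of ReportCond: "there is work for the consumer" */
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
/* roughly the role of ReportDoneCond: "the consumer freed a slot" */
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

static int ring[RING_SIZE];
static int head, tail, count;
static long consumed_sum;

/* Consumer: sleeps while the ring is empty instead of spinning. */
static void *consumer(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_ITEMS; i++) {
        pthread_mutex_lock(&lock);
        while (count == 0)                  /* re-check after each wakeup */
            pthread_cond_wait(&not_empty, &lock);
        consumed_sum += ring[tail];
        tail = (tail + 1) % RING_SIZE;
        count--;
        pthread_cond_signal(&not_full);     /* wake a blocked producer */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Producer side: blocks while the ring is full, so it cannot "lap"
 * the consumer and burns no CPU while it waits. */
static void produce(int value)
{
    pthread_mutex_lock(&lock);
    while (count == RING_SIZE)
        pthread_cond_wait(&not_full, &lock);
    ring[head] = value;
    head = (head + 1) % RING_SIZE;
    count++;
    pthread_cond_signal(&not_empty);        /* wake a blocked consumer */
    pthread_mutex_unlock(&lock);
}

/* Pushes 1..N_ITEMS through the ring and returns the sum the consumer
 * saw, or -1 if the consumer thread could not be created. */
long run_handshake_demo(void)
{
    pthread_t tid;

    head = tail = count = 0;
    consumed_sum = 0;
    if (pthread_create(&tid, NULL, consumer, NULL) != 0)
        return -1;
    for (int i = 1; i <= N_ITEMS; i++)
        produce(i);
    pthread_join(tid, NULL);
    assert(count == 0);                     /* everything was drained */
    return consumed_sum;
}
```

calling run_handshake_demo() from a small main() and building with
-pthread should be enough to try it. The point of the design is that
neither thread ever runs a loop around yield or usleep(0): a thread that
cannot make progress is simply not runnable until it is signalled.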
Ingo
------------------------->
Subject: iperf: fix locking
From: Ingo Molnar <mingo@elte.hu>
fix iperf locking - it was burning CPU time while polling
unnecessarily, instead of using the proper wait primitives.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
compat/Thread.c | 3 ---
src/Reporter.c | 13 +++++++++----
src/main.cpp | 2 ++
3 files changed, 11 insertions(+), 7 deletions(-)
Index: iperf-2.0.2/compat/Thread.c
===================================================================
--- iperf-2.0.2.orig/compat/Thread.c
+++ iperf-2.0.2/compat/Thread.c
@@ -405,9 +405,6 @@ int thread_numuserthreads( void ) {
void thread_rest ( void ) {
#if defined( HAVE_THREAD )
#if defined( HAVE_POSIX_THREAD )
- // TODO add checks for sched_yield or pthread_yield and call that
- // if available
- usleep( 0 );
#else // Win32
SwitchToThread( );
#endif
Index: iperf-2.0.2/src/Reporter.c
===================================================================
--- iperf-2.0.2.orig/src/Reporter.c
+++ iperf-2.0.2/src/Reporter.c
@@ -111,6 +111,7 @@ report_statistics multiple_reports[kRepo
char buffer[64]; // Buffer for printing
ReportHeader *ReportRoot = NULL;
extern Condition ReportCond;
+extern Condition ReportDoneCond;
int reporter_process_report ( ReportHeader *report );
void process_report ( ReportHeader *report );
int reporter_handle_packet( ReportHeader *report );
@@ -338,7 +339,7 @@ void ReportPacket( ReportHeader* agent,
// item
while ( index == 0 ) {
Condition_Signal( &ReportCond );
- thread_rest();
+ Condition_Wait( &ReportDoneCond );
index = agent->reporterindex;
}
agent->agentindex = 0;
@@ -346,7 +347,7 @@ void ReportPacket( ReportHeader* agent,
// Need to make sure that reporter is not about to be "lapped"
while ( index - 1 == agent->agentindex ) {
Condition_Signal( &ReportCond );
- thread_rest();
+ Condition_Wait( &ReportDoneCond );
index = agent->reporterindex;
}
@@ -553,6 +554,7 @@ void reporter_spawn( thread_Settings *th
}
Condition_Unlock ( ReportCond );
+again:
if ( ReportRoot != NULL ) {
ReportHeader *temp = ReportRoot;
//Condition_Unlock ( ReportCond );
@@ -575,9 +577,12 @@ void reporter_spawn( thread_Settings *th
// finished with report so free it
free( temp );
Condition_Unlock ( ReportCond );
+ Condition_Signal( &ReportDoneCond );
+ if (ReportRoot)
+ goto again;
}
- // yield control of CPU is another thread is waiting
- thread_rest();
+ Condition_Signal( &ReportDoneCond );
+ usleep(10000);
} else {
//Condition_Unlock ( ReportCond );
}
Index: iperf-2.0.2/src/main.cpp
===================================================================
--- iperf-2.0.2.orig/src/main.cpp
+++ iperf-2.0.2/src/main.cpp
@@ -96,6 +96,7 @@ extern "C" {
// records being accessed in a report and also to
// serialize modification of the report list
Condition ReportCond;
+ Condition ReportDoneCond;
}
// global variables only accessed within this file
@@ -141,6 +142,7 @@ int main( int argc, char **argv ) {
// Initialize global mutexes and conditions
Condition_Initialize ( &ReportCond );
+ Condition_Initialize ( &ReportDoneCond );
Mutex_Initialize( &groupCond );
Mutex_Initialize( &clients_mutex );