From: Ingo Molnar <mingo@elte.hu>
To: David Schwartz <davids@webmaster.com>
Cc: "Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
	Mike Galbraith <efault@gmx.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Martin Michlmayr <tbm@cyrius.com>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Stephen Hemminger <shemminger@linux-foundation.org>
Subject: Re: Network slowdown due to CFS
Date: Wed, 26 Sep 2007 15:31:38 +0200
Message-ID: <20070926133138.GA23187@elte.hu>
In-Reply-To: <MDEHLPKNGKAHNMBLJOLKCEPKHAAC.davids@webmaster.com>


* David Schwartz <davids@webmaster.com> wrote:

> > > I think the real fix would be for iperf to use blocking network IO 
> > > though, or maybe to use a POSIX mutex or POSIX semaphores.
> >
> > So it's definitely not a bug in the kernel, only in iperf?
> 
> Martin:
> 
> Actually, in this case I think iperf is doing the right thing (though not
> the best thing) and the kernel is doing the wrong thing. [...]

it's not doing the right thing at all. I had a quick look at the source 
code, and the reason for that weird yield usage is a locking bug in 
iperf's "Reporter thread" abstraction: apparently, instead of fixing 
the bug, it was worked around via a horrible yield()-based user-space 
lock.

the (small) patch below fixes the iperf locking bug and removes the 
yield() use. There are numerous immediate benefits of this patch:

 - iperf uses _much_ less CPU time. On my Core2Duo test system, before 
   the patch it used up 100% CPU time to saturate 1 gigabit of network 
   traffic to another box. With the patch applied it now uses 9% of 
   CPU time.

 - sys_sched_yield() is removed altogether

 - i was able to measure much higher bandwidth, over localhost for 
   example. The same holds for over-the-network measurements.

 - the results are also more consistent and more deterministic, hence 
   more reliable as a benchmarking tool. (the reason for that is that
   more CPU time is spent on actually delivering packets, instead of
   mindlessly polling on the user-space "lock", so we actually max out
   the CPU, instead of relying on the random proportion the workload was
   able to make progress versus wasting CPU time on polling.)

sched_yield() is almost always a symptom of broken locking or some 
other bug. In that sense CFS does the right thing by exposing such 
bugs =B-)
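
to illustrate the general idea (a minimal sketch only, not the iperf 
code: struct handoff, consumer() and run_handoff() are hypothetical 
names, and plain POSIX threads are assumed instead of iperf's Condition 
wrappers), here is a one-slot hand-off where each side sleeps on a 
condition variable until the other side has made progress, instead of 
polling and yielding:

```c
#include <pthread.h>

/* Hypothetical one-slot hand-off; not iperf's data structures. */
struct handoff {
    pthread_mutex_t lock;
    pthread_cond_t  item_ready;  /* signalled by the producer */
    pthread_cond_t  slot_free;   /* signalled by the consumer */
    int full;                    /* 1 while an item waits to be consumed */
    int value;                   /* the item itself */
    int done;                    /* 1 once the producer has finished */
};

struct consumer_arg {
    struct handoff *h;
    long consumed;               /* number of items taken from the slot */
};

static void *consumer(void *p)
{
    struct consumer_arg *a = p;
    struct handoff *h = a->h;

    pthread_mutex_lock(&h->lock);
    for (;;) {
        /* Sleep until there is work; no polling, no yield. */
        while (!h->full && !h->done)
            pthread_cond_wait(&h->item_ready, &h->lock);
        if (!h->full)            /* producer is done and slot is empty */
            break;
        a->consumed++;
        h->full = 0;
        pthread_cond_signal(&h->slot_free);
    }
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Hands n items to a consumer thread; returns how many it consumed,
 * or -1 if the thread could not be created. */
long run_handoff(int n)
{
    struct handoff h = { .full = 0, .value = 0, .done = 0 };
    struct consumer_arg a = { &h, 0 };
    pthread_t t;

    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.item_ready, NULL);
    pthread_cond_init(&h.slot_free, NULL);

    if (pthread_create(&t, NULL, consumer, &a) != 0)
        return -1;

    pthread_mutex_lock(&h.lock);
    for (int i = 0; i < n; i++) {
        /* Sleep until the consumer has emptied the slot. */
        while (h.full)
            pthread_cond_wait(&h.slot_free, &h.lock);
        h.value = i;
        h.full = 1;
        pthread_cond_signal(&h.item_ready);
    }
    h.done = 1;
    pthread_cond_signal(&h.item_ready);
    pthread_mutex_unlock(&h.lock);

    pthread_join(t, NULL);
    pthread_cond_destroy(&h.slot_free);
    pthread_cond_destroy(&h.item_ready);
    pthread_mutex_destroy(&h.lock);
    return a.consumed;
}
```

calling run_handoff(1000) from a small main() (built with -pthread) 
should hand over all 1000 items. The point of the sketch is that a 
waiting thread is blocked in pthread_cond_wait() and costs no CPU time, 
so the scheduler's yield behavior never enters the picture.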
 
	Ingo

------------------------->
Subject: iperf: fix locking
From: Ingo Molnar <mingo@elte.hu>

fix iperf locking - it was burning CPU time while polling
unnecessarily, instead of using the proper wait primitives.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 compat/Thread.c |    3 ---
 src/Reporter.c  |   13 +++++++++----
 src/main.cpp    |    2 ++
 3 files changed, 11 insertions(+), 7 deletions(-)

Index: iperf-2.0.2/compat/Thread.c
===================================================================
--- iperf-2.0.2.orig/compat/Thread.c
+++ iperf-2.0.2/compat/Thread.c
@@ -405,9 +405,6 @@ int thread_numuserthreads( void ) {
 void thread_rest ( void ) {
 #if defined( HAVE_THREAD )
 #if defined( HAVE_POSIX_THREAD )
-    // TODO add checks for sched_yield or pthread_yield and call that
-    // if available
-    usleep( 0 );
 #else // Win32
     SwitchToThread( );
 #endif
Index: iperf-2.0.2/src/Reporter.c
===================================================================
--- iperf-2.0.2.orig/src/Reporter.c
+++ iperf-2.0.2/src/Reporter.c
@@ -111,6 +111,7 @@ report_statistics multiple_reports[kRepo
 char buffer[64]; // Buffer for printing
 ReportHeader *ReportRoot = NULL;
 extern Condition ReportCond;
+extern Condition ReportDoneCond;
 int reporter_process_report ( ReportHeader *report );
 void process_report ( ReportHeader *report );
 int reporter_handle_packet( ReportHeader *report );
@@ -338,7 +339,7 @@ void ReportPacket( ReportHeader* agent, 
             // item
             while ( index == 0 ) {
                 Condition_Signal( &ReportCond );
-                thread_rest();
+                Condition_Wait( &ReportDoneCond );
                 index = agent->reporterindex;
             }
             agent->agentindex = 0;
@@ -346,7 +347,7 @@ void ReportPacket( ReportHeader* agent, 
         // Need to make sure that reporter is not about to be "lapped"
         while ( index - 1 == agent->agentindex ) {
             Condition_Signal( &ReportCond );
-            thread_rest();
+            Condition_Wait( &ReportDoneCond );
             index = agent->reporterindex;
         }
         
@@ -553,6 +554,7 @@ void reporter_spawn( thread_Settings *th
         }
         Condition_Unlock ( ReportCond );
 
+again:
         if ( ReportRoot != NULL ) {
             ReportHeader *temp = ReportRoot;
             //Condition_Unlock ( ReportCond );
@@ -575,9 +577,12 @@ void reporter_spawn( thread_Settings *th
                 // finished with report so free it
                 free( temp );
                 Condition_Unlock ( ReportCond );
+            	Condition_Signal( &ReportDoneCond );
+		if (ReportRoot)
+			goto again;
             }
-            // yield control of CPU is another thread is waiting
-            thread_rest();
+            Condition_Signal( &ReportDoneCond );
+            usleep(10000);
         } else {
             //Condition_Unlock ( ReportCond );
         }
Index: iperf-2.0.2/src/main.cpp
===================================================================
--- iperf-2.0.2.orig/src/main.cpp
+++ iperf-2.0.2/src/main.cpp
@@ -96,6 +96,7 @@ extern "C" {
     // records being accessed in a report and also to
     // serialize modification of the report list
     Condition ReportCond;
+    Condition ReportDoneCond;
 }
 
 // global variables only accessed within this file
@@ -141,6 +142,7 @@ int main( int argc, char **argv ) {
 
     // Initialize global mutexes and conditions
     Condition_Initialize ( &ReportCond );
+    Condition_Initialize ( &ReportDoneCond );
     Mutex_Initialize( &groupCond );
     Mutex_Initialize( &clients_mutex );
 
