public inbox for linux-kernel@vger.kernel.org
* lmbench regression due to cond_resched nullification change 26-rc5 vs. 25
@ 2008-06-20  8:14 Christian Borntraeger
  2008-06-20 19:10 ` Linus Torvalds
  0 siblings, 1 reply; 3+ messages in thread
From: Christian Borntraeger @ 2008-06-20  8:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, Heiko Carstens, Martin Schwidefsky

Hello Linus,

On a 6-way s390 I have seen an interesting regression in 2.6.26-rc5 vs. 
2.6.25 for the lmbench benchmark.

For example "select file 500":
2.6.25:     23 microseconds
2.6.26-rc5: 32 microseconds

Several lmbench tests show a regression but I only bisected the select test 
case so far:
-------------------------<snip---------------------

commit c714a534d85576af21b06be605ca55cb2fb887ee
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon May 12 13:34:13 2008 -0700

    Make 'cond_resched()' nullification depend on PREEMPT_BKL

    Because it's not correct with a non-preemptable BKL and just causes
    PREEMPT kernels to have longer latencies than non-PREEMPT ones (which is
    obviously not the point of it at all).

    Of course, that config option actually got removed as an option earlier,
    so for now this basically disables it entirely, but if BKL preemption is
    ever resurrected it will be a meaningful optimization.  And in the
    meantime, it at least documents the intent of the code, while not doing
    the wrong thing.

    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5a63f2d..5395a61 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2038,7 +2038,7 @@ static inline int need_resched(void)
  * cond_resched_softirq() will enable bhs before scheduling.
  */
 extern int _cond_resched(void);
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPT_BKL
 static inline int cond_resched(void)
 {
        return 0;
-------------------------<snip---------------------

Reverting that patch gives me the 2.6.25 performance. 

I think the patch is fine from the correctness point of view (do resched 
inside BKL-protected zones if it's safe), but I don't understand why it has 
such a large impact on the select microbenchmark. Any ideas? Is it simply 
the overhead of _cond_resched?

Christian


* Re: lmbench regression due to cond_resched nullification change 26-rc5 vs. 25
  2008-06-20  8:14 lmbench regression due to cond_resched nullification change 26-rc5 vs. 25 Christian Borntraeger
@ 2008-06-20 19:10 ` Linus Torvalds
  2008-06-22 19:17   ` Christian Borntraeger
  0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2008-06-20 19:10 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: LKML, Heiko Carstens, Martin Schwidefsky, Ingo Molnar



On Fri, 20 Jun 2008, Christian Borntraeger wrote:
> 
> On a 6-way s390 I have seen an interesting regression in 2.6.26-rc5 vs. 
> 2.6.25 for the lmbench benchmark.
> 
> For example "select file 500":
> 2.6.25:     23 microseconds
> 2.6.26-rc5: 32 microseconds
> 
> Several lmbench tests show a regression but I only bisected the select test 
> case so far:
> -------------------------<snip---------------------
> 
> commit c714a534d85576af21b06be605ca55cb2fb887ee
> 
>     Make 'cond_resched()' nullification depend on PREEMPT_BKL
> 
> Reverting that patch gives me the 2.6.25 performance. 
> 
> I think the patch is fine from the correctness point of view (do resched 
> inside BKL-protected zones if it's safe), but I don't understand why it has 
> such a large impact on the select microbenchmark. Any ideas? Is it simply 
> the overhead of _cond_resched?

Yeah, I do think that it really is simply the overhead of _cond_resched.

Most places that use cond_resched() and friends do it in fairly heavy 
loops. That inner select() loop, in contrast, is rather heavily optimized, 
partly exactly because it's one of the hottest loops in one of the lmbench 
benchmarks.

Of course, it *can* also be simply that it now reschedules more, and a lot 
of lmbench benchmarks are very sensitive to scheduling behaviour. 
Batch-mode is almost invariably more efficient than any fine-grained 
scheduling can ever be, so there is always a performance cost in having 
low latency.

There's also a third (and rather funny) reason the "cond_resched()" may be 
unnecessarily expensive in that loop: the "need_resched" bit may be set 
because that very same running thread was woken up! So we may end up 
deciding to reschedule because the thread thinks it needs to wake itself 
up ;)

[ That third case happens because the way we avoid race conditions with 
  going to sleep while a wakeup event happens is that we *mark* ourselves 
  as being sleeping before we actually do all the tests in lmbench. ]

So there's a few different things that may make that conditional 
reschedule a bad thing in the poll() loop.

But quite frankly, regardless of exactly why it happens, it absolutely 
makes no sense to even have that cond_resched() in the _innermost_ loop - 
the one that is called for every single fd. It's much better to move the 
conditional reschedule out a bit.

That inner loop was very much designed to compile into nice assembly 
language, and it's possible that the cond_resched() simply causes extra 
register pressure and keeps us from keeping the bitmasks in registers etc.

So this trivial patch, which moves the cond_resched() to be outside the 
single-word select() loop, would be a good idea _regardless_. Does it make 
any difference for you? If it's purely a "code generation got worse" 
issue, it should help a bit.

		Linus

---
 fs/select.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 8dda969..da0e882 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -249,7 +249,6 @@ int do_select(int n, fd_set_bits *fds, s64 *timeout)
 						retval++;
 					}
 				}
-				cond_resched();
 			}
 			if (res_in)
 				*rinp = res_in;
@@ -257,6 +256,7 @@ int do_select(int n, fd_set_bits *fds, s64 *timeout)
 				*routp = res_out;
 			if (res_ex)
 				*rexp = res_ex;
+			cond_resched();
 		}
 		wait = NULL;
 		if (retval || !*timeout || signal_pending(current))


* Re: lmbench regression due to cond_resched nullification change 26-rc5 vs. 25
  2008-06-20 19:10 ` Linus Torvalds
@ 2008-06-22 19:17   ` Christian Borntraeger
  0 siblings, 0 replies; 3+ messages in thread
From: Christian Borntraeger @ 2008-06-22 19:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, Heiko Carstens, Martin Schwidefsky, Ingo Molnar

On Friday, 20 June 2008, you wrote:
> But quite frankly, regardless of exactly why it happens, it absolutely 
> makes no sense to even have that cond_resched() in the _innermost_ loop - 
> the one that is called for every single fd. It's much better to move the 
> conditional reschedule out a bit.
> 
> That inner loop was very much designed to compile into nice assembly 
> language, and it's possible that the cond_resched() simply causes extra 
> register pressure and keeps us from keeping the bitmasks in registers etc.
> 
> So this trivial patch, which moves the cond_resched() to be outside the 
> single-word select() loop, would be a good idea _regardless_. Does it make 
> any difference for you? If it's purely a "code generation got worse" 
> issue, it should help a bit.

Yes, it helps. I am now at 24 microseconds which is very close to the original 
time for lat_select.

Thanks


