* lmbench regression due to cond_resched nullification change 26-rc5 vs. 25
@ 2008-06-20 8:14 Christian Borntraeger
2008-06-20 19:10 ` Linus Torvalds
0 siblings, 1 reply; 3+ messages in thread
From: Christian Borntraeger @ 2008-06-20 8:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: LKML, Heiko Carstens, Martin Schwidefsky
Hello Linus,
On a 6-way s390 I have seen some interesting regression in 2.6.26-rc5 vs.
2.6.25 for the lmbench benchmark.
For example select file 500:
2.6.25: 23 microseconds
2.6.26-rc5: 32 microseconds
Several lmbench tests show a regression but I only bisected the select test
case so far:
-------------------------<snip---------------------
commit c714a534d85576af21b06be605ca55cb2fb887ee
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon May 12 13:34:13 2008 -0700
Make 'cond_resched()' nullification depend on PREEMPT_BKL
Because it's not correct with a non-preemptable BKL and just causes
PREEMPT kernels to have longer latencies than non-PREEMPT ones (which is
obviously not the point of it at all).
Of course, that config option actually got removed as an option earlier,
so for now this basically disables it entirely, but if BKL preemption is
ever resurrected it will be a meaningful optimization. And in the
meantime, it at least documents the intent of the code, while not doing
the wrong thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5a63f2d..5395a61 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2038,7 +2038,7 @@ static inline int need_resched(void)
* cond_resched_softirq() will enable bhs before scheduling.
*/
extern int _cond_resched(void);
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPT_BKL
static inline int cond_resched(void)
{
return 0;
-------------------------<snip---------------------
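For reference, the practical effect of the change is that cond_resched() went from an inline that compiles to nothing (on CONFIG_PREEMPT kernels) to an unconditional call into the out-of-line _cond_resched(). A minimal userspace sketch of the two flavours (the names cond_resched_real/cond_resched_nop and the flag are made up for illustration; the real kernel checks TIF_NEED_RESCHED and calls schedule()):

```c
/* Userspace model of the two cond_resched() flavours; the flag and
 * counter stand in for TIF_NEED_RESCHED and schedule() - this is
 * not the kernel's actual implementation. */
static int need_resched_flag; /* stands in for TIF_NEED_RESCHED */
static long schedules;        /* counts how often we would schedule() */

/* Out-of-line variant: always a real function call that tests the flag. */
static int cond_resched_real(void)
{
    if (need_resched_flag) {
        need_resched_flag = 0;
        schedules++;          /* the kernel would call schedule() here */
        return 1;
    }
    return 0;
}

/* Nullified variant (the old CONFIG_PREEMPT behaviour): an inline
 * that the compiler optimizes away entirely. */
static inline int cond_resched_nop(void)
{
    return 0;
}
```

In a hot per-fd loop the difference between "nothing" and "a call plus a flag test" is exactly the kind of overhead a microbenchmark like lat_select will notice.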
Reverting that patch gives me the 2.6.25 performance.
I think the patch is fine from the correctness point of view (do resched
inside BKL protected zones if it's safe), but I don't understand why it has a
large impact on the select microbenchmark. Any ideas? Is it simply the
overhead of _cond_resched?
Christian
* Re: lmbench regression due to cond_resched nullification change 26-rc5 vs. 25
2008-06-20 8:14 lmbench regression due to cond_resched nullification change 26-rc5 vs. 25 Christian Borntraeger
@ 2008-06-20 19:10 ` Linus Torvalds
2008-06-22 19:17 ` Christian Borntraeger
0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2008-06-20 19:10 UTC (permalink / raw)
To: Christian Borntraeger
Cc: LKML, Heiko Carstens, Martin Schwidefsky, Ingo Molnar
On Fri, 20 Jun 2008, Christian Borntraeger wrote:
>
> On a 6-way s390 I have seen some interesting regression in 2.6.26-rc5 vs.
> 2.6.25 for the lmbench benchmark.
>
> For example select file 500:
> 2.6.25: 23 microseconds
> 2.6.26-rc5: 32 microseconds
>
> Several lmbench tests show a regression but I only bisected the select test
> case so far:
> -------------------------<snip---------------------
>
> commit c714a534d85576af21b06be605ca55cb2fb887ee
>
> Make 'cond_resched()' nullification depend on PREEMPT_BKL
>
> Reverting that patch gives me the 2.6.25 performance.
>
> I think the patch is fine from the correctness point of view (do resched
> inside BKL protected zones if it's safe), but I don't understand why it has a
> large impact on the select microbenchmark. Any ideas? Is it simply the
> overhead of _cond_resched?
Yeah, I do think that it really is simply the overhead of _cond_resched.
Most places that use cond_resched() and friends do it in fairly heavy
loops. That inner select() loop, in contrast, is rather heavily optimized,
partly exactly because it's one of the hottest loops in one of the lmbench
benchmarks.
Of course, it *can* also be simply that it now reschedules more, and a lot
of lmbench benchmarks are very sensitive to scheduling behaviour.
Batch-mode is almost invariably more efficient than any fine-grained
scheduling can ever be, so there is always a performance cost in having
low latency.
There's also a third (and rather funny) reason the "cond_resched()" may be
unnecessarily expensive in that loop: the "need_resched" bit may be set
because that very same running thread was woken up! So we may end up
deciding to reschedule because the thread thinks it needs to wake itself
up ;)
[ That third case happens because the way we avoid race conditions with
going to sleep while a wakeup event happens is that we *mark* ourselves
as being sleeping before we actually do all the tests. ]
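That self-wakeup case can be sketched in userspace like this (all names here are hypothetical; the real kernel uses set_current_state(), poll wait queues, and TIF_NEED_RESCHED):

```c
/* Toy model: a task marks itself sleeping *before* testing its fds,
 * so a wakeup in that window cannot be lost. If an fd is already
 * ready, the wakeup targets the very task doing the polling and sets
 * its need_resched bit even though it never actually slept. */
enum task_state { RUNNING, SLEEPING };

struct task {
    enum task_state state;
    int need_resched;
};

/* Wakeup path: make the target runnable and request a reschedule. */
static void wake_up_task(struct task *t)
{
    t->state = RUNNING;
    t->need_resched = 1;
}

/* select()-style iteration: mark sleeping first (race avoidance),
 * then test; a ready fd wakes *us* back up. */
static int poll_once(struct task *self, int fd_ready)
{
    self->state = SLEEPING;      /* mark before testing */
    if (fd_ready)
        wake_up_task(self);      /* the "wake ourselves" case */
    return self->need_resched;   /* cond_resched() would now fire */
}
```

So a cond_resched() placed inside the fd-testing loop can observe need_resched set purely as a side effect of the task's own wakeup.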
So there's a few different things that may make that conditional
reschedule a bad thing in the poll() loop.
But quite frankly, regardless of exactly why it happens, it absolutely
makes no sense to even have that cond_resched() in the _innermost_ loop -
the one that is called for every single fd. It's much better to move the
conditional reschedule out a bit.
That inner loop was very much designed to compile into nice assembly
language, and it's possible that the cond_resched() simply causes extra
register pressure and keeps us from keeping the bitmasks in registers etc.
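The register-pressure point can be illustrated with a hypothetical userspace loop (this is not the real do_select() code): a call the compiler cannot see through forces it to assume caller-saved registers are clobbered, so hoisting the call out of the innermost loop lets the bitmask and counters stay in registers:

```c
#include <stdint.h>

/* Stand-in for cond_resched(): an opaque call the compiler cannot
 * inline, so live values must be spilled around it. */
__attribute__((noinline)) static int maybe_yield(void)
{
    return 0;
}

/* Call inside the innermost per-bit loop: spills on every iteration. */
static int count_bits_call_inside(uint64_t word)
{
    int n = 0;
    for (int i = 0; i < 64; i++) {
        if (word & (1ULL << i))
            n++;
        maybe_yield();           /* forces spills each time round */
    }
    return n;
}

/* Same work with the call hoisted out: the loop body stays in registers. */
static int count_bits_call_outside(uint64_t word)
{
    int n = 0;
    for (int i = 0; i < 64; i++)
        if (word & (1ULL << i))
            n++;
    maybe_yield();               /* once per word, as in the patch below */
    return n;
}
```

Both variants compute the same result; only the generated code differs, which mirrors what moving cond_resched() out of the per-word loop does.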
So this trivial patch, which moves the cond_resched() to be outside the
single-word select() loop, would be a good idea _regardless_. Does it make
any difference for you? If it's purely a "code generation got worse"
issue, it should help a bit.
Linus
---
fs/select.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 8dda969..da0e882 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -249,7 +249,6 @@ int do_select(int n, fd_set_bits *fds, s64 *timeout)
retval++;
}
}
- cond_resched();
}
if (res_in)
*rinp = res_in;
@@ -257,6 +256,7 @@ int do_select(int n, fd_set_bits *fds, s64 *timeout)
*routp = res_out;
if (res_ex)
*rexp = res_ex;
+ cond_resched();
}
wait = NULL;
if (retval || !*timeout || signal_pending(current))
* Re: lmbench regression due to cond_resched nullification change 26-rc5 vs. 25
2008-06-20 19:10 ` Linus Torvalds
@ 2008-06-22 19:17 ` Christian Borntraeger
0 siblings, 0 replies; 3+ messages in thread
From: Christian Borntraeger @ 2008-06-22 19:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: LKML, Heiko Carstens, Martin Schwidefsky, Ingo Molnar
On Friday, 20 June 2008, you wrote:
> But quite frankly, regardless of exactly why it happens, it absolutely
> makes no sense to even have that cond_resched() in the _innermost_ loop -
> the one that is called for every single fd. It's much better to move the
> conditional reschedule out a bit.
>
> That inner loop was very much designed to compile into nice assembly
> language, and it's possible that the cond_resched() simply causes extra
> register pressure and keeps us from keeping the bitmasks in registers etc.
>
> So this trivial patch, which moves the cond_resched() to be outside the
> single-word select() loop, would be a good idea _regardless_. Does it make
> any difference for you? If it's purely a "code generation got worse"
> issue, it should help a bit.
Yes, it helps. I am now at 24 microseconds which is very close to the original
time for lat_select.
Thanks