From: Ingo Molnar <mingo@elte.hu>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: torvalds@linux-foundation.org, a.p.zijlstra@chello.nl,
paulmck@linux.vnet.ibm.com, ghaskins@novell.com, matthew@wil.cx,
andi@firstfloor.org, chris.mason@oracle.com,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-btrfs@vger.kernel.org, tglx@linutronix.de, npiggin@suse.de,
pmorreale@novell.com, SDietrich@novell.com,
dmitry.adamushko@gmail.com, hannes@cmpxchg.org
Subject: Re: [GIT PULL] adaptive spinning mutexes
Date: Thu, 15 Jan 2009 00:23:27 +0100
Message-ID: <20090114232327.GA29821@elte.hu>
In-Reply-To: <20090114133529.317a346c.akpm@linux-foundation.org>
* Andrew Morton <akpm@linux-foundation.org> wrote:
> > I also checked Fedora and it has SCHED_DEBUG=y
> > in its kernel rpms.
>
> If all distros set SCHED_DEBUG=y then fine.
95% of the distros and a significant majority of the lkml traffic.
And no, we generally don't provide knobs for essential performance
features of core Linux kernel primitives - so the existence of OWNER_SPIN
in /sys/debug/sched_features is an exception already.
We don't have any knob to switch ticket spinlocks to old-style spinlocks.
We don't have any knob to switch the page allocator from LIFO to FIFO. We
don't have any knob to turn off the coalescing of vmas in the MM. We don't
have any knob to turn the mmap_sem from an rwsem to a mutex to a spinlock.
Why? Because such design and implementation details are what make Linux
Linux, and we stand by those decisions for better or worse. And we do try
to eliminate as many 'worse' situations as possible, but we don't provide
knobs galore. We offer flexibility in our willingness to fix any genuine
performance issues in our source code.
The thing is that apps tend to gravitate towards solutions with the least
short-term cost. If a super important enterprise app can solve their
performance problem by either redesigning their broken code, or by turning
off a feature we have in the kernel in their install scripts (which we
made so easy to tune via a stable sysctl), guess which variant they will
choose? Even if they hurt all other apps in the process.
> > note that there's also a performance issue here: we generally _dont
> > want_ a debug sysctl overhead in the mutex code or in any fastpath for
> > that matter. So making it depend on SCHED_DEBUG is useful.
> >
> > sched_feat() features get optimized out at build time when SCHED_DEBUG
> > is disabled. So it gives us the best of two worlds: the utility of
> > sysctls in the SCHED_DEBUG=y, and they get compiled out in the
> > !SCHED_DEBUG case.
>
> I'm not detecting here a sufficient appreciation of the number of
> sched-related regressions we've seen in recent years, nor of the
> difficulty encountered in diagnosing and fixing them. Let alone the
> difficulty getting those fixes propagated out a *long* time after the
> regression was added.
The bugzilla you just dug out in another thread does not seem to apply, so
I'm not sure what you are referring to.
Regarding historic tendencies, we have numbers like:

                      [v2.6.14]   [v2.6.29]

                      Semaphores | Mutexes
            ----------------------------------------------
                                 | no-spin    spin
                                 |
  [tmpfs]   ops/sec:       50713 | 291038     392865 (+34.9%)
  [ext3]    ops/sec:       45214 | 283291     435674 (+53.7%)

10x performance improvement on ext3, compared to 2.6.14.
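
The figures in the table can be cross-checked with a few lines of
arithmetic. The following is only a minimal sketch; the helper names
percent_gain() and speedup() are made up for illustration and appear
nowhere in the kernel tree.

```c
#include <assert.h>

/* Relative change, in percent, going from `before` to `after`. */
double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Plain throughput ratio: how many times faster `after` is than `before`. */
double speedup(double before, double after)
{
	assert(before > 0.0);
	return after / before;
}
```

Fed with the ops/sec values above, speedup(45214, 435674) comes out at
roughly 9.6 for ext3 and speedup(50713, 392865) at roughly 7.7 for tmpfs,
while percent_gain() for no-spin versus spin lands just under 35% (tmpfs)
and just under 53.8% (ext3), which agrees with the quoted percentages
once truncated to one decimal.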
I'm sure there will be other numbers that go down - but the thing is,
we've _never_ been good at finding the worst-possible workload cases
during development.
> You're taking a whizzy new feature which drastically changes a critical
> core kernel feature and jamming it into mainline with a vestigial amount
> of testing coverage without giving sufficient care and thought to the
> practical lessons which we have learned from doing this in the past.
If you look at the whole existence of /sys/debug/sched_features you'll see
how careful we've been about performance regressions. We made it a
sched_feat() exactly because if a change goes wrong and becomes a step
backwards then it's a oneliner to turn it default-off.
We made use of that facility in the past and we have a number of debug
knobs there right now:
# cat /debug/sched_features
NEW_FAIR_SLEEPERS NORMALIZED_SLEEPER WAKEUP_PREEMPT START_DEBIT
AFFINE_WAKEUPS CACHE_HOT_BUDDY SYNC_WAKEUPS NO_HRTICK NO_DOUBLE_TICK
ASYM_GRAN LB_BIAS LB_WAKEUP_UPDATE ASYM_EFF_LOAD NO_WAKEUP_OVERLAP
LAST_BUDDY OWNER_SPIN
All of those ~16 scheduler knobs were done out of caution, to make sure
that if we change some scheduling aspect there's a convenient way to debug
performance or interactivity regressions, without forcing people into
bisection and/or reboots, etc.
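
For readers unfamiliar with how such a feature list can be both a runtime
knob and a compile-time constant, here is a simplified user-space sketch
of the general pattern. It is an illustration only, not the kernel
source: FEATURE_LIST, FEAT_DEBUG, feat(), feat_set() and feat_mask are
hypothetical names, and the three entries merely echo the tail of the
list above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCHED_DEBUG; build with -DFEAT_DEBUG=0 to
 * get the constant-mask variant. */
#ifndef FEAT_DEBUG
#define FEAT_DEBUG 1
#endif

/* Hypothetical feature list: one F(name, default) entry per feature. */
#define FEATURE_LIST(F)      \
	F(WAKEUP_OVERLAP, 0) \
	F(LAST_BUDDY, 1)     \
	F(OWNER_SPIN, 1)

/* First expansion: one enum constant per feature, used as a bit index. */
#define F_ENUM(name, enabled) FEAT_##name,
enum { FEATURE_LIST(F_ENUM) FEAT_NR };

/* Second expansion: OR the default-on bits into a single constant. */
#define F_MASK(name, enabled) ((unsigned)(enabled) << FEAT_##name) |
#define FEAT_DEFAULTS (FEATURE_LIST(F_MASK) 0u)

#if FEAT_DEBUG
/* Debug build: the mask is a variable, so a feature can be flipped at
 * run time without rebuilding. */
static unsigned feat_mask = FEAT_DEFAULTS;

void feat_set(int feat_bit, bool on)
{
	assert(feat_bit >= 0 && feat_bit < FEAT_NR);
	if (on)
		feat_mask |= 1u << feat_bit;
	else
		feat_mask &= ~(1u << feat_bit);
}
#else
/* Non-debug build: the mask is a constant, so the compiler is free to
 * fold every feat() test and drop the dead branch. */
static const unsigned feat_mask = FEAT_DEFAULTS;
#endif

/* Evaluates to 1 if the named feature is enabled, 0 otherwise. */
#define feat(name) ((feat_mask >> FEAT_##name) & 1u)
```

In the debug variant, feat_set(FEAT_OWNER_SPIN, false) makes
feat(OWNER_SPIN) evaluate to 0 from then on; in the non-debug variant the
only way to change the result is to edit the default in the list and
rebuild.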
> This is a highly risky change. It's not that the probability of failure
> is high - the problem is that the *cost* of the improbable failure is
> high. We should seek to minimize that cost.
It never mattered much to the efficiency of finding performance
regressions whether a feature sat tight for 4 kernel releases in -mm or
went upstream in a week. It _does_ matter to stability - but not
significantly to performance.
What matters most to getting performance right is testing exposure and
variance, not length of the integration period. Easy revertability helps
too - and that is a given here - it's literally a oneliner to disable it.
See that oneliner below.
Ingo
Index: linux/kernel/sched_features.h
===================================================================
--- linux.orig/kernel/sched_features.h
+++ linux/kernel/sched_features.h
@@ -13,4 +13,4 @@ SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
 SCHED_FEAT(ASYM_EFF_LOAD, 1)
 SCHED_FEAT(WAKEUP_OVERLAP, 0)
 SCHED_FEAT(LAST_BUDDY, 1)
-SCHED_FEAT(OWNER_SPIN, 1)
+SCHED_FEAT(OWNER_SPIN, 0)