From: Ingo Molnar <mingo@elte.hu>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, torvalds@osdl.org,
arjanv@infradead.org, nico@cam.org, jes@trained-monkey.org,
zwane@arm.linux.org.uk, oleg@tv-sign.ru, dhowells@redhat.com,
alan@lxorguk.ukuu.org.uk, bcrl@kvack.org, rostedt@goodmis.org,
hch@infradead.org, ak@suse.de, rmk+lkml@arm.linux.org.uk
Subject: Re: [patch 0/9] mutex subsystem, -V4
Date: Thu, 22 Dec 2005 13:20:11 +0100
Message-ID: <20051222122011.GA20789@elte.hu>
In-Reply-To: <20051222035443.19a4b24e.akpm@osdl.org>
* Andrew Morton <akpm@osdl.org> wrote:
> I've only been following this with half an eye, with the apparently
> erroneous expectation that future versions of the patchset would come
> with some explanation of why on earth we'd want to merge all this
> stuff into the kernel.
in my initial announcement i listed 10 good reasons to do so, and they
are still true:
http://people.redhat.com/mingo/generic-mutex-subsystem/mutex-announce.txt
[...]
But firstly, i'd like to answer the most important question:
"Why the hell do we need a new mutex subsystem, and what's wrong
with semaphores??"
This question is clearly nagging most of the doubters, so i'm listing my
answers first, before fully explaining the patchset. (For more
implementational details, see the subsequent sections.)
here are the top 10 reasons why i think the generic mutex code should
be considered for upstream integration:
- 'struct mutex' is smaller: on x86, 'struct semaphore' is 20 bytes,
'struct mutex' is 16 bytes. A smaller structure size means less RAM
footprint, and better CPU-cache utilization.
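The 20-byte and 16-byte figures can be reproduced with a small model. This is only a sketch: the field layouts below are assumptions (they are not spelled out in this mail), pointers are modelled as 4-byte integers to mimic 32-bit x86, and every type name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for 32-bit x86 kernel types (assumed sizes, hypothetical names). */
typedef struct { int32_t counter; } atomic_model_t;         /* atomic_t */
typedef struct { uint32_t slock; } spinlock_model_t;        /* non-debug spinlock */
typedef struct { uint32_t next, prev; } list_head_model_t;  /* two 32-bit pointers */

/* Assumed semaphore layout: count, sleepers, then a wait queue head. */
struct semaphore_model {
    atomic_model_t    count;
    int32_t           sleepers;
    spinlock_model_t  wait_lock;   /* wait queue head: lock */
    list_head_model_t wait_list;   /* wait queue head: task list */
};

/* Assumed mutex layout: the same fields minus 'sleepers'. */
struct mutex_model {
    atomic_model_t    count;
    spinlock_model_t  wait_lock;
    list_head_model_t wait_list;
};

size_t semaphore_model_size(void) { return sizeof(struct semaphore_model); }
size_t mutex_model_size(void)     { return sizeof(struct mutex_model); }
```

Under these assumptions the whole 4-byte difference comes from the extra 'sleepers' word in the semaphore.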
- tighter code. On x86 i get the following .text sizes when
switching all mutex-alike semaphores in the kernel to the mutex
subsystem:
       text    data     bss     dec     hex filename
    3280380  868188  396860 4545428  455b94 vmlinux-semaphore
    3255329  865296  396732 4517357  44eded vmlinux-mutex
that's 25051 bytes of code saved, or a 0.76% win - off the hottest
codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
Smaller code means better icache footprint, which is one of the
major optimization goals in the Linux kernel currently.
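The savings quoted above follow directly from the size(1) output. A minimal sketch of the arithmetic (the helper names are made up for illustration):

```c
/* Bytes saved when a section shrinks from 'before' to 'after'. */
long bytes_saved(long before, long after)
{
    return before - after;
}

/* The same saving, as a percentage of the original section size. */
double percent_saved(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

Calling these with the .text sizes (3280380 and 3255329) and the .data sizes (868188 and 865296) gives the byte counts and percentages cited in the text.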
- the mutex subsystem is faster and has superior scalability for
contended workloads. On an 8-way x86 system, running a mutex-based
kernel and testing creat+unlink+close (of separate, per-task files)
in /tmp with 16 parallel tasks, the average number of ops/sec is:
    Semaphores:                        Mutexes:
    $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
    8 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
    checking VFS performance.          checking VFS performance.
    avg loops/sec:      34713          avg loops/sec:      84153
    CPU utilization:    63%            CPU utilization:    22%
i.e. in this workload, the mutex based kernel was 2.4 times faster
than the semaphore based kernel, _and_ it also had 2.8 times less CPU
utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
more efficient.)
the scalability difference is visible even on a 2-way P4 HT box:
    Semaphores:                        Mutexes:
    $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
    4 CPUs, running 16 tasks.          4 CPUs, running 16 tasks.
    checking VFS performance.          checking VFS performance.
    avg loops/sec:     127659          avg loops/sec:     181082
    CPU utilization:    100%           CPU utilization:    34%
(the straight performance advantage of mutexes is 41%, the per-cycle
efficiency of mutexes is 4.1 times better.)
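The speedup and efficiency figures for both boxes are plain ratios of the benchmark output. A small sketch of how they can be recomputed (struct and function names are hypothetical):

```c
/* One benchmark run: average loops per second and CPU utilization in percent. */
struct bench_result {
    double loops_per_sec;
    double cpu_percent;
};

/* Raw throughput advantage of 'mtx' over 'sem' (1.0 means equal). */
double throughput_ratio(struct bench_result sem, struct bench_result mtx)
{
    return mtx.loops_per_sec / sem.loops_per_sec;
}

/* Operations per second achieved per 1% of CPU time used. */
double ops_per_cpu_percent(struct bench_result r)
{
    return r.loops_per_sec / r.cpu_percent;
}

/* How many times more work 'mtx' gets done per unit of CPU time. */
double efficiency_ratio(struct bench_result sem, struct bench_result mtx)
{
    return ops_per_cpu_percent(mtx) / ops_per_cpu_percent(sem);
}
```

Feeding in the 8-way numbers (34713 at 63%, 84153 at 22%) and the P4 HT numbers (127659 at 100%, 181082 at 34%) yields the ratios quoted above, up to rounding.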
- there are no fastpath tradeoffs, the mutex fastpath is just as tight
as the semaphore fastpath. On x86, the locking fastpath is 2
instructions:
c0377ccb <mutex_lock>:
c0377ccb: f0 ff 08 lock decl (%eax)
c0377cce: 78 0e js c0377cde <.text.lock.mutex>
c0377cd0: c3 ret
the unlocking fastpath is equally tight:
c0377cd1 <mutex_unlock>:
c0377cd1: f0 ff 00 lock incl (%eax)
c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7>
c0377cd6: c3 ret
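For readers who do not read x86 disassembly, the two fastpaths can be modelled in userspace with C11 atomics. This is a sketch, not the kernel code: it assumes the counter convention 1 = unlocked, 0 = locked, negative = locked with possible waiters, and it only counts slowpath entries where a real implementation would queue, sleep or wake a task. All names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical userspace model; not the kernel's struct mutex. */
struct fast_mutex {
    atomic_int count;       /* assumed: 1 unlocked, 0 locked, <0 contended */
    int lock_slowpaths;     /* times the lock slowpath would be entered */
    int unlock_slowpaths;   /* times the unlock slowpath would be entered */
};

void fast_mutex_init(struct fast_mutex *m)
{
    atomic_init(&m->count, 1);
    m->lock_slowpaths = 0;
    m->unlock_slowpaths = 0;
}

/* Mirrors "lock decl; js": decrement, take the slowpath if negative. */
void fast_mutex_lock(struct fast_mutex *m)
{
    int new_count = atomic_fetch_sub(&m->count, 1) - 1;
    if (new_count < 0)
        m->lock_slowpaths++;    /* a real slowpath would queue and sleep */
}

/* Mirrors "lock incl; jle": increment, take the slowpath if <= 0. */
void fast_mutex_unlock(struct fast_mutex *m)
{
    int new_count = atomic_fetch_add(&m->count, 1) + 1;
    if (new_count <= 0)
        m->unlock_slowpaths++;  /* a real slowpath would wake a waiter */
}

int fast_mutex_count(struct fast_mutex *m)
{
    return atomic_load(&m->count);
}
```

In the uncontended case both functions reduce to one atomic read-modify-write plus one conditional branch that is not taken, which is what the disassembly shows. The slowpath counters are plain ints, so the model is only meant to be driven from a single thread.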
- the per-call-site inlining cost of the slowpath is cheaper and
smaller than that of semaphores, by one instruction, because the
mutex trampoline code does not do a "lea %0,%%eax" that the semaphore
code does before calling __down_failed. The mutex subsystem uses out
of line code currently so this makes only a small difference in .text
size, but in case we want to inline mutexes, they will be cheaper
than semaphores.
- No wholesale or dangerous migration path. The migration to mutexes is
fundamentally opt-in, safe and easy: multiple type-based and .config
based migration helpers are provided to make the migration to mutexes
easy. Migration is as finegrained as it gets, so robustness of the
kernel or out-of-tree code should not be jeopardized at any stage.
The migration helpers can be eliminated once migration is completed,
once the kernel has been separated into 'mutex users' and 'semaphore
users'. Out-of-tree code automatically defaults to semaphore
semantics, mutexes are not forced upon anyone, at any stage of the
migration.
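The mail mentions type-based migration helpers without showing them. Purely as an illustration of the idea, a single lock macro can dispatch on the static type of its argument, so a call site switches over just by changing the type of the lock object. The sketch below uses C11 `_Generic` as a stand-in; the patchset's actual helpers may work differently, and every name here is hypothetical.

```c
/* Hypothetical stand-ins for the two lock types. */
struct sem_model { int count;  int down_calls; };
struct mtx_model { int locked; int lock_calls; };

/* Semaphore-style acquire: decrement the count. */
void sem_model_down(struct sem_model *s)
{
    s->count--;
    s->down_calls++;
}

/* Mutex-style acquire: mark the lock as held. */
void mtx_model_lock(struct mtx_model *m)
{
    m->locked = 1;
    m->lock_calls++;
}

/* One call-site spelling; the operation is picked from the argument's type. */
#define lock_any(l) _Generic((l),              \
        struct sem_model *: sem_model_down,    \
        struct mtx_model *: mtx_model_lock)(l)
```

With a helper of this shape, code that still declares a semaphore keeps semaphore semantics, and only objects explicitly redeclared as mutexes get the new behavior.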
- 'struct mutex' semantics are well-defined and are enforced if
CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
virtually no debugging code or instrumentation. The mutex subsystem
checks and enforces the following rules:
* - only one task can hold the mutex at a time
* - only the owner can unlock the mutex
* - multiple unlocks are not permitted
* - recursive locking is not permitted
* - a mutex object must be initialized via the API
* - a mutex object must not be initialized via memset or copying
* - task may not exit with mutex held
* - memory areas where held locks reside must not be freed
furthermore, there are also convenience features in the debugging
code:
* - uses symbolic names of mutexes, whenever they are printed in debug output
* - point-of-acquire tracking, symbolic lookup of function names
* - list of all locks held in the system, printout of them
* - owner tracking
* - detects self-recursing locks and prints out all relevant info
* - detects multi-task circular deadlocks and prints out all affected
* locks and tasks (and only those tasks)
we have extensive experience with the mutex debugging code in the -rt
kernel, and it eases the debugging of mutex related bugs
considerably. A handful of upstream bugs were found this way as well,
and the fixes were contributed back to the vanilla kernel. We do believe
that improved debugging code is an important tool in improving the
fast-paced upstream kernel's quality.
a side-effect of the strict semantics is that mutexes are much easier
to analyze on a static level. E.g. Sparse could check the correctness
of mutex users, further improving the kernel's quality. Also, the
highest-level security and reliability validation techniques (and
certification processes) involve static code analysis.
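Several of the rules listed above are simple enough to model in a few lines. The sketch below is a single-threaded userspace illustration, not the CONFIG_DEBUG_MUTEXES code: it uses a trylock-style call instead of blocking, identifies tasks by a nonzero integer, and all names and error codes are hypothetical.

```c
#define DBG_MUTEX_MAGIC 0x4d555458u  /* arbitrary marker, set only by init */
#define DBG_NO_OWNER    0            /* task ids are assumed to be nonzero */

enum dbg_result {
    DBG_OK,
    DBG_ERR_UNINITIALIZED,  /* object was not set up via the API */
    DBG_ERR_RECURSIVE,      /* owner tried to lock again */
    DBG_ERR_BUSY,           /* another task holds the mutex */
    DBG_ERR_NOT_OWNER,      /* unlock by a task that is not the owner */
    DBG_ERR_NOT_LOCKED      /* unlock of an unlocked mutex */
};

struct dbg_mutex {
    unsigned int magic;
    int owner;
};

void dbg_mutex_init(struct dbg_mutex *m)
{
    m->magic = DBG_MUTEX_MAGIC;
    m->owner = DBG_NO_OWNER;
}

enum dbg_result dbg_mutex_trylock(struct dbg_mutex *m, int task)
{
    if (m->magic != DBG_MUTEX_MAGIC)
        return DBG_ERR_UNINITIALIZED;
    if (m->owner == task)
        return DBG_ERR_RECURSIVE;
    if (m->owner != DBG_NO_OWNER)
        return DBG_ERR_BUSY;
    m->owner = task;            /* only one task can hold the mutex */
    return DBG_OK;
}

enum dbg_result dbg_mutex_unlock(struct dbg_mutex *m, int task)
{
    if (m->magic != DBG_MUTEX_MAGIC)
        return DBG_ERR_UNINITIALIZED;
    if (m->owner == DBG_NO_OWNER)
        return DBG_ERR_NOT_LOCKED;
    if (m->owner != task)
        return DBG_ERR_NOT_OWNER;
    m->owner = DBG_NO_OWNER;
    return DBG_OK;
}
```

Owner tracking is what makes these checks possible: a counting semaphore has no notion of an owner, so rules such as "only the owner can unlock" cannot even be stated for it.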
- kernel/mutex.c is generic, and has minimal per-arch needs. No new
primitives have to be implemented to support spinlock-based generic
mutexes. Only 2 new atomic primitives have to be implemented for an
architecture to support optimized, lockless generic mutexes. In
contrast, to implement semaphores on a new architecture, hundreds of
lines of nontrivial (often assembly) code have to be written and
debugged.
- kernel/mutex.c is highly hackable. New locking features can be
implemented in C, and they carry over to every architecture.
Extensive self-consistency debugging checks of the mutex
implementation are done if CONFIG_DEBUG_MUTEXES is turned on. I do
think that hackability is the most important property of
kernel code.
- the generic mutex subsystem is also one more step towards enabling
the fully preemptable -rt kernel. Ok, this shouldn't bother the
upstream kernel too much at the moment, but it's a personal driving
force for me nevertheless ;-)
(NOTE: i consciously did not list 'Priority Inheritance' amongst the
reasons, because priority inheritance for blocking kernel locks
would be a misguided reason at best, and flat out wrong at worst.)