* A quick view of the performance benchmark for semaphore-like and mutex
@ 2012-04-17 9:36 Chen, Dennis (SRDC SW)
2012-04-17 10:09 ` Peter Zijlstra
2012-04-17 10:12 ` Peter Zijlstra
0 siblings, 2 replies; 4+ messages in thread
From: Chen, Dennis (SRDC SW) @ 2012-04-17 9:36 UTC
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, paulmck@linux.vnet.ibm.com, peterz@infradead.org,
Paul Mackerras, Arnaldo Carvalho de Melo
Just as a quick & rough test, with the changes below applied to the mutex code (making it behave almost the same as a semaphore):
--- /home/dennis/Linux/linux-3.3.2-sem/kernel/mutex.c 2012-04-17 14:59:49.823177615 +0800
+++ ./mutex.c 2012-04-17 17:00:12.963059284 +0800
@@ -140,6 +140,7 @@ __mutex_lock_common(struct mutex *lock,
preempt_disable();
mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
+#if 0
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
/*
* Optimistic spinning.
@@ -195,6 +196,7 @@ __mutex_lock_common(struct mutex *lock,
arch_mutex_cpu_relax();
}
#endif
+#endif
spin_lock_mutex(&lock->wait_lock, flags);
debug_mutex_lock_common(lock, &waiter);
#perf record -a perf bench locking mutex -p 8 -t 3000
The benchmark result BEFORE (mutex)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
round 1:
Total duration 39868 s 536095 us
real: 15.89 s
user: 0.00
sys: 0.31
Events: 64K cycles
20.18% perf [kernel.kallsyms] [k] __mutex_lock_slowpath
8.41% perf [kernel.kallsyms] [k] _raw_spin_lock
8.00% perf [kernel.kallsyms] [k] mutex_unlock
5.29% perf [kernel.kallsyms] [k] mutex_lock
2.88% perf [kernel.kallsyms] [k] link_path_walk
2.56% perf [kernel.kallsyms] [k] __mutex_unlock_slowpath
2.31% perf [kernel.kallsyms] [k] mutex_spin_on_owner
2.29% perf [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.68% perf [kernel.kallsyms] [k] __d_lookup
1.33% perf [kernel.kallsyms] [k] dput
1.33% perf [kernel.kallsyms] [k] clear_page_c
1.06% perf [kernel.kallsyms] [k] __strncpy_from_user
1.04% perf [kernel.kallsyms] [k] do_lookup
...
-------------------------------------------------------------------------------------
round 2:
Total duration 39748 s 176410 us
real: 15.92 s
user: 0.00
sys: 0.32
Events: 63K cycles
19.68% perf [kernel.kallsyms] [k] __mutex_lock_slowpath
8.53% perf [kernel.kallsyms] [k] _raw_spin_lock
7.74% perf [kernel.kallsyms] [k] mutex_unlock
5.09% perf [kernel.kallsyms] [k] mutex_lock
3.06% perf [kernel.kallsyms] [k] link_path_walk
2.54% perf [kernel.kallsyms] [k] __mutex_unlock_slowpath
2.31% perf [kernel.kallsyms] [k] mutex_spin_on_owner
2.30% perf [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.76% perf [kernel.kallsyms] [k] __d_lookup
1.46% perf [kernel.kallsyms] [k] clear_page_c
1.31% perf [kernel.kallsyms] [k] dput
1.10% perf [kernel.kallsyms] [k] __strncpy_from_user
1.08% perf [kernel.kallsyms] [k] do_lookup
...
-------------------------------------------------------------------------------------
round 3:
Total duration 40047 s 394364 us
real: 15.59 s
user: 0.00
sys: 0.30
Events: 58K cycles
19.18% perf [kernel.kallsyms] [k] __mutex_lock_slowpath
8.68% perf [kernel.kallsyms] [k] _raw_spin_lock
7.80% perf [kernel.kallsyms] [k] mutex_unlock
5.24% perf [kernel.kallsyms] [k] mutex_lock
3.22% perf [kernel.kallsyms] [k] link_path_walk
2.57% perf [kernel.kallsyms] [k] __mutex_unlock_slowpath
2.38% perf [kernel.kallsyms] [k] _raw_spin_lock_irqsave
2.13% perf [kernel.kallsyms] [k] mutex_spin_on_owner
1.79% perf [kernel.kallsyms] [k] __d_lookup
1.54% perf [kernel.kallsyms] [k] clear_page_c
1.34% perf [kernel.kallsyms] [k] dput
1.12% perf [kernel.kallsyms] [k] do_lookup
1.04% perf [kernel.kallsyms] [k] __strncpy_from_user
1.02% perf [kernel.kallsyms] [k] system_call
1.02% perf [kernel.kallsyms] [k] get_page_from_freelist
...
The benchmark result AFTER (optimistic spinning removed from the mutex)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
round 1:
Total duration 66319 s 868892 us
real: 23.16 s
user: 0.00
sys: 0.29
Events: 81K cycles
6.30% perf [kernel.kallsyms] [k] _raw_spin_lock
3.13% perf [kernel.kallsyms] [k] mutex_unlock
3.09% perf [kernel.kallsyms] [k] mutex_lock
3.07% perf [kernel.kallsyms] [k] link_path_walk
2.66% swapper [kernel.kallsyms] [k] intel_idle
2.21% perf [kernel.kallsyms] [k] __d_lookup
1.80% perf [kernel.kallsyms] [k] clear_page_c
1.58% perf [kernel.kallsyms] [k] system_call
1.56% perf [kernel.kallsyms] [k] __strncpy_from_user
1.53% perf [kernel.kallsyms] [k] do_lookup
1.47% perf [kernel.kallsyms] [k] dput
1.43% perf [kernel.kallsyms] [k] get_page_from_freelist
1.28% perf libc-2.13.so [.] 0xa99f6
1.19% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.15% perf [kernel.kallsyms] [k] vfsmount_lock_local_lock
1.12% perf [kernel.kallsyms] [k] kfree
...
-------------------------------------------------------------------------------------
round 2:
Total duration 67448 s 392232 us
real: 23.21 s
user: 0.00
sys: 0.29
Events: 82K cycles
6.23% perf [kernel.kallsyms] [k] _raw_spin_lock
3.23% perf [kernel.kallsyms] [k] mutex_unlock
3.10% perf [kernel.kallsyms] [k] mutex_lock
3.10% perf [kernel.kallsyms] [k] link_path_walk
2.59% swapper [kernel.kallsyms] [k] intel_idle
2.18% perf [kernel.kallsyms] [k] __d_lookup
1.88% perf [kernel.kallsyms] [k] clear_page_c
1.60% perf [kernel.kallsyms] [k] __strncpy_from_user
1.50% perf [kernel.kallsyms] [k] system_call
1.48% perf [kernel.kallsyms] [k] dput
1.44% perf [kernel.kallsyms] [k] do_lookup
1.33% perf [kernel.kallsyms] [k] get_page_from_freelist
1.29% perf libc-2.13.so [.] 0x82715
1.19% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.11% perf [kernel.kallsyms] [k] kfree
1.10% perf [kernel.kallsyms] [k] vfsmount_lock_local_lock
1.01% perf [kernel.kallsyms] [k] __alloc_pages_nodemask
...
-------------------------------------------------------------------------------------
round 3:
Total duration 66468 s 532417 us
real: 23.35 s
user: 0.00
sys: 0.28
Events: 87K cycles
6.30% perf [kernel.kallsyms] [k] _raw_spin_lock
3.09% perf [kernel.kallsyms] [k] mutex_unlock
2.98% perf [kernel.kallsyms] [k] link_path_walk
2.98% perf [kernel.kallsyms] [k] mutex_lock
2.70% swapper [kernel.kallsyms] [k] intel_idle
2.25% perf [kernel.kallsyms] [k] __d_lookup
1.92% perf [kernel.kallsyms] [k] clear_page_c
1.56% perf [kernel.kallsyms] [k] __strncpy_from_user
1.47% perf [kernel.kallsyms] [k] dput
1.47% perf [kernel.kallsyms] [k] system_call
1.42% perf [kernel.kallsyms] [k] do_lookup
1.35% perf [kernel.kallsyms] [k] get_page_from_freelist
1.32% perf libc-2.13.so [.] 0x12902e
1.32% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.10% perf [kernel.kallsyms] [k] vfsmount_lock_local_lock
1.02% perf [kernel.kallsyms] [k] kfree
1.00% perf [kernel.kallsyms] [k] __alloc_pages_nodemask
Interesting!! Semaphore-like is roughly 7.4s slower than mutex in real time... Also, the event cycle
counts reported by perf are different
* Re: A quick view of the performance benchmark for semaphore-like and mutex
From: Peter Zijlstra @ 2012-04-17 10:09 UTC
To: Chen, Dennis (SRDC SW)
Cc: linux-kernel@vger.kernel.org, Ingo Molnar,
paulmck@linux.vnet.ibm.com, Paul Mackerras,
Arnaldo Carvalho de Melo
On Tue, 2012-04-17 at 09:36 +0000, Chen, Dennis (SRDC SW) wrote:
> Just as a quick & rough test, with the changes below applied to the mutex code (making it behave almost the same as a semaphore):
>
> --- /home/dennis/Linux/linux-3.3.2-sem/kernel/mutex.c 2012-04-17 14:59:49.823177615 +0800
> +++ ./mutex.c 2012-04-17 17:00:12.963059284 +0800
> @@ -140,6 +140,7 @@ __mutex_lock_common(struct mutex *lock,
> preempt_disable();
> mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
>
> +#if 0
> #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
> /*
> * Optimistic spinning.
> @@ -195,6 +196,7 @@ __mutex_lock_common(struct mutex *lock,
> arch_mutex_cpu_relax();
> }
> #endif
> +#endif
> spin_lock_mutex(&lock->wait_lock, flags);
>
> debug_mutex_lock_common(lock, &waiter);
or you do:
echo NO_OWNER_SPIN > /debug/sched_features
* Re: A quick view of the performance benchmark for semaphore-like and mutex
From: Peter Zijlstra @ 2012-04-17 10:12 UTC
To: Chen, Dennis (SRDC SW)
Cc: linux-kernel@vger.kernel.org, Ingo Molnar,
paulmck@linux.vnet.ibm.com, Paul Mackerras,
Arnaldo Carvalho de Melo
On Tue, 2012-04-17 at 09:36 +0000, Chen, Dennis (SRDC SW) wrote:
>
> Interesting!! Semaphore-like is roughly 7.4s slower than mutex in real time... Also, the event cycle
> counts reported by perf are different
I suspect that if you were to use actual semaphores it would be even
worse, the semaphore implementation doesn't do lock-stealing nor does it
have fancy assembly fast paths.
In fact, I don't know why you even bother with sems, they're a
deprecated serialization primitive that really shouldn't be used
anymore.
* RE: A quick view of the performance benchmark for semaphore-like and mutex
From: Chen, Dennis (SRDC SW) @ 2012-04-17 11:52 UTC
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar,
paulmck@linux.vnet.ibm.com, Paul Mackerras,
Arnaldo Carvalho de Melo
On Tue, Apr 17, 2012 at 6:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2012-04-17 at 09:36 +0000, Chen, Dennis (SRDC SW) wrote:
>>
>> Interesting!! Semaphore-like is roughly 7.4s slower than mutex in real time... Also, the event cycle
>> counts reported by perf are different
>
> I suspect that if you were to use actual semaphores it would be even
> worse, the semaphore implementation doesn't do lock-stealing nor does it
> have fancy assembly fast paths.
>
> In fact, I don't know why you even bother with sems, they're a
> deprecated serialization primitive that really shouldn't be used
> anymore.
Yes, I am also inclined to believe that... Actually, I have never been bothered by sems in real life;
this was just for fun! I enjoy the data :-)