* [patch 0/3] Allow inlined spinlocks again V3
@ 2009-08-12 18:39 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
` (3 more replies)
0 siblings, 4 replies; 34+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
Hi Andrew,
since there were no objections to this patch set, would you mind picking
it up for -mm? The patches are against Linus' current git head, but they
also apply on top of the latest mmotm patch set.
===
This patch set allows inlined spinlocks to be used again.
The rationale is that function calls, at least on s390, are expensive.
If one considers that server kernels are usually compiled with
!CONFIG_PREEMPT, a simple spin_lock is just a compare-and-swap loop, and
the extra overhead of a function call is significant.
With inlined spinlocks, overall CPU usage on s390 is reduced by 1%-5%.
These numbers were taken with several network benchmarks; however, I expect
any workload that calls into the kernel frequently and grabs a few locks
to perform better.
The implementation is straightforward: move the function bodies of the
locking functions into static inline functions and place them in a header
file.
Depending on CONFIG_SPINLOCK_INLINE, either out-of-line or inlined locking
functions are generated, roughly as in the sketch below.
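For a single lock function, the combined effect of the patches looks roughly
like this condensed sketch (the irq/bh/rwlock and lockdep variants follow the
same pattern; see the actual patches below for the real thing):

/* include/linux/spinlock_api_smp.h (sketch) */
static inline void __spin_lock(spinlock_t *lock)
{
	preempt_disable();
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
}

#ifdef CONFIG_SPINLOCK_INLINE
/* callers get the function body above inlined directly */
#define _spin_lock(lock)	__spin_lock(lock)
#else
/* otherwise keep the usual out-of-line entry point */
void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
#endif

/* kernel/spinlock.c (sketch) - only compiled when not inlining */
#ifndef CONFIG_SPINLOCK_INLINE
void __lockfunc _spin_lock(spinlock_t *lock)
{
	__spin_lock(lock);
}
EXPORT_SYMBOL(_spin_lock);
#endif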
The patches should be self-explanatory.
V2: rewritten from scratch - now also with readable code
V3: removed the macro that generated the out-of-line spinlock variants,
since it would break ctags; as requested by Arnd Bergmann.
^ permalink raw reply [flat|nested] 34+ messages in thread* [patch 1/3] spinlock: move spinlock function bodies to header file 2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens @ 2009-08-12 18:39 ` Heiko Carstens 2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens ` (2 subsequent siblings) 3 siblings, 0 replies; 34+ messages in thread From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt [-- Attachment #1: 01_spinlock_move.diff --] [-- Type: text/plain, Size: 17156 bytes --] From: Heiko Carstens <heiko.carstens@de.ibm.com> Move spinlock function bodies to header file by creating a static inline version of each variant. Use the inline version on the out-of-line code. This shouldn't make any difference besides that the spinlock code can now be used to generate inlined spinlock code. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> --- include/linux/spinlock.h | 18 +- include/linux/spinlock_api_smp.h | 263 +++++++++++++++++++++++++++++++++++++++ kernel/spinlock.c | 174 ++++--------------------- 3 files changed, 300 insertions(+), 155 deletions(-) Index: linux-2.6/include/linux/spinlock_api_smp.h =================================================================== --- linux-2.6.orig/include/linux/spinlock_api_smp.h +++ linux-2.6/include/linux/spinlock_api_smp.h @@ -60,4 +60,267 @@ void __lockfunc _read_unlock_irqrestore( void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) __releases(lock); +static inline int __spin_trylock(spinlock_t *lock) +{ + preempt_disable(); + if (_raw_spin_trylock(lock)) { + spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); + return 1; + } + preempt_enable(); + return 0; +} + +static inline int __read_trylock(rwlock_t *lock) +{ + preempt_disable(); + if (_raw_read_trylock(lock)) { + rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_); + return 1; + } + preempt_enable(); + return 0; +} + +static inline int __write_trylock(rwlock_t *lock) +{ + preempt_disable(); + if (_raw_write_trylock(lock)) { + rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_); + return 1; + } + preempt_enable(); + return 0; +} + +/* + * If lockdep is enabled then we use the non-preemption spin-ops + * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are + * not re-enabled during lock-acquire (which the preempt-spin-ops do): + */ +#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC) + +static inline void __read_lock(rwlock_t *lock) +{ + preempt_disable(); + rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock); +} + +static inline unsigned long __spin_lock_irqsave(spinlock_t *lock) +{ + unsigned long flags; + + local_irq_save(flags); + preempt_disable(); + spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); + /* + * On lockdep we dont want the hand-coded irq-enable of + * _raw_spin_lock_flags() code, because lockdep assumes + * that interrupts are not re-enabled during lock-acquire: + */ +#ifdef CONFIG_LOCKDEP + LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); +#else + _raw_spin_lock_flags(lock, &flags); +#endif + return flags; +} + +static inline void __spin_lock_irq(spinlock_t *lock) +{ + local_irq_disable(); + preempt_disable(); + spin_acquire(&lock->dep_map, 0, 0, 
_RET_IP_); + LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); +} + +static inline void __spin_lock_bh(spinlock_t *lock) +{ + local_bh_disable(); + preempt_disable(); + spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); +} + +static inline unsigned long __read_lock_irqsave(rwlock_t *lock) +{ + unsigned long flags; + + local_irq_save(flags); + preempt_disable(); + rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock, + _raw_read_lock_flags, &flags); + return flags; +} + +static inline void __read_lock_irq(rwlock_t *lock) +{ + local_irq_disable(); + preempt_disable(); + rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock); +} + +static inline void __read_lock_bh(rwlock_t *lock) +{ + local_bh_disable(); + preempt_disable(); + rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock); +} + +static inline unsigned long __write_lock_irqsave(rwlock_t *lock) +{ + unsigned long flags; + + local_irq_save(flags); + preempt_disable(); + rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock, + _raw_write_lock_flags, &flags); + return flags; +} + +static inline void __write_lock_irq(rwlock_t *lock) +{ + local_irq_disable(); + preempt_disable(); + rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock); +} + +static inline void __write_lock_bh(rwlock_t *lock) +{ + local_bh_disable(); + preempt_disable(); + rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock); +} + +static inline void __spin_lock(spinlock_t *lock) +{ + preempt_disable(); + spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); +} + +static inline void __write_lock(rwlock_t *lock) +{ + preempt_disable(); + rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); + LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock); +} + +#endif /* CONFIG_PREEMPT */ + +static inline void __spin_unlock(spinlock_t *lock) +{ + spin_release(&lock->dep_map, 1, _RET_IP_); + _raw_spin_unlock(lock); + preempt_enable(); +} + +static inline void __write_unlock(rwlock_t *lock) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_write_unlock(lock); + preempt_enable(); +} + +static inline void __read_unlock(rwlock_t *lock) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_read_unlock(lock); + preempt_enable(); +} + +static inline void __spin_unlock_irqrestore(spinlock_t *lock, + unsigned long flags) +{ + spin_release(&lock->dep_map, 1, _RET_IP_); + _raw_spin_unlock(lock); + local_irq_restore(flags); + preempt_enable(); +} + +static inline void __spin_unlock_irq(spinlock_t *lock) +{ + spin_release(&lock->dep_map, 1, _RET_IP_); + _raw_spin_unlock(lock); + local_irq_enable(); + preempt_enable(); +} + +static inline void __spin_unlock_bh(spinlock_t *lock) +{ + spin_release(&lock->dep_map, 1, _RET_IP_); + _raw_spin_unlock(lock); + preempt_enable_no_resched(); + local_bh_enable_ip((unsigned long)__builtin_return_address(0)); +} + +static inline void __read_unlock_irqrestore(rwlock_t *lock, unsigned long flags) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_read_unlock(lock); + local_irq_restore(flags); + preempt_enable(); +} + +static inline void __read_unlock_irq(rwlock_t *lock) +{ + 
rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_read_unlock(lock); + local_irq_enable(); + preempt_enable(); +} + +static inline void __read_unlock_bh(rwlock_t *lock) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_read_unlock(lock); + preempt_enable_no_resched(); + local_bh_enable_ip((unsigned long)__builtin_return_address(0)); +} + +static inline void __write_unlock_irqrestore(rwlock_t *lock, + unsigned long flags) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_write_unlock(lock); + local_irq_restore(flags); + preempt_enable(); +} + +static inline void __write_unlock_irq(rwlock_t *lock) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_write_unlock(lock); + local_irq_enable(); + preempt_enable(); +} + +static inline void __write_unlock_bh(rwlock_t *lock) +{ + rwlock_release(&lock->dep_map, 1, _RET_IP_); + _raw_write_unlock(lock); + preempt_enable_no_resched(); + local_bh_enable_ip((unsigned long)__builtin_return_address(0)); +} + +static inline int __spin_trylock_bh(spinlock_t *lock) +{ + local_bh_disable(); + preempt_disable(); + if (_raw_spin_trylock(lock)) { + spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); + return 1; + } + preempt_enable_no_resched(); + local_bh_enable_ip((unsigned long)__builtin_return_address(0)); + return 0; +} + #endif /* __LINUX_SPINLOCK_API_SMP_H */ Index: linux-2.6/include/linux/spinlock.h =================================================================== --- linux-2.6.orig/include/linux/spinlock.h +++ linux-2.6/include/linux/spinlock.h @@ -143,15 +143,6 @@ static inline void smp_mb__after_lock(vo */ #define spin_unlock_wait(lock) __raw_spin_unlock_wait(&(lock)->raw_lock) -/* - * Pull the _spin_*()/_read_*()/_write_*() functions/declarations: - */ -#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) -# include <linux/spinlock_api_smp.h> -#else -# include <linux/spinlock_api_up.h> -#endif - #ifdef CONFIG_DEBUG_SPINLOCK extern void _raw_spin_lock(spinlock_t *lock); #define _raw_spin_lock_flags(lock, flags) _raw_spin_lock(lock) @@ -380,4 +371,13 @@ extern int _atomic_dec_and_lock(atomic_t */ #define spin_can_lock(lock) (!spin_is_locked(lock)) +/* + * Pull the _spin_*()/_read_*()/_write_*() functions/declarations: + */ +#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) +# include <linux/spinlock_api_smp.h> +#else +# include <linux/spinlock_api_up.h> +#endif + #endif /* __LINUX_SPINLOCK_H */ Index: linux-2.6/kernel/spinlock.c =================================================================== --- linux-2.6.orig/kernel/spinlock.c +++ linux-2.6/kernel/spinlock.c @@ -23,40 +23,19 @@ int __lockfunc _spin_trylock(spinlock_t *lock) { - preempt_disable(); - if (_raw_spin_trylock(lock)) { - spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); - return 1; - } - - preempt_enable(); - return 0; + return __spin_trylock(lock); } EXPORT_SYMBOL(_spin_trylock); int __lockfunc _read_trylock(rwlock_t *lock) { - preempt_disable(); - if (_raw_read_trylock(lock)) { - rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_); - return 1; - } - - preempt_enable(); - return 0; + return __read_trylock(lock); } EXPORT_SYMBOL(_read_trylock); int __lockfunc _write_trylock(rwlock_t *lock) { - preempt_disable(); - if (_raw_write_trylock(lock)) { - rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_); - return 1; - } - - preempt_enable(); - return 0; + return __write_trylock(lock); } EXPORT_SYMBOL(_write_trylock); @@ -69,129 +48,74 @@ EXPORT_SYMBOL(_write_trylock); void __lockfunc _read_lock(rwlock_t *lock) { - preempt_disable(); - rwlock_acquire_read(&lock->dep_map, 0, 
0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock); + __read_lock(lock); } EXPORT_SYMBOL(_read_lock); unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock) { - unsigned long flags; - - local_irq_save(flags); - preempt_disable(); - spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); - /* - * On lockdep we dont want the hand-coded irq-enable of - * _raw_spin_lock_flags() code, because lockdep assumes - * that interrupts are not re-enabled during lock-acquire: - */ -#ifdef CONFIG_LOCKDEP - LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); -#else - _raw_spin_lock_flags(lock, &flags); -#endif - return flags; + return __spin_lock_irqsave(lock); } EXPORT_SYMBOL(_spin_lock_irqsave); void __lockfunc _spin_lock_irq(spinlock_t *lock) { - local_irq_disable(); - preempt_disable(); - spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); + __spin_lock_irq(lock); } EXPORT_SYMBOL(_spin_lock_irq); void __lockfunc _spin_lock_bh(spinlock_t *lock) { - local_bh_disable(); - preempt_disable(); - spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); + __spin_lock_bh(lock); } EXPORT_SYMBOL(_spin_lock_bh); unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock) { - unsigned long flags; - - local_irq_save(flags); - preempt_disable(); - rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock, - _raw_read_lock_flags, &flags); - return flags; + return __read_lock_irqsave(lock); } EXPORT_SYMBOL(_read_lock_irqsave); void __lockfunc _read_lock_irq(rwlock_t *lock) { - local_irq_disable(); - preempt_disable(); - rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock); + __read_lock_irq(lock); } EXPORT_SYMBOL(_read_lock_irq); void __lockfunc _read_lock_bh(rwlock_t *lock) { - local_bh_disable(); - preempt_disable(); - rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock); + __read_lock_bh(lock); } EXPORT_SYMBOL(_read_lock_bh); unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock) { - unsigned long flags; - - local_irq_save(flags); - preempt_disable(); - rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock, - _raw_write_lock_flags, &flags); - return flags; + return __write_lock_irqsave(lock); } EXPORT_SYMBOL(_write_lock_irqsave); void __lockfunc _write_lock_irq(rwlock_t *lock) { - local_irq_disable(); - preempt_disable(); - rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock); + __write_lock_irq(lock); } EXPORT_SYMBOL(_write_lock_irq); void __lockfunc _write_lock_bh(rwlock_t *lock) { - local_bh_disable(); - preempt_disable(); - rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock); + __write_lock_bh(lock); } EXPORT_SYMBOL(_write_lock_bh); void __lockfunc _spin_lock(spinlock_t *lock) { - preempt_disable(); - spin_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock); + __spin_lock(lock); } - EXPORT_SYMBOL(_spin_lock); void __lockfunc _write_lock(rwlock_t *lock) { - preempt_disable(); - rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); - LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock); + __write_lock(lock); } - EXPORT_SYMBOL(_write_lock); #else /* CONFIG_PREEMPT: */ @@ -320,121 +244,79 @@ 
EXPORT_SYMBOL(_spin_lock_nest_lock); void __lockfunc _spin_unlock(spinlock_t *lock) { - spin_release(&lock->dep_map, 1, _RET_IP_); - _raw_spin_unlock(lock); - preempt_enable(); + __spin_unlock(lock); } EXPORT_SYMBOL(_spin_unlock); void __lockfunc _write_unlock(rwlock_t *lock) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_write_unlock(lock); - preempt_enable(); + __write_unlock(lock); } EXPORT_SYMBOL(_write_unlock); void __lockfunc _read_unlock(rwlock_t *lock) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_read_unlock(lock); - preempt_enable(); + __read_unlock(lock); } EXPORT_SYMBOL(_read_unlock); void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) { - spin_release(&lock->dep_map, 1, _RET_IP_); - _raw_spin_unlock(lock); - local_irq_restore(flags); - preempt_enable(); + __spin_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_spin_unlock_irqrestore); void __lockfunc _spin_unlock_irq(spinlock_t *lock) { - spin_release(&lock->dep_map, 1, _RET_IP_); - _raw_spin_unlock(lock); - local_irq_enable(); - preempt_enable(); + __spin_unlock_irq(lock); } EXPORT_SYMBOL(_spin_unlock_irq); void __lockfunc _spin_unlock_bh(spinlock_t *lock) { - spin_release(&lock->dep_map, 1, _RET_IP_); - _raw_spin_unlock(lock); - preempt_enable_no_resched(); - local_bh_enable_ip((unsigned long)__builtin_return_address(0)); + __spin_unlock_bh(lock); } EXPORT_SYMBOL(_spin_unlock_bh); void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_read_unlock(lock); - local_irq_restore(flags); - preempt_enable(); + __read_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_read_unlock_irqrestore); void __lockfunc _read_unlock_irq(rwlock_t *lock) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_read_unlock(lock); - local_irq_enable(); - preempt_enable(); + __read_unlock_irq(lock); } EXPORT_SYMBOL(_read_unlock_irq); void __lockfunc _read_unlock_bh(rwlock_t *lock) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_read_unlock(lock); - preempt_enable_no_resched(); - local_bh_enable_ip((unsigned long)__builtin_return_address(0)); + __read_unlock_bh(lock); } EXPORT_SYMBOL(_read_unlock_bh); void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_write_unlock(lock); - local_irq_restore(flags); - preempt_enable(); + __write_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_write_unlock_irqrestore); void __lockfunc _write_unlock_irq(rwlock_t *lock) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_write_unlock(lock); - local_irq_enable(); - preempt_enable(); + __write_unlock_irq(lock); } EXPORT_SYMBOL(_write_unlock_irq); void __lockfunc _write_unlock_bh(rwlock_t *lock) { - rwlock_release(&lock->dep_map, 1, _RET_IP_); - _raw_write_unlock(lock); - preempt_enable_no_resched(); - local_bh_enable_ip((unsigned long)__builtin_return_address(0)); + __write_unlock_bh(lock); } EXPORT_SYMBOL(_write_unlock_bh); int __lockfunc _spin_trylock_bh(spinlock_t *lock) { - local_bh_disable(); - preempt_disable(); - if (_raw_spin_trylock(lock)) { - spin_acquire(&lock->dep_map, 0, 1, _RET_IP_); - return 1; - } - - preempt_enable_no_resched(); - local_bh_enable_ip((unsigned long)__builtin_return_address(0)); - return 0; + return __spin_trylock_bh(lock); } EXPORT_SYMBOL(_spin_trylock_bh); -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 2/3] spinlock: allow inlined spinlocks 2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens 2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens @ 2009-08-12 18:39 ` Heiko Carstens 2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens 2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds 3 siblings, 0 replies; 34+ messages in thread From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt [-- Attachment #1: 02_spinlock_inline.diff --] [-- Type: text/plain, Size: 4678 bytes --] From: Heiko Carstens <heiko.carstens@de.ibm.com> Add new config option SPINLOCK_INLINE and some defines which depend on it in order to generate inlined spinlock code instead of out-of-line code. Avoiding function calls for spinlocks gives 1%-5% less cpu usage on network benchmarks on s390. Architectures must select HAVE_SPINLOCK_INLINE_SUPPORT to enable this config option. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> --- include/linux/spinlock_api_smp.h | 35 +++++++++++++++++++++++++++++++++++ kernel/spinlock.c | 4 ++++ lib/Kconfig.debug | 14 ++++++++++++++ 3 files changed, 53 insertions(+) Index: linux-2.6/include/linux/spinlock_api_smp.h =================================================================== --- linux-2.6.orig/include/linux/spinlock_api_smp.h +++ linux-2.6/include/linux/spinlock_api_smp.h @@ -19,6 +19,8 @@ int in_lock_functions(unsigned long addr #define assert_spin_locked(x) BUG_ON(!spin_is_locked(x)) +#ifndef CONFIG_SPINLOCK_INLINE + void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock); void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass) __acquires(lock); @@ -60,6 +62,39 @@ void __lockfunc _read_unlock_irqrestore( void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) __releases(lock); +#else /* CONFIG_HAVE_SPINLOCK_INLINE_SUPPORT */ + +#define _spin_trylock(lock) __spin_trylock(lock) +#define _read_trylock(lock) __read_trylock(lock) +#define _write_trylock(lock) __write_trylock(lock) +#define _read_lock(lock) __read_lock(lock) +#define _spin_lock_irqsave(lock) __spin_lock_irqsave(lock) +#define _spin_lock_irq(lock) __spin_lock_irq(lock) +#define _spin_lock_bh(lock) __spin_lock_bh(lock) +#define _read_lock_irqsave(lock) __read_lock_irqsave(lock) +#define _read_lock_irq(lock) __read_lock_irq(lock) +#define _read_lock_bh(lock) __read_lock_bh(lock) +#define _write_lock_irqsave(lock) __write_lock_irqsave(lock) +#define _write_lock_irq(lock) __write_lock_irq(lock) +#define _write_lock_bh(lock) __write_lock_bh(lock) +#define _spin_lock(lock) __spin_lock(lock) +#define _write_lock(lock) __write_lock(lock) +#define _spin_unlock(lock) __spin_unlock(lock) +#define _write_unlock(lock) __write_unlock(lock) +#define _read_unlock(lock) __read_unlock(lock) +#define _spin_unlock_irq(lock) __spin_unlock_irq(lock) +#define _spin_unlock_bh(lock) __spin_unlock_bh(lock) +#define _read_unlock_irq(lock) __read_unlock_irq(lock) +#define _read_unlock_bh(lock) __read_unlock_bh(lock) +#define _write_unlock_irq(lock) __write_unlock_irq(lock) +#define _write_unlock_bh(lock) __write_unlock_bh(lock) +#define _spin_trylock_bh(lock) __spin_trylock_bh(lock) +#define 
_spin_unlock_irqrestore(lock, flags) __spin_unlock_irqrestore(lock, flags) +#define _read_unlock_irqrestore(lock, flags) __read_unlock_irqrestore(lock, flags) +#define _write_unlock_irqrestore(lock, flags) __write_unlock_irqrestore(lock, flags) + +#endif /* CONFIG_HAVE_SPINLOCK_INLINE_SUPPORT */ + static inline int __spin_trylock(spinlock_t *lock) { preempt_disable(); Index: linux-2.6/lib/Kconfig.debug =================================================================== --- linux-2.6.orig/lib/Kconfig.debug +++ linux-2.6/lib/Kconfig.debug @@ -879,6 +879,20 @@ config SYSCTL_SYSCALL_CHECK to properly maintain and use. This enables checks that help you to keep things correct. +config HAVE_SPINLOCK_INLINE_SUPPORT + bool + +config SPINLOCK_INLINE + bool "Inline spinlock code" + depends on HAVE_SPINLOCK_INLINE_SUPPORT + depends on !DEBUG_SPINLOCK + depends on SMP && !PREEMPT + help + Select this option if you want to have inlined spinlock code instead + of an out of line implementation. + This will generate a larger kernel image. On some architectures this + increases performance. + source mm/Kconfig.debug source kernel/trace/Kconfig Index: linux-2.6/kernel/spinlock.c =================================================================== --- linux-2.6.orig/kernel/spinlock.c +++ linux-2.6/kernel/spinlock.c @@ -21,6 +21,8 @@ #include <linux/debug_locks.h> #include <linux/module.h> +#ifndef CONFIG_SPINLOCK_INLINE + int __lockfunc _spin_trylock(spinlock_t *lock) { return __spin_trylock(lock); @@ -320,6 +322,8 @@ int __lockfunc _spin_trylock_bh(spinlock } EXPORT_SYMBOL(_spin_trylock_bh); +#endif /* CONFIG_HAVE_SPINLOCK_INLINE_SUPPORT */ + notrace int in_lock_functions(unsigned long addr) { /* Linker adds these: start and end of __lockfunc functions */ -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 3/3] spinlock: allow inlined spinlocks on s390 2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens 2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens 2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens @ 2009-08-12 18:39 ` Heiko Carstens 2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds 3 siblings, 0 replies; 34+ messages in thread From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt [-- Attachment #1: 03_spinlock_s390.diff --] [-- Type: text/plain, Size: 680 bytes --] From: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> --- arch/s390/Kconfig | 1 + 1 file changed, 1 insertion(+) Index: linux-2.6/arch/s390/Kconfig =================================================================== --- linux-2.6.orig/arch/s390/Kconfig +++ linux-2.6/arch/s390/Kconfig @@ -93,6 +93,7 @@ config S390 select HAVE_KRETPROBES select HAVE_KVM if 64BIT select HAVE_ARCH_TRACEHOOK + select HAVE_SPINLOCK_INLINE_SUPPORT select INIT_ALL_POSSIBLE select HAVE_PERF_COUNTERS select GENERIC_ATOMIC64 if !64BIT -- ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens ` (2 preceding siblings ...) 2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens @ 2009-08-13 18:11 ` Linus Torvalds 2009-08-13 18:34 ` Ingo Molnar 2009-08-14 12:34 ` Heiko Carstens 3 siblings, 2 replies; 34+ messages in thread From: Linus Torvalds @ 2009-08-13 18:11 UTC (permalink / raw) To: Heiko Carstens Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt On Wed, 12 Aug 2009, Heiko Carstens wrote: > > This patch set allows to have inlined spinlocks again. I'm not convinced this is the right approach. The thing is, CONFIG_SPINLOCK_INLINE doesn't look like the right thing to do. On x86, for example, functions calls are usually quite cheap, and it may well make sense to keep 'spin_lock()' out-of-line, especially once you start doing things like playing with the softirq counts or interrupts. At the same time, the special case of spin *unlock* is generally just a single (small) instruction on x86, so that one almost certainly is worth inlining - at least as long as it's not one of the softirq ones. In other words, I think your hammer is too large. If you want to inline the lock functions, you should _not_ make this a single "everything or nothing" thing. > If one considers that server kernels are usually compiled with > !CONFIG_PREEMPT a simple spin_lock is just a compare and swap loop. > The extra overhead for a function call is significant. > With inlined spinlocks overall cpu usage gets reduced by 1%-5% on s390. > These numbers were taken with some network benchmarks. However I expect > any workload that calls frequently into the kernel and which grabs a few > locks to perform better. Have you tried to determine which kinds of spinlocks you care most about? With networking, I guess the softirq-safe ones (and perhaps the irq ones) are pretty common, but maybe even there it's really just a couple of cases. Now, there is one advantage to inlining that I really like - and that I personally have missed from the days when we used to always do it: profiling. Inlining the spinlocks tends to do _wonders_ for showing exactly which spinlock ends up being the contention cases for certain loads. So there are advantages outside of the "size vs speed" kind of things, but at the same time I do suspect that the disadvantages mean that we really should make this be something where an architecture can specify things at a finer granularity. Perhaps by using per-symbol defines (ie have the <asm/spinlock.h> file do things like #define __spin_lock_is_small #define __spin_unlock_is_small kind of things for the particular sequences that are worth inlining. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
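A hypothetical sketch of the per-symbol opt-in suggested above: the
__spin_lock_is_small/__spin_unlock_is_small names are taken from Linus'
mail, everything else is an assumption and not part of the posted patches.

/* <asm/spinlock.h> (hypothetical): the architecture opts in per primitive */
#define __spin_unlock_is_small
/* no __spin_lock_is_small here: spin_lock stays out of line on this arch */

/* include/linux/spinlock_api_smp.h (hypothetical) */
#ifdef __spin_unlock_is_small
#define _spin_unlock(lock)	__spin_unlock(lock)
#else
void __lockfunc _spin_unlock(spinlock_t *lock) __releases(lock);
#endif

#ifdef __spin_lock_is_small
#define _spin_lock(lock)	__spin_lock(lock)
#else
void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
#endif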
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds @ 2009-08-13 18:34 ` Ingo Molnar 2009-08-13 18:43 ` Ingo Molnar 2009-08-14 12:34 ` Heiko Carstens 1 sibling, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2009-08-13 18:34 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch, Martin Schwidefsky, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt * Linus Torvalds <torvalds@linux-foundation.org> wrote: > Now, there is one advantage to inlining that I really like - and > that I personally have missed from the days when we used to > always do it: profiling. Inlining the spinlocks tends to do > _wonders_ for showing exactly which spinlock ends up being the > contention cases for certain loads. [...] We've got a new tool to determine that kind of information though: perf + call-graph profiling. This is how it works and how it can be used. Say we want to profile 'bw_tcp' from lmbench. The traditional way is to record a flat profile: aldebaran:~/l> perf record -f ./bw_pipe Pipe bandwidth: 4607.33 MB/sec [ perf record: Captured and wrote 2.503 MB perf.data (~109368 samples) ] and then to view the flat profile: aldebaran:~/l> perf report # Samples: 109331 # # Overhead Command Shared Object Symbol # ........ ....... ........................ ...... # 39.61% bw_pipe [kernel] [k] copy_user_generic_string 19.32% bw_pipe ./bw_pipe [.] enough_duration 6.21% bw_pipe [kernel] [k] mutex_spin_on_owner 3.54% bw_pipe ./bw_pipe [.] two_op 3.54% bw_pipe ./bw_pipe [.] one_op 2.79% bw_pipe [kernel] [k] get_page_from_freelist 2.16% bw_pipe [kernel] [k] ia32_sysenter_target 1.91% bw_pipe [kernel] [k] pipe_read 1.83% bw_pipe [kernel] [k] pipe_write 1.63% bw_pipe [kernel] [k] free_hot_cold_page 1.21% bw_pipe [kernel] [k] put_page 1.19% bw_pipe [kernel] [k] __alloc_pages_nodemask 0.99% bw_pipe [kernel] [k] anon_pipe_buf_release 0.74% bw_pipe [kernel] [k] get_pageblock_flags_group Annoyingly enough, mutex_spin_on_owner() is in the profile but we have no idea what the parent function(s) are. (Plus there's no way we could reasonably inline mutex functions.) So we are stuck with guessing which of the many other functions above (or some other function which does not even get listed!) has that mutex spinning overhead. But if you do the recording via call-graph traces: aldebaran:~/l> perf record -g -f ./bw_pipe Pipe bandwidth: 4432.75 MB/sec [ perf record: Captured and wrote 7.550 MB perf.data (~329886 samples) ] (note the -g/--graph option) And then view the profile via perf report, we get: aldebaran:~/l> perf report -S _spin_lock # symbol: _spin_lock # Samples: 724 # # Overhead Command Shared Object # ........ ....... ........................ 
# 98.34% lat_tcp [kernel] | |--31.74%-- ipt_do_table | | | |--26.99%-- ipt_local_in_hook | | nf_iterate | | nf_hook_slow | | ip_local_deliver | | ip_rcv_finish | | ip_rcv | | netif_receive_skb | | process_backlog | | net_rx_action | | __do_softirq | | call_softirq | | do_softirq | | _local_bh_enable_ip | | local_bh_enable | | dev_queue_xmit | | ip_finish_output2 | | ip_finish_output | | ip_output | | ip_local_out | | ip_queue_xmit | | tcp_transmit_skb | | tcp_write_xmit | | __tcp_push_pending_frames | | tcp_push | | tcp_sendmsg | | __sock_sendmsg | | sock_aio_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | |--24.34%-- ipt_local_in_hook | | nf_iterate | | nf_hook_slow | | ip_local_deliver | | ip_rcv_finish | | ip_rcv | | netif_receive_skb | | process_backlog | | net_rx_action | | __do_softirq | | call_softirq | | do_softirq | | _local_bh_enable_ip | | local_bh_enable | | dev_queue_xmit | | ip_finish_output2 | | ip_finish_output | | ip_output | | ip_local_out | | ip_queue_xmit | | tcp_transmit_skb | | tcp_write_xmit | | __tcp_push_pending_frames | | tcp_push | | tcp_sendmsg | | __sock_sendmsg | | sock_aio_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | |--16.81%-- ipt_pre_routing_hook | | nf_iterate | | nf_hook_slow | | ip_rcv | | netif_receive_skb | | process_backlog | | net_rx_action | | __do_softirq | | call_softirq | | do_softirq | | _local_bh_enable_ip | | local_bh_enable | | dev_queue_xmit | | ip_finish_output2 | | ip_finish_output | | ip_output | | ip_local_out | | ip_queue_xmit | | tcp_transmit_skb | | tcp_write_xmit | | __tcp_push_pending_frames | | tcp_push | | tcp_sendmsg | | __sock_sendmsg | | sock_aio_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | |--11.95%-- ipt_post_routing_hook | | nf_iterate | | nf_hook_slow | | ip_output | | ip_local_out | | ip_queue_xmit | | tcp_transmit_skb | | tcp_write_xmit | | __tcp_push_pending_frames | | tcp_push | | tcp_sendmsg | | __sock_sendmsg | | sock_aio_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | |--10.18%-- ipt_local_out_hook | | nf_iterate | | nf_hook_slow | | __ip_local_out | | ip_local_out | | ip_queue_xmit | | tcp_transmit_skb | | tcp_write_xmit | | __tcp_push_pending_frames | | tcp_push | | tcp_sendmsg | | __sock_sendmsg | | sock_aio_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | --9.73%-- ipt_local_hook | nf_iterate | nf_hook_slow | __ip_local_out | ip_local_out | ip_queue_xmit | tcp_transmit_skb | tcp_write_xmit | __tcp_push_pending_frames | tcp_push | tcp_sendmsg | __sock_sendmsg | sock_aio_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f50430 | 0x5 | |--31.46%-- __mod_timer | mod_timer | sk_reset_timer | | | |--55.36%-- tcp_send_delayed_ack | | __tcp_ack_snd_check | | tcp_rcv_established | | tcp_v4_do_rcv | | tcp_prequeue_process | | tcp_recvmsg | | sock_common_recvmsg | | __sock_recvmsg | | sock_aio_read | | do_sync_read | | vfs_read | | sys_read | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | --44.64%-- tcp_v4_rcv | ip_local_deliver_finish | ip_local_deliver | ip_rcv_finish | ip_rcv | netif_receive_skb | process_backlog | net_rx_action | __do_softirq | call_softirq | do_softirq | _local_bh_enable_ip | local_bh_enable | dev_queue_xmit | ip_finish_output2 | ip_finish_output | ip_output | ip_local_out | ip_queue_xmit | 
tcp_transmit_skb | tcp_write_xmit | __tcp_push_pending_frames | tcp_push | tcp_sendmsg | __sock_sendmsg | sock_aio_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f50430 | 0x5 | |--21.21%-- task_rq_lock | try_to_wake_up | default_wake_function | | | |--99.34%-- autoremove_wake_function | | __wake_up_common | | __wake_up | | tcp_v4_rcv | | ip_local_deliver_finish | | ip_local_deliver | | ip_rcv_finish | | ip_rcv | | netif_receive_skb | | process_backlog | | net_rx_action | | __do_softirq | | call_softirq | | do_softirq | | _local_bh_enable_ip | | local_bh_enable | | dev_queue_xmit | | ip_finish_output2 | | ip_finish_output | | ip_output | | ip_local_out | | ip_queue_xmit | | tcp_transmit_skb | | tcp_write_xmit | | __tcp_push_pending_frames | | tcp_push | | tcp_sendmsg | | __sock_sendmsg | | sock_aio_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | --0.66%-- pollwake | __wake_up_common | __wake_up | perf_counter_wakeup | perf_pending_counter | perf_counter_do_pending | smp_perf_pending_interrupt | perf_pending_interrupt | 0xf7f50430 | 0x5 | |--5.06%-- tcp_v4_rcv | ip_local_deliver_finish | ip_local_deliver | ip_rcv_finish | ip_rcv | netif_receive_skb | process_backlog | net_rx_action | __do_softirq | call_softirq | do_softirq | _local_bh_enable_ip | local_bh_enable | dev_queue_xmit | ip_finish_output2 | ip_finish_output | ip_output | ip_local_out | ip_queue_xmit | tcp_transmit_skb | tcp_write_xmit | __tcp_push_pending_frames | tcp_push | tcp_sendmsg | __sock_sendmsg | sock_aio_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f50430 | 0x5 | |--4.07%-- __perf_counter_sched_out | perf_counter_task_sched_out | schedule | schedule_timeout | sk_wait_data | tcp_recvmsg | sock_common_recvmsg | __sock_recvmsg | sock_aio_read | do_sync_read | vfs_read | sys_read | sysenter_dispatch | 0xf7f50430 | 0x5 | |--1.40%-- ipt_local_hook | nf_iterate | nf_hook_slow | __ip_local_out | ip_local_out | ip_queue_xmit | tcp_transmit_skb | tcp_write_xmit | __tcp_push_pending_frames | tcp_push | tcp_sendmsg | __sock_sendmsg | sock_aio_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f50430 | 0x5 | |--1.26%-- mod_timer | sk_reset_timer | | | |--66.67%-- tcp_send_delayed_ack | | __tcp_ack_snd_check | | tcp_rcv_established | | tcp_v4_do_rcv | | tcp_prequeue_process | | tcp_recvmsg | | sock_common_recvmsg | | __sock_recvmsg | | sock_aio_read | | do_sync_read | | vfs_read | | sys_read | | sysenter_dispatch | | 0xf7f50430 | | 0x5 | | | --33.33%-- tcp_v4_rcv | ip_local_deliver_finish | ip_local_deliver | ip_rcv_finish | ip_rcv | netif_receive_skb | process_backlog | net_rx_action | __do_softirq | call_softirq | do_softirq | _local_bh_enable_ip | local_bh_enable | dev_queue_xmit | ip_finish_output2 | ip_finish_output | ip_output | ip_local_out | ip_queue_xmit | tcp_transmit_skb | tcp_write_xmit | __tcp_push_pending_frames | tcp_push | tcp_sendmsg | __sock_sendmsg | sock_aio_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f50430 | 0x5 | |--0.70%-- ipt_local_in_hook | nf_iterate | nf_hook_slow | ip_local_deliver | ip_rcv_finish | ip_rcv | netif_receive_skb | process_backlog | net_rx_action | __do_softirq | call_softirq | do_softirq | _local_bh_enable_ip | local_bh_enable | dev_queue_xmit | ip_finish_output2 | ip_finish_output | ip_output | ip_local_out | ip_queue_xmit | tcp_transmit_skb | tcp_write_xmit | __tcp_push_pending_frames | tcp_push | tcp_sendmsg | __sock_sendmsg | 
sock_aio_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f50430 | 0x5 | |--0.56%-- perf_counter_task_sched_out | schedule | schedule_timeout | sk_wait_data | tcp_recvmsg | sock_common_recvmsg | __sock_recvmsg | sock_aio_read | do_sync_read | vfs_read | sys_read | sysenter_dispatch | 0xf7f50430 | 0x5 --2.53%-- [...] 1.66% perf [kernel] | |--25.00%-- journal_stop | __ext3_journal_stop | ext3_writeback_write_end | generic_perform_write | generic_file_buffered_write | __generic_file_aio_write_nolock | generic_file_aio_write | ext3_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | __write_nocancel | |--16.67%-- __perf_counter_sched_out | perf_counter_task_sched_out | schedule | schedule_hrtimeout_range | poll_schedule_timeout | do_poll | do_sys_poll | sys_poll | system_call_fastpath | __GI___poll | |--8.33%-- dup_mmap | dup_mm | copy_process | do_fork | sys_clone | stub_clone | fork | __cmd_record | |--8.33%-- hrtimer_interrupt | smp_apic_timer_interrupt | apic_timer_interrupt | __write_nocancel | |--8.33%-- journal_start | ext3_journal_start_sb | ext3_write_begin | generic_perform_write | generic_file_buffered_write | __generic_file_aio_write_nolock | generic_file_aio_write | ext3_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | __write_nocancel | |--8.33%-- create_empty_buffers | __block_prepare_write | block_write_begin | ext3_write_begin | generic_perform_write | generic_file_buffered_write | __generic_file_aio_write_nolock | generic_file_aio_write | ext3_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | __write_nocancel | |--8.33%-- perf_counter_task_sched_out | schedule | schedule_hrtimeout_range | poll_schedule_timeout | do_poll | do_sys_poll | sys_poll | system_call_fastpath | __GI___poll | |--8.33%-- ext3_alloc_branch | ext3_get_blocks_handle | ext3_get_block | __block_prepare_write | block_write_begin | ext3_write_begin | generic_perform_write | generic_file_buffered_write | __generic_file_aio_write_nolock | generic_file_aio_write | ext3_file_write | do_sync_write | vfs_write | sys_write | system_call_fastpath | __write_nocancel | --8.33%-- start_this_handle journal_start ext3_journal_start_sb ext3_write_begin generic_perform_write generic_file_buffered_write __generic_file_aio_write_nolock generic_file_aio_write ext3_file_write do_sync_write vfs_write sys_write system_call_fastpath __write_nocancel the graph above (which is the sub-graph of the full graph, limited via the -S/--symbols option to all callchains that intersect with the _spin_lock symbol) details all the various code-paths that make use of _spin_lock(), and the weights (and relative weights) of those codepaths. Note that this works for just about any locking primitive, and note how it's recursive. If we inlined spin_lock we'd only know this __mod_timer overhead: |--31.46%-- __mod_timer but we wouldnt know the second level (which is interesting too): | |--55.36%-- tcp_send_delayed_ack | --44.64%-- tcp_v4_rcv All in one, i think we should detach the issue of instrumentation and profiling from the issue of whether to inline or not. We should inline where it makes things faster. Our profiling tools are able to deal with it, independently of whether we inline or not. Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-13 18:34 ` Ingo Molnar @ 2009-08-13 18:43 ` Ingo Molnar 0 siblings, 0 replies; 34+ messages in thread From: Ingo Molnar @ 2009-08-13 18:43 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch, Martin Schwidefsky, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt * Ingo Molnar <mingo@elte.hu> wrote: > > * Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > Now, there is one advantage to inlining that I really like - and > > that I personally have missed from the days when we used to > > always do it: profiling. Inlining the spinlocks tends to do > > _wonders_ for showing exactly which spinlock ends up being the > > contention cases for certain loads. [...] > > We've got a new tool to determine that kind of information though: > perf + call-graph profiling. > > This is how it works and how it can be used. Say we want to profile > 'bw_tcp' from lmbench. The traditional way is to record a flat > profile: > > aldebaran:~/l> perf record -f ./bw_pipe > Pipe bandwidth: 4607.33 MB/sec > [ perf record: Captured and wrote 2.503 MB perf.data (~109368 samples) ] > > and then to view the flat profile: > > aldebaran:~/l> perf report > # Samples: 109331 > # > # Overhead Command Shared Object Symbol > # ........ ....... ........................ ...... > # > 39.61% bw_pipe [kernel] [k] copy_user_generic_string > 19.32% bw_pipe ./bw_pipe [.] enough_duration > 6.21% bw_pipe [kernel] [k] mutex_spin_on_owner > 3.54% bw_pipe ./bw_pipe [.] two_op > 3.54% bw_pipe ./bw_pipe [.] one_op > 2.79% bw_pipe [kernel] [k] get_page_from_freelist > 2.16% bw_pipe [kernel] [k] ia32_sysenter_target > 1.91% bw_pipe [kernel] [k] pipe_read > 1.83% bw_pipe [kernel] [k] pipe_write > 1.63% bw_pipe [kernel] [k] free_hot_cold_page > 1.21% bw_pipe [kernel] [k] put_page > 1.19% bw_pipe [kernel] [k] __alloc_pages_nodemask > 0.99% bw_pipe [kernel] [k] anon_pipe_buf_release > 0.74% bw_pipe [kernel] [k] get_pageblock_flags_group > > Annoyingly enough, mutex_spin_on_owner() is in the profile but we > have no idea what the parent function(s) are. (Plus there's no way > we could reasonably inline mutex functions.) So we are stuck with > guessing which of the many other functions above (or some other > function which does not even get listed!) has that mutex spinning > overhead. > > But if you do the recording via call-graph traces: > > aldebaran:~/l> perf record -g -f ./bw_pipe > Pipe bandwidth: 4432.75 MB/sec > [ perf record: Captured and wrote 7.550 MB perf.data (~329886 samples) ] > > (note the -g/--graph option) And then view the profile via perf > report, we get: sorry - mail got trimmed due to a cut&paste mishap. Here's the missing bw_pipe profile: # Samples: 81007 # # Overhead Command Shared Object Symbol # ........ ....... ........................ ...... # 7.60% bw_pipe [kernel] [k] mutex_spin_on_owner | |--99.97%-- __mutex_lock_common | __mutex_lock_slowpath | mutex_lock | | | |--98.46%-- pipe_write | | do_sync_write | | vfs_write | | sys_write | | sysenter_dispatch | | 0xf7f4f430 | | 0x600000007 | | | --1.54%-- pipe_lock | pipe_wait | pipe_write | do_sync_write | vfs_write | sys_write | sysenter_dispatch | 0xf7f4f430 | 0x600000007 --0.03%-- [...] 
it shows that the mutex overhead is distributed between these two functions: | |--98.46%-- pipe_write | --1.54%-- pipe_lock I also wanted to show a more complex example, how spin-lock overhead gets distributed in the networking stack, in a "lat_tcp localhost" workload: > > aldebaran:~/l> perf report -S _spin_lock > > # symbol: _spin_lock > # Samples: 724 > # > # Overhead Command Shared Object > # ........ ....... ........................ > # > 98.34% lat_tcp [kernel] > | > |--31.74%-- ipt_do_table > | | > | |--26.99%-- ipt_local_in_hook > | | nf_iterate > | | nf_hook_slow > | | ip_local_deliver > | | ip_rcv_finish > | | ip_rcv > | | netif_receive_skb > | | process_backlog > | | net_rx_action > | | __do_softirq > | | call_softirq > | | do_softirq > | | _local_bh_enable_ip > | | local_bh_enable > | | dev_queue_xmit > | | ip_finish_output2 > | | ip_finish_output > | | ip_output > | | ip_local_out > | | ip_queue_xmit > | | tcp_transmit_skb > | | tcp_write_xmit > | | __tcp_push_pending_frames > | | tcp_push > | | tcp_sendmsg > | | __sock_sendmsg > | | sock_aio_write > | | do_sync_write > | | vfs_write > | | sys_write > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | |--24.34%-- ipt_local_in_hook > | | nf_iterate > | | nf_hook_slow > | | ip_local_deliver > | | ip_rcv_finish > | | ip_rcv > | | netif_receive_skb > | | process_backlog > | | net_rx_action > | | __do_softirq > | | call_softirq > | | do_softirq > | | _local_bh_enable_ip > | | local_bh_enable > | | dev_queue_xmit > | | ip_finish_output2 > | | ip_finish_output > | | ip_output > | | ip_local_out > | | ip_queue_xmit > | | tcp_transmit_skb > | | tcp_write_xmit > | | __tcp_push_pending_frames > | | tcp_push > | | tcp_sendmsg > | | __sock_sendmsg > | | sock_aio_write > | | do_sync_write > | | vfs_write > | | sys_write > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | |--16.81%-- ipt_pre_routing_hook > | | nf_iterate > | | nf_hook_slow > | | ip_rcv > | | netif_receive_skb > | | process_backlog > | | net_rx_action > | | __do_softirq > | | call_softirq > | | do_softirq > | | _local_bh_enable_ip > | | local_bh_enable > | | dev_queue_xmit > | | ip_finish_output2 > | | ip_finish_output > | | ip_output > | | ip_local_out > | | ip_queue_xmit > | | tcp_transmit_skb > | | tcp_write_xmit > | | __tcp_push_pending_frames > | | tcp_push > | | tcp_sendmsg > | | __sock_sendmsg > | | sock_aio_write > | | do_sync_write > | | vfs_write > | | sys_write > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | |--11.95%-- ipt_post_routing_hook > | | nf_iterate > | | nf_hook_slow > | | ip_output > | | ip_local_out > | | ip_queue_xmit > | | tcp_transmit_skb > | | tcp_write_xmit > | | __tcp_push_pending_frames > | | tcp_push > | | tcp_sendmsg > | | __sock_sendmsg > | | sock_aio_write > | | do_sync_write > | | vfs_write > | | sys_write > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | |--10.18%-- ipt_local_out_hook > | | nf_iterate > | | nf_hook_slow > | | __ip_local_out > | | ip_local_out > | | ip_queue_xmit > | | tcp_transmit_skb > | | tcp_write_xmit > | | __tcp_push_pending_frames > | | tcp_push > | | tcp_sendmsg > | | __sock_sendmsg > | | sock_aio_write > | | do_sync_write > | | vfs_write > | | sys_write > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | --9.73%-- ipt_local_hook > | nf_iterate > | nf_hook_slow > | __ip_local_out > | ip_local_out > | ip_queue_xmit > | tcp_transmit_skb > | tcp_write_xmit > | __tcp_push_pending_frames > | tcp_push > | tcp_sendmsg > | __sock_sendmsg > | sock_aio_write > | do_sync_write > 
| vfs_write > | sys_write > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--31.46%-- __mod_timer > | mod_timer > | sk_reset_timer > | | > | |--55.36%-- tcp_send_delayed_ack > | | __tcp_ack_snd_check > | | tcp_rcv_established > | | tcp_v4_do_rcv > | | tcp_prequeue_process > | | tcp_recvmsg > | | sock_common_recvmsg > | | __sock_recvmsg > | | sock_aio_read > | | do_sync_read > | | vfs_read > | | sys_read > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | --44.64%-- tcp_v4_rcv > | ip_local_deliver_finish > | ip_local_deliver > | ip_rcv_finish > | ip_rcv > | netif_receive_skb > | process_backlog > | net_rx_action > | __do_softirq > | call_softirq > | do_softirq > | _local_bh_enable_ip > | local_bh_enable > | dev_queue_xmit > | ip_finish_output2 > | ip_finish_output > | ip_output > | ip_local_out > | ip_queue_xmit > | tcp_transmit_skb > | tcp_write_xmit > | __tcp_push_pending_frames > | tcp_push > | tcp_sendmsg > | __sock_sendmsg > | sock_aio_write > | do_sync_write > | vfs_write > | sys_write > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--21.21%-- task_rq_lock > | try_to_wake_up > | default_wake_function > | | > | |--99.34%-- autoremove_wake_function > | | __wake_up_common > | | __wake_up > | | tcp_v4_rcv > | | ip_local_deliver_finish > | | ip_local_deliver > | | ip_rcv_finish > | | ip_rcv > | | netif_receive_skb > | | process_backlog > | | net_rx_action > | | __do_softirq > | | call_softirq > | | do_softirq > | | _local_bh_enable_ip > | | local_bh_enable > | | dev_queue_xmit > | | ip_finish_output2 > | | ip_finish_output > | | ip_output > | | ip_local_out > | | ip_queue_xmit > | | tcp_transmit_skb > | | tcp_write_xmit > | | __tcp_push_pending_frames > | | tcp_push > | | tcp_sendmsg > | | __sock_sendmsg > | | sock_aio_write > | | do_sync_write > | | vfs_write > | | sys_write > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | --0.66%-- pollwake > | __wake_up_common > | __wake_up > | perf_counter_wakeup > | perf_pending_counter > | perf_counter_do_pending > | smp_perf_pending_interrupt > | perf_pending_interrupt > | 0xf7f50430 > | 0x5 > | > |--5.06%-- tcp_v4_rcv > | ip_local_deliver_finish > | ip_local_deliver > | ip_rcv_finish > | ip_rcv > | netif_receive_skb > | process_backlog > | net_rx_action > | __do_softirq > | call_softirq > | do_softirq > | _local_bh_enable_ip > | local_bh_enable > | dev_queue_xmit > | ip_finish_output2 > | ip_finish_output > | ip_output > | ip_local_out > | ip_queue_xmit > | tcp_transmit_skb > | tcp_write_xmit > | __tcp_push_pending_frames > | tcp_push > | tcp_sendmsg > | __sock_sendmsg > | sock_aio_write > | do_sync_write > | vfs_write > | sys_write > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--4.07%-- __perf_counter_sched_out > | perf_counter_task_sched_out > | schedule > | schedule_timeout > | sk_wait_data > | tcp_recvmsg > | sock_common_recvmsg > | __sock_recvmsg > | sock_aio_read > | do_sync_read > | vfs_read > | sys_read > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--1.40%-- ipt_local_hook > | nf_iterate > | nf_hook_slow > | __ip_local_out > | ip_local_out > | ip_queue_xmit > | tcp_transmit_skb > | tcp_write_xmit > | __tcp_push_pending_frames > | tcp_push > | tcp_sendmsg > | __sock_sendmsg > | sock_aio_write > | do_sync_write > | vfs_write > | sys_write > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--1.26%-- mod_timer > | sk_reset_timer > | | > | |--66.67%-- tcp_send_delayed_ack > | | __tcp_ack_snd_check > | | tcp_rcv_established > | | tcp_v4_do_rcv > | | tcp_prequeue_process > | | tcp_recvmsg > | | 
sock_common_recvmsg > | | __sock_recvmsg > | | sock_aio_read > | | do_sync_read > | | vfs_read > | | sys_read > | | sysenter_dispatch > | | 0xf7f50430 > | | 0x5 > | | > | --33.33%-- tcp_v4_rcv > | ip_local_deliver_finish > | ip_local_deliver > | ip_rcv_finish > | ip_rcv > | netif_receive_skb > | process_backlog > | net_rx_action > | __do_softirq > | call_softirq > | do_softirq > | _local_bh_enable_ip > | local_bh_enable > | dev_queue_xmit > | ip_finish_output2 > | ip_finish_output > | ip_output > | ip_local_out > | ip_queue_xmit > | tcp_transmit_skb > | tcp_write_xmit > | __tcp_push_pending_frames > | tcp_push > | tcp_sendmsg > | __sock_sendmsg > | sock_aio_write > | do_sync_write > | vfs_write > | sys_write > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--0.70%-- ipt_local_in_hook > | nf_iterate > | nf_hook_slow > | ip_local_deliver > | ip_rcv_finish > | ip_rcv > | netif_receive_skb > | process_backlog > | net_rx_action > | __do_softirq > | call_softirq > | do_softirq > | _local_bh_enable_ip > | local_bh_enable > | dev_queue_xmit > | ip_finish_output2 > | ip_finish_output > | ip_output > | ip_local_out > | ip_queue_xmit > | tcp_transmit_skb > | tcp_write_xmit > | __tcp_push_pending_frames > | tcp_push > | tcp_sendmsg > | __sock_sendmsg > | sock_aio_write > | do_sync_write > | vfs_write > | sys_write > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > | > |--0.56%-- perf_counter_task_sched_out > | schedule > | schedule_timeout > | sk_wait_data > | tcp_recvmsg > | sock_common_recvmsg > | __sock_recvmsg > | sock_aio_read > | do_sync_read > | vfs_read > | sys_read > | sysenter_dispatch > | 0xf7f50430 > | 0x5 > --2.53%-- [...] > > 1.66% perf [kernel] > | > |--25.00%-- journal_stop > | __ext3_journal_stop > | ext3_writeback_write_end > | generic_perform_write > | generic_file_buffered_write > | __generic_file_aio_write_nolock > | generic_file_aio_write > | ext3_file_write > | do_sync_write > | vfs_write > | sys_write > | system_call_fastpath > | __write_nocancel > | > |--16.67%-- __perf_counter_sched_out > | perf_counter_task_sched_out > | schedule > | schedule_hrtimeout_range > | poll_schedule_timeout > | do_poll > | do_sys_poll > | sys_poll > | system_call_fastpath > | __GI___poll > | > |--8.33%-- dup_mmap > | dup_mm > | copy_process > | do_fork > | sys_clone > | stub_clone > | fork > | __cmd_record > | > |--8.33%-- hrtimer_interrupt > | smp_apic_timer_interrupt > | apic_timer_interrupt > | __write_nocancel > | > |--8.33%-- journal_start > | ext3_journal_start_sb > | ext3_write_begin > | generic_perform_write > | generic_file_buffered_write > | __generic_file_aio_write_nolock > | generic_file_aio_write > | ext3_file_write > | do_sync_write > | vfs_write > | sys_write > | system_call_fastpath > | __write_nocancel > | > |--8.33%-- create_empty_buffers > | __block_prepare_write > | block_write_begin > | ext3_write_begin > | generic_perform_write > | generic_file_buffered_write > | __generic_file_aio_write_nolock > | generic_file_aio_write > | ext3_file_write > | do_sync_write > | vfs_write > | sys_write > | system_call_fastpath > | __write_nocancel > | > |--8.33%-- perf_counter_task_sched_out > | schedule > | schedule_hrtimeout_range > | poll_schedule_timeout > | do_poll > | do_sys_poll > | sys_poll > | system_call_fastpath > | __GI___poll > | > |--8.33%-- ext3_alloc_branch > | ext3_get_blocks_handle > | ext3_get_block > | __block_prepare_write > | block_write_begin > | ext3_write_begin > | generic_perform_write > | generic_file_buffered_write > | __generic_file_aio_write_nolock > | 
generic_file_aio_write > | ext3_file_write > | do_sync_write > | vfs_write > | sys_write > | system_call_fastpath > | __write_nocancel > | > --8.33%-- start_this_handle > journal_start > ext3_journal_start_sb > ext3_write_begin > generic_perform_write > generic_file_buffered_write > __generic_file_aio_write_nolock > generic_file_aio_write > ext3_file_write > do_sync_write > vfs_write > sys_write > system_call_fastpath > __write_nocancel > > the graph above (which is the sub-graph of the full graph, limited > via the -S/--symbols option to all callchains that intersect with > the _spin_lock symbol) details all the various code-paths that make > use of _spin_lock(), and the weights (and relative weights) of > those codepaths. > > Note that this works for just about any locking primitive, and note > how it's recursive. If we inlined spin_lock we'd only know this > __mod_timer overhead: > > |--31.46%-- __mod_timer > > but we wouldnt know the second level (which is interesting too): > > | |--55.36%-- tcp_send_delayed_ack > | --44.64%-- tcp_v4_rcv > > All in one, i think we should detach the issue of instrumentation > and profiling from the issue of whether to inline or not. We should > inline where it makes things faster. Our profiling tools are able > to deal with it, independently of whether we inline or not. > > Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-13 18:11     ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
  2009-08-13 18:34       ` Ingo Molnar
@ 2009-08-14 12:34       ` Heiko Carstens
  2009-08-14 16:04         ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: Heiko Carstens @ 2009-08-14 12:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt

On Thu, Aug 13, 2009 at 11:11:22AM -0700, Linus Torvalds wrote:
> On Wed, 12 Aug 2009, Heiko Carstens wrote:
> > If one considers that server kernels are usually compiled with
> > !CONFIG_PREEMPT a simple spin_lock is just a compare and swap loop.
> > The extra overhead for a function call is significant.
> > With inlined spinlocks overall cpu usage gets reduced by 1%-5% on s390.
> > These numbers were taken with some network benchmarks. However I expect
> > any workload that calls frequently into the kernel and which grabs a few
> > locks to perform better.
>
> Have you tried to determine which kinds of spinlocks you care most about?
> With networking, I guess the softirq-safe ones (and perhaps the irq ones)
> are pretty common, but maybe even there it's really just a couple of
> cases.

It's all spinlock* variants that are frequently called (see also below).

> [...]
> So there are advantages outside of the "size vs speed" kind of things, but
> at the same time I do suspect that the disadvantages mean that we really
> should make this be something where an architecture can specify things at
> a finer granularity. Perhaps by using per-symbol defines (ie have the
> <asm/spinlock.h> file do things like
>
> 	#define __spin_lock_is_small
> 	#define __spin_unlock_is_small
>
> kind of things for the particular sequences that are worth inlining.

Ok, that makes sense. I will post updated patches.

FWIW: All of this started when we compared a 2.6.27 based distro kernel
to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
cycles than 2.6.16 does for the same workloads.

As an example: we took a simple one connection ping pong network load.
(client sends packet to server and waits until packet from server
returns and game starts again). It uses more cpu cycles on 2.6.27.

Using ftrace we figured out that on 2.6.27 it takes more than 500 function
calls on the client for a single roundtrip while on 2.6.16 it took only
~230 calls.

Changing s390 from its own NOHZ code to the generic version did cost quite
a few cycles; the idle entry and exit paths are much more expensive now.
That's why you've seen Martin posting NOHZ/timer related patches lately.

But also quite a few functions that were inlined in 2.6.16 are out-of-line
in 2.6.27. Reverting those changes did improve performance as well.

Since 60 of the ~500 function calls are spin_[un]lock* calls, it seemed
worthwhile to inline them, which also improved things again.
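To make the quoted "compare and swap loop" point concrete: with !CONFIG_PREEMPT
and no lock debugging, the uncontended acquire boils down to a single
compare-and-swap. The toy lock below is purely illustrative (plain C11, not
kernel code, all names invented); it only shows how small the operation is,
and therefore why wrapping it in an out-of-line function call is a noticeable
relative overhead.

	#include <stdatomic.h>

	struct toy_spinlock {
		atomic_int locked;	/* 0 = free, 1 = held */
	};

	static inline void toy_spin_lock(struct toy_spinlock *lp)
	{
		int expected = 0;

		/* uncontended case: this single compare-and-swap succeeds at once */
		while (!atomic_compare_exchange_weak(&lp->locked, &expected, 1))
			expected = 0;	/* a failed CAS overwrites 'expected' */
	}

	static inline void toy_spin_unlock(struct toy_spinlock *lp)
	{
		atomic_store(&lp->locked, 0);
	}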
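And a rough sketch of the per-symbol scheme being agreed on here, assuming it
sits on top of static inline lock bodies in the generic header; the *_is_small
names come from Linus' mail, the rest is hypothetical glue rather than the
actual follow-up patches. An architecture header would advertise which
operations are small, and the generic code would inline exactly those:

	/* <asm/spinlock.h> on an architecture with a tiny lock fast path */
	#define __spin_lock_is_small
	#define __spin_unlock_is_small

	/*
	 * Generic code (e.g. include/linux/spinlock_api_smp.h) maps those
	 * symbols to the static inline bodies and keeps the out-of-line
	 * declarations for everything else:
	 */
	#ifdef __spin_lock_is_small
	#define _spin_lock(lock)	__spin_lock(lock)
	#else
	void __lockfunc _spin_lock(spinlock_t *lock)	__acquires(lock);
	#endif

	#ifdef __spin_unlock_is_small
	#define _spin_unlock(lock)	__spin_unlock(lock)
	#else
	void __lockfunc _spin_unlock(spinlock_t *lock)	__releases(lock);
	#endif

Variants an architecture does not mark stay out of line, so each architecture
pays the text-size cost only for the sequences it asked to have inlined.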
Here are the function traces for 2.6.27 and 2.6.16: (the timestamp output has been modified) For 2.6.27: # tracer: ftrace # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | perl-2581 [00] 28991519113889978 [+ 125]: SyS_write <-sysc_noemu perl-2581 [00] 28991519113890103 [+ 156]: fget_light <-SyS_write perl-2581 [00] 28991519113890259 [+ 125]: vfs_write <-SyS_write perl-2581 [00] 28991519113890384 [+ 156]: rw_verify_area <-vfs_write perl-2581 [00] 28991519113890540 [+ 125]: security_file_permission <-rw_verify_area perl-2581 [00] 28991519113890665 [+ 93]: apparmor_file_permission <-security_file_permission perl-2581 [00] 28991519113890759 [+ 156]: aa_file_permission <-apparmor_file_permission perl-2581 [00] 28991519113890915 [+ 125]: do_sync_write <-vfs_write perl-2581 [00] 28991519113891040 [+ 125]: sock_aio_write <-do_sync_write perl-2581 [00] 28991519113891165 [+ 125]: security_socket_sendmsg <-sock_aio_write perl-2581 [00] 28991519113891290 [+ 125]: apparmor_socket_sendmsg <-security_socket_sendmsg perl-2581 [00] 28991519113891415 [+ 125]: aa_revalidate_sk <-apparmor_socket_sendmsg perl-2581 [00] 28991519113891540 [+ 125]: tcp_sendmsg <-sock_aio_write perl-2581 [00] 28991519113891665 [+ 125]: lock_sock_nested <-tcp_sendmsg perl-2581 [00] 28991519113891790 [+ 125]: _spin_lock_bh <-lock_sock_nested perl-2581 [00] 28991519113891915 [+ 125]: local_bh_disable <-_spin_lock_bh perl-2581 [00] 28991519113892040 [+ 187]: local_bh_enable <-lock_sock_nested perl-2581 [00] 28991519113892228 [+ 125]: tcp_current_mss <-tcp_sendmsg perl-2581 [00] 28991519113892353 [+ 156]: tcp_established_options <-tcp_current_mss perl-2581 [00] 28991519113892509 [+ 125]: sk_stream_alloc_skb <-tcp_sendmsg perl-2581 [00] 28991519113892634 [+ 125]: __alloc_skb <-sk_stream_alloc_skb perl-2581 [00] 28991519113892759 [+ 156]: kmem_cache_alloc <-__alloc_skb perl-2581 [00] 28991519113892915 [+ 281]: __kmalloc <-__alloc_skb perl-2581 [00] 28991519113893196 [+ 125]: skb_put <-tcp_sendmsg perl-2581 [00] 28991519113893321 [+ 125]: copy_from_user_mvcos_check <-tcp_sendmsg perl-2581 [00] 28991519113893446 [+ 156]: copy_from_user_std <-copy_from_user_mvcos_check perl-2581 [00] 28991519113893603 [+ 125]: __tcp_push_pending_frames <-tcp_sendmsg perl-2581 [00] 28991519113893728 [+ 125]: tcp_init_tso_segs <-__tcp_push_pending_frames perl-2581 [00] 28991519113893853 [+ 125]: tcp_set_skb_tso_segs <-tcp_init_tso_segs perl-2581 [00] 28991519113893978 [+ 156]: tcp_transmit_skb <-__tcp_push_pending_frames perl-2581 [00] 28991519113894134 [+ 93]: skb_clone <-tcp_transmit_skb perl-2581 [00] 28991519113894228 [+ 125]: __skb_clone <-skb_clone perl-2581 [00] 28991519113894353 [+ 156]: __copy_skb_header <-__skb_clone perl-2581 [00] 28991519113894509 [+ 125]: tcp_established_options <-tcp_transmit_skb perl-2581 [00] 28991519113894634 [+ 125]: skb_push <-tcp_transmit_skb perl-2581 [00] 28991519113894759 [+ 156]: __tcp_select_window <-tcp_transmit_skb perl-2581 [00] 28991519113894915 [+ 125]: tcp_options_write <-tcp_transmit_skb perl-2581 [00] 28991519113895040 [+ 156]: tcp_v4_send_check <-tcp_transmit_skb perl-2581 [00] 28991519113895196 [+ 125]: ip_queue_xmit <-tcp_transmit_skb perl-2581 [00] 28991519113895321 [+ 156]: __sk_dst_check <-ip_queue_xmit perl-2581 [00] 28991519113895478 [+ 125]: skb_push <-ip_queue_xmit perl-2581 [00] 28991519113895603 [+ 93]: ip_local_out <-ip_queue_xmit perl-2581 [00] 28991519113895696 [+ 156]: __ip_local_out <-ip_local_out perl-2581 [00] 28991519113895853 [+ 125]: ip_output <-ip_local_out perl-2581 [00] 
28991519113895978 [+ 156]: ip_finish_output <-ip_output perl-2581 [00] 28991519113896134 [+ 125]: skb_push <-ip_finish_output perl-2581 [00] 28991519113896259 [+ 125]: dev_queue_xmit <-ip_finish_output perl-2581 [00] 28991519113896384 [+ 125]: local_bh_disable <-dev_queue_xmit perl-2581 [00] 28991519113896509 [+ 187]: _spin_lock <-dev_queue_xmit perl-2581 [00] 28991519113896696 [+ 125]: pfifo_fast_enqueue <-dev_queue_xmit perl-2581 [00] 28991519113896821 [+ 125]: __qdisc_run <-dev_queue_xmit perl-2581 [00] 28991519113896946 [+ 156]: pfifo_fast_dequeue <-__qdisc_run perl-2581 [00] 28991519113897103 [+ 125]: _spin_lock <-__qdisc_run perl-2581 [00] 28991519113897228 [+ 187]: dev_hard_start_xmit <-__qdisc_run perl-2581 [00] 28991519113897415 [+ 156]: qeth_l3_hard_start_xmit <-dev_hard_start_xmit perl-2581 [00] 28991519113897571 [+ 125]: qeth_get_cast_type <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113897696 [+ 156]: qeth_get_priority_queue <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113897853 [+ 125]: skb_realloc_headroom <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113897978 [+ 125]: pskb_copy <-skb_realloc_headroom perl-2581 [00] 28991519113898103 [+ 93]: __alloc_skb <-pskb_copy perl-2581 [00] 28991519113898196 [+ 187]: kmem_cache_alloc <-__alloc_skb perl-2581 [00] 28991519113898384 [+ 218]: __kmalloc <-__alloc_skb perl-2581 [00] 28991519113898603 [+ 156]: skb_put <-pskb_copy perl-2581 [00] 28991519113898759 [+ 93]: copy_skb_header <-pskb_copy perl-2581 [00] 28991519113898853 [+ 218]: __copy_skb_header <-copy_skb_header perl-2581 [00] 28991519113899071 [+ 125]: skb_pull <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113899196 [+ 125]: skb_push <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113899321 [+ 156]: qeth_l3_fill_header <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113899478 [+ 125]: qeth_get_elements_no <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113899603 [+ 156]: qeth_do_send_packet <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113899759 [+ 125]: skb_queue_tail <-qeth_do_send_packet perl-2581 [00] 28991519113899884 [+ 125]: _spin_lock_irqsave <-skb_queue_tail perl-2581 [00] 28991519113900009 [+ 156]: _spin_unlock_irqrestore <-skb_queue_tail perl-2581 [00] 28991519113900165 [+ 187]: qeth_flush_buffers <-qeth_do_send_packet perl-2581 [00] 28991519113900353 [+ 156]: do_QDIO <-qeth_flush_buffers perl-2581 [00] 28991519113900509 [+ 156]: qdio_perf_stat_inc <-do_QDIO perl-2581 [00] 28991519113900665 [+ 125]: qdio_kick_outbound_q <-do_QDIO perl-2581 [00] 28991519113900790 [+ 812]: qdio_perf_stat_inc <-qdio_kick_outbound_q perl-2581 [00] 28991519113901603 [+ 218]: __tasklet_schedule <-do_QDIO perl-2581 [00] 28991519113901821 [+ 156]: dev_kfree_skb_any <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113901978 [+ 125]: kfree_skb <-dev_kfree_skb_any perl-2581 [00] 28991519113902103 [+ 93]: __kfree_skb <-kfree_skb perl-2581 [00] 28991519113902196 [+ 125]: skb_release_all <-__kfree_skb perl-2581 [00] 28991519113902321 [+ 156]: dst_release <-skb_release_all perl-2581 [00] 28991519113902478 [+ 93]: sock_wfree <-skb_release_all perl-2581 [00] 28991519113902571 [+ 187]: skb_release_data <-skb_release_all perl-2581 [00] 28991519113902759 [+ 125]: __netif_schedule <-qeth_l3_hard_start_xmit perl-2581 [00] 28991519113902884 [+ 156]: raise_softirq_irqoff <-__netif_schedule perl-2581 [00] 28991519113903040 [+ 125]: _spin_lock <-__qdisc_run perl-2581 [00] 28991519113903165 [+ 187]: local_bh_enable <-dev_queue_xmit perl-2581 [00] 28991519113903353 [+ 156]: do_softirq 
<-local_bh_enable perl-2581 [00] 28991519113903509 [+ 93]: __do_softirq <-do_softirq perl-2581 [00] 28991519113903603 [+ 93]: account_system_vtime <-__do_softirq perl-2581 [00] 28991519113903696 [+ 125]: account_system_time <-account_system_vtime perl-2581 [00] 28991519113903821 [+ 93]: acct_update_integrals <-account_system_time perl-2581 [00] 28991519113903915 [+ 187]: jiffies_to_timeval <-acct_update_integrals perl-2581 [00] 28991519113904103 [+ 156]: net_tx_action <-__do_softirq perl-2581 [00] 28991519113904259 [+ 125]: _spin_trylock <-net_tx_action perl-2581 [00] 28991519113904384 [+ 125]: __qdisc_run <-net_tx_action perl-2581 [00] 28991519113904509 [+ 156]: pfifo_fast_dequeue <-__qdisc_run perl-2581 [00] 28991519113904665 [+ 187]: tasklet_action <-__do_softirq perl-2581 [00] 28991519113904853 [+ 93]: qdio_outbound_processing <-tasklet_action perl-2581 [00] 28991519113904946 [+ 187]: qdio_perf_stat_inc <-qdio_outbound_processing perl-2581 [00] 28991519113905134 [+ 156]: mod_timer <-qdio_outbound_processing perl-2581 [00] 28991519113905290 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer perl-2581 [00] 28991519113905415 [+ 125]: __mod_timer <-mod_timer perl-2581 [00] 28991519113905540 [+ 125]: __timer_stats_timer_set_start_info <-__mod_timer perl-2581 [00] 28991519113905665 [+ 93]: lock_timer_base <-__mod_timer perl-2581 [00] 28991519113905759 [+ 156]: _spin_lock_irqsave <-lock_timer_base perl-2581 [00] 28991519113905915 [+ 125]: internal_add_timer <-__mod_timer perl-2581 [00] 28991519113906040 [+ 125]: _spin_unlock_irqrestore <-__mod_timer perl-2581 [00] 28991519113906165 [+ 156]: qdio_perf_stat_inc <-qdio_outbound_processing perl-2581 [00] 28991519113906321 [+ 125]: account_system_vtime <-__do_softirq perl-2581 [00] 28991519113906446 [+ 62]: account_system_time <-account_system_vtime perl-2581 [00] 28991519113906509 [+ 125]: acct_update_integrals <-account_system_time perl-2581 [00] 28991519113906634 [+ 125]: jiffies_to_timeval <-acct_update_integrals perl-2581 [00] 28991519113906759 [+ 156]: _local_bh_enable <-__do_softirq perl-2581 [00] 28991519113906915 [+ 125]: tcp_event_new_data_sent <-__tcp_push_pending_frames perl-2581 [00] 28991519113907040 [+ 125]: sk_reset_timer <-tcp_event_new_data_sent perl-2581 [00] 28991519113907165 [+ 93]: mod_timer <-sk_reset_timer perl-2581 [00] 28991519113907259 [+ 187]: __timer_stats_timer_set_start_info <-mod_timer perl-2581 [00] 28991519113907446 [+ 156]: release_sock <-tcp_sendmsg perl-2581 [00] 28991519113907603 [+ 125]: _spin_lock_bh <-release_sock perl-2581 [00] 28991519113907728 [+ 125]: local_bh_disable <-_spin_lock_bh perl-2581 [00] 28991519113907853 [+ 125]: _spin_unlock_bh <-release_sock perl-2581 [00] 28991519113907978 [+ 187]: local_bh_enable_ip <-_spin_unlock_bh perl-2581 [00] 28991519113908165 [+ 93]: dnotify_parent <-vfs_write perl-2581 [00] 28991519113908259 [+ 156]: _spin_lock <-dnotify_parent perl-2581 [00] 28991519113908415 [+ 156]: inotify_dentry_parent_queue_event <-vfs_write perl-2581 [00] 28991519113908571 [+ 875]: inotify_inode_queue_event <-vfs_write perl-2581 [00] 28991519113909446 [+ 93]: SyS_read <-sysc_noemu perl-2581 [00] 28991519113909540 [+ 156]: fget_light <-SyS_read perl-2581 [00] 28991519113909696 [+ 125]: vfs_read <-SyS_read perl-2581 [00] 28991519113909821 [+ 125]: rw_verify_area <-vfs_read perl-2581 [00] 28991519113909946 [+ 125]: security_file_permission <-rw_verify_area perl-2581 [00] 28991519113910071 [+ 125]: apparmor_file_permission <-security_file_permission perl-2581 [00] 28991519113910196 [+ 
125]: aa_file_permission <-apparmor_file_permission perl-2581 [00] 28991519113910321 [+ 156]: do_sync_read <-vfs_read perl-2581 [00] 28991519113910478 [+ 125]: sock_aio_read <-do_sync_read perl-2581 [00] 28991519113910603 [+ 125]: security_socket_recvmsg <-sock_aio_read perl-2581 [00] 28991519113910728 [+ 93]: apparmor_socket_recvmsg <-security_socket_recvmsg perl-2581 [00] 28991519113910821 [+ 156]: aa_revalidate_sk <-apparmor_socket_recvmsg perl-2581 [00] 28991519113910978 [+ 125]: sock_common_recvmsg <-sock_aio_read perl-2581 [00] 28991519113911103 [+ 125]: tcp_recvmsg <-sock_common_recvmsg perl-2581 [00] 28991519113911228 [+ 125]: lock_sock_nested <-tcp_recvmsg perl-2581 [00] 28991519113911353 [+ 125]: _spin_lock_bh <-lock_sock_nested perl-2581 [00] 28991519113911478 [+ 125]: local_bh_disable <-_spin_lock_bh perl-2581 [00] 28991519113911603 [+ 250]: local_bh_enable <-lock_sock_nested perl-2581 [00] 28991519113911853 [+ 125]: tcp_cleanup_rbuf <-tcp_recvmsg perl-2581 [00] 28991519113911978 [+ 125]: sk_wait_data <-tcp_recvmsg perl-2581 [00] 28991519113912103 [+ 125]: prepare_to_wait <-sk_wait_data perl-2581 [00] 28991519113912228 [+ 125]: _spin_lock_irqsave <-prepare_to_wait perl-2581 [00] 28991519113912353 [+ 125]: _spin_unlock_irqrestore <-prepare_to_wait perl-2581 [00] 28991519113912478 [+ 125]: release_sock <-sk_wait_data perl-2581 [00] 28991519113912603 [+ 125]: _spin_lock_bh <-release_sock perl-2581 [00] 28991519113912728 [+ 125]: local_bh_disable <-_spin_lock_bh perl-2581 [00] 28991519113912853 [+ 125]: _spin_unlock_bh <-release_sock perl-2581 [00] 28991519113912978 [+ 187]: local_bh_enable_ip <-_spin_unlock_bh perl-2581 [00] 28991519113913165 [+ 125]: schedule_timeout <-sk_wait_data perl-2581 [00] 28991519113913290 [+ 125]: schedule <-schedule_timeout perl-2581 [00] 28991519113913415 [+ 125]: _spin_lock <-schedule perl-2581 [00] 28991519113913540 [+ 93]: sched_clock_cpu <-schedule perl-2581 [00] 28991519113913634 [+ 125]: sched_clock <-sched_clock_cpu perl-2581 [00] 28991519113913759 [+ 93]: deactivate_task <-schedule perl-2581 [00] 28991519113913853 [+ 125]: dequeue_task <-deactivate_task perl-2581 [00] 28991519113913978 [+ 93]: dequeue_task_fair <-dequeue_task perl-2581 [00] 28991519113914071 [+ 93]: dequeue_entity <-dequeue_task_fair perl-2581 [00] 28991519113914165 [+ 156]: update_curr <-dequeue_entity perl-2581 [00] 28991519113914321 [+ 93]: hrtick_start_fair <-dequeue_task_fair perl-2581 [00] 28991519113914415 [+ 125]: put_prev_task_fair <-schedule perl-2581 [00] 28991519113914540 [+ 125]: pick_next_task_fair <-schedule perl-2581 [00] 28991519113914665 [+ 93]: pick_next_task_rt <-schedule perl-2581 [00] 28991519113914759 [+ 93]: pick_next_task_fair <-schedule perl-2581 [00] 28991519113914853 [+ 125]: pick_next_task_idle <-schedule perl-2581 [00] 28991519113914978 [+ 218]: _spin_trylock <-tracing_record_cmdline <idle>-0 [00] 28991519113915196 [+ 125]: finish_task_switch <-schedule <idle>-0 [00] 28991519113915321 [+ 62]: account_vtime <-finish_task_switch <idle>-0 [00] 28991519113915384 [+ 93]: account_user_time <-account_vtime <idle>-0 [00] 28991519113915478 [+ 93]: acct_update_integrals <-account_user_time <idle>-0 [00] 28991519113915571 [+ 125]: jiffies_to_timeval <-acct_update_integrals <idle>-0 [00] 28991519113915696 [+ 93]: account_system_time <-account_vtime <idle>-0 [00] 28991519113915790 [+ 93]: acct_update_integrals <-account_system_time <idle>-0 [00] 28991519113915884 [+ 187]: jiffies_to_timeval <-acct_update_integrals <idle>-0 [00] 28991519113916071 [+ 125]: 
tick_nohz_stop_sched_tick <-cpu_idle <idle>-0 [00] 28991519113916196 [+ 93]: ktime_get <-tick_nohz_stop_sched_tick <idle>-0 [00] 28991519113916290 [+ 93]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519113916384 [+ 93]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519113916478 [+ 125]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519113916603 [+ 156]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519113916759 [+ 93]: get_next_timer_interrupt <-tick_nohz_stop_sched_tick <idle>-0 [00] 28991519113916853 [+ 125]: _spin_lock <-get_next_timer_interrupt <idle>-0 [00] 28991519113916978 [+ 93]: hrtimer_get_next_event <-get_next_timer_interrupt <idle>-0 [00] 28991519113917071 [+ 125]: _spin_lock_irqsave <-hrtimer_get_next_event <idle>-0 [00] 28991519113917196 [+ 93]: _spin_unlock_irqrestore <-hrtimer_get_next_event <idle>-0 [00] 28991519113917290 [+ 93]: rcu_needs_cpu <-tick_nohz_stop_sched_tick <idle>-0 [00] 28991519113917384 [+ 156]: rcu_pending <-rcu_needs_cpu <idle>-0 [00] 28991519113917540 [+ 93]: select_nohz_load_balancer <-tick_nohz_stop_sched_tick <idle>-0 [00] 28991519113917634 [+ 93]: hweight64 <-select_nohz_load_balancer <idle>-0 [00] 28991519113917728 [+ 156]: hweight64 <-select_nohz_load_balancer <idle>-0 [00] 28991519113917884 [+ 62]: hrtimer_start <-tick_nohz_stop_sched_tick <idle>-0 [00] 28991519113917946 [+ 93]: lock_hrtimer_base <-hrtimer_start <idle>-0 [00] 28991519113918040 [+ 125]: _spin_lock_irqsave <-lock_hrtimer_base <idle>-0 [00] 28991519113918165 [+ 93]: __remove_hrtimer <-hrtimer_start <idle>-0 [00] 28991519113918259 [+ 125]: rb_next <-__remove_hrtimer <idle>-0 [00] 28991519113918384 [+ 125]: hrtimer_force_reprogram <-__remove_hrtimer <idle>-0 [00] 28991519113918509 [+ 93]: tick_program_event <-hrtimer_force_reprogram <idle>-0 [00] 28991519113918603 [+ 93]: tick_dev_program_event <-tick_program_event <idle>-0 [00] 28991519113918696 [+ 93]: ktime_get <-tick_dev_program_event <idle>-0 [00] 28991519113918790 [+ 93]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519113918884 [+ 62]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519113918946 [+ 125]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519113919071 [+ 125]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519113919196 [+ 93]: clockevents_program_event <-tick_dev_program_event <idle>-0 [00] 28991519113919290 [+ 93]: s390_next_event <-clockevents_program_event <idle>-0 [00] 28991519113919384 [+ 156]: rb_erase <-__remove_hrtimer <idle>-0 [00] 28991519113919540 [+ 93]: __timer_stats_hrtimer_set_start_info <-hrtimer_start <idle>-0 [00] 28991519113919634 [+ 93]: enqueue_hrtimer <-hrtimer_start <idle>-0 [00] 28991519113919728 [+ 125]: hrtimer_reprogram <-enqueue_hrtimer <idle>-0 [00] 28991519113919853 [+ 62]: tick_program_event <-hrtimer_reprogram <idle>-0 [00] 28991519113919915 [+ 93]: tick_dev_program_event <-tick_program_event <idle>-0 [00] 28991519113920009 [+ 93]: ktime_get <-tick_dev_program_event <idle>-0 [00] 28991519113920103 [+ 93]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519113920196 [+ 62]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519113920259 [+ 125]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519113920384 [+ 125]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519113920509 [+ 62]: clockevents_program_event <-tick_dev_program_event <idle>-0 [00] 28991519113920571 [+ 125]: s390_next_event <-clockevents_program_event <idle>-0 [00] 28991519113920696 [+ 125]: rb_insert_color <-enqueue_hrtimer <idle>-0 [00] 28991519113920821 [+ 156]: 
_spin_unlock_irqrestore <-hrtimer_start <idle>-0 [00] 28991519113920978 [+ 125]: _spin_lock <-cpu_idle <idle>-0 [00] 28991519113921103 [+229750]: vtime_stop_cpu_timer <-cpu_idle <idle>-0 [00] 28991519114150853 [+ 93]: do_IRQ <-io_return <idle>-0 [00] 28991519114150946 [+ 62]: irq_enter <-do_IRQ <idle>-0 [00] 28991519114151009 [+ 125]: idle_cpu <-irq_enter <idle>-0 [00] 28991519114151134 [+ 93]: tick_nohz_stop_idle <-irq_enter <idle>-0 [00] 28991519114151228 [+ 62]: ktime_get <-tick_nohz_stop_idle <idle>-0 [00] 28991519114151290 [+ 125]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519114151415 [+ 62]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519114151478 [+ 93]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519114151571 [+ 125]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519114151696 [+ 93]: account_system_vtime <-irq_enter <idle>-0 [00] 28991519114151790 [+ 93]: account_system_time <-account_system_vtime <idle>-0 [00] 28991519114151884 [+ 125]: acct_update_integrals <-account_system_time <idle>-0 [00] 28991519114152009 [+ 93]: idle_cpu <-irq_enter <idle>-0 [00] 28991519114152103 [+ 62]: tick_nohz_update_jiffies <-irq_enter <idle>-0 [00] 28991519114152165 [+ 93]: ktime_get <-tick_nohz_update_jiffies <idle>-0 [00] 28991519114152259 [+ 93]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519114152353 [+ 62]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519114152415 [+ 93]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519114152509 [+ 125]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519114152634 [+ 125]: tick_do_update_jiffies64 <-tick_nohz_update_jiffies <idle>-0 [00] 28991519114152759 [+ 62]: s390_idle_leave <-do_IRQ <idle>-0 [00] 28991519114152821 [+ 125]: vtime_start_cpu_timer <-s390_idle_leave <idle>-0 [00] 28991519114152946 [+ 93]: _spin_lock <-s390_idle_leave <idle>-0 [00] 28991519114153040 [+ 250]: do_adapter_IO <-do_IRQ <idle>-0 [00] 28991519114153290 [+ 93]: tiqdio_thinint_handler <-do_adapter_IO <idle>-0 [00] 28991519114153384 [+ 156]: qdio_perf_stat_inc <-tiqdio_thinint_handler <idle>-0 [00] 28991519114153540 [+ 250]: __tasklet_hi_schedule <-tiqdio_thinint_handler <idle>-0 [00] 28991519114153790 [+ 93]: irq_exit <-do_IRQ <idle>-0 [00] 28991519114153884 [+ 93]: account_system_vtime <-irq_exit <idle>-0 [00] 28991519114153978 [+ 93]: account_system_time <-account_system_vtime <idle>-0 [00] 28991519114154071 [+ 93]: acct_update_integrals <-account_system_time <idle>-0 [00] 28991519114154165 [+ 93]: do_softirq <-irq_exit <idle>-0 [00] 28991519114154259 [+ 93]: __do_softirq <-do_softirq <idle>-0 [00] 28991519114154353 [+ 62]: account_system_vtime <-__do_softirq <idle>-0 [00] 28991519114154415 [+ 125]: account_system_time <-account_system_vtime <idle>-0 [00] 28991519114154540 [+ 93]: acct_update_integrals <-account_system_time <idle>-0 [00] 28991519114154634 [+ 93]: tasklet_hi_action <-__do_softirq <idle>-0 [00] 28991519114154728 [+ 93]: tiqdio_tasklet_fn <-tasklet_hi_action <idle>-0 [00] 28991519114154821 [+ 218]: qdio_perf_stat_inc <-tiqdio_tasklet_fn <idle>-0 [00] 28991519114155040 [+ 218]: __tasklet_schedule <-tiqdio_tasklet_fn <idle>-0 [00] 28991519114155259 [+ 125]: tasklet_action <-__do_softirq <idle>-0 [00] 28991519114155384 [+ 93]: tiqdio_inbound_processing <-tasklet_action <idle>-0 [00] 28991519114155478 [+ 93]: qdio_perf_stat_inc <-tiqdio_inbound_processing <idle>-0 [00] 28991519114155571 [+ 125]: qdio_sync_after_thinint <-tiqdio_inbound_processing <idle>-0 [00] 28991519114155696 [+ 93]: qdio_siga_sync <-qdio_sync_after_thinint 
<idle>-0 [00] 28991519114155790 [+ 125]: qdio_check_outbound_after_thinint <-tiqdio_inbound_processing <idle>-0 [00] 28991519114155915 [+ 125]: __tasklet_schedule <-qdio_check_outbound_after_thinint <idle>-0 [00] 28991519114156040 [+ 218]: qdio_inbound_q_moved <-tiqdio_inbound_processing <idle>-0 [00] 28991519114156259 [+ 93]: qdio_kick_inbound_handler <-tiqdio_inbound_processing <idle>-0 [00] 28991519114156353 [+ 93]: qdio_perf_stat_inc <-qdio_kick_inbound_handler <idle>-0 [00] 28991519114156446 [+ 156]: qeth_l3_qdio_input_handler <-qdio_kick_inbound_handler <idle>-0 [00] 28991519114156603 [+ 218]: qeth_core_get_next_skb <-qeth_l3_qdio_input_handler <idle>-0 [00] 28991519114156821 [+ 62]: dev_alloc_skb <-qeth_core_get_next_skb <idle>-0 [00] 28991519114156884 [+ 125]: __alloc_skb <-dev_alloc_skb <idle>-0 [00] 28991519114157009 [+ 93]: kmem_cache_alloc <-__alloc_skb <idle>-0 [00] 28991519114157103 [+ 187]: __kmalloc <-__alloc_skb <idle>-0 [00] 28991519114157290 [+ 187]: skb_put <-qeth_core_get_next_skb <idle>-0 [00] 28991519114157478 [+ 93]: eth_header <-qeth_l3_qdio_input_handler <idle>-0 [00] 28991519114157571 [+ 93]: skb_push <-eth_header <idle>-0 [00] 28991519114157665 [+ 93]: eth_type_trans <-qeth_l3_qdio_input_handler <idle>-0 [00] 28991519114157759 [+ 125]: skb_pull <-eth_type_trans <idle>-0 [00] 28991519114157884 [+ 93]: netif_rx <-qeth_l3_qdio_input_handler <idle>-0 [00] 28991519114157978 [+ 125]: __napi_schedule <-netif_rx <idle>-0 [00] 28991519114158103 [+ 125]: qeth_core_get_next_skb <-qeth_l3_qdio_input_handler <idle>-0 [00] 28991519114158228 [+ 125]: qeth_queue_input_buffer <-qeth_l3_qdio_input_handler <idle>-0 [00] 28991519114158353 [+ 62]: qdio_siga_sync_q <-tiqdio_inbound_processing <idle>-0 [00] 28991519114158415 [+ 125]: qdio_siga_sync <-qdio_siga_sync_q <idle>-0 [00] 28991519114158540 [+ 93]: get_buf_state <-tiqdio_inbound_processing <idle>-0 [00] 28991519114158634 [+ 93]: qdio_stop_polling <-tiqdio_inbound_processing <idle>-0 [00] 28991519114158728 [+ 125]: qdio_perf_stat_inc <-qdio_stop_polling <idle>-0 [00] 28991519114158853 [+ 93]: qdio_siga_sync_q <-tiqdio_inbound_processing <idle>-0 [00] 28991519114158946 [+ 93]: qdio_siga_sync <-qdio_siga_sync_q <idle>-0 [00] 28991519114159040 [+ 156]: get_buf_state <-tiqdio_inbound_processing <idle>-0 [00] 28991519114159196 [+ 93]: net_rx_action <-__do_softirq <idle>-0 [00] 28991519114159290 [+ 93]: process_backlog <-net_rx_action <idle>-0 [00] 28991519114159384 [+ 156]: netif_receive_skb <-process_backlog <idle>-0 [00] 28991519114159540 [+ 125]: ip_rcv <-netif_receive_skb <idle>-0 [00] 28991519114159665 [+ 125]: ip_rcv_finish <-ip_rcv <idle>-0 [00] 28991519114159790 [+ 156]: ip_route_input <-ip_rcv_finish <idle>-0 [00] 28991519114159946 [+ 62]: ip_local_deliver <-ip_rcv_finish <idle>-0 [00] 28991519114160009 [+ 93]: ip_local_deliver_finish <-ip_local_deliver <idle>-0 [00] 28991519114160103 [+ 125]: raw_local_deliver <-ip_local_deliver_finish <idle>-0 [00] 28991519114160228 [+ 93]: tcp_v4_rcv <-ip_local_deliver_finish <idle>-0 [00] 28991519114160321 [+ 125]: __inet_lookup_established <-tcp_v4_rcv <idle>-0 [00] 28991519114160446 [+ 125]: _read_lock <-__inet_lookup_established <idle>-0 [00] 28991519114160571 [+ 93]: sk_filter <-tcp_v4_rcv <idle>-0 [00] 28991519114160665 [+ 93]: security_sock_rcv_skb <-sk_filter <idle>-0 [00] 28991519114160759 [+ 62]: cap_socket_sock_rcv_skb <-security_sock_rcv_skb <idle>-0 [00] 28991519114160821 [+ 125]: local_bh_disable <-sk_filter <idle>-0 [00] 28991519114160946 [+ 93]: local_bh_enable 
<-sk_filter <idle>-0 [00] 28991519114161040 [+ 93]: _spin_lock <-tcp_v4_rcv <idle>-0 [00] 28991519114161134 [+ 93]: __wake_up <-tcp_v4_rcv <idle>-0 [00] 28991519114161228 [+ 93]: _spin_lock_irqsave <-__wake_up <idle>-0 [00] 28991519114161321 [+ 93]: __wake_up_common <-__wake_up <idle>-0 [00] 28991519114161415 [+ 93]: autoremove_wake_function <-__wake_up_common <idle>-0 [00] 28991519114161509 [+ 62]: default_wake_function <-autoremove_wake_function <idle>-0 [00] 28991519114161571 [+ 125]: try_to_wake_up <-default_wake_function <idle>-0 [00] 28991519114161696 [+ 62]: task_rq_lock <-try_to_wake_up <idle>-0 [00] 28991519114161759 [+ 125]: _spin_lock <-task_rq_lock <idle>-0 [00] 28991519114161884 [+ 125]: select_task_rq_fair <-try_to_wake_up <idle>-0 [00] 28991519114162009 [+ 93]: sched_clock_cpu <-try_to_wake_up <idle>-0 [00] 28991519114162103 [+ 93]: sched_clock <-sched_clock_cpu <idle>-0 [00] 28991519114162196 [+ 93]: activate_task <-try_to_wake_up <idle>-0 [00] 28991519114162290 [+ 93]: enqueue_task <-activate_task <idle>-0 [00] 28991519114162384 [+ 93]: enqueue_task_fair <-enqueue_task <idle>-0 [00] 28991519114162478 [+ 93]: enqueue_entity <-enqueue_task_fair <idle>-0 [00] 28991519114162571 [+ 93]: update_curr <-enqueue_entity <idle>-0 [00] 28991519114162665 [+ 93]: place_entity <-enqueue_entity <idle>-0 [00] 28991519114162759 [+ 93]: account_scheduler_latency <-enqueue_entity <idle>-0 [00] 28991519114162853 [+ 93]: __enqueue_entity <-enqueue_entity <idle>-0 [00] 28991519114162946 [+ 93]: rb_insert_color <-__enqueue_entity <idle>-0 [00] 28991519114163040 [+ 125]: hrtick_start_fair <-enqueue_task_fair <idle>-0 [00] 28991519114163165 [+ 93]: check_preempt_curr_idle <-try_to_wake_up <idle>-0 [00] 28991519114163259 [+ 62]: resched_task <-check_preempt_curr_idle <idle>-0 [00] 28991519114163321 [+ 125]: _spin_unlock_irqrestore <-try_to_wake_up <idle>-0 [00] 28991519114163446 [+ 93]: _spin_unlock_irqrestore <-__wake_up <idle>-0 [00] 28991519114163540 [+ 93]: sk_reset_timer <-tcp_v4_rcv <idle>-0 [00] 28991519114163634 [+ 125]: mod_timer <-sk_reset_timer <idle>-0 [00] 28991519114163759 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer <idle>-0 [00] 28991519114163884 [+ 125]: __mod_timer <-mod_timer <idle>-0 [00] 28991519114164009 [+ 125]: __timer_stats_timer_set_start_info <-__mod_timer <idle>-0 [00] 28991519114164134 [+ 125]: lock_timer_base <-__mod_timer <idle>-0 [00] 28991519114164259 [+ 125]: _spin_lock_irqsave <-lock_timer_base <idle>-0 [00] 28991519114164384 [+ 125]: internal_add_timer <-__mod_timer <idle>-0 [00] 28991519114164509 [+ 281]: _spin_unlock_irqrestore <-__mod_timer <idle>-0 [00] 28991519114164790 [+ 156]: tasklet_action <-__do_softirq <idle>-0 [00] 28991519114164946 [+ 125]: qdio_outbound_processing <-tasklet_action <idle>-0 [00] 28991519114165071 [+ 250]: qdio_perf_stat_inc <-qdio_outbound_processing <idle>-0 [00] 28991519114165321 [+ 187]: qeth_qdio_output_handler <-qdio_outbound_processing <idle>-0 [00] 28991519114165509 [+ 125]: qeth_check_qdio_errors <-qeth_qdio_output_handler <idle>-0 [00] 28991519114165634 [+ 125]: qeth_clear_output_buffer <-qeth_qdio_output_handler <idle>-0 [00] 28991519114165759 [+ 93]: skb_dequeue <-qeth_clear_output_buffer <idle>-0 [00] 28991519114165853 [+ 125]: _spin_lock_irqsave <-skb_dequeue <idle>-0 [00] 28991519114165978 [+ 156]: _spin_unlock_irqrestore <-skb_dequeue <idle>-0 [00] 28991519114166134 [+ 156]: dev_kfree_skb_any <-qeth_clear_output_buffer <idle>-0 [00] 28991519114166290 [+ 125]: kfree_skb <-dev_kfree_skb_any <idle>-0 [00] 
28991519114166415 [+ 125]: __kfree_skb <-kfree_skb <idle>-0 [00] 28991519114166540 [+ 125]: skb_release_all <-__kfree_skb <idle>-0 [00] 28991519114166665 [+ 125]: dst_release <-skb_release_all <idle>-0 [00] 28991519114166790 [+ 125]: skb_release_data <-skb_release_all <idle>-0 [00] 28991519114166915 [+ 218]: kfree <-skb_release_data <idle>-0 [00] 28991519114167134 [+ 156]: kmem_cache_free <-__kfree_skb <idle>-0 [00] 28991519114167290 [+ 125]: skb_dequeue <-qeth_clear_output_buffer <idle>-0 [00] 28991519114167415 [+ 125]: _spin_lock_irqsave <-skb_dequeue <idle>-0 [00] 28991519114167540 [+ 125]: _spin_unlock_irqrestore <-skb_dequeue <idle>-0 [00] 28991519114167665 [+ 312]: qeth_eddp_buf_release_contexts <-qeth_clear_output_buffer <idle>-0 [00] 28991519114167978 [+ 156]: qeth_switch_to_nonpacking_if_needed <-qeth_qdio_output_handler <idle>-0 [00] 28991519114168134 [+ 125]: __netif_schedule <-qeth_qdio_output_handler <idle>-0 [00] 28991519114168259 [+ 156]: raise_softirq_irqoff <-__netif_schedule <idle>-0 [00] 28991519114168415 [+ 125]: del_timer <-qdio_outbound_processing <idle>-0 [00] 28991519114168540 [+ 93]: lock_timer_base <-del_timer <idle>-0 [00] 28991519114168634 [+ 156]: _spin_lock_irqsave <-lock_timer_base <idle>-0 [00] 28991519114168790 [+ 218]: _spin_unlock_irqrestore <-del_timer <idle>-0 [00] 28991519114169009 [+ 187]: net_tx_action <-__do_softirq <idle>-0 [00] 28991519114169196 [+ 125]: _spin_trylock <-net_tx_action <idle>-0 [00] 28991519114169321 [+ 125]: __qdisc_run <-net_tx_action <idle>-0 [00] 28991519114169446 [+ 156]: pfifo_fast_dequeue <-__qdisc_run <idle>-0 [00] 28991519114169603 [+ 93]: account_system_vtime <-__do_softirq <idle>-0 [00] 28991519114169696 [+ 125]: account_system_time <-account_system_vtime <idle>-0 [00] 28991519114169821 [+ 93]: acct_update_integrals <-account_system_time <idle>-0 [00] 28991519114169915 [+ 125]: _local_bh_enable <-__do_softirq <idle>-0 [00] 28991519114170040 [+ 187]: idle_cpu <-irq_exit <idle>-0 [00] 28991519114170228 [+ 125]: tick_nohz_restart_sched_tick <-cpu_idle <idle>-0 [00] 28991519114170353 [+ 93]: tick_nohz_stop_idle <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114170446 [+ 93]: select_nohz_load_balancer <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114170540 [+ 93]: ktime_get <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114170634 [+ 125]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519114170759 [+ 62]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519114170821 [+ 156]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519114170978 [+ 93]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519114171071 [+ 125]: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114171196 [+ 93]: hrtimer_cancel <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114171290 [+ 93]: hrtimer_try_to_cancel <-hrtimer_cancel <idle>-0 [00] 28991519114171384 [+ 62]: lock_hrtimer_base <-hrtimer_try_to_cancel <idle>-0 [00] 28991519114171446 [+ 125]: _spin_lock_irqsave <-lock_hrtimer_base <idle>-0 [00] 28991519114171571 [+ 93]: __remove_hrtimer <-hrtimer_try_to_cancel <idle>-0 [00] 28991519114171665 [+ 93]: rb_next <-__remove_hrtimer <idle>-0 [00] 28991519114171759 [+ 93]: hrtimer_force_reprogram <-__remove_hrtimer <idle>-0 [00] 28991519114171853 [+ 93]: tick_program_event <-hrtimer_force_reprogram <idle>-0 [00] 28991519114171946 [+ 93]: tick_dev_program_event <-tick_program_event <idle>-0 [00] 28991519114172040 [+ 125]: ktime_get <-tick_dev_program_event <idle>-0 [00] 28991519114172165 [+ 62]: 
ktime_get_ts <-ktime_get <idle>-0 [00] 28991519114172228 [+ 93]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519114172321 [+ 93]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519114172415 [+ 156]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519114172571 [+ 93]: clockevents_program_event <-tick_dev_program_event <idle>-0 [00] 28991519114172665 [+ 125]: s390_next_event <-clockevents_program_event <idle>-0 [00] 28991519114172790 [+ 125]: rb_erase <-__remove_hrtimer <idle>-0 [00] 28991519114172915 [+ 93]: _spin_unlock_irqrestore <-hrtimer_try_to_cancel <idle>-0 [00] 28991519114173009 [+ 93]: hrtimer_forward <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114173103 [+ 125]: hrtimer_start <-tick_nohz_restart_sched_tick <idle>-0 [00] 28991519114173228 [+ 62]: lock_hrtimer_base <-hrtimer_start <idle>-0 [00] 28991519114173290 [+ 125]: _spin_lock_irqsave <-lock_hrtimer_base <idle>-0 [00] 28991519114173415 [+ 93]: __timer_stats_hrtimer_set_start_info <-hrtimer_start <idle>-0 [00] 28991519114173509 [+ 93]: enqueue_hrtimer <-hrtimer_start <idle>-0 [00] 28991519114173603 [+ 93]: hrtimer_reprogram <-enqueue_hrtimer <idle>-0 [00] 28991519114173696 [+ 93]: tick_program_event <-hrtimer_reprogram <idle>-0 [00] 28991519114173790 [+ 93]: tick_dev_program_event <-tick_program_event <idle>-0 [00] 28991519114173884 [+ 62]: ktime_get <-tick_dev_program_event <idle>-0 [00] 28991519114173946 [+ 93]: ktime_get_ts <-ktime_get <idle>-0 [00] 28991519114174040 [+ 93]: getnstimeofday <-ktime_get_ts <idle>-0 [00] 28991519114174134 [+ 93]: read_tod_clock <-getnstimeofday <idle>-0 [00] 28991519114174228 [+ 93]: set_normalized_timespec <-ktime_get_ts <idle>-0 [00] 28991519114174321 [+ 125]: clockevents_program_event <-tick_dev_program_event <idle>-0 [00] 28991519114174446 [+ 125]: s390_next_event <-clockevents_program_event <idle>-0 [00] 28991519114174571 [+ 93]: rb_insert_color <-enqueue_hrtimer <idle>-0 [00] 28991519114174665 [+ 156]: _spin_unlock_irqrestore <-hrtimer_start <idle>-0 [00] 28991519114174821 [+ 125]: schedule <-cpu_idle <idle>-0 [00] 28991519114174946 [+ 93]: _spin_lock <-schedule <idle>-0 [00] 28991519114175040 [+ 125]: sched_clock_cpu <-schedule <idle>-0 [00] 28991519114175165 [+ 125]: sched_clock <-sched_clock_cpu <idle>-0 [00] 28991519114175290 [+ 93]: put_prev_task_idle <-schedule <idle>-0 [00] 28991519114175384 [+ 93]: pick_next_task_fair <-schedule <idle>-0 [00] 28991519114175478 [+ 93]: set_next_entity <-pick_next_task_fair <idle>-0 [00] 28991519114175571 [+ 93]: __dequeue_entity <-set_next_entity <idle>-0 [00] 28991519114175665 [+ 93]: rb_next <-__dequeue_entity <idle>-0 [00] 28991519114175759 [+ 125]: rb_erase <-__dequeue_entity <idle>-0 [00] 28991519114175884 [+ 187]: hrtick_start_fair <-pick_next_task_fair <idle>-0 [00] 28991519114176071 [+ 187]: _spin_trylock <-tracing_record_cmdline perl-2581 [00] 28991519114176259 [+ 125]: finish_task_switch <-schedule perl-2581 [00] 28991519114176384 [+ 93]: account_vtime <-finish_task_switch perl-2581 [00] 28991519114176478 [+ 93]: account_user_time <-account_vtime perl-2581 [00] 28991519114176571 [+ 93]: acct_update_integrals <-account_user_time perl-2581 [00] 28991519114176665 [+ 125]: account_system_time <-account_vtime perl-2581 [00] 28991519114176790 [+ 156]: acct_update_integrals <-account_system_time perl-2581 [00] 28991519114176946 [+ 125]: lock_sock_nested <-sk_wait_data perl-2581 [00] 28991519114177071 [+ 125]: _spin_lock_bh <-lock_sock_nested perl-2581 [00] 28991519114177196 [+ 125]: local_bh_disable <-_spin_lock_bh 
perl-2581 [00] 28991519114177321 [+ 187]: local_bh_enable <-lock_sock_nested perl-2581 [00] 28991519114177509 [+ 156]: finish_wait <-sk_wait_data perl-2581 [00] 28991519114177665 [+ 125]: tcp_prequeue_process <-tcp_recvmsg perl-2581 [00] 28991519114177790 [+ 156]: local_bh_disable <-tcp_prequeue_process perl-2581 [00] 28991519114177946 [+ 125]: tcp_v4_do_rcv <-tcp_prequeue_process perl-2581 [00] 28991519114178071 [+ 125]: tcp_rcv_established <-tcp_v4_do_rcv perl-2581 [00] 28991519114178196 [+ 156]: local_bh_enable <-tcp_rcv_established perl-2581 [00] 28991519114178353 [+ 125]: skb_copy_datagram_iovec <-tcp_rcv_established perl-2581 [00] 28991519114178478 [+ 125]: memcpy_toiovec <-skb_copy_datagram_iovec perl-2581 [00] 28991519114178603 [+ 125]: copy_to_user_mvcos_check <-memcpy_toiovec perl-2581 [00] 28991519114178728 [+ 187]: copy_to_user_std <-copy_to_user_mvcos_check perl-2581 [00] 28991519114178915 [+ 93]: tcp_rcv_space_adjust <-tcp_rcv_established perl-2581 [00] 28991519114179009 [+ 156]: local_bh_disable <-tcp_rcv_established perl-2581 [00] 28991519114179165 [+ 125]: get_seconds <-tcp_rcv_established perl-2581 [00] 28991519114179290 [+ 156]: tcp_event_data_recv <-tcp_rcv_established perl-2581 [00] 28991519114179446 [+ 187]: tcp_ack <-tcp_rcv_established perl-2581 [00] 28991519114179634 [+ 156]: __kfree_skb <-tcp_ack perl-2581 [00] 28991519114179790 [+ 93]: skb_release_all <-__kfree_skb perl-2581 [00] 28991519114179884 [+ 125]: dst_release <-skb_release_all perl-2581 [00] 28991519114180009 [+ 156]: skb_release_data <-skb_release_all perl-2581 [00] 28991519114180165 [+ 187]: kfree <-skb_release_data perl-2581 [00] 28991519114180353 [+ 187]: kmem_cache_free <-__kfree_skb perl-2581 [00] 28991519114180540 [+ 125]: tcp_rtt_estimator <-tcp_ack perl-2581 [00] 28991519114180665 [+ 156]: bictcp_acked <-tcp_ack perl-2581 [00] 28991519114180821 [+ 93]: bictcp_cong_avoid <-tcp_ack perl-2581 [00] 28991519114180915 [+ 156]: tcp_is_cwnd_limited <-bictcp_cong_avoid perl-2581 [00] 28991519114181071 [+ 125]: tcp_current_mss <-tcp_rcv_established perl-2581 [00] 28991519114181196 [+ 125]: tcp_established_options <-tcp_current_mss perl-2581 [00] 28991519114181321 [+ 125]: __tcp_push_pending_frames <-tcp_rcv_established perl-2581 [00] 28991519114181446 [+ 125]: __tcp_ack_snd_check <-tcp_rcv_established perl-2581 [00] 28991519114181571 [+ 125]: tcp_send_delayed_ack <-__tcp_ack_snd_check perl-2581 [00] 28991519114181696 [+ 125]: sk_reset_timer <-tcp_send_delayed_ack perl-2581 [00] 28991519114181821 [+ 125]: mod_timer <-sk_reset_timer perl-2581 [00] 28991519114181946 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer perl-2581 [00] 28991519114182071 [+ 93]: __mod_timer <-mod_timer perl-2581 [00] 28991519114182165 [+ 156]: __timer_stats_timer_set_start_info <-__mod_timer perl-2581 [00] 28991519114182321 [+ 93]: lock_timer_base <-__mod_timer perl-2581 [00] 28991519114182415 [+ 156]: _spin_lock_irqsave <-lock_timer_base perl-2581 [00] 28991519114182571 [+ 93]: internal_add_timer <-__mod_timer perl-2581 [00] 28991519114182665 [+ 156]: _spin_unlock_irqrestore <-__mod_timer perl-2581 [00] 28991519114182821 [+ 93]: __kfree_skb <-tcp_rcv_established perl-2581 [00] 28991519114182915 [+ 125]: skb_release_all <-__kfree_skb perl-2581 [00] 28991519114183040 [+ 125]: dst_release <-skb_release_all perl-2581 [00] 28991519114183165 [+ 125]: skb_release_data <-skb_release_all perl-2581 [00] 28991519114183290 [+ 187]: kfree <-skb_release_data perl-2581 [00] 28991519114183478 [+ 187]: kmem_cache_free <-__kfree_skb perl-2581 
[00] 28991519114183665 [+ 187]: local_bh_enable <-tcp_prequeue_process perl-2581 [00] 28991519114183853 [+ 125]: tcp_cleanup_rbuf <-tcp_recvmsg perl-2581 [00] 28991519114183978 [+ 156]: __tcp_select_window <-tcp_cleanup_rbuf perl-2581 [00] 28991519114184134 [+ 125]: release_sock <-tcp_recvmsg perl-2581 [00] 28991519114184259 [+ 93]: _spin_lock_bh <-release_sock perl-2581 [00] 28991519114184353 [+ 125]: local_bh_disable <-_spin_lock_bh perl-2581 [00] 28991519114184478 [+ 156]: _spin_unlock_bh <-release_sock perl-2581 [00] 28991519114184634 [+ 218]: local_bh_enable_ip <-_spin_unlock_bh perl-2581 [00] 28991519114184853 [+ 156]: dnotify_parent <-vfs_read perl-2581 [00] 28991519114185009 [+ 125]: _spin_lock <-dnotify_parent perl-2581 [00] 28991519114185134 [+ 125]: inotify_dentry_parent_queue_event <-vfs_read perl-2581 [00] 28991519114185259 [+ 1687]: inotify_inode_queue_event <-vfs_read And this is for 2.6.16: # tracer: ftrace # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <...>-2804 [00] 28994586831213199 [+ 93]: sys_write <-sysc_noemu <...>-2804 [00] 28994586831213292 [+ 218]: fget_light <-sys_write <...>-2804 [00] 28994586831213511 [+ 156]: apparmor_file_permission <-sys_write <...>-2804 [00] 28994586831213667 [+ 125]: sock_aio_write <-sys_write <...>-2804 [00] 28994586831213792 [+ 125]: dummy_socket_sendmsg <-sock_aio_write <...>-2804 [00] 28994586831213917 [+ 125]: inet_sendmsg <-sock_aio_write <...>-2804 [00] 28994586831214042 [+ 93]: tcp_sendmsg <-inet_sendmsg <...>-2804 [00] 28994586831214136 [+ 125]: lock_sock <-tcp_sendmsg <...>-2804 [00] 28994586831214261 [+ 125]: _spin_lock_bh <-lock_sock <...>-2804 [00] 28994586831214386 [+ 125]: _spin_unlock_bh <-lock_sock <...>-2804 [00] 28994586831214511 [+ 187]: local_bh_enable <-_spin_unlock_bh <...>-2804 [00] 28994586831214699 [+ 156]: tcp_current_mss <-tcp_sendmsg <...>-2804 [00] 28994586831214855 [+ 125]: __alloc_skb <-tcp_sendmsg <...>-2804 [00] 28994586831214980 [+ 156]: kmem_cache_alloc <-__alloc_skb <...>-2804 [00] 28994586831215136 [+ 218]: __kmalloc <-__alloc_skb <...>-2804 [00] 28994586831215355 [+ 187]: memset <-__alloc_skb <...>-2804 [00] 28994586831215542 [+ 125]: copy_from_user_mvcos_check <-tcp_sendmsg <...>-2804 [00] 28994586831215667 [+ 218]: copy_from_user_std <-copy_from_user_mvcos_check <...>-2804 [00] 28994586831215886 [+ 125]: __tcp_push_pending_frames <-tcp_sendmsg <...>-2804 [00] 28994586831216011 [+ 125]: tcp_transmit_skb <-__tcp_push_pending_frames <...>-2804 [00] 28994586831216136 [+ 156]: skb_clone <-tcp_transmit_skb <...>-2804 [00] 28994586831216292 [+ 250]: memcpy <-skb_clone <...>-2804 [00] 28994586831216542 [+ 156]: tcp_v4_send_check <-tcp_transmit_skb <...>-2804 [00] 28994586831216699 [+ 187]: ip_queue_xmit <-tcp_transmit_skb <...>-2804 [00] 28994586831216886 [+ 187]: ip_output <-ip_queue_xmit <...>-2804 [00] 28994586831217074 [+ 125]: dev_queue_xmit <-ip_output <...>-2804 [00] 28994586831217199 [+ 125]: _spin_lock <-dev_queue_xmit <...>-2804 [00] 28994586831217324 [+ 125]: pfifo_fast_enqueue <-dev_queue_xmit <...>-2804 [00] 28994586831217449 [+ 125]: __qdisc_run <-dev_queue_xmit <...>-2804 [00] 28994586831217574 [+ 156]: pfifo_fast_dequeue <-__qdisc_run <...>-2804 [00] 28994586831217730 [+ 125]: _spin_trylock <-__qdisc_run <...>-2804 [00] 28994586831217855 [+ 156]: dev_hard_start_xmit <-__qdisc_run <...>-2804 [00] 28994586831218011 [+ 250]: qeth_hard_start_xmit <-dev_hard_start_xmit <...>-2804 [00] 28994586831218261 [+ 125]: qeth_fill_header <-qeth_hard_start_xmit <...>-2804 [00] 28994586831218386 [+ 156]: 
memset <-qeth_fill_header <...>-2804 [00] 28994586831218542 [+ 187]: memset <-qeth_fill_header <...>-2804 [00] 28994586831218730 [+ 125]: skb_queue_tail <-qeth_hard_start_xmit <...>-2804 [00] 28994586831218855 [+ 125]: _spin_lock_irqsave <-skb_queue_tail <...>-2804 [00] 28994586831218980 [+ 156]: _spin_unlock_irqrestore <-skb_queue_tail <...>-2804 [00] 28994586831219136 [+ 218]: qeth_flush_buffers <-qeth_hard_start_xmit <...>-2804 [00] 28994586831219355 [+ 875]: do_QDIO <-qeth_flush_buffers <...>-2804 [00] 28994586831220230 [+ 93]: mod_timer <-do_QDIO <...>-2804 [00] 28994586831220324 [+ 187]: _spin_lock_irqsave <-mod_timer <...>-2804 [00] 28994586831220511 [+ 187]: _spin_unlock_irqrestore <-mod_timer <...>-2804 [00] 28994586831220699 [+ 125]: raise_softirq_irqoff <-qeth_hard_start_xmit <...>-2804 [00] 28994586831220824 [+ 156]: _spin_lock <-__qdisc_run <...>-2804 [00] 28994586831220980 [+ 125]: pfifo_fast_dequeue <-__qdisc_run <...>-2804 [00] 28994586831221105 [+ 156]: local_bh_enable <-dev_queue_xmit <...>-2804 [00] 28994586831221261 [+ 156]: do_softirq <-local_bh_enable <...>-2804 [00] 28994586831221417 [+ 93]: account_system_vtime <-do_softirq <...>-2804 [00] 28994586831221511 [+ 93]: account_system_time <-account_system_vtime <...>-2804 [00] 28994586831221605 [+ 125]: acct_update_integrals <-account_system_time <...>-2804 [00] 28994586831221730 [+ 156]: __do_softirq <-do_softirq <...>-2804 [00] 28994586831221886 [+ 156]: net_tx_action <-__do_softirq <...>-2804 [00] 28994586831222042 [+ 125]: _spin_trylock <-net_tx_action <...>-2804 [00] 28994586831222167 [+ 125]: __qdisc_run <-net_tx_action <...>-2804 [00] 28994586831222292 [+ 156]: pfifo_fast_dequeue <-__qdisc_run <...>-2804 [00] 28994586831222449 [+ 93]: account_system_vtime <-do_softirq <...>-2804 [00] 28994586831222542 [+ 125]: account_system_time <-account_system_vtime <...>-2804 [00] 28994586831222667 [+ 156]: acct_update_integrals <-account_system_time <...>-2804 [00] 28994586831222824 [+ 125]: sk_reset_timer <-__tcp_push_pending_frames <...>-2804 [00] 28994586831222949 [+ 156]: mod_timer <-sk_reset_timer <...>-2804 [00] 28994586831223105 [+ 125]: release_sock <-tcp_sendmsg <...>-2804 [00] 28994586831223230 [+ 93]: _spin_lock_bh <-release_sock <...>-2804 [00] 28994586831223324 [+ 156]: _spin_unlock_bh <-release_sock <...>-2804 [00] 28994586831223480 [+ 187]: local_bh_enable <-_spin_unlock_bh <...>-2804 [00] 28994586831223667 [+ 93]: dnotify_parent <-sys_write <...>-2804 [00] 28994586831223761 [+ 125]: _spin_lock <-dnotify_parent <...>-2804 [00] 28994586831223886 [+ 156]: inotify_dentry_parent_queue_event <-sys_write <...>-2804 [00] 28994586831224042 [+ 812]: inotify_inode_queue_event <-sys_write <...>-2804 [00] 28994586831224855 [+ 93]: sys_read <-sysc_noemu <...>-2804 [00] 28994586831224949 [+ 187]: fget_light <-sys_read <...>-2804 [00] 28994586831225136 [+ 125]: apparmor_file_permission <-sys_read <...>-2804 [00] 28994586831225261 [+ 125]: sock_aio_read <-sys_read <...>-2804 [00] 28994586831225386 [+ 156]: dummy_socket_recvmsg <-sock_aio_read <...>-2804 [00] 28994586831225542 [+ 93]: sock_common_recvmsg <-sock_aio_read <...>-2804 [00] 28994586831225636 [+ 125]: tcp_recvmsg <-sock_common_recvmsg <...>-2804 [00] 28994586831225761 [+ 125]: lock_sock <-tcp_recvmsg <...>-2804 [00] 28994586831225886 [+ 125]: _spin_lock_bh <-lock_sock <...>-2804 [00] 28994586831226011 [+ 125]: _spin_unlock_bh <-lock_sock <...>-2804 [00] 28994586831226136 [+ 250]: local_bh_enable <-_spin_unlock_bh <...>-2804 [00] 28994586831226386 [+ 125]: 
sk_wait_data <-tcp_recvmsg <...>-2804 [00] 28994586831226511 [+ 125]: prepare_to_wait <-sk_wait_data <...>-2804 [00] 28994586831226636 [+ 156]: _spin_lock_irqsave <-prepare_to_wait <...>-2804 [00] 28994586831226792 [+ 156]: _spin_unlock_irqrestore <-prepare_to_wait <...>-2804 [00] 28994586831226949 [+ 93]: _spin_lock_bh <-sk_wait_data <...>-2804 [00] 28994586831227042 [+ 125]: _spin_unlock_bh <-sk_wait_data <...>-2804 [00] 28994586831227167 [+ 187]: local_bh_enable <-_spin_unlock_bh <...>-2804 [00] 28994586831227355 [+ 125]: schedule_timeout <-sk_wait_data <...>-2804 [00] 28994586831227480 [+ 93]: schedule <-schedule_timeout <...>-2804 [00] 28994586831227574 [+ 156]: profile_hit <-schedule <...>-2804 [00] 28994586831227730 [+ 156]: sched_clock <-schedule <...>-2804 [00] 28994586831227886 [+ 281]: _spin_lock_irq <-schedule <idle>-0 [00] 28994586831228167 [+ 93]: account_vtime <-schedule <idle>-0 [00] 28994586831228261 [+ 93]: account_user_time <-account_vtime <idle>-0 [00] 28994586831228355 [+ 93]: account_system_time <-account_vtime <idle>-0 [00] 28994586831228449 [+ 218]: acct_update_integrals <-account_system_time <idle>-0 [00] 28994586831228667 [+ 93]: notifier_call_chain <-cpu_idle <idle>-0 [00] 28994586831228761 [+ 93]: nohz_idle_notify <-notifier_call_chain <idle>-0 [00] 28994586831228855 [+ 156]: rcu_needs_cpu <-nohz_idle_notify <idle>-0 [00] 28994586831229011 [+ 62]: next_timer_interrupt <-nohz_idle_notify <idle>-0 [00] 28994586831229074 [+ 93]: hrtimer_get_next_event <-next_timer_interrupt <idle>-0 [00] 28994586831229167 [+ 125]: _spin_lock_irqsave <-hrtimer_get_next_event <idle>-0 [00] 28994586831229292 [+ 93]: _spin_unlock_irqrestore <-hrtimer_get_next_event <idle>-0 [00] 28994586831229386 [+ 93]: _spin_lock_irqsave <-hrtimer_get_next_event <idle>-0 [00] 28994586831229480 [+ 93]: _spin_unlock_irqrestore <-hrtimer_get_next_event <idle>-0 [00] 28994586831229574 [+ 125]: ktime_get <-hrtimer_get_next_event <idle>-0 [00] 28994586831229699 [+ 93]: getnstimeofday <-ktime_get <idle>-0 [00] 28994586831229792 [+ 156]: do_gettimeofday <-getnstimeofday <idle>-0 [00] 28994586831229949 [+ 156]: set_normalized_timespec <-ktime_get <idle>-0 [00] 28994586831230105 [+ 125]: ns_to_timespec <-next_timer_interrupt <idle>-0 [00] 28994586831230230 [+ 156]: _spin_lock <-next_timer_interrupt <idle>-0 [00] 28994586831230386 [+212406]: vtimer_idle_notify <-notifier_call_chain <idle>-0 [00] 28994586831442792 [+ 93]: do_IRQ <-io_return <idle>-0 [00] 28994586831442886 [+ 62]: account_system_vtime <-do_IRQ <idle>-0 [00] 28994586831442949 [+ 125]: account_system_time <-account_system_vtime <idle>-0 [00] 28994586831443074 [+ 250]: acct_update_integrals <-account_system_time <idle>-0 [00] 28994586831443324 [+ 93]: do_monitor_call <-sysc_return <idle>-0 [00] 28994586831443417 [+ 62]: notifier_call_chain <-do_monitor_call <idle>-0 [00] 28994586831443480 [+ 156]: nohz_idle_notify <-notifier_call_chain <idle>-0 [00] 28994586831443636 [+ 125]: vtimer_idle_notify <-notifier_call_chain <idle>-0 [00] 28994586831443761 [+ 125]: do_adapter_IO <-do_IRQ <idle>-0 [00] 28994586831443886 [+ 500]: tiqdio_thinint_handler <-do_adapter_IO <idle>-0 [00] 28994586831444386 [+ 156]: qeth_qdio_output_handler <-tiqdio_thinint_handler <idle>-0 [00] 28994586831444542 [+ 62]: skb_dequeue <-qeth_qdio_output_handler <idle>-0 [00] 28994586831444605 [+ 125]: _spin_lock_irqsave <-skb_dequeue <idle>-0 [00] 28994586831444730 [+ 125]: _spin_unlock_irqrestore <-skb_dequeue <idle>-0 [00] 28994586831444855 [+ 93]: raise_softirq_irqoff 
<-qeth_qdio_output_handler <idle>-0 [00] 28994586831444949 [+ 93]: skb_dequeue <-qeth_qdio_output_handler <idle>-0 [00] 28994586831445042 [+ 93]: _spin_lock_irqsave <-skb_dequeue <idle>-0 [00] 28994586831445136 [+ 93]: _spin_unlock_irqrestore <-skb_dequeue <idle>-0 [00] 28994586831445230 [+ 250]: qeth_eddp_buf_release_contexts <-qeth_qdio_output_handler <idle>-0 [00] 28994586831445480 [+ 125]: raise_softirq_irqoff <-qeth_qdio_output_handler <idle>-0 [00] 28994586831445605 [+ 93]: del_timer <-tiqdio_thinint_handler <idle>-0 [00] 28994586831445699 [+ 125]: _spin_lock_irqsave <-del_timer <idle>-0 [00] 28994586831445824 [+ 218]: _spin_unlock_irqrestore <-del_timer <idle>-0 [00] 28994586831446042 [+ 250]: qeth_qdio_input_handler <-tiqdio_thinint_handler <idle>-0 [00] 28994586831446292 [+ 62]: __alloc_skb <-qeth_qdio_input_handler <idle>-0 [00] 28994586831446355 [+ 125]: kmem_cache_alloc <-__alloc_skb <idle>-0 [00] 28994586831446480 [+ 125]: __kmalloc <-__alloc_skb <idle>-0 [00] 28994586831446605 [+ 156]: memset <-__alloc_skb <idle>-0 [00] 28994586831446761 [+ 187]: memcpy <-qeth_qdio_input_handler <idle>-0 [00] 28994586831446949 [+ 500]: netif_rx <-qeth_qdio_input_handler <idle>-0 [00] 28994586831447449 [+ 93]: irq_exit <-do_IRQ <idle>-0 [00] 28994586831447542 [+ 93]: account_system_vtime <-irq_exit <idle>-0 [00] 28994586831447636 [+ 93]: account_system_time <-account_system_vtime <idle>-0 [00] 28994586831447730 [+ 93]: acct_update_integrals <-account_system_time <idle>-0 [00] 28994586831447824 [+ 93]: do_softirq <-irq_exit <idle>-0 [00] 28994586831447917 [+ 62]: account_system_vtime <-do_softirq <idle>-0 [00] 28994586831447980 [+ 93]: account_system_time <-account_system_vtime <idle>-0 [00] 28994586831448074 [+ 125]: acct_update_integrals <-account_system_time <idle>-0 [00] 28994586831448199 [+ 93]: __do_softirq <-do_softirq <idle>-0 [00] 28994586831448292 [+ 93]: net_tx_action <-__do_softirq <idle>-0 [00] 28994586831448386 [+ 125]: __kfree_skb <-net_tx_action <idle>-0 [00] 28994586831448511 [+ 156]: sock_wfree <-__kfree_skb <idle>-0 [00] 28994586831448667 [+ 93]: _spin_trylock <-net_tx_action <idle>-0 [00] 28994586831448761 [+ 62]: __qdisc_run <-net_tx_action <idle>-0 [00] 28994586831448824 [+ 156]: pfifo_fast_dequeue <-__qdisc_run <idle>-0 [00] 28994586831448980 [+ 93]: net_rx_action <-__do_softirq <idle>-0 [00] 28994586831449074 [+ 156]: process_backlog <-net_rx_action <idle>-0 [00] 28994586831449230 [+ 156]: ip_rcv <-process_backlog <idle>-0 [00] 28994586831449386 [+ 125]: ip_route_input <-ip_rcv <idle>-0 [00] 28994586831449511 [+ 125]: ip_local_deliver <-ip_rcv <idle>-0 [00] 28994586831449636 [+ 125]: tcp_v4_rcv <-ip_local_deliver <idle>-0 [00] 28994586831449761 [+ 125]: _read_lock <-tcp_v4_rcv <idle>-0 [00] 28994586831449886 [+ 93]: dummy_socket_sock_rcv_skb <-tcp_v4_rcv <idle>-0 [00] 28994586831449980 [+ 93]: _spin_lock <-tcp_v4_rcv <idle>-0 [00] 28994586831450074 [+ 93]: __wake_up <-tcp_v4_rcv <idle>-0 [00] 28994586831450167 [+ 93]: _spin_lock_irqsave <-__wake_up <idle>-0 [00] 28994586831450261 [+ 93]: autoremove_wake_function <-__wake_up <idle>-0 [00] 28994586831450355 [+ 93]: default_wake_function <-autoremove_wake_function <idle>-0 [00] 28994586831450449 [+ 125]: _spin_lock <-default_wake_function <idle>-0 [00] 28994586831450574 [+ 187]: sched_clock <-default_wake_function <idle>-0 [00] 28994586831450761 [+ 125]: resched_task <-default_wake_function <idle>-0 [00] 28994586831450886 [+ 93]: _spin_unlock_irqrestore <-default_wake_function <idle>-0 [00] 28994586831450980 [+ 93]: 
_spin_unlock_irqrestore <-__wake_up
<idle>-0 [00] 28994586831451074 [+ 93]: sk_reset_timer <-tcp_v4_rcv
<idle>-0 [00] 28994586831451167 [+ 93]: mod_timer <-sk_reset_timer
<idle>-0 [00] 28994586831451261 [+ 125]: _spin_lock_irqsave <-mod_timer
<idle>-0 [00] 28994586831451386 [+ 156]: _spin_unlock_irqrestore <-mod_timer
<idle>-0 [00] 28994586831451542 [+ 93]: account_system_vtime <-do_softirq
<idle>-0 [00] 28994586831451636 [+ 93]: account_system_time <-account_system_vtime
<idle>-0 [00] 28994586831451730 [+ 156]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28994586831451886 [+ 125]: schedule <-cpu_idle
<idle>-0 [00] 28994586831452011 [+ 93]: profile_hit <-schedule
<idle>-0 [00] 28994586831452105 [+ 125]: sched_clock <-schedule
<idle>-0 [00] 28994586831452230 [+ 343]: _spin_lock_irq <-schedule
<...>-2804 [00] 28994586831452574 [+ 93]: account_vtime <-schedule
<...>-2804 [00] 28994586831452667 [+ 62]: account_user_time <-account_vtime
<...>-2804 [00] 28994586831452730 [+ 125]: account_system_time <-account_vtime
<...>-2804 [00] 28994586831452855 [+ 125]: acct_update_integrals <-account_system_time
<...>-2804 [00] 28994586831452980 [+ 93]: _spin_lock_bh <-sk_wait_data
<...>-2804 [00] 28994586831453074 [+ 93]: _spin_unlock_bh <-sk_wait_data
<...>-2804 [00] 28994586831453167 [+ 93]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831453261 [+ 125]: finish_wait <-sk_wait_data
<...>-2804 [00] 28994586831453386 [+ 93]: tcp_v4_do_rcv <-tcp_recvmsg
<...>-2804 [00] 28994586831453480 [+ 93]: tcp_rcv_established <-tcp_v4_do_rcv
<...>-2804 [00] 28994586831453574 [+ 93]: local_bh_enable <-tcp_rcv_established
<...>-2804 [00] 28994586831453667 [+ 93]: skb_copy_datagram_iovec <-tcp_rcv_established
<...>-2804 [00] 28994586831453761 [+ 93]: memcpy_toiovec <-skb_copy_datagram_iovec
<...>-2804 [00] 28994586831453855 [+ 62]: copy_to_user_mvcos_check <-memcpy_toiovec
<...>-2804 [00] 28994586831453917 [+ 187]: copy_to_user_std <-copy_to_user_mvcos_check
<...>-2804 [00] 28994586831454105 [+ 125]: tcp_ack <-tcp_rcv_established
<...>-2804 [00] 28994586831454230 [+ 125]: __kfree_skb <-tcp_ack
<...>-2804 [00] 28994586831454355 [+ 125]: kfree <-__kfree_skb
<...>-2804 [00] 28994586831454480 [+ 187]: kmem_cache_free <-__kfree_skb
<...>-2804 [00] 28994586831454667 [+ 93]: tcp_reno_cong_avoid <-tcp_ack
<...>-2804 [00] 28994586831454761 [+ 93]: tcp_current_mss <-tcp_rcv_established
<...>-2804 [00] 28994586831454855 [+ 156]: __tcp_push_pending_frames <-tcp_rcv_established
<...>-2804 [00] 28994586831455011 [+ 62]: tcp_send_delayed_ack <-tcp_rcv_established
<...>-2804 [00] 28994586831455074 [+ 93]: sk_reset_timer <-tcp_send_delayed_ack
<...>-2804 [00] 28994586831455167 [+ 93]: mod_timer <-sk_reset_timer
<...>-2804 [00] 28994586831455261 [+ 93]: _spin_lock_irqsave <-mod_timer
<...>-2804 [00] 28994586831455355 [+ 93]: _spin_unlock_irqrestore <-mod_timer
<...>-2804 [00] 28994586831455449 [+ 125]: __kfree_skb <-tcp_rcv_established
<...>-2804 [00] 28994586831455574 [+ 93]: kfree <-__kfree_skb
<...>-2804 [00] 28994586831455667 [+ 125]: kmem_cache_free <-__kfree_skb
<...>-2804 [00] 28994586831455792 [+ 156]: local_bh_enable <-tcp_recvmsg
<...>-2804 [00] 28994586831455949 [+ 93]: __tcp_select_window <-tcp_recvmsg
<...>-2804 [00] 28994586831456042 [+ 93]: release_sock <-tcp_recvmsg
<...>-2804 [00] 28994586831456136 [+ 93]: _spin_lock_bh <-release_sock
<...>-2804 [00] 28994586831456230 [+ 93]: _spin_unlock_bh <-release_sock
<...>-2804 [00] 28994586831456324 [+ 156]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831456480 [+ 93]: dnotify_parent <-sys_read
<...>-2804 [00] 28994586831456574 [+ 125]: _spin_lock <-dnotify_parent
<...>-2804 [00] 28994586831456699 [+ 62]: inotify_dentry_parent_queue_event <-sys_read
<...>-2804 [00] 28994586831456761 [+ 1437]: inotify_inode_queue_event <-sys_read
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-14 12:34 ` Heiko Carstens @ 2009-08-14 16:04 ` Linus Torvalds 2009-08-14 17:13 ` Heiko Carstens 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2009-08-14 16:04 UTC (permalink / raw) To: Heiko Carstens Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt, David S. Miller On Fri, 14 Aug 2009, Heiko Carstens wrote: > > All of this started when we compared a 2.6.27 based distro kernel > to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu > cycles than 2.6.16 does for the same workloads. > > As an example: we took a simple one connection ping pong network load. > (client sends packet to server and waits until packet from server > returns and game starts again). It uses more cpu cycles on 2.6.27. > > Using ftrace we figured out that on 2.6.27 it takes more than 500 function > calls on the client for a single roundtrip while on 2.6.16 it took only > ~230 calls. Hmm. But the spinlock part of this seems to not really have changed. Removing everything but the actual callchain info, and then doing a "diff" between your two roundtrips, and they look very different, and basically none of the difference seems to be due to spinlocks. It seems to be due to a number of different things, but the bulk of the new costs seem to be in networking (and to some degree scheduling). There are smaller differences elsewhere, but the networking code _really_ blows up. I don't know how much of some of these are "real" kernel changes, and how much of it is less inlining, but spinlocks don't seem to be the issue. Here's a quick walk-through (some of these get repeated a couple of times in your traces) * Apparmor going through the generic security code and rw_verify_area: apparmor_file_permission <-sys_write becomes security_file_permission <-rw_verify_area apparmor_file_permission <-security_file_permission aa_file_permission <-apparmor_file_permission * the vfs write itself being deeper sock_aio_write <-sys_write becomes vfs_write <-SyS_write rw_verify_area <-vfs_write do_sync_write <-vfs_write sock_aio_write <-do_sync_write * the socket code does totally new apparmor security things that were not in 2.6.16 at all: security_socket_sendmsg <-sock_aio_write apparmor_socket_sendmsg <-security_socket_sendmsg aa_revalidate_sk <-apparmor_socket_sendmsg * TCP doing more: __alloc_skb <-tcp_sendmsg .. .. ip_output <-ip_queue_xmit dev_queue_xmit <-ip_output becomes tcp_established_options <-tcp_current_mss sk_stream_alloc_skb <-tcp_sendmsg __alloc_skb <-sk_stream_alloc_skb .. tcp_init_tso_segs <-__tcp_push_pending_frames tcp_set_skb_tso_segs <-tcp_init_tso_segs .. __skb_clone <-skb_clone __copy_skb_header <-__skb_clone tcp_established_options <-tcp_transmit_skb skb_push <-tcp_transmit_skb __tcp_select_window <-tcp_transmit_skb tcp_options_write <-tcp_transmit_skb .. 
__sk_dst_check <-ip_queue_xmit skb_push <-ip_queue_xmit ip_local_out <-ip_queue_xmit __ip_local_out <-ip_local_out ip_output <-ip_local_out ip_finish_output <-ip_output skb_push <-ip_finish_output dev_queue_xmit <-ip_finish_output local_bh_disable <-dev_queue_xmit * More TCP changes at tcp_ack time: kfree <-__kfree_skb kmem_cache_free <-__kfree_skb tcp_reno_cong_avoid <-tcp_ack tcp_current_mss <-tcp_rcv_established __tcp_push_pending_frames <-tcp_rcv_established tcp_send_delayed_ack <-tcp_rcv_established becomes skb_release_all <-__kfree_skb dst_release <-skb_release_all skb_release_data <-skb_release_all kfree <-skb_release_data kmem_cache_free <-__kfree_skb tcp_rtt_estimator <-tcp_ack bictcp_acked <-tcp_ack bictcp_cong_avoid <-tcp_ack tcp_is_cwnd_limited <-bictcp_cong_avoid tcp_current_mss <-tcp_rcv_established tcp_established_options <-tcp_current_mss __tcp_push_pending_frames <-tcp_rcv_established __tcp_ack_snd_check <-tcp_rcv_established tcp_send_delayed_ack <-__tcp_ack_snd_check * The network driver doing more: qeth_hard_start_xmit <-dev_hard_start_xmit qeth_fill_header <-qeth_hard_start_xmit memset <-qeth_fill_header memset <-qeth_fill_header skb_queue_tail <-qeth_hard_start_xmit .. mod_timer <-do_QDIO _spin_lock_irqsave <-mod_timer _spin_unlock_irqrestore <-mod_timer raise_softirq_irqoff <-qeth_hard_start_xmit .. becomes qeth_l3_hard_start_xmit <-dev_hard_start_xmit qeth_get_cast_type <-qeth_l3_hard_start_xmit qeth_get_priority_queue <-qeth_l3_hard_start_xmit skb_realloc_headroom <-qeth_l3_hard_start_xmit pskb_copy <-skb_realloc_headroom __alloc_skb <-pskb_copy kmem_cache_alloc <-__alloc_skb __kmalloc <-__alloc_skb skb_put <-pskb_copy copy_skb_header <-pskb_copy __copy_skb_header <-copy_skb_header skb_pull <-qeth_l3_hard_start_xmit skb_push <-qeth_l3_hard_start_xmit qeth_l3_fill_header <-qeth_l3_hard_start_xmit qeth_get_elements_no <-qeth_l3_hard_start_xmit qeth_do_send_packet <-qeth_l3_hard_start_xmit skb_queue_tail <-qeth_do_send_packet .. qdio_perf_stat_inc <-do_QDIO qdio_kick_outbound_q <-do_QDIO qdio_perf_stat_inc <-qdio_kick_outbound_q __tasklet_schedule <-do_QDIO dev_kfree_skb_any <-qeth_l3_hard_start_xmit kfree_skb <-dev_kfree_skb_any __kfree_skb <-kfree_skb skb_release_all <-__kfree_skb dst_release <-skb_release_all sock_wfree <-skb_release_all skb_release_data <-skb_release_all __netif_schedule <-qeth_l3_hard_start_xmit raise_softirq_irqoff <-__netif_schedule .. 
tasklet_action <-__do_softirq qdio_outbound_processing <-tasklet_action qdio_perf_stat_inc <-qdio_outbound_processing mod_timer <-qdio_outbound_processing * the timer code doing more: __timer_stats_timer_set_start_info <-mod_timer __mod_timer <-mod_timer __timer_stats_timer_set_start_info <-__mod_timer lock_timer_base <-__mod_timer _spin_lock_irqsave <-lock_timer_base internal_add_timer <-__mod_timer _spin_unlock_irqrestore <-__mod_timer qdio_perf_stat_inc <-qdio_outbound_processing account_system_vtime <-__do_softirq * the scheduler code doing way deeper chains: profile_hit <-schedule sched_clock <-schedule _spin_lock_irq <-schedule account_vtime <-schedule account_user_time <-account_vtime becomes _spin_lock <-schedule sched_clock_cpu <-schedule sched_clock <-sched_clock_cpu deactivate_task <-schedule dequeue_task <-deactivate_task dequeue_task_fair <-dequeue_task dequeue_entity <-dequeue_task_fair update_curr <-dequeue_entity hrtick_start_fair <-dequeue_task_fair put_prev_task_fair <-schedule pick_next_task_fair <-schedule pick_next_task_rt <-schedule pick_next_task_fair <-schedule pick_next_task_idle <-schedule _spin_trylock <-tracing_record_cmdline finish_task_switch <-schedule account_vtime <-finish_task_switch account_user_time <-account_vtime acct_update_integrals <-account_user_time jiffies_to_timeval <-acct_update_integrals Hmm. Not pretty. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-14 16:04 ` Linus Torvalds @ 2009-08-14 17:13 ` Heiko Carstens 2009-08-14 18:08 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: Heiko Carstens @ 2009-08-14 17:13 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt, David S. Miller On Fri, Aug 14, 2009 at 09:04:49AM -0700, Linus Torvalds wrote: > On Fri, 14 Aug 2009, Heiko Carstens wrote: > > > > All of this started when we compared a 2.6.27 based distro kernel > > to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu > > cycles than 2.6.16 does for the same workloads. > > > > As an example: we took a simple one connection ping pong network load. > > (client sends packet to server and waits until packet from server > > returns and game starts again). It uses more cpu cycles on 2.6.27. > > > > Using ftrace we figured out that on 2.6.27 it takes more than 500 function > > calls on the client for a single roundtrip while on 2.6.16 it took only > > ~230 calls. > > Hmm. But the spinlock part of this seems to not really have changed. > > Removing everything but the actual callchain info, and then doing a "diff" > between your two roundtrips, and they look very different, and basically > none of the difference seems to be due to spinlocks. > > It seems to be due to a number of different things, but the bulk of the > new costs seem to be in networking (and to some degree scheduling). There > are smaller differences elsewhere, but the networking code _really_ blows > up. > > I don't know how much of some of these are "real" kernel changes, and how > much of it is less inlining, but spinlocks don't seem to be the issue. > Here's a quick walk-through (some of these get repeated a couple of times > in your traces) That's true. I just wanted to give some background how all of this started. Spinlocks are equally good or bad for s390 on both kernel versions. However quite a few of the additional function calls in networking code come from uninlining: c2aa270a [NET]: uninline skb_push, de-bloats a lot 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot 419ae74e [NET]: uninline skb_trim, de-bloats Christian Ehrhardt identified these and more, but hasn't posted patches yet. Reverting these patches does increase performance. That's when I started wondering what we would gain when we would inline spinlock code again. > * the timer code doing more: > > __timer_stats_timer_set_start_info <-mod_timer > __mod_timer <-mod_timer > __timer_stats_timer_set_start_info <-__mod_timer > lock_timer_base <-__mod_timer > _spin_lock_irqsave <-lock_timer_base > internal_add_timer <-__mod_timer > _spin_unlock_irqrestore <-__mod_timer > qdio_perf_stat_inc <-qdio_outbound_processing > account_system_vtime <-__do_softirq This should be better now: 507e1231 timer stats: Optimize by adding quick check to avoid function calls We did also address our device driver with several patches. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 17:13 ` Heiko Carstens
@ 2009-08-14 18:08 ` Linus Torvalds
2009-08-14 20:19 ` David Miller
0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2009-08-14 18:08 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt, David S. Miller

On Fri, 14 Aug 2009, Heiko Carstens wrote:
>
> However quite a few of the additional function calls in networking code
> come from uninlining:
>
> c2aa270a [NET]: uninline skb_push, de-bloats a lot
> 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
> 419ae74e [NET]: uninline skb_trim, de-bloats

Hmm. Those functions are big only because of the stupid debugging, which
is almost certainly not worth it any more. I doubt people have seen enough
skb under/over-flows in the last years to merit the code.

I suspect we should remove the silly skb_under/over_panic functions, and
do

 - skb_put:

-	if (unlikely(skb->tail > skb->end))
-		skb_over_panic(skb, len, __builtin_return_address(0));
+	BUG_ON(skb->tail > skb->end);

 - skb_push:

-	if (unlikely(skb->data<skb->head))
-		skb_under_panic(skb, len, __builtin_return_address(0));
+	BUG_ON(skb->data < skb->head);

at which point it might be worthwhile to inline them again, because the
footprint of a BUG_ON() is really fairly small. If this is a big deal on
s390, it might be worth it.

		Linus

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-14 18:08 ` Linus Torvalds @ 2009-08-14 20:19 ` David Miller 2009-08-14 20:45 ` Linus Torvalds 0 siblings, 1 reply; 34+ messages in thread From: David Miller @ 2009-08-14 20:19 UTC (permalink / raw) To: torvalds Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky, arnd, horsth, ehrhardt From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 14 Aug 2009 11:08:16 -0700 (PDT) > On Fri, 14 Aug 2009, Heiko Carstens wrote: >> >> However quite a few of the additional function calls in networking code >> come from uninlining: >> >> c2aa270a [NET]: uninline skb_push, de-bloats a lot >> 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot >> 419ae74e [NET]: uninline skb_trim, de-bloats > > Hmm. Those functions are big only because of the stupid debugging, which > is almost certainly not worth it any more. I doubt people have seen enough > skb under/over-flows in the last years to merit the code. We have found a few bugs in the past year or two because of the check, but indeed not "a lot". > I suspect we should remove the silly skb_under/over_panic functions, and > do > > - skb_put: > - if (unlikely(skb->tail > skb->end)) > - skb_over_panic(skb, len, __builtin_return_address(0)); > + BUG_ON(skb->tail > skb->end); > > - skb_push: > - if (unlikely(skb->data<skb->head)) > - skb_under_panic(skb, len, __builtin_return_address(0)); > + BUG_ON(skb->data < skb->head); > > at which point it might be worthwhile to inline them again, because the > footprint of a BUG_ON() is really fairly small. Maybe. Although most people turn on verbose BUG's and that expands to the same if not more code than what's there now in these SKB inlines. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 20:19 ` David Miller
@ 2009-08-14 20:45 ` Linus Torvalds
2009-08-14 21:10 ` Linus Torvalds
0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2009-08-14 20:45 UTC (permalink / raw)
To: David Miller
Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
schwidefsky, arnd, horsth, ehrhardt

On Fri, 14 Aug 2009, David Miller wrote:
>
> Maybe. Although most people turn on verbose BUG's and that
> expands to the same if not more code than what's there now
> in these SKB inlines.

But not inline in the code, though. So yeah, it has a memory footprint,
but shouldn't have a cache footprint.

		Linus

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-14 20:45 ` Linus Torvalds @ 2009-08-14 21:10 ` Linus Torvalds 2009-08-14 22:23 ` David Miller 2009-08-16 18:27 ` Heiko Carstens 0 siblings, 2 replies; 34+ messages in thread From: Linus Torvalds @ 2009-08-14 21:10 UTC (permalink / raw) To: David Miller Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky, arnd, horsth, ehrhardt On Fri, 14 Aug 2009, Linus Torvalds wrote: > > But not inline in the code, though. So yeah, it has a memory footprint, > but shouldn't have a cache footprint. An example of this: stack usage. This is the code with BUG_ON(): skb_push: push %rbp mov %esi,%eax add %esi,0x68(%rdi) neg %rax mov %rsp,%rbp add 0xd8(%rdi),%rax mov %rax,0xd8(%rdi) cmp 0xd0(%rdi),%rax jae 2f ud2a 1: jmp 1b 2: leaveq retq and note how in the code, we're just jumping over something like four bytes ("ud2" plus that silly endless loop-jump just to make gcc happy are both 2 bytes on x86[-64]). Here's the same code with that horrid skb_under_panic(): skb_push: push %rbp mov %esi,%eax mov %rsp,%rbp neg %rax push %rbx mov %rdi,%rbx sub $0x8,%rsp add 0xd8(%rdi),%rax add %esi,0x68(%rdi) mov %rax,0xd8(%rdi) cmp 0xd0(%rdi),%rax jae 1f mov 0x8(%rbp),%rdx callq skb_under_panic 1: mov 0xd8(%rbx),%rax pop %r11 pop %rbx leaveq retq and notice how it creates a stack frame due to the call (even when the call is never done!) and the dead space is now 11 bytes. (Total size of functions: 41 bytes vs 64 bytes) So things like that do make a difference. It's surprisingly expensive to add a function call t what otherwise could be a leaf function. And BUG_ON() avoids that. Of course, even at just 41 bytes, it's not _entirely_ obvious that inlining is the right thing. Of those 41 bytes, four are just function overhead, and the "call" is five bytes in the caller and a few bytes of spills and argument setups, but inlining is still probably going to change each call-site from ~10 bytes to ~35 bytes. Linus --- include/linux/skbuff.h | 4 --- net/core/skbuff.c | 51 +---------------------------------------------- 2 files changed, 2 insertions(+), 53 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f2c69a2..2af92b4 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -483,10 +483,6 @@ extern int skb_cow_data(struct sk_buff *skb, int tailbits, extern int skb_pad(struct sk_buff *skb, int pad); #define dev_kfree_skb(a) consume_skb(a) #define dev_consume_skb(a) kfree_skb_clean(a) -extern void skb_over_panic(struct sk_buff *skb, int len, - void *here); -extern void skb_under_panic(struct sk_buff *skb, int len, - void *here); extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb, int getfrag(void *from, char *to, int offset, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 9e0597d..ae63473 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -103,51 +103,6 @@ static struct pipe_buf_operations sock_pipe_buf_ops = { .get = sock_pipe_buf_get, }; -/* - * Keep out-of-line to prevent kernel bloat. - * __builtin_return_address is not used because it is not always - * reliable. - */ - -/** - * skb_over_panic - private function - * @skb: buffer - * @sz: size - * @here: address - * - * Out of line support code for skb_put(). Not user callable. 
- */ -void skb_over_panic(struct sk_buff *skb, int sz, void *here) -{ - printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p " - "data:%p tail:%#lx end:%#lx dev:%s\n", - here, skb->len, sz, skb->head, skb->data, - (unsigned long)skb->tail, (unsigned long)skb->end, - skb->dev ? skb->dev->name : "<NULL>"); - BUG(); -} -EXPORT_SYMBOL(skb_over_panic); - -/** - * skb_under_panic - private function - * @skb: buffer - * @sz: size - * @here: address - * - * Out of line support code for skb_push(). Not user callable. - */ - -void skb_under_panic(struct sk_buff *skb, int sz, void *here) -{ - printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p " - "data:%p tail:%#lx end:%#lx dev:%s\n", - here, skb->len, sz, skb->head, skb->data, - (unsigned long)skb->tail, (unsigned long)skb->end, - skb->dev ? skb->dev->name : "<NULL>"); - BUG(); -} -EXPORT_SYMBOL(skb_under_panic); - /* Allocate a new skbuff. We do this ourselves so we can fill in a few * 'private' fields and also do memory statistics to find all the * [BEEP] leaks. @@ -1014,8 +969,7 @@ unsigned char *skb_put(struct sk_buff *skb, unsigned int len) SKB_LINEAR_ASSERT(skb); skb->tail += len; skb->len += len; - if (unlikely(skb->tail > skb->end)) - skb_over_panic(skb, len, __builtin_return_address(0)); + BUG_ON(skb->tail > skb->end); return tmp; } EXPORT_SYMBOL(skb_put); @@ -1033,8 +987,7 @@ unsigned char *skb_push(struct sk_buff *skb, unsigned int len) { skb->data -= len; skb->len += len; - if (unlikely(skb->data<skb->head)) - skb_under_panic(skb, len, __builtin_return_address(0)); + BUG_ON(skb->data < skb->head); return skb->data; } EXPORT_SYMBOL(skb_push); ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-14 21:10 ` Linus Torvalds @ 2009-08-14 22:23 ` David Miller 2009-08-16 18:27 ` Heiko Carstens 1 sibling, 0 replies; 34+ messages in thread From: David Miller @ 2009-08-14 22:23 UTC (permalink / raw) To: torvalds Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky, arnd, horsth, ehrhardt From: Linus Torvalds <torvalds@linux-foundation.org> Date: Fri, 14 Aug 2009 14:10:45 -0700 (PDT) > On Fri, 14 Aug 2009, Linus Torvalds wrote: >> >> But not inline in the code, though. So yeah, it has a memory footprint, >> but shouldn't have a cache footprint. > > An example of this: stack usage. This is the code with BUG_ON(): ... > and note how in the code, we're just jumping over something like four > bytes ("ud2" plus that silly endless loop-jump just to make gcc happy > are both 2 bytes on x86[-64]). > > Here's the same code with that horrid skb_under_panic(): Yes on x86 it's better even with verbose debugging because of the inlining of the debug message pointer etc. I'll have to do something more intelligent than I currently do on sparc64, and I had always been meaning to do that. :-) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-14 21:10 ` Linus Torvalds 2009-08-14 22:23 ` David Miller @ 2009-08-16 18:27 ` Heiko Carstens 2009-08-16 18:45 ` Linus Torvalds 2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar 1 sibling, 2 replies; 34+ messages in thread From: Heiko Carstens @ 2009-08-16 18:27 UTC (permalink / raw) To: Linus Torvalds Cc: David Miller, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky, arnd, horsth, ehrhardt On Fri, Aug 14, 2009 at 02:10:45PM -0700, Linus Torvalds wrote: > On Fri, 14 Aug 2009, Linus Torvalds wrote: > > > > But not inline in the code, though. So yeah, it has a memory footprint, > > but shouldn't have a cache footprint. > > An example of this: stack usage. This is the code with BUG_ON(): > > skb_push: > push %rbp > mov %esi,%eax > add %esi,0x68(%rdi) > neg %rax > mov %rsp,%rbp > add 0xd8(%rdi),%rax > mov %rax,0xd8(%rdi) > cmp 0xd0(%rdi),%rax > jae 2f > ud2a > 1: jmp 1b > 2: leaveq > retq An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 results in the following vmlinux sizes: 574152755 2009-08-16 20:05 vmlinux-orig 574862000 2009-08-16 20:13 vmlinux-skb-inline The skb-inline variant was created by using your patch and on top of it the patch below. Size increased by 692KB (0.12%). Dunno.. is that acceptable? --- include/linux/skbuff.h | 35 +++++++++++++++++++++++++++++++++-- net/core/skbuff.c | 38 -------------------------------------- 2 files changed, 33 insertions(+), 40 deletions(-) Index: linux-2.6/include/linux/skbuff.h =================================================================== --- linux-2.6.orig/include/linux/skbuff.h +++ linux-2.6/include/linux/skbuff.h @@ -1116,7 +1116,6 @@ static inline void skb_set_tail_pointer( /* * Add data to an sk_buff */ -extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len); static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len) { unsigned char *tmp = skb_tail_pointer(skb); @@ -1126,7 +1125,23 @@ static inline unsigned char *__skb_put(s return tmp; } -extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len); +/** + * skb_put - add data to a buffer + * @skb: buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the buffer. If this would + * exceed the total buffer size the kernel will panic. A pointer to the + * first byte of the extra data is returned. + */ +static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len) +{ + unsigned char *tmp = __skb_put(skb, len); + + BUG_ON(skb->tail > skb->end); + return tmp; +} + static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len) { skb->data -= len; @@ -1134,6 +1149,22 @@ static inline unsigned char *__skb_push( return skb->data; } +/** + * skb_push - add data to the start of a buffer + * @skb: buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the buffer at the buffer + * start. If this would exceed the total buffer headroom the kernel will + * panic. A pointer to the first byte of the extra data is returned. 
+ */ +static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len) +{ + __skb_push(skb, len); + BUG_ON(skb->data < skb->head); + return skb->data; +} + extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len); static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len) { Index: linux-2.6/net/core/skbuff.c =================================================================== --- linux-2.6.orig/net/core/skbuff.c +++ linux-2.6/net/core/skbuff.c @@ -955,44 +955,6 @@ free_skb: EXPORT_SYMBOL(skb_pad); /** - * skb_put - add data to a buffer - * @skb: buffer to use - * @len: amount of data to add - * - * This function extends the used data area of the buffer. If this would - * exceed the total buffer size the kernel will panic. A pointer to the - * first byte of the extra data is returned. - */ -unsigned char *skb_put(struct sk_buff *skb, unsigned int len) -{ - unsigned char *tmp = skb_tail_pointer(skb); - SKB_LINEAR_ASSERT(skb); - skb->tail += len; - skb->len += len; - BUG_ON(skb->tail > skb->end); - return tmp; -} -EXPORT_SYMBOL(skb_put); - -/** - * skb_push - add data to the start of a buffer - * @skb: buffer to use - * @len: amount of data to add - * - * This function extends the used data area of the buffer at the buffer - * start. If this would exceed the total buffer headroom the kernel will - * panic. A pointer to the first byte of the extra data is returned. - */ -unsigned char *skb_push(struct sk_buff *skb, unsigned int len) -{ - skb->data -= len; - skb->len += len; - BUG_ON(skb->data < skb->head); - return skb->data; -} -EXPORT_SYMBOL(skb_push); - -/** * skb_pull - remove data from the start of a buffer * @skb: buffer to use * @len: amount of data to remove ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-16 18:27 ` Heiko Carstens @ 2009-08-16 18:45 ` Linus Torvalds 2009-08-16 20:36 ` Ingo Molnar 2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar 1 sibling, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2009-08-16 18:45 UTC (permalink / raw) To: Heiko Carstens Cc: David Miller, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky, arnd, horsth, ehrhardt On Sun, 16 Aug 2009, Heiko Carstens wrote: > > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 results in > the following vmlinux sizes: > > 574152755 2009-08-16 20:05 vmlinux-orig > 574862000 2009-08-16 20:13 vmlinux-skb-inline It's more relevant to look at just code size. BUG_ON() has more of an impact on data sizes, because it does something like 10-12 bytes of exception data for each bug-on. But that exception data is out-of-line and not interesting. If the code size blows up a lot, that's worse than if the data size expands a bit. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-16 18:45 ` Linus Torvalds @ 2009-08-16 20:36 ` Ingo Molnar 2009-08-17 10:26 ` Heiko Carstens 0 siblings, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2009-08-16 20:36 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, David Miller, akpm, a.p.zijlstra, linux-arch, schwidefsky, arnd, horsth, ehrhardt * Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sun, 16 Aug 2009, Heiko Carstens wrote: > > > > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc > > 4.3.2 results in the following vmlinux sizes: > > > > 574152755 2009-08-16 20:05 vmlinux-orig > > 574862000 2009-08-16 20:13 vmlinux-skb-inline > > It's more relevant to look at just code size. BUG_ON() has more of > an impact on data sizes, because it does something like 10-12 > bytes of exception data for each bug-on. But that exception data > is out-of-line and not interesting. > > If the code size blows up a lot, that's worse than if the data > size expands a bit. Also, allyesconfig isnt really relevant to inlining cost in general as it enables everything. The defconfigs are better (at least on x86) because they tend to be some distro-alike config - i.e. they are closer to what most Linux users will run in the end. Ingo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3 2009-08-16 20:36 ` Ingo Molnar @ 2009-08-17 10:26 ` Heiko Carstens 0 siblings, 0 replies; 34+ messages in thread From: Heiko Carstens @ 2009-08-17 10:26 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch, schwidefsky, arnd, horsth, ehrhardt On Sun, Aug 16, 2009 at 10:36:50PM +0200, Ingo Molnar wrote: > > > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc > > > 4.3.2 results in the following vmlinux sizes: > > > > > > 574152755 2009-08-16 20:05 vmlinux-orig > > > 574862000 2009-08-16 20:13 vmlinux-skb-inline > > > > It's more relevant to look at just code size. BUG_ON() has more of > > an impact on data sizes, because it does something like 10-12 > > bytes of exception data for each bug-on. But that exception data > > is out-of-line and not interesting. > > > > If the code size blows up a lot, that's worse than if the data > > size expands a bit. > > Also, allyesconfig isnt really relevant to inlining cost in general > as it enables everything. The defconfigs are better (at least on > x86) because they tend to be some distro-alike config - i.e. they > are closer to what most Linux users will run in the end. Ok, I'm still using allyesconfig but still these numbers should be good enough: For the unpatched kernel we have ffffffff81009000 T _stext ffffffff83614130 T _etext size: 39891248 ffffffff8460e250 R __start___bug_table ffffffff8462e9ec R __stop___bug_table size: 133020 And for the patched kernel (inlined functions) we get ffffffff81009000 T _stext ffffffff83627250 T _etext size: 39969360 ffffffff84621080 R __start___bug_table ffffffff84649c4c R __stop___bug_table size: 166860 We have CONFIG_GENERIC_BUG, CONFIG_GENERIC_BUG_RELATIVE_POINTERS and CONFIG_DEBUG_BUGVERBOSE all enabled. So the size of each bug table entry is 10 bytes. We end up with 3384 additional bug table entries caused by inlining. Text size increased by 78112 bytes which makes an average extra 23 bytes for each call site. ^ permalink raw reply [flat|nested] 34+ messages in thread
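For reference, the per-call-site figure above follows directly from the quoted
sizes (a back-of-the-envelope check using only the numbers in that mail):

  bug table growth:  166860 - 133020 = 33840 bytes
  extra entries:     33840 / 10 bytes per entry = 3384
  text growth:       39969360 - 39891248 = 78112 bytes
  per call site:     78112 / 3384 = ~23 bytes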
* [numbers] Re: [patch] more skb ops inlining 2009-08-16 18:27 ` Heiko Carstens 2009-08-16 18:45 ` Linus Torvalds @ 2009-08-17 21:26 ` Ingo Molnar 2009-08-18 11:34 ` Heiko Carstens 1 sibling, 1 reply; 34+ messages in thread From: Ingo Molnar @ 2009-08-17 21:26 UTC (permalink / raw) To: Heiko Carstens Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch, schwidefsky, arnd, horsth, ehrhardt * Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 > results in the following vmlinux sizes: > > 574152755 2009-08-16 20:05 vmlinux-orig > 574862000 2009-08-16 20:13 vmlinux-skb-inline > > The skb-inline variant was created by using your patch and on top > of it the patch below. Size increased by 692KB (0.12%). > > Dunno.. is that acceptable? i ported your patch to latest mainline (see it attached below) i've done x86(-64) defconfig (with disabled CONFIG_OPTIMIZE_INLINING) measurements vmlinux: text data bss dec hex filename 8455329 1192304 989908 10637541 a250e5 vmlinux.before 8471036 1192304 989908 10653248 a28e40 vmlinux.after that's a +0.18% text size increase - quite significant. I also did performance measurements, trying to figure out whether this increase in size can be measured via any runtime performance metrics such as cycles, instructions or cache-misses. To do that i picked a micro-benchmarks lat_tcp. ( Note that microbenchmarks are biased in favor of inlining patches like yours, because they disproportionately favor the speed benefit of it while not adequately showing the cache footprint disadvantage of increased kernel text size. ) Without your patch i got: titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost TCP latency using localhost: 17.5198 microseconds TCP latency using localhost: 17.5060 microseconds TCP latency using localhost: 17.4684 microseconds TCP latency using localhost: 17.5416 microseconds TCP latency using localhost: 17.4881 microseconds TCP latency using localhost: 17.4832 microseconds TCP latency using localhost: 17.4830 microseconds TCP latency using localhost: 17.4723 microseconds TCP latency using localhost: 17.4733 microseconds TCP latency using localhost: 17.5255 microseconds Performance counter stats for './lat_tcp localhost' (10 runs): 1494.625043 task-clock-msecs # 0.564 CPUs ( +- 0.077% ) 3665868023 cycles # 2452.701 M/sec ( +- 0.070% ) 2163212606 instructions # 0.590 IPC ( +- 0.090% ) 2449056216 L1-icache-loads # 1638.576 M/sec ( +- 0.056% ) 61081182 L1-icache-load-misses # 40.867 M/sec ( +- 0.192% ) 2.649159316 seconds time elapsed ( +- 0.090% ) With your patch i got: titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost TCP latency using localhost: 17.6913 microseconds TCP latency using localhost: 17.7218 microseconds TCP latency using localhost: 17.6882 microseconds TCP latency using localhost: 17.7196 microseconds TCP latency using localhost: 17.6411 microseconds TCP latency using localhost: 17.7136 microseconds TCP latency using localhost: 17.6488 microseconds TCP latency using localhost: 17.6663 microseconds TCP latency using localhost: 17.7445 microseconds TCP latency using localhost: 17.6851 microseconds Performance counter stats for './lat_tcp localhost' (10 runs): 1497.479984 task-clock-msecs # 0.564 CPUs ( +- 0.106% ) 3690097505 cycles # 2464.205 M/sec ( +- 0.109% ) 2162432402 instructions # 0.586 IPC ( +- 0.127% ) 
2437974192 L1-icache-loads # 1628.051 M/sec ( +- 0.105% ) 63064588 L1-icache-load-misses # 42.114 M/sec ( +- 0.278% ) 2.656952976 seconds time elapsed ( +- 0.110% ) Summarized (for lat_tcp): | Before: | | 1494.625043 task-clock-msecs # 0.564 CPUs ( +- 0.077% ) | 3665868023 cycles # 2452.701 M/sec ( +- 0.070% ) | 2163212606 instructions # 0.590 IPC ( +- 0.090% ) | | After: | | 1497.479984 task-clock-msecs # 0.564 CPUs ( +- 0.106% ) | 3690097505 cycles # 2464.205 M/sec ( +- 0.109% ) | 2162432402 instructions # 0.586 IPC ( +- 0.127% ) So we've got a 0.2% slowdown, which is provably outside the noise level of 0.1%. The instruction count remained almost the same. The reason for the slowdown can be seen in the L1 icache-fetch stats: 2449056216 L1-icache-loads # 1638.576 M/sec ( +- 0.056% ) 61081182 L1-icache-load-misses # 40.867 M/sec ( +- 0.192% ) 2437974192 L1-icache-loads # 1628.051 M/sec ( +- 0.105% ) 63064588 L1-icache-load-misses # 42.114 M/sec ( +- 0.278% ) The icache-misses went up from 40.867 M/sec to 42.114 M/sec, a 3.05% worsening - well beyond the statistical significance of 0.28%. So unless i botched the measurements or some accidental code layout differences punish the patch artificially, this patch is a small but measureable performance regression on x86. The slowdown itself is barely provable with a test repeat count of 10 iterations (not a surprise really - such patches were not really provable before via /usr/bin/time kinds of measurements), on an otherwise totally idle system. The icache-miss stats do support the notion that what is causing the slowdown is the critical path fallout out of the icache. ( By all means please repeat these measurements - they are very easy to do via 'perf stat --repeat 10' - see the examples above. ) To build the kernel i used gcc 4.3.2. The CPU i used is friendly to inlining patches as well: it's an Extreme Edition Core2 Duo (booted to a single core, to stabilize the numbers): Intel(R) Core(TM)2 CPU E6800 @ 2.93GHz With 4MB of L2 cache. I suspect smaller CPUs would feel the negative effects of inlining in a more pronounced way. Thanks, Ingo --- include/linux/skbuff.h | 37 +++++++++++++++++++++++++++++++++++-- net/core/skbuff.c | 40 ---------------------------------------- 2 files changed, 35 insertions(+), 42 deletions(-) Index: linux/include/linux/skbuff.h =================================================================== --- linux.orig/include/linux/skbuff.h +++ linux/include/linux/skbuff.h @@ -1120,7 +1120,6 @@ static inline void skb_set_tail_pointer( /* * Add data to an sk_buff */ -extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len); static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len) { unsigned char *tmp = skb_tail_pointer(skb); @@ -1130,7 +1129,24 @@ static inline unsigned char *__skb_put(s return tmp; } -extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len); +/** + * skb_put - add data to a buffer + * @skb: buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the buffer. If this would + * exceed the total buffer size the kernel will panic. A pointer to the + * first byte of the extra data is returned. 
+ */ +static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len) +{ + unsigned char *tmp = __skb_put(skb, len); + + if (unlikely(skb->tail > skb->end)) + skb_over_panic(skb, len, __builtin_return_address(0)); + return tmp; +} + static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len) { skb->data -= len; @@ -1138,6 +1154,23 @@ static inline unsigned char *__skb_push( return skb->data; } +/** + * skb_push - add data to the start of a buffer + * @skb: buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the buffer at the buffer + * start. If this would exceed the total buffer headroom the kernel will + * panic. A pointer to the first byte of the extra data is returned. + */ +static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len) +{ + __skb_push(skb, len); + if (unlikely(skb->data<skb->head)) + skb_under_panic(skb, len, __builtin_return_address(0)); + return skb->data; +} + extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len); static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len) { Index: linux/net/core/skbuff.c =================================================================== --- linux.orig/net/core/skbuff.c +++ linux/net/core/skbuff.c @@ -1000,46 +1000,6 @@ free_skb: EXPORT_SYMBOL(skb_pad); /** - * skb_put - add data to a buffer - * @skb: buffer to use - * @len: amount of data to add - * - * This function extends the used data area of the buffer. If this would - * exceed the total buffer size the kernel will panic. A pointer to the - * first byte of the extra data is returned. - */ -unsigned char *skb_put(struct sk_buff *skb, unsigned int len) -{ - unsigned char *tmp = skb_tail_pointer(skb); - SKB_LINEAR_ASSERT(skb); - skb->tail += len; - skb->len += len; - if (unlikely(skb->tail > skb->end)) - skb_over_panic(skb, len, __builtin_return_address(0)); - return tmp; -} -EXPORT_SYMBOL(skb_put); - -/** - * skb_push - add data to the start of a buffer - * @skb: buffer to use - * @len: amount of data to add - * - * This function extends the used data area of the buffer at the buffer - * start. If this would exceed the total buffer headroom the kernel will - * panic. A pointer to the first byte of the extra data is returned. - */ -unsigned char *skb_push(struct sk_buff *skb, unsigned int len) -{ - skb->data -= len; - skb->len += len; - if (unlikely(skb->data<skb->head)) - skb_under_panic(skb, len, __builtin_return_address(0)); - return skb->data; -} -EXPORT_SYMBOL(skb_push); - -/** * skb_pull - remove data from the start of a buffer * @skb: buffer to use * @len: amount of data to remove ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [numbers] Re: [patch] more skb ops inlining 2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar @ 2009-08-18 11:34 ` Heiko Carstens 0 siblings, 0 replies; 34+ messages in thread From: Heiko Carstens @ 2009-08-18 11:34 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch, schwidefsky, arnd, horsth, ehrhardt On Mon, Aug 17, 2009 at 11:26:29PM +0200, Ingo Molnar wrote: > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote: > > > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 > > results in the following vmlinux sizes: > > > > 574152755 2009-08-16 20:05 vmlinux-orig > > 574862000 2009-08-16 20:13 vmlinux-skb-inline > > > > The skb-inline variant was created by using your patch and on top > > of it the patch below. Size increased by 692KB (0.12%). > > > > Dunno.. is that acceptable? > > i ported your patch to latest mainline (see it attached below) i've > done x86(-64) defconfig (with disabled CONFIG_OPTIMIZE_INLINING) > measurements vmlinux: > +static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len) > +{ > + unsigned char *tmp = __skb_put(skb, len); > + > + if (unlikely(skb->tail > skb->end)) > + skb_over_panic(skb, len, __builtin_return_address(0)); > + return tmp; > +} No, the proposed patch replaces skb_over_panic() with BUG_ON() (see below). With that I get slightly different results. However that's on my Thinkpad booted with maxcpus=1 and a 32 bit kernel. Kernel is latest git as of today. Now, this is a completely different machine than yours. With a different config, so the machine actually boots. But the numbers are strange. Part of /proc/cpuinfo : model name : Intel(R) Core(TM)2 CPU T7600 @ 2.33GHz stepping : 6 cache size : 4096 KB Without the patch the result is: TCP latency using localhost: 0.2576 microseconds TCP latency using localhost: 0.2571 microseconds TCP latency using localhost: 0.2579 microseconds TCP latency using localhost: 0.2573 microseconds TCP latency using localhost: 0.2574 microseconds TCP latency using localhost: 0.2583 microseconds TCP latency using localhost: 0.2572 microseconds TCP latency using localhost: 0.2582 microseconds TCP latency using localhost: 0.2583 microseconds TCP latency using localhost: 0.2584 microseconds Performance counter stats for './lat_tcp localhost' (10 runs): 24578.750193 task-clock-msecs # 0.994 CPUs ( +- 0.295% ) 57166815217 cycles # 2325.863 M/sec ( +- 0.296% ) 24994309721 instructions # 0.437 IPC ( +- 0.288% ) 53691289530 L1-icache-loads # 2184.460 M/sec ( +- 0.304% ) 424027 L1-icache-load-misses # 0.017 M/sec ( +- 0.824% ) 24.714691813 seconds time elapsed ( +- 0.285% ) With the patch I get this: TCP latency using localhost: 0.2588 microseconds TCP latency using localhost: 0.2589 microseconds TCP latency using localhost: 0.2581 microseconds TCP latency using localhost: 0.2586 microseconds TCP latency using localhost: 0.2584 microseconds TCP latency using localhost: 0.2588 microseconds TCP latency using localhost: 0.2586 microseconds TCP latency using localhost: 0.2589 microseconds TCP latency using localhost: 0.2582 microseconds TCP latency using localhost: 0.2586 microseconds Performance counter stats for './lat_tcp localhost' (10 runs): 24464.554706 task-clock-msecs # 0.995 CPUs ( +- 0.063% ) 56903708182 cycles # 2325.965 M/sec ( +- 0.063% ) 24831456567 instructions # 0.436 IPC ( +- 0.048% ) 53068943303 L1-icache-loads # 2169.218 M/sec ( +- 0.081% ) 418802 L1-icache-load-misses # 0.017 M/sec ( +- 0.637% ) 
24.596311828 seconds time elapsed ( +- 0.097% ) So latency increases while all other numbers decrease.. --- include/linux/skbuff.h | 39 +++++++++++++++++++--- net/core/skbuff.c | 85 ------------------------------------------------- 2 files changed, 33 insertions(+), 91 deletions(-) Index: linux-2.6/include/linux/skbuff.h =================================================================== --- linux-2.6.orig/include/linux/skbuff.h +++ linux-2.6/include/linux/skbuff.h @@ -483,10 +483,6 @@ extern int skb_cow_data(struct sk extern int skb_pad(struct sk_buff *skb, int pad); #define dev_kfree_skb(a) consume_skb(a) #define dev_consume_skb(a) kfree_skb_clean(a) -extern void skb_over_panic(struct sk_buff *skb, int len, - void *here); -extern void skb_under_panic(struct sk_buff *skb, int len, - void *here); extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb, int getfrag(void *from, char *to, int offset, @@ -1120,7 +1116,6 @@ static inline void skb_set_tail_pointer( /* * Add data to an sk_buff */ -extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len); static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len) { unsigned char *tmp = skb_tail_pointer(skb); @@ -1130,7 +1125,23 @@ static inline unsigned char *__skb_put(s return tmp; } -extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len); +/** + * skb_put - add data to a buffer + * @skb: buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the buffer. If this would + * exceed the total buffer size the kernel will panic. A pointer to the + * first byte of the extra data is returned. + */ +static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len) +{ + unsigned char *tmp = __skb_put(skb, len); + + BUG_ON(skb->tail > skb->end); + return tmp; +} + static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len) { skb->data -= len; @@ -1138,6 +1149,22 @@ static inline unsigned char *__skb_push( return skb->data; } +/** + * skb_push - add data to the start of a buffer + * @skb: buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the buffer at the buffer + * start. If this would exceed the total buffer headroom the kernel will + * panic. A pointer to the first byte of the extra data is returned. + */ +static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len) +{ + __skb_push(skb, len); + BUG_ON(skb->data < skb->head); + return skb->data; +} + extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len); static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len) { Index: linux-2.6/net/core/skbuff.c =================================================================== --- linux-2.6.orig/net/core/skbuff.c +++ linux-2.6/net/core/skbuff.c @@ -103,51 +103,6 @@ static struct pipe_buf_operations sock_p .get = sock_pipe_buf_get, }; -/* - * Keep out-of-line to prevent kernel bloat. - * __builtin_return_address is not used because it is not always - * reliable. - */ - -/** - * skb_over_panic - private function - * @skb: buffer - * @sz: size - * @here: address - * - * Out of line support code for skb_put(). Not user callable. - */ -void skb_over_panic(struct sk_buff *skb, int sz, void *here) -{ - printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p " - "data:%p tail:%#lx end:%#lx dev:%s\n", - here, skb->len, sz, skb->head, skb->data, - (unsigned long)skb->tail, (unsigned long)skb->end, - skb->dev ? 
skb->dev->name : "<NULL>"); - BUG(); -} -EXPORT_SYMBOL(skb_over_panic); - -/** - * skb_under_panic - private function - * @skb: buffer - * @sz: size - * @here: address - * - * Out of line support code for skb_push(). Not user callable. - */ - -void skb_under_panic(struct sk_buff *skb, int sz, void *here) -{ - printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p " - "data:%p tail:%#lx end:%#lx dev:%s\n", - here, skb->len, sz, skb->head, skb->data, - (unsigned long)skb->tail, (unsigned long)skb->end, - skb->dev ? skb->dev->name : "<NULL>"); - BUG(); -} -EXPORT_SYMBOL(skb_under_panic); - /* Allocate a new skbuff. We do this ourselves so we can fill in a few * 'private' fields and also do memory statistics to find all the * [BEEP] leaks. @@ -1000,46 +955,6 @@ free_skb: EXPORT_SYMBOL(skb_pad); /** - * skb_put - add data to a buffer - * @skb: buffer to use - * @len: amount of data to add - * - * This function extends the used data area of the buffer. If this would - * exceed the total buffer size the kernel will panic. A pointer to the - * first byte of the extra data is returned. - */ -unsigned char *skb_put(struct sk_buff *skb, unsigned int len) -{ - unsigned char *tmp = skb_tail_pointer(skb); - SKB_LINEAR_ASSERT(skb); - skb->tail += len; - skb->len += len; - if (unlikely(skb->tail > skb->end)) - skb_over_panic(skb, len, __builtin_return_address(0)); - return tmp; -} -EXPORT_SYMBOL(skb_put); - -/** - * skb_push - add data to the start of a buffer - * @skb: buffer to use - * @len: amount of data to add - * - * This function extends the used data area of the buffer at the buffer - * start. If this would exceed the total buffer headroom the kernel will - * panic. A pointer to the first byte of the extra data is returned. - */ -unsigned char *skb_push(struct sk_buff *skb, unsigned int len) -{ - skb->data -= len; - skb->len += len; - if (unlikely(skb->data<skb->head)) - skb_under_panic(skb, len, __builtin_return_address(0)); - return skb->data; -} -EXPORT_SYMBOL(skb_push); - -/** * skb_pull - remove data from the start of a buffer * @skb: buffer to use * @len: amount of data to remove ^ permalink raw reply [flat|nested] 34+ messages in thread
* [patch 0/3] Allow inlined spinlocks again V4
@ 2009-08-14 12:58 Heiko Carstens
2009-08-14 12:58 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
0 siblings, 1 reply; 34+ messages in thread
From: Heiko Carstens @ 2009-08-14 12:58 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt, Nick Piggin
This patch set allows spinlocks to be inlined again.
The rationale behind this is that function calls on at least s390 are
expensive.
If one considers that server kernels are usually compiled with
!CONFIG_PREEMPT, a simple spin_lock is just a compare and swap loop.
The extra overhead for a function call is significant.
With inlined spinlocks overall cpu usage gets reduced by 1%-5% on s390.
These numbers were taken with some network benchmarks. However I expect
any workload that calls frequently into the kernel and which grabs a few
locks to perform better.
The implementation is straightforward: move the function bodies of the
locking functions to static inline functions and place them in a header
file.
By default all locking code remains out-of-line. An architecture can
specify
#define __spin_lock_is_small
in arch/<whatever>/include/asm/spinlock.h to force inlining of a locking
function.
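For example, an architecture that wants spin_lock and spin_unlock inlined could
add something like the following to its asm/spinlock.h (an illustrative sketch,
not taken verbatim from any patch in this series):

  /* arch/<whatever>/include/asm/spinlock.h */
  #define __spin_lock_is_small
  #define __spin_unlock_is_small

spinlock_api_smp.h then maps _spin_lock()/_spin_unlock() to the static inline
__spin_lock()/__spin_unlock() bodies, and the corresponding out-of-line
definitions in kernel/spinlock.c are compiled out.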
V2: rewritten from scratch - now also with readable code
V3: removed macro to generate out-of-line spinlock variants since that
would break ctags. As requested by Arnd Bergmann.
V4: allow architectures to specify for each lock/unlock variant whether
it should be kept out-of-line or inlined.
^ permalink raw reply [flat|nested] 34+ messages in thread* [patch 2/3] spinlock: allow inlined spinlocks 2009-08-14 12:58 [patch 0/3] Allow inlined spinlocks again V4 Heiko Carstens @ 2009-08-14 12:58 ` Heiko Carstens 2009-08-16 17:57 ` Heiko Carstens 0 siblings, 1 reply; 34+ messages in thread From: Heiko Carstens @ 2009-08-14 12:58 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch, Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann, Christian Ehrhardt, Nick Piggin [-- Attachment #1: 02_spinlock_ifdef.diff --] [-- Type: text/plain, Size: 11770 bytes --] From: Heiko Carstens <heiko.carstens@de.ibm.com> This allows an architecture to specify per lock variant if the locking code should be kept out-of-line or inlined. If an architecure wants out-of-line locking code no change is needed. To force inlining of e.g. spin_lock() the line #define __spin_lock_is_small needs to be added to arch/<whatever>/include/asm/spinlock.h Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> --- include/linux/spinlock_api_smp.h | 146 ++++++++++++++++++++++++++++++++++++++- kernel/spinlock.c | 56 ++++++++++++++ 2 files changed, 199 insertions(+), 3 deletions(-) Index: linux-2.6/include/linux/spinlock_api_smp.h =================================================================== --- linux-2.6.orig/include/linux/spinlock_api_smp.h +++ linux-2.6/include/linux/spinlock_api_smp.h @@ -19,46 +19,186 @@ int in_lock_functions(unsigned long addr #define assert_spin_locked(x) BUG_ON(!spin_is_locked(x)) -void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock); void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass) __acquires(lock); void __lockfunc _spin_lock_nest_lock(spinlock_t *lock, struct lockdep_map *map) __acquires(lock); +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass) + __acquires(lock); + +#ifdef __spin_lock_is_small +#define _spin_lock(lock) __spin_lock(lock) +#else +void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock); +#endif + +#ifdef __read_lock_is_small +#define _read_lock(lock) __read_lock(lock) +#else void __lockfunc _read_lock(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __write_lock_is_small +#define _write_lock(lock) __write_lock(lock) +#else void __lockfunc _write_lock(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __spin_lock_bh_is_small +#define _spin_lock_bh(lock) __spin_lock_bh(lock) +#else void __lockfunc _spin_lock_bh(spinlock_t *lock) __acquires(lock); +#endif + +#ifdef __read_lock_bh_is_small +#define _read_lock_bh(lock) __read_lock_bh(lock) +#else void __lockfunc _read_lock_bh(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __write_lock_bh_is_small +#define _write_lock_bh(lock) __write_lock_bh(lock) +#else void __lockfunc _write_lock_bh(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __spin_lock_irq_is_small +#define _spin_lock_irq(lock) __spin_lock_irq(lock) +#else void __lockfunc _spin_lock_irq(spinlock_t *lock) __acquires(lock); +#endif + +#ifdef __read_lock_irq_is_small +#define _read_lock_irq(lock) __read_lock_irq(lock) +#else void __lockfunc _read_lock_irq(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __write_lock_irq_is_small +#define _write_lock_irq(lock) __write_lock_irq(lock) +#else void __lockfunc _write_lock_irq(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __spin_lock_irqsave_is_small +#define _spin_lock_irqsave(lock) __spin_lock_irqsave(lock) +#else unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock) 
__acquires(lock); -unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass) - __acquires(lock); +#endif + +#ifdef __read_lock_irqsave_is_small +#define _read_lock_irqsave(lock) __read_lock_irqsave(lock) +#else unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __write_lock_irqsave_is_small +#define _write_lock_irqsave(lock) __write_lock_irqsave(lock) +#else unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock) __acquires(lock); +#endif + +#ifdef __spin_trylock_is_small +#define _spin_trylock(lock) __spin_trylock(lock) +#else int __lockfunc _spin_trylock(spinlock_t *lock); +#endif + +#ifdef __read_trylock_is_small +#define _read_trylock(lock) __read_trylock(lock) +#else int __lockfunc _read_trylock(rwlock_t *lock); +#endif + +#ifdef __write_trylock_is_small +#define _write_trylock(lock) __write_trylock(lock) +#else int __lockfunc _write_trylock(rwlock_t *lock); +#endif + +#ifdef __spin_trylock_bh_is_small +#define _spin_trylock_bh(lock) __spin_trylock_bh(lock) +#else int __lockfunc _spin_trylock_bh(spinlock_t *lock); +#endif + +#ifdef __spin_unlock_is_small +#define _spin_unlock(lock) __spin_unlock(lock) +#else void __lockfunc _spin_unlock(spinlock_t *lock) __releases(lock); +#endif + +#ifdef __read_unlock_is_small +#define _read_unlock(lock) __read_unlock(lock) +#else void __lockfunc _read_unlock(rwlock_t *lock) __releases(lock); +#endif + +#ifdef __write_unlock_is_small +#define _write_unlock(lock) __write_unlock(lock) +#else void __lockfunc _write_unlock(rwlock_t *lock) __releases(lock); +#endif + +#ifdef __spin_unlock_bh_is_small +#define _spin_unlock_bh(lock) __spin_unlock_bh(lock) +#else void __lockfunc _spin_unlock_bh(spinlock_t *lock) __releases(lock); +#endif + +#ifdef __read_unlock_bh_is_small +#define _read_unlock_bh(lock) __read_unlock_bh(lock) +#else void __lockfunc _read_unlock_bh(rwlock_t *lock) __releases(lock); +#endif + +#ifdef __write_unlock_bh_is_small +#define _write_unlock_bh(lock) __write_unlock_bh(lock) +#else void __lockfunc _write_unlock_bh(rwlock_t *lock) __releases(lock); +#endif + +#ifdef __spin_unlock_irq_is_small +#define _spin_unlock_irq(lock) __spin_unlock_irq(lock) +#else void __lockfunc _spin_unlock_irq(spinlock_t *lock) __releases(lock); +#endif + +#ifdef __read_unlock_irq_is_small +#define _read_unlock_irq(lock) __read_unlock_irq(lock) +#else void __lockfunc _read_unlock_irq(rwlock_t *lock) __releases(lock); +#endif + +#ifdef __write_unlock_irq_is_small +#define _write_unlock_irq(lock) __write_unlock_irq(lock) +#else void __lockfunc _write_unlock_irq(rwlock_t *lock) __releases(lock); +#endif + +#ifdef __spin_unlock_irqrestore_is_small +#define _spin_unlock_irqrestore(lock, flags) __spin_unlock_irqrestore(lock, flags) +#else void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) __releases(lock); +#endif + +#ifdef __read_unlock_irqrestore_is_small +#define _read_unlock_irqrestore(lock, flags) __read_unlock_irqrestore(lock, flags) +#else void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags) __releases(lock); +#endif + +#ifdef __write_unlock_irqrestore_is_small +#define _write_unlock_irqrestore(lock, flags) __write_unlock_irqrestore(lock, flags) +#else void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) __releases(lock); +#endif static inline int __spin_trylock(spinlock_t *lock) { Index: linux-2.6/kernel/spinlock.c =================================================================== --- 
linux-2.6.orig/kernel/spinlock.c +++ linux-2.6/kernel/spinlock.c @@ -21,23 +21,29 @@ #include <linux/debug_locks.h> #include <linux/module.h> +#ifndef __spin_trylock_is_small int __lockfunc _spin_trylock(spinlock_t *lock) { return __spin_trylock(lock); } EXPORT_SYMBOL(_spin_trylock); +#endif +#ifndef __read_trylock_is_small int __lockfunc _read_trylock(rwlock_t *lock) { return __read_trylock(lock); } EXPORT_SYMBOL(_read_trylock); +#endif +#ifndef __write_trylock_is_small int __lockfunc _write_trylock(rwlock_t *lock) { return __write_trylock(lock); } EXPORT_SYMBOL(_write_trylock); +#endif /* * If lockdep is enabled then we use the non-preemption spin-ops @@ -46,77 +52,101 @@ EXPORT_SYMBOL(_write_trylock); */ #if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC) +#ifndef __read_lock_is_small void __lockfunc _read_lock(rwlock_t *lock) { __read_lock(lock); } EXPORT_SYMBOL(_read_lock); +#endif +#ifndef __spin_lock_irqsave_is_small unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock) { return __spin_lock_irqsave(lock); } EXPORT_SYMBOL(_spin_lock_irqsave); +#endif +#ifndef __spin_lock_irq_is_small void __lockfunc _spin_lock_irq(spinlock_t *lock) { __spin_lock_irq(lock); } EXPORT_SYMBOL(_spin_lock_irq); +#endif +#ifndef __spin_lock_bh_is_small void __lockfunc _spin_lock_bh(spinlock_t *lock) { __spin_lock_bh(lock); } EXPORT_SYMBOL(_spin_lock_bh); +#endif +#ifndef __read_lock_irqsave_is_small unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock) { return __read_lock_irqsave(lock); } EXPORT_SYMBOL(_read_lock_irqsave); +#endif +#ifndef __read_lock_irq_is_small void __lockfunc _read_lock_irq(rwlock_t *lock) { __read_lock_irq(lock); } EXPORT_SYMBOL(_read_lock_irq); +#endif +#ifndef __read_lock_bh_is_small void __lockfunc _read_lock_bh(rwlock_t *lock) { __read_lock_bh(lock); } EXPORT_SYMBOL(_read_lock_bh); +#endif +#ifndef __write_lock_irqsave_is_small unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock) { return __write_lock_irqsave(lock); } EXPORT_SYMBOL(_write_lock_irqsave); +#endif +#ifndef __write_lock_irq_is_small void __lockfunc _write_lock_irq(rwlock_t *lock) { __write_lock_irq(lock); } EXPORT_SYMBOL(_write_lock_irq); +#endif +#ifndef __write_lock_bh_is_small void __lockfunc _write_lock_bh(rwlock_t *lock) { __write_lock_bh(lock); } EXPORT_SYMBOL(_write_lock_bh); +#endif +#ifndef __spin_lock_is_small void __lockfunc _spin_lock(spinlock_t *lock) { __spin_lock(lock); } EXPORT_SYMBOL(_spin_lock); +#endif +#ifndef __write_lock_is_small void __lockfunc _write_lock(rwlock_t *lock) { __write_lock(lock); } EXPORT_SYMBOL(_write_lock); +#endif #else /* CONFIG_PREEMPT: */ @@ -242,83 +272,109 @@ EXPORT_SYMBOL(_spin_lock_nest_lock); #endif +#ifndef __spin_unlock_is_small void __lockfunc _spin_unlock(spinlock_t *lock) { __spin_unlock(lock); } EXPORT_SYMBOL(_spin_unlock); +#endif +#ifndef __write_unlock_is_small void __lockfunc _write_unlock(rwlock_t *lock) { __write_unlock(lock); } EXPORT_SYMBOL(_write_unlock); +#endif +#ifndef __read_unlock_is_small void __lockfunc _read_unlock(rwlock_t *lock) { __read_unlock(lock); } EXPORT_SYMBOL(_read_unlock); +#endif +#ifndef __spin_unlock_irqrestore_is_small void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) { __spin_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_spin_unlock_irqrestore); +#endif +#ifndef __spin_unlock_irq_is_small void __lockfunc _spin_unlock_irq(spinlock_t *lock) { __spin_unlock_irq(lock); } EXPORT_SYMBOL(_spin_unlock_irq); +#endif +#ifndef __spin_unlock_bh_is_small void __lockfunc 
_spin_unlock_bh(spinlock_t *lock) { __spin_unlock_bh(lock); } EXPORT_SYMBOL(_spin_unlock_bh); +#endif +#ifndef __read_unlock_irqrestore_is_small void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags) { __read_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_read_unlock_irqrestore); +#endif +#ifndef __read_unlock_irq_is_small void __lockfunc _read_unlock_irq(rwlock_t *lock) { __read_unlock_irq(lock); } EXPORT_SYMBOL(_read_unlock_irq); +#endif +#ifndef __read_unlock_bh_is_small void __lockfunc _read_unlock_bh(rwlock_t *lock) { __read_unlock_bh(lock); } EXPORT_SYMBOL(_read_unlock_bh); +#endif +#ifndef __write_unlock_irqrestore_is_small void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) { __write_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_write_unlock_irqrestore); +#endif +#ifndef __write_unlock_irq_is_small void __lockfunc _write_unlock_irq(rwlock_t *lock) { __write_unlock_irq(lock); } EXPORT_SYMBOL(_write_unlock_irq); +#endif +#ifndef __write_unlock_bh_is_small void __lockfunc _write_unlock_bh(rwlock_t *lock) { __write_unlock_bh(lock); } EXPORT_SYMBOL(_write_unlock_bh); +#endif +#ifndef __spin_trylock_bh_is_small int __lockfunc _spin_trylock_bh(spinlock_t *lock) { return __spin_trylock_bh(lock); } EXPORT_SYMBOL(_spin_trylock_bh); +#endif notrace int in_lock_functions(unsigned long addr) { -- ^ permalink raw reply [flat|nested] 34+ messages in thread
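The opt-in mechanism of the patch above can be modelled outside the kernel. The following is a minimal stand-alone sketch (all names such as __demo_lock and DEMO_LOCK_IS_SMALL are invented for this illustration and are not part of the posted patch): building with -DDEMO_LOCK_IS_SMALL routes the call to the static inline body, building without it routes the call to the out-of-line copy, mirroring what __spin_lock_is_small does for _spin_lock().

        /* demo.c - hypothetical model of the "_is_small" opt-in; not kernel code */
        #include <stdio.h>

        /* plays the role of the static inline __spin_lock() in spinlock_api_smp.h */
        static inline void __demo_lock(int *lock)
        {
                printf("inlined __demo_lock\n");
                *lock = 1;
        }

        /* "header": pick the inline body or declare the out-of-line function */
        #ifdef DEMO_LOCK_IS_SMALL
        #define _demo_lock(lock) __demo_lock(lock)
        #else
        void _demo_lock(int *lock);
        #endif

        /* "kernel/spinlock.c": the out-of-line copy, compiled out when inlining */
        #ifndef DEMO_LOCK_IS_SMALL
        void _demo_lock(int *lock)
        {
                printf("out-of-line _demo_lock\n");
                __demo_lock(lock);
        }
        #endif

        int main(void)
        {
                int lock = 0;

                _demo_lock(&lock);
                return lock ? 0 : 1;
        }

Compiling once with "gcc demo.c" and once with "gcc -DDEMO_LOCK_IS_SMALL demo.c" shows the two code paths; in the kernel the same switch is flipped per lock variant by the arch header.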
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-14 12:58 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
@ 2009-08-16 17:57 ` Heiko Carstens
  2009-08-16 18:06   ` Ingo Molnar
  2009-08-16 18:22   ` Linus Torvalds
  0 siblings, 2 replies; 34+ messages in thread
From: Heiko Carstens @ 2009-08-16 17:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

Hi Linus, Andrew,

do you have any objections to the approach below?
Just wondering, since I didn't receive any comments.

On Fri, Aug 14, 2009 at 02:58:03PM +0200, Heiko Carstens wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> This allows an architecture to specify per lock variant if the
> locking code should be kept out-of-line or inlined.
>
> If an architecure wants out-of-line locking code no change is
> needed. To force inlining of e.g. spin_lock() the line
>
> #define __spin_lock_is_small
>
> needs to be added to arch/<whatever>/include/asm/spinlock.h
>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
>  include/linux/spinlock_api_smp.h |  146 ++++++++++++++++++++++++++++++++++++++-
>  kernel/spinlock.c                |   56 ++++++++++++++
>  2 files changed, 199 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/include/linux/spinlock_api_smp.h
> ===================================================================
> --- linux-2.6.orig/include/linux/spinlock_api_smp.h
> +++ linux-2.6/include/linux/spinlock_api_smp.h
> @@ -19,46 +19,186 @@ int in_lock_functions(unsigned long addr
>
>  #define assert_spin_locked(x)  BUG_ON(!spin_is_locked(x))
>
> -void __lockfunc _spin_lock(spinlock_t *lock)  __acquires(lock);
>  void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass)
>                                                 __acquires(lock);
>  void __lockfunc _spin_lock_nest_lock(spinlock_t *lock, struct lockdep_map *map)
>                                                 __acquires(lock);
> +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> +                                               __acquires(lock);
> +
> +#ifdef __spin_lock_is_small
> +#define _spin_lock(lock) __spin_lock(lock)
> +#else
> +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> +#endif

...

> Index: linux-2.6/kernel/spinlock.c
> ===================================================================
> --- linux-2.6.orig/kernel/spinlock.c
> +++ linux-2.6/kernel/spinlock.c

...

> +#ifndef __spin_lock_is_small
>  void __lockfunc _spin_lock(spinlock_t *lock)
>  {
>  	__spin_lock(lock);
>  }
>  EXPORT_SYMBOL(_spin_lock);
> +#endif

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 17:57 ` Heiko Carstens
@ 2009-08-16 18:06 ` Ingo Molnar
  2009-08-16 18:43   ` Linus Torvalds
  2009-08-16 18:44   ` Heiko Carstens
  0 siblings, 2 replies; 34+ messages in thread
From: Ingo Molnar @ 2009-08-16 18:06 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > #define __spin_lock_is_small
>
> > +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> > +                                               __acquires(lock);
> > +
> > +#ifdef __spin_lock_is_small
> > +#define _spin_lock(lock) __spin_lock(lock)
> > +#else
> > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > +#endif

Dunno - i'm somewhat wary of introducing a 2^28 variability here.
(although the number of real variations is much lower - but still).

What's the current situation on s390, precisely which of the 28 lock
functions are a win to be inlined and which ones are a loss? Do you
have a list/table perhaps?

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 18:06 ` Ingo Molnar
@ 2009-08-16 18:43 ` Linus Torvalds
  2009-08-16 20:24   ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2009-08-16 18:43 UTC (permalink / raw)
To: Ingo Molnar
Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

On Sun, 16 Aug 2009, Ingo Molnar wrote:
>
> What's the current situation on s390, precisely which of the 28 lock
> functions are a win to be inlined and which ones are a loss? Do you
> have a list/table perhaps?

Let's look at x86 instead.

The one I can _guarantee_ is worth inlining is "spin_unlock()", since
it just generates a single "incb %m" or whatever. No loops, no
conditionals, no nuthing. It's not even a locked instruction.

Right now we literally generate this function for it:

        0xffffffff81420d74 <_spin_unlock+0>:    push   %rbp
        0xffffffff81420d75 <_spin_unlock+1>:    mov    %rsp,%rbp
        0xffffffff81420d78 <_spin_unlock+4>:    incb   (%rdi)
        0xffffffff81420d7a <_spin_unlock+6>:    leaveq
        0xffffffff81420d7b <_spin_unlock+7>:    retq

iow, the actual "bulk" of that function is a single two-byte
instruction. And for that we generate a whole 5-byte "call"
instruction, along with all the costs of fixed register scheduling and
stupid spilling etc.

read_unlock and write_unlock are similar, and are

        lock incl (%rdi)                // 3 bytes

and

        lock addl $0x1000000,(%rdi)     // 7 bytes

respectively. At 7 bytes, write_unlock() is still likely to be smaller
inlined (because you avoid the register games).

Other cases on x86 that would be smaller in-lined:

 - _spin_unlock_irq: 3 bytes

        incb (%rdi)
        sti

 - _spin_unlock_irqrestore: 4 bytes

        incb (%rdi)
        push %rsi
        popfq

 - _read_unlock_irq/_read_unlock_irqrestore (4 and 5 bytes respectively):

        lock incl (%rdi)
        sti / push+popfq

but not, for example, any of the locking functions, nor any of the
"_bh" versions (because local_bh_enable ends up pretty complicated,
unlike local_bh_disable). Nor even perhaps

 - _write_unlock_irqrestore: (9 bytes)

        lock addl $0x1000000,(%rdi)
        push %rsi
        popfq

which is starting to get to the point where a call _may_ be smaller
(largely due to that big constant).

And '_spin_lock()' is already too big to inline:

        mov    $0x100,%eax
        lock xadd %ax,(%rdi)
        cmp    %ah,%al
        je     2f
        pause
        mov    (%rdi),%al
        je     1b

which is 20 bytes or so, and that's the simplest of the locking cases.

So you really do have a mix of "want to inline" and "do not want to
inline".

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread
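For reference, the generic helper being discussed here looks roughly like the sketch below. It is paraphrased from the same pattern as the __spin_trylock/__read_lock helpers in patch 1 rather than copied from the posted diff, so treat it as an approximation: on a !CONFIG_PREEMPT, !CONFIG_DEBUG_LOCK_ALLOC build both spin_release() and preempt_enable() compile away, which is why inlining it collapses to the single incb shown above.

        /* sketch of the generic inline from patch 1 (paraphrased, kernel context) */
        static inline void __spin_unlock(spinlock_t *lock)
        {
                spin_release(&lock->dep_map, 1, _RET_IP_);   /* no-op without lockdep */
                _raw_spin_unlock(lock);                      /* the "incb (%rdi)" on x86 */
                preempt_enable();                            /* no-op without CONFIG_PREEMPT */
        }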
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 18:43 ` Linus Torvalds
@ 2009-08-16 20:24 ` Ingo Molnar
  2009-08-16 21:07   ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2009-08-16 20:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, 16 Aug 2009, Ingo Molnar wrote:
> >
> > What's the current situation on s390, precisely which of the 28 lock
> > functions are a win to be inlined and which ones are a loss? Do you
> > have a list/table perhaps?
>
> Let's look at x86 instead.
>
> The one I can _guarantee_ is worth inlining is "spin_unlock()",
> since it just generates a single "incb %m" or whatever. [...]

We do that already for that reason:

        /*
         * We inline the unlock functions in the nondebug case:
         */
        #if defined(CONFIG_DEBUG_SPINLOCK) || defined(CONFIG_PREEMPT) || \
                !defined(CONFIG_SMP)
        # define spin_unlock(lock)              _spin_unlock(lock)
        # define read_unlock(lock)              _read_unlock(lock)
        # define write_unlock(lock)             _write_unlock(lock)
        # define spin_unlock_irq(lock)          _spin_unlock_irq(lock)
        # define read_unlock_irq(lock)          _read_unlock_irq(lock)
        # define write_unlock_irq(lock)         _write_unlock_irq(lock)

this works fine here:

        ffffffff8103d359 <finish_task_switch>:
        ...
        ffffffff8103d393:       fe 03                   incb   (%rbx)
        ffffffff8103d395:       fb                      sti

that's an inlined spin_unlock_irq() in finish_task_switch().

We dont do this if DEBUG_SPINLOCK is defined (which bloats the
function) or if CONFIG_PREEMPT is defined. Maybe the latter case could
be revised although generally it's not worth inlining functions that
also touch task-struct / threadinfo.

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 20:24 ` Ingo Molnar
@ 2009-08-16 21:07 ` Linus Torvalds
  2009-08-16 21:18   ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2009-08-16 21:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

On Sun, 16 Aug 2009, Ingo Molnar wrote:
>
> We do that already for that reason:

Ahh, we do. I was misled by the fact that we still generate the
out-of-line ones, even though we don't use them. Which is fine, of
course.

It doesn't do the irqrestore versions, but that's probably not a big
deal.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 21:07 ` Linus Torvalds
@ 2009-08-16 21:18 ` Ingo Molnar
  0 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2009-08-16 21:18 UTC (permalink / raw)
To: Linus Torvalds
Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, 16 Aug 2009, Ingo Molnar wrote:
> >
> > We do that already for that reason:
>
> Ahh, we do. I was misled by the fact that we still generate the
> out-of-line ones, even though we don't use them. Which is fine, of
> course.
>
> It doesn't do the irqrestore versions, but that's probably not a
> big deal.

i remember having done gradual builds and turning the inlining of
various locking functions on/off to see how it impacts the text size,
so we were quite close to the optimal setup ... then. (two years ago?)

But ... the fact that you didnt notice it shows that the inlining
decisions in spinlock.h are less than transparent and not really
cleanly done.

Even if we dont change a thing on the x86 side the s390 observations
are obviously valid, and Heiko's generalization looks more
maintainable in terms of tweakability, as it's per function granular
and per arch as well.

OTOH, it needs some work: the CONFIG_PREEMPT and CONFIG_DEBUG_SPINLOCKS
dependency should be expressed too - and i think CONFIG_PREEMPT is
probably used a bit on s390. And at that point we've got
2^3 * 2^28 * 20 (configs*functions*archs) variants, and i feel a bit
uneasy about that kind of complexity (even if the typical cases will be
much simpler than that). We wont really get it fully right, we'll just
drown in that.

Dammit, why cannot the compiler get this right, working it around in
the kernel source makes things so ugly :-(

It's not rocket science to do a good inliner but IMO GCC messed their
design up on day one by separating the assembler from the C compiler,
hence their inliner cannot really know about how large instructions
are. Rather stupid IMHO.

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 18:06 ` Ingo Molnar
  2009-08-16 18:43   ` Linus Torvalds
@ 2009-08-16 18:44   ` Heiko Carstens
  2009-08-16 20:48     ` Ingo Molnar
  1 sibling, 1 reply; 34+ messages in thread
From: Heiko Carstens @ 2009-08-16 18:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

On Sun, Aug 16, 2009 at 08:06:31PM +0200, Ingo Molnar wrote:
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > #define __spin_lock_is_small
>
> > > +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> > > +                                               __acquires(lock);
> > > +
> > > +#ifdef __spin_lock_is_small
> > > +#define _spin_lock(lock) __spin_lock(lock)
> > > +#else
> > > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > > +#endif
>
> Dunno - i'm somewhat wary of introducing a 2^28 variability here.
> (although the number of real variations is much lower - but still).
>
> What's the current situation on s390, precisely which of the 28 lock
> functions are a win to be inlined and which ones are a loss? Do you
> have a list/table perhaps?

No list unfortunately. However, the variants we really care about are
only the spin_locks. The *try_lock variants are also not that important.
So we end up with eight ifdefs (spin_lock/bh/irq/irq_save + unlock).

I'll change the patches according to Linus' comments and send them again.

^ permalink raw reply	[flat|nested] 34+ messages in thread
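Spelled out, that arch-side opt-in would amount to something like the following in arch/s390/include/asm/spinlock.h. This is only a sketch of the intended usage; the exact set of defines is an assumption based on the eight variants mentioned above and is not taken from a posted patch.

        /* hypothetical s390 opt-in covering the eight variants mentioned above */
        #define __spin_lock_is_small
        #define __spin_lock_bh_is_small
        #define __spin_lock_irq_is_small
        #define __spin_lock_irqsave_is_small
        #define __spin_unlock_is_small
        #define __spin_unlock_bh_is_small
        #define __spin_unlock_irq_is_small
        #define __spin_unlock_irqrestore_is_small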
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 18:44 ` Heiko Carstens
@ 2009-08-16 20:48 ` Ingo Molnar
  2009-08-16 21:33   ` Heiko Carstens
  0 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2009-08-16 20:48 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Sun, Aug 16, 2009 at 08:06:31PM +0200, Ingo Molnar wrote:
> > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > > #define __spin_lock_is_small
> >
> > > > +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> > > > +                                               __acquires(lock);
> > > > +
> > > > +#ifdef __spin_lock_is_small
> > > > +#define _spin_lock(lock) __spin_lock(lock)
> > > > +#else
> > > > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > > > +#endif
> >
> > Dunno - i'm somewhat wary of introducing a 2^28 variability here.
> > (although the number of real variations is much lower - but still).
> >
> > What's the current situation on s390, precisely which of the 28 lock
> > functions are a win to be inlined and which ones are a loss? Do you
> > have a list/table perhaps?
>
> No list unfortunately. [...]

Well, if you dont know the functions you want to inline, how will you
make intelligent use of this facility then in s390?

Btw., i just noticed that s390 has CONFIG_PREEMPT turned on by default
in its defconfig. Have you made your measurements with CONFIG_PREEMPT?
If yes then the current inlining rules in spinlock.h will turn all the
locking APIs into functions.

Could you perhaps re-try your measurements with CONFIG_PREEMPT turned
off? I suspect you'll see different behavior.

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 20:48 ` Ingo Molnar
@ 2009-08-16 21:33 ` Heiko Carstens
  2009-08-16 21:36   ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: Heiko Carstens @ 2009-08-16 21:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

On Sun, Aug 16, 2009 at 10:48:31PM +0200, Ingo Molnar wrote:
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > On Sun, Aug 16, 2009 at 08:06:31PM +0200, Ingo Molnar wrote:
> > > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > > > #define __spin_lock_is_small
> > >
> > > > > +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> > > > > +                                               __acquires(lock);
> > > > > +
> > > > > +#ifdef __spin_lock_is_small
> > > > > +#define _spin_lock(lock) __spin_lock(lock)
> > > > > +#else
> > > > > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > > > > +#endif
> > >
> > > Dunno - i'm somewhat wary of introducing a 2^28 variability here.
> > > (although the number of real variations is much lower - but still).
> > >
> > > What's the current situation on s390, precisely which of the 28 lock
> > > functions are a win to be inlined and which ones are a loss? Do you
> > > have a list/table perhaps?
> >
> > No list unfortunately. [...]
>
> Well, if you dont know the functions you want to inline, how will
> you make intelligent use of this facility then in s390?

What I tried to say: in general we want to have all locking functions
inlined. This is because all our measurements show if anything gets
inlined performance improves. This is even true for large functions.

> Btw., i just noticed that s390 has CONFIG_PREEMPT turned on by
> default in its defconfig. Have you made your measurements with
> CONFIG_PREEMPT? If yes then the current inlining rules in spinlock.h
> will turn all the locking APIs into functions.

All measurements were done with CONFIG_PREEMPT off.

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 21:33 ` Heiko Carstens
@ 2009-08-16 21:36 ` Ingo Molnar
  0 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2009-08-16 21:36 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Linus Torvalds, Peter Zijlstra, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> On Sun, Aug 16, 2009 at 10:48:31PM +0200, Ingo Molnar wrote:
> > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > On Sun, Aug 16, 2009 at 08:06:31PM +0200, Ingo Molnar wrote:
> > > > * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > > > > #define __spin_lock_is_small
> > > >
> > > > > > +unsigned long __lockfunc _spin_lock_irqsave_nested(spinlock_t *lock, int subclass)
> > > > > > +                                               __acquires(lock);
> > > > > > +
> > > > > > +#ifdef __spin_lock_is_small
> > > > > > +#define _spin_lock(lock) __spin_lock(lock)
> > > > > > +#else
> > > > > > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > > > > > +#endif
> > > >
> > > > Dunno - i'm somewhat wary of introducing a 2^28 variability here.
> > > > (although the number of real variations is much lower - but still).
> > > >
> > > > What's the current situation on s390, precisely which of the 28 lock
> > > > functions are a win to be inlined and which ones are a loss? Do you
> > > > have a list/table perhaps?
> > >
> > > No list unfortunately. [...]
> >
> > Well, if you dont know the functions you want to inline, how
> > will you make intelligent use of this facility then in s390?
>
> What I tried to say: in general we want to have all locking
> functions inlined. This is because all our measurements show if
> anything gets inlined performance improves. This is even true for
> large functions.
>
> > Btw., i just noticed that s390 has CONFIG_PREEMPT turned on by
> > default in its defconfig. Have you made your measurements with
> > CONFIG_PREEMPT? If yes then the current inlining rules in
> > spinlock.h will turn all the locking APIs into functions.
>
> All measurements were done with CONFIG_PREEMPT off.

ok.

( Btw., regardless of this patch-set, i suspect you _really_ want
  compiler help for that - one with a global scope and which can go
  inline even larger functions, and even if they were declared global
  in the kernel. )

	Ingo

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 17:57 ` Heiko Carstens
  2009-08-16 18:06   ` Ingo Molnar
@ 2009-08-16 18:22   ` Linus Torvalds
  2009-08-17 15:46     ` Heiko Carstens
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2009-08-16 18:22 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

On Sun, 16 Aug 2009, Heiko Carstens wrote:
>
> do you have any objections to the approach below?
> Just wondering, since I didn't receive any comments.

Other things going on, but here goes:

> > +#ifdef __spin_lock_is_small
> > +#define _spin_lock(lock) __spin_lock(lock)
> > +#else
> > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > +#endif

The above repeats many times, and I found that irritating. I think you
should be able to get rid of a _lot_ of that by doing it like this
instead

        extern void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
        ... plain declarations for all cases, no #ifdef's ...

and then at the end you just do

        #ifdef __spin_lock_is_small
        #define _spin_lock(lock) __spin_lock(lock)
        #endif
        ..

and at least make that repeating section be composed of simpler and
smaller entries.

Now, I actually have a disgusting way to avoid the #ifdef's entirely,
but I'm not sure it's worth it. You'd do the same unconditional
function declarations, but then we can also do some unconditional CPP
expansions:

        #define __define2_lock(name,extent) name##extent
        #define __define_lock(name,extent) __define2_lock(name,extent)
        #define define_lock(name) __define_lock(__##name, __##name##_is_small)

        // for each lock type
        #define __spin_lock__spin_lock_is_small _spin_lock
        #define _spin_lock define_lock(spin_lock)

and what happens is that if '__spin_lock_is_small' is undefined, you
end up having '_spin_lock()' expand to '__spin_lock__spin_lock_is_small()'
and then back to _spin_lock, but if it's #defined (to empty), it gets
defined to __spin_lock() instead.

No ifdefs, just two unconditional #define lines for each lock type.

Ugly? Yes. Less ugly than having variations of that #ifdef repeated
many times? I dunno.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread
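The trick relies on the preprocessor rule that a macro which (indirectly) refers back to itself is not expanded a second time, so the pasted name either resolves to __spin_lock() or stops at the literal _spin_lock symbol. A stand-alone sketch of that behaviour is below; the demo function bodies and the DEMO_SMALL switch are invented for this illustration and are not kernel code. Building with -DDEMO_SMALL makes the call expand to the inline, building without it leaves the call going to the out-of-line function.

        /* cpp_demo.c - hypothetical model of the no-#ifdef expansion trick above */
        #include <stdio.h>

        static inline void __spin_lock(int *lock)       /* stand-in for the generic inline */
        {
                printf("expanded to the inline __spin_lock()\n");
                *lock = 1;
        }

        #ifdef DEMO_SMALL
        #define __spin_lock_is_small                    /* "arch" opts in: expands to empty */
        #endif

        #define __define2_lock(name, extent)    name##extent
        #define __define_lock(name, extent)     __define2_lock(name, extent)
        #define define_lock(name)       __define_lock(__##name, __##name##_is_small)

        #define __spin_lock__spin_lock_is_small _spin_lock
        #define _spin_lock      define_lock(spin_lock)

        #ifndef __spin_lock_is_small
        void _spin_lock(int *lock)                      /* out-of-line copy; name is not re-expanded */
        {
                printf("called the out-of-line _spin_lock()\n");
                __spin_lock(lock);
        }
        #endif

        int main(void)
        {
                int lock = 0;

                _spin_lock(&lock);
                return lock ? 0 : 1;
        }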
* Re: [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-16 18:22 ` Linus Torvalds
@ 2009-08-17 15:46 ` Heiko Carstens
  0 siblings, 0 replies; 34+ messages in thread
From: Heiko Carstens @ 2009-08-17 15:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
    Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
    Christian Ehrhardt, Nick Piggin

On Sun, Aug 16, 2009 at 11:22:45AM -0700, Linus Torvalds wrote:
> Other things going on, but here goes:
>
> > > +#ifdef __spin_lock_is_small
> > > +#define _spin_lock(lock) __spin_lock(lock)
> > > +#else
> > > +void __lockfunc _spin_lock(spinlock_t *lock)   __acquires(lock);
> > > +#endif
>
> The above repeats many times, and I found that irritating. I think you
> should be able to get rid of a _lot_ of that by doing it like this instead
>
>         extern void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
>         ... plain declarations for all cases, no #ifdef's ...
>
> and then at the end you just do
>
>         #ifdef __spin_lock_is_small
>         #define _spin_lock(lock) __spin_lock(lock)
>         #endif
>         ..
>
> and at least make that repeating section be composed of simpler and
> smaller entries.

Well, I decided to go for the ifdef variant, because that allows to
easily ifdef all functions in kernel/spinlock.c:

With

	#ifdef __spin_lock_is_small
	#define _spin_lock(lock) __spin_lock(lock)
	#endif

We can do this:

	+#ifndef _spin_lock
	 void __lockfunc _spin_lock(spinlock_t *lock)
	 {
	 	__spin_lock(lock);
	 }
	 EXPORT_SYMBOL(_spin_lock);
	+#endif

In addition this allows to overrule architecture settings by adding
ifdefs in one common place instead of requiring each architecture to
get it right:

	+#ifndef CONFIG_DEBUG_SPINLOCK
	+#ifndef CONFIG_GENERIC_LOCKBREAK
	+
	+#ifdef __spin_lock_is_small
	+#define _spin_lock(lock) __spin_lock(lock)
	+#endif
	+
	[...]

This should also address Ingo's concerns. Does this look ok now?
--- include/linux/spinlock_api_smp.h | 119 +++++++++++++++++++++++++++++++++++++++ kernel/spinlock.c | 56 ++++++++++++++++++ 2 files changed, 175 insertions(+) Index: linux-2.6/kernel/spinlock.c =================================================================== --- linux-2.6.orig/kernel/spinlock.c +++ linux-2.6/kernel/spinlock.c @@ -21,23 +21,29 @@ #include <linux/debug_locks.h> #include <linux/module.h> +#ifndef _spin_trylock int __lockfunc _spin_trylock(spinlock_t *lock) { return __spin_trylock(lock); } EXPORT_SYMBOL(_spin_trylock); +#endif +#ifndef _read_trylock int __lockfunc _read_trylock(rwlock_t *lock) { return __read_trylock(lock); } EXPORT_SYMBOL(_read_trylock); +#endif +#ifndef _write_trylock int __lockfunc _write_trylock(rwlock_t *lock) { return __write_trylock(lock); } EXPORT_SYMBOL(_write_trylock); +#endif /* * If lockdep is enabled then we use the non-preemption spin-ops @@ -46,77 +52,101 @@ EXPORT_SYMBOL(_write_trylock); */ #if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC) +#ifndef _read_lock void __lockfunc _read_lock(rwlock_t *lock) { __read_lock(lock); } EXPORT_SYMBOL(_read_lock); +#endif +#ifndef _spin_lock_irqsave unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock) { return __spin_lock_irqsave(lock); } EXPORT_SYMBOL(_spin_lock_irqsave); +#endif +#ifndef _spin_lock_irq void __lockfunc _spin_lock_irq(spinlock_t *lock) { __spin_lock_irq(lock); } EXPORT_SYMBOL(_spin_lock_irq); +#endif +#ifndef _spin_lock_bh void __lockfunc _spin_lock_bh(spinlock_t *lock) { __spin_lock_bh(lock); } EXPORT_SYMBOL(_spin_lock_bh); +#endif +#ifndef _read_lock_irqsave unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock) { return __read_lock_irqsave(lock); } EXPORT_SYMBOL(_read_lock_irqsave); +#endif +#ifndef _read_lock_irq void __lockfunc _read_lock_irq(rwlock_t *lock) { __read_lock_irq(lock); } EXPORT_SYMBOL(_read_lock_irq); +#endif +#ifndef _read_lock_bh void __lockfunc _read_lock_bh(rwlock_t *lock) { __read_lock_bh(lock); } EXPORT_SYMBOL(_read_lock_bh); +#endif +#ifndef _write_lock_irqsave unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock) { return __write_lock_irqsave(lock); } EXPORT_SYMBOL(_write_lock_irqsave); +#endif +#ifndef _write_lock_irq void __lockfunc _write_lock_irq(rwlock_t *lock) { __write_lock_irq(lock); } EXPORT_SYMBOL(_write_lock_irq); +#endif +#ifndef _write_lock_bh void __lockfunc _write_lock_bh(rwlock_t *lock) { __write_lock_bh(lock); } EXPORT_SYMBOL(_write_lock_bh); +#endif +#ifndef _spin_lock void __lockfunc _spin_lock(spinlock_t *lock) { __spin_lock(lock); } EXPORT_SYMBOL(_spin_lock); +#endif +#ifndef _write_lock void __lockfunc _write_lock(rwlock_t *lock) { __write_lock(lock); } EXPORT_SYMBOL(_write_lock); +#endif #else /* CONFIG_PREEMPT: */ @@ -242,83 +272,109 @@ EXPORT_SYMBOL(_spin_lock_nest_lock); #endif +#ifndef _spin_unlock void __lockfunc _spin_unlock(spinlock_t *lock) { __spin_unlock(lock); } EXPORT_SYMBOL(_spin_unlock); +#endif +#ifndef _write_unlock void __lockfunc _write_unlock(rwlock_t *lock) { __write_unlock(lock); } EXPORT_SYMBOL(_write_unlock); +#endif +#ifndef _read_unlock void __lockfunc _read_unlock(rwlock_t *lock) { __read_unlock(lock); } EXPORT_SYMBOL(_read_unlock); +#endif +#ifndef _spin_unlock_irqrestore void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags) { __spin_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_spin_unlock_irqrestore); +#endif +#ifndef _spin_unlock_irq void __lockfunc _spin_unlock_irq(spinlock_t *lock) { __spin_unlock_irq(lock); } 
EXPORT_SYMBOL(_spin_unlock_irq); +#endif +#ifndef _spin_unlock_bh void __lockfunc _spin_unlock_bh(spinlock_t *lock) { __spin_unlock_bh(lock); } EXPORT_SYMBOL(_spin_unlock_bh); +#endif +#ifndef _read_unlock_irqrestore void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags) { __read_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_read_unlock_irqrestore); +#endif +#ifndef _read_unlock_irq void __lockfunc _read_unlock_irq(rwlock_t *lock) { __read_unlock_irq(lock); } EXPORT_SYMBOL(_read_unlock_irq); +#endif +#ifndef _read_unlock_bh void __lockfunc _read_unlock_bh(rwlock_t *lock) { __read_unlock_bh(lock); } EXPORT_SYMBOL(_read_unlock_bh); +#endif +#ifndef _write_unlock_irqrestore void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) { __write_unlock_irqrestore(lock, flags); } EXPORT_SYMBOL(_write_unlock_irqrestore); +#endif +#ifndef _write_unlock_irq void __lockfunc _write_unlock_irq(rwlock_t *lock) { __write_unlock_irq(lock); } EXPORT_SYMBOL(_write_unlock_irq); +#endif +#ifndef _write_unlock_bh void __lockfunc _write_unlock_bh(rwlock_t *lock) { __write_unlock_bh(lock); } EXPORT_SYMBOL(_write_unlock_bh); +#endif +#ifndef _spin_trylock_bh int __lockfunc _spin_trylock_bh(spinlock_t *lock) { return __spin_trylock_bh(lock); } EXPORT_SYMBOL(_spin_trylock_bh); +#endif notrace int in_lock_functions(unsigned long addr) { Index: linux-2.6/include/linux/spinlock_api_smp.h =================================================================== --- linux-2.6.orig/include/linux/spinlock_api_smp.h +++ linux-2.6/include/linux/spinlock_api_smp.h @@ -60,6 +60,125 @@ void __lockfunc _read_unlock_irqrestore( void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags) __releases(lock); +#ifndef CONFIG_DEBUG_SPINLOCK +#ifndef CONFIG_GENERIC_LOCKBREAK + +#ifdef __spin_lock_is_small +#define _spin_lock(lock) __spin_lock(lock) +#endif + +#ifdef __read_lock_is_small +#define _read_lock(lock) __read_lock(lock) +#endif + +#ifdef __write_lock_is_small +#define _write_lock(lock) __write_lock(lock) +#endif + +#ifdef __spin_lock_bh_is_small +#define _spin_lock_bh(lock) __spin_lock_bh(lock) +#endif + +#ifdef __read_lock_bh_is_small +#define _read_lock_bh(lock) __read_lock_bh(lock) +#endif + +#ifdef __write_lock_bh_is_small +#define _write_lock_bh(lock) __write_lock_bh(lock) +#endif + +#ifdef __spin_lock_irq_is_small +#define _spin_lock_irq(lock) __spin_lock_irq(lock) +#endif + +#ifdef __read_lock_irq_is_small +#define _read_lock_irq(lock) __read_lock_irq(lock) +#endif + +#ifdef __write_lock_irq_is_small +#define _write_lock_irq(lock) __write_lock_irq(lock) +#endif + +#ifdef __spin_lock_irqsave_is_small +#define _spin_lock_irqsave(lock) __spin_lock_irqsave(lock) +#endif + +#ifdef __read_lock_irqsave_is_small +#define _read_lock_irqsave(lock) __read_lock_irqsave(lock) +#endif + +#ifdef __write_lock_irqsave_is_small +#define _write_lock_irqsave(lock) __write_lock_irqsave(lock) +#endif + +#endif /* !CONFIG_GENERIC_LOCKBREAK */ + +#ifdef __spin_trylock_is_small +#define _spin_trylock(lock) __spin_trylock(lock) +#endif + +#ifdef __read_trylock_is_small +#define _read_trylock(lock) __read_trylock(lock) +#endif + +#ifdef __write_trylock_is_small +#define _write_trylock(lock) __write_trylock(lock) +#endif + +#ifdef __spin_trylock_bh_is_small +#define _spin_trylock_bh(lock) __spin_trylock_bh(lock) +#endif + +#ifdef __spin_unlock_is_small +#define _spin_unlock(lock) __spin_unlock(lock) +#endif + +#ifdef __read_unlock_is_small +#define _read_unlock(lock) __read_unlock(lock) 
+#endif + +#ifdef __write_unlock_is_small +#define _write_unlock(lock) __write_unlock(lock) +#endif + +#ifdef __spin_unlock_bh_is_small +#define _spin_unlock_bh(lock) __spin_unlock_bh(lock) +#endif + +#ifdef __read_unlock_bh_is_small +#define _read_unlock_bh(lock) __read_unlock_bh(lock) +#endif + +#ifdef __write_unlock_bh_is_small +#define _write_unlock_bh(lock) __write_unlock_bh(lock) +#endif + +#ifdef __spin_unlock_irq_is_small +#define _spin_unlock_irq(lock) __spin_unlock_irq(lock) +#endif + +#ifdef __read_unlock_irq_is_small +#define _read_unlock_irq(lock) __read_unlock_irq(lock) +#endif + +#ifdef __write_unlock_irq_is_small +#define _write_unlock_irq(lock) __write_unlock_irq(lock) +#endif + +#ifdef __spin_unlock_irqrestore_is_small +#define _spin_unlock_irqrestore(lock, flags) __spin_unlock_irqrestore(lock, flags) +#endif + +#ifdef __read_unlock_irqrestore_is_small +#define _read_unlock_irqrestore(lock, flags) __read_unlock_irqrestore(lock, flags) +#endif + +#ifdef __write_unlock_irqrestore_is_small +#define _write_unlock_irqrestore(lock, flags) __write_unlock_irqrestore(lock, flags) +#endif + +#endif /* CONFIG_DEBUG_SPINLOCK */ + static inline int __spin_trylock(spinlock_t *lock) { preempt_disable(); ^ permalink raw reply [flat|nested] 34+ messages in thread