* [patch 0/3] Allow inlined spinlocks again V3
@ 2009-08-12 18:39 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
Hi Andrew,
since there were no objections to this patch set, would you mind picking
it up for -mm? The patches are against Linus' current git head but also
apply on top of the latest mmotm patch set.
===
This patch set makes it possible to have inlined spinlocks again.
The rationale behind this is that function calls are expensive, at
least on s390.
If one considers that server kernels are usually compiled with
!CONFIG_PREEMPT, a simple spin_lock is just a compare-and-swap loop.
The extra overhead of a function call is significant.
With inlined spinlocks, overall CPU usage gets reduced by 1%-5% on s390.
These numbers were taken with some network benchmarks. However, I expect
any workload that frequently calls into the kernel and grabs a few
locks to perform better.
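To illustrate the point: with !CONFIG_PREEMPT and no debug options, the
uncontended path amounts to roughly the following (a minimal generic
sketch using GCC builtins as stand-ins for the arch-specific atomics,
not the actual s390 code):

typedef struct {
	volatile unsigned int slock;	/* 0 = free, 1 = held */
} sketch_spinlock_t;

static inline void sketch_spin_lock(sketch_spinlock_t *lock)
{
	/* atomically flip slock from 0 to 1; spin until that succeeds */
	while (!__sync_bool_compare_and_swap(&lock->slock, 0, 1))
		;
}

A call/return wrapped around those few instructions is pure overhead on
the uncontended fast path.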
The implementation is straightforward: move the function bodies of the
locking functions to static inline functions and place them in a header
file.
Depending on CONFIG_SPINLOCK_INLINE, generate either out-of-line or
inlined locking functions.
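In outline, condensed from the patches below (a sketch, not the literal
diff):

/* include/linux/spinlock_api_smp.h: one static inline per variant */
static inline void __spin_lock(spinlock_t *lock)
{
	preempt_disable();
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
}

#ifdef CONFIG_SPINLOCK_INLINE
/* inlined: map the API name straight to the static inline */
#define _spin_lock(lock)	__spin_lock(lock)
#endif

/* kernel/spinlock.c: out-of-line variant, built only if not inlined */
#ifndef CONFIG_SPINLOCK_INLINE
void __lockfunc _spin_lock(spinlock_t *lock)
{
	__spin_lock(lock);
}
EXPORT_SYMBOL(_spin_lock);
#endif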
The patches should be self-explanatory.
V2: rewritten from scratch - now also with readable code
V3: removed the macro that generated the out-of-line spinlock variants,
since it would break ctags (as requested by Arnd Bergmann).
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 1/3] spinlock: move spinlock function bodies to header file
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
@ 2009-08-12 18:39 ` Heiko Carstens
2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
[-- Attachment #1: 01_spinlock_move.diff --]
[-- Type: text/plain, Size: 17156 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Move the spinlock function bodies to a header file by creating a
static inline version of each variant.
Use the inline versions in the out-of-line code. This shouldn't
make any difference, except that the spinlock code can now also
be used to generate inlined spinlock code.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
include/linux/spinlock.h | 18 +-
include/linux/spinlock_api_smp.h | 263 +++++++++++++++++++++++++++++++++++++++
kernel/spinlock.c | 174 ++++---------------------
3 files changed, 300 insertions(+), 155 deletions(-)
Index: linux-2.6/include/linux/spinlock_api_smp.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -60,4 +60,267 @@ void __lockfunc _read_unlock_irqrestore(
void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
__releases(lock);
+static inline int __spin_trylock(spinlock_t *lock)
+{
+ preempt_disable();
+ if (_raw_spin_trylock(lock)) {
+ spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+ return 1;
+ }
+ preempt_enable();
+ return 0;
+}
+
+static inline int __read_trylock(rwlock_t *lock)
+{
+ preempt_disable();
+ if (_raw_read_trylock(lock)) {
+ rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
+ return 1;
+ }
+ preempt_enable();
+ return 0;
+}
+
+static inline int __write_trylock(rwlock_t *lock)
+{
+ preempt_disable();
+ if (_raw_write_trylock(lock)) {
+ rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+ return 1;
+ }
+ preempt_enable();
+ return 0;
+}
+
+/*
+ * If lockdep is enabled then we use the non-preemption spin-ops
+ * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
+ * not re-enabled during lock-acquire (which the preempt-spin-ops do):
+ */
+#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+
+static inline void __read_lock(rwlock_t *lock)
+{
+ preempt_disable();
+ rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline unsigned long __spin_lock_irqsave(spinlock_t *lock)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ preempt_disable();
+ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ /*
+ * On lockdep we dont want the hand-coded irq-enable of
+ * _raw_spin_lock_flags() code, because lockdep assumes
+ * that interrupts are not re-enabled during lock-acquire:
+ */
+#ifdef CONFIG_LOCKDEP
+ LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+#else
+ _raw_spin_lock_flags(lock, &flags);
+#endif
+ return flags;
+}
+
+static inline void __spin_lock_irq(spinlock_t *lock)
+{
+ local_irq_disable();
+ preempt_disable();
+ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline void __spin_lock_bh(spinlock_t *lock)
+{
+ local_bh_disable();
+ preempt_disable();
+ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline unsigned long __read_lock_irqsave(rwlock_t *lock)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ preempt_disable();
+ rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock,
+ _raw_read_lock_flags, &flags);
+ return flags;
+}
+
+static inline void __read_lock_irq(rwlock_t *lock)
+{
+ local_irq_disable();
+ preempt_disable();
+ rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline void __read_lock_bh(rwlock_t *lock)
+{
+ local_bh_disable();
+ preempt_disable();
+ rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline unsigned long __write_lock_irqsave(rwlock_t *lock)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ preempt_disable();
+ rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock,
+ _raw_write_lock_flags, &flags);
+ return flags;
+}
+
+static inline void __write_lock_irq(rwlock_t *lock)
+{
+ local_irq_disable();
+ preempt_disable();
+ rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+static inline void __write_lock_bh(rwlock_t *lock)
+{
+ local_bh_disable();
+ preempt_disable();
+ rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+static inline void __spin_lock(spinlock_t *lock)
+{
+ preempt_disable();
+ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline void __write_lock(rwlock_t *lock)
+{
+ preempt_disable();
+ rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+#endif /* CONFIG_PREEMPT */
+
+static inline void __spin_unlock(spinlock_t *lock)
+{
+ spin_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_spin_unlock(lock);
+ preempt_enable();
+}
+
+static inline void __write_unlock(rwlock_t *lock)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_write_unlock(lock);
+ preempt_enable();
+}
+
+static inline void __read_unlock(rwlock_t *lock)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_read_unlock(lock);
+ preempt_enable();
+}
+
+static inline void __spin_unlock_irqrestore(spinlock_t *lock,
+ unsigned long flags)
+{
+ spin_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_spin_unlock(lock);
+ local_irq_restore(flags);
+ preempt_enable();
+}
+
+static inline void __spin_unlock_irq(spinlock_t *lock)
+{
+ spin_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_spin_unlock(lock);
+ local_irq_enable();
+ preempt_enable();
+}
+
+static inline void __spin_unlock_bh(spinlock_t *lock)
+{
+ spin_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_spin_unlock(lock);
+ preempt_enable_no_resched();
+ local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline void __read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_read_unlock(lock);
+ local_irq_restore(flags);
+ preempt_enable();
+}
+
+static inline void __read_unlock_irq(rwlock_t *lock)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_read_unlock(lock);
+ local_irq_enable();
+ preempt_enable();
+}
+
+static inline void __read_unlock_bh(rwlock_t *lock)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_read_unlock(lock);
+ preempt_enable_no_resched();
+ local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline void __write_unlock_irqrestore(rwlock_t *lock,
+ unsigned long flags)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_write_unlock(lock);
+ local_irq_restore(flags);
+ preempt_enable();
+}
+
+static inline void __write_unlock_irq(rwlock_t *lock)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_write_unlock(lock);
+ local_irq_enable();
+ preempt_enable();
+}
+
+static inline void __write_unlock_bh(rwlock_t *lock)
+{
+ rwlock_release(&lock->dep_map, 1, _RET_IP_);
+ _raw_write_unlock(lock);
+ preempt_enable_no_resched();
+ local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline int __spin_trylock_bh(spinlock_t *lock)
+{
+ local_bh_disable();
+ preempt_disable();
+ if (_raw_spin_trylock(lock)) {
+ spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+ return 1;
+ }
+ preempt_enable_no_resched();
+ local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+ return 0;
+}
+
#endif /* __LINUX_SPINLOCK_API_SMP_H */
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -143,15 +143,6 @@ static inline void smp_mb__after_lock(vo
*/
#define spin_unlock_wait(lock) __raw_spin_unlock_wait(&(lock)->raw_lock)
-/*
- * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
- */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-# include <linux/spinlock_api_smp.h>
-#else
-# include <linux/spinlock_api_up.h>
-#endif
-
#ifdef CONFIG_DEBUG_SPINLOCK
extern void _raw_spin_lock(spinlock_t *lock);
#define _raw_spin_lock_flags(lock, flags) _raw_spin_lock(lock)
@@ -380,4 +371,13 @@ extern int _atomic_dec_and_lock(atomic_t
*/
#define spin_can_lock(lock) (!spin_is_locked(lock))
+/*
+ * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
+ */
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+# include <linux/spinlock_api_smp.h>
+#else
+# include <linux/spinlock_api_up.h>
+#endif
+
#endif /* __LINUX_SPINLOCK_H */
Index: linux-2.6/kernel/spinlock.c
===================================================================
--- linux-2.6.orig/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -23,40 +23,19 @@
int __lockfunc _spin_trylock(spinlock_t *lock)
{
- preempt_disable();
- if (_raw_spin_trylock(lock)) {
- spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
- return 1;
- }
-
- preempt_enable();
- return 0;
+ return __spin_trylock(lock);
}
EXPORT_SYMBOL(_spin_trylock);
int __lockfunc _read_trylock(rwlock_t *lock)
{
- preempt_disable();
- if (_raw_read_trylock(lock)) {
- rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
- return 1;
- }
-
- preempt_enable();
- return 0;
+ return __read_trylock(lock);
}
EXPORT_SYMBOL(_read_trylock);
int __lockfunc _write_trylock(rwlock_t *lock)
{
- preempt_disable();
- if (_raw_write_trylock(lock)) {
- rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
- return 1;
- }
-
- preempt_enable();
- return 0;
+ return __write_trylock(lock);
}
EXPORT_SYMBOL(_write_trylock);
@@ -69,129 +48,74 @@ EXPORT_SYMBOL(_write_trylock);
void __lockfunc _read_lock(rwlock_t *lock)
{
- preempt_disable();
- rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+ __read_lock(lock);
}
EXPORT_SYMBOL(_read_lock);
unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock)
{
- unsigned long flags;
-
- local_irq_save(flags);
- preempt_disable();
- spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- /*
- * On lockdep we dont want the hand-coded irq-enable of
- * _raw_spin_lock_flags() code, because lockdep assumes
- * that interrupts are not re-enabled during lock-acquire:
- */
-#ifdef CONFIG_LOCKDEP
- LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
-#else
- _raw_spin_lock_flags(lock, &flags);
-#endif
- return flags;
+ return __spin_lock_irqsave(lock);
}
EXPORT_SYMBOL(_spin_lock_irqsave);
void __lockfunc _spin_lock_irq(spinlock_t *lock)
{
- local_irq_disable();
- preempt_disable();
- spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+ __spin_lock_irq(lock);
}
EXPORT_SYMBOL(_spin_lock_irq);
void __lockfunc _spin_lock_bh(spinlock_t *lock)
{
- local_bh_disable();
- preempt_disable();
- spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+ __spin_lock_bh(lock);
}
EXPORT_SYMBOL(_spin_lock_bh);
unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
{
- unsigned long flags;
-
- local_irq_save(flags);
- preempt_disable();
- rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock,
- _raw_read_lock_flags, &flags);
- return flags;
+ return __read_lock_irqsave(lock);
}
EXPORT_SYMBOL(_read_lock_irqsave);
void __lockfunc _read_lock_irq(rwlock_t *lock)
{
- local_irq_disable();
- preempt_disable();
- rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+ __read_lock_irq(lock);
}
EXPORT_SYMBOL(_read_lock_irq);
void __lockfunc _read_lock_bh(rwlock_t *lock)
{
- local_bh_disable();
- preempt_disable();
- rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+ __read_lock_bh(lock);
}
EXPORT_SYMBOL(_read_lock_bh);
unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock)
{
- unsigned long flags;
-
- local_irq_save(flags);
- preempt_disable();
- rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock,
- _raw_write_lock_flags, &flags);
- return flags;
+ return __write_lock_irqsave(lock);
}
EXPORT_SYMBOL(_write_lock_irqsave);
void __lockfunc _write_lock_irq(rwlock_t *lock)
{
- local_irq_disable();
- preempt_disable();
- rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+ __write_lock_irq(lock);
}
EXPORT_SYMBOL(_write_lock_irq);
void __lockfunc _write_lock_bh(rwlock_t *lock)
{
- local_bh_disable();
- preempt_disable();
- rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+ __write_lock_bh(lock);
}
EXPORT_SYMBOL(_write_lock_bh);
void __lockfunc _spin_lock(spinlock_t *lock)
{
- preempt_disable();
- spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+ __spin_lock(lock);
}
-
EXPORT_SYMBOL(_spin_lock);
void __lockfunc _write_lock(rwlock_t *lock)
{
- preempt_disable();
- rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+ __write_lock(lock);
}
-
EXPORT_SYMBOL(_write_lock);
#else /* CONFIG_PREEMPT: */
@@ -320,121 +244,79 @@ EXPORT_SYMBOL(_spin_lock_nest_lock);
void __lockfunc _spin_unlock(spinlock_t *lock)
{
- spin_release(&lock->dep_map, 1, _RET_IP_);
- _raw_spin_unlock(lock);
- preempt_enable();
+ __spin_unlock(lock);
}
EXPORT_SYMBOL(_spin_unlock);
void __lockfunc _write_unlock(rwlock_t *lock)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_write_unlock(lock);
- preempt_enable();
+ __write_unlock(lock);
}
EXPORT_SYMBOL(_write_unlock);
void __lockfunc _read_unlock(rwlock_t *lock)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_read_unlock(lock);
- preempt_enable();
+ __read_unlock(lock);
}
EXPORT_SYMBOL(_read_unlock);
void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
{
- spin_release(&lock->dep_map, 1, _RET_IP_);
- _raw_spin_unlock(lock);
- local_irq_restore(flags);
- preempt_enable();
+ __spin_unlock_irqrestore(lock, flags);
}
EXPORT_SYMBOL(_spin_unlock_irqrestore);
void __lockfunc _spin_unlock_irq(spinlock_t *lock)
{
- spin_release(&lock->dep_map, 1, _RET_IP_);
- _raw_spin_unlock(lock);
- local_irq_enable();
- preempt_enable();
+ __spin_unlock_irq(lock);
}
EXPORT_SYMBOL(_spin_unlock_irq);
void __lockfunc _spin_unlock_bh(spinlock_t *lock)
{
- spin_release(&lock->dep_map, 1, _RET_IP_);
- _raw_spin_unlock(lock);
- preempt_enable_no_resched();
- local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+ __spin_unlock_bh(lock);
}
EXPORT_SYMBOL(_spin_unlock_bh);
void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_read_unlock(lock);
- local_irq_restore(flags);
- preempt_enable();
+ __read_unlock_irqrestore(lock, flags);
}
EXPORT_SYMBOL(_read_unlock_irqrestore);
void __lockfunc _read_unlock_irq(rwlock_t *lock)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_read_unlock(lock);
- local_irq_enable();
- preempt_enable();
+ __read_unlock_irq(lock);
}
EXPORT_SYMBOL(_read_unlock_irq);
void __lockfunc _read_unlock_bh(rwlock_t *lock)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_read_unlock(lock);
- preempt_enable_no_resched();
- local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+ __read_unlock_bh(lock);
}
EXPORT_SYMBOL(_read_unlock_bh);
void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_write_unlock(lock);
- local_irq_restore(flags);
- preempt_enable();
+ __write_unlock_irqrestore(lock, flags);
}
EXPORT_SYMBOL(_write_unlock_irqrestore);
void __lockfunc _write_unlock_irq(rwlock_t *lock)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_write_unlock(lock);
- local_irq_enable();
- preempt_enable();
+ __write_unlock_irq(lock);
}
EXPORT_SYMBOL(_write_unlock_irq);
void __lockfunc _write_unlock_bh(rwlock_t *lock)
{
- rwlock_release(&lock->dep_map, 1, _RET_IP_);
- _raw_write_unlock(lock);
- preempt_enable_no_resched();
- local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+ __write_unlock_bh(lock);
}
EXPORT_SYMBOL(_write_unlock_bh);
int __lockfunc _spin_trylock_bh(spinlock_t *lock)
{
- local_bh_disable();
- preempt_disable();
- if (_raw_spin_trylock(lock)) {
- spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
- return 1;
- }
-
- preempt_enable_no_resched();
- local_bh_enable_ip((unsigned long)__builtin_return_address(0));
- return 0;
+ return __spin_trylock_bh(lock);
}
EXPORT_SYMBOL(_spin_trylock_bh);
--
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 2/3] spinlock: allow inlined spinlocks
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
@ 2009-08-12 18:39 ` Heiko Carstens
2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
3 siblings, 0 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
[-- Attachment #1: 02_spinlock_inline.diff --]
[-- Type: text/plain, Size: 4678 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Add a new config option, SPINLOCK_INLINE, and some defines which depend
on it, in order to generate inlined spinlock code instead of out-of-line
code.
Avoiding function calls for spinlocks gives 1%-5% less CPU usage in
network benchmarks on s390.
Architectures must select HAVE_SPINLOCK_INLINE_SUPPORT to make this
config option available.
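For illustration, with CONFIG_SPINLOCK_INLINE enabled a spin_lock() call
now resolves roughly like this (a condensed sketch of the preprocessor
chain, not part of the diff below):

/* include/linux/spinlock.h (pre-existing, simplified): */
#define spin_lock(lock)		_spin_lock(lock)

/* include/linux/spinlock_api_smp.h (this patch): */
#define _spin_lock(lock)	__spin_lock(lock)

/* __spin_lock() is the static inline added by patch 1, so the whole
 * preempt_disable() + _raw_spin_lock() sequence is emitted at the
 * call site instead of behind a function call. */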
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
include/linux/spinlock_api_smp.h | 35 +++++++++++++++++++++++++++++++++++
kernel/spinlock.c | 4 ++++
lib/Kconfig.debug | 14 ++++++++++++++
3 files changed, 53 insertions(+)
Index: linux-2.6/include/linux/spinlock_api_smp.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -19,6 +19,8 @@ int in_lock_functions(unsigned long addr
#define assert_spin_locked(x) BUG_ON(!spin_is_locked(x))
+#ifndef CONFIG_SPINLOCK_INLINE
+
void __lockfunc _spin_lock(spinlock_t *lock) __acquires(lock);
void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass)
__acquires(lock);
@@ -60,6 +62,39 @@ void __lockfunc _read_unlock_irqrestore(
void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
__releases(lock);
+#else /* CONFIG_SPINLOCK_INLINE */
+
+#define _spin_trylock(lock) __spin_trylock(lock)
+#define _read_trylock(lock) __read_trylock(lock)
+#define _write_trylock(lock) __write_trylock(lock)
+#define _read_lock(lock) __read_lock(lock)
+#define _spin_lock_irqsave(lock) __spin_lock_irqsave(lock)
+#define _spin_lock_irq(lock) __spin_lock_irq(lock)
+#define _spin_lock_bh(lock) __spin_lock_bh(lock)
+#define _read_lock_irqsave(lock) __read_lock_irqsave(lock)
+#define _read_lock_irq(lock) __read_lock_irq(lock)
+#define _read_lock_bh(lock) __read_lock_bh(lock)
+#define _write_lock_irqsave(lock) __write_lock_irqsave(lock)
+#define _write_lock_irq(lock) __write_lock_irq(lock)
+#define _write_lock_bh(lock) __write_lock_bh(lock)
+#define _spin_lock(lock) __spin_lock(lock)
+#define _write_lock(lock) __write_lock(lock)
+#define _spin_unlock(lock) __spin_unlock(lock)
+#define _write_unlock(lock) __write_unlock(lock)
+#define _read_unlock(lock) __read_unlock(lock)
+#define _spin_unlock_irq(lock) __spin_unlock_irq(lock)
+#define _spin_unlock_bh(lock) __spin_unlock_bh(lock)
+#define _read_unlock_irq(lock) __read_unlock_irq(lock)
+#define _read_unlock_bh(lock) __read_unlock_bh(lock)
+#define _write_unlock_irq(lock) __write_unlock_irq(lock)
+#define _write_unlock_bh(lock) __write_unlock_bh(lock)
+#define _spin_trylock_bh(lock) __spin_trylock_bh(lock)
+#define _spin_unlock_irqrestore(lock, flags) __spin_unlock_irqrestore(lock, flags)
+#define _read_unlock_irqrestore(lock, flags) __read_unlock_irqrestore(lock, flags)
+#define _write_unlock_irqrestore(lock, flags) __write_unlock_irqrestore(lock, flags)
+
+#endif /* CONFIG_SPINLOCK_INLINE */
+
static inline int __spin_trylock(spinlock_t *lock)
{
preempt_disable();
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug
+++ linux-2.6/lib/Kconfig.debug
@@ -879,6 +879,20 @@ config SYSCTL_SYSCALL_CHECK
to properly maintain and use. This enables checks that help
you to keep things correct.
+config HAVE_SPINLOCK_INLINE_SUPPORT
+ bool
+
+config SPINLOCK_INLINE
+ bool "Inline spinlock code"
+ depends on HAVE_SPINLOCK_INLINE_SUPPORT
+ depends on !DEBUG_SPINLOCK
+ depends on SMP && !PREEMPT
+ help
+ Select this option if you want to have inlined spinlock code instead
+ of an out-of-line implementation.
+ This will generate a larger kernel image. On some architectures this
+ increases performance.
+
source mm/Kconfig.debug
source kernel/trace/Kconfig
Index: linux-2.6/kernel/spinlock.c
===================================================================
--- linux-2.6.orig/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -21,6 +21,8 @@
#include <linux/debug_locks.h>
#include <linux/module.h>
+#ifndef CONFIG_SPINLOCK_INLINE
+
int __lockfunc _spin_trylock(spinlock_t *lock)
{
return __spin_trylock(lock);
@@ -320,6 +322,8 @@ int __lockfunc _spin_trylock_bh(spinlock
}
EXPORT_SYMBOL(_spin_trylock_bh);
+#endif /* CONFIG_SPINLOCK_INLINE */
+
notrace int in_lock_functions(unsigned long addr)
{
/* Linker adds these: start and end of __lockfunc functions */
--
^ permalink raw reply [flat|nested] 21+ messages in thread
* [patch 3/3] spinlock: allow inlined spinlocks on s390
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
@ 2009-08-12 18:39 ` Heiko Carstens
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
3 siblings, 0 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
[-- Attachment #1: 03_spinlock_s390.diff --]
[-- Type: text/plain, Size: 680 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/Kconfig | 1 +
1 file changed, 1 insertion(+)
Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig
+++ linux-2.6/arch/s390/Kconfig
@@ -93,6 +93,7 @@ config S390
select HAVE_KRETPROBES
select HAVE_KVM if 64BIT
select HAVE_ARCH_TRACEHOOK
+ select HAVE_SPINLOCK_INLINE_SUPPORT
select INIT_ALL_POSSIBLE
select HAVE_PERF_COUNTERS
select GENERIC_ATOMIC64 if !64BIT
--
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
` (2 preceding siblings ...)
2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
@ 2009-08-13 18:11 ` Linus Torvalds
2009-08-13 18:34 ` Ingo Molnar
2009-08-14 12:34 ` Heiko Carstens
3 siblings, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2009-08-13 18:11 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
On Wed, 12 Aug 2009, Heiko Carstens wrote:
>
> This patch set makes it possible to have inlined spinlocks again.
I'm not convinced this is the right approach.
The thing is, CONFIG_SPINLOCK_INLINE doesn't look like the right thing to
do. On x86, for example, function calls are usually quite cheap, and it
may well make sense to keep 'spin_lock()' out-of-line, especially once you
start doing things like playing with the softirq counts or interrupts.
At the same time, the special case of spin *unlock* is generally just a
single (small) instruction on x86, so that one almost certainly is worth
inlining - at least as long as it's not one of the softirq ones.
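(For illustration - a generic sketch using a GCC builtin, not the actual
x86 code:)

typedef struct { volatile unsigned int slock; } sketch_spinlock_t;

static inline void sketch_spin_unlock(sketch_spinlock_t *lock)
{
	/* release the lock; on x86 this compiles down to a single store */
	__sync_lock_release(&lock->slock);
}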
In other words, I think your hammer is too large. If you want to inline
the lock functions, you should _not_ make this a single "everything or
nothing" thing.
> If one considers that server kernels are usually compiled with
> !CONFIG_PREEMPT, a simple spin_lock is just a compare-and-swap loop.
> The extra overhead of a function call is significant.
> With inlined spinlocks, overall CPU usage gets reduced by 1%-5% on s390.
> These numbers were taken with some network benchmarks. However, I expect
> any workload that frequently calls into the kernel and grabs a few
> locks to perform better.
Have you tried to determine which kinds of spinlocks you care most about?
With networking, I guess the softirq-safe ones (and perhaps the irq ones)
are pretty common, but maybe even there it's really just a couple of
cases.
Now, there is one advantage to inlining that I really like - and that I
personally have missed from the days when we used to always do it:
profiling. Inlining the spinlocks tends to do _wonders_ for showing
exactly which spinlocks end up being the contention cases for certain
loads.
So there are advantages outside of the "size vs speed" kind of things, but
at the same time I do suspect that the disadvantages mean that we really
should make this be something where an architecture can specify things at
a finer granularity. Perhaps by using per-symbol defines (i.e. have the
<asm/spinlock.h> file do things like
#define __spin_lock_is_small
#define __spin_unlock_is_small
kind of things for the particular sequences that are worth inlining).
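A rough sketch of how the generic code could consume such per-symbol
defines (hypothetical names - nothing like this exists yet):

/* <asm/spinlock.h> opts individual operations in: */
#define __spin_unlock_is_small

/* ...and include/linux/spinlock_api_smp.h keys off them: */
#ifdef __spin_unlock_is_small
#define _spin_unlock(lock)	__spin_unlock(lock)	/* inlined */
#else
void __lockfunc _spin_unlock(spinlock_t *lock) __releases(lock);
#endif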
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
@ 2009-08-13 18:34 ` Ingo Molnar
2009-08-13 18:43 ` Ingo Molnar
2009-08-14 12:34 ` Heiko Carstens
1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2009-08-13 18:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Now, there is one advantage to inlining that I really like - and
> that I personally have missed from the days when we used to
> always do it: profiling. Inlining the spinlocks tends to do
> _wonders_ for showing exactly which spinlocks end up being the
> contention cases for certain loads. [...]
We've got a new tool to determine that kind of information though:
perf + call-graph profiling.
This is how it works and how it can be used. Say we want to profile
'bw_pipe' from lmbench. The traditional way is to record a flat
profile:
aldebaran:~/l> perf record -f ./bw_pipe
Pipe bandwidth: 4607.33 MB/sec
[ perf record: Captured and wrote 2.503 MB perf.data (~109368 samples) ]
and then to view the flat profile:
aldebaran:~/l> perf report
# Samples: 109331
#
# Overhead Command Shared Object Symbol
# ........ ....... ........................ ......
#
39.61% bw_pipe [kernel] [k] copy_user_generic_string
19.32% bw_pipe ./bw_pipe [.] enough_duration
6.21% bw_pipe [kernel] [k] mutex_spin_on_owner
3.54% bw_pipe ./bw_pipe [.] two_op
3.54% bw_pipe ./bw_pipe [.] one_op
2.79% bw_pipe [kernel] [k] get_page_from_freelist
2.16% bw_pipe [kernel] [k] ia32_sysenter_target
1.91% bw_pipe [kernel] [k] pipe_read
1.83% bw_pipe [kernel] [k] pipe_write
1.63% bw_pipe [kernel] [k] free_hot_cold_page
1.21% bw_pipe [kernel] [k] put_page
1.19% bw_pipe [kernel] [k] __alloc_pages_nodemask
0.99% bw_pipe [kernel] [k] anon_pipe_buf_release
0.74% bw_pipe [kernel] [k] get_pageblock_flags_group
Annoyingly enough, mutex_spin_on_owner() is in the profile but we
have no idea what the parent function(s) are. (Plus there's no way
we could reasonably inline mutex functions.) So we are stuck with
guessing which of the many other functions above (or some other
function which does not even get listed!) has that mutex spinning
overhead.
But if you do the recording via call-graph traces:
aldebaran:~/l> perf record -g -f ./bw_pipe
Pipe bandwidth: 4432.75 MB/sec
[ perf record: Captured and wrote 7.550 MB perf.data (~329886 samples) ]
(note the -g/--graph option). Then, viewing the profile via perf
report, we get:
aldebaran:~/l> perf report -S _spin_lock
# symbol: _spin_lock
# Samples: 724
#
# Overhead Command Shared Object
# ........ ....... ........................
#
98.34% lat_tcp [kernel]
|
|--31.74%-- ipt_do_table
| |
| |--26.99%-- ipt_local_in_hook
| | nf_iterate
| | nf_hook_slow
| | ip_local_deliver
| | ip_rcv_finish
| | ip_rcv
| | netif_receive_skb
| | process_backlog
| | net_rx_action
| | __do_softirq
| | call_softirq
| | do_softirq
| | _local_bh_enable_ip
| | local_bh_enable
| | dev_queue_xmit
| | ip_finish_output2
| | ip_finish_output
| | ip_output
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_write_xmit
| | __tcp_push_pending_frames
| | tcp_push
| | tcp_sendmsg
| | __sock_sendmsg
| | sock_aio_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| |--24.34%-- ipt_local_in_hook
| | nf_iterate
| | nf_hook_slow
| | ip_local_deliver
| | ip_rcv_finish
| | ip_rcv
| | netif_receive_skb
| | process_backlog
| | net_rx_action
| | __do_softirq
| | call_softirq
| | do_softirq
| | _local_bh_enable_ip
| | local_bh_enable
| | dev_queue_xmit
| | ip_finish_output2
| | ip_finish_output
| | ip_output
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_write_xmit
| | __tcp_push_pending_frames
| | tcp_push
| | tcp_sendmsg
| | __sock_sendmsg
| | sock_aio_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| |--16.81%-- ipt_pre_routing_hook
| | nf_iterate
| | nf_hook_slow
| | ip_rcv
| | netif_receive_skb
| | process_backlog
| | net_rx_action
| | __do_softirq
| | call_softirq
| | do_softirq
| | _local_bh_enable_ip
| | local_bh_enable
| | dev_queue_xmit
| | ip_finish_output2
| | ip_finish_output
| | ip_output
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_write_xmit
| | __tcp_push_pending_frames
| | tcp_push
| | tcp_sendmsg
| | __sock_sendmsg
| | sock_aio_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| |--11.95%-- ipt_post_routing_hook
| | nf_iterate
| | nf_hook_slow
| | ip_output
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_write_xmit
| | __tcp_push_pending_frames
| | tcp_push
| | tcp_sendmsg
| | __sock_sendmsg
| | sock_aio_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| |--10.18%-- ipt_local_out_hook
| | nf_iterate
| | nf_hook_slow
| | __ip_local_out
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_write_xmit
| | __tcp_push_pending_frames
| | tcp_push
| | tcp_sendmsg
| | __sock_sendmsg
| | sock_aio_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| --9.73%-- ipt_local_hook
| nf_iterate
| nf_hook_slow
| __ip_local_out
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_write_xmit
| __tcp_push_pending_frames
| tcp_push
| tcp_sendmsg
| __sock_sendmsg
| sock_aio_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--31.46%-- __mod_timer
| mod_timer
| sk_reset_timer
| |
| |--55.36%-- tcp_send_delayed_ack
| | __tcp_ack_snd_check
| | tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_aio_read
| | do_sync_read
| | vfs_read
| | sys_read
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| --44.64%-- tcp_v4_rcv
| ip_local_deliver_finish
| ip_local_deliver
| ip_rcv_finish
| ip_rcv
| netif_receive_skb
| process_backlog
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| _local_bh_enable_ip
| local_bh_enable
| dev_queue_xmit
| ip_finish_output2
| ip_finish_output
| ip_output
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_write_xmit
| __tcp_push_pending_frames
| tcp_push
| tcp_sendmsg
| __sock_sendmsg
| sock_aio_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--21.21%-- task_rq_lock
| try_to_wake_up
| default_wake_function
| |
| |--99.34%-- autoremove_wake_function
| | __wake_up_common
| | __wake_up
| | tcp_v4_rcv
| | ip_local_deliver_finish
| | ip_local_deliver
| | ip_rcv_finish
| | ip_rcv
| | netif_receive_skb
| | process_backlog
| | net_rx_action
| | __do_softirq
| | call_softirq
| | do_softirq
| | _local_bh_enable_ip
| | local_bh_enable
| | dev_queue_xmit
| | ip_finish_output2
| | ip_finish_output
| | ip_output
| | ip_local_out
| | ip_queue_xmit
| | tcp_transmit_skb
| | tcp_write_xmit
| | __tcp_push_pending_frames
| | tcp_push
| | tcp_sendmsg
| | __sock_sendmsg
| | sock_aio_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| --0.66%-- pollwake
| __wake_up_common
| __wake_up
| perf_counter_wakeup
| perf_pending_counter
| perf_counter_do_pending
| smp_perf_pending_interrupt
| perf_pending_interrupt
| 0xf7f50430
| 0x5
|
|--5.06%-- tcp_v4_rcv
| ip_local_deliver_finish
| ip_local_deliver
| ip_rcv_finish
| ip_rcv
| netif_receive_skb
| process_backlog
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| _local_bh_enable_ip
| local_bh_enable
| dev_queue_xmit
| ip_finish_output2
| ip_finish_output
| ip_output
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_write_xmit
| __tcp_push_pending_frames
| tcp_push
| tcp_sendmsg
| __sock_sendmsg
| sock_aio_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--4.07%-- __perf_counter_sched_out
| perf_counter_task_sched_out
| schedule
| schedule_timeout
| sk_wait_data
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_aio_read
| do_sync_read
| vfs_read
| sys_read
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--1.40%-- ipt_local_hook
| nf_iterate
| nf_hook_slow
| __ip_local_out
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_write_xmit
| __tcp_push_pending_frames
| tcp_push
| tcp_sendmsg
| __sock_sendmsg
| sock_aio_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--1.26%-- mod_timer
| sk_reset_timer
| |
| |--66.67%-- tcp_send_delayed_ack
| | __tcp_ack_snd_check
| | tcp_rcv_established
| | tcp_v4_do_rcv
| | tcp_prequeue_process
| | tcp_recvmsg
| | sock_common_recvmsg
| | __sock_recvmsg
| | sock_aio_read
| | do_sync_read
| | vfs_read
| | sys_read
| | sysenter_dispatch
| | 0xf7f50430
| | 0x5
| |
| --33.33%-- tcp_v4_rcv
| ip_local_deliver_finish
| ip_local_deliver
| ip_rcv_finish
| ip_rcv
| netif_receive_skb
| process_backlog
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| _local_bh_enable_ip
| local_bh_enable
| dev_queue_xmit
| ip_finish_output2
| ip_finish_output
| ip_output
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_write_xmit
| __tcp_push_pending_frames
| tcp_push
| tcp_sendmsg
| __sock_sendmsg
| sock_aio_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--0.70%-- ipt_local_in_hook
| nf_iterate
| nf_hook_slow
| ip_local_deliver
| ip_rcv_finish
| ip_rcv
| netif_receive_skb
| process_backlog
| net_rx_action
| __do_softirq
| call_softirq
| do_softirq
| _local_bh_enable_ip
| local_bh_enable
| dev_queue_xmit
| ip_finish_output2
| ip_finish_output
| ip_output
| ip_local_out
| ip_queue_xmit
| tcp_transmit_skb
| tcp_write_xmit
| __tcp_push_pending_frames
| tcp_push
| tcp_sendmsg
| __sock_sendmsg
| sock_aio_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f50430
| 0x5
|
|--0.56%-- perf_counter_task_sched_out
| schedule
| schedule_timeout
| sk_wait_data
| tcp_recvmsg
| sock_common_recvmsg
| __sock_recvmsg
| sock_aio_read
| do_sync_read
| vfs_read
| sys_read
| sysenter_dispatch
| 0xf7f50430
| 0x5
--2.53%-- [...]
1.66% perf [kernel]
|
|--25.00%-- journal_stop
| __ext3_journal_stop
| ext3_writeback_write_end
| generic_perform_write
| generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| ext3_file_write
| do_sync_write
| vfs_write
| sys_write
| system_call_fastpath
| __write_nocancel
|
|--16.67%-- __perf_counter_sched_out
| perf_counter_task_sched_out
| schedule
| schedule_hrtimeout_range
| poll_schedule_timeout
| do_poll
| do_sys_poll
| sys_poll
| system_call_fastpath
| __GI___poll
|
|--8.33%-- dup_mmap
| dup_mm
| copy_process
| do_fork
| sys_clone
| stub_clone
| fork
| __cmd_record
|
|--8.33%-- hrtimer_interrupt
| smp_apic_timer_interrupt
| apic_timer_interrupt
| __write_nocancel
|
|--8.33%-- journal_start
| ext3_journal_start_sb
| ext3_write_begin
| generic_perform_write
| generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| ext3_file_write
| do_sync_write
| vfs_write
| sys_write
| system_call_fastpath
| __write_nocancel
|
|--8.33%-- create_empty_buffers
| __block_prepare_write
| block_write_begin
| ext3_write_begin
| generic_perform_write
| generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| ext3_file_write
| do_sync_write
| vfs_write
| sys_write
| system_call_fastpath
| __write_nocancel
|
|--8.33%-- perf_counter_task_sched_out
| schedule
| schedule_hrtimeout_range
| poll_schedule_timeout
| do_poll
| do_sys_poll
| sys_poll
| system_call_fastpath
| __GI___poll
|
|--8.33%-- ext3_alloc_branch
| ext3_get_blocks_handle
| ext3_get_block
| __block_prepare_write
| block_write_begin
| ext3_write_begin
| generic_perform_write
| generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| ext3_file_write
| do_sync_write
| vfs_write
| sys_write
| system_call_fastpath
| __write_nocancel
|
--8.33%-- start_this_handle
journal_start
ext3_journal_start_sb
ext3_write_begin
generic_perform_write
generic_file_buffered_write
__generic_file_aio_write_nolock
generic_file_aio_write
ext3_file_write
do_sync_write
vfs_write
sys_write
system_call_fastpath
__write_nocancel
the graph above (which is the sub-graph of the full graph, limited
via the -S/--symbols option to all callchains that intersect with
the _spin_lock symbol) details all the various code-paths that make
use of _spin_lock(), and the weights (and relative weights) of
those codepaths.
Note that this works for just about any locking primitive, and note
how it's recursive. If we inlined spin_lock we'd only know this
__mod_timer overhead:
|--31.46%-- __mod_timer
but we wouldn't know the second level (which is interesting too):
| |--55.36%-- tcp_send_delayed_ack
| --44.64%-- tcp_v4_rcv
All in all, I think we should detach the issue of instrumentation
and profiling from the issue of whether to inline or not. We should
inline where it makes things faster. Our profiling tools are able
to deal with it, independently of whether we inline or not.
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-13 18:34 ` Ingo Molnar
@ 2009-08-13 18:43 ` Ingo Molnar
0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2009-08-13 18:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
* Ingo Molnar <mingo@elte.hu> wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > Now, there is one advantage to inlining that I really like - and
> > that I personally have missed from the days when we used to
> > always do it: profiling. Inlining the spinlocks tends to do
> > _wonders_ for showing exactly which spinlocks end up being the
> > contention cases for certain loads. [...]
>
> We've got a new tool to determine that kind of information though:
> perf + call-graph profiling.
>
> This is how it works and how it can be used. Say we want to profile
> 'bw_pipe' from lmbench. The traditional way is to record a flat
> profile:
>
> aldebaran:~/l> perf record -f ./bw_pipe
> Pipe bandwidth: 4607.33 MB/sec
> [ perf record: Captured and wrote 2.503 MB perf.data (~109368 samples) ]
>
> and then to view the flat profile:
>
> aldebaran:~/l> perf report
> # Samples: 109331
> #
> # Overhead Command Shared Object Symbol
> # ........ ....... ........................ ......
> #
> 39.61% bw_pipe [kernel] [k] copy_user_generic_string
> 19.32% bw_pipe ./bw_pipe [.] enough_duration
> 6.21% bw_pipe [kernel] [k] mutex_spin_on_owner
> 3.54% bw_pipe ./bw_pipe [.] two_op
> 3.54% bw_pipe ./bw_pipe [.] one_op
> 2.79% bw_pipe [kernel] [k] get_page_from_freelist
> 2.16% bw_pipe [kernel] [k] ia32_sysenter_target
> 1.91% bw_pipe [kernel] [k] pipe_read
> 1.83% bw_pipe [kernel] [k] pipe_write
> 1.63% bw_pipe [kernel] [k] free_hot_cold_page
> 1.21% bw_pipe [kernel] [k] put_page
> 1.19% bw_pipe [kernel] [k] __alloc_pages_nodemask
> 0.99% bw_pipe [kernel] [k] anon_pipe_buf_release
> 0.74% bw_pipe [kernel] [k] get_pageblock_flags_group
>
> Annoyingly enough, mutex_spin_on_owner() is in the profile but we
> have no idea what the parent function(s) are. (Plus there's no way
> we could reasonably inline mutex functions.) So we are stuck with
> guessing which of the many other functions above (or some other
> function which does not even get listed!) has that mutex spinning
> overhead.
>
> But if you do the recording via call-graph traces:
>
> aldebaran:~/l> perf record -g -f ./bw_pipe
> Pipe bandwidth: 4432.75 MB/sec
> [ perf record: Captured and wrote 7.550 MB perf.data (~329886 samples) ]
>
> (note the -g/--graph option). Then, viewing the profile via perf
> report, we get:
Sorry - the mail got trimmed due to a cut&paste mishap. Here's the
missing bw_pipe profile:
# Samples: 81007
#
# Overhead Command Shared Object Symbol
# ........ ....... ........................ ......
#
7.60% bw_pipe [kernel] [k] mutex_spin_on_owner
|
|--99.97%-- __mutex_lock_common
| __mutex_lock_slowpath
| mutex_lock
| |
| |--98.46%-- pipe_write
| | do_sync_write
| | vfs_write
| | sys_write
| | sysenter_dispatch
| | 0xf7f4f430
| | 0x600000007
| |
| --1.54%-- pipe_lock
| pipe_wait
| pipe_write
| do_sync_write
| vfs_write
| sys_write
| sysenter_dispatch
| 0xf7f4f430
| 0x600000007
--0.03%-- [...]
It shows that the mutex overhead is distributed between these two
functions:
| |--98.46%-- pipe_write
| --1.54%-- pipe_lock
I also wanted to show a more complex example of how spin-lock
overhead gets distributed in the networking stack, in a
"lat_tcp localhost" workload:
>
> aldebaran:~/l> perf report -S _spin_lock
>
> # symbol: _spin_lock
> # Samples: 724
> #
> # Overhead Command Shared Object
> # ........ ....... ........................
> #
> 98.34% lat_tcp [kernel]
> |
> |--31.74%-- ipt_do_table
> | |
> | |--26.99%-- ipt_local_in_hook
> | | nf_iterate
> | | nf_hook_slow
> | | ip_local_deliver
> | | ip_rcv_finish
> | | ip_rcv
> | | netif_receive_skb
> | | process_backlog
> | | net_rx_action
> | | __do_softirq
> | | call_softirq
> | | do_softirq
> | | _local_bh_enable_ip
> | | local_bh_enable
> | | dev_queue_xmit
> | | ip_finish_output2
> | | ip_finish_output
> | | ip_output
> | | ip_local_out
> | | ip_queue_xmit
> | | tcp_transmit_skb
> | | tcp_write_xmit
> | | __tcp_push_pending_frames
> | | tcp_push
> | | tcp_sendmsg
> | | __sock_sendmsg
> | | sock_aio_write
> | | do_sync_write
> | | vfs_write
> | | sys_write
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | |--24.34%-- ipt_local_in_hook
> | | nf_iterate
> | | nf_hook_slow
> | | ip_local_deliver
> | | ip_rcv_finish
> | | ip_rcv
> | | netif_receive_skb
> | | process_backlog
> | | net_rx_action
> | | __do_softirq
> | | call_softirq
> | | do_softirq
> | | _local_bh_enable_ip
> | | local_bh_enable
> | | dev_queue_xmit
> | | ip_finish_output2
> | | ip_finish_output
> | | ip_output
> | | ip_local_out
> | | ip_queue_xmit
> | | tcp_transmit_skb
> | | tcp_write_xmit
> | | __tcp_push_pending_frames
> | | tcp_push
> | | tcp_sendmsg
> | | __sock_sendmsg
> | | sock_aio_write
> | | do_sync_write
> | | vfs_write
> | | sys_write
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | |--16.81%-- ipt_pre_routing_hook
> | | nf_iterate
> | | nf_hook_slow
> | | ip_rcv
> | | netif_receive_skb
> | | process_backlog
> | | net_rx_action
> | | __do_softirq
> | | call_softirq
> | | do_softirq
> | | _local_bh_enable_ip
> | | local_bh_enable
> | | dev_queue_xmit
> | | ip_finish_output2
> | | ip_finish_output
> | | ip_output
> | | ip_local_out
> | | ip_queue_xmit
> | | tcp_transmit_skb
> | | tcp_write_xmit
> | | __tcp_push_pending_frames
> | | tcp_push
> | | tcp_sendmsg
> | | __sock_sendmsg
> | | sock_aio_write
> | | do_sync_write
> | | vfs_write
> | | sys_write
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | |--11.95%-- ipt_post_routing_hook
> | | nf_iterate
> | | nf_hook_slow
> | | ip_output
> | | ip_local_out
> | | ip_queue_xmit
> | | tcp_transmit_skb
> | | tcp_write_xmit
> | | __tcp_push_pending_frames
> | | tcp_push
> | | tcp_sendmsg
> | | __sock_sendmsg
> | | sock_aio_write
> | | do_sync_write
> | | vfs_write
> | | sys_write
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | |--10.18%-- ipt_local_out_hook
> | | nf_iterate
> | | nf_hook_slow
> | | __ip_local_out
> | | ip_local_out
> | | ip_queue_xmit
> | | tcp_transmit_skb
> | | tcp_write_xmit
> | | __tcp_push_pending_frames
> | | tcp_push
> | | tcp_sendmsg
> | | __sock_sendmsg
> | | sock_aio_write
> | | do_sync_write
> | | vfs_write
> | | sys_write
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | --9.73%-- ipt_local_hook
> | nf_iterate
> | nf_hook_slow
> | __ip_local_out
> | ip_local_out
> | ip_queue_xmit
> | tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg
> | __sock_sendmsg
> | sock_aio_write
> | do_sync_write
> | vfs_write
> | sys_write
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--31.46%-- __mod_timer
> | mod_timer
> | sk_reset_timer
> | |
> | |--55.36%-- tcp_send_delayed_ack
> | | __tcp_ack_snd_check
> | | tcp_rcv_established
> | | tcp_v4_do_rcv
> | | tcp_prequeue_process
> | | tcp_recvmsg
> | | sock_common_recvmsg
> | | __sock_recvmsg
> | | sock_aio_read
> | | do_sync_read
> | | vfs_read
> | | sys_read
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | --44.64%-- tcp_v4_rcv
> | ip_local_deliver_finish
> | ip_local_deliver
> | ip_rcv_finish
> | ip_rcv
> | netif_receive_skb
> | process_backlog
> | net_rx_action
> | __do_softirq
> | call_softirq
> | do_softirq
> | _local_bh_enable_ip
> | local_bh_enable
> | dev_queue_xmit
> | ip_finish_output2
> | ip_finish_output
> | ip_output
> | ip_local_out
> | ip_queue_xmit
> | tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg
> | __sock_sendmsg
> | sock_aio_write
> | do_sync_write
> | vfs_write
> | sys_write
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--21.21%-- task_rq_lock
> | try_to_wake_up
> | default_wake_function
> | |
> | |--99.34%-- autoremove_wake_function
> | | __wake_up_common
> | | __wake_up
> | | tcp_v4_rcv
> | | ip_local_deliver_finish
> | | ip_local_deliver
> | | ip_rcv_finish
> | | ip_rcv
> | | netif_receive_skb
> | | process_backlog
> | | net_rx_action
> | | __do_softirq
> | | call_softirq
> | | do_softirq
> | | _local_bh_enable_ip
> | | local_bh_enable
> | | dev_queue_xmit
> | | ip_finish_output2
> | | ip_finish_output
> | | ip_output
> | | ip_local_out
> | | ip_queue_xmit
> | | tcp_transmit_skb
> | | tcp_write_xmit
> | | __tcp_push_pending_frames
> | | tcp_push
> | | tcp_sendmsg
> | | __sock_sendmsg
> | | sock_aio_write
> | | do_sync_write
> | | vfs_write
> | | sys_write
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | --0.66%-- pollwake
> | __wake_up_common
> | __wake_up
> | perf_counter_wakeup
> | perf_pending_counter
> | perf_counter_do_pending
> | smp_perf_pending_interrupt
> | perf_pending_interrupt
> | 0xf7f50430
> | 0x5
> |
> |--5.06%-- tcp_v4_rcv
> | ip_local_deliver_finish
> | ip_local_deliver
> | ip_rcv_finish
> | ip_rcv
> | netif_receive_skb
> | process_backlog
> | net_rx_action
> | __do_softirq
> | call_softirq
> | do_softirq
> | _local_bh_enable_ip
> | local_bh_enable
> | dev_queue_xmit
> | ip_finish_output2
> | ip_finish_output
> | ip_output
> | ip_local_out
> | ip_queue_xmit
> | tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg
> | __sock_sendmsg
> | sock_aio_write
> | do_sync_write
> | vfs_write
> | sys_write
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--4.07%-- __perf_counter_sched_out
> | perf_counter_task_sched_out
> | schedule
> | schedule_timeout
> | sk_wait_data
> | tcp_recvmsg
> | sock_common_recvmsg
> | __sock_recvmsg
> | sock_aio_read
> | do_sync_read
> | vfs_read
> | sys_read
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--1.40%-- ipt_local_hook
> | nf_iterate
> | nf_hook_slow
> | __ip_local_out
> | ip_local_out
> | ip_queue_xmit
> | tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg
> | __sock_sendmsg
> | sock_aio_write
> | do_sync_write
> | vfs_write
> | sys_write
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--1.26%-- mod_timer
> | sk_reset_timer
> | |
> | |--66.67%-- tcp_send_delayed_ack
> | | __tcp_ack_snd_check
> | | tcp_rcv_established
> | | tcp_v4_do_rcv
> | | tcp_prequeue_process
> | | tcp_recvmsg
> | | sock_common_recvmsg
> | | __sock_recvmsg
> | | sock_aio_read
> | | do_sync_read
> | | vfs_read
> | | sys_read
> | | sysenter_dispatch
> | | 0xf7f50430
> | | 0x5
> | |
> | --33.33%-- tcp_v4_rcv
> | ip_local_deliver_finish
> | ip_local_deliver
> | ip_rcv_finish
> | ip_rcv
> | netif_receive_skb
> | process_backlog
> | net_rx_action
> | __do_softirq
> | call_softirq
> | do_softirq
> | _local_bh_enable_ip
> | local_bh_enable
> | dev_queue_xmit
> | ip_finish_output2
> | ip_finish_output
> | ip_output
> | ip_local_out
> | ip_queue_xmit
> | tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg
> | __sock_sendmsg
> | sock_aio_write
> | do_sync_write
> | vfs_write
> | sys_write
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--0.70%-- ipt_local_in_hook
> | nf_iterate
> | nf_hook_slow
> | ip_local_deliver
> | ip_rcv_finish
> | ip_rcv
> | netif_receive_skb
> | process_backlog
> | net_rx_action
> | __do_softirq
> | call_softirq
> | do_softirq
> | _local_bh_enable_ip
> | local_bh_enable
> | dev_queue_xmit
> | ip_finish_output2
> | ip_finish_output
> | ip_output
> | ip_local_out
> | ip_queue_xmit
> | tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg
> | __sock_sendmsg
> | sock_aio_write
> | do_sync_write
> | vfs_write
> | sys_write
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> |
> |--0.56%-- perf_counter_task_sched_out
> | schedule
> | schedule_timeout
> | sk_wait_data
> | tcp_recvmsg
> | sock_common_recvmsg
> | __sock_recvmsg
> | sock_aio_read
> | do_sync_read
> | vfs_read
> | sys_read
> | sysenter_dispatch
> | 0xf7f50430
> | 0x5
> --2.53%-- [...]
>
> 1.66% perf [kernel]
> |
> |--25.00%-- journal_stop
> | __ext3_journal_stop
> | ext3_writeback_write_end
> | generic_perform_write
> | generic_file_buffered_write
> | __generic_file_aio_write_nolock
> | generic_file_aio_write
> | ext3_file_write
> | do_sync_write
> | vfs_write
> | sys_write
> | system_call_fastpath
> | __write_nocancel
> |
> |--16.67%-- __perf_counter_sched_out
> | perf_counter_task_sched_out
> | schedule
> | schedule_hrtimeout_range
> | poll_schedule_timeout
> | do_poll
> | do_sys_poll
> | sys_poll
> | system_call_fastpath
> | __GI___poll
> |
> |--8.33%-- dup_mmap
> | dup_mm
> | copy_process
> | do_fork
> | sys_clone
> | stub_clone
> | fork
> | __cmd_record
> |
> |--8.33%-- hrtimer_interrupt
> | smp_apic_timer_interrupt
> | apic_timer_interrupt
> | __write_nocancel
> |
> |--8.33%-- journal_start
> | ext3_journal_start_sb
> | ext3_write_begin
> | generic_perform_write
> | generic_file_buffered_write
> | __generic_file_aio_write_nolock
> | generic_file_aio_write
> | ext3_file_write
> | do_sync_write
> | vfs_write
> | sys_write
> | system_call_fastpath
> | __write_nocancel
> |
> |--8.33%-- create_empty_buffers
> | __block_prepare_write
> | block_write_begin
> | ext3_write_begin
> | generic_perform_write
> | generic_file_buffered_write
> | __generic_file_aio_write_nolock
> | generic_file_aio_write
> | ext3_file_write
> | do_sync_write
> | vfs_write
> | sys_write
> | system_call_fastpath
> | __write_nocancel
> |
> |--8.33%-- perf_counter_task_sched_out
> | schedule
> | schedule_hrtimeout_range
> | poll_schedule_timeout
> | do_poll
> | do_sys_poll
> | sys_poll
> | system_call_fastpath
> | __GI___poll
> |
> |--8.33%-- ext3_alloc_branch
> | ext3_get_blocks_handle
> | ext3_get_block
> | __block_prepare_write
> | block_write_begin
> | ext3_write_begin
> | generic_perform_write
> | generic_file_buffered_write
> | __generic_file_aio_write_nolock
> | generic_file_aio_write
> | ext3_file_write
> | do_sync_write
> | vfs_write
> | sys_write
> | system_call_fastpath
> | __write_nocancel
> |
> --8.33%-- start_this_handle
> journal_start
> ext3_journal_start_sb
> ext3_write_begin
> generic_perform_write
> generic_file_buffered_write
> __generic_file_aio_write_nolock
> generic_file_aio_write
> ext3_file_write
> do_sync_write
> vfs_write
> sys_write
> system_call_fastpath
> __write_nocancel
>
> The graph above (which is the sub-graph of the full graph, limited
> via the -S/--symbols option to all callchains that intersect with
> the _spin_lock symbol) details all the various code paths that make
> use of _spin_lock(), and the weights (and relative weights) of
> those code paths.
>
> Note that this works for just about any locking primitive, and note
> how it's recursive. If we inlined spin_lock we'd only know this
> __mod_timer overhead:
>
> |--31.46%-- __mod_timer
>
> but we wouldn't know the second level (which is interesting too):
>
> | |--55.36%-- tcp_send_delayed_ack
> | --44.64%-- tcp_v4_rcv
>
> All in all, I think we should detach the issue of instrumentation
> and profiling from the issue of whether to inline or not. We should
> inline where it makes things faster. Our profiling tools are able
> to deal with it, independently of whether we inline or not.
>
> Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
2009-08-13 18:34 ` Ingo Molnar
@ 2009-08-14 12:34 ` Heiko Carstens
2009-08-14 16:04 ` Linus Torvalds
1 sibling, 1 reply; 21+ messages in thread
From: Heiko Carstens @ 2009-08-14 12:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt
On Thu, Aug 13, 2009 at 11:11:22AM -0700, Linus Torvalds wrote:
> On Wed, 12 Aug 2009, Heiko Carstens wrote:
> > If one considers that server kernels are usually compiled with
> > !CONFIG_PREEMPT a simple spin_lock is just a compare and swap loop.
> > The extra overhead for a function call is significant.
> > With inlined spinlocks overall cpu usage gets reduced by 1%-5% on s390.
> > These numbers were taken with some network benchmarks. However I expect
> > any workload that calls frequently into the kernel and which grabs a few
> > locks to perform better.
>
> Have you tried to determine which kinds of spinlocks you care most about?
> With networking, I guess the softirq-safe ones (and perhaps the irq ones)
> are pretty common, but maybe even there it's really just a couple of
> cases.
All of the spinlock* variants are called frequently (see also below).
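For reference, with the function bodies moved to the header as static
inlines (patch 1), the generic spin_lock body is roughly the sketch
below; on !CONFIG_PREEMPT with lockdep and lock statistics disabled,
everything except the arch compare-and-swap loop compiles away, so the
call/return overhead is all that the out-of-line variant adds:

static inline void __spin_lock(spinlock_t *lock)
{
        preempt_disable();      /* no-op on !CONFIG_PREEMPT */
        /* no-op unless lockdep is enabled: */
        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
        /* without CONFIG_LOCK_STAT this expands to just
         * _raw_spin_lock(lock), the arch compare-and-swap loop: */
        LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
}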
> [...]
> So there are advantages outside of the "size vs speed" kind of things, but
> at the same time I do suspect that the disadvantages mean that we really
> should make this be something where an architecture can specify things at
> a finer granularity. Perhaps by using per-symbol defines (ie have the
> <asm/spinlock.h> file do things like
>
> #define __spin_lock_is_small
> #define __spin_unlock_is_small
>
> kind of things for the particular sequences that are worth inlining.
Ok, that makes sense. I will post updated patches.
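Something along these lines is what I have in mind -- a rough sketch
of how the generic header could consume such per-symbol defines (the
__spin_lock_is_small name is taken from your mail above; the glue
around it is illustration only, not the final patch):

#ifdef __spin_lock_is_small
/* the architecture says this body is small: expand the static inline */
#define _spin_lock(lock)        __spin_lock(lock)
#else
/* default: keep the out-of-line version in kernel/spinlock.c */
void __lockfunc _spin_lock(spinlock_t *lock)    __acquires(lock);
#endif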
FWIW:
All of this started when we compared a 2.6.27 based distro kernel
to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more CPU
cycles than 2.6.16 does for the same workloads.
As an example, we took a simple one-connection ping-pong network load
(the client sends a packet to the server and waits until the server's
reply returns, then the game starts again). It uses more CPU cycles
on 2.6.27.
Using ftrace we figured out that on 2.6.27 a single roundtrip takes
more than 500 function calls on the client, while on 2.6.16 it took
only ~230 calls.
Changing s390 from its own NOHZ code to the generic version did cost
quite a few cycles. The idle entry and exit paths are much more expensive
now. That's why you've seen Martin posting NOHZ/timer-related patches lately.
Also, quite a few functions that were inlined in 2.6.16 are out-of-line
in 2.6.27. Reverting those changes improved performance as well.
Since 60 of the ~500 function calls are spin_[un]lock* calls (roughly
12% of all calls in the path), it seemed worthwhile to inline them,
which improved things again.
Here are the function traces for 2.6.27 and 2.6.16:
(the timestamp output has been modified)
For 2.6.27:
# tracer: ftrace
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
perl-2581 [00] 28991519113889978 [+ 125]: SyS_write <-sysc_noemu
perl-2581 [00] 28991519113890103 [+ 156]: fget_light <-SyS_write
perl-2581 [00] 28991519113890259 [+ 125]: vfs_write <-SyS_write
perl-2581 [00] 28991519113890384 [+ 156]: rw_verify_area <-vfs_write
perl-2581 [00] 28991519113890540 [+ 125]: security_file_permission <-rw_verify_area
perl-2581 [00] 28991519113890665 [+ 93]: apparmor_file_permission <-security_file_permission
perl-2581 [00] 28991519113890759 [+ 156]: aa_file_permission <-apparmor_file_permission
perl-2581 [00] 28991519113890915 [+ 125]: do_sync_write <-vfs_write
perl-2581 [00] 28991519113891040 [+ 125]: sock_aio_write <-do_sync_write
perl-2581 [00] 28991519113891165 [+ 125]: security_socket_sendmsg <-sock_aio_write
perl-2581 [00] 28991519113891290 [+ 125]: apparmor_socket_sendmsg <-security_socket_sendmsg
perl-2581 [00] 28991519113891415 [+ 125]: aa_revalidate_sk <-apparmor_socket_sendmsg
perl-2581 [00] 28991519113891540 [+ 125]: tcp_sendmsg <-sock_aio_write
perl-2581 [00] 28991519113891665 [+ 125]: lock_sock_nested <-tcp_sendmsg
perl-2581 [00] 28991519113891790 [+ 125]: _spin_lock_bh <-lock_sock_nested
perl-2581 [00] 28991519113891915 [+ 125]: local_bh_disable <-_spin_lock_bh
perl-2581 [00] 28991519113892040 [+ 187]: local_bh_enable <-lock_sock_nested
perl-2581 [00] 28991519113892228 [+ 125]: tcp_current_mss <-tcp_sendmsg
perl-2581 [00] 28991519113892353 [+ 156]: tcp_established_options <-tcp_current_mss
perl-2581 [00] 28991519113892509 [+ 125]: sk_stream_alloc_skb <-tcp_sendmsg
perl-2581 [00] 28991519113892634 [+ 125]: __alloc_skb <-sk_stream_alloc_skb
perl-2581 [00] 28991519113892759 [+ 156]: kmem_cache_alloc <-__alloc_skb
perl-2581 [00] 28991519113892915 [+ 281]: __kmalloc <-__alloc_skb
perl-2581 [00] 28991519113893196 [+ 125]: skb_put <-tcp_sendmsg
perl-2581 [00] 28991519113893321 [+ 125]: copy_from_user_mvcos_check <-tcp_sendmsg
perl-2581 [00] 28991519113893446 [+ 156]: copy_from_user_std <-copy_from_user_mvcos_check
perl-2581 [00] 28991519113893603 [+ 125]: __tcp_push_pending_frames <-tcp_sendmsg
perl-2581 [00] 28991519113893728 [+ 125]: tcp_init_tso_segs <-__tcp_push_pending_frames
perl-2581 [00] 28991519113893853 [+ 125]: tcp_set_skb_tso_segs <-tcp_init_tso_segs
perl-2581 [00] 28991519113893978 [+ 156]: tcp_transmit_skb <-__tcp_push_pending_frames
perl-2581 [00] 28991519113894134 [+ 93]: skb_clone <-tcp_transmit_skb
perl-2581 [00] 28991519113894228 [+ 125]: __skb_clone <-skb_clone
perl-2581 [00] 28991519113894353 [+ 156]: __copy_skb_header <-__skb_clone
perl-2581 [00] 28991519113894509 [+ 125]: tcp_established_options <-tcp_transmit_skb
perl-2581 [00] 28991519113894634 [+ 125]: skb_push <-tcp_transmit_skb
perl-2581 [00] 28991519113894759 [+ 156]: __tcp_select_window <-tcp_transmit_skb
perl-2581 [00] 28991519113894915 [+ 125]: tcp_options_write <-tcp_transmit_skb
perl-2581 [00] 28991519113895040 [+ 156]: tcp_v4_send_check <-tcp_transmit_skb
perl-2581 [00] 28991519113895196 [+ 125]: ip_queue_xmit <-tcp_transmit_skb
perl-2581 [00] 28991519113895321 [+ 156]: __sk_dst_check <-ip_queue_xmit
perl-2581 [00] 28991519113895478 [+ 125]: skb_push <-ip_queue_xmit
perl-2581 [00] 28991519113895603 [+ 93]: ip_local_out <-ip_queue_xmit
perl-2581 [00] 28991519113895696 [+ 156]: __ip_local_out <-ip_local_out
perl-2581 [00] 28991519113895853 [+ 125]: ip_output <-ip_local_out
perl-2581 [00] 28991519113895978 [+ 156]: ip_finish_output <-ip_output
perl-2581 [00] 28991519113896134 [+ 125]: skb_push <-ip_finish_output
perl-2581 [00] 28991519113896259 [+ 125]: dev_queue_xmit <-ip_finish_output
perl-2581 [00] 28991519113896384 [+ 125]: local_bh_disable <-dev_queue_xmit
perl-2581 [00] 28991519113896509 [+ 187]: _spin_lock <-dev_queue_xmit
perl-2581 [00] 28991519113896696 [+ 125]: pfifo_fast_enqueue <-dev_queue_xmit
perl-2581 [00] 28991519113896821 [+ 125]: __qdisc_run <-dev_queue_xmit
perl-2581 [00] 28991519113896946 [+ 156]: pfifo_fast_dequeue <-__qdisc_run
perl-2581 [00] 28991519113897103 [+ 125]: _spin_lock <-__qdisc_run
perl-2581 [00] 28991519113897228 [+ 187]: dev_hard_start_xmit <-__qdisc_run
perl-2581 [00] 28991519113897415 [+ 156]: qeth_l3_hard_start_xmit <-dev_hard_start_xmit
perl-2581 [00] 28991519113897571 [+ 125]: qeth_get_cast_type <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113897696 [+ 156]: qeth_get_priority_queue <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113897853 [+ 125]: skb_realloc_headroom <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113897978 [+ 125]: pskb_copy <-skb_realloc_headroom
perl-2581 [00] 28991519113898103 [+ 93]: __alloc_skb <-pskb_copy
perl-2581 [00] 28991519113898196 [+ 187]: kmem_cache_alloc <-__alloc_skb
perl-2581 [00] 28991519113898384 [+ 218]: __kmalloc <-__alloc_skb
perl-2581 [00] 28991519113898603 [+ 156]: skb_put <-pskb_copy
perl-2581 [00] 28991519113898759 [+ 93]: copy_skb_header <-pskb_copy
perl-2581 [00] 28991519113898853 [+ 218]: __copy_skb_header <-copy_skb_header
perl-2581 [00] 28991519113899071 [+ 125]: skb_pull <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113899196 [+ 125]: skb_push <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113899321 [+ 156]: qeth_l3_fill_header <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113899478 [+ 125]: qeth_get_elements_no <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113899603 [+ 156]: qeth_do_send_packet <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113899759 [+ 125]: skb_queue_tail <-qeth_do_send_packet
perl-2581 [00] 28991519113899884 [+ 125]: _spin_lock_irqsave <-skb_queue_tail
perl-2581 [00] 28991519113900009 [+ 156]: _spin_unlock_irqrestore <-skb_queue_tail
perl-2581 [00] 28991519113900165 [+ 187]: qeth_flush_buffers <-qeth_do_send_packet
perl-2581 [00] 28991519113900353 [+ 156]: do_QDIO <-qeth_flush_buffers
perl-2581 [00] 28991519113900509 [+ 156]: qdio_perf_stat_inc <-do_QDIO
perl-2581 [00] 28991519113900665 [+ 125]: qdio_kick_outbound_q <-do_QDIO
perl-2581 [00] 28991519113900790 [+ 812]: qdio_perf_stat_inc <-qdio_kick_outbound_q
perl-2581 [00] 28991519113901603 [+ 218]: __tasklet_schedule <-do_QDIO
perl-2581 [00] 28991519113901821 [+ 156]: dev_kfree_skb_any <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113901978 [+ 125]: kfree_skb <-dev_kfree_skb_any
perl-2581 [00] 28991519113902103 [+ 93]: __kfree_skb <-kfree_skb
perl-2581 [00] 28991519113902196 [+ 125]: skb_release_all <-__kfree_skb
perl-2581 [00] 28991519113902321 [+ 156]: dst_release <-skb_release_all
perl-2581 [00] 28991519113902478 [+ 93]: sock_wfree <-skb_release_all
perl-2581 [00] 28991519113902571 [+ 187]: skb_release_data <-skb_release_all
perl-2581 [00] 28991519113902759 [+ 125]: __netif_schedule <-qeth_l3_hard_start_xmit
perl-2581 [00] 28991519113902884 [+ 156]: raise_softirq_irqoff <-__netif_schedule
perl-2581 [00] 28991519113903040 [+ 125]: _spin_lock <-__qdisc_run
perl-2581 [00] 28991519113903165 [+ 187]: local_bh_enable <-dev_queue_xmit
perl-2581 [00] 28991519113903353 [+ 156]: do_softirq <-local_bh_enable
perl-2581 [00] 28991519113903509 [+ 93]: __do_softirq <-do_softirq
perl-2581 [00] 28991519113903603 [+ 93]: account_system_vtime <-__do_softirq
perl-2581 [00] 28991519113903696 [+ 125]: account_system_time <-account_system_vtime
perl-2581 [00] 28991519113903821 [+ 93]: acct_update_integrals <-account_system_time
perl-2581 [00] 28991519113903915 [+ 187]: jiffies_to_timeval <-acct_update_integrals
perl-2581 [00] 28991519113904103 [+ 156]: net_tx_action <-__do_softirq
perl-2581 [00] 28991519113904259 [+ 125]: _spin_trylock <-net_tx_action
perl-2581 [00] 28991519113904384 [+ 125]: __qdisc_run <-net_tx_action
perl-2581 [00] 28991519113904509 [+ 156]: pfifo_fast_dequeue <-__qdisc_run
perl-2581 [00] 28991519113904665 [+ 187]: tasklet_action <-__do_softirq
perl-2581 [00] 28991519113904853 [+ 93]: qdio_outbound_processing <-tasklet_action
perl-2581 [00] 28991519113904946 [+ 187]: qdio_perf_stat_inc <-qdio_outbound_processing
perl-2581 [00] 28991519113905134 [+ 156]: mod_timer <-qdio_outbound_processing
perl-2581 [00] 28991519113905290 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer
perl-2581 [00] 28991519113905415 [+ 125]: __mod_timer <-mod_timer
perl-2581 [00] 28991519113905540 [+ 125]: __timer_stats_timer_set_start_info <-__mod_timer
perl-2581 [00] 28991519113905665 [+ 93]: lock_timer_base <-__mod_timer
perl-2581 [00] 28991519113905759 [+ 156]: _spin_lock_irqsave <-lock_timer_base
perl-2581 [00] 28991519113905915 [+ 125]: internal_add_timer <-__mod_timer
perl-2581 [00] 28991519113906040 [+ 125]: _spin_unlock_irqrestore <-__mod_timer
perl-2581 [00] 28991519113906165 [+ 156]: qdio_perf_stat_inc <-qdio_outbound_processing
perl-2581 [00] 28991519113906321 [+ 125]: account_system_vtime <-__do_softirq
perl-2581 [00] 28991519113906446 [+ 62]: account_system_time <-account_system_vtime
perl-2581 [00] 28991519113906509 [+ 125]: acct_update_integrals <-account_system_time
perl-2581 [00] 28991519113906634 [+ 125]: jiffies_to_timeval <-acct_update_integrals
perl-2581 [00] 28991519113906759 [+ 156]: _local_bh_enable <-__do_softirq
perl-2581 [00] 28991519113906915 [+ 125]: tcp_event_new_data_sent <-__tcp_push_pending_frames
perl-2581 [00] 28991519113907040 [+ 125]: sk_reset_timer <-tcp_event_new_data_sent
perl-2581 [00] 28991519113907165 [+ 93]: mod_timer <-sk_reset_timer
perl-2581 [00] 28991519113907259 [+ 187]: __timer_stats_timer_set_start_info <-mod_timer
perl-2581 [00] 28991519113907446 [+ 156]: release_sock <-tcp_sendmsg
perl-2581 [00] 28991519113907603 [+ 125]: _spin_lock_bh <-release_sock
perl-2581 [00] 28991519113907728 [+ 125]: local_bh_disable <-_spin_lock_bh
perl-2581 [00] 28991519113907853 [+ 125]: _spin_unlock_bh <-release_sock
perl-2581 [00] 28991519113907978 [+ 187]: local_bh_enable_ip <-_spin_unlock_bh
perl-2581 [00] 28991519113908165 [+ 93]: dnotify_parent <-vfs_write
perl-2581 [00] 28991519113908259 [+ 156]: _spin_lock <-dnotify_parent
perl-2581 [00] 28991519113908415 [+ 156]: inotify_dentry_parent_queue_event <-vfs_write
perl-2581 [00] 28991519113908571 [+ 875]: inotify_inode_queue_event <-vfs_write
perl-2581 [00] 28991519113909446 [+ 93]: SyS_read <-sysc_noemu
perl-2581 [00] 28991519113909540 [+ 156]: fget_light <-SyS_read
perl-2581 [00] 28991519113909696 [+ 125]: vfs_read <-SyS_read
perl-2581 [00] 28991519113909821 [+ 125]: rw_verify_area <-vfs_read
perl-2581 [00] 28991519113909946 [+ 125]: security_file_permission <-rw_verify_area
perl-2581 [00] 28991519113910071 [+ 125]: apparmor_file_permission <-security_file_permission
perl-2581 [00] 28991519113910196 [+ 125]: aa_file_permission <-apparmor_file_permission
perl-2581 [00] 28991519113910321 [+ 156]: do_sync_read <-vfs_read
perl-2581 [00] 28991519113910478 [+ 125]: sock_aio_read <-do_sync_read
perl-2581 [00] 28991519113910603 [+ 125]: security_socket_recvmsg <-sock_aio_read
perl-2581 [00] 28991519113910728 [+ 93]: apparmor_socket_recvmsg <-security_socket_recvmsg
perl-2581 [00] 28991519113910821 [+ 156]: aa_revalidate_sk <-apparmor_socket_recvmsg
perl-2581 [00] 28991519113910978 [+ 125]: sock_common_recvmsg <-sock_aio_read
perl-2581 [00] 28991519113911103 [+ 125]: tcp_recvmsg <-sock_common_recvmsg
perl-2581 [00] 28991519113911228 [+ 125]: lock_sock_nested <-tcp_recvmsg
perl-2581 [00] 28991519113911353 [+ 125]: _spin_lock_bh <-lock_sock_nested
perl-2581 [00] 28991519113911478 [+ 125]: local_bh_disable <-_spin_lock_bh
perl-2581 [00] 28991519113911603 [+ 250]: local_bh_enable <-lock_sock_nested
perl-2581 [00] 28991519113911853 [+ 125]: tcp_cleanup_rbuf <-tcp_recvmsg
perl-2581 [00] 28991519113911978 [+ 125]: sk_wait_data <-tcp_recvmsg
perl-2581 [00] 28991519113912103 [+ 125]: prepare_to_wait <-sk_wait_data
perl-2581 [00] 28991519113912228 [+ 125]: _spin_lock_irqsave <-prepare_to_wait
perl-2581 [00] 28991519113912353 [+ 125]: _spin_unlock_irqrestore <-prepare_to_wait
perl-2581 [00] 28991519113912478 [+ 125]: release_sock <-sk_wait_data
perl-2581 [00] 28991519113912603 [+ 125]: _spin_lock_bh <-release_sock
perl-2581 [00] 28991519113912728 [+ 125]: local_bh_disable <-_spin_lock_bh
perl-2581 [00] 28991519113912853 [+ 125]: _spin_unlock_bh <-release_sock
perl-2581 [00] 28991519113912978 [+ 187]: local_bh_enable_ip <-_spin_unlock_bh
perl-2581 [00] 28991519113913165 [+ 125]: schedule_timeout <-sk_wait_data
perl-2581 [00] 28991519113913290 [+ 125]: schedule <-schedule_timeout
perl-2581 [00] 28991519113913415 [+ 125]: _spin_lock <-schedule
perl-2581 [00] 28991519113913540 [+ 93]: sched_clock_cpu <-schedule
perl-2581 [00] 28991519113913634 [+ 125]: sched_clock <-sched_clock_cpu
perl-2581 [00] 28991519113913759 [+ 93]: deactivate_task <-schedule
perl-2581 [00] 28991519113913853 [+ 125]: dequeue_task <-deactivate_task
perl-2581 [00] 28991519113913978 [+ 93]: dequeue_task_fair <-dequeue_task
perl-2581 [00] 28991519113914071 [+ 93]: dequeue_entity <-dequeue_task_fair
perl-2581 [00] 28991519113914165 [+ 156]: update_curr <-dequeue_entity
perl-2581 [00] 28991519113914321 [+ 93]: hrtick_start_fair <-dequeue_task_fair
perl-2581 [00] 28991519113914415 [+ 125]: put_prev_task_fair <-schedule
perl-2581 [00] 28991519113914540 [+ 125]: pick_next_task_fair <-schedule
perl-2581 [00] 28991519113914665 [+ 93]: pick_next_task_rt <-schedule
perl-2581 [00] 28991519113914759 [+ 93]: pick_next_task_fair <-schedule
perl-2581 [00] 28991519113914853 [+ 125]: pick_next_task_idle <-schedule
perl-2581 [00] 28991519113914978 [+ 218]: _spin_trylock <-tracing_record_cmdline
<idle>-0 [00] 28991519113915196 [+ 125]: finish_task_switch <-schedule
<idle>-0 [00] 28991519113915321 [+ 62]: account_vtime <-finish_task_switch
<idle>-0 [00] 28991519113915384 [+ 93]: account_user_time <-account_vtime
<idle>-0 [00] 28991519113915478 [+ 93]: acct_update_integrals <-account_user_time
<idle>-0 [00] 28991519113915571 [+ 125]: jiffies_to_timeval <-acct_update_integrals
<idle>-0 [00] 28991519113915696 [+ 93]: account_system_time <-account_vtime
<idle>-0 [00] 28991519113915790 [+ 93]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28991519113915884 [+ 187]: jiffies_to_timeval <-acct_update_integrals
<idle>-0 [00] 28991519113916071 [+ 125]: tick_nohz_stop_sched_tick <-cpu_idle
<idle>-0 [00] 28991519113916196 [+ 93]: ktime_get <-tick_nohz_stop_sched_tick
<idle>-0 [00] 28991519113916290 [+ 93]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519113916384 [+ 93]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519113916478 [+ 125]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519113916603 [+ 156]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519113916759 [+ 93]: get_next_timer_interrupt <-tick_nohz_stop_sched_tick
<idle>-0 [00] 28991519113916853 [+ 125]: _spin_lock <-get_next_timer_interrupt
<idle>-0 [00] 28991519113916978 [+ 93]: hrtimer_get_next_event <-get_next_timer_interrupt
<idle>-0 [00] 28991519113917071 [+ 125]: _spin_lock_irqsave <-hrtimer_get_next_event
<idle>-0 [00] 28991519113917196 [+ 93]: _spin_unlock_irqrestore <-hrtimer_get_next_event
<idle>-0 [00] 28991519113917290 [+ 93]: rcu_needs_cpu <-tick_nohz_stop_sched_tick
<idle>-0 [00] 28991519113917384 [+ 156]: rcu_pending <-rcu_needs_cpu
<idle>-0 [00] 28991519113917540 [+ 93]: select_nohz_load_balancer <-tick_nohz_stop_sched_tick
<idle>-0 [00] 28991519113917634 [+ 93]: hweight64 <-select_nohz_load_balancer
<idle>-0 [00] 28991519113917728 [+ 156]: hweight64 <-select_nohz_load_balancer
<idle>-0 [00] 28991519113917884 [+ 62]: hrtimer_start <-tick_nohz_stop_sched_tick
<idle>-0 [00] 28991519113917946 [+ 93]: lock_hrtimer_base <-hrtimer_start
<idle>-0 [00] 28991519113918040 [+ 125]: _spin_lock_irqsave <-lock_hrtimer_base
<idle>-0 [00] 28991519113918165 [+ 93]: __remove_hrtimer <-hrtimer_start
<idle>-0 [00] 28991519113918259 [+ 125]: rb_next <-__remove_hrtimer
<idle>-0 [00] 28991519113918384 [+ 125]: hrtimer_force_reprogram <-__remove_hrtimer
<idle>-0 [00] 28991519113918509 [+ 93]: tick_program_event <-hrtimer_force_reprogram
<idle>-0 [00] 28991519113918603 [+ 93]: tick_dev_program_event <-tick_program_event
<idle>-0 [00] 28991519113918696 [+ 93]: ktime_get <-tick_dev_program_event
<idle>-0 [00] 28991519113918790 [+ 93]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519113918884 [+ 62]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519113918946 [+ 125]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519113919071 [+ 125]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519113919196 [+ 93]: clockevents_program_event <-tick_dev_program_event
<idle>-0 [00] 28991519113919290 [+ 93]: s390_next_event <-clockevents_program_event
<idle>-0 [00] 28991519113919384 [+ 156]: rb_erase <-__remove_hrtimer
<idle>-0 [00] 28991519113919540 [+ 93]: __timer_stats_hrtimer_set_start_info <-hrtimer_start
<idle>-0 [00] 28991519113919634 [+ 93]: enqueue_hrtimer <-hrtimer_start
<idle>-0 [00] 28991519113919728 [+ 125]: hrtimer_reprogram <-enqueue_hrtimer
<idle>-0 [00] 28991519113919853 [+ 62]: tick_program_event <-hrtimer_reprogram
<idle>-0 [00] 28991519113919915 [+ 93]: tick_dev_program_event <-tick_program_event
<idle>-0 [00] 28991519113920009 [+ 93]: ktime_get <-tick_dev_program_event
<idle>-0 [00] 28991519113920103 [+ 93]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519113920196 [+ 62]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519113920259 [+ 125]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519113920384 [+ 125]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519113920509 [+ 62]: clockevents_program_event <-tick_dev_program_event
<idle>-0 [00] 28991519113920571 [+ 125]: s390_next_event <-clockevents_program_event
<idle>-0 [00] 28991519113920696 [+ 125]: rb_insert_color <-enqueue_hrtimer
<idle>-0 [00] 28991519113920821 [+ 156]: _spin_unlock_irqrestore <-hrtimer_start
<idle>-0 [00] 28991519113920978 [+ 125]: _spin_lock <-cpu_idle
<idle>-0 [00] 28991519113921103 [+229750]: vtime_stop_cpu_timer <-cpu_idle
<idle>-0 [00] 28991519114150853 [+ 93]: do_IRQ <-io_return
<idle>-0 [00] 28991519114150946 [+ 62]: irq_enter <-do_IRQ
<idle>-0 [00] 28991519114151009 [+ 125]: idle_cpu <-irq_enter
<idle>-0 [00] 28991519114151134 [+ 93]: tick_nohz_stop_idle <-irq_enter
<idle>-0 [00] 28991519114151228 [+ 62]: ktime_get <-tick_nohz_stop_idle
<idle>-0 [00] 28991519114151290 [+ 125]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519114151415 [+ 62]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519114151478 [+ 93]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519114151571 [+ 125]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519114151696 [+ 93]: account_system_vtime <-irq_enter
<idle>-0 [00] 28991519114151790 [+ 93]: account_system_time <-account_system_vtime
<idle>-0 [00] 28991519114151884 [+ 125]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28991519114152009 [+ 93]: idle_cpu <-irq_enter
<idle>-0 [00] 28991519114152103 [+ 62]: tick_nohz_update_jiffies <-irq_enter
<idle>-0 [00] 28991519114152165 [+ 93]: ktime_get <-tick_nohz_update_jiffies
<idle>-0 [00] 28991519114152259 [+ 93]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519114152353 [+ 62]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519114152415 [+ 93]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519114152509 [+ 125]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519114152634 [+ 125]: tick_do_update_jiffies64 <-tick_nohz_update_jiffies
<idle>-0 [00] 28991519114152759 [+ 62]: s390_idle_leave <-do_IRQ
<idle>-0 [00] 28991519114152821 [+ 125]: vtime_start_cpu_timer <-s390_idle_leave
<idle>-0 [00] 28991519114152946 [+ 93]: _spin_lock <-s390_idle_leave
<idle>-0 [00] 28991519114153040 [+ 250]: do_adapter_IO <-do_IRQ
<idle>-0 [00] 28991519114153290 [+ 93]: tiqdio_thinint_handler <-do_adapter_IO
<idle>-0 [00] 28991519114153384 [+ 156]: qdio_perf_stat_inc <-tiqdio_thinint_handler
<idle>-0 [00] 28991519114153540 [+ 250]: __tasklet_hi_schedule <-tiqdio_thinint_handler
<idle>-0 [00] 28991519114153790 [+ 93]: irq_exit <-do_IRQ
<idle>-0 [00] 28991519114153884 [+ 93]: account_system_vtime <-irq_exit
<idle>-0 [00] 28991519114153978 [+ 93]: account_system_time <-account_system_vtime
<idle>-0 [00] 28991519114154071 [+ 93]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28991519114154165 [+ 93]: do_softirq <-irq_exit
<idle>-0 [00] 28991519114154259 [+ 93]: __do_softirq <-do_softirq
<idle>-0 [00] 28991519114154353 [+ 62]: account_system_vtime <-__do_softirq
<idle>-0 [00] 28991519114154415 [+ 125]: account_system_time <-account_system_vtime
<idle>-0 [00] 28991519114154540 [+ 93]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28991519114154634 [+ 93]: tasklet_hi_action <-__do_softirq
<idle>-0 [00] 28991519114154728 [+ 93]: tiqdio_tasklet_fn <-tasklet_hi_action
<idle>-0 [00] 28991519114154821 [+ 218]: qdio_perf_stat_inc <-tiqdio_tasklet_fn
<idle>-0 [00] 28991519114155040 [+ 218]: __tasklet_schedule <-tiqdio_tasklet_fn
<idle>-0 [00] 28991519114155259 [+ 125]: tasklet_action <-__do_softirq
<idle>-0 [00] 28991519114155384 [+ 93]: tiqdio_inbound_processing <-tasklet_action
<idle>-0 [00] 28991519114155478 [+ 93]: qdio_perf_stat_inc <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114155571 [+ 125]: qdio_sync_after_thinint <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114155696 [+ 93]: qdio_siga_sync <-qdio_sync_after_thinint
<idle>-0 [00] 28991519114155790 [+ 125]: qdio_check_outbound_after_thinint <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114155915 [+ 125]: __tasklet_schedule <-qdio_check_outbound_after_thinint
<idle>-0 [00] 28991519114156040 [+ 218]: qdio_inbound_q_moved <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114156259 [+ 93]: qdio_kick_inbound_handler <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114156353 [+ 93]: qdio_perf_stat_inc <-qdio_kick_inbound_handler
<idle>-0 [00] 28991519114156446 [+ 156]: qeth_l3_qdio_input_handler <-qdio_kick_inbound_handler
<idle>-0 [00] 28991519114156603 [+ 218]: qeth_core_get_next_skb <-qeth_l3_qdio_input_handler
<idle>-0 [00] 28991519114156821 [+ 62]: dev_alloc_skb <-qeth_core_get_next_skb
<idle>-0 [00] 28991519114156884 [+ 125]: __alloc_skb <-dev_alloc_skb
<idle>-0 [00] 28991519114157009 [+ 93]: kmem_cache_alloc <-__alloc_skb
<idle>-0 [00] 28991519114157103 [+ 187]: __kmalloc <-__alloc_skb
<idle>-0 [00] 28991519114157290 [+ 187]: skb_put <-qeth_core_get_next_skb
<idle>-0 [00] 28991519114157478 [+ 93]: eth_header <-qeth_l3_qdio_input_handler
<idle>-0 [00] 28991519114157571 [+ 93]: skb_push <-eth_header
<idle>-0 [00] 28991519114157665 [+ 93]: eth_type_trans <-qeth_l3_qdio_input_handler
<idle>-0 [00] 28991519114157759 [+ 125]: skb_pull <-eth_type_trans
<idle>-0 [00] 28991519114157884 [+ 93]: netif_rx <-qeth_l3_qdio_input_handler
<idle>-0 [00] 28991519114157978 [+ 125]: __napi_schedule <-netif_rx
<idle>-0 [00] 28991519114158103 [+ 125]: qeth_core_get_next_skb <-qeth_l3_qdio_input_handler
<idle>-0 [00] 28991519114158228 [+ 125]: qeth_queue_input_buffer <-qeth_l3_qdio_input_handler
<idle>-0 [00] 28991519114158353 [+ 62]: qdio_siga_sync_q <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114158415 [+ 125]: qdio_siga_sync <-qdio_siga_sync_q
<idle>-0 [00] 28991519114158540 [+ 93]: get_buf_state <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114158634 [+ 93]: qdio_stop_polling <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114158728 [+ 125]: qdio_perf_stat_inc <-qdio_stop_polling
<idle>-0 [00] 28991519114158853 [+ 93]: qdio_siga_sync_q <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114158946 [+ 93]: qdio_siga_sync <-qdio_siga_sync_q
<idle>-0 [00] 28991519114159040 [+ 156]: get_buf_state <-tiqdio_inbound_processing
<idle>-0 [00] 28991519114159196 [+ 93]: net_rx_action <-__do_softirq
<idle>-0 [00] 28991519114159290 [+ 93]: process_backlog <-net_rx_action
<idle>-0 [00] 28991519114159384 [+ 156]: netif_receive_skb <-process_backlog
<idle>-0 [00] 28991519114159540 [+ 125]: ip_rcv <-netif_receive_skb
<idle>-0 [00] 28991519114159665 [+ 125]: ip_rcv_finish <-ip_rcv
<idle>-0 [00] 28991519114159790 [+ 156]: ip_route_input <-ip_rcv_finish
<idle>-0 [00] 28991519114159946 [+ 62]: ip_local_deliver <-ip_rcv_finish
<idle>-0 [00] 28991519114160009 [+ 93]: ip_local_deliver_finish <-ip_local_deliver
<idle>-0 [00] 28991519114160103 [+ 125]: raw_local_deliver <-ip_local_deliver_finish
<idle>-0 [00] 28991519114160228 [+ 93]: tcp_v4_rcv <-ip_local_deliver_finish
<idle>-0 [00] 28991519114160321 [+ 125]: __inet_lookup_established <-tcp_v4_rcv
<idle>-0 [00] 28991519114160446 [+ 125]: _read_lock <-__inet_lookup_established
<idle>-0 [00] 28991519114160571 [+ 93]: sk_filter <-tcp_v4_rcv
<idle>-0 [00] 28991519114160665 [+ 93]: security_sock_rcv_skb <-sk_filter
<idle>-0 [00] 28991519114160759 [+ 62]: cap_socket_sock_rcv_skb <-security_sock_rcv_skb
<idle>-0 [00] 28991519114160821 [+ 125]: local_bh_disable <-sk_filter
<idle>-0 [00] 28991519114160946 [+ 93]: local_bh_enable <-sk_filter
<idle>-0 [00] 28991519114161040 [+ 93]: _spin_lock <-tcp_v4_rcv
<idle>-0 [00] 28991519114161134 [+ 93]: __wake_up <-tcp_v4_rcv
<idle>-0 [00] 28991519114161228 [+ 93]: _spin_lock_irqsave <-__wake_up
<idle>-0 [00] 28991519114161321 [+ 93]: __wake_up_common <-__wake_up
<idle>-0 [00] 28991519114161415 [+ 93]: autoremove_wake_function <-__wake_up_common
<idle>-0 [00] 28991519114161509 [+ 62]: default_wake_function <-autoremove_wake_function
<idle>-0 [00] 28991519114161571 [+ 125]: try_to_wake_up <-default_wake_function
<idle>-0 [00] 28991519114161696 [+ 62]: task_rq_lock <-try_to_wake_up
<idle>-0 [00] 28991519114161759 [+ 125]: _spin_lock <-task_rq_lock
<idle>-0 [00] 28991519114161884 [+ 125]: select_task_rq_fair <-try_to_wake_up
<idle>-0 [00] 28991519114162009 [+ 93]: sched_clock_cpu <-try_to_wake_up
<idle>-0 [00] 28991519114162103 [+ 93]: sched_clock <-sched_clock_cpu
<idle>-0 [00] 28991519114162196 [+ 93]: activate_task <-try_to_wake_up
<idle>-0 [00] 28991519114162290 [+ 93]: enqueue_task <-activate_task
<idle>-0 [00] 28991519114162384 [+ 93]: enqueue_task_fair <-enqueue_task
<idle>-0 [00] 28991519114162478 [+ 93]: enqueue_entity <-enqueue_task_fair
<idle>-0 [00] 28991519114162571 [+ 93]: update_curr <-enqueue_entity
<idle>-0 [00] 28991519114162665 [+ 93]: place_entity <-enqueue_entity
<idle>-0 [00] 28991519114162759 [+ 93]: account_scheduler_latency <-enqueue_entity
<idle>-0 [00] 28991519114162853 [+ 93]: __enqueue_entity <-enqueue_entity
<idle>-0 [00] 28991519114162946 [+ 93]: rb_insert_color <-__enqueue_entity
<idle>-0 [00] 28991519114163040 [+ 125]: hrtick_start_fair <-enqueue_task_fair
<idle>-0 [00] 28991519114163165 [+ 93]: check_preempt_curr_idle <-try_to_wake_up
<idle>-0 [00] 28991519114163259 [+ 62]: resched_task <-check_preempt_curr_idle
<idle>-0 [00] 28991519114163321 [+ 125]: _spin_unlock_irqrestore <-try_to_wake_up
<idle>-0 [00] 28991519114163446 [+ 93]: _spin_unlock_irqrestore <-__wake_up
<idle>-0 [00] 28991519114163540 [+ 93]: sk_reset_timer <-tcp_v4_rcv
<idle>-0 [00] 28991519114163634 [+ 125]: mod_timer <-sk_reset_timer
<idle>-0 [00] 28991519114163759 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer
<idle>-0 [00] 28991519114163884 [+ 125]: __mod_timer <-mod_timer
<idle>-0 [00] 28991519114164009 [+ 125]: __timer_stats_timer_set_start_info <-__mod_timer
<idle>-0 [00] 28991519114164134 [+ 125]: lock_timer_base <-__mod_timer
<idle>-0 [00] 28991519114164259 [+ 125]: _spin_lock_irqsave <-lock_timer_base
<idle>-0 [00] 28991519114164384 [+ 125]: internal_add_timer <-__mod_timer
<idle>-0 [00] 28991519114164509 [+ 281]: _spin_unlock_irqrestore <-__mod_timer
<idle>-0 [00] 28991519114164790 [+ 156]: tasklet_action <-__do_softirq
<idle>-0 [00] 28991519114164946 [+ 125]: qdio_outbound_processing <-tasklet_action
<idle>-0 [00] 28991519114165071 [+ 250]: qdio_perf_stat_inc <-qdio_outbound_processing
<idle>-0 [00] 28991519114165321 [+ 187]: qeth_qdio_output_handler <-qdio_outbound_processing
<idle>-0 [00] 28991519114165509 [+ 125]: qeth_check_qdio_errors <-qeth_qdio_output_handler
<idle>-0 [00] 28991519114165634 [+ 125]: qeth_clear_output_buffer <-qeth_qdio_output_handler
<idle>-0 [00] 28991519114165759 [+ 93]: skb_dequeue <-qeth_clear_output_buffer
<idle>-0 [00] 28991519114165853 [+ 125]: _spin_lock_irqsave <-skb_dequeue
<idle>-0 [00] 28991519114165978 [+ 156]: _spin_unlock_irqrestore <-skb_dequeue
<idle>-0 [00] 28991519114166134 [+ 156]: dev_kfree_skb_any <-qeth_clear_output_buffer
<idle>-0 [00] 28991519114166290 [+ 125]: kfree_skb <-dev_kfree_skb_any
<idle>-0 [00] 28991519114166415 [+ 125]: __kfree_skb <-kfree_skb
<idle>-0 [00] 28991519114166540 [+ 125]: skb_release_all <-__kfree_skb
<idle>-0 [00] 28991519114166665 [+ 125]: dst_release <-skb_release_all
<idle>-0 [00] 28991519114166790 [+ 125]: skb_release_data <-skb_release_all
<idle>-0 [00] 28991519114166915 [+ 218]: kfree <-skb_release_data
<idle>-0 [00] 28991519114167134 [+ 156]: kmem_cache_free <-__kfree_skb
<idle>-0 [00] 28991519114167290 [+ 125]: skb_dequeue <-qeth_clear_output_buffer
<idle>-0 [00] 28991519114167415 [+ 125]: _spin_lock_irqsave <-skb_dequeue
<idle>-0 [00] 28991519114167540 [+ 125]: _spin_unlock_irqrestore <-skb_dequeue
<idle>-0 [00] 28991519114167665 [+ 312]: qeth_eddp_buf_release_contexts <-qeth_clear_output_buffer
<idle>-0 [00] 28991519114167978 [+ 156]: qeth_switch_to_nonpacking_if_needed <-qeth_qdio_output_handler
<idle>-0 [00] 28991519114168134 [+ 125]: __netif_schedule <-qeth_qdio_output_handler
<idle>-0 [00] 28991519114168259 [+ 156]: raise_softirq_irqoff <-__netif_schedule
<idle>-0 [00] 28991519114168415 [+ 125]: del_timer <-qdio_outbound_processing
<idle>-0 [00] 28991519114168540 [+ 93]: lock_timer_base <-del_timer
<idle>-0 [00] 28991519114168634 [+ 156]: _spin_lock_irqsave <-lock_timer_base
<idle>-0 [00] 28991519114168790 [+ 218]: _spin_unlock_irqrestore <-del_timer
<idle>-0 [00] 28991519114169009 [+ 187]: net_tx_action <-__do_softirq
<idle>-0 [00] 28991519114169196 [+ 125]: _spin_trylock <-net_tx_action
<idle>-0 [00] 28991519114169321 [+ 125]: __qdisc_run <-net_tx_action
<idle>-0 [00] 28991519114169446 [+ 156]: pfifo_fast_dequeue <-__qdisc_run
<idle>-0 [00] 28991519114169603 [+ 93]: account_system_vtime <-__do_softirq
<idle>-0 [00] 28991519114169696 [+ 125]: account_system_time <-account_system_vtime
<idle>-0 [00] 28991519114169821 [+ 93]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28991519114169915 [+ 125]: _local_bh_enable <-__do_softirq
<idle>-0 [00] 28991519114170040 [+ 187]: idle_cpu <-irq_exit
<idle>-0 [00] 28991519114170228 [+ 125]: tick_nohz_restart_sched_tick <-cpu_idle
<idle>-0 [00] 28991519114170353 [+ 93]: tick_nohz_stop_idle <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114170446 [+ 93]: select_nohz_load_balancer <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114170540 [+ 93]: ktime_get <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114170634 [+ 125]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519114170759 [+ 62]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519114170821 [+ 156]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519114170978 [+ 93]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519114171071 [+ 125]: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114171196 [+ 93]: hrtimer_cancel <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114171290 [+ 93]: hrtimer_try_to_cancel <-hrtimer_cancel
<idle>-0 [00] 28991519114171384 [+ 62]: lock_hrtimer_base <-hrtimer_try_to_cancel
<idle>-0 [00] 28991519114171446 [+ 125]: _spin_lock_irqsave <-lock_hrtimer_base
<idle>-0 [00] 28991519114171571 [+ 93]: __remove_hrtimer <-hrtimer_try_to_cancel
<idle>-0 [00] 28991519114171665 [+ 93]: rb_next <-__remove_hrtimer
<idle>-0 [00] 28991519114171759 [+ 93]: hrtimer_force_reprogram <-__remove_hrtimer
<idle>-0 [00] 28991519114171853 [+ 93]: tick_program_event <-hrtimer_force_reprogram
<idle>-0 [00] 28991519114171946 [+ 93]: tick_dev_program_event <-tick_program_event
<idle>-0 [00] 28991519114172040 [+ 125]: ktime_get <-tick_dev_program_event
<idle>-0 [00] 28991519114172165 [+ 62]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519114172228 [+ 93]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519114172321 [+ 93]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519114172415 [+ 156]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519114172571 [+ 93]: clockevents_program_event <-tick_dev_program_event
<idle>-0 [00] 28991519114172665 [+ 125]: s390_next_event <-clockevents_program_event
<idle>-0 [00] 28991519114172790 [+ 125]: rb_erase <-__remove_hrtimer
<idle>-0 [00] 28991519114172915 [+ 93]: _spin_unlock_irqrestore <-hrtimer_try_to_cancel
<idle>-0 [00] 28991519114173009 [+ 93]: hrtimer_forward <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114173103 [+ 125]: hrtimer_start <-tick_nohz_restart_sched_tick
<idle>-0 [00] 28991519114173228 [+ 62]: lock_hrtimer_base <-hrtimer_start
<idle>-0 [00] 28991519114173290 [+ 125]: _spin_lock_irqsave <-lock_hrtimer_base
<idle>-0 [00] 28991519114173415 [+ 93]: __timer_stats_hrtimer_set_start_info <-hrtimer_start
<idle>-0 [00] 28991519114173509 [+ 93]: enqueue_hrtimer <-hrtimer_start
<idle>-0 [00] 28991519114173603 [+ 93]: hrtimer_reprogram <-enqueue_hrtimer
<idle>-0 [00] 28991519114173696 [+ 93]: tick_program_event <-hrtimer_reprogram
<idle>-0 [00] 28991519114173790 [+ 93]: tick_dev_program_event <-tick_program_event
<idle>-0 [00] 28991519114173884 [+ 62]: ktime_get <-tick_dev_program_event
<idle>-0 [00] 28991519114173946 [+ 93]: ktime_get_ts <-ktime_get
<idle>-0 [00] 28991519114174040 [+ 93]: getnstimeofday <-ktime_get_ts
<idle>-0 [00] 28991519114174134 [+ 93]: read_tod_clock <-getnstimeofday
<idle>-0 [00] 28991519114174228 [+ 93]: set_normalized_timespec <-ktime_get_ts
<idle>-0 [00] 28991519114174321 [+ 125]: clockevents_program_event <-tick_dev_program_event
<idle>-0 [00] 28991519114174446 [+ 125]: s390_next_event <-clockevents_program_event
<idle>-0 [00] 28991519114174571 [+ 93]: rb_insert_color <-enqueue_hrtimer
<idle>-0 [00] 28991519114174665 [+ 156]: _spin_unlock_irqrestore <-hrtimer_start
<idle>-0 [00] 28991519114174821 [+ 125]: schedule <-cpu_idle
<idle>-0 [00] 28991519114174946 [+ 93]: _spin_lock <-schedule
<idle>-0 [00] 28991519114175040 [+ 125]: sched_clock_cpu <-schedule
<idle>-0 [00] 28991519114175165 [+ 125]: sched_clock <-sched_clock_cpu
<idle>-0 [00] 28991519114175290 [+ 93]: put_prev_task_idle <-schedule
<idle>-0 [00] 28991519114175384 [+ 93]: pick_next_task_fair <-schedule
<idle>-0 [00] 28991519114175478 [+ 93]: set_next_entity <-pick_next_task_fair
<idle>-0 [00] 28991519114175571 [+ 93]: __dequeue_entity <-set_next_entity
<idle>-0 [00] 28991519114175665 [+ 93]: rb_next <-__dequeue_entity
<idle>-0 [00] 28991519114175759 [+ 125]: rb_erase <-__dequeue_entity
<idle>-0 [00] 28991519114175884 [+ 187]: hrtick_start_fair <-pick_next_task_fair
<idle>-0 [00] 28991519114176071 [+ 187]: _spin_trylock <-tracing_record_cmdline
perl-2581 [00] 28991519114176259 [+ 125]: finish_task_switch <-schedule
perl-2581 [00] 28991519114176384 [+ 93]: account_vtime <-finish_task_switch
perl-2581 [00] 28991519114176478 [+ 93]: account_user_time <-account_vtime
perl-2581 [00] 28991519114176571 [+ 93]: acct_update_integrals <-account_user_time
perl-2581 [00] 28991519114176665 [+ 125]: account_system_time <-account_vtime
perl-2581 [00] 28991519114176790 [+ 156]: acct_update_integrals <-account_system_time
perl-2581 [00] 28991519114176946 [+ 125]: lock_sock_nested <-sk_wait_data
perl-2581 [00] 28991519114177071 [+ 125]: _spin_lock_bh <-lock_sock_nested
perl-2581 [00] 28991519114177196 [+ 125]: local_bh_disable <-_spin_lock_bh
perl-2581 [00] 28991519114177321 [+ 187]: local_bh_enable <-lock_sock_nested
perl-2581 [00] 28991519114177509 [+ 156]: finish_wait <-sk_wait_data
perl-2581 [00] 28991519114177665 [+ 125]: tcp_prequeue_process <-tcp_recvmsg
perl-2581 [00] 28991519114177790 [+ 156]: local_bh_disable <-tcp_prequeue_process
perl-2581 [00] 28991519114177946 [+ 125]: tcp_v4_do_rcv <-tcp_prequeue_process
perl-2581 [00] 28991519114178071 [+ 125]: tcp_rcv_established <-tcp_v4_do_rcv
perl-2581 [00] 28991519114178196 [+ 156]: local_bh_enable <-tcp_rcv_established
perl-2581 [00] 28991519114178353 [+ 125]: skb_copy_datagram_iovec <-tcp_rcv_established
perl-2581 [00] 28991519114178478 [+ 125]: memcpy_toiovec <-skb_copy_datagram_iovec
perl-2581 [00] 28991519114178603 [+ 125]: copy_to_user_mvcos_check <-memcpy_toiovec
perl-2581 [00] 28991519114178728 [+ 187]: copy_to_user_std <-copy_to_user_mvcos_check
perl-2581 [00] 28991519114178915 [+ 93]: tcp_rcv_space_adjust <-tcp_rcv_established
perl-2581 [00] 28991519114179009 [+ 156]: local_bh_disable <-tcp_rcv_established
perl-2581 [00] 28991519114179165 [+ 125]: get_seconds <-tcp_rcv_established
perl-2581 [00] 28991519114179290 [+ 156]: tcp_event_data_recv <-tcp_rcv_established
perl-2581 [00] 28991519114179446 [+ 187]: tcp_ack <-tcp_rcv_established
perl-2581 [00] 28991519114179634 [+ 156]: __kfree_skb <-tcp_ack
perl-2581 [00] 28991519114179790 [+ 93]: skb_release_all <-__kfree_skb
perl-2581 [00] 28991519114179884 [+ 125]: dst_release <-skb_release_all
perl-2581 [00] 28991519114180009 [+ 156]: skb_release_data <-skb_release_all
perl-2581 [00] 28991519114180165 [+ 187]: kfree <-skb_release_data
perl-2581 [00] 28991519114180353 [+ 187]: kmem_cache_free <-__kfree_skb
perl-2581 [00] 28991519114180540 [+ 125]: tcp_rtt_estimator <-tcp_ack
perl-2581 [00] 28991519114180665 [+ 156]: bictcp_acked <-tcp_ack
perl-2581 [00] 28991519114180821 [+ 93]: bictcp_cong_avoid <-tcp_ack
perl-2581 [00] 28991519114180915 [+ 156]: tcp_is_cwnd_limited <-bictcp_cong_avoid
perl-2581 [00] 28991519114181071 [+ 125]: tcp_current_mss <-tcp_rcv_established
perl-2581 [00] 28991519114181196 [+ 125]: tcp_established_options <-tcp_current_mss
perl-2581 [00] 28991519114181321 [+ 125]: __tcp_push_pending_frames <-tcp_rcv_established
perl-2581 [00] 28991519114181446 [+ 125]: __tcp_ack_snd_check <-tcp_rcv_established
perl-2581 [00] 28991519114181571 [+ 125]: tcp_send_delayed_ack <-__tcp_ack_snd_check
perl-2581 [00] 28991519114181696 [+ 125]: sk_reset_timer <-tcp_send_delayed_ack
perl-2581 [00] 28991519114181821 [+ 125]: mod_timer <-sk_reset_timer
perl-2581 [00] 28991519114181946 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer
perl-2581 [00] 28991519114182071 [+ 93]: __mod_timer <-mod_timer
perl-2581 [00] 28991519114182165 [+ 156]: __timer_stats_timer_set_start_info <-__mod_timer
perl-2581 [00] 28991519114182321 [+ 93]: lock_timer_base <-__mod_timer
perl-2581 [00] 28991519114182415 [+ 156]: _spin_lock_irqsave <-lock_timer_base
perl-2581 [00] 28991519114182571 [+ 93]: internal_add_timer <-__mod_timer
perl-2581 [00] 28991519114182665 [+ 156]: _spin_unlock_irqrestore <-__mod_timer
perl-2581 [00] 28991519114182821 [+ 93]: __kfree_skb <-tcp_rcv_established
perl-2581 [00] 28991519114182915 [+ 125]: skb_release_all <-__kfree_skb
perl-2581 [00] 28991519114183040 [+ 125]: dst_release <-skb_release_all
perl-2581 [00] 28991519114183165 [+ 125]: skb_release_data <-skb_release_all
perl-2581 [00] 28991519114183290 [+ 187]: kfree <-skb_release_data
perl-2581 [00] 28991519114183478 [+ 187]: kmem_cache_free <-__kfree_skb
perl-2581 [00] 28991519114183665 [+ 187]: local_bh_enable <-tcp_prequeue_process
perl-2581 [00] 28991519114183853 [+ 125]: tcp_cleanup_rbuf <-tcp_recvmsg
perl-2581 [00] 28991519114183978 [+ 156]: __tcp_select_window <-tcp_cleanup_rbuf
perl-2581 [00] 28991519114184134 [+ 125]: release_sock <-tcp_recvmsg
perl-2581 [00] 28991519114184259 [+ 93]: _spin_lock_bh <-release_sock
perl-2581 [00] 28991519114184353 [+ 125]: local_bh_disable <-_spin_lock_bh
perl-2581 [00] 28991519114184478 [+ 156]: _spin_unlock_bh <-release_sock
perl-2581 [00] 28991519114184634 [+ 218]: local_bh_enable_ip <-_spin_unlock_bh
perl-2581 [00] 28991519114184853 [+ 156]: dnotify_parent <-vfs_read
perl-2581 [00] 28991519114185009 [+ 125]: _spin_lock <-dnotify_parent
perl-2581 [00] 28991519114185134 [+ 125]: inotify_dentry_parent_queue_event <-vfs_read
perl-2581 [00] 28991519114185259 [+ 1687]: inotify_inode_queue_event <-vfs_read
And this is for 2.6.16:
# tracer: ftrace
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<...>-2804 [00] 28994586831213199 [+ 93]: sys_write <-sysc_noemu
<...>-2804 [00] 28994586831213292 [+ 218]: fget_light <-sys_write
<...>-2804 [00] 28994586831213511 [+ 156]: apparmor_file_permission <-sys_write
<...>-2804 [00] 28994586831213667 [+ 125]: sock_aio_write <-sys_write
<...>-2804 [00] 28994586831213792 [+ 125]: dummy_socket_sendmsg <-sock_aio_write
<...>-2804 [00] 28994586831213917 [+ 125]: inet_sendmsg <-sock_aio_write
<...>-2804 [00] 28994586831214042 [+ 93]: tcp_sendmsg <-inet_sendmsg
<...>-2804 [00] 28994586831214136 [+ 125]: lock_sock <-tcp_sendmsg
<...>-2804 [00] 28994586831214261 [+ 125]: _spin_lock_bh <-lock_sock
<...>-2804 [00] 28994586831214386 [+ 125]: _spin_unlock_bh <-lock_sock
<...>-2804 [00] 28994586831214511 [+ 187]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831214699 [+ 156]: tcp_current_mss <-tcp_sendmsg
<...>-2804 [00] 28994586831214855 [+ 125]: __alloc_skb <-tcp_sendmsg
<...>-2804 [00] 28994586831214980 [+ 156]: kmem_cache_alloc <-__alloc_skb
<...>-2804 [00] 28994586831215136 [+ 218]: __kmalloc <-__alloc_skb
<...>-2804 [00] 28994586831215355 [+ 187]: memset <-__alloc_skb
<...>-2804 [00] 28994586831215542 [+ 125]: copy_from_user_mvcos_check <-tcp_sendmsg
<...>-2804 [00] 28994586831215667 [+ 218]: copy_from_user_std <-copy_from_user_mvcos_check
<...>-2804 [00] 28994586831215886 [+ 125]: __tcp_push_pending_frames <-tcp_sendmsg
<...>-2804 [00] 28994586831216011 [+ 125]: tcp_transmit_skb <-__tcp_push_pending_frames
<...>-2804 [00] 28994586831216136 [+ 156]: skb_clone <-tcp_transmit_skb
<...>-2804 [00] 28994586831216292 [+ 250]: memcpy <-skb_clone
<...>-2804 [00] 28994586831216542 [+ 156]: tcp_v4_send_check <-tcp_transmit_skb
<...>-2804 [00] 28994586831216699 [+ 187]: ip_queue_xmit <-tcp_transmit_skb
<...>-2804 [00] 28994586831216886 [+ 187]: ip_output <-ip_queue_xmit
<...>-2804 [00] 28994586831217074 [+ 125]: dev_queue_xmit <-ip_output
<...>-2804 [00] 28994586831217199 [+ 125]: _spin_lock <-dev_queue_xmit
<...>-2804 [00] 28994586831217324 [+ 125]: pfifo_fast_enqueue <-dev_queue_xmit
<...>-2804 [00] 28994586831217449 [+ 125]: __qdisc_run <-dev_queue_xmit
<...>-2804 [00] 28994586831217574 [+ 156]: pfifo_fast_dequeue <-__qdisc_run
<...>-2804 [00] 28994586831217730 [+ 125]: _spin_trylock <-__qdisc_run
<...>-2804 [00] 28994586831217855 [+ 156]: dev_hard_start_xmit <-__qdisc_run
<...>-2804 [00] 28994586831218011 [+ 250]: qeth_hard_start_xmit <-dev_hard_start_xmit
<...>-2804 [00] 28994586831218261 [+ 125]: qeth_fill_header <-qeth_hard_start_xmit
<...>-2804 [00] 28994586831218386 [+ 156]: memset <-qeth_fill_header
<...>-2804 [00] 28994586831218542 [+ 187]: memset <-qeth_fill_header
<...>-2804 [00] 28994586831218730 [+ 125]: skb_queue_tail <-qeth_hard_start_xmit
<...>-2804 [00] 28994586831218855 [+ 125]: _spin_lock_irqsave <-skb_queue_tail
<...>-2804 [00] 28994586831218980 [+ 156]: _spin_unlock_irqrestore <-skb_queue_tail
<...>-2804 [00] 28994586831219136 [+ 218]: qeth_flush_buffers <-qeth_hard_start_xmit
<...>-2804 [00] 28994586831219355 [+ 875]: do_QDIO <-qeth_flush_buffers
<...>-2804 [00] 28994586831220230 [+ 93]: mod_timer <-do_QDIO
<...>-2804 [00] 28994586831220324 [+ 187]: _spin_lock_irqsave <-mod_timer
<...>-2804 [00] 28994586831220511 [+ 187]: _spin_unlock_irqrestore <-mod_timer
<...>-2804 [00] 28994586831220699 [+ 125]: raise_softirq_irqoff <-qeth_hard_start_xmit
<...>-2804 [00] 28994586831220824 [+ 156]: _spin_lock <-__qdisc_run
<...>-2804 [00] 28994586831220980 [+ 125]: pfifo_fast_dequeue <-__qdisc_run
<...>-2804 [00] 28994586831221105 [+ 156]: local_bh_enable <-dev_queue_xmit
<...>-2804 [00] 28994586831221261 [+ 156]: do_softirq <-local_bh_enable
<...>-2804 [00] 28994586831221417 [+ 93]: account_system_vtime <-do_softirq
<...>-2804 [00] 28994586831221511 [+ 93]: account_system_time <-account_system_vtime
<...>-2804 [00] 28994586831221605 [+ 125]: acct_update_integrals <-account_system_time
<...>-2804 [00] 28994586831221730 [+ 156]: __do_softirq <-do_softirq
<...>-2804 [00] 28994586831221886 [+ 156]: net_tx_action <-__do_softirq
<...>-2804 [00] 28994586831222042 [+ 125]: _spin_trylock <-net_tx_action
<...>-2804 [00] 28994586831222167 [+ 125]: __qdisc_run <-net_tx_action
<...>-2804 [00] 28994586831222292 [+ 156]: pfifo_fast_dequeue <-__qdisc_run
<...>-2804 [00] 28994586831222449 [+ 93]: account_system_vtime <-do_softirq
<...>-2804 [00] 28994586831222542 [+ 125]: account_system_time <-account_system_vtime
<...>-2804 [00] 28994586831222667 [+ 156]: acct_update_integrals <-account_system_time
<...>-2804 [00] 28994586831222824 [+ 125]: sk_reset_timer <-__tcp_push_pending_frames
<...>-2804 [00] 28994586831222949 [+ 156]: mod_timer <-sk_reset_timer
<...>-2804 [00] 28994586831223105 [+ 125]: release_sock <-tcp_sendmsg
<...>-2804 [00] 28994586831223230 [+ 93]: _spin_lock_bh <-release_sock
<...>-2804 [00] 28994586831223324 [+ 156]: _spin_unlock_bh <-release_sock
<...>-2804 [00] 28994586831223480 [+ 187]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831223667 [+ 93]: dnotify_parent <-sys_write
<...>-2804 [00] 28994586831223761 [+ 125]: _spin_lock <-dnotify_parent
<...>-2804 [00] 28994586831223886 [+ 156]: inotify_dentry_parent_queue_event <-sys_write
<...>-2804 [00] 28994586831224042 [+ 812]: inotify_inode_queue_event <-sys_write
<...>-2804 [00] 28994586831224855 [+ 93]: sys_read <-sysc_noemu
<...>-2804 [00] 28994586831224949 [+ 187]: fget_light <-sys_read
<...>-2804 [00] 28994586831225136 [+ 125]: apparmor_file_permission <-sys_read
<...>-2804 [00] 28994586831225261 [+ 125]: sock_aio_read <-sys_read
<...>-2804 [00] 28994586831225386 [+ 156]: dummy_socket_recvmsg <-sock_aio_read
<...>-2804 [00] 28994586831225542 [+ 93]: sock_common_recvmsg <-sock_aio_read
<...>-2804 [00] 28994586831225636 [+ 125]: tcp_recvmsg <-sock_common_recvmsg
<...>-2804 [00] 28994586831225761 [+ 125]: lock_sock <-tcp_recvmsg
<...>-2804 [00] 28994586831225886 [+ 125]: _spin_lock_bh <-lock_sock
<...>-2804 [00] 28994586831226011 [+ 125]: _spin_unlock_bh <-lock_sock
<...>-2804 [00] 28994586831226136 [+ 250]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831226386 [+ 125]: sk_wait_data <-tcp_recvmsg
<...>-2804 [00] 28994586831226511 [+ 125]: prepare_to_wait <-sk_wait_data
<...>-2804 [00] 28994586831226636 [+ 156]: _spin_lock_irqsave <-prepare_to_wait
<...>-2804 [00] 28994586831226792 [+ 156]: _spin_unlock_irqrestore <-prepare_to_wait
<...>-2804 [00] 28994586831226949 [+ 93]: _spin_lock_bh <-sk_wait_data
<...>-2804 [00] 28994586831227042 [+ 125]: _spin_unlock_bh <-sk_wait_data
<...>-2804 [00] 28994586831227167 [+ 187]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831227355 [+ 125]: schedule_timeout <-sk_wait_data
<...>-2804 [00] 28994586831227480 [+ 93]: schedule <-schedule_timeout
<...>-2804 [00] 28994586831227574 [+ 156]: profile_hit <-schedule
<...>-2804 [00] 28994586831227730 [+ 156]: sched_clock <-schedule
<...>-2804 [00] 28994586831227886 [+ 281]: _spin_lock_irq <-schedule
<idle>-0 [00] 28994586831228167 [+ 93]: account_vtime <-schedule
<idle>-0 [00] 28994586831228261 [+ 93]: account_user_time <-account_vtime
<idle>-0 [00] 28994586831228355 [+ 93]: account_system_time <-account_vtime
<idle>-0 [00] 28994586831228449 [+ 218]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28994586831228667 [+ 93]: notifier_call_chain <-cpu_idle
<idle>-0 [00] 28994586831228761 [+ 93]: nohz_idle_notify <-notifier_call_chain
<idle>-0 [00] 28994586831228855 [+ 156]: rcu_needs_cpu <-nohz_idle_notify
<idle>-0 [00] 28994586831229011 [+ 62]: next_timer_interrupt <-nohz_idle_notify
<idle>-0 [00] 28994586831229074 [+ 93]: hrtimer_get_next_event <-next_timer_interrupt
<idle>-0 [00] 28994586831229167 [+ 125]: _spin_lock_irqsave <-hrtimer_get_next_event
<idle>-0 [00] 28994586831229292 [+ 93]: _spin_unlock_irqrestore <-hrtimer_get_next_event
<idle>-0 [00] 28994586831229386 [+ 93]: _spin_lock_irqsave <-hrtimer_get_next_event
<idle>-0 [00] 28994586831229480 [+ 93]: _spin_unlock_irqrestore <-hrtimer_get_next_event
<idle>-0 [00] 28994586831229574 [+ 125]: ktime_get <-hrtimer_get_next_event
<idle>-0 [00] 28994586831229699 [+ 93]: getnstimeofday <-ktime_get
<idle>-0 [00] 28994586831229792 [+ 156]: do_gettimeofday <-getnstimeofday
<idle>-0 [00] 28994586831229949 [+ 156]: set_normalized_timespec <-ktime_get
<idle>-0 [00] 28994586831230105 [+ 125]: ns_to_timespec <-next_timer_interrupt
<idle>-0 [00] 28994586831230230 [+ 156]: _spin_lock <-next_timer_interrupt
<idle>-0 [00] 28994586831230386 [+212406]: vtimer_idle_notify <-notifier_call_chain
<idle>-0 [00] 28994586831442792 [+ 93]: do_IRQ <-io_return
<idle>-0 [00] 28994586831442886 [+ 62]: account_system_vtime <-do_IRQ
<idle>-0 [00] 28994586831442949 [+ 125]: account_system_time <-account_system_vtime
<idle>-0 [00] 28994586831443074 [+ 250]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28994586831443324 [+ 93]: do_monitor_call <-sysc_return
<idle>-0 [00] 28994586831443417 [+ 62]: notifier_call_chain <-do_monitor_call
<idle>-0 [00] 28994586831443480 [+ 156]: nohz_idle_notify <-notifier_call_chain
<idle>-0 [00] 28994586831443636 [+ 125]: vtimer_idle_notify <-notifier_call_chain
<idle>-0 [00] 28994586831443761 [+ 125]: do_adapter_IO <-do_IRQ
<idle>-0 [00] 28994586831443886 [+ 500]: tiqdio_thinint_handler <-do_adapter_IO
<idle>-0 [00] 28994586831444386 [+ 156]: qeth_qdio_output_handler <-tiqdio_thinint_handler
<idle>-0 [00] 28994586831444542 [+ 62]: skb_dequeue <-qeth_qdio_output_handler
<idle>-0 [00] 28994586831444605 [+ 125]: _spin_lock_irqsave <-skb_dequeue
<idle>-0 [00] 28994586831444730 [+ 125]: _spin_unlock_irqrestore <-skb_dequeue
<idle>-0 [00] 28994586831444855 [+ 93]: raise_softirq_irqoff <-qeth_qdio_output_handler
<idle>-0 [00] 28994586831444949 [+ 93]: skb_dequeue <-qeth_qdio_output_handler
<idle>-0 [00] 28994586831445042 [+ 93]: _spin_lock_irqsave <-skb_dequeue
<idle>-0 [00] 28994586831445136 [+ 93]: _spin_unlock_irqrestore <-skb_dequeue
<idle>-0 [00] 28994586831445230 [+ 250]: qeth_eddp_buf_release_contexts <-qeth_qdio_output_handler
<idle>-0 [00] 28994586831445480 [+ 125]: raise_softirq_irqoff <-qeth_qdio_output_handler
<idle>-0 [00] 28994586831445605 [+ 93]: del_timer <-tiqdio_thinint_handler
<idle>-0 [00] 28994586831445699 [+ 125]: _spin_lock_irqsave <-del_timer
<idle>-0 [00] 28994586831445824 [+ 218]: _spin_unlock_irqrestore <-del_timer
<idle>-0 [00] 28994586831446042 [+ 250]: qeth_qdio_input_handler <-tiqdio_thinint_handler
<idle>-0 [00] 28994586831446292 [+ 62]: __alloc_skb <-qeth_qdio_input_handler
<idle>-0 [00] 28994586831446355 [+ 125]: kmem_cache_alloc <-__alloc_skb
<idle>-0 [00] 28994586831446480 [+ 125]: __kmalloc <-__alloc_skb
<idle>-0 [00] 28994586831446605 [+ 156]: memset <-__alloc_skb
<idle>-0 [00] 28994586831446761 [+ 187]: memcpy <-qeth_qdio_input_handler
<idle>-0 [00] 28994586831446949 [+ 500]: netif_rx <-qeth_qdio_input_handler
<idle>-0 [00] 28994586831447449 [+ 93]: irq_exit <-do_IRQ
<idle>-0 [00] 28994586831447542 [+ 93]: account_system_vtime <-irq_exit
<idle>-0 [00] 28994586831447636 [+ 93]: account_system_time <-account_system_vtime
<idle>-0 [00] 28994586831447730 [+ 93]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28994586831447824 [+ 93]: do_softirq <-irq_exit
<idle>-0 [00] 28994586831447917 [+ 62]: account_system_vtime <-do_softirq
<idle>-0 [00] 28994586831447980 [+ 93]: account_system_time <-account_system_vtime
<idle>-0 [00] 28994586831448074 [+ 125]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28994586831448199 [+ 93]: __do_softirq <-do_softirq
<idle>-0 [00] 28994586831448292 [+ 93]: net_tx_action <-__do_softirq
<idle>-0 [00] 28994586831448386 [+ 125]: __kfree_skb <-net_tx_action
<idle>-0 [00] 28994586831448511 [+ 156]: sock_wfree <-__kfree_skb
<idle>-0 [00] 28994586831448667 [+ 93]: _spin_trylock <-net_tx_action
<idle>-0 [00] 28994586831448761 [+ 62]: __qdisc_run <-net_tx_action
<idle>-0 [00] 28994586831448824 [+ 156]: pfifo_fast_dequeue <-__qdisc_run
<idle>-0 [00] 28994586831448980 [+ 93]: net_rx_action <-__do_softirq
<idle>-0 [00] 28994586831449074 [+ 156]: process_backlog <-net_rx_action
<idle>-0 [00] 28994586831449230 [+ 156]: ip_rcv <-process_backlog
<idle>-0 [00] 28994586831449386 [+ 125]: ip_route_input <-ip_rcv
<idle>-0 [00] 28994586831449511 [+ 125]: ip_local_deliver <-ip_rcv
<idle>-0 [00] 28994586831449636 [+ 125]: tcp_v4_rcv <-ip_local_deliver
<idle>-0 [00] 28994586831449761 [+ 125]: _read_lock <-tcp_v4_rcv
<idle>-0 [00] 28994586831449886 [+ 93]: dummy_socket_sock_rcv_skb <-tcp_v4_rcv
<idle>-0 [00] 28994586831449980 [+ 93]: _spin_lock <-tcp_v4_rcv
<idle>-0 [00] 28994586831450074 [+ 93]: __wake_up <-tcp_v4_rcv
<idle>-0 [00] 28994586831450167 [+ 93]: _spin_lock_irqsave <-__wake_up
<idle>-0 [00] 28994586831450261 [+ 93]: autoremove_wake_function <-__wake_up
<idle>-0 [00] 28994586831450355 [+ 93]: default_wake_function <-autoremove_wake_function
<idle>-0 [00] 28994586831450449 [+ 125]: _spin_lock <-default_wake_function
<idle>-0 [00] 28994586831450574 [+ 187]: sched_clock <-default_wake_function
<idle>-0 [00] 28994586831450761 [+ 125]: resched_task <-default_wake_function
<idle>-0 [00] 28994586831450886 [+ 93]: _spin_unlock_irqrestore <-default_wake_function
<idle>-0 [00] 28994586831450980 [+ 93]: _spin_unlock_irqrestore <-__wake_up
<idle>-0 [00] 28994586831451074 [+ 93]: sk_reset_timer <-tcp_v4_rcv
<idle>-0 [00] 28994586831451167 [+ 93]: mod_timer <-sk_reset_timer
<idle>-0 [00] 28994586831451261 [+ 125]: _spin_lock_irqsave <-mod_timer
<idle>-0 [00] 28994586831451386 [+ 156]: _spin_unlock_irqrestore <-mod_timer
<idle>-0 [00] 28994586831451542 [+ 93]: account_system_vtime <-do_softirq
<idle>-0 [00] 28994586831451636 [+ 93]: account_system_time <-account_system_vtime
<idle>-0 [00] 28994586831451730 [+ 156]: acct_update_integrals <-account_system_time
<idle>-0 [00] 28994586831451886 [+ 125]: schedule <-cpu_idle
<idle>-0 [00] 28994586831452011 [+ 93]: profile_hit <-schedule
<idle>-0 [00] 28994586831452105 [+ 125]: sched_clock <-schedule
<idle>-0 [00] 28994586831452230 [+ 343]: _spin_lock_irq <-schedule
<...>-2804 [00] 28994586831452574 [+ 93]: account_vtime <-schedule
<...>-2804 [00] 28994586831452667 [+ 62]: account_user_time <-account_vtime
<...>-2804 [00] 28994586831452730 [+ 125]: account_system_time <-account_vtime
<...>-2804 [00] 28994586831452855 [+ 125]: acct_update_integrals <-account_system_time
<...>-2804 [00] 28994586831452980 [+ 93]: _spin_lock_bh <-sk_wait_data
<...>-2804 [00] 28994586831453074 [+ 93]: _spin_unlock_bh <-sk_wait_data
<...>-2804 [00] 28994586831453167 [+ 93]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831453261 [+ 125]: finish_wait <-sk_wait_data
<...>-2804 [00] 28994586831453386 [+ 93]: tcp_v4_do_rcv <-tcp_recvmsg
<...>-2804 [00] 28994586831453480 [+ 93]: tcp_rcv_established <-tcp_v4_do_rcv
<...>-2804 [00] 28994586831453574 [+ 93]: local_bh_enable <-tcp_rcv_established
<...>-2804 [00] 28994586831453667 [+ 93]: skb_copy_datagram_iovec <-tcp_rcv_established
<...>-2804 [00] 28994586831453761 [+ 93]: memcpy_toiovec <-skb_copy_datagram_iovec
<...>-2804 [00] 28994586831453855 [+ 62]: copy_to_user_mvcos_check <-memcpy_toiovec
<...>-2804 [00] 28994586831453917 [+ 187]: copy_to_user_std <-copy_to_user_mvcos_check
<...>-2804 [00] 28994586831454105 [+ 125]: tcp_ack <-tcp_rcv_established
<...>-2804 [00] 28994586831454230 [+ 125]: __kfree_skb <-tcp_ack
<...>-2804 [00] 28994586831454355 [+ 125]: kfree <-__kfree_skb
<...>-2804 [00] 28994586831454480 [+ 187]: kmem_cache_free <-__kfree_skb
<...>-2804 [00] 28994586831454667 [+ 93]: tcp_reno_cong_avoid <-tcp_ack
<...>-2804 [00] 28994586831454761 [+ 93]: tcp_current_mss <-tcp_rcv_established
<...>-2804 [00] 28994586831454855 [+ 156]: __tcp_push_pending_frames <-tcp_rcv_established
<...>-2804 [00] 28994586831455011 [+ 62]: tcp_send_delayed_ack <-tcp_rcv_established
<...>-2804 [00] 28994586831455074 [+ 93]: sk_reset_timer <-tcp_send_delayed_ack
<...>-2804 [00] 28994586831455167 [+ 93]: mod_timer <-sk_reset_timer
<...>-2804 [00] 28994586831455261 [+ 93]: _spin_lock_irqsave <-mod_timer
<...>-2804 [00] 28994586831455355 [+ 93]: _spin_unlock_irqrestore <-mod_timer
<...>-2804 [00] 28994586831455449 [+ 125]: __kfree_skb <-tcp_rcv_established
<...>-2804 [00] 28994586831455574 [+ 93]: kfree <-__kfree_skb
<...>-2804 [00] 28994586831455667 [+ 125]: kmem_cache_free <-__kfree_skb
<...>-2804 [00] 28994586831455792 [+ 156]: local_bh_enable <-tcp_recvmsg
<...>-2804 [00] 28994586831455949 [+ 93]: __tcp_select_window <-tcp_recvmsg
<...>-2804 [00] 28994586831456042 [+ 93]: release_sock <-tcp_recvmsg
<...>-2804 [00] 28994586831456136 [+ 93]: _spin_lock_bh <-release_sock
<...>-2804 [00] 28994586831456230 [+ 93]: _spin_unlock_bh <-release_sock
<...>-2804 [00] 28994586831456324 [+ 156]: local_bh_enable <-_spin_unlock_bh
<...>-2804 [00] 28994586831456480 [+ 93]: dnotify_parent <-sys_read
<...>-2804 [00] 28994586831456574 [+ 125]: _spin_lock <-dnotify_parent
<...>-2804 [00] 28994586831456699 [+ 62]: inotify_dentry_parent_queue_event <-sys_read
<...>-2804 [00] 28994586831456761 [+ 1437]: inotify_inode_queue_event <-sys_read
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 12:34 ` Heiko Carstens
@ 2009-08-14 16:04 ` Linus Torvalds
2009-08-14 17:13 ` Heiko Carstens
0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2009-08-14 16:04 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt, David S. Miller
On Fri, 14 Aug 2009, Heiko Carstens wrote:
>
> All of this started when we compared a 2.6.27 based distro kernel
> to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
> cycles than 2.6.16 does for the same workloads.
>
> As an example: we took a simple one connection ping pong network load.
> (client sends packet to server and waits until packet from server
> returns and game starts again). It uses more cpu cycles on 2.6.27.
>
> Using ftrace we figured out that on 2.6.27 it takes more than 500 function
> calls on the client for a single roundtrip while on 2.6.16 it took only
> ~230 calls.
Hmm. But the spinlock part of this seems to not really have changed.
Removing everything but the actual callchain info, and then doing a "diff"
between your two roundtrips, and they look very different, and basically
none of the difference seems to be due to spinlocks.
It seems to be due to a number of different things, but the bulk of the
new costs seem to be in networking (and to some degree scheduling). There
are smaller differences elsewhere, but the networking code _really_ blows
up.
I don't know how much of some of these are "real" kernel changes, and how
much of it is less inlining, but spinlocks don't seem to be the issue.
Here's a quick walk-through (some of these get repeated a couple of times
in your traces)
* Apparmor going through the generic security code and rw_verify_area:
apparmor_file_permission <-sys_write
becomes
security_file_permission <-rw_verify_area
apparmor_file_permission <-security_file_permission
aa_file_permission <-apparmor_file_permission
* the vfs write itself being deeper
sock_aio_write <-sys_write
becomes
vfs_write <-SyS_write
rw_verify_area <-vfs_write
do_sync_write <-vfs_write
sock_aio_write <-do_sync_write
* the socket code does totally new apparmor security things that were not
in 2.6.16 at all:
security_socket_sendmsg <-sock_aio_write
apparmor_socket_sendmsg <-security_socket_sendmsg
aa_revalidate_sk <-apparmor_socket_sendmsg
* TCP doing more:
__alloc_skb <-tcp_sendmsg
..
..
ip_output <-ip_queue_xmit
dev_queue_xmit <-ip_output
becomes
tcp_established_options <-tcp_current_mss
sk_stream_alloc_skb <-tcp_sendmsg
__alloc_skb <-sk_stream_alloc_skb
..
tcp_init_tso_segs <-__tcp_push_pending_frames
tcp_set_skb_tso_segs <-tcp_init_tso_segs
..
__skb_clone <-skb_clone
__copy_skb_header <-__skb_clone
tcp_established_options <-tcp_transmit_skb
skb_push <-tcp_transmit_skb
__tcp_select_window <-tcp_transmit_skb
tcp_options_write <-tcp_transmit_skb
..
__sk_dst_check <-ip_queue_xmit
skb_push <-ip_queue_xmit
ip_local_out <-ip_queue_xmit
__ip_local_out <-ip_local_out
ip_output <-ip_local_out
ip_finish_output <-ip_output
skb_push <-ip_finish_output
dev_queue_xmit <-ip_finish_output
local_bh_disable <-dev_queue_xmit
* More TCP changes at tcp_ack time:
kfree <-__kfree_skb
kmem_cache_free <-__kfree_skb
tcp_reno_cong_avoid <-tcp_ack
tcp_current_mss <-tcp_rcv_established
__tcp_push_pending_frames <-tcp_rcv_established
tcp_send_delayed_ack <-tcp_rcv_established
becomes
skb_release_all <-__kfree_skb
dst_release <-skb_release_all
skb_release_data <-skb_release_all
kfree <-skb_release_data
kmem_cache_free <-__kfree_skb
tcp_rtt_estimator <-tcp_ack
bictcp_acked <-tcp_ack
bictcp_cong_avoid <-tcp_ack
tcp_is_cwnd_limited <-bictcp_cong_avoid
tcp_current_mss <-tcp_rcv_established
tcp_established_options <-tcp_current_mss
__tcp_push_pending_frames <-tcp_rcv_established
__tcp_ack_snd_check <-tcp_rcv_established
tcp_send_delayed_ack <-__tcp_ack_snd_check
* The network driver doing more:
qeth_hard_start_xmit <-dev_hard_start_xmit
qeth_fill_header <-qeth_hard_start_xmit
memset <-qeth_fill_header
memset <-qeth_fill_header
skb_queue_tail <-qeth_hard_start_xmit
..
mod_timer <-do_QDIO
_spin_lock_irqsave <-mod_timer
_spin_unlock_irqrestore <-mod_timer
raise_softirq_irqoff <-qeth_hard_start_xmit
..
becomes
qeth_l3_hard_start_xmit <-dev_hard_start_xmit
qeth_get_cast_type <-qeth_l3_hard_start_xmit
qeth_get_priority_queue <-qeth_l3_hard_start_xmit
skb_realloc_headroom <-qeth_l3_hard_start_xmit
pskb_copy <-skb_realloc_headroom
__alloc_skb <-pskb_copy
kmem_cache_alloc <-__alloc_skb
__kmalloc <-__alloc_skb
skb_put <-pskb_copy
copy_skb_header <-pskb_copy
__copy_skb_header <-copy_skb_header
skb_pull <-qeth_l3_hard_start_xmit
skb_push <-qeth_l3_hard_start_xmit
qeth_l3_fill_header <-qeth_l3_hard_start_xmit
qeth_get_elements_no <-qeth_l3_hard_start_xmit
qeth_do_send_packet <-qeth_l3_hard_start_xmit
skb_queue_tail <-qeth_do_send_packet
..
qdio_perf_stat_inc <-do_QDIO
qdio_kick_outbound_q <-do_QDIO
qdio_perf_stat_inc <-qdio_kick_outbound_q
__tasklet_schedule <-do_QDIO
dev_kfree_skb_any <-qeth_l3_hard_start_xmit
kfree_skb <-dev_kfree_skb_any
__kfree_skb <-kfree_skb
skb_release_all <-__kfree_skb
dst_release <-skb_release_all
sock_wfree <-skb_release_all
skb_release_data <-skb_release_all
__netif_schedule <-qeth_l3_hard_start_xmit
raise_softirq_irqoff <-__netif_schedule
..
tasklet_action <-__do_softirq
qdio_outbound_processing <-tasklet_action
qdio_perf_stat_inc <-qdio_outbound_processing
mod_timer <-qdio_outbound_processing
* the timer code doing more:
__timer_stats_timer_set_start_info <-mod_timer
__mod_timer <-mod_timer
__timer_stats_timer_set_start_info <-__mod_timer
lock_timer_base <-__mod_timer
_spin_lock_irqsave <-lock_timer_base
internal_add_timer <-__mod_timer
_spin_unlock_irqrestore <-__mod_timer
qdio_perf_stat_inc <-qdio_outbound_processing
account_system_vtime <-__do_softirq
* the scheduler code doing way deeper chains:
profile_hit <-schedule
sched_clock <-schedule
_spin_lock_irq <-schedule
account_vtime <-schedule
account_user_time <-account_vtime
becomes
_spin_lock <-schedule
sched_clock_cpu <-schedule
sched_clock <-sched_clock_cpu
deactivate_task <-schedule
dequeue_task <-deactivate_task
dequeue_task_fair <-dequeue_task
dequeue_entity <-dequeue_task_fair
update_curr <-dequeue_entity
hrtick_start_fair <-dequeue_task_fair
put_prev_task_fair <-schedule
pick_next_task_fair <-schedule
pick_next_task_rt <-schedule
pick_next_task_fair <-schedule
pick_next_task_idle <-schedule
_spin_trylock <-tracing_record_cmdline
finish_task_switch <-schedule
account_vtime <-finish_task_switch
account_user_time <-account_vtime
acct_update_integrals <-account_user_time
jiffies_to_timeval <-acct_update_integrals
Hmm. Not pretty.
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 16:04 ` Linus Torvalds
@ 2009-08-14 17:13 ` Heiko Carstens
2009-08-14 18:08 ` Linus Torvalds
0 siblings, 1 reply; 21+ messages in thread
From: Heiko Carstens @ 2009-08-14 17:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt, David S. Miller
On Fri, Aug 14, 2009 at 09:04:49AM -0700, Linus Torvalds wrote:
> On Fri, 14 Aug 2009, Heiko Carstens wrote:
> >
> > All of this started when we compared a 2.6.27 based distro kernel
> > to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
> > cycles than 2.6.16 does for the same workloads.
> >
> > As an example: we took a simple one connection ping pong network load.
> > (client sends packet to server and waits until packet from server
> > returns and game starts again). It uses more cpu cycles on 2.6.27.
> >
> > Using ftrace we figured out that on 2.6.27 it takes more than 500 function
> > calls on the client for a single roundtrip while on 2.6.16 it took only
> > ~230 calls.
>
> Hmm. But the spinlock part of this seems to not really have changed.
>
> Removing everything but the actual callchain info, and then doing a "diff"
> between your two roundtrips, and they look very different, and basically
> none of the difference seems to be due to spinlocks.
>
> It seems to be due to a number of different things, but the bulk of the
> new costs seem to be in networking (and to some degree scheduling). There
> are smaller differences elsewhere, but the networking code _really_ blows
> up.
>
> I don't know how much of some of these are "real" kernel changes, and how
> much of it is less inlining, but spinlocks don't seem to be the issue.
> Here's a quick walk-through (some of these get repeated a couple of times
> in your traces)
That's true. I just wanted to give some background on how all of this started.
Spinlocks are equally good or bad for s390 on both kernel versions.
However quite a few of the additional function calls in networking code
come from uninlining:
c2aa270a [NET]: uninline skb_push, de-bloats a lot
6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
419ae74e [NET]: uninline skb_trim, de-bloats
Christian Ehrhardt identified these and more, but hasn't posted patches
yet. Reverting these patches does increase performance.
That's when I started wondering what we would gain if we inlined the
spinlock code again.
> * the timer code doing more:
>
> __timer_stats_timer_set_start_info <-mod_timer
> __mod_timer <-mod_timer
> __timer_stats_timer_set_start_info <-__mod_timer
> lock_timer_base <-__mod_timer
> _spin_lock_irqsave <-lock_timer_base
> internal_add_timer <-__mod_timer
> _spin_unlock_irqrestore <-__mod_timer
> qdio_perf_stat_inc <-qdio_outbound_processing
> account_system_vtime <-__do_softirq
This should be better now:
507e1231 timer stats: Optimize by adding quick check to avoid function calls
We also addressed our device driver with several patches.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 17:13 ` Heiko Carstens
@ 2009-08-14 18:08 ` Linus Torvalds
2009-08-14 20:19 ` David Miller
0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2009-08-14 18:08 UTC (permalink / raw)
To: Heiko Carstens
Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
Christian Ehrhardt, David S. Miller
On Fri, 14 Aug 2009, Heiko Carstens wrote:
>
> However quite a few of the additional function calls in networking code
> come from uninlining:
>
> c2aa270a [NET]: uninline skb_push, de-bloats a lot
> 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
> 419ae74e [NET]: uninline skb_trim, de-bloats
Hmm. Those functions are big only because of the stupid debugging, which
is almost certainly not worth it any more. I doubt people have seen enough
skb under/over-flows in the last years to merit the code.
I suspect we should remove the silly skb_under/over_panic functions, and
do
- skb_put:
- if (unlikely(skb->tail > skb->end))
- skb_over_panic(skb, len, __builtin_return_address(0));
+ BUG_ON(skb->tail > skb->end);
- skb_push:
- if (unlikely(skb->data<skb->head))
- skb_under_panic(skb, len, __builtin_return_address(0));
+ BUG_ON(skb->data < skb->head);
at which point it might be worthwhile to inline them again, because the
footprint of a BUG_ON() is really fairly small.
If this is a big deal on s390, it might be worth it.
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 18:08 ` Linus Torvalds
@ 2009-08-14 20:19 ` David Miller
2009-08-14 20:45 ` Linus Torvalds
0 siblings, 1 reply; 21+ messages in thread
From: David Miller @ 2009-08-14 20:19 UTC (permalink / raw)
To: torvalds
Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 14 Aug 2009 11:08:16 -0700 (PDT)
> On Fri, 14 Aug 2009, Heiko Carstens wrote:
>>
>> However quite a few of the additional function calls in networking code
>> come from uninlining:
>>
>> c2aa270a [NET]: uninline skb_push, de-bloats a lot
>> 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
>> 419ae74e [NET]: uninline skb_trim, de-bloats
>
> Hmm. Those functions are big only because of the stupid debugging, which
> is almost certainly not worth it any more. I doubt people have seen enough
> skb under/over-flows in the last years to merit the code.
We have found a few bugs in the past year or two because of
the check, but indeed not "a lot".
> I suspect we should remove the silly skb_under/over_panic functions, and
> do
>
> - skb_put:
> - if (unlikely(skb->tail > skb->end))
> - skb_over_panic(skb, len, __builtin_return_address(0));
> + BUG_ON(skb->tail > skb->end);
>
> - skb_push:
> - if (unlikely(skb->data<skb->head))
> - skb_under_panic(skb, len, __builtin_return_address(0));
> + BUG_ON(skb->data < skb->head);
>
> at which point it might be worthwhile to inline them again, because the
> footprint of a BUG_ON() is really fairly small.
Maybe. Although most people turn on verbose BUG's and that
expands to the same if not more code than what's there now
in these SKB inlines.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 20:19 ` David Miller
@ 2009-08-14 20:45 ` Linus Torvalds
2009-08-14 21:10 ` Linus Torvalds
0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2009-08-14 20:45 UTC (permalink / raw)
To: David Miller
Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
On Fri, 14 Aug 2009, David Miller wrote:
>
> Maybe. Although most people turn on verbose BUG's and that
> expands to the same if not more code than what's there now
> in these SKB inlines.
But not inline in the code, though. So yeah, it has a memory footprint,
but shouldn't have a cache footprint.
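For reference, this is roughly what BUG() compiles to on x86 with
CONFIG_DEBUG_BUGVERBOSE (a simplified sketch of the arch/x86 bug.h of
this era - the exact asm differs between 32-bit, 64-bit and the
relative-pointer variants). The point is that the file/line record is
emitted into the separate __bug_table section, so only the ud2 trap
(plus the 2-byte jmp that the endless loop compiles to) ends up in the
instruction stream:

	#define BUG()							\
	do {								\
		asm volatile("1:\tud2\n"				\
			     ".pushsection __bug_table,\"a\"\n"		\
			     /* trap address and file pointer: */	\
			     "2:\t.long 1b, %c0\n"			\
			     /* line number and flags: */		\
			     "\t.word %c1, 0\n"				\
			     "\t.org 2b+%c2\n"				\
			     ".popsection"				\
			     : : "i" (__FILE__), "i" (__LINE__),	\
				 "i" (sizeof(struct bug_entry)));	\
		for (;;)						\
			;						\
	} while (0)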
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 20:45 ` Linus Torvalds
@ 2009-08-14 21:10 ` Linus Torvalds
2009-08-14 22:23 ` David Miller
2009-08-16 18:27 ` Heiko Carstens
0 siblings, 2 replies; 21+ messages in thread
From: Linus Torvalds @ 2009-08-14 21:10 UTC (permalink / raw)
To: David Miller
Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
On Fri, 14 Aug 2009, Linus Torvalds wrote:
>
> But not inline in the code, though. So yeah, it has a memory footprint,
> but shouldn't have a cache footprint.
An example of this: stack usage. This is the code with BUG_ON():
skb_push:
push %rbp
mov %esi,%eax
add %esi,0x68(%rdi)
neg %rax
mov %rsp,%rbp
add 0xd8(%rdi),%rax
mov %rax,0xd8(%rdi)
cmp 0xd0(%rdi),%rax
jae 2f
ud2a
1: jmp 1b
2: leaveq
retq
and note how in the code, we're just jumping over something like four
bytes ("ud2" plus that silly endless loop-jump just to make gcc happy
are both 2 bytes on x86[-64]).
Here's the same code with that horrid skb_under_panic():
skb_push:
push %rbp
mov %esi,%eax
mov %rsp,%rbp
neg %rax
push %rbx
mov %rdi,%rbx
sub $0x8,%rsp
add 0xd8(%rdi),%rax
add %esi,0x68(%rdi)
mov %rax,0xd8(%rdi)
cmp 0xd0(%rdi),%rax
jae 1f
mov 0x8(%rbp),%rdx
callq skb_under_panic
1: mov 0xd8(%rbx),%rax
pop %r11
pop %rbx
leaveq
retq
and notice how it creates a stack frame due to the call (even when the
call is never done!) and the dead space is now 11 bytes.
(Total size of functions: 41 bytes vs 64 bytes)
So things like that do make a difference. It's surprisingly expensive
to add a function call to what otherwise could be a leaf function. And
BUG_ON() avoids that.
Of course, even at just 41 bytes, it's not _entirely_ obvious that
inlining is the right thing. Of those 41 bytes, four are just function
overhead, and the "call" is five bytes in the caller and a few bytes of
spills and argument setups, but inlining is still probably going to change
each call-site from ~10 bytes to ~35 bytes.
Linus
---
include/linux/skbuff.h | 4 ---
net/core/skbuff.c | 51 +----------------------------------------------
2 files changed, 2 insertions(+), 53 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f2c69a2..2af92b4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -483,10 +483,6 @@ extern int skb_cow_data(struct sk_buff *skb, int tailbits,
extern int skb_pad(struct sk_buff *skb, int pad);
#define dev_kfree_skb(a) consume_skb(a)
#define dev_consume_skb(a) kfree_skb_clean(a)
-extern void skb_over_panic(struct sk_buff *skb, int len,
- void *here);
-extern void skb_under_panic(struct sk_buff *skb, int len,
- void *here);
extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
int getfrag(void *from, char *to, int offset,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9e0597d..ae63473 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -103,51 +103,6 @@ static struct pipe_buf_operations sock_pipe_buf_ops = {
.get = sock_pipe_buf_get,
};
-/*
- * Keep out-of-line to prevent kernel bloat.
- * __builtin_return_address is not used because it is not always
- * reliable.
- */
-
-/**
- * skb_over_panic - private function
- * @skb: buffer
- * @sz: size
- * @here: address
- *
- * Out of line support code for skb_put(). Not user callable.
- */
-void skb_over_panic(struct sk_buff *skb, int sz, void *here)
-{
- printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
- "data:%p tail:%#lx end:%#lx dev:%s\n",
- here, skb->len, sz, skb->head, skb->data,
- (unsigned long)skb->tail, (unsigned long)skb->end,
- skb->dev ? skb->dev->name : "<NULL>");
- BUG();
-}
-EXPORT_SYMBOL(skb_over_panic);
-
-/**
- * skb_under_panic - private function
- * @skb: buffer
- * @sz: size
- * @here: address
- *
- * Out of line support code for skb_push(). Not user callable.
- */
-
-void skb_under_panic(struct sk_buff *skb, int sz, void *here)
-{
- printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
- "data:%p tail:%#lx end:%#lx dev:%s\n",
- here, skb->len, sz, skb->head, skb->data,
- (unsigned long)skb->tail, (unsigned long)skb->end,
- skb->dev ? skb->dev->name : "<NULL>");
- BUG();
-}
-EXPORT_SYMBOL(skb_under_panic);
-
/* Allocate a new skbuff. We do this ourselves so we can fill in a few
* 'private' fields and also do memory statistics to find all the
* [BEEP] leaks.
@@ -1014,8 +969,7 @@ unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
SKB_LINEAR_ASSERT(skb);
skb->tail += len;
skb->len += len;
- if (unlikely(skb->tail > skb->end))
- skb_over_panic(skb, len, __builtin_return_address(0));
+ BUG_ON(skb->tail > skb->end);
return tmp;
}
EXPORT_SYMBOL(skb_put);
@@ -1033,8 +987,7 @@ unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
skb->len += len;
- if (unlikely(skb->data<skb->head))
- skb_under_panic(skb, len, __builtin_return_address(0));
+ BUG_ON(skb->data < skb->head);
return skb->data;
}
EXPORT_SYMBOL(skb_push);
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 21:10 ` Linus Torvalds
@ 2009-08-14 22:23 ` David Miller
2009-08-16 18:27 ` Heiko Carstens
1 sibling, 0 replies; 21+ messages in thread
From: David Miller @ 2009-08-14 22:23 UTC (permalink / raw)
To: torvalds
Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 14 Aug 2009 14:10:45 -0700 (PDT)
> On Fri, 14 Aug 2009, Linus Torvalds wrote:
>>
>> But not inline in the code, though. So yeah, it has a memory footprint,
>> but shouldn't have a cache footprint.
>
> An example of this: stack usage. This is the code with BUG_ON():
...
> and note how in the code, we're just jumping over something like four
> bytes ("ud2" plus that silly endless loop-jump just to make gcc happy
> are both 2 bytes on x86[-64]).
>
> Here's the same code with that horrid skb_under_panic():
Yes, on x86 it's better even with verbose debugging, because
of the inlining of the debug message pointer etc.
I'll have to do something more intelligent than I currently do on
sparc64, and I had always been meaning to do that. :-)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-14 21:10 ` Linus Torvalds
2009-08-14 22:23 ` David Miller
@ 2009-08-16 18:27 ` Heiko Carstens
2009-08-16 18:45 ` Linus Torvalds
2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
1 sibling, 2 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-16 18:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Miller, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky,
arnd, horsth, ehrhardt
On Fri, Aug 14, 2009 at 02:10:45PM -0700, Linus Torvalds wrote:
> On Fri, 14 Aug 2009, Linus Torvalds wrote:
> >
> > But not inline in the code, though. So yeah, it has a memory footprint,
> > but shouldn't have a cache footprint.
>
> An example of this: stack usage. This is the code with BUG_ON():
>
> skb_push:
> push %rbp
> mov %esi,%eax
> add %esi,0x68(%rdi)
> neg %rax
> mov %rsp,%rbp
> add 0xd8(%rdi),%rax
> mov %rax,0xd8(%rdi)
> cmp 0xd0(%rdi),%rax
> jae 2f
> ud2a
> 1: jmp 1b
> 2: leaveq
> retq
An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 results in
the following vmlinux sizes:
574152755 2009-08-16 20:05 vmlinux-orig
574862000 2009-08-16 20:13 vmlinux-skb-inline
The skb-inline variant was created by using your patch and on top of it the
patch below. Size increased by 692KB (0.12%).
Dunno.. is that acceptable?
---
include/linux/skbuff.h | 35 +++++++++++++++++++++++++++++++++--
net/core/skbuff.c | 38 --------------------------------------
2 files changed, 33 insertions(+), 40 deletions(-)
Index: linux-2.6/include/linux/skbuff.h
===================================================================
--- linux-2.6.orig/include/linux/skbuff.h
+++ linux-2.6/include/linux/skbuff.h
@@ -1116,7 +1116,6 @@ static inline void skb_set_tail_pointer(
/*
* Add data to an sk_buff
*/
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
{
unsigned char *tmp = skb_tail_pointer(skb);
@@ -1126,7 +1125,23 @@ static inline unsigned char *__skb_put(s
return tmp;
}
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ * skb_put - add data to a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+ unsigned char *tmp = __skb_put(skb, len);
+
+ BUG_ON(skb->tail > skb->end);
+ return tmp;
+}
+
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
@@ -1134,6 +1149,22 @@ static inline unsigned char *__skb_push(
return skb->data;
}
+/**
+ * skb_push - add data to the start of a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer at the buffer
+ * start. If this would exceed the total buffer headroom the kernel will
+ * panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+ __skb_push(skb, len);
+ BUG_ON(skb->data < skb->head);
+ return skb->data;
+}
+
extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c
+++ linux-2.6/net/core/skbuff.c
@@ -955,44 +955,6 @@ free_skb:
EXPORT_SYMBOL(skb_pad);
/**
- * skb_put - add data to a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer. If this would
- * exceed the total buffer size the kernel will panic. A pointer to the
- * first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
- unsigned char *tmp = skb_tail_pointer(skb);
- SKB_LINEAR_ASSERT(skb);
- skb->tail += len;
- skb->len += len;
- BUG_ON(skb->tail > skb->end);
- return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- * skb_push - add data to the start of a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer at the buffer
- * start. If this would exceed the total buffer headroom the kernel will
- * panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
- skb->data -= len;
- skb->len += len;
- BUG_ON(skb->data < skb->head);
- return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
* skb_pull - remove data from the start of a buffer
* @skb: buffer to use
* @len: amount of data to remove
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-16 18:27 ` Heiko Carstens
@ 2009-08-16 18:45 ` Linus Torvalds
2009-08-16 20:36 ` Ingo Molnar
2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
1 sibling, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2009-08-16 18:45 UTC (permalink / raw)
To: Heiko Carstens
Cc: David Miller, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky,
arnd, horsth, ehrhardt
On Sun, 16 Aug 2009, Heiko Carstens wrote:
>
> An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 results in
> the following vmlinux sizes:
>
> 574152755 2009-08-16 20:05 vmlinux-orig
> 574862000 2009-08-16 20:13 vmlinux-skb-inline
It's more relevant to look at just code size. BUG_ON() has more of an
impact on data sizes, because it does something like 10-12 bytes of
exception data for each bug-on. But that exception data is out-of-line and
not interesting.
If the code size blows up a lot, that's worse than if the data size
expands a bit.
Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-16 18:45 ` Linus Torvalds
@ 2009-08-16 20:36 ` Ingo Molnar
2009-08-17 10:26 ` Heiko Carstens
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2009-08-16 20:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Heiko Carstens, David Miller, akpm, a.p.zijlstra, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sun, 16 Aug 2009, Heiko Carstens wrote:
> >
> > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc
> > 4.3.2 results in the following vmlinux sizes:
> >
> > 574152755 2009-08-16 20:05 vmlinux-orig
> > 574862000 2009-08-16 20:13 vmlinux-skb-inline
>
> It's more relevant to look at just code size. BUG_ON() has more of
> an impact on data sizes, because it does something like 10-12
> bytes of exception data for each bug-on. But that exception data
> is out-of-line and not interesting.
>
> If the code size blows up a lot, that's worse than if the data
> size expands a bit.
Also, allyesconfig isn't really relevant to inlining cost in general
as it enables everything. The defconfigs are better (at least on
x86) because they tend to be some distro-alike config - i.e. they
are closer to what most Linux users will run in the end.
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [patch 0/3] Allow inlined spinlocks again V3
2009-08-16 20:36 ` Ingo Molnar
@ 2009-08-17 10:26 ` Heiko Carstens
0 siblings, 0 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-17 10:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
On Sun, Aug 16, 2009 at 10:36:50PM +0200, Ingo Molnar wrote:
> > > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc
> > > 4.3.2 results in the following vmlinux sizes:
> > >
> > > 574152755 2009-08-16 20:05 vmlinux-orig
> > > 574862000 2009-08-16 20:13 vmlinux-skb-inline
> >
> > It's more relevant to look at just code size. BUG_ON() has more of
> > an impact on data sizes, because it does something like 10-12
> > bytes of exception data for each bug-on. But that exception data
> > is out-of-line and not interesting.
> >
> > If the code size blows up a lot, that's worse than if the data
> > size expands a bit.
>
> Also, allyesconfig isn't really relevant to inlining cost in general
> as it enables everything. The defconfigs are better (at least on
> x86) because they tend to be some distro-alike config - i.e. they
> are closer to what most Linux users will run in the end.
Ok, I'm still using allyesconfig, but these numbers should be
good enough:
For the unpatched kernel we have
ffffffff81009000 T _stext
ffffffff83614130 T _etext
size: 39891248
ffffffff8460e250 R __start___bug_table
ffffffff8462e9ec R __stop___bug_table
size: 133020
And for the patched kernel (inlined functions) we get
ffffffff81009000 T _stext
ffffffff83627250 T _etext
size: 39969360
ffffffff84621080 R __start___bug_table
ffffffff84649c4c R __stop___bug_table
size: 166860
We have CONFIG_GENERIC_BUG, CONFIG_GENERIC_BUG_RELATIVE_POINTERS and
CONFIG_DEBUG_BUGVERBOSE all enabled. So the size of each bug table entry
is 10 bytes.
We end up with 3384 additional bug table entries caused by inlining.
Text size increased by 78112 bytes which makes an average extra 23 bytes
for each call site.
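For the record, the arithmetic can be double-checked from the nm output
above with a trivial host-side program (just a sanity-check sketch,
addresses copied verbatim):

	#include <stdio.h>

	int main(void)
	{
		unsigned long text_old = 0xffffffff83614130UL - 0xffffffff81009000UL;
		unsigned long text_new = 0xffffffff83627250UL - 0xffffffff81009000UL;
		unsigned long bug_old  = 0xffffffff8462e9ecUL - 0xffffffff8460e250UL;
		unsigned long bug_new  = 0xffffffff84649c4cUL - 0xffffffff84621080UL;
		unsigned long entries  = (bug_new - bug_old) / 10; /* 10 bytes/entry */

		printf("extra bug table entries: %lu\n", entries); /* 3384 */
		printf("extra text: %lu bytes, %lu bytes/call site\n",
		       text_new - text_old,
		       (text_new - text_old) / entries); /* 78112, ~23 */
		return 0;
	}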
^ permalink raw reply [flat|nested] 21+ messages in thread
* [numbers] Re: [patch] more skb ops inlining
2009-08-16 18:27 ` Heiko Carstens
2009-08-16 18:45 ` Linus Torvalds
@ 2009-08-17 21:26 ` Ingo Molnar
2009-08-18 11:34 ` Heiko Carstens
1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2009-08-17 21:26 UTC (permalink / raw)
To: Heiko Carstens
Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2
> results in the following vmlinux sizes:
>
> 574152755 2009-08-16 20:05 vmlinux-orig
> 574862000 2009-08-16 20:13 vmlinux-skb-inline
>
> The skb-inline variant was created by using your patch and on top
> of it the patch below. Size increased by 692KB (0.12%).
>
> Dunno.. is that acceptable?
i ported your patch to latest mainline (see it attached below) and did
x86(-64) defconfig vmlinux measurements (with CONFIG_OPTIMIZE_INLINING
disabled):
text data bss dec hex filename
8455329 1192304 989908 10637541 a250e5 vmlinux.before
8471036 1192304 989908 10653248 a28e40 vmlinux.after
that's a +0.18% text size increase - quite significant.
I also did performance measurements, trying to figure out whether
this increase in size can be measured via any runtime performance
metrics such as cycles, instructions or cache-misses.
To do that i picked a micro-benchmark, lat_tcp.
( Note that microbenchmarks are biased in favor of inlining patches
like yours, because they disproportionately favor the speed
benefit of it while not adequately showing the cache footprint
disadvantage of increased kernel text size. )
Without your patch i got:
titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost
TCP latency using localhost: 17.5198 microseconds
TCP latency using localhost: 17.5060 microseconds
TCP latency using localhost: 17.4684 microseconds
TCP latency using localhost: 17.5416 microseconds
TCP latency using localhost: 17.4881 microseconds
TCP latency using localhost: 17.4832 microseconds
TCP latency using localhost: 17.4830 microseconds
TCP latency using localhost: 17.4723 microseconds
TCP latency using localhost: 17.4733 microseconds
TCP latency using localhost: 17.5255 microseconds
Performance counter stats for './lat_tcp localhost' (10 runs):
1494.625043 task-clock-msecs # 0.564 CPUs ( +- 0.077% )
3665868023 cycles # 2452.701 M/sec ( +- 0.070% )
2163212606 instructions # 0.590 IPC ( +- 0.090% )
2449056216 L1-icache-loads # 1638.576 M/sec ( +- 0.056% )
61081182 L1-icache-load-misses # 40.867 M/sec ( +- 0.192% )
2.649159316 seconds time elapsed ( +- 0.090% )
With your patch i got:
titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost
TCP latency using localhost: 17.6913 microseconds
TCP latency using localhost: 17.7218 microseconds
TCP latency using localhost: 17.6882 microseconds
TCP latency using localhost: 17.7196 microseconds
TCP latency using localhost: 17.6411 microseconds
TCP latency using localhost: 17.7136 microseconds
TCP latency using localhost: 17.6488 microseconds
TCP latency using localhost: 17.6663 microseconds
TCP latency using localhost: 17.7445 microseconds
TCP latency using localhost: 17.6851 microseconds
Performance counter stats for './lat_tcp localhost' (10 runs):
1497.479984 task-clock-msecs # 0.564 CPUs ( +- 0.106% )
3690097505 cycles # 2464.205 M/sec ( +- 0.109% )
2162432402 instructions # 0.586 IPC ( +- 0.127% )
2437974192 L1-icache-loads # 1628.051 M/sec ( +- 0.105% )
63064588 L1-icache-load-misses # 42.114 M/sec ( +- 0.278% )
2.656952976 seconds time elapsed ( +- 0.110% )
Summarized (for lat_tcp):
| Before:
|
| 1494.625043 task-clock-msecs # 0.564 CPUs ( +- 0.077% )
| 3665868023 cycles # 2452.701 M/sec ( +- 0.070% )
| 2163212606 instructions # 0.590 IPC ( +- 0.090% )
|
| After:
|
| 1497.479984 task-clock-msecs # 0.564 CPUs ( +- 0.106% )
| 3690097505 cycles # 2464.205 M/sec ( +- 0.109% )
| 2162432402 instructions # 0.586 IPC ( +- 0.127% )
So we've got a 0.2% slowdown, which is provably outside the noise
level of 0.1%. The instruction count remained almost the same.
The reason for the slowdown can be seen in the L1 icache-fetch
stats:
2449056216 L1-icache-loads # 1638.576 M/sec ( +- 0.056% )
61081182 L1-icache-load-misses # 40.867 M/sec ( +- 0.192% )
2437974192 L1-icache-loads # 1628.051 M/sec ( +- 0.105% )
63064588 L1-icache-load-misses # 42.114 M/sec ( +- 0.278% )
The icache-misses went up from 40.867 M/sec to 42.114 M/sec, a
3.05% worsening - well beyond the statistical significance of
0.28%.
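For reference, both percentages can be re-derived from the raw counters
above (a quick host-side sketch, values copied from the perf output):

	#include <stdio.h>

	int main(void)
	{
		/* task-clock msecs, before vs after */
		printf("slowdown:      %+.2f%%\n",
		       (1497.479984 / 1494.625043 - 1.0) * 100.0); /* +0.19% */
		/* icache-miss rates in M/sec, before vs after */
		printf("icache misses: %+.2f%%\n",
		       (42.114 / 40.867 - 1.0) * 100.0);           /* +3.05% */
		return 0;
	}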
So unless i botched the measurements, or some accidental code layout
differences punish the patch artificially, this patch is a small
but measurable performance regression on x86.
The slowdown itself is barely provable with a test repeat count of
10 iterations (not a surprise really - such patches were not really
provable before via /usr/bin/time kinds of measurements), on an
otherwise totally idle system.
The icache-miss stats do support the notion that what is causing
the slowdown is the critical path falling out of the icache.
( By all means please repeat these measurements - they are very
easy to do via 'perf stat --repeat 10' - see the examples
above. )
To build the kernel i used gcc 4.3.2. The CPU i used is friendly to
inlining patches as well: it's an Extreme Edition Core2 Duo (booted
to a single core, to stabilize the numbers):
Intel(R) Core(TM)2 CPU X6800 @ 2.93GHz
With 4MB of L2 cache. I suspect smaller CPUs would feel the
negative effects of inlining in a more pronounced way.
Thanks,
Ingo
---
include/linux/skbuff.h | 37 +++++++++++++++++++++++++++++++++++--
net/core/skbuff.c | 40 ----------------------------------------
2 files changed, 35 insertions(+), 42 deletions(-)
Index: linux/include/linux/skbuff.h
===================================================================
--- linux.orig/include/linux/skbuff.h
+++ linux/include/linux/skbuff.h
@@ -1120,7 +1120,6 @@ static inline void skb_set_tail_pointer(
/*
* Add data to an sk_buff
*/
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
{
unsigned char *tmp = skb_tail_pointer(skb);
@@ -1130,7 +1129,24 @@ static inline unsigned char *__skb_put(s
return tmp;
}
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ * skb_put - add data to a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+ unsigned char *tmp = __skb_put(skb, len);
+
+ if (unlikely(skb->tail > skb->end))
+ skb_over_panic(skb, len, __builtin_return_address(0));
+ return tmp;
+}
+
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
@@ -1138,6 +1154,23 @@ static inline unsigned char *__skb_push(
return skb->data;
}
+/**
+ * skb_push - add data to the start of a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer at the buffer
+ * start. If this would exceed the total buffer headroom the kernel will
+ * panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+ __skb_push(skb, len);
+ if (unlikely(skb->data<skb->head))
+ skb_under_panic(skb, len, __builtin_return_address(0));
+ return skb->data;
+}
+
extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
Index: linux/net/core/skbuff.c
===================================================================
--- linux.orig/net/core/skbuff.c
+++ linux/net/core/skbuff.c
@@ -1000,46 +1000,6 @@ free_skb:
EXPORT_SYMBOL(skb_pad);
/**
- * skb_put - add data to a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer. If this would
- * exceed the total buffer size the kernel will panic. A pointer to the
- * first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
- unsigned char *tmp = skb_tail_pointer(skb);
- SKB_LINEAR_ASSERT(skb);
- skb->tail += len;
- skb->len += len;
- if (unlikely(skb->tail > skb->end))
- skb_over_panic(skb, len, __builtin_return_address(0));
- return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- * skb_push - add data to the start of a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer at the buffer
- * start. If this would exceed the total buffer headroom the kernel will
- * panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
- skb->data -= len;
- skb->len += len;
- if (unlikely(skb->data<skb->head))
- skb_under_panic(skb, len, __builtin_return_address(0));
- return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
* skb_pull - remove data from the start of a buffer
* @skb: buffer to use
* @len: amount of data to remove
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [numbers] Re: [patch] more skb ops inlining
2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
@ 2009-08-18 11:34 ` Heiko Carstens
0 siblings, 0 replies; 21+ messages in thread
From: Heiko Carstens @ 2009-08-18 11:34 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch,
schwidefsky, arnd, horsth, ehrhardt
On Mon, Aug 17, 2009 at 11:26:29PM +0200, Ingo Molnar wrote:
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2
> > results in the following vmlinux sizes:
> >
> > 574152755 2009-08-16 20:05 vmlinux-orig
> > 574862000 2009-08-16 20:13 vmlinux-skb-inline
> >
> > The skb-inline variant was created by using your patch and on top
> > of it the patch below. Size increased by 692KB (0.12%).
> >
> > Dunno.. is that acceptable?
>
> i ported your patch to latest mainline (see it attached below) and did
> x86(-64) defconfig vmlinux measurements (with CONFIG_OPTIMIZE_INLINING
> disabled):
> +static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
> +{
> + unsigned char *tmp = __skb_put(skb, len);
> +
> + if (unlikely(skb->tail > skb->end))
> + skb_over_panic(skb, len, __builtin_return_address(0));
> + return tmp;
> +}
No, the proposed patch replaces skb_over_panic() with BUG_ON() (see below).
With that I get slightly different results. However that's on my Thinkpad
booted with maxcpus=1 and a 32 bit kernel. Kernel is latest git as of today.
Now, this is a completely different machine than yours. With a different
config, so the machine actually boots. But the numbers are strange.
Part of /proc/cpuinfo :
model name : Intel(R) Core(TM)2 CPU T7600 @ 2.33GHz
stepping : 6
cache size : 4096 KB
Without the patch the result is:
TCP latency using localhost: 0.2576 microseconds
TCP latency using localhost: 0.2571 microseconds
TCP latency using localhost: 0.2579 microseconds
TCP latency using localhost: 0.2573 microseconds
TCP latency using localhost: 0.2574 microseconds
TCP latency using localhost: 0.2583 microseconds
TCP latency using localhost: 0.2572 microseconds
TCP latency using localhost: 0.2582 microseconds
TCP latency using localhost: 0.2583 microseconds
TCP latency using localhost: 0.2584 microseconds
Performance counter stats for './lat_tcp localhost' (10 runs):
24578.750193 task-clock-msecs # 0.994 CPUs ( +- 0.295% )
57166815217 cycles # 2325.863 M/sec ( +- 0.296% )
24994309721 instructions # 0.437 IPC ( +- 0.288% )
53691289530 L1-icache-loads # 2184.460 M/sec ( +- 0.304% )
424027 L1-icache-load-misses # 0.017 M/sec ( +- 0.824% )
24.714691813 seconds time elapsed ( +- 0.285% )
With the patch I get this:
TCP latency using localhost: 0.2588 microseconds
TCP latency using localhost: 0.2589 microseconds
TCP latency using localhost: 0.2581 microseconds
TCP latency using localhost: 0.2586 microseconds
TCP latency using localhost: 0.2584 microseconds
TCP latency using localhost: 0.2588 microseconds
TCP latency using localhost: 0.2586 microseconds
TCP latency using localhost: 0.2589 microseconds
TCP latency using localhost: 0.2582 microseconds
TCP latency using localhost: 0.2586 microseconds
Performance counter stats for './lat_tcp localhost' (10 runs):
24464.554706 task-clock-msecs # 0.995 CPUs ( +- 0.063% )
56903708182 cycles # 2325.965 M/sec ( +- 0.063% )
24831456567 instructions # 0.436 IPC ( +- 0.048% )
53068943303 L1-icache-loads # 2169.218 M/sec ( +- 0.081% )
418802 L1-icache-load-misses # 0.017 M/sec ( +- 0.637% )
24.596311828 seconds time elapsed ( +- 0.097% )
So latency increases while all the other numbers decrease.
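Putting rough numbers on that (a small host-side sketch; the latency
values are the means of the ten samples from each run, the counters are
copied from the perf output):

	#include <stdio.h>

	static void delta(const char *name, double before, double after)
	{
		printf("%-14s %+.2f%%\n", name, (after / before - 1.0) * 100.0);
	}

	int main(void)
	{
		delta("latency",      0.25777, 0.25859);           /* +0.32% */
		delta("task-clock",   24578.750193, 24464.554706); /* -0.46% */
		delta("cycles",       57166815217., 56903708182.); /* -0.46% */
		delta("instructions", 24994309721., 24831456567.); /* -0.65% */
		return 0;
	}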
---
include/linux/skbuff.h | 39 +++++++++++++++++++---
net/core/skbuff.c | 85 -------------------------------------------------
2 files changed, 33 insertions(+), 91 deletions(-)
Index: linux-2.6/include/linux/skbuff.h
===================================================================
--- linux-2.6.orig/include/linux/skbuff.h
+++ linux-2.6/include/linux/skbuff.h
@@ -483,10 +483,6 @@ extern int skb_cow_data(struct sk
extern int skb_pad(struct sk_buff *skb, int pad);
#define dev_kfree_skb(a) consume_skb(a)
#define dev_consume_skb(a) kfree_skb_clean(a)
-extern void skb_over_panic(struct sk_buff *skb, int len,
- void *here);
-extern void skb_under_panic(struct sk_buff *skb, int len,
- void *here);
extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
int getfrag(void *from, char *to, int offset,
@@ -1120,7 +1116,6 @@ static inline void skb_set_tail_pointer(
/*
* Add data to an sk_buff
*/
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
{
unsigned char *tmp = skb_tail_pointer(skb);
@@ -1130,7 +1125,23 @@ static inline unsigned char *__skb_put(s
return tmp;
}
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ * skb_put - add data to a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+ unsigned char *tmp = __skb_put(skb, len);
+
+ BUG_ON(skb->tail > skb->end);
+ return tmp;
+}
+
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
@@ -1138,6 +1149,22 @@ static inline unsigned char *__skb_push(
return skb->data;
}
+/**
+ * skb_push - add data to the start of a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer at the buffer
+ * start. If this would exceed the total buffer headroom the kernel will
+ * panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+ __skb_push(skb, len);
+ BUG_ON(skb->data < skb->head);
+ return skb->data;
+}
+
extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c
+++ linux-2.6/net/core/skbuff.c
@@ -103,51 +103,6 @@ static struct pipe_buf_operations sock_p
.get = sock_pipe_buf_get,
};
-/*
- * Keep out-of-line to prevent kernel bloat.
- * __builtin_return_address is not used because it is not always
- * reliable.
- */
-
-/**
- * skb_over_panic - private function
- * @skb: buffer
- * @sz: size
- * @here: address
- *
- * Out of line support code for skb_put(). Not user callable.
- */
-void skb_over_panic(struct sk_buff *skb, int sz, void *here)
-{
- printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
- "data:%p tail:%#lx end:%#lx dev:%s\n",
- here, skb->len, sz, skb->head, skb->data,
- (unsigned long)skb->tail, (unsigned long)skb->end,
- skb->dev ? skb->dev->name : "<NULL>");
- BUG();
-}
-EXPORT_SYMBOL(skb_over_panic);
-
-/**
- * skb_under_panic - private function
- * @skb: buffer
- * @sz: size
- * @here: address
- *
- * Out of line support code for skb_push(). Not user callable.
- */
-
-void skb_under_panic(struct sk_buff *skb, int sz, void *here)
-{
- printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
- "data:%p tail:%#lx end:%#lx dev:%s\n",
- here, skb->len, sz, skb->head, skb->data,
- (unsigned long)skb->tail, (unsigned long)skb->end,
- skb->dev ? skb->dev->name : "<NULL>");
- BUG();
-}
-EXPORT_SYMBOL(skb_under_panic);
-
/* Allocate a new skbuff. We do this ourselves so we can fill in a few
* 'private' fields and also do memory statistics to find all the
* [BEEP] leaks.
@@ -1000,46 +955,6 @@ free_skb:
EXPORT_SYMBOL(skb_pad);
/**
- * skb_put - add data to a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer. If this would
- * exceed the total buffer size the kernel will panic. A pointer to the
- * first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
- unsigned char *tmp = skb_tail_pointer(skb);
- SKB_LINEAR_ASSERT(skb);
- skb->tail += len;
- skb->len += len;
- if (unlikely(skb->tail > skb->end))
- skb_over_panic(skb, len, __builtin_return_address(0));
- return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- * skb_push - add data to the start of a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer at the buffer
- * start. If this would exceed the total buffer headroom the kernel will
- * panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
- skb->data -= len;
- skb->len += len;
- if (unlikely(skb->data<skb->head))
- skb_under_panic(skb, len, __builtin_return_address(0));
- return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
* skb_pull - remove data from the start of a buffer
* @skb: buffer to use
* @len: amount of data to remove
^ permalink raw reply [flat|nested] 21+ messages in thread
Thread overview: 21+ messages
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
2009-08-13 18:34 ` Ingo Molnar
2009-08-13 18:43 ` Ingo Molnar
2009-08-14 12:34 ` Heiko Carstens
2009-08-14 16:04 ` Linus Torvalds
2009-08-14 17:13 ` Heiko Carstens
2009-08-14 18:08 ` Linus Torvalds
2009-08-14 20:19 ` David Miller
2009-08-14 20:45 ` Linus Torvalds
2009-08-14 21:10 ` Linus Torvalds
2009-08-14 22:23 ` David Miller
2009-08-16 18:27 ` Heiko Carstens
2009-08-16 18:45 ` Linus Torvalds
2009-08-16 20:36 ` Ingo Molnar
2009-08-17 10:26 ` Heiko Carstens
2009-08-17 21:26 ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
2009-08-18 11:34 ` Heiko Carstens