linux-arch.vger.kernel.org archive mirror
* [patch 0/3] Allow inlined spinlocks again V3
@ 2009-08-12 18:39 Heiko Carstens
  2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt

Hi Andrew,

since there were no objections to this patch set, would you mind picking
it up for -mm? The patches are against Linus' current git head but also
apply on top of the latest mmotm patch set.

===

This patch set makes it possible to have inlined spinlocks again.

The rationale behind this is that function calls on at least s390 are
expensive.

If one considers that server kernels are usually compiled with
!CONFIG_PREEMPT, a simple spin_lock is just a compare-and-swap loop.
The extra overhead of a function call is significant.
With inlined spinlocks, overall cpu usage is reduced by 1%-5% on s390.
These numbers were taken with some network benchmarks. However, I expect
any workload that calls frequently into the kernel and grabs a few
locks to perform better.
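
To illustrate the point about !CONFIG_PREEMPT, here is a toy sketch (not
kernel code; names and the plain C11 atomics are mine) of the
compare-and-swap loop that a simple spin_lock boils down to once the
function-call overhead is gone:

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy illustration only: with !CONFIG_PREEMPT and no lock debugging,
 * spin_lock() is essentially this compare-and-swap loop. */
typedef struct {
	atomic_int locked;
} toy_spinlock_t;

static inline void toy_spin_lock(toy_spinlock_t *lock)
{
	int expected = 0;

	/* Spin until we swap 0 -> 1; a failed CAS reloads 'expected'. */
	while (!atomic_compare_exchange_weak(&lock->locked, &expected, 1))
		expected = 0;
}

static inline void toy_spin_unlock(toy_spinlock_t *lock)
{
	atomic_store(&lock->locked, 0);
}
```

When both functions are static inlines, an uncontended lock/unlock pair
compiles to a handful of instructions with no call/return overhead.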

The implementation is straightforward: move the function bodies of the
locking functions to static inline functions and place them in a header
file.
Depending on CONFIG_SPINLOCK_INLINE, either out-of-line or inlined
locking functions are generated.

The patches should be self-explanatory.
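
The shape of that split can be sketched as follows (hypothetical names
throughout: TOY_SPINLOCK_INLINE stands in for CONFIG_SPINLOCK_INLINE,
and the lock is simplified to a plain int, unlike the real code):

```c
#include <assert.h>

/* "Header file": one canonical body, as a static inline. */
static inline int __toy_trylock(int *lock)
{
	if (*lock == 0) {	/* grossly simplified, not atomic */
		*lock = 1;
		return 1;
	}
	return 0;
}

#ifdef TOY_SPINLOCK_INLINE
/* Inlined flavor: callers expand the body directly. */
#define _toy_trylock(lock)	__toy_trylock(lock)
#else
/* Out-of-line flavor: a real function wrapping the same inline body,
 * the way kernel/spinlock.c does after this patch set. */
int _toy_trylock(int *lock)
{
	return __toy_trylock(lock);
}
#endif
```

Either way there is exactly one copy of the locking logic to maintain;
the config option only decides where it gets emitted.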

V2: rewritten from scratch - now also with readable code

V3: removed the macro used to generate the out-of-line spinlock variants,
    since it would break ctags. As requested by Arnd Bergmann.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 1/3] spinlock: move spinlock function bodies to header file
  2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
@ 2009-08-12 18:39 ` Heiko Carstens
  2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt

[-- Attachment #1: 01_spinlock_move.diff --]
[-- Type: text/plain, Size: 17156 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Move the spinlock function bodies to a header file by creating a
static inline version of each variant.
Use the inline versions in the out-of-line code. This shouldn't
make any difference, except that the spinlock code can now also
be used to generate inlined spinlock code.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 include/linux/spinlock.h         |   18 +-
 include/linux/spinlock_api_smp.h |  263 +++++++++++++++++++++++++++++++++++++++
 kernel/spinlock.c                |  174 ++++---------------------
 3 files changed, 300 insertions(+), 155 deletions(-)

Index: linux-2.6/include/linux/spinlock_api_smp.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -60,4 +60,267 @@ void __lockfunc _read_unlock_irqrestore(
 void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 							__releases(lock);
 
+static inline int __spin_trylock(spinlock_t *lock)
+{
+	preempt_disable();
+	if (_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable();
+	return 0;
+}
+
+static inline int __read_trylock(rwlock_t *lock)
+{
+	preempt_disable();
+	if (_raw_read_trylock(lock)) {
+		rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable();
+	return 0;
+}
+
+static inline int __write_trylock(rwlock_t *lock)
+{
+	preempt_disable();
+	if (_raw_write_trylock(lock)) {
+		rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable();
+	return 0;
+}
+
+/*
+ * If lockdep is enabled then we use the non-preemption spin-ops
+ * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
+ * not re-enabled during lock-acquire (which the preempt-spin-ops do):
+ */
+#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+
+static inline void __read_lock(rwlock_t *lock)
+{
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline unsigned long __spin_lock_irqsave(spinlock_t *lock)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	/*
+	 * On lockdep we dont want the hand-coded irq-enable of
+	 * _raw_spin_lock_flags() code, because lockdep assumes
+	 * that interrupts are not re-enabled during lock-acquire:
+	 */
+#ifdef CONFIG_LOCKDEP
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+#else
+	_raw_spin_lock_flags(lock, &flags);
+#endif
+	return flags;
+}
+
+static inline void __spin_lock_irq(spinlock_t *lock)
+{
+	local_irq_disable();
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline void __spin_lock_bh(spinlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline unsigned long __read_lock_irqsave(rwlock_t *lock)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock,
+			     _raw_read_lock_flags, &flags);
+	return flags;
+}
+
+static inline void __read_lock_irq(rwlock_t *lock)
+{
+	local_irq_disable();
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline void __read_lock_bh(rwlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline unsigned long __write_lock_irqsave(rwlock_t *lock)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock,
+			     _raw_write_lock_flags, &flags);
+	return flags;
+}
+
+static inline void __write_lock_irq(rwlock_t *lock)
+{
+	local_irq_disable();
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+static inline void __write_lock_bh(rwlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+static inline void __spin_lock(spinlock_t *lock)
+{
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline void __write_lock(rwlock_t *lock)
+{
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+#endif /* !CONFIG_GENERIC_LOCKBREAK || CONFIG_DEBUG_LOCK_ALLOC */
+
+static inline void __spin_unlock(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	preempt_enable();
+}
+
+static inline void __write_unlock(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	preempt_enable();
+}
+
+static inline void __read_unlock(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	preempt_enable();
+}
+
+static inline void __spin_unlock_irqrestore(spinlock_t *lock,
+					    unsigned long flags)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+static inline void __spin_unlock_irq(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	local_irq_enable();
+	preempt_enable();
+}
+
+static inline void __spin_unlock_bh(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline void __read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+static inline void __read_unlock_irq(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	local_irq_enable();
+	preempt_enable();
+}
+
+static inline void __read_unlock_bh(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline void __write_unlock_irqrestore(rwlock_t *lock,
+					     unsigned long flags)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+static inline void __write_unlock_irq(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	local_irq_enable();
+	preempt_enable();
+}
+
+static inline void __write_unlock_bh(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline int __spin_trylock_bh(spinlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	if (_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	return 0;
+}
+
 #endif /* __LINUX_SPINLOCK_API_SMP_H */
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -143,15 +143,6 @@ static inline void smp_mb__after_lock(vo
  */
 #define spin_unlock_wait(lock)	__raw_spin_unlock_wait(&(lock)->raw_lock)
 
-/*
- * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
- */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-# include <linux/spinlock_api_smp.h>
-#else
-# include <linux/spinlock_api_up.h>
-#endif
-
 #ifdef CONFIG_DEBUG_SPINLOCK
  extern void _raw_spin_lock(spinlock_t *lock);
 #define _raw_spin_lock_flags(lock, flags) _raw_spin_lock(lock)
@@ -380,4 +371,13 @@ extern int _atomic_dec_and_lock(atomic_t
  */
 #define spin_can_lock(lock)	(!spin_is_locked(lock))
 
+/*
+ * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
+ */
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+# include <linux/spinlock_api_smp.h>
+#else
+# include <linux/spinlock_api_up.h>
+#endif
+
 #endif /* __LINUX_SPINLOCK_H */
Index: linux-2.6/kernel/spinlock.c
===================================================================
--- linux-2.6.orig/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -23,40 +23,19 @@
 
 int __lockfunc _spin_trylock(spinlock_t *lock)
 {
-	preempt_disable();
-	if (_raw_spin_trylock(lock)) {
-		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-	
-	preempt_enable();
-	return 0;
+	return __spin_trylock(lock);
 }
 EXPORT_SYMBOL(_spin_trylock);
 
 int __lockfunc _read_trylock(rwlock_t *lock)
 {
-	preempt_disable();
-	if (_raw_read_trylock(lock)) {
-		rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-
-	preempt_enable();
-	return 0;
+	return __read_trylock(lock);
 }
 EXPORT_SYMBOL(_read_trylock);
 
 int __lockfunc _write_trylock(rwlock_t *lock)
 {
-	preempt_disable();
-	if (_raw_write_trylock(lock)) {
-		rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-
-	preempt_enable();
-	return 0;
+	return __write_trylock(lock);
 }
 EXPORT_SYMBOL(_write_trylock);
 
@@ -69,129 +48,74 @@ EXPORT_SYMBOL(_write_trylock);
 
 void __lockfunc _read_lock(rwlock_t *lock)
 {
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+	__read_lock(lock);
 }
 EXPORT_SYMBOL(_read_lock);
 
 unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	/*
-	 * On lockdep we dont want the hand-coded irq-enable of
-	 * _raw_spin_lock_flags() code, because lockdep assumes
-	 * that interrupts are not re-enabled during lock-acquire:
-	 */
-#ifdef CONFIG_LOCKDEP
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
-#else
-	_raw_spin_lock_flags(lock, &flags);
-#endif
-	return flags;
+	return __spin_lock_irqsave(lock);
 }
 EXPORT_SYMBOL(_spin_lock_irqsave);
 
 void __lockfunc _spin_lock_irq(spinlock_t *lock)
 {
-	local_irq_disable();
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+	__spin_lock_irq(lock);
 }
 EXPORT_SYMBOL(_spin_lock_irq);
 
 void __lockfunc _spin_lock_bh(spinlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+	__spin_lock_bh(lock);
 }
 EXPORT_SYMBOL(_spin_lock_bh);
 
 unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock,
-			     _raw_read_lock_flags, &flags);
-	return flags;
+	return __read_lock_irqsave(lock);
 }
 EXPORT_SYMBOL(_read_lock_irqsave);
 
 void __lockfunc _read_lock_irq(rwlock_t *lock)
 {
-	local_irq_disable();
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+	__read_lock_irq(lock);
 }
 EXPORT_SYMBOL(_read_lock_irq);
 
 void __lockfunc _read_lock_bh(rwlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+	__read_lock_bh(lock);
 }
 EXPORT_SYMBOL(_read_lock_bh);
 
 unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock,
-			     _raw_write_lock_flags, &flags);
-	return flags;
+	return __write_lock_irqsave(lock);
 }
 EXPORT_SYMBOL(_write_lock_irqsave);
 
 void __lockfunc _write_lock_irq(rwlock_t *lock)
 {
-	local_irq_disable();
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+	__write_lock_irq(lock);
 }
 EXPORT_SYMBOL(_write_lock_irq);
 
 void __lockfunc _write_lock_bh(rwlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+	__write_lock_bh(lock);
 }
 EXPORT_SYMBOL(_write_lock_bh);
 
 void __lockfunc _spin_lock(spinlock_t *lock)
 {
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+	__spin_lock(lock);
 }
-
 EXPORT_SYMBOL(_spin_lock);
 
 void __lockfunc _write_lock(rwlock_t *lock)
 {
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+	__write_lock(lock);
 }
-
 EXPORT_SYMBOL(_write_lock);
 
 #else /* CONFIG_PREEMPT: */
@@ -320,121 +244,79 @@ EXPORT_SYMBOL(_spin_lock_nest_lock);
 
 void __lockfunc _spin_unlock(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	preempt_enable();
+	__spin_unlock(lock);
 }
 EXPORT_SYMBOL(_spin_unlock);
 
 void __lockfunc _write_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	preempt_enable();
+	__write_unlock(lock);
 }
 EXPORT_SYMBOL(_write_unlock);
 
 void __lockfunc _read_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	preempt_enable();
+	__read_unlock(lock);
 }
 EXPORT_SYMBOL(_read_unlock);
 
 void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	local_irq_restore(flags);
-	preempt_enable();
+	__spin_unlock_irqrestore(lock, flags);
 }
 EXPORT_SYMBOL(_spin_unlock_irqrestore);
 
 void __lockfunc _spin_unlock_irq(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	local_irq_enable();
-	preempt_enable();
+	__spin_unlock_irq(lock);
 }
 EXPORT_SYMBOL(_spin_unlock_irq);
 
 void __lockfunc _spin_unlock_bh(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	__spin_unlock_bh(lock);
 }
 EXPORT_SYMBOL(_spin_unlock_bh);
 
 void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	local_irq_restore(flags);
-	preempt_enable();
+	__read_unlock_irqrestore(lock, flags);
 }
 EXPORT_SYMBOL(_read_unlock_irqrestore);
 
 void __lockfunc _read_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	local_irq_enable();
-	preempt_enable();
+	__read_unlock_irq(lock);
 }
 EXPORT_SYMBOL(_read_unlock_irq);
 
 void __lockfunc _read_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	__read_unlock_bh(lock);
 }
 EXPORT_SYMBOL(_read_unlock_bh);
 
 void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	local_irq_restore(flags);
-	preempt_enable();
+	__write_unlock_irqrestore(lock, flags);
 }
 EXPORT_SYMBOL(_write_unlock_irqrestore);
 
 void __lockfunc _write_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	local_irq_enable();
-	preempt_enable();
+	__write_unlock_irq(lock);
 }
 EXPORT_SYMBOL(_write_unlock_irq);
 
 void __lockfunc _write_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	__write_unlock_bh(lock);
 }
 EXPORT_SYMBOL(_write_unlock_bh);
 
 int __lockfunc _spin_trylock_bh(spinlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	if (_raw_spin_trylock(lock)) {
-		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
-	return 0;
+	return __spin_trylock_bh(lock);
 }
 EXPORT_SYMBOL(_spin_trylock_bh);
 

-- 

* [patch 2/3] spinlock: allow inlined spinlocks
  2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
  2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
@ 2009-08-12 18:39 ` Heiko Carstens
  2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
  2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
  3 siblings, 0 replies; 22+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt

[-- Attachment #1: 02_spinlock_inline.diff --]
[-- Type: text/plain, Size: 4678 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Add a new config option, SPINLOCK_INLINE, plus some defines which depend
on it, in order to generate inlined spinlock code instead of out-of-line
code.
Avoiding function calls for spinlocks reduces cpu usage by 1%-5% in
network benchmarks on s390.

Architectures must select HAVE_SPINLOCK_INLINE_SUPPORT to enable this
config option.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 include/linux/spinlock_api_smp.h |   35 +++++++++++++++++++++++++++++++++++
 kernel/spinlock.c                |    4 ++++
 lib/Kconfig.debug                |   14 ++++++++++++++
 3 files changed, 53 insertions(+)

Index: linux-2.6/include/linux/spinlock_api_smp.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -19,6 +19,8 @@ int in_lock_functions(unsigned long addr
 
 #define assert_spin_locked(x)	BUG_ON(!spin_is_locked(x))
 
+#ifndef CONFIG_SPINLOCK_INLINE
+
 void __lockfunc _spin_lock(spinlock_t *lock)		__acquires(lock);
 void __lockfunc _spin_lock_nested(spinlock_t *lock, int subclass)
 							__acquires(lock);
@@ -60,6 +62,39 @@ void __lockfunc _read_unlock_irqrestore(
 void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 							__releases(lock);
 
+#else /* CONFIG_SPINLOCK_INLINE */
+
+#define _spin_trylock(lock)	  __spin_trylock(lock)
+#define _read_trylock(lock)	  __read_trylock(lock)
+#define _write_trylock(lock)	  __write_trylock(lock)
+#define _read_lock(lock)	  __read_lock(lock)
+#define _spin_lock_irqsave(lock)  __spin_lock_irqsave(lock)
+#define _spin_lock_irq(lock)	  __spin_lock_irq(lock)
+#define _spin_lock_bh(lock)	  __spin_lock_bh(lock)
+#define _read_lock_irqsave(lock)  __read_lock_irqsave(lock)
+#define _read_lock_irq(lock)	  __read_lock_irq(lock)
+#define _read_lock_bh(lock)	  __read_lock_bh(lock)
+#define _write_lock_irqsave(lock) __write_lock_irqsave(lock)
+#define _write_lock_irq(lock)	  __write_lock_irq(lock)
+#define _write_lock_bh(lock)	  __write_lock_bh(lock)
+#define _spin_lock(lock)	  __spin_lock(lock)
+#define _write_lock(lock)	  __write_lock(lock)
+#define _spin_unlock(lock)	  __spin_unlock(lock)
+#define _write_unlock(lock)	  __write_unlock(lock)
+#define _read_unlock(lock)	  __read_unlock(lock)
+#define _spin_unlock_irq(lock)	  __spin_unlock_irq(lock)
+#define _spin_unlock_bh(lock)	  __spin_unlock_bh(lock)
+#define _read_unlock_irq(lock)	  __read_unlock_irq(lock)
+#define _read_unlock_bh(lock)	  __read_unlock_bh(lock)
+#define _write_unlock_irq(lock)	  __write_unlock_irq(lock)
+#define _write_unlock_bh(lock)	  __write_unlock_bh(lock)
+#define _spin_trylock_bh(lock)	  __spin_trylock_bh(lock)
+#define _spin_unlock_irqrestore(lock, flags)  __spin_unlock_irqrestore(lock, flags)
+#define _read_unlock_irqrestore(lock, flags)  __read_unlock_irqrestore(lock, flags)
+#define _write_unlock_irqrestore(lock, flags) __write_unlock_irqrestore(lock, flags)
+
+#endif /* CONFIG_SPINLOCK_INLINE */
+
 static inline int __spin_trylock(spinlock_t *lock)
 {
 	preempt_disable();
Index: linux-2.6/lib/Kconfig.debug
===================================================================
--- linux-2.6.orig/lib/Kconfig.debug
+++ linux-2.6/lib/Kconfig.debug
@@ -879,6 +879,20 @@ config SYSCTL_SYSCALL_CHECK
 	  to properly maintain and use. This enables checks that help
 	  you to keep things correct.
 
+config HAVE_SPINLOCK_INLINE_SUPPORT
+	bool
+
+config SPINLOCK_INLINE
+	bool "Inline spinlock code"
+	depends on HAVE_SPINLOCK_INLINE_SUPPORT
+	depends on !DEBUG_SPINLOCK
+	depends on SMP && !PREEMPT
+	help
+	  Select this option if you want to have inlined spinlock code instead
+	  of an out-of-line implementation.
+	  This will generate a larger kernel image. On some architectures this
+	  increases performance.
+
 source mm/Kconfig.debug
 source kernel/trace/Kconfig
 
Index: linux-2.6/kernel/spinlock.c
===================================================================
--- linux-2.6.orig/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -21,6 +21,8 @@
 #include <linux/debug_locks.h>
 #include <linux/module.h>
 
+#ifndef CONFIG_SPINLOCK_INLINE
+
 int __lockfunc _spin_trylock(spinlock_t *lock)
 {
 	return __spin_trylock(lock);
@@ -320,6 +322,8 @@ int __lockfunc _spin_trylock_bh(spinlock
 }
 EXPORT_SYMBOL(_spin_trylock_bh);
 
+#endif /* CONFIG_SPINLOCK_INLINE */
+
 notrace int in_lock_functions(unsigned long addr)
 {
 	/* Linker adds these: start and end of __lockfunc functions */

-- 

* [patch 3/3] spinlock: allow inlined spinlocks on s390
  2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
  2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
  2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
@ 2009-08-12 18:39 ` Heiko Carstens
  2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
  3 siblings, 0 replies; 22+ messages in thread
From: Heiko Carstens @ 2009-08-12 18:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt

[-- Attachment #1: 03_spinlock_s390.diff --]
[-- Type: text/plain, Size: 680 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 arch/s390/Kconfig |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig
+++ linux-2.6/arch/s390/Kconfig
@@ -93,6 +93,7 @@ config S390
 	select HAVE_KRETPROBES
 	select HAVE_KVM if 64BIT
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_SPINLOCK_INLINE_SUPPORT
 	select INIT_ALL_POSSIBLE
 	select HAVE_PERF_COUNTERS
 	select GENERIC_ATOMIC64 if !64BIT

-- 

* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
                   ` (2 preceding siblings ...)
  2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
@ 2009-08-13 18:11 ` Linus Torvalds
  2009-08-13 18:34   ` Ingo Molnar
  2009-08-14 12:34   ` Heiko Carstens
  3 siblings, 2 replies; 22+ messages in thread
From: Linus Torvalds @ 2009-08-13 18:11 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt



On Wed, 12 Aug 2009, Heiko Carstens wrote:
> 
> This patch set allows to have inlined spinlocks again.

I'm not convinced this is the right approach.

The thing is, CONFIG_SPINLOCK_INLINE doesn't look like the right thing to 
do. On x86, for example, function calls are usually quite cheap, and it 
may well make sense to keep 'spin_lock()' out-of-line, especially once you 
start doing things like playing with the softirq counts or interrupts.

At the same time, the special case of spin *unlock* is generally just a 
single (small) instruction on x86, so that one almost certainly is worth 
inlining - at least as long as it's not one of the softirq ones.

In other words, I think your hammer is too large. If you want to inline 
the lock functions, you should _not_ make this a single "everything or 
nothing" thing.

> If one considers that server kernels are usually compiled with
> !CONFIG_PREEMPT a simple spin_lock is just a compare and swap loop.
> The extra overhead for a function call is significant.
> With inlined spinlocks overall cpu usage gets reduced by 1%-5% on s390.
> These numbers were taken with some network benchmarks. However I expect
> any workload that calls frequently into the kernel and which grabs a few
> locks to perform better.

Have you tried to determine which kinds of spinlocks you care most about? 
With networking, I guess the softirq-safe ones (and perhaps the irq ones) 
are pretty common, but maybe even there it's really just a couple of 
cases. 

Now, there is one advantage to inlining that I really like - and that I 
personally have missed from the days when we used to always do it: 
profiling. Inlining the spinlocks tends to do _wonders_ for showing 
exactly which spinlock ends up being the contention cases for certain 
loads.

So there are advantages outside of the "size vs speed" kind of things, but 
at the same time I do suspect that the disadvantages mean that we really 
should make this be something where an architecture can specify things at 
a finer granularity. Perhaps by using per-symbol defines (ie have the 
<asm/spinlock.h> file do things like

	#define __spin_lock_is_small
	#define __spin_unlock_is_small

kind of things for the particular sequences that are worth inlining.
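
A compilable sketch of that per-symbol scheme (the two `_is_small`
defines echo the names from the mail; everything else here is
hypothetical, with GCC __sync builtins standing in for the arch lock
primitives):

```c
#include <assert.h>

/* "arch header": this architecture marks only unlock as small. */
#define __toy_spin_unlock_is_small

/* "generic header": canonical bodies as static inlines. */
static inline void __toy_spin_lock(int *lock)
{
	while (__sync_lock_test_and_set(lock, 1))
		;	/* spin until the old value was 0 */
}

static inline void __toy_spin_unlock(int *lock)
{
	__sync_lock_release(lock);	/* store 0 with release semantics */
}

/* Per symbol, pick inlined or out-of-line: */
#ifdef __toy_spin_lock_is_small
#define toy_spin_lock(lock)	__toy_spin_lock(lock)
#else
static void toy_spin_lock_outofline(int *lock)	/* would live in spinlock.c */
{
	__toy_spin_lock(lock);
}
#define toy_spin_lock(lock)	toy_spin_lock_outofline(lock)
#endif

#ifdef __toy_spin_unlock_is_small
#define toy_spin_unlock(lock)	__toy_spin_unlock(lock)
#else
static void toy_spin_unlock_outofline(int *lock)
{
	__toy_spin_unlock(lock);
}
#define toy_spin_unlock(lock)	toy_spin_unlock_outofline(lock)
#endif
```

With this arrangement the architecture gets unlock inlined (a single
store here) while lock stays a function call, without any
all-or-nothing switch.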

		Linus

* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
@ 2009-08-13 18:34   ` Ingo Molnar
  2009-08-13 18:43     ` Ingo Molnar
  2009-08-14 12:34   ` Heiko Carstens
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2009-08-13 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Now, there is one advantage to inlining that I really like - and 
> that I personally have missed from the days when we used to 
> always do it: profiling. Inlining the spinlocks tends to do 
> _wonders_ for showing exactly which spinlock ends up being the 
> contention cases for certain loads. [...]

We've got a new tool to determine that kind of information though: 
perf + call-graph profiling.

This is how it works and how it can be used. Say we want to profile 
'bw_pipe' from lmbench. The traditional way is to record a flat 
profile:

 aldebaran:~/l> perf record -f ./bw_pipe
 Pipe bandwidth: 4607.33 MB/sec
 [ perf record: Captured and wrote 2.503 MB perf.data (~109368 samples) ]

and then to view the flat profile:

 aldebaran:~/l> perf report
 # Samples: 109331
 #
 # Overhead  Command             Shared Object  Symbol
 # ........  .......  ........................  ......
 #
    39.61%  bw_pipe  [kernel]                  [k] copy_user_generic_string
    19.32%  bw_pipe  ./bw_pipe                 [.] enough_duration
     6.21%  bw_pipe  [kernel]                  [k] mutex_spin_on_owner
     3.54%  bw_pipe  ./bw_pipe                 [.] two_op
     3.54%  bw_pipe  ./bw_pipe                 [.] one_op
     2.79%  bw_pipe  [kernel]                  [k] get_page_from_freelist
     2.16%  bw_pipe  [kernel]                  [k] ia32_sysenter_target
     1.91%  bw_pipe  [kernel]                  [k] pipe_read
     1.83%  bw_pipe  [kernel]                  [k] pipe_write
     1.63%  bw_pipe  [kernel]                  [k] free_hot_cold_page
     1.21%  bw_pipe  [kernel]                  [k] put_page
     1.19%  bw_pipe  [kernel]                  [k] __alloc_pages_nodemask
     0.99%  bw_pipe  [kernel]                  [k] anon_pipe_buf_release
     0.74%  bw_pipe  [kernel]                  [k] get_pageblock_flags_group
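As an aside, the flat listing has a fixed "overhead / command / object /
symbol" column layout, so it is easy to post-process. A hypothetical
sketch (plain Python, not a perf feature) that pulls out just the
kernel-space symbols from an excerpt of the output above:

```python
# Hypothetical post-processing sketch: extract (overhead, symbol) pairs
# for kernel-space rows ("[k]") from a flat `perf report` listing.
# Input here is a short excerpt of the report above, pasted as a string.
report = """\
   39.61%  bw_pipe  [kernel]                  [k] copy_user_generic_string
   19.32%  bw_pipe  ./bw_pipe                 [.] enough_duration
    6.21%  bw_pipe  [kernel]                  [k] mutex_spin_on_owner
"""

kernel_syms = []
for line in report.splitlines():
    fields = line.split()
    # kernel rows have exactly 5 columns and "[k]" in the 4th one
    if len(fields) == 5 and fields[3] == "[k]":
        kernel_syms.append((float(fields[0].rstrip("%")), fields[4]))

print(kernel_syms[0])  # (39.61, 'copy_user_generic_string')
```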

Annoyingly enough, mutex_spin_on_owner() is in the profile but we 
have no idea what the parent function(s) are. (Plus there's no way 
we could reasonably inline mutex functions.) So we are stuck with 
guessing which of the many other functions above (or some other 
function which does not even get listed!) has that mutex spinning 
overhead.

But if you do the recording via call-graph traces:

 aldebaran:~/l> perf record -g -f ./bw_pipe
 Pipe bandwidth: 4432.75 MB/sec
 [ perf record: Captured and wrote 7.550 MB perf.data (~329886 samples) ]

(Note the -g/--graph option.) Viewing that profile via perf 
report, we get:

 aldebaran:~/l> perf report -S _spin_lock

 # symbol: _spin_lock
 # Samples: 724
 #
 # Overhead  Command             Shared Object
 # ........  .......  ........................
 #
    98.34%  lat_tcp  [kernel]                
                |          
                |--31.74%-- ipt_do_table
                |          |          
                |          |--26.99%-- ipt_local_in_hook
                |          |          nf_iterate
                |          |          nf_hook_slow
                |          |          ip_local_deliver
                |          |          ip_rcv_finish
                |          |          ip_rcv
                |          |          netif_receive_skb
                |          |          process_backlog
                |          |          net_rx_action
                |          |          __do_softirq
                |          |          call_softirq
                |          |          do_softirq
                |          |          _local_bh_enable_ip
                |          |          local_bh_enable
                |          |          dev_queue_xmit
                |          |          ip_finish_output2
                |          |          ip_finish_output
                |          |          ip_output
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_write_xmit
                |          |          __tcp_push_pending_frames
                |          |          tcp_push
                |          |          tcp_sendmsg
                |          |          __sock_sendmsg
                |          |          sock_aio_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |          |--24.34%-- ipt_local_in_hook
                |          |          nf_iterate
                |          |          nf_hook_slow
                |          |          ip_local_deliver
                |          |          ip_rcv_finish
                |          |          ip_rcv
                |          |          netif_receive_skb
                |          |          process_backlog
                |          |          net_rx_action
                |          |          __do_softirq
                |          |          call_softirq
                |          |          do_softirq
                |          |          _local_bh_enable_ip
                |          |          local_bh_enable
                |          |          dev_queue_xmit
                |          |          ip_finish_output2
                |          |          ip_finish_output
                |          |          ip_output
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_write_xmit
                |          |          __tcp_push_pending_frames
                |          |          tcp_push
                |          |          tcp_sendmsg
                |          |          __sock_sendmsg
                |          |          sock_aio_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |          |--16.81%-- ipt_pre_routing_hook
                |          |          nf_iterate
                |          |          nf_hook_slow
                |          |          ip_rcv
                |          |          netif_receive_skb
                |          |          process_backlog
                |          |          net_rx_action
                |          |          __do_softirq
                |          |          call_softirq
                |          |          do_softirq
                |          |          _local_bh_enable_ip
                |          |          local_bh_enable
                |          |          dev_queue_xmit
                |          |          ip_finish_output2
                |          |          ip_finish_output
                |          |          ip_output
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_write_xmit
                |          |          __tcp_push_pending_frames
                |          |          tcp_push
                |          |          tcp_sendmsg
                |          |          __sock_sendmsg
                |          |          sock_aio_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |          |--11.95%-- ipt_post_routing_hook
                |          |          nf_iterate
                |          |          nf_hook_slow
                |          |          ip_output
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_write_xmit
                |          |          __tcp_push_pending_frames
                |          |          tcp_push
                |          |          tcp_sendmsg
                |          |          __sock_sendmsg
                |          |          sock_aio_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |          |--10.18%-- ipt_local_out_hook
                |          |          nf_iterate
                |          |          nf_hook_slow
                |          |          __ip_local_out
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_write_xmit
                |          |          __tcp_push_pending_frames
                |          |          tcp_push
                |          |          tcp_sendmsg
                |          |          __sock_sendmsg
                |          |          sock_aio_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |           --9.73%-- ipt_local_hook
                |                     nf_iterate
                |                     nf_hook_slow
                |                     __ip_local_out
                |                     ip_local_out
                |                     ip_queue_xmit
                |                     tcp_transmit_skb
                |                     tcp_write_xmit
                |                     __tcp_push_pending_frames
                |                     tcp_push
                |                     tcp_sendmsg
                |                     __sock_sendmsg
                |                     sock_aio_write
                |                     do_sync_write
                |                     vfs_write
                |                     sys_write
                |                     sysenter_dispatch
                |                     0xf7f50430
                |                     0x5
                |          
                |--31.46%-- __mod_timer
                |          mod_timer
                |          sk_reset_timer
                |          |          
                |          |--55.36%-- tcp_send_delayed_ack
                |          |          __tcp_ack_snd_check
                |          |          tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_aio_read
                |          |          do_sync_read
                |          |          vfs_read
                |          |          sys_read
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |           --44.64%-- tcp_v4_rcv
                |                     ip_local_deliver_finish
                |                     ip_local_deliver
                |                     ip_rcv_finish
                |                     ip_rcv
                |                     netif_receive_skb
                |                     process_backlog
                |                     net_rx_action
                |                     __do_softirq
                |                     call_softirq
                |                     do_softirq
                |                     _local_bh_enable_ip
                |                     local_bh_enable
                |                     dev_queue_xmit
                |                     ip_finish_output2
                |                     ip_finish_output
                |                     ip_output
                |                     ip_local_out
                |                     ip_queue_xmit
                |                     tcp_transmit_skb
                |                     tcp_write_xmit
                |                     __tcp_push_pending_frames
                |                     tcp_push
                |                     tcp_sendmsg
                |                     __sock_sendmsg
                |                     sock_aio_write
                |                     do_sync_write
                |                     vfs_write
                |                     sys_write
                |                     sysenter_dispatch
                |                     0xf7f50430
                |                     0x5
                |          
                |--21.21%-- task_rq_lock
                |          try_to_wake_up
                |          default_wake_function
                |          |          
                |          |--99.34%-- autoremove_wake_function
                |          |          __wake_up_common
                |          |          __wake_up
                |          |          tcp_v4_rcv
                |          |          ip_local_deliver_finish
                |          |          ip_local_deliver
                |          |          ip_rcv_finish
                |          |          ip_rcv
                |          |          netif_receive_skb
                |          |          process_backlog
                |          |          net_rx_action
                |          |          __do_softirq
                |          |          call_softirq
                |          |          do_softirq
                |          |          _local_bh_enable_ip
                |          |          local_bh_enable
                |          |          dev_queue_xmit
                |          |          ip_finish_output2
                |          |          ip_finish_output
                |          |          ip_output
                |          |          ip_local_out
                |          |          ip_queue_xmit
                |          |          tcp_transmit_skb
                |          |          tcp_write_xmit
                |          |          __tcp_push_pending_frames
                |          |          tcp_push
                |          |          tcp_sendmsg
                |          |          __sock_sendmsg
                |          |          sock_aio_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |           --0.66%-- pollwake
                |                     __wake_up_common
                |                     __wake_up
                |                     perf_counter_wakeup
                |                     perf_pending_counter
                |                     perf_counter_do_pending
                |                     smp_perf_pending_interrupt
                |                     perf_pending_interrupt
                |                     0xf7f50430
                |                     0x5
                |          
                |--5.06%-- tcp_v4_rcv
                |          ip_local_deliver_finish
                |          ip_local_deliver
                |          ip_rcv_finish
                |          ip_rcv
                |          netif_receive_skb
                |          process_backlog
                |          net_rx_action
                |          __do_softirq
                |          call_softirq
                |          do_softirq
                |          _local_bh_enable_ip
                |          local_bh_enable
                |          dev_queue_xmit
                |          ip_finish_output2
                |          ip_finish_output
                |          ip_output
                |          ip_local_out
                |          ip_queue_xmit
                |          tcp_transmit_skb
                |          tcp_write_xmit
                |          __tcp_push_pending_frames
                |          tcp_push
                |          tcp_sendmsg
                |          __sock_sendmsg
                |          sock_aio_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          sysenter_dispatch
                |          0xf7f50430
                |          0x5
                |          
                |--4.07%-- __perf_counter_sched_out
                |          perf_counter_task_sched_out
                |          schedule
                |          schedule_timeout
                |          sk_wait_data
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_aio_read
                |          do_sync_read
                |          vfs_read
                |          sys_read
                |          sysenter_dispatch
                |          0xf7f50430
                |          0x5
                |          
                |--1.40%-- ipt_local_hook
                |          nf_iterate
                |          nf_hook_slow
                |          __ip_local_out
                |          ip_local_out
                |          ip_queue_xmit
                |          tcp_transmit_skb
                |          tcp_write_xmit
                |          __tcp_push_pending_frames
                |          tcp_push
                |          tcp_sendmsg
                |          __sock_sendmsg
                |          sock_aio_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          sysenter_dispatch
                |          0xf7f50430
                |          0x5
                |          
                |--1.26%-- mod_timer
                |          sk_reset_timer
                |          |          
                |          |--66.67%-- tcp_send_delayed_ack
                |          |          __tcp_ack_snd_check
                |          |          tcp_rcv_established
                |          |          tcp_v4_do_rcv
                |          |          tcp_prequeue_process
                |          |          tcp_recvmsg
                |          |          sock_common_recvmsg
                |          |          __sock_recvmsg
                |          |          sock_aio_read
                |          |          do_sync_read
                |          |          vfs_read
                |          |          sys_read
                |          |          sysenter_dispatch
                |          |          0xf7f50430
                |          |          0x5
                |          |          
                |           --33.33%-- tcp_v4_rcv
                |                     ip_local_deliver_finish
                |                     ip_local_deliver
                |                     ip_rcv_finish
                |                     ip_rcv
                |                     netif_receive_skb
                |                     process_backlog
                |                     net_rx_action
                |                     __do_softirq
                |                     call_softirq
                |                     do_softirq
                |                     _local_bh_enable_ip
                |                     local_bh_enable
                |                     dev_queue_xmit
                |                     ip_finish_output2
                |                     ip_finish_output
                |                     ip_output
                |                     ip_local_out
                |                     ip_queue_xmit
                |                     tcp_transmit_skb
                |                     tcp_write_xmit
                |                     __tcp_push_pending_frames
                |                     tcp_push
                |                     tcp_sendmsg
                |                     __sock_sendmsg
                |                     sock_aio_write
                |                     do_sync_write
                |                     vfs_write
                |                     sys_write
                |                     sysenter_dispatch
                |                     0xf7f50430
                |                     0x5
                |          
                |--0.70%-- ipt_local_in_hook
                |          nf_iterate
                |          nf_hook_slow
                |          ip_local_deliver
                |          ip_rcv_finish
                |          ip_rcv
                |          netif_receive_skb
                |          process_backlog
                |          net_rx_action
                |          __do_softirq
                |          call_softirq
                |          do_softirq
                |          _local_bh_enable_ip
                |          local_bh_enable
                |          dev_queue_xmit
                |          ip_finish_output2
                |          ip_finish_output
                |          ip_output
                |          ip_local_out
                |          ip_queue_xmit
                |          tcp_transmit_skb
                |          tcp_write_xmit
                |          __tcp_push_pending_frames
                |          tcp_push
                |          tcp_sendmsg
                |          __sock_sendmsg
                |          sock_aio_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          sysenter_dispatch
                |          0xf7f50430
                |          0x5
                |          
                |--0.56%-- perf_counter_task_sched_out
                |          schedule
                |          schedule_timeout
                |          sk_wait_data
                |          tcp_recvmsg
                |          sock_common_recvmsg
                |          __sock_recvmsg
                |          sock_aio_read
                |          do_sync_read
                |          vfs_read
                |          sys_read
                |          sysenter_dispatch
                |          0xf7f50430
                |          0x5
                 --2.53%-- [...]

     1.66%     perf  [kernel]                
                |          
                |--25.00%-- journal_stop
                |          __ext3_journal_stop
                |          ext3_writeback_write_end
                |          generic_perform_write
                |          generic_file_buffered_write
                |          __generic_file_aio_write_nolock
                |          generic_file_aio_write
                |          ext3_file_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          system_call_fastpath
                |          __write_nocancel
                |          
                |--16.67%-- __perf_counter_sched_out
                |          perf_counter_task_sched_out
                |          schedule
                |          schedule_hrtimeout_range
                |          poll_schedule_timeout
                |          do_poll
                |          do_sys_poll
                |          sys_poll
                |          system_call_fastpath
                |          __GI___poll
                |          
                |--8.33%-- dup_mmap
                |          dup_mm
                |          copy_process
                |          do_fork
                |          sys_clone
                |          stub_clone
                |          fork
                |          __cmd_record
                |          
                |--8.33%-- hrtimer_interrupt
                |          smp_apic_timer_interrupt
                |          apic_timer_interrupt
                |          __write_nocancel
                |          
                |--8.33%-- journal_start
                |          ext3_journal_start_sb
                |          ext3_write_begin
                |          generic_perform_write
                |          generic_file_buffered_write
                |          __generic_file_aio_write_nolock
                |          generic_file_aio_write
                |          ext3_file_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          system_call_fastpath
                |          __write_nocancel
                |          
                |--8.33%-- create_empty_buffers
                |          __block_prepare_write
                |          block_write_begin
                |          ext3_write_begin
                |          generic_perform_write
                |          generic_file_buffered_write
                |          __generic_file_aio_write_nolock
                |          generic_file_aio_write
                |          ext3_file_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          system_call_fastpath
                |          __write_nocancel
                |          
                |--8.33%-- perf_counter_task_sched_out
                |          schedule
                |          schedule_hrtimeout_range
                |          poll_schedule_timeout
                |          do_poll
                |          do_sys_poll
                |          sys_poll
                |          system_call_fastpath
                |          __GI___poll
                |          
                |--8.33%-- ext3_alloc_branch
                |          ext3_get_blocks_handle
                |          ext3_get_block
                |          __block_prepare_write
                |          block_write_begin
                |          ext3_write_begin
                |          generic_perform_write
                |          generic_file_buffered_write
                |          __generic_file_aio_write_nolock
                |          generic_file_aio_write
                |          ext3_file_write
                |          do_sync_write
                |          vfs_write
                |          sys_write
                |          system_call_fastpath
                |          __write_nocancel
                |          
                 --8.33%-- start_this_handle
                           journal_start
                           ext3_journal_start_sb
                           ext3_write_begin
                           generic_perform_write
                           generic_file_buffered_write
                           __generic_file_aio_write_nolock
                           generic_file_aio_write
                           ext3_file_write
                           do_sync_write
                           vfs_write
                           sys_write
                           system_call_fastpath
                           __write_nocancel

The graph above (a sub-graph of the full call graph, limited 
via the -S/--symbols option to all callchains that intersect 
the _spin_lock symbol) details all the various code-paths that 
make use of _spin_lock(), and the absolute and relative weights 
of those codepaths.

Note that this works for just about any locking primitive, and note 
how it's recursive. If we inlined spin_lock we'd only know this 
__mod_timer overhead:

                |--31.46%-- __mod_timer

but we wouldn't know the second level (which is interesting too):

                |          |--55.36%-- tcp_send_delayed_ack
                |           --44.64%-- tcp_v4_rcv
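The relative weights in such a tree multiply down the chain, so the
second level can be converted into absolute shares of all _spin_lock
samples. A small back-of-the-envelope check in plain Python (numbers
copied from the report above; not part of perf):

```python
# Relative weights in a perf call-graph report multiply down the chain.
# Figures below are taken from the _spin_lock report above.
lat_tcp_kernel = 0.9834   # lat_tcp [kernel] share of _spin_lock samples
mod_timer      = 0.3146   # ...of which reached via __mod_timer
delayed_ack    = 0.5536   # ...of which via tcp_send_delayed_ack
tcp_v4_rcv     = 0.4464   # ...or via tcp_v4_rcv

# Absolute share of all _spin_lock samples through each second-level caller:
via_delayed_ack = lat_tcp_kernel * mod_timer * delayed_ack
via_tcp_v4_rcv  = lat_tcp_kernel * mod_timer * tcp_v4_rcv

print(f"{via_delayed_ack:.1%}")  # 17.1%
print(f"{via_tcp_v4_rcv:.1%}")   # 13.8%
```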

All in all, I think we should detach the issue of instrumentation 
and profiling from the issue of whether to inline or not. We should 
inline where it makes things faster. Our profiling tools are able 
to deal with it, independently of whether we inline or not.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-13 18:34   ` Ingo Molnar
@ 2009-08-13 18:43     ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2009-08-13 18:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Andrew Morton, Peter Zijlstra, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt


* Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Now, there is one advantage to inlining that I really like - and 
> > that I personally have missed from the days when we used to 
> > always do it: profiling. Inlining the spinlocks tends to do 
> > _wonders_ for showing exactly which spinlock ends up being the 
> > contention cases for certain loads. [...]
> 
> We've got a new tool to determine that kind of information though: 
> perf + call-graph profiling.
> 
> This is how it works and how it can be used. Say we want to profile 
> 'bw_pipe' from lmbench. The traditional way is to record a flat 
> profile:
> 
>  aldebaran:~/l> perf record -f ./bw_pipe
>  Pipe bandwidth: 4607.33 MB/sec
>  [ perf record: Captured and wrote 2.503 MB perf.data (~109368 samples) ]
> 
> and then to view the flat profile:
> 
>  aldebaran:~/l> perf report
>  # Samples: 109331
>  #
>  # Overhead  Command             Shared Object  Symbol
>  # ........  .......  ........................  ......
>  #
>     39.61%  bw_pipe  [kernel]                  [k] copy_user_generic_string
>     19.32%  bw_pipe  ./bw_pipe                 [.] enough_duration
>      6.21%  bw_pipe  [kernel]                  [k] mutex_spin_on_owner
>      3.54%  bw_pipe  ./bw_pipe                 [.] two_op
>      3.54%  bw_pipe  ./bw_pipe                 [.] one_op
>      2.79%  bw_pipe  [kernel]                  [k] get_page_from_freelist
>      2.16%  bw_pipe  [kernel]                  [k] ia32_sysenter_target
>      1.91%  bw_pipe  [kernel]                  [k] pipe_read
>      1.83%  bw_pipe  [kernel]                  [k] pipe_write
>      1.63%  bw_pipe  [kernel]                  [k] free_hot_cold_page
>      1.21%  bw_pipe  [kernel]                  [k] put_page
>      1.19%  bw_pipe  [kernel]                  [k] __alloc_pages_nodemask
>      0.99%  bw_pipe  [kernel]                  [k] anon_pipe_buf_release
>      0.74%  bw_pipe  [kernel]                  [k] get_pageblock_flags_group
> 
> Annoyingly enough, mutex_spin_on_owner() is in the profile but we 
> have no idea what the parent function(s) are. (Plus there's no way 
> we could reasonably inline mutex functions.) So we are stuck with 
> guessing which of the many other functions above (or some other 
> function which does not even get listed!) has that mutex spinning 
> overhead.
> 
> But if you do the recording via call-graph traces:
> 
>  aldebaran:~/l> perf record -g -f ./bw_pipe
>  Pipe bandwidth: 4432.75 MB/sec
>  [ perf record: Captured and wrote 7.550 MB perf.data (~329886 samples) ]
> 
> (Note the -g/--graph option.) Viewing that profile via perf 
> report, we get:

Sorry - the mail got trimmed due to a cut&paste mishap. Here's the 
missing bw_pipe profile:

 # Samples: 81007
 #
 # Overhead  Command             Shared Object  Symbol
 # ........  .......  ........................  ......
 #

     7.60%  bw_pipe  [kernel]                  [k] mutex_spin_on_owner
                |          
                |--99.97%-- __mutex_lock_common
                |          __mutex_lock_slowpath
                |          mutex_lock
                |          |          
                |          |--98.46%-- pipe_write
                |          |          do_sync_write
                |          |          vfs_write
                |          |          sys_write
                |          |          sysenter_dispatch
                |          |          0xf7f4f430
                |          |          0x600000007
                |          |          
                |           --1.54%-- pipe_lock
                |                     pipe_wait
                |                     pipe_write
                |                     do_sync_write
                |                     vfs_write
                |                     sys_write
                |                     sysenter_dispatch
                |                     0xf7f4f430
                |                     0x600000007
                 --0.03%-- [...]

It shows that the mutex overhead is distributed between these two 
functions:

                |          |--98.46%-- pipe_write
                |           --1.54%-- pipe_lock


I also wanted to show a more complex example: how spin-lock 
overhead gets distributed across the networking stack in a
"lat_tcp localhost" workload:

> 
>  aldebaran:~/l> perf report -S _spin_lock
> 
>  # symbol: _spin_lock
>  # Samples: 724
>  #
>  # Overhead  Command             Shared Object
>  # ........  .......  ........................
>  #
>     98.34%  lat_tcp  [kernel]                
>                 |          
>                 |--31.74%-- ipt_do_table
>                 |          |          
>                 |          |--26.99%-- ipt_local_in_hook
>                 |          |          nf_iterate
>                 |          |          nf_hook_slow
>                 |          |          ip_local_deliver
>                 |          |          ip_rcv_finish
>                 |          |          ip_rcv
>                 |          |          netif_receive_skb
>                 |          |          process_backlog
>                 |          |          net_rx_action
>                 |          |          __do_softirq
>                 |          |          call_softirq
>                 |          |          do_softirq
>                 |          |          _local_bh_enable_ip
>                 |          |          local_bh_enable
>                 |          |          dev_queue_xmit
>                 |          |          ip_finish_output2
>                 |          |          ip_finish_output
>                 |          |          ip_output
>                 |          |          ip_local_out
>                 |          |          ip_queue_xmit
>                 |          |          tcp_transmit_skb
>                 |          |          tcp_write_xmit
>                 |          |          __tcp_push_pending_frames
>                 |          |          tcp_push
>                 |          |          tcp_sendmsg
>                 |          |          __sock_sendmsg
>                 |          |          sock_aio_write
>                 |          |          do_sync_write
>                 |          |          vfs_write
>                 |          |          sys_write
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |          |--24.34%-- ipt_local_in_hook
>                 |          |          nf_iterate
>                 |          |          nf_hook_slow
>                 |          |          ip_local_deliver
>                 |          |          ip_rcv_finish
>                 |          |          ip_rcv
>                 |          |          netif_receive_skb
>                 |          |          process_backlog
>                 |          |          net_rx_action
>                 |          |          __do_softirq
>                 |          |          call_softirq
>                 |          |          do_softirq
>                 |          |          _local_bh_enable_ip
>                 |          |          local_bh_enable
>                 |          |          dev_queue_xmit
>                 |          |          ip_finish_output2
>                 |          |          ip_finish_output
>                 |          |          ip_output
>                 |          |          ip_local_out
>                 |          |          ip_queue_xmit
>                 |          |          tcp_transmit_skb
>                 |          |          tcp_write_xmit
>                 |          |          __tcp_push_pending_frames
>                 |          |          tcp_push
>                 |          |          tcp_sendmsg
>                 |          |          __sock_sendmsg
>                 |          |          sock_aio_write
>                 |          |          do_sync_write
>                 |          |          vfs_write
>                 |          |          sys_write
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |          |--16.81%-- ipt_pre_routing_hook
>                 |          |          nf_iterate
>                 |          |          nf_hook_slow
>                 |          |          ip_rcv
>                 |          |          netif_receive_skb
>                 |          |          process_backlog
>                 |          |          net_rx_action
>                 |          |          __do_softirq
>                 |          |          call_softirq
>                 |          |          do_softirq
>                 |          |          _local_bh_enable_ip
>                 |          |          local_bh_enable
>                 |          |          dev_queue_xmit
>                 |          |          ip_finish_output2
>                 |          |          ip_finish_output
>                 |          |          ip_output
>                 |          |          ip_local_out
>                 |          |          ip_queue_xmit
>                 |          |          tcp_transmit_skb
>                 |          |          tcp_write_xmit
>                 |          |          __tcp_push_pending_frames
>                 |          |          tcp_push
>                 |          |          tcp_sendmsg
>                 |          |          __sock_sendmsg
>                 |          |          sock_aio_write
>                 |          |          do_sync_write
>                 |          |          vfs_write
>                 |          |          sys_write
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |          |--11.95%-- ipt_post_routing_hook
>                 |          |          nf_iterate
>                 |          |          nf_hook_slow
>                 |          |          ip_output
>                 |          |          ip_local_out
>                 |          |          ip_queue_xmit
>                 |          |          tcp_transmit_skb
>                 |          |          tcp_write_xmit
>                 |          |          __tcp_push_pending_frames
>                 |          |          tcp_push
>                 |          |          tcp_sendmsg
>                 |          |          __sock_sendmsg
>                 |          |          sock_aio_write
>                 |          |          do_sync_write
>                 |          |          vfs_write
>                 |          |          sys_write
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |          |--10.18%-- ipt_local_out_hook
>                 |          |          nf_iterate
>                 |          |          nf_hook_slow
>                 |          |          __ip_local_out
>                 |          |          ip_local_out
>                 |          |          ip_queue_xmit
>                 |          |          tcp_transmit_skb
>                 |          |          tcp_write_xmit
>                 |          |          __tcp_push_pending_frames
>                 |          |          tcp_push
>                 |          |          tcp_sendmsg
>                 |          |          __sock_sendmsg
>                 |          |          sock_aio_write
>                 |          |          do_sync_write
>                 |          |          vfs_write
>                 |          |          sys_write
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |           --9.73%-- ipt_local_hook
>                 |                     nf_iterate
>                 |                     nf_hook_slow
>                 |                     __ip_local_out
>                 |                     ip_local_out
>                 |                     ip_queue_xmit
>                 |                     tcp_transmit_skb
>                 |                     tcp_write_xmit
>                 |                     __tcp_push_pending_frames
>                 |                     tcp_push
>                 |                     tcp_sendmsg
>                 |                     __sock_sendmsg
>                 |                     sock_aio_write
>                 |                     do_sync_write
>                 |                     vfs_write
>                 |                     sys_write
>                 |                     sysenter_dispatch
>                 |                     0xf7f50430
>                 |                     0x5
>                 |          
>                 |--31.46%-- __mod_timer
>                 |          mod_timer
>                 |          sk_reset_timer
>                 |          |          
>                 |          |--55.36%-- tcp_send_delayed_ack
>                 |          |          __tcp_ack_snd_check
>                 |          |          tcp_rcv_established
>                 |          |          tcp_v4_do_rcv
>                 |          |          tcp_prequeue_process
>                 |          |          tcp_recvmsg
>                 |          |          sock_common_recvmsg
>                 |          |          __sock_recvmsg
>                 |          |          sock_aio_read
>                 |          |          do_sync_read
>                 |          |          vfs_read
>                 |          |          sys_read
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |           --44.64%-- tcp_v4_rcv
>                 |                     ip_local_deliver_finish
>                 |                     ip_local_deliver
>                 |                     ip_rcv_finish
>                 |                     ip_rcv
>                 |                     netif_receive_skb
>                 |                     process_backlog
>                 |                     net_rx_action
>                 |                     __do_softirq
>                 |                     call_softirq
>                 |                     do_softirq
>                 |                     _local_bh_enable_ip
>                 |                     local_bh_enable
>                 |                     dev_queue_xmit
>                 |                     ip_finish_output2
>                 |                     ip_finish_output
>                 |                     ip_output
>                 |                     ip_local_out
>                 |                     ip_queue_xmit
>                 |                     tcp_transmit_skb
>                 |                     tcp_write_xmit
>                 |                     __tcp_push_pending_frames
>                 |                     tcp_push
>                 |                     tcp_sendmsg
>                 |                     __sock_sendmsg
>                 |                     sock_aio_write
>                 |                     do_sync_write
>                 |                     vfs_write
>                 |                     sys_write
>                 |                     sysenter_dispatch
>                 |                     0xf7f50430
>                 |                     0x5
>                 |          
>                 |--21.21%-- task_rq_lock
>                 |          try_to_wake_up
>                 |          default_wake_function
>                 |          |          
>                 |          |--99.34%-- autoremove_wake_function
>                 |          |          __wake_up_common
>                 |          |          __wake_up
>                 |          |          tcp_v4_rcv
>                 |          |          ip_local_deliver_finish
>                 |          |          ip_local_deliver
>                 |          |          ip_rcv_finish
>                 |          |          ip_rcv
>                 |          |          netif_receive_skb
>                 |          |          process_backlog
>                 |          |          net_rx_action
>                 |          |          __do_softirq
>                 |          |          call_softirq
>                 |          |          do_softirq
>                 |          |          _local_bh_enable_ip
>                 |          |          local_bh_enable
>                 |          |          dev_queue_xmit
>                 |          |          ip_finish_output2
>                 |          |          ip_finish_output
>                 |          |          ip_output
>                 |          |          ip_local_out
>                 |          |          ip_queue_xmit
>                 |          |          tcp_transmit_skb
>                 |          |          tcp_write_xmit
>                 |          |          __tcp_push_pending_frames
>                 |          |          tcp_push
>                 |          |          tcp_sendmsg
>                 |          |          __sock_sendmsg
>                 |          |          sock_aio_write
>                 |          |          do_sync_write
>                 |          |          vfs_write
>                 |          |          sys_write
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |           --0.66%-- pollwake
>                 |                     __wake_up_common
>                 |                     __wake_up
>                 |                     perf_counter_wakeup
>                 |                     perf_pending_counter
>                 |                     perf_counter_do_pending
>                 |                     smp_perf_pending_interrupt
>                 |                     perf_pending_interrupt
>                 |                     0xf7f50430
>                 |                     0x5
>                 |          
>                 |--5.06%-- tcp_v4_rcv
>                 |          ip_local_deliver_finish
>                 |          ip_local_deliver
>                 |          ip_rcv_finish
>                 |          ip_rcv
>                 |          netif_receive_skb
>                 |          process_backlog
>                 |          net_rx_action
>                 |          __do_softirq
>                 |          call_softirq
>                 |          do_softirq
>                 |          _local_bh_enable_ip
>                 |          local_bh_enable
>                 |          dev_queue_xmit
>                 |          ip_finish_output2
>                 |          ip_finish_output
>                 |          ip_output
>                 |          ip_local_out
>                 |          ip_queue_xmit
>                 |          tcp_transmit_skb
>                 |          tcp_write_xmit
>                 |          __tcp_push_pending_frames
>                 |          tcp_push
>                 |          tcp_sendmsg
>                 |          __sock_sendmsg
>                 |          sock_aio_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          sysenter_dispatch
>                 |          0xf7f50430
>                 |          0x5
>                 |          
>                 |--4.07%-- __perf_counter_sched_out
>                 |          perf_counter_task_sched_out
>                 |          schedule
>                 |          schedule_timeout
>                 |          sk_wait_data
>                 |          tcp_recvmsg
>                 |          sock_common_recvmsg
>                 |          __sock_recvmsg
>                 |          sock_aio_read
>                 |          do_sync_read
>                 |          vfs_read
>                 |          sys_read
>                 |          sysenter_dispatch
>                 |          0xf7f50430
>                 |          0x5
>                 |          
>                 |--1.40%-- ipt_local_hook
>                 |          nf_iterate
>                 |          nf_hook_slow
>                 |          __ip_local_out
>                 |          ip_local_out
>                 |          ip_queue_xmit
>                 |          tcp_transmit_skb
>                 |          tcp_write_xmit
>                 |          __tcp_push_pending_frames
>                 |          tcp_push
>                 |          tcp_sendmsg
>                 |          __sock_sendmsg
>                 |          sock_aio_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          sysenter_dispatch
>                 |          0xf7f50430
>                 |          0x5
>                 |          
>                 |--1.26%-- mod_timer
>                 |          sk_reset_timer
>                 |          |          
>                 |          |--66.67%-- tcp_send_delayed_ack
>                 |          |          __tcp_ack_snd_check
>                 |          |          tcp_rcv_established
>                 |          |          tcp_v4_do_rcv
>                 |          |          tcp_prequeue_process
>                 |          |          tcp_recvmsg
>                 |          |          sock_common_recvmsg
>                 |          |          __sock_recvmsg
>                 |          |          sock_aio_read
>                 |          |          do_sync_read
>                 |          |          vfs_read
>                 |          |          sys_read
>                 |          |          sysenter_dispatch
>                 |          |          0xf7f50430
>                 |          |          0x5
>                 |          |          
>                 |           --33.33%-- tcp_v4_rcv
>                 |                     ip_local_deliver_finish
>                 |                     ip_local_deliver
>                 |                     ip_rcv_finish
>                 |                     ip_rcv
>                 |                     netif_receive_skb
>                 |                     process_backlog
>                 |                     net_rx_action
>                 |                     __do_softirq
>                 |                     call_softirq
>                 |                     do_softirq
>                 |                     _local_bh_enable_ip
>                 |                     local_bh_enable
>                 |                     dev_queue_xmit
>                 |                     ip_finish_output2
>                 |                     ip_finish_output
>                 |                     ip_output
>                 |                     ip_local_out
>                 |                     ip_queue_xmit
>                 |                     tcp_transmit_skb
>                 |                     tcp_write_xmit
>                 |                     __tcp_push_pending_frames
>                 |                     tcp_push
>                 |                     tcp_sendmsg
>                 |                     __sock_sendmsg
>                 |                     sock_aio_write
>                 |                     do_sync_write
>                 |                     vfs_write
>                 |                     sys_write
>                 |                     sysenter_dispatch
>                 |                     0xf7f50430
>                 |                     0x5
>                 |          
>                 |--0.70%-- ipt_local_in_hook
>                 |          nf_iterate
>                 |          nf_hook_slow
>                 |          ip_local_deliver
>                 |          ip_rcv_finish
>                 |          ip_rcv
>                 |          netif_receive_skb
>                 |          process_backlog
>                 |          net_rx_action
>                 |          __do_softirq
>                 |          call_softirq
>                 |          do_softirq
>                 |          _local_bh_enable_ip
>                 |          local_bh_enable
>                 |          dev_queue_xmit
>                 |          ip_finish_output2
>                 |          ip_finish_output
>                 |          ip_output
>                 |          ip_local_out
>                 |          ip_queue_xmit
>                 |          tcp_transmit_skb
>                 |          tcp_write_xmit
>                 |          __tcp_push_pending_frames
>                 |          tcp_push
>                 |          tcp_sendmsg
>                 |          __sock_sendmsg
>                 |          sock_aio_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          sysenter_dispatch
>                 |          0xf7f50430
>                 |          0x5
>                 |          
>                 |--0.56%-- perf_counter_task_sched_out
>                 |          schedule
>                 |          schedule_timeout
>                 |          sk_wait_data
>                 |          tcp_recvmsg
>                 |          sock_common_recvmsg
>                 |          __sock_recvmsg
>                 |          sock_aio_read
>                 |          do_sync_read
>                 |          vfs_read
>                 |          sys_read
>                 |          sysenter_dispatch
>                 |          0xf7f50430
>                 |          0x5
>                  --2.53%-- [...]
> 
>      1.66%     perf  [kernel]                
>                 |          
>                 |--25.00%-- journal_stop
>                 |          __ext3_journal_stop
>                 |          ext3_writeback_write_end
>                 |          generic_perform_write
>                 |          generic_file_buffered_write
>                 |          __generic_file_aio_write_nolock
>                 |          generic_file_aio_write
>                 |          ext3_file_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          system_call_fastpath
>                 |          __write_nocancel
>                 |          
>                 |--16.67%-- __perf_counter_sched_out
>                 |          perf_counter_task_sched_out
>                 |          schedule
>                 |          schedule_hrtimeout_range
>                 |          poll_schedule_timeout
>                 |          do_poll
>                 |          do_sys_poll
>                 |          sys_poll
>                 |          system_call_fastpath
>                 |          __GI___poll
>                 |          
>                 |--8.33%-- dup_mmap
>                 |          dup_mm
>                 |          copy_process
>                 |          do_fork
>                 |          sys_clone
>                 |          stub_clone
>                 |          fork
>                 |          __cmd_record
>                 |          
>                 |--8.33%-- hrtimer_interrupt
>                 |          smp_apic_timer_interrupt
>                 |          apic_timer_interrupt
>                 |          __write_nocancel
>                 |          
>                 |--8.33%-- journal_start
>                 |          ext3_journal_start_sb
>                 |          ext3_write_begin
>                 |          generic_perform_write
>                 |          generic_file_buffered_write
>                 |          __generic_file_aio_write_nolock
>                 |          generic_file_aio_write
>                 |          ext3_file_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          system_call_fastpath
>                 |          __write_nocancel
>                 |          
>                 |--8.33%-- create_empty_buffers
>                 |          __block_prepare_write
>                 |          block_write_begin
>                 |          ext3_write_begin
>                 |          generic_perform_write
>                 |          generic_file_buffered_write
>                 |          __generic_file_aio_write_nolock
>                 |          generic_file_aio_write
>                 |          ext3_file_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          system_call_fastpath
>                 |          __write_nocancel
>                 |          
>                 |--8.33%-- perf_counter_task_sched_out
>                 |          schedule
>                 |          schedule_hrtimeout_range
>                 |          poll_schedule_timeout
>                 |          do_poll
>                 |          do_sys_poll
>                 |          sys_poll
>                 |          system_call_fastpath
>                 |          __GI___poll
>                 |          
>                 |--8.33%-- ext3_alloc_branch
>                 |          ext3_get_blocks_handle
>                 |          ext3_get_block
>                 |          __block_prepare_write
>                 |          block_write_begin
>                 |          ext3_write_begin
>                 |          generic_perform_write
>                 |          generic_file_buffered_write
>                 |          __generic_file_aio_write_nolock
>                 |          generic_file_aio_write
>                 |          ext3_file_write
>                 |          do_sync_write
>                 |          vfs_write
>                 |          sys_write
>                 |          system_call_fastpath
>                 |          __write_nocancel
>                 |          
>                  --8.33%-- start_this_handle
>                            journal_start
>                            ext3_journal_start_sb
>                            ext3_write_begin
>                            generic_perform_write
>                            generic_file_buffered_write
>                            __generic_file_aio_write_nolock
>                            generic_file_aio_write
>                            ext3_file_write
>                            do_sync_write
>                            vfs_write
>                            sys_write
>                            system_call_fastpath
>                            __write_nocancel
> 
> The graph above (which is the sub-graph of the full graph, limited 
> via the -S/--symbols option to all callchains that intersect with 
> the _spin_lock symbol) details all the various code paths that make 
> use of _spin_lock(), and the weights (and relative weights) of 
> those code paths.
> 
> Note that this works for just about any locking primitive, and note 
> how it's recursive. If we inlined spin_lock we'd only know this 
> __mod_timer overhead:
> 
>                 |--31.46%-- __mod_timer
> 
> but we wouldn't know the second level (which is interesting too):
> 
>                 |          |--55.36%-- tcp_send_delayed_ack
>                 |           --44.64%-- tcp_v4_rcv
> 
> All in all, I think we should detach the issue of instrumentation 
> and profiling from the issue of whether to inline or not. We should 
> inline where it makes things faster. Our profiling tools are able 
> to deal with it, independently of whether we inline or not.
> 
> 	Ingo


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
  2009-08-13 18:34   ` Ingo Molnar
@ 2009-08-14 12:34   ` Heiko Carstens
  2009-08-14 16:04     ` Linus Torvalds
  1 sibling, 1 reply; 22+ messages in thread
From: Heiko Carstens @ 2009-08-14 12:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt

On Thu, Aug 13, 2009 at 11:11:22AM -0700, Linus Torvalds wrote:
> On Wed, 12 Aug 2009, Heiko Carstens wrote:
> > If one considers that server kernels are usually compiled with
> > !CONFIG_PREEMPT a simple spin_lock is just a compare and swap loop.
> > The extra overhead for a function call is significant.
> > With inlined spinlocks overall cpu usage gets reduced by 1%-5% on s390.
> > These numbers were taken with some network benchmarks. However I expect
> > any workload that calls frequently into the kernel and which grabs a few
> > locks to perform better.
> 
> Have you tried to determine which kinds of spinlocks you care most about? 
> With networking, I guess the softirq-safe ones (and perhaps the irq ones) 
> are pretty common, but maybe even there it's really just a couple of 
> cases.

All of the spinlock* variants are frequently called (see also below).
 
> [...]
> So there are advantages outside of the "size vs speed" kind of things, but 
> at the same time I do suspect that the disadvantages mean that we really 
> should make this be something where an architecture can specify things at 
> a finer granularity. Perhaps by using per-symbol defines (ie have the 
> <asm/spinlock.h> file do things like
> 
> 	#define __spin_lock_is_small
> 	#define __spin_unlock_is_small
> 
> kind of things for the particular sequences that are worth inlining.

Ok, that makes sense. I will post updated patches.

FWIW:
All of this started when we compared a 2.6.27-based distro kernel
to a 2.6.16-based one. It turned out that 2.6.27 burns a lot more cpu
cycles than 2.6.16 does for the same workloads.

As an example, we took a simple one-connection ping-pong network load
(the client sends a packet to the server and waits until the server's
reply returns, and then the game starts again). It uses more cpu cycles
on 2.6.27.

Using ftrace we figured out that on 2.6.27 a single roundtrip takes
more than 500 function calls on the client, while on 2.6.16 it took
only ~230 calls.
Changing s390 from its own NOHZ code to the generic version did cost
quite a few cycles; the idle entry and exit paths are much more expensive
now. That's why you've seen Martin posting NOHZ/timer-related patches lately.

Also, quite a few functions that were inlined in 2.6.16 are out-of-line
in 2.6.27; reverting those changes improved performance as well.

Since 60 of the ~500 function calls are spin_[un]lock* calls, it seemed
worthwhile to inline them, which improved things further.

Here are the function traces for 2.6.27 and 2.6.16:
(the timestamp output has been modified)
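For reference, a trace of this kind can be captured with the function tracer roughly as follows. This is only a sketch: paths assume debugfs is mounted at /sys/kernel/debug, and the tracer and control file names vary between kernel versions (the tracer shows up as "ftrace" in the 2.6.27-era header above, "function" in later kernels):

```shell
cd /sys/kernel/debug/tracing
echo function > current_tracer   # tracer name differs on older kernels
echo 1 > tracing_on              # start recording
# ... run the ping pong workload ...
echo 0 > tracing_on              # stop recording
cat trace                        # dump the recorded function calls
```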

For 2.6.27:

# tracer: ftrace
#
#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
#              | |      |          |         |
            perl-2581  [00] 28991519113889978 [+  125]: SyS_write <-sysc_noemu
            perl-2581  [00] 28991519113890103 [+  156]: fget_light <-SyS_write
            perl-2581  [00] 28991519113890259 [+  125]: vfs_write <-SyS_write
            perl-2581  [00] 28991519113890384 [+  156]: rw_verify_area <-vfs_write
            perl-2581  [00] 28991519113890540 [+  125]: security_file_permission <-rw_verify_area
            perl-2581  [00] 28991519113890665 [+   93]: apparmor_file_permission <-security_file_permission
            perl-2581  [00] 28991519113890759 [+  156]: aa_file_permission <-apparmor_file_permission
            perl-2581  [00] 28991519113890915 [+  125]: do_sync_write <-vfs_write
            perl-2581  [00] 28991519113891040 [+  125]: sock_aio_write <-do_sync_write
            perl-2581  [00] 28991519113891165 [+  125]: security_socket_sendmsg <-sock_aio_write
            perl-2581  [00] 28991519113891290 [+  125]: apparmor_socket_sendmsg <-security_socket_sendmsg
            perl-2581  [00] 28991519113891415 [+  125]: aa_revalidate_sk <-apparmor_socket_sendmsg
            perl-2581  [00] 28991519113891540 [+  125]: tcp_sendmsg <-sock_aio_write
            perl-2581  [00] 28991519113891665 [+  125]: lock_sock_nested <-tcp_sendmsg
            perl-2581  [00] 28991519113891790 [+  125]: _spin_lock_bh <-lock_sock_nested
            perl-2581  [00] 28991519113891915 [+  125]: local_bh_disable <-_spin_lock_bh
            perl-2581  [00] 28991519113892040 [+  187]: local_bh_enable <-lock_sock_nested
            perl-2581  [00] 28991519113892228 [+  125]: tcp_current_mss <-tcp_sendmsg
            perl-2581  [00] 28991519113892353 [+  156]: tcp_established_options <-tcp_current_mss
            perl-2581  [00] 28991519113892509 [+  125]: sk_stream_alloc_skb <-tcp_sendmsg
            perl-2581  [00] 28991519113892634 [+  125]: __alloc_skb <-sk_stream_alloc_skb
            perl-2581  [00] 28991519113892759 [+  156]: kmem_cache_alloc <-__alloc_skb
            perl-2581  [00] 28991519113892915 [+  281]: __kmalloc <-__alloc_skb
            perl-2581  [00] 28991519113893196 [+  125]: skb_put <-tcp_sendmsg
            perl-2581  [00] 28991519113893321 [+  125]: copy_from_user_mvcos_check <-tcp_sendmsg
            perl-2581  [00] 28991519113893446 [+  156]: copy_from_user_std <-copy_from_user_mvcos_check
            perl-2581  [00] 28991519113893603 [+  125]: __tcp_push_pending_frames <-tcp_sendmsg
            perl-2581  [00] 28991519113893728 [+  125]: tcp_init_tso_segs <-__tcp_push_pending_frames
            perl-2581  [00] 28991519113893853 [+  125]: tcp_set_skb_tso_segs <-tcp_init_tso_segs
            perl-2581  [00] 28991519113893978 [+  156]: tcp_transmit_skb <-__tcp_push_pending_frames
            perl-2581  [00] 28991519113894134 [+   93]: skb_clone <-tcp_transmit_skb
            perl-2581  [00] 28991519113894228 [+  125]: __skb_clone <-skb_clone
            perl-2581  [00] 28991519113894353 [+  156]: __copy_skb_header <-__skb_clone
            perl-2581  [00] 28991519113894509 [+  125]: tcp_established_options <-tcp_transmit_skb
            perl-2581  [00] 28991519113894634 [+  125]: skb_push <-tcp_transmit_skb
            perl-2581  [00] 28991519113894759 [+  156]: __tcp_select_window <-tcp_transmit_skb
            perl-2581  [00] 28991519113894915 [+  125]: tcp_options_write <-tcp_transmit_skb
            perl-2581  [00] 28991519113895040 [+  156]: tcp_v4_send_check <-tcp_transmit_skb
            perl-2581  [00] 28991519113895196 [+  125]: ip_queue_xmit <-tcp_transmit_skb
            perl-2581  [00] 28991519113895321 [+  156]: __sk_dst_check <-ip_queue_xmit
            perl-2581  [00] 28991519113895478 [+  125]: skb_push <-ip_queue_xmit
            perl-2581  [00] 28991519113895603 [+   93]: ip_local_out <-ip_queue_xmit
            perl-2581  [00] 28991519113895696 [+  156]: __ip_local_out <-ip_local_out
            perl-2581  [00] 28991519113895853 [+  125]: ip_output <-ip_local_out
            perl-2581  [00] 28991519113895978 [+  156]: ip_finish_output <-ip_output
            perl-2581  [00] 28991519113896134 [+  125]: skb_push <-ip_finish_output
            perl-2581  [00] 28991519113896259 [+  125]: dev_queue_xmit <-ip_finish_output
            perl-2581  [00] 28991519113896384 [+  125]: local_bh_disable <-dev_queue_xmit
            perl-2581  [00] 28991519113896509 [+  187]: _spin_lock <-dev_queue_xmit
            perl-2581  [00] 28991519113896696 [+  125]: pfifo_fast_enqueue <-dev_queue_xmit
            perl-2581  [00] 28991519113896821 [+  125]: __qdisc_run <-dev_queue_xmit
            perl-2581  [00] 28991519113896946 [+  156]: pfifo_fast_dequeue <-__qdisc_run
            perl-2581  [00] 28991519113897103 [+  125]: _spin_lock <-__qdisc_run
            perl-2581  [00] 28991519113897228 [+  187]: dev_hard_start_xmit <-__qdisc_run
            perl-2581  [00] 28991519113897415 [+  156]: qeth_l3_hard_start_xmit <-dev_hard_start_xmit
            perl-2581  [00] 28991519113897571 [+  125]: qeth_get_cast_type <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113897696 [+  156]: qeth_get_priority_queue <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113897853 [+  125]: skb_realloc_headroom <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113897978 [+  125]: pskb_copy <-skb_realloc_headroom
            perl-2581  [00] 28991519113898103 [+   93]: __alloc_skb <-pskb_copy
            perl-2581  [00] 28991519113898196 [+  187]: kmem_cache_alloc <-__alloc_skb
            perl-2581  [00] 28991519113898384 [+  218]: __kmalloc <-__alloc_skb
            perl-2581  [00] 28991519113898603 [+  156]: skb_put <-pskb_copy
            perl-2581  [00] 28991519113898759 [+   93]: copy_skb_header <-pskb_copy
            perl-2581  [00] 28991519113898853 [+  218]: __copy_skb_header <-copy_skb_header
            perl-2581  [00] 28991519113899071 [+  125]: skb_pull <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113899196 [+  125]: skb_push <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113899321 [+  156]: qeth_l3_fill_header <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113899478 [+  125]: qeth_get_elements_no <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113899603 [+  156]: qeth_do_send_packet <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113899759 [+  125]: skb_queue_tail <-qeth_do_send_packet
            perl-2581  [00] 28991519113899884 [+  125]: _spin_lock_irqsave <-skb_queue_tail
            perl-2581  [00] 28991519113900009 [+  156]: _spin_unlock_irqrestore <-skb_queue_tail
            perl-2581  [00] 28991519113900165 [+  187]: qeth_flush_buffers <-qeth_do_send_packet
            perl-2581  [00] 28991519113900353 [+  156]: do_QDIO <-qeth_flush_buffers
            perl-2581  [00] 28991519113900509 [+  156]: qdio_perf_stat_inc <-do_QDIO
            perl-2581  [00] 28991519113900665 [+  125]: qdio_kick_outbound_q <-do_QDIO
            perl-2581  [00] 28991519113900790 [+  812]: qdio_perf_stat_inc <-qdio_kick_outbound_q
            perl-2581  [00] 28991519113901603 [+  218]: __tasklet_schedule <-do_QDIO
            perl-2581  [00] 28991519113901821 [+  156]: dev_kfree_skb_any <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113901978 [+  125]: kfree_skb <-dev_kfree_skb_any
            perl-2581  [00] 28991519113902103 [+   93]: __kfree_skb <-kfree_skb
            perl-2581  [00] 28991519113902196 [+  125]: skb_release_all <-__kfree_skb
            perl-2581  [00] 28991519113902321 [+  156]: dst_release <-skb_release_all
            perl-2581  [00] 28991519113902478 [+   93]: sock_wfree <-skb_release_all
            perl-2581  [00] 28991519113902571 [+  187]: skb_release_data <-skb_release_all
            perl-2581  [00] 28991519113902759 [+  125]: __netif_schedule <-qeth_l3_hard_start_xmit
            perl-2581  [00] 28991519113902884 [+  156]: raise_softirq_irqoff <-__netif_schedule
            perl-2581  [00] 28991519113903040 [+  125]: _spin_lock <-__qdisc_run
            perl-2581  [00] 28991519113903165 [+  187]: local_bh_enable <-dev_queue_xmit
            perl-2581  [00] 28991519113903353 [+  156]: do_softirq <-local_bh_enable
            perl-2581  [00] 28991519113903509 [+   93]: __do_softirq <-do_softirq
            perl-2581  [00] 28991519113903603 [+   93]: account_system_vtime <-__do_softirq
            perl-2581  [00] 28991519113903696 [+  125]: account_system_time <-account_system_vtime
            perl-2581  [00] 28991519113903821 [+   93]: acct_update_integrals <-account_system_time
            perl-2581  [00] 28991519113903915 [+  187]: jiffies_to_timeval <-acct_update_integrals
            perl-2581  [00] 28991519113904103 [+  156]: net_tx_action <-__do_softirq
            perl-2581  [00] 28991519113904259 [+  125]: _spin_trylock <-net_tx_action
            perl-2581  [00] 28991519113904384 [+  125]: __qdisc_run <-net_tx_action
            perl-2581  [00] 28991519113904509 [+  156]: pfifo_fast_dequeue <-__qdisc_run
            perl-2581  [00] 28991519113904665 [+  187]: tasklet_action <-__do_softirq
            perl-2581  [00] 28991519113904853 [+   93]: qdio_outbound_processing <-tasklet_action
            perl-2581  [00] 28991519113904946 [+  187]: qdio_perf_stat_inc <-qdio_outbound_processing
            perl-2581  [00] 28991519113905134 [+  156]: mod_timer <-qdio_outbound_processing
            perl-2581  [00] 28991519113905290 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
            perl-2581  [00] 28991519113905415 [+  125]: __mod_timer <-mod_timer
            perl-2581  [00] 28991519113905540 [+  125]: __timer_stats_timer_set_start_info <-__mod_timer
            perl-2581  [00] 28991519113905665 [+   93]: lock_timer_base <-__mod_timer
            perl-2581  [00] 28991519113905759 [+  156]: _spin_lock_irqsave <-lock_timer_base
            perl-2581  [00] 28991519113905915 [+  125]: internal_add_timer <-__mod_timer
            perl-2581  [00] 28991519113906040 [+  125]: _spin_unlock_irqrestore <-__mod_timer
            perl-2581  [00] 28991519113906165 [+  156]: qdio_perf_stat_inc <-qdio_outbound_processing
            perl-2581  [00] 28991519113906321 [+  125]: account_system_vtime <-__do_softirq
            perl-2581  [00] 28991519113906446 [+   62]: account_system_time <-account_system_vtime
            perl-2581  [00] 28991519113906509 [+  125]: acct_update_integrals <-account_system_time
            perl-2581  [00] 28991519113906634 [+  125]: jiffies_to_timeval <-acct_update_integrals
            perl-2581  [00] 28991519113906759 [+  156]: _local_bh_enable <-__do_softirq
            perl-2581  [00] 28991519113906915 [+  125]: tcp_event_new_data_sent <-__tcp_push_pending_frames
            perl-2581  [00] 28991519113907040 [+  125]: sk_reset_timer <-tcp_event_new_data_sent
            perl-2581  [00] 28991519113907165 [+   93]: mod_timer <-sk_reset_timer
            perl-2581  [00] 28991519113907259 [+  187]: __timer_stats_timer_set_start_info <-mod_timer
            perl-2581  [00] 28991519113907446 [+  156]: release_sock <-tcp_sendmsg
            perl-2581  [00] 28991519113907603 [+  125]: _spin_lock_bh <-release_sock
            perl-2581  [00] 28991519113907728 [+  125]: local_bh_disable <-_spin_lock_bh
            perl-2581  [00] 28991519113907853 [+  125]: _spin_unlock_bh <-release_sock
            perl-2581  [00] 28991519113907978 [+  187]: local_bh_enable_ip <-_spin_unlock_bh
            perl-2581  [00] 28991519113908165 [+   93]: dnotify_parent <-vfs_write
            perl-2581  [00] 28991519113908259 [+  156]: _spin_lock <-dnotify_parent
            perl-2581  [00] 28991519113908415 [+  156]: inotify_dentry_parent_queue_event <-vfs_write
            perl-2581  [00] 28991519113908571 [+  875]: inotify_inode_queue_event <-vfs_write
            perl-2581  [00] 28991519113909446 [+   93]: SyS_read <-sysc_noemu
            perl-2581  [00] 28991519113909540 [+  156]: fget_light <-SyS_read
            perl-2581  [00] 28991519113909696 [+  125]: vfs_read <-SyS_read
            perl-2581  [00] 28991519113909821 [+  125]: rw_verify_area <-vfs_read
            perl-2581  [00] 28991519113909946 [+  125]: security_file_permission <-rw_verify_area
            perl-2581  [00] 28991519113910071 [+  125]: apparmor_file_permission <-security_file_permission
            perl-2581  [00] 28991519113910196 [+  125]: aa_file_permission <-apparmor_file_permission
            perl-2581  [00] 28991519113910321 [+  156]: do_sync_read <-vfs_read
            perl-2581  [00] 28991519113910478 [+  125]: sock_aio_read <-do_sync_read
            perl-2581  [00] 28991519113910603 [+  125]: security_socket_recvmsg <-sock_aio_read
            perl-2581  [00] 28991519113910728 [+   93]: apparmor_socket_recvmsg <-security_socket_recvmsg
            perl-2581  [00] 28991519113910821 [+  156]: aa_revalidate_sk <-apparmor_socket_recvmsg
            perl-2581  [00] 28991519113910978 [+  125]: sock_common_recvmsg <-sock_aio_read
            perl-2581  [00] 28991519113911103 [+  125]: tcp_recvmsg <-sock_common_recvmsg
            perl-2581  [00] 28991519113911228 [+  125]: lock_sock_nested <-tcp_recvmsg
            perl-2581  [00] 28991519113911353 [+  125]: _spin_lock_bh <-lock_sock_nested
            perl-2581  [00] 28991519113911478 [+  125]: local_bh_disable <-_spin_lock_bh
            perl-2581  [00] 28991519113911603 [+  250]: local_bh_enable <-lock_sock_nested
            perl-2581  [00] 28991519113911853 [+  125]: tcp_cleanup_rbuf <-tcp_recvmsg
            perl-2581  [00] 28991519113911978 [+  125]: sk_wait_data <-tcp_recvmsg
            perl-2581  [00] 28991519113912103 [+  125]: prepare_to_wait <-sk_wait_data
            perl-2581  [00] 28991519113912228 [+  125]: _spin_lock_irqsave <-prepare_to_wait
            perl-2581  [00] 28991519113912353 [+  125]: _spin_unlock_irqrestore <-prepare_to_wait
            perl-2581  [00] 28991519113912478 [+  125]: release_sock <-sk_wait_data
            perl-2581  [00] 28991519113912603 [+  125]: _spin_lock_bh <-release_sock
            perl-2581  [00] 28991519113912728 [+  125]: local_bh_disable <-_spin_lock_bh
            perl-2581  [00] 28991519113912853 [+  125]: _spin_unlock_bh <-release_sock
            perl-2581  [00] 28991519113912978 [+  187]: local_bh_enable_ip <-_spin_unlock_bh
            perl-2581  [00] 28991519113913165 [+  125]: schedule_timeout <-sk_wait_data
            perl-2581  [00] 28991519113913290 [+  125]: schedule <-schedule_timeout
            perl-2581  [00] 28991519113913415 [+  125]: _spin_lock <-schedule
            perl-2581  [00] 28991519113913540 [+   93]: sched_clock_cpu <-schedule
            perl-2581  [00] 28991519113913634 [+  125]: sched_clock <-sched_clock_cpu
            perl-2581  [00] 28991519113913759 [+   93]: deactivate_task <-schedule
            perl-2581  [00] 28991519113913853 [+  125]: dequeue_task <-deactivate_task
            perl-2581  [00] 28991519113913978 [+   93]: dequeue_task_fair <-dequeue_task
            perl-2581  [00] 28991519113914071 [+   93]: dequeue_entity <-dequeue_task_fair
            perl-2581  [00] 28991519113914165 [+  156]: update_curr <-dequeue_entity
            perl-2581  [00] 28991519113914321 [+   93]: hrtick_start_fair <-dequeue_task_fair
            perl-2581  [00] 28991519113914415 [+  125]: put_prev_task_fair <-schedule
            perl-2581  [00] 28991519113914540 [+  125]: pick_next_task_fair <-schedule
            perl-2581  [00] 28991519113914665 [+   93]: pick_next_task_rt <-schedule
            perl-2581  [00] 28991519113914759 [+   93]: pick_next_task_fair <-schedule
            perl-2581  [00] 28991519113914853 [+  125]: pick_next_task_idle <-schedule
            perl-2581  [00] 28991519113914978 [+  218]: _spin_trylock <-tracing_record_cmdline
          <idle>-0     [00] 28991519113915196 [+  125]: finish_task_switch <-schedule
          <idle>-0     [00] 28991519113915321 [+   62]: account_vtime <-finish_task_switch
          <idle>-0     [00] 28991519113915384 [+   93]: account_user_time <-account_vtime
          <idle>-0     [00] 28991519113915478 [+   93]: acct_update_integrals <-account_user_time
          <idle>-0     [00] 28991519113915571 [+  125]: jiffies_to_timeval <-acct_update_integrals
          <idle>-0     [00] 28991519113915696 [+   93]: account_system_time <-account_vtime
          <idle>-0     [00] 28991519113915790 [+   93]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28991519113915884 [+  187]: jiffies_to_timeval <-acct_update_integrals
          <idle>-0     [00] 28991519113916071 [+  125]: tick_nohz_stop_sched_tick <-cpu_idle
          <idle>-0     [00] 28991519113916196 [+   93]: ktime_get <-tick_nohz_stop_sched_tick
          <idle>-0     [00] 28991519113916290 [+   93]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519113916384 [+   93]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519113916478 [+  125]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519113916603 [+  156]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519113916759 [+   93]: get_next_timer_interrupt <-tick_nohz_stop_sched_tick
          <idle>-0     [00] 28991519113916853 [+  125]: _spin_lock <-get_next_timer_interrupt
          <idle>-0     [00] 28991519113916978 [+   93]: hrtimer_get_next_event <-get_next_timer_interrupt
          <idle>-0     [00] 28991519113917071 [+  125]: _spin_lock_irqsave <-hrtimer_get_next_event
          <idle>-0     [00] 28991519113917196 [+   93]: _spin_unlock_irqrestore <-hrtimer_get_next_event
          <idle>-0     [00] 28991519113917290 [+   93]: rcu_needs_cpu <-tick_nohz_stop_sched_tick
          <idle>-0     [00] 28991519113917384 [+  156]: rcu_pending <-rcu_needs_cpu
          <idle>-0     [00] 28991519113917540 [+   93]: select_nohz_load_balancer <-tick_nohz_stop_sched_tick
          <idle>-0     [00] 28991519113917634 [+   93]: hweight64 <-select_nohz_load_balancer
          <idle>-0     [00] 28991519113917728 [+  156]: hweight64 <-select_nohz_load_balancer
          <idle>-0     [00] 28991519113917884 [+   62]: hrtimer_start <-tick_nohz_stop_sched_tick
          <idle>-0     [00] 28991519113917946 [+   93]: lock_hrtimer_base <-hrtimer_start
          <idle>-0     [00] 28991519113918040 [+  125]: _spin_lock_irqsave <-lock_hrtimer_base
          <idle>-0     [00] 28991519113918165 [+   93]: __remove_hrtimer <-hrtimer_start
          <idle>-0     [00] 28991519113918259 [+  125]: rb_next <-__remove_hrtimer
          <idle>-0     [00] 28991519113918384 [+  125]: hrtimer_force_reprogram <-__remove_hrtimer
          <idle>-0     [00] 28991519113918509 [+   93]: tick_program_event <-hrtimer_force_reprogram
          <idle>-0     [00] 28991519113918603 [+   93]: tick_dev_program_event <-tick_program_event
          <idle>-0     [00] 28991519113918696 [+   93]: ktime_get <-tick_dev_program_event
          <idle>-0     [00] 28991519113918790 [+   93]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519113918884 [+   62]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519113918946 [+  125]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519113919071 [+  125]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519113919196 [+   93]: clockevents_program_event <-tick_dev_program_event
          <idle>-0     [00] 28991519113919290 [+   93]: s390_next_event <-clockevents_program_event
          <idle>-0     [00] 28991519113919384 [+  156]: rb_erase <-__remove_hrtimer
          <idle>-0     [00] 28991519113919540 [+   93]: __timer_stats_hrtimer_set_start_info <-hrtimer_start
          <idle>-0     [00] 28991519113919634 [+   93]: enqueue_hrtimer <-hrtimer_start
          <idle>-0     [00] 28991519113919728 [+  125]: hrtimer_reprogram <-enqueue_hrtimer
          <idle>-0     [00] 28991519113919853 [+   62]: tick_program_event <-hrtimer_reprogram
          <idle>-0     [00] 28991519113919915 [+   93]: tick_dev_program_event <-tick_program_event
          <idle>-0     [00] 28991519113920009 [+   93]: ktime_get <-tick_dev_program_event
          <idle>-0     [00] 28991519113920103 [+   93]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519113920196 [+   62]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519113920259 [+  125]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519113920384 [+  125]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519113920509 [+   62]: clockevents_program_event <-tick_dev_program_event
          <idle>-0     [00] 28991519113920571 [+  125]: s390_next_event <-clockevents_program_event
          <idle>-0     [00] 28991519113920696 [+  125]: rb_insert_color <-enqueue_hrtimer
          <idle>-0     [00] 28991519113920821 [+  156]: _spin_unlock_irqrestore <-hrtimer_start
          <idle>-0     [00] 28991519113920978 [+  125]: _spin_lock <-cpu_idle
          <idle>-0     [00] 28991519113921103 [+229750]: vtime_stop_cpu_timer <-cpu_idle
          <idle>-0     [00] 28991519114150853 [+   93]: do_IRQ <-io_return
          <idle>-0     [00] 28991519114150946 [+   62]: irq_enter <-do_IRQ
          <idle>-0     [00] 28991519114151009 [+  125]: idle_cpu <-irq_enter
          <idle>-0     [00] 28991519114151134 [+   93]: tick_nohz_stop_idle <-irq_enter
          <idle>-0     [00] 28991519114151228 [+   62]: ktime_get <-tick_nohz_stop_idle
          <idle>-0     [00] 28991519114151290 [+  125]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519114151415 [+   62]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519114151478 [+   93]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519114151571 [+  125]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519114151696 [+   93]: account_system_vtime <-irq_enter
          <idle>-0     [00] 28991519114151790 [+   93]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28991519114151884 [+  125]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28991519114152009 [+   93]: idle_cpu <-irq_enter
          <idle>-0     [00] 28991519114152103 [+   62]: tick_nohz_update_jiffies <-irq_enter
          <idle>-0     [00] 28991519114152165 [+   93]: ktime_get <-tick_nohz_update_jiffies
          <idle>-0     [00] 28991519114152259 [+   93]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519114152353 [+   62]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519114152415 [+   93]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519114152509 [+  125]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519114152634 [+  125]: tick_do_update_jiffies64 <-tick_nohz_update_jiffies
          <idle>-0     [00] 28991519114152759 [+   62]: s390_idle_leave <-do_IRQ
          <idle>-0     [00] 28991519114152821 [+  125]: vtime_start_cpu_timer <-s390_idle_leave
          <idle>-0     [00] 28991519114152946 [+   93]: _spin_lock <-s390_idle_leave
          <idle>-0     [00] 28991519114153040 [+  250]: do_adapter_IO <-do_IRQ
          <idle>-0     [00] 28991519114153290 [+   93]: tiqdio_thinint_handler <-do_adapter_IO
          <idle>-0     [00] 28991519114153384 [+  156]: qdio_perf_stat_inc <-tiqdio_thinint_handler
          <idle>-0     [00] 28991519114153540 [+  250]: __tasklet_hi_schedule <-tiqdio_thinint_handler
          <idle>-0     [00] 28991519114153790 [+   93]: irq_exit <-do_IRQ
          <idle>-0     [00] 28991519114153884 [+   93]: account_system_vtime <-irq_exit
          <idle>-0     [00] 28991519114153978 [+   93]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28991519114154071 [+   93]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28991519114154165 [+   93]: do_softirq <-irq_exit
          <idle>-0     [00] 28991519114154259 [+   93]: __do_softirq <-do_softirq
          <idle>-0     [00] 28991519114154353 [+   62]: account_system_vtime <-__do_softirq
          <idle>-0     [00] 28991519114154415 [+  125]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28991519114154540 [+   93]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28991519114154634 [+   93]: tasklet_hi_action <-__do_softirq
          <idle>-0     [00] 28991519114154728 [+   93]: tiqdio_tasklet_fn <-tasklet_hi_action
          <idle>-0     [00] 28991519114154821 [+  218]: qdio_perf_stat_inc <-tiqdio_tasklet_fn
          <idle>-0     [00] 28991519114155040 [+  218]: __tasklet_schedule <-tiqdio_tasklet_fn
          <idle>-0     [00] 28991519114155259 [+  125]: tasklet_action <-__do_softirq
          <idle>-0     [00] 28991519114155384 [+   93]: tiqdio_inbound_processing <-tasklet_action
          <idle>-0     [00] 28991519114155478 [+   93]: qdio_perf_stat_inc <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114155571 [+  125]: qdio_sync_after_thinint <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114155696 [+   93]: qdio_siga_sync <-qdio_sync_after_thinint
          <idle>-0     [00] 28991519114155790 [+  125]: qdio_check_outbound_after_thinint <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114155915 [+  125]: __tasklet_schedule <-qdio_check_outbound_after_thinint
          <idle>-0     [00] 28991519114156040 [+  218]: qdio_inbound_q_moved <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114156259 [+   93]: qdio_kick_inbound_handler <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114156353 [+   93]: qdio_perf_stat_inc <-qdio_kick_inbound_handler
          <idle>-0     [00] 28991519114156446 [+  156]: qeth_l3_qdio_input_handler <-qdio_kick_inbound_handler
          <idle>-0     [00] 28991519114156603 [+  218]: qeth_core_get_next_skb <-qeth_l3_qdio_input_handler
          <idle>-0     [00] 28991519114156821 [+   62]: dev_alloc_skb <-qeth_core_get_next_skb
          <idle>-0     [00] 28991519114156884 [+  125]: __alloc_skb <-dev_alloc_skb
          <idle>-0     [00] 28991519114157009 [+   93]: kmem_cache_alloc <-__alloc_skb
          <idle>-0     [00] 28991519114157103 [+  187]: __kmalloc <-__alloc_skb
          <idle>-0     [00] 28991519114157290 [+  187]: skb_put <-qeth_core_get_next_skb
          <idle>-0     [00] 28991519114157478 [+   93]: eth_header <-qeth_l3_qdio_input_handler
          <idle>-0     [00] 28991519114157571 [+   93]: skb_push <-eth_header
          <idle>-0     [00] 28991519114157665 [+   93]: eth_type_trans <-qeth_l3_qdio_input_handler
          <idle>-0     [00] 28991519114157759 [+  125]: skb_pull <-eth_type_trans
          <idle>-0     [00] 28991519114157884 [+   93]: netif_rx <-qeth_l3_qdio_input_handler
          <idle>-0     [00] 28991519114157978 [+  125]: __napi_schedule <-netif_rx
          <idle>-0     [00] 28991519114158103 [+  125]: qeth_core_get_next_skb <-qeth_l3_qdio_input_handler
          <idle>-0     [00] 28991519114158228 [+  125]: qeth_queue_input_buffer <-qeth_l3_qdio_input_handler
          <idle>-0     [00] 28991519114158353 [+   62]: qdio_siga_sync_q <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114158415 [+  125]: qdio_siga_sync <-qdio_siga_sync_q
          <idle>-0     [00] 28991519114158540 [+   93]: get_buf_state <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114158634 [+   93]: qdio_stop_polling <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114158728 [+  125]: qdio_perf_stat_inc <-qdio_stop_polling
          <idle>-0     [00] 28991519114158853 [+   93]: qdio_siga_sync_q <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114158946 [+   93]: qdio_siga_sync <-qdio_siga_sync_q
          <idle>-0     [00] 28991519114159040 [+  156]: get_buf_state <-tiqdio_inbound_processing
          <idle>-0     [00] 28991519114159196 [+   93]: net_rx_action <-__do_softirq
          <idle>-0     [00] 28991519114159290 [+   93]: process_backlog <-net_rx_action
          <idle>-0     [00] 28991519114159384 [+  156]: netif_receive_skb <-process_backlog
          <idle>-0     [00] 28991519114159540 [+  125]: ip_rcv <-netif_receive_skb
          <idle>-0     [00] 28991519114159665 [+  125]: ip_rcv_finish <-ip_rcv
          <idle>-0     [00] 28991519114159790 [+  156]: ip_route_input <-ip_rcv_finish
          <idle>-0     [00] 28991519114159946 [+   62]: ip_local_deliver <-ip_rcv_finish
          <idle>-0     [00] 28991519114160009 [+   93]: ip_local_deliver_finish <-ip_local_deliver
          <idle>-0     [00] 28991519114160103 [+  125]: raw_local_deliver <-ip_local_deliver_finish
          <idle>-0     [00] 28991519114160228 [+   93]: tcp_v4_rcv <-ip_local_deliver_finish
          <idle>-0     [00] 28991519114160321 [+  125]: __inet_lookup_established <-tcp_v4_rcv
          <idle>-0     [00] 28991519114160446 [+  125]: _read_lock <-__inet_lookup_established
          <idle>-0     [00] 28991519114160571 [+   93]: sk_filter <-tcp_v4_rcv
          <idle>-0     [00] 28991519114160665 [+   93]: security_sock_rcv_skb <-sk_filter
          <idle>-0     [00] 28991519114160759 [+   62]: cap_socket_sock_rcv_skb <-security_sock_rcv_skb
          <idle>-0     [00] 28991519114160821 [+  125]: local_bh_disable <-sk_filter
          <idle>-0     [00] 28991519114160946 [+   93]: local_bh_enable <-sk_filter
          <idle>-0     [00] 28991519114161040 [+   93]: _spin_lock <-tcp_v4_rcv
          <idle>-0     [00] 28991519114161134 [+   93]: __wake_up <-tcp_v4_rcv
          <idle>-0     [00] 28991519114161228 [+   93]: _spin_lock_irqsave <-__wake_up
          <idle>-0     [00] 28991519114161321 [+   93]: __wake_up_common <-__wake_up
          <idle>-0     [00] 28991519114161415 [+   93]: autoremove_wake_function <-__wake_up_common
          <idle>-0     [00] 28991519114161509 [+   62]: default_wake_function <-autoremove_wake_function
          <idle>-0     [00] 28991519114161571 [+  125]: try_to_wake_up <-default_wake_function
          <idle>-0     [00] 28991519114161696 [+   62]: task_rq_lock <-try_to_wake_up
          <idle>-0     [00] 28991519114161759 [+  125]: _spin_lock <-task_rq_lock
          <idle>-0     [00] 28991519114161884 [+  125]: select_task_rq_fair <-try_to_wake_up
          <idle>-0     [00] 28991519114162009 [+   93]: sched_clock_cpu <-try_to_wake_up
          <idle>-0     [00] 28991519114162103 [+   93]: sched_clock <-sched_clock_cpu
          <idle>-0     [00] 28991519114162196 [+   93]: activate_task <-try_to_wake_up
          <idle>-0     [00] 28991519114162290 [+   93]: enqueue_task <-activate_task
          <idle>-0     [00] 28991519114162384 [+   93]: enqueue_task_fair <-enqueue_task
          <idle>-0     [00] 28991519114162478 [+   93]: enqueue_entity <-enqueue_task_fair
          <idle>-0     [00] 28991519114162571 [+   93]: update_curr <-enqueue_entity
          <idle>-0     [00] 28991519114162665 [+   93]: place_entity <-enqueue_entity
          <idle>-0     [00] 28991519114162759 [+   93]: account_scheduler_latency <-enqueue_entity
          <idle>-0     [00] 28991519114162853 [+   93]: __enqueue_entity <-enqueue_entity
          <idle>-0     [00] 28991519114162946 [+   93]: rb_insert_color <-__enqueue_entity
          <idle>-0     [00] 28991519114163040 [+  125]: hrtick_start_fair <-enqueue_task_fair
          <idle>-0     [00] 28991519114163165 [+   93]: check_preempt_curr_idle <-try_to_wake_up
          <idle>-0     [00] 28991519114163259 [+   62]: resched_task <-check_preempt_curr_idle
          <idle>-0     [00] 28991519114163321 [+  125]: _spin_unlock_irqrestore <-try_to_wake_up
          <idle>-0     [00] 28991519114163446 [+   93]: _spin_unlock_irqrestore <-__wake_up
          <idle>-0     [00] 28991519114163540 [+   93]: sk_reset_timer <-tcp_v4_rcv
          <idle>-0     [00] 28991519114163634 [+  125]: mod_timer <-sk_reset_timer
          <idle>-0     [00] 28991519114163759 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
          <idle>-0     [00] 28991519114163884 [+  125]: __mod_timer <-mod_timer
          <idle>-0     [00] 28991519114164009 [+  125]: __timer_stats_timer_set_start_info <-__mod_timer
          <idle>-0     [00] 28991519114164134 [+  125]: lock_timer_base <-__mod_timer
          <idle>-0     [00] 28991519114164259 [+  125]: _spin_lock_irqsave <-lock_timer_base
          <idle>-0     [00] 28991519114164384 [+  125]: internal_add_timer <-__mod_timer
          <idle>-0     [00] 28991519114164509 [+  281]: _spin_unlock_irqrestore <-__mod_timer
          <idle>-0     [00] 28991519114164790 [+  156]: tasklet_action <-__do_softirq
          <idle>-0     [00] 28991519114164946 [+  125]: qdio_outbound_processing <-tasklet_action
          <idle>-0     [00] 28991519114165071 [+  250]: qdio_perf_stat_inc <-qdio_outbound_processing
          <idle>-0     [00] 28991519114165321 [+  187]: qeth_qdio_output_handler <-qdio_outbound_processing
          <idle>-0     [00] 28991519114165509 [+  125]: qeth_check_qdio_errors <-qeth_qdio_output_handler
          <idle>-0     [00] 28991519114165634 [+  125]: qeth_clear_output_buffer <-qeth_qdio_output_handler
          <idle>-0     [00] 28991519114165759 [+   93]: skb_dequeue <-qeth_clear_output_buffer
          <idle>-0     [00] 28991519114165853 [+  125]: _spin_lock_irqsave <-skb_dequeue
          <idle>-0     [00] 28991519114165978 [+  156]: _spin_unlock_irqrestore <-skb_dequeue
          <idle>-0     [00] 28991519114166134 [+  156]: dev_kfree_skb_any <-qeth_clear_output_buffer
          <idle>-0     [00] 28991519114166290 [+  125]: kfree_skb <-dev_kfree_skb_any
          <idle>-0     [00] 28991519114166415 [+  125]: __kfree_skb <-kfree_skb
          <idle>-0     [00] 28991519114166540 [+  125]: skb_release_all <-__kfree_skb
          <idle>-0     [00] 28991519114166665 [+  125]: dst_release <-skb_release_all
          <idle>-0     [00] 28991519114166790 [+  125]: skb_release_data <-skb_release_all
          <idle>-0     [00] 28991519114166915 [+  218]: kfree <-skb_release_data
          <idle>-0     [00] 28991519114167134 [+  156]: kmem_cache_free <-__kfree_skb
          <idle>-0     [00] 28991519114167290 [+  125]: skb_dequeue <-qeth_clear_output_buffer
          <idle>-0     [00] 28991519114167415 [+  125]: _spin_lock_irqsave <-skb_dequeue
          <idle>-0     [00] 28991519114167540 [+  125]: _spin_unlock_irqrestore <-skb_dequeue
          <idle>-0     [00] 28991519114167665 [+  312]: qeth_eddp_buf_release_contexts <-qeth_clear_output_buffer
          <idle>-0     [00] 28991519114167978 [+  156]: qeth_switch_to_nonpacking_if_needed <-qeth_qdio_output_handler
          <idle>-0     [00] 28991519114168134 [+  125]: __netif_schedule <-qeth_qdio_output_handler
          <idle>-0     [00] 28991519114168259 [+  156]: raise_softirq_irqoff <-__netif_schedule
          <idle>-0     [00] 28991519114168415 [+  125]: del_timer <-qdio_outbound_processing
          <idle>-0     [00] 28991519114168540 [+   93]: lock_timer_base <-del_timer
          <idle>-0     [00] 28991519114168634 [+  156]: _spin_lock_irqsave <-lock_timer_base
          <idle>-0     [00] 28991519114168790 [+  218]: _spin_unlock_irqrestore <-del_timer
          <idle>-0     [00] 28991519114169009 [+  187]: net_tx_action <-__do_softirq
          <idle>-0     [00] 28991519114169196 [+  125]: _spin_trylock <-net_tx_action
          <idle>-0     [00] 28991519114169321 [+  125]: __qdisc_run <-net_tx_action
          <idle>-0     [00] 28991519114169446 [+  156]: pfifo_fast_dequeue <-__qdisc_run
          <idle>-0     [00] 28991519114169603 [+   93]: account_system_vtime <-__do_softirq
          <idle>-0     [00] 28991519114169696 [+  125]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28991519114169821 [+   93]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28991519114169915 [+  125]: _local_bh_enable <-__do_softirq
          <idle>-0     [00] 28991519114170040 [+  187]: idle_cpu <-irq_exit
          <idle>-0     [00] 28991519114170228 [+  125]: tick_nohz_restart_sched_tick <-cpu_idle
          <idle>-0     [00] 28991519114170353 [+   93]: tick_nohz_stop_idle <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114170446 [+   93]: select_nohz_load_balancer <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114170540 [+   93]: ktime_get <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114170634 [+  125]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519114170759 [+   62]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519114170821 [+  156]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519114170978 [+   93]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519114171071 [+  125]: tick_do_update_jiffies64 <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114171196 [+   93]: hrtimer_cancel <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114171290 [+   93]: hrtimer_try_to_cancel <-hrtimer_cancel
          <idle>-0     [00] 28991519114171384 [+   62]: lock_hrtimer_base <-hrtimer_try_to_cancel
          <idle>-0     [00] 28991519114171446 [+  125]: _spin_lock_irqsave <-lock_hrtimer_base
          <idle>-0     [00] 28991519114171571 [+   93]: __remove_hrtimer <-hrtimer_try_to_cancel
          <idle>-0     [00] 28991519114171665 [+   93]: rb_next <-__remove_hrtimer
          <idle>-0     [00] 28991519114171759 [+   93]: hrtimer_force_reprogram <-__remove_hrtimer
          <idle>-0     [00] 28991519114171853 [+   93]: tick_program_event <-hrtimer_force_reprogram
          <idle>-0     [00] 28991519114171946 [+   93]: tick_dev_program_event <-tick_program_event
          <idle>-0     [00] 28991519114172040 [+  125]: ktime_get <-tick_dev_program_event
          <idle>-0     [00] 28991519114172165 [+   62]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519114172228 [+   93]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519114172321 [+   93]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519114172415 [+  156]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519114172571 [+   93]: clockevents_program_event <-tick_dev_program_event
          <idle>-0     [00] 28991519114172665 [+  125]: s390_next_event <-clockevents_program_event
          <idle>-0     [00] 28991519114172790 [+  125]: rb_erase <-__remove_hrtimer
          <idle>-0     [00] 28991519114172915 [+   93]: _spin_unlock_irqrestore <-hrtimer_try_to_cancel
          <idle>-0     [00] 28991519114173009 [+   93]: hrtimer_forward <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114173103 [+  125]: hrtimer_start <-tick_nohz_restart_sched_tick
          <idle>-0     [00] 28991519114173228 [+   62]: lock_hrtimer_base <-hrtimer_start
          <idle>-0     [00] 28991519114173290 [+  125]: _spin_lock_irqsave <-lock_hrtimer_base
          <idle>-0     [00] 28991519114173415 [+   93]: __timer_stats_hrtimer_set_start_info <-hrtimer_start
          <idle>-0     [00] 28991519114173509 [+   93]: enqueue_hrtimer <-hrtimer_start
          <idle>-0     [00] 28991519114173603 [+   93]: hrtimer_reprogram <-enqueue_hrtimer
          <idle>-0     [00] 28991519114173696 [+   93]: tick_program_event <-hrtimer_reprogram
          <idle>-0     [00] 28991519114173790 [+   93]: tick_dev_program_event <-tick_program_event
          <idle>-0     [00] 28991519114173884 [+   62]: ktime_get <-tick_dev_program_event
          <idle>-0     [00] 28991519114173946 [+   93]: ktime_get_ts <-ktime_get
          <idle>-0     [00] 28991519114174040 [+   93]: getnstimeofday <-ktime_get_ts
          <idle>-0     [00] 28991519114174134 [+   93]: read_tod_clock <-getnstimeofday
          <idle>-0     [00] 28991519114174228 [+   93]: set_normalized_timespec <-ktime_get_ts
          <idle>-0     [00] 28991519114174321 [+  125]: clockevents_program_event <-tick_dev_program_event
          <idle>-0     [00] 28991519114174446 [+  125]: s390_next_event <-clockevents_program_event
          <idle>-0     [00] 28991519114174571 [+   93]: rb_insert_color <-enqueue_hrtimer
          <idle>-0     [00] 28991519114174665 [+  156]: _spin_unlock_irqrestore <-hrtimer_start
          <idle>-0     [00] 28991519114174821 [+  125]: schedule <-cpu_idle
          <idle>-0     [00] 28991519114174946 [+   93]: _spin_lock <-schedule
          <idle>-0     [00] 28991519114175040 [+  125]: sched_clock_cpu <-schedule
          <idle>-0     [00] 28991519114175165 [+  125]: sched_clock <-sched_clock_cpu
          <idle>-0     [00] 28991519114175290 [+   93]: put_prev_task_idle <-schedule
          <idle>-0     [00] 28991519114175384 [+   93]: pick_next_task_fair <-schedule
          <idle>-0     [00] 28991519114175478 [+   93]: set_next_entity <-pick_next_task_fair
          <idle>-0     [00] 28991519114175571 [+   93]: __dequeue_entity <-set_next_entity
          <idle>-0     [00] 28991519114175665 [+   93]: rb_next <-__dequeue_entity
          <idle>-0     [00] 28991519114175759 [+  125]: rb_erase <-__dequeue_entity
          <idle>-0     [00] 28991519114175884 [+  187]: hrtick_start_fair <-pick_next_task_fair
          <idle>-0     [00] 28991519114176071 [+  187]: _spin_trylock <-tracing_record_cmdline
            perl-2581  [00] 28991519114176259 [+  125]: finish_task_switch <-schedule
            perl-2581  [00] 28991519114176384 [+   93]: account_vtime <-finish_task_switch
            perl-2581  [00] 28991519114176478 [+   93]: account_user_time <-account_vtime
            perl-2581  [00] 28991519114176571 [+   93]: acct_update_integrals <-account_user_time
            perl-2581  [00] 28991519114176665 [+  125]: account_system_time <-account_vtime
            perl-2581  [00] 28991519114176790 [+  156]: acct_update_integrals <-account_system_time
            perl-2581  [00] 28991519114176946 [+  125]: lock_sock_nested <-sk_wait_data
            perl-2581  [00] 28991519114177071 [+  125]: _spin_lock_bh <-lock_sock_nested
            perl-2581  [00] 28991519114177196 [+  125]: local_bh_disable <-_spin_lock_bh
            perl-2581  [00] 28991519114177321 [+  187]: local_bh_enable <-lock_sock_nested
            perl-2581  [00] 28991519114177509 [+  156]: finish_wait <-sk_wait_data
            perl-2581  [00] 28991519114177665 [+  125]: tcp_prequeue_process <-tcp_recvmsg
            perl-2581  [00] 28991519114177790 [+  156]: local_bh_disable <-tcp_prequeue_process
            perl-2581  [00] 28991519114177946 [+  125]: tcp_v4_do_rcv <-tcp_prequeue_process
            perl-2581  [00] 28991519114178071 [+  125]: tcp_rcv_established <-tcp_v4_do_rcv
            perl-2581  [00] 28991519114178196 [+  156]: local_bh_enable <-tcp_rcv_established
            perl-2581  [00] 28991519114178353 [+  125]: skb_copy_datagram_iovec <-tcp_rcv_established
            perl-2581  [00] 28991519114178478 [+  125]: memcpy_toiovec <-skb_copy_datagram_iovec
            perl-2581  [00] 28991519114178603 [+  125]: copy_to_user_mvcos_check <-memcpy_toiovec
            perl-2581  [00] 28991519114178728 [+  187]: copy_to_user_std <-copy_to_user_mvcos_check
            perl-2581  [00] 28991519114178915 [+   93]: tcp_rcv_space_adjust <-tcp_rcv_established
            perl-2581  [00] 28991519114179009 [+  156]: local_bh_disable <-tcp_rcv_established
            perl-2581  [00] 28991519114179165 [+  125]: get_seconds <-tcp_rcv_established
            perl-2581  [00] 28991519114179290 [+  156]: tcp_event_data_recv <-tcp_rcv_established
            perl-2581  [00] 28991519114179446 [+  187]: tcp_ack <-tcp_rcv_established
            perl-2581  [00] 28991519114179634 [+  156]: __kfree_skb <-tcp_ack
            perl-2581  [00] 28991519114179790 [+   93]: skb_release_all <-__kfree_skb
            perl-2581  [00] 28991519114179884 [+  125]: dst_release <-skb_release_all
            perl-2581  [00] 28991519114180009 [+  156]: skb_release_data <-skb_release_all
            perl-2581  [00] 28991519114180165 [+  187]: kfree <-skb_release_data
            perl-2581  [00] 28991519114180353 [+  187]: kmem_cache_free <-__kfree_skb
            perl-2581  [00] 28991519114180540 [+  125]: tcp_rtt_estimator <-tcp_ack
            perl-2581  [00] 28991519114180665 [+  156]: bictcp_acked <-tcp_ack
            perl-2581  [00] 28991519114180821 [+   93]: bictcp_cong_avoid <-tcp_ack
            perl-2581  [00] 28991519114180915 [+  156]: tcp_is_cwnd_limited <-bictcp_cong_avoid
            perl-2581  [00] 28991519114181071 [+  125]: tcp_current_mss <-tcp_rcv_established
            perl-2581  [00] 28991519114181196 [+  125]: tcp_established_options <-tcp_current_mss
            perl-2581  [00] 28991519114181321 [+  125]: __tcp_push_pending_frames <-tcp_rcv_established
            perl-2581  [00] 28991519114181446 [+  125]: __tcp_ack_snd_check <-tcp_rcv_established
            perl-2581  [00] 28991519114181571 [+  125]: tcp_send_delayed_ack <-__tcp_ack_snd_check
            perl-2581  [00] 28991519114181696 [+  125]: sk_reset_timer <-tcp_send_delayed_ack
            perl-2581  [00] 28991519114181821 [+  125]: mod_timer <-sk_reset_timer
            perl-2581  [00] 28991519114181946 [+  125]: __timer_stats_timer_set_start_info <-mod_timer
            perl-2581  [00] 28991519114182071 [+   93]: __mod_timer <-mod_timer
            perl-2581  [00] 28991519114182165 [+  156]: __timer_stats_timer_set_start_info <-__mod_timer
            perl-2581  [00] 28991519114182321 [+   93]: lock_timer_base <-__mod_timer
            perl-2581  [00] 28991519114182415 [+  156]: _spin_lock_irqsave <-lock_timer_base
            perl-2581  [00] 28991519114182571 [+   93]: internal_add_timer <-__mod_timer
            perl-2581  [00] 28991519114182665 [+  156]: _spin_unlock_irqrestore <-__mod_timer
            perl-2581  [00] 28991519114182821 [+   93]: __kfree_skb <-tcp_rcv_established
            perl-2581  [00] 28991519114182915 [+  125]: skb_release_all <-__kfree_skb
            perl-2581  [00] 28991519114183040 [+  125]: dst_release <-skb_release_all
            perl-2581  [00] 28991519114183165 [+  125]: skb_release_data <-skb_release_all
            perl-2581  [00] 28991519114183290 [+  187]: kfree <-skb_release_data
            perl-2581  [00] 28991519114183478 [+  187]: kmem_cache_free <-__kfree_skb
            perl-2581  [00] 28991519114183665 [+  187]: local_bh_enable <-tcp_prequeue_process
            perl-2581  [00] 28991519114183853 [+  125]: tcp_cleanup_rbuf <-tcp_recvmsg
            perl-2581  [00] 28991519114183978 [+  156]: __tcp_select_window <-tcp_cleanup_rbuf
            perl-2581  [00] 28991519114184134 [+  125]: release_sock <-tcp_recvmsg
            perl-2581  [00] 28991519114184259 [+   93]: _spin_lock_bh <-release_sock
            perl-2581  [00] 28991519114184353 [+  125]: local_bh_disable <-_spin_lock_bh
            perl-2581  [00] 28991519114184478 [+  156]: _spin_unlock_bh <-release_sock
            perl-2581  [00] 28991519114184634 [+  218]: local_bh_enable_ip <-_spin_unlock_bh
            perl-2581  [00] 28991519114184853 [+  156]: dnotify_parent <-vfs_read
            perl-2581  [00] 28991519114185009 [+  125]: _spin_lock <-dnotify_parent
            perl-2581  [00] 28991519114185134 [+  125]: inotify_dentry_parent_queue_event <-vfs_read
            perl-2581  [00] 28991519114185259 [+ 1687]: inotify_inode_queue_event <-vfs_read

And this is for 2.6.16:

# tracer: ftrace
#
#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
#              | |      |          |         |
           <...>-2804  [00] 28994586831213199 [+   93]: sys_write <-sysc_noemu
           <...>-2804  [00] 28994586831213292 [+  218]: fget_light <-sys_write
           <...>-2804  [00] 28994586831213511 [+  156]: apparmor_file_permission <-sys_write
           <...>-2804  [00] 28994586831213667 [+  125]: sock_aio_write <-sys_write
           <...>-2804  [00] 28994586831213792 [+  125]: dummy_socket_sendmsg <-sock_aio_write
           <...>-2804  [00] 28994586831213917 [+  125]: inet_sendmsg <-sock_aio_write
           <...>-2804  [00] 28994586831214042 [+   93]: tcp_sendmsg <-inet_sendmsg
           <...>-2804  [00] 28994586831214136 [+  125]: lock_sock <-tcp_sendmsg
           <...>-2804  [00] 28994586831214261 [+  125]: _spin_lock_bh <-lock_sock
           <...>-2804  [00] 28994586831214386 [+  125]: _spin_unlock_bh <-lock_sock
           <...>-2804  [00] 28994586831214511 [+  187]: local_bh_enable <-_spin_unlock_bh
           <...>-2804  [00] 28994586831214699 [+  156]: tcp_current_mss <-tcp_sendmsg
           <...>-2804  [00] 28994586831214855 [+  125]: __alloc_skb <-tcp_sendmsg
           <...>-2804  [00] 28994586831214980 [+  156]: kmem_cache_alloc <-__alloc_skb
           <...>-2804  [00] 28994586831215136 [+  218]: __kmalloc <-__alloc_skb
           <...>-2804  [00] 28994586831215355 [+  187]: memset <-__alloc_skb
           <...>-2804  [00] 28994586831215542 [+  125]: copy_from_user_mvcos_check <-tcp_sendmsg
           <...>-2804  [00] 28994586831215667 [+  218]: copy_from_user_std <-copy_from_user_mvcos_check
           <...>-2804  [00] 28994586831215886 [+  125]: __tcp_push_pending_frames <-tcp_sendmsg
           <...>-2804  [00] 28994586831216011 [+  125]: tcp_transmit_skb <-__tcp_push_pending_frames
           <...>-2804  [00] 28994586831216136 [+  156]: skb_clone <-tcp_transmit_skb
           <...>-2804  [00] 28994586831216292 [+  250]: memcpy <-skb_clone
           <...>-2804  [00] 28994586831216542 [+  156]: tcp_v4_send_check <-tcp_transmit_skb
           <...>-2804  [00] 28994586831216699 [+  187]: ip_queue_xmit <-tcp_transmit_skb
           <...>-2804  [00] 28994586831216886 [+  187]: ip_output <-ip_queue_xmit
           <...>-2804  [00] 28994586831217074 [+  125]: dev_queue_xmit <-ip_output
           <...>-2804  [00] 28994586831217199 [+  125]: _spin_lock <-dev_queue_xmit
           <...>-2804  [00] 28994586831217324 [+  125]: pfifo_fast_enqueue <-dev_queue_xmit
           <...>-2804  [00] 28994586831217449 [+  125]: __qdisc_run <-dev_queue_xmit
           <...>-2804  [00] 28994586831217574 [+  156]: pfifo_fast_dequeue <-__qdisc_run
           <...>-2804  [00] 28994586831217730 [+  125]: _spin_trylock <-__qdisc_run
           <...>-2804  [00] 28994586831217855 [+  156]: dev_hard_start_xmit <-__qdisc_run
           <...>-2804  [00] 28994586831218011 [+  250]: qeth_hard_start_xmit <-dev_hard_start_xmit
           <...>-2804  [00] 28994586831218261 [+  125]: qeth_fill_header <-qeth_hard_start_xmit
           <...>-2804  [00] 28994586831218386 [+  156]: memset <-qeth_fill_header
           <...>-2804  [00] 28994586831218542 [+  187]: memset <-qeth_fill_header
           <...>-2804  [00] 28994586831218730 [+  125]: skb_queue_tail <-qeth_hard_start_xmit
           <...>-2804  [00] 28994586831218855 [+  125]: _spin_lock_irqsave <-skb_queue_tail
           <...>-2804  [00] 28994586831218980 [+  156]: _spin_unlock_irqrestore <-skb_queue_tail
           <...>-2804  [00] 28994586831219136 [+  218]: qeth_flush_buffers <-qeth_hard_start_xmit
           <...>-2804  [00] 28994586831219355 [+  875]: do_QDIO <-qeth_flush_buffers
           <...>-2804  [00] 28994586831220230 [+   93]: mod_timer <-do_QDIO
           <...>-2804  [00] 28994586831220324 [+  187]: _spin_lock_irqsave <-mod_timer
           <...>-2804  [00] 28994586831220511 [+  187]: _spin_unlock_irqrestore <-mod_timer
           <...>-2804  [00] 28994586831220699 [+  125]: raise_softirq_irqoff <-qeth_hard_start_xmit
           <...>-2804  [00] 28994586831220824 [+  156]: _spin_lock <-__qdisc_run
           <...>-2804  [00] 28994586831220980 [+  125]: pfifo_fast_dequeue <-__qdisc_run
           <...>-2804  [00] 28994586831221105 [+  156]: local_bh_enable <-dev_queue_xmit
           <...>-2804  [00] 28994586831221261 [+  156]: do_softirq <-local_bh_enable
           <...>-2804  [00] 28994586831221417 [+   93]: account_system_vtime <-do_softirq
           <...>-2804  [00] 28994586831221511 [+   93]: account_system_time <-account_system_vtime
           <...>-2804  [00] 28994586831221605 [+  125]: acct_update_integrals <-account_system_time
           <...>-2804  [00] 28994586831221730 [+  156]: __do_softirq <-do_softirq
           <...>-2804  [00] 28994586831221886 [+  156]: net_tx_action <-__do_softirq
           <...>-2804  [00] 28994586831222042 [+  125]: _spin_trylock <-net_tx_action
           <...>-2804  [00] 28994586831222167 [+  125]: __qdisc_run <-net_tx_action
           <...>-2804  [00] 28994586831222292 [+  156]: pfifo_fast_dequeue <-__qdisc_run
           <...>-2804  [00] 28994586831222449 [+   93]: account_system_vtime <-do_softirq
           <...>-2804  [00] 28994586831222542 [+  125]: account_system_time <-account_system_vtime
           <...>-2804  [00] 28994586831222667 [+  156]: acct_update_integrals <-account_system_time
           <...>-2804  [00] 28994586831222824 [+  125]: sk_reset_timer <-__tcp_push_pending_frames
           <...>-2804  [00] 28994586831222949 [+  156]: mod_timer <-sk_reset_timer
           <...>-2804  [00] 28994586831223105 [+  125]: release_sock <-tcp_sendmsg
           <...>-2804  [00] 28994586831223230 [+   93]: _spin_lock_bh <-release_sock
           <...>-2804  [00] 28994586831223324 [+  156]: _spin_unlock_bh <-release_sock
           <...>-2804  [00] 28994586831223480 [+  187]: local_bh_enable <-_spin_unlock_bh
           <...>-2804  [00] 28994586831223667 [+   93]: dnotify_parent <-sys_write
           <...>-2804  [00] 28994586831223761 [+  125]: _spin_lock <-dnotify_parent
           <...>-2804  [00] 28994586831223886 [+  156]: inotify_dentry_parent_queue_event <-sys_write
           <...>-2804  [00] 28994586831224042 [+  812]: inotify_inode_queue_event <-sys_write
           <...>-2804  [00] 28994586831224855 [+   93]: sys_read <-sysc_noemu
           <...>-2804  [00] 28994586831224949 [+  187]: fget_light <-sys_read
           <...>-2804  [00] 28994586831225136 [+  125]: apparmor_file_permission <-sys_read
           <...>-2804  [00] 28994586831225261 [+  125]: sock_aio_read <-sys_read
           <...>-2804  [00] 28994586831225386 [+  156]: dummy_socket_recvmsg <-sock_aio_read
           <...>-2804  [00] 28994586831225542 [+   93]: sock_common_recvmsg <-sock_aio_read
           <...>-2804  [00] 28994586831225636 [+  125]: tcp_recvmsg <-sock_common_recvmsg
           <...>-2804  [00] 28994586831225761 [+  125]: lock_sock <-tcp_recvmsg
           <...>-2804  [00] 28994586831225886 [+  125]: _spin_lock_bh <-lock_sock
           <...>-2804  [00] 28994586831226011 [+  125]: _spin_unlock_bh <-lock_sock
           <...>-2804  [00] 28994586831226136 [+  250]: local_bh_enable <-_spin_unlock_bh
           <...>-2804  [00] 28994586831226386 [+  125]: sk_wait_data <-tcp_recvmsg
           <...>-2804  [00] 28994586831226511 [+  125]: prepare_to_wait <-sk_wait_data
           <...>-2804  [00] 28994586831226636 [+  156]: _spin_lock_irqsave <-prepare_to_wait
           <...>-2804  [00] 28994586831226792 [+  156]: _spin_unlock_irqrestore <-prepare_to_wait
           <...>-2804  [00] 28994586831226949 [+   93]: _spin_lock_bh <-sk_wait_data
           <...>-2804  [00] 28994586831227042 [+  125]: _spin_unlock_bh <-sk_wait_data
           <...>-2804  [00] 28994586831227167 [+  187]: local_bh_enable <-_spin_unlock_bh
           <...>-2804  [00] 28994586831227355 [+  125]: schedule_timeout <-sk_wait_data
           <...>-2804  [00] 28994586831227480 [+   93]: schedule <-schedule_timeout
           <...>-2804  [00] 28994586831227574 [+  156]: profile_hit <-schedule
           <...>-2804  [00] 28994586831227730 [+  156]: sched_clock <-schedule
           <...>-2804  [00] 28994586831227886 [+  281]: _spin_lock_irq <-schedule
          <idle>-0     [00] 28994586831228167 [+   93]: account_vtime <-schedule
          <idle>-0     [00] 28994586831228261 [+   93]: account_user_time <-account_vtime
          <idle>-0     [00] 28994586831228355 [+   93]: account_system_time <-account_vtime
          <idle>-0     [00] 28994586831228449 [+  218]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28994586831228667 [+   93]: notifier_call_chain <-cpu_idle
          <idle>-0     [00] 28994586831228761 [+   93]: nohz_idle_notify <-notifier_call_chain
          <idle>-0     [00] 28994586831228855 [+  156]: rcu_needs_cpu <-nohz_idle_notify
          <idle>-0     [00] 28994586831229011 [+   62]: next_timer_interrupt <-nohz_idle_notify
          <idle>-0     [00] 28994586831229074 [+   93]: hrtimer_get_next_event <-next_timer_interrupt
          <idle>-0     [00] 28994586831229167 [+  125]: _spin_lock_irqsave <-hrtimer_get_next_event
          <idle>-0     [00] 28994586831229292 [+   93]: _spin_unlock_irqrestore <-hrtimer_get_next_event
          <idle>-0     [00] 28994586831229386 [+   93]: _spin_lock_irqsave <-hrtimer_get_next_event
          <idle>-0     [00] 28994586831229480 [+   93]: _spin_unlock_irqrestore <-hrtimer_get_next_event
          <idle>-0     [00] 28994586831229574 [+  125]: ktime_get <-hrtimer_get_next_event
          <idle>-0     [00] 28994586831229699 [+   93]: getnstimeofday <-ktime_get
          <idle>-0     [00] 28994586831229792 [+  156]: do_gettimeofday <-getnstimeofday
          <idle>-0     [00] 28994586831229949 [+  156]: set_normalized_timespec <-ktime_get
          <idle>-0     [00] 28994586831230105 [+  125]: ns_to_timespec <-next_timer_interrupt
          <idle>-0     [00] 28994586831230230 [+  156]: _spin_lock <-next_timer_interrupt
          <idle>-0     [00] 28994586831230386 [+212406]: vtimer_idle_notify <-notifier_call_chain
          <idle>-0     [00] 28994586831442792 [+   93]: do_IRQ <-io_return
          <idle>-0     [00] 28994586831442886 [+   62]: account_system_vtime <-do_IRQ
          <idle>-0     [00] 28994586831442949 [+  125]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28994586831443074 [+  250]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28994586831443324 [+   93]: do_monitor_call <-sysc_return
          <idle>-0     [00] 28994586831443417 [+   62]: notifier_call_chain <-do_monitor_call
          <idle>-0     [00] 28994586831443480 [+  156]: nohz_idle_notify <-notifier_call_chain
          <idle>-0     [00] 28994586831443636 [+  125]: vtimer_idle_notify <-notifier_call_chain
          <idle>-0     [00] 28994586831443761 [+  125]: do_adapter_IO <-do_IRQ
          <idle>-0     [00] 28994586831443886 [+  500]: tiqdio_thinint_handler <-do_adapter_IO
          <idle>-0     [00] 28994586831444386 [+  156]: qeth_qdio_output_handler <-tiqdio_thinint_handler
          <idle>-0     [00] 28994586831444542 [+   62]: skb_dequeue <-qeth_qdio_output_handler
          <idle>-0     [00] 28994586831444605 [+  125]: _spin_lock_irqsave <-skb_dequeue
          <idle>-0     [00] 28994586831444730 [+  125]: _spin_unlock_irqrestore <-skb_dequeue
          <idle>-0     [00] 28994586831444855 [+   93]: raise_softirq_irqoff <-qeth_qdio_output_handler
          <idle>-0     [00] 28994586831444949 [+   93]: skb_dequeue <-qeth_qdio_output_handler
          <idle>-0     [00] 28994586831445042 [+   93]: _spin_lock_irqsave <-skb_dequeue
          <idle>-0     [00] 28994586831445136 [+   93]: _spin_unlock_irqrestore <-skb_dequeue
          <idle>-0     [00] 28994586831445230 [+  250]: qeth_eddp_buf_release_contexts <-qeth_qdio_output_handler
          <idle>-0     [00] 28994586831445480 [+  125]: raise_softirq_irqoff <-qeth_qdio_output_handler
          <idle>-0     [00] 28994586831445605 [+   93]: del_timer <-tiqdio_thinint_handler
          <idle>-0     [00] 28994586831445699 [+  125]: _spin_lock_irqsave <-del_timer
          <idle>-0     [00] 28994586831445824 [+  218]: _spin_unlock_irqrestore <-del_timer
          <idle>-0     [00] 28994586831446042 [+  250]: qeth_qdio_input_handler <-tiqdio_thinint_handler
          <idle>-0     [00] 28994586831446292 [+   62]: __alloc_skb <-qeth_qdio_input_handler
          <idle>-0     [00] 28994586831446355 [+  125]: kmem_cache_alloc <-__alloc_skb
          <idle>-0     [00] 28994586831446480 [+  125]: __kmalloc <-__alloc_skb
          <idle>-0     [00] 28994586831446605 [+  156]: memset <-__alloc_skb
          <idle>-0     [00] 28994586831446761 [+  187]: memcpy <-qeth_qdio_input_handler
          <idle>-0     [00] 28994586831446949 [+  500]: netif_rx <-qeth_qdio_input_handler
          <idle>-0     [00] 28994586831447449 [+   93]: irq_exit <-do_IRQ
          <idle>-0     [00] 28994586831447542 [+   93]: account_system_vtime <-irq_exit
          <idle>-0     [00] 28994586831447636 [+   93]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28994586831447730 [+   93]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28994586831447824 [+   93]: do_softirq <-irq_exit
          <idle>-0     [00] 28994586831447917 [+   62]: account_system_vtime <-do_softirq
          <idle>-0     [00] 28994586831447980 [+   93]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28994586831448074 [+  125]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28994586831448199 [+   93]: __do_softirq <-do_softirq
          <idle>-0     [00] 28994586831448292 [+   93]: net_tx_action <-__do_softirq
          <idle>-0     [00] 28994586831448386 [+  125]: __kfree_skb <-net_tx_action
          <idle>-0     [00] 28994586831448511 [+  156]: sock_wfree <-__kfree_skb
          <idle>-0     [00] 28994586831448667 [+   93]: _spin_trylock <-net_tx_action
          <idle>-0     [00] 28994586831448761 [+   62]: __qdisc_run <-net_tx_action
          <idle>-0     [00] 28994586831448824 [+  156]: pfifo_fast_dequeue <-__qdisc_run
          <idle>-0     [00] 28994586831448980 [+   93]: net_rx_action <-__do_softirq
          <idle>-0     [00] 28994586831449074 [+  156]: process_backlog <-net_rx_action
          <idle>-0     [00] 28994586831449230 [+  156]: ip_rcv <-process_backlog
          <idle>-0     [00] 28994586831449386 [+  125]: ip_route_input <-ip_rcv
          <idle>-0     [00] 28994586831449511 [+  125]: ip_local_deliver <-ip_rcv
          <idle>-0     [00] 28994586831449636 [+  125]: tcp_v4_rcv <-ip_local_deliver
          <idle>-0     [00] 28994586831449761 [+  125]: _read_lock <-tcp_v4_rcv
          <idle>-0     [00] 28994586831449886 [+   93]: dummy_socket_sock_rcv_skb <-tcp_v4_rcv
          <idle>-0     [00] 28994586831449980 [+   93]: _spin_lock <-tcp_v4_rcv
          <idle>-0     [00] 28994586831450074 [+   93]: __wake_up <-tcp_v4_rcv
          <idle>-0     [00] 28994586831450167 [+   93]: _spin_lock_irqsave <-__wake_up
          <idle>-0     [00] 28994586831450261 [+   93]: autoremove_wake_function <-__wake_up
          <idle>-0     [00] 28994586831450355 [+   93]: default_wake_function <-autoremove_wake_function
          <idle>-0     [00] 28994586831450449 [+  125]: _spin_lock <-default_wake_function
          <idle>-0     [00] 28994586831450574 [+  187]: sched_clock <-default_wake_function
          <idle>-0     [00] 28994586831450761 [+  125]: resched_task <-default_wake_function
          <idle>-0     [00] 28994586831450886 [+   93]: _spin_unlock_irqrestore <-default_wake_function
          <idle>-0     [00] 28994586831450980 [+   93]: _spin_unlock_irqrestore <-__wake_up
          <idle>-0     [00] 28994586831451074 [+   93]: sk_reset_timer <-tcp_v4_rcv
          <idle>-0     [00] 28994586831451167 [+   93]: mod_timer <-sk_reset_timer
          <idle>-0     [00] 28994586831451261 [+  125]: _spin_lock_irqsave <-mod_timer
          <idle>-0     [00] 28994586831451386 [+  156]: _spin_unlock_irqrestore <-mod_timer
          <idle>-0     [00] 28994586831451542 [+   93]: account_system_vtime <-do_softirq
          <idle>-0     [00] 28994586831451636 [+   93]: account_system_time <-account_system_vtime
          <idle>-0     [00] 28994586831451730 [+  156]: acct_update_integrals <-account_system_time
          <idle>-0     [00] 28994586831451886 [+  125]: schedule <-cpu_idle
          <idle>-0     [00] 28994586831452011 [+   93]: profile_hit <-schedule
          <idle>-0     [00] 28994586831452105 [+  125]: sched_clock <-schedule
          <idle>-0     [00] 28994586831452230 [+  343]: _spin_lock_irq <-schedule
           <...>-2804  [00] 28994586831452574 [+   93]: account_vtime <-schedule
           <...>-2804  [00] 28994586831452667 [+   62]: account_user_time <-account_vtime
           <...>-2804  [00] 28994586831452730 [+  125]: account_system_time <-account_vtime
           <...>-2804  [00] 28994586831452855 [+  125]: acct_update_integrals <-account_system_time
           <...>-2804  [00] 28994586831452980 [+   93]: _spin_lock_bh <-sk_wait_data
           <...>-2804  [00] 28994586831453074 [+   93]: _spin_unlock_bh <-sk_wait_data
           <...>-2804  [00] 28994586831453167 [+   93]: local_bh_enable <-_spin_unlock_bh
           <...>-2804  [00] 28994586831453261 [+  125]: finish_wait <-sk_wait_data
           <...>-2804  [00] 28994586831453386 [+   93]: tcp_v4_do_rcv <-tcp_recvmsg
           <...>-2804  [00] 28994586831453480 [+   93]: tcp_rcv_established <-tcp_v4_do_rcv
           <...>-2804  [00] 28994586831453574 [+   93]: local_bh_enable <-tcp_rcv_established
           <...>-2804  [00] 28994586831453667 [+   93]: skb_copy_datagram_iovec <-tcp_rcv_established
           <...>-2804  [00] 28994586831453761 [+   93]: memcpy_toiovec <-skb_copy_datagram_iovec
           <...>-2804  [00] 28994586831453855 [+   62]: copy_to_user_mvcos_check <-memcpy_toiovec
           <...>-2804  [00] 28994586831453917 [+  187]: copy_to_user_std <-copy_to_user_mvcos_check
           <...>-2804  [00] 28994586831454105 [+  125]: tcp_ack <-tcp_rcv_established
           <...>-2804  [00] 28994586831454230 [+  125]: __kfree_skb <-tcp_ack
           <...>-2804  [00] 28994586831454355 [+  125]: kfree <-__kfree_skb
           <...>-2804  [00] 28994586831454480 [+  187]: kmem_cache_free <-__kfree_skb
           <...>-2804  [00] 28994586831454667 [+   93]: tcp_reno_cong_avoid <-tcp_ack
           <...>-2804  [00] 28994586831454761 [+   93]: tcp_current_mss <-tcp_rcv_established
           <...>-2804  [00] 28994586831454855 [+  156]: __tcp_push_pending_frames <-tcp_rcv_established
           <...>-2804  [00] 28994586831455011 [+   62]: tcp_send_delayed_ack <-tcp_rcv_established
           <...>-2804  [00] 28994586831455074 [+   93]: sk_reset_timer <-tcp_send_delayed_ack
           <...>-2804  [00] 28994586831455167 [+   93]: mod_timer <-sk_reset_timer
           <...>-2804  [00] 28994586831455261 [+   93]: _spin_lock_irqsave <-mod_timer
           <...>-2804  [00] 28994586831455355 [+   93]: _spin_unlock_irqrestore <-mod_timer
           <...>-2804  [00] 28994586831455449 [+  125]: __kfree_skb <-tcp_rcv_established
           <...>-2804  [00] 28994586831455574 [+   93]: kfree <-__kfree_skb
           <...>-2804  [00] 28994586831455667 [+  125]: kmem_cache_free <-__kfree_skb
           <...>-2804  [00] 28994586831455792 [+  156]: local_bh_enable <-tcp_recvmsg
           <...>-2804  [00] 28994586831455949 [+   93]: __tcp_select_window <-tcp_recvmsg
           <...>-2804  [00] 28994586831456042 [+   93]: release_sock <-tcp_recvmsg
           <...>-2804  [00] 28994586831456136 [+   93]: _spin_lock_bh <-release_sock
           <...>-2804  [00] 28994586831456230 [+   93]: _spin_unlock_bh <-release_sock
           <...>-2804  [00] 28994586831456324 [+  156]: local_bh_enable <-_spin_unlock_bh
           <...>-2804  [00] 28994586831456480 [+   93]: dnotify_parent <-sys_read
           <...>-2804  [00] 28994586831456574 [+  125]: _spin_lock <-dnotify_parent
           <...>-2804  [00] 28994586831456699 [+   62]: inotify_dentry_parent_queue_event <-sys_read
           <...>-2804  [00] 28994586831456761 [+ 1437]: inotify_inode_queue_event <-sys_read

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch 1/3] spinlock: move spinlock function bodies to header file
  2009-08-14 12:58 [patch 0/3] Allow inlined spinlocks again V4 Heiko Carstens
@ 2009-08-14 12:58 ` Heiko Carstens
  0 siblings, 0 replies; 22+ messages in thread
From: Heiko Carstens @ 2009-08-14 12:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Heiko Carstens, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt, Nick Piggin

[-- Attachment #1: 01_spinlock_move.diff --]
[-- Type: text/plain, Size: 17156 bytes --]

From: Heiko Carstens <heiko.carstens@de.ibm.com>

Move the spinlock function bodies to a header file by creating a
static inline version of each variant.
Use the inline versions in the out-of-line code. This shouldn't
make any difference except that the spinlock code can now also
be used to generate inlined spinlock code.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 include/linux/spinlock.h         |   18 +-
 include/linux/spinlock_api_smp.h |  263 +++++++++++++++++++++++++++++++++++++++
 kernel/spinlock.c                |  174 ++++---------------------
 3 files changed, 300 insertions(+), 155 deletions(-)

Index: linux-2.6/include/linux/spinlock_api_smp.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock_api_smp.h
+++ linux-2.6/include/linux/spinlock_api_smp.h
@@ -60,4 +60,267 @@ void __lockfunc _read_unlock_irqrestore(
 void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 							__releases(lock);
 
+static inline int __spin_trylock(spinlock_t *lock)
+{
+	preempt_disable();
+	if (_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable();
+	return 0;
+}
+
+static inline int __read_trylock(rwlock_t *lock)
+{
+	preempt_disable();
+	if (_raw_read_trylock(lock)) {
+		rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable();
+	return 0;
+}
+
+static inline int __write_trylock(rwlock_t *lock)
+{
+	preempt_disable();
+	if (_raw_write_trylock(lock)) {
+		rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable();
+	return 0;
+}
+
+/*
+ * If lockdep is enabled then we use the non-preemption spin-ops
+ * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
+ * not re-enabled during lock-acquire (which the preempt-spin-ops do):
+ */
+#if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)
+
+static inline void __read_lock(rwlock_t *lock)
+{
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline unsigned long __spin_lock_irqsave(spinlock_t *lock)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	/*
+	 * On lockdep we dont want the hand-coded irq-enable of
+	 * _raw_spin_lock_flags() code, because lockdep assumes
+	 * that interrupts are not re-enabled during lock-acquire:
+	 */
+#ifdef CONFIG_LOCKDEP
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+#else
+	_raw_spin_lock_flags(lock, &flags);
+#endif
+	return flags;
+}
+
+static inline void __spin_lock_irq(spinlock_t *lock)
+{
+	local_irq_disable();
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline void __spin_lock_bh(spinlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline unsigned long __read_lock_irqsave(rwlock_t *lock)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock,
+			     _raw_read_lock_flags, &flags);
+	return flags;
+}
+
+static inline void __read_lock_irq(rwlock_t *lock)
+{
+	local_irq_disable();
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline void __read_lock_bh(rwlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+}
+
+static inline unsigned long __write_lock_irqsave(rwlock_t *lock)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock,
+			     _raw_write_lock_flags, &flags);
+	return flags;
+}
+
+static inline void __write_lock_irq(rwlock_t *lock)
+{
+	local_irq_disable();
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+static inline void __write_lock_bh(rwlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+static inline void __spin_lock(spinlock_t *lock)
+{
+	preempt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+}
+
+static inline void __write_lock(rwlock_t *lock)
+{
+	preempt_disable();
+	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+}
+
+#endif /* CONFIG_PREEMPT */
+
+static inline void __spin_unlock(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	preempt_enable();
+}
+
+static inline void __write_unlock(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	preempt_enable();
+}
+
+static inline void __read_unlock(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	preempt_enable();
+}
+
+static inline void __spin_unlock_irqrestore(spinlock_t *lock,
+					    unsigned long flags)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+static inline void __spin_unlock_irq(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	local_irq_enable();
+	preempt_enable();
+}
+
+static inline void __spin_unlock_bh(spinlock_t *lock)
+{
+	spin_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_spin_unlock(lock);
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline void __read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+static inline void __read_unlock_irq(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	local_irq_enable();
+	preempt_enable();
+}
+
+static inline void __read_unlock_bh(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_read_unlock(lock);
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline void __write_unlock_irqrestore(rwlock_t *lock,
+					     unsigned long flags)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	local_irq_restore(flags);
+	preempt_enable();
+}
+
+static inline void __write_unlock_irq(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	local_irq_enable();
+	preempt_enable();
+}
+
+static inline void __write_unlock_bh(rwlock_t *lock)
+{
+	rwlock_release(&lock->dep_map, 1, _RET_IP_);
+	_raw_write_unlock(lock);
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+}
+
+static inline int __spin_trylock_bh(spinlock_t *lock)
+{
+	local_bh_disable();
+	preempt_disable();
+	if (_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return 1;
+	}
+	preempt_enable_no_resched();
+	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	return 0;
+}
+
 #endif /* __LINUX_SPINLOCK_API_SMP_H */
Index: linux-2.6/include/linux/spinlock.h
===================================================================
--- linux-2.6.orig/include/linux/spinlock.h
+++ linux-2.6/include/linux/spinlock.h
@@ -143,15 +143,6 @@ static inline void smp_mb__after_lock(vo
  */
 #define spin_unlock_wait(lock)	__raw_spin_unlock_wait(&(lock)->raw_lock)
 
-/*
- * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
- */
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
-# include <linux/spinlock_api_smp.h>
-#else
-# include <linux/spinlock_api_up.h>
-#endif
-
 #ifdef CONFIG_DEBUG_SPINLOCK
  extern void _raw_spin_lock(spinlock_t *lock);
 #define _raw_spin_lock_flags(lock, flags) _raw_spin_lock(lock)
@@ -380,4 +371,13 @@ extern int _atomic_dec_and_lock(atomic_t
  */
 #define spin_can_lock(lock)	(!spin_is_locked(lock))
 
+/*
+ * Pull the _spin_*()/_read_*()/_write_*() functions/declarations:
+ */
+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
+# include <linux/spinlock_api_smp.h>
+#else
+# include <linux/spinlock_api_up.h>
+#endif
+
 #endif /* __LINUX_SPINLOCK_H */
Index: linux-2.6/kernel/spinlock.c
===================================================================
--- linux-2.6.orig/kernel/spinlock.c
+++ linux-2.6/kernel/spinlock.c
@@ -23,40 +23,19 @@
 
 int __lockfunc _spin_trylock(spinlock_t *lock)
 {
-	preempt_disable();
-	if (_raw_spin_trylock(lock)) {
-		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-	
-	preempt_enable();
-	return 0;
+	return __spin_trylock(lock);
 }
 EXPORT_SYMBOL(_spin_trylock);
 
 int __lockfunc _read_trylock(rwlock_t *lock)
 {
-	preempt_disable();
-	if (_raw_read_trylock(lock)) {
-		rwlock_acquire_read(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-
-	preempt_enable();
-	return 0;
+	return __read_trylock(lock);
 }
 EXPORT_SYMBOL(_read_trylock);
 
 int __lockfunc _write_trylock(rwlock_t *lock)
 {
-	preempt_disable();
-	if (_raw_write_trylock(lock)) {
-		rwlock_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-
-	preempt_enable();
-	return 0;
+	return __write_trylock(lock);
 }
 EXPORT_SYMBOL(_write_trylock);
 
@@ -69,129 +48,74 @@ EXPORT_SYMBOL(_write_trylock);
 
 void __lockfunc _read_lock(rwlock_t *lock)
 {
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+	__read_lock(lock);
 }
 EXPORT_SYMBOL(_read_lock);
 
 unsigned long __lockfunc _spin_lock_irqsave(spinlock_t *lock)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	/*
-	 * On lockdep we dont want the hand-coded irq-enable of
-	 * _raw_spin_lock_flags() code, because lockdep assumes
-	 * that interrupts are not re-enabled during lock-acquire:
-	 */
-#ifdef CONFIG_LOCKDEP
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
-#else
-	_raw_spin_lock_flags(lock, &flags);
-#endif
-	return flags;
+	return __spin_lock_irqsave(lock);
 }
 EXPORT_SYMBOL(_spin_lock_irqsave);
 
 void __lockfunc _spin_lock_irq(spinlock_t *lock)
 {
-	local_irq_disable();
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+	__spin_lock_irq(lock);
 }
 EXPORT_SYMBOL(_spin_lock_irq);
 
 void __lockfunc _spin_lock_bh(spinlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+	__spin_lock_bh(lock);
 }
 EXPORT_SYMBOL(_spin_lock_bh);
 
 unsigned long __lockfunc _read_lock_irqsave(rwlock_t *lock)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED_FLAGS(lock, _raw_read_trylock, _raw_read_lock,
-			     _raw_read_lock_flags, &flags);
-	return flags;
+	return __read_lock_irqsave(lock);
 }
 EXPORT_SYMBOL(_read_lock_irqsave);
 
 void __lockfunc _read_lock_irq(rwlock_t *lock)
 {
-	local_irq_disable();
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+	__read_lock_irq(lock);
 }
 EXPORT_SYMBOL(_read_lock_irq);
 
 void __lockfunc _read_lock_bh(rwlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	rwlock_acquire_read(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_read_trylock, _raw_read_lock);
+	__read_lock_bh(lock);
 }
 EXPORT_SYMBOL(_read_lock_bh);
 
 unsigned long __lockfunc _write_lock_irqsave(rwlock_t *lock)
 {
-	unsigned long flags;
-
-	local_irq_save(flags);
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED_FLAGS(lock, _raw_write_trylock, _raw_write_lock,
-			     _raw_write_lock_flags, &flags);
-	return flags;
+	return __write_lock_irqsave(lock);
 }
 EXPORT_SYMBOL(_write_lock_irqsave);
 
 void __lockfunc _write_lock_irq(rwlock_t *lock)
 {
-	local_irq_disable();
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+	__write_lock_irq(lock);
 }
 EXPORT_SYMBOL(_write_lock_irq);
 
 void __lockfunc _write_lock_bh(rwlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+	__write_lock_bh(lock);
 }
 EXPORT_SYMBOL(_write_lock_bh);
 
 void __lockfunc _spin_lock(spinlock_t *lock)
 {
-	preempt_disable();
-	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
+	__spin_lock(lock);
 }
-
 EXPORT_SYMBOL(_spin_lock);
 
 void __lockfunc _write_lock(rwlock_t *lock)
 {
-	preempt_disable();
-	rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_);
-	LOCK_CONTENDED(lock, _raw_write_trylock, _raw_write_lock);
+	__write_lock(lock);
 }
-
 EXPORT_SYMBOL(_write_lock);
 
 #else /* CONFIG_PREEMPT: */
@@ -320,121 +244,79 @@ EXPORT_SYMBOL(_spin_lock_nest_lock);
 
 void __lockfunc _spin_unlock(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	preempt_enable();
+	__spin_unlock(lock);
 }
 EXPORT_SYMBOL(_spin_unlock);
 
 void __lockfunc _write_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	preempt_enable();
+	__write_unlock(lock);
 }
 EXPORT_SYMBOL(_write_unlock);
 
 void __lockfunc _read_unlock(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	preempt_enable();
+	__read_unlock(lock);
 }
 EXPORT_SYMBOL(_read_unlock);
 
 void __lockfunc _spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	local_irq_restore(flags);
-	preempt_enable();
+	__spin_unlock_irqrestore(lock, flags);
 }
 EXPORT_SYMBOL(_spin_unlock_irqrestore);
 
 void __lockfunc _spin_unlock_irq(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	local_irq_enable();
-	preempt_enable();
+	__spin_unlock_irq(lock);
 }
 EXPORT_SYMBOL(_spin_unlock_irq);
 
 void __lockfunc _spin_unlock_bh(spinlock_t *lock)
 {
-	spin_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_spin_unlock(lock);
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	__spin_unlock_bh(lock);
 }
 EXPORT_SYMBOL(_spin_unlock_bh);
 
 void __lockfunc _read_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	local_irq_restore(flags);
-	preempt_enable();
+	__read_unlock_irqrestore(lock, flags);
 }
 EXPORT_SYMBOL(_read_unlock_irqrestore);
 
 void __lockfunc _read_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	local_irq_enable();
-	preempt_enable();
+	__read_unlock_irq(lock);
 }
 EXPORT_SYMBOL(_read_unlock_irq);
 
 void __lockfunc _read_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_read_unlock(lock);
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	__read_unlock_bh(lock);
 }
 EXPORT_SYMBOL(_read_unlock_bh);
 
 void __lockfunc _write_unlock_irqrestore(rwlock_t *lock, unsigned long flags)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	local_irq_restore(flags);
-	preempt_enable();
+	__write_unlock_irqrestore(lock, flags);
 }
 EXPORT_SYMBOL(_write_unlock_irqrestore);
 
 void __lockfunc _write_unlock_irq(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	local_irq_enable();
-	preempt_enable();
+	__write_unlock_irq(lock);
 }
 EXPORT_SYMBOL(_write_unlock_irq);
 
 void __lockfunc _write_unlock_bh(rwlock_t *lock)
 {
-	rwlock_release(&lock->dep_map, 1, _RET_IP_);
-	_raw_write_unlock(lock);
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
+	__write_unlock_bh(lock);
 }
 EXPORT_SYMBOL(_write_unlock_bh);
 
 int __lockfunc _spin_trylock_bh(spinlock_t *lock)
 {
-	local_bh_disable();
-	preempt_disable();
-	if (_raw_spin_trylock(lock)) {
-		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
-		return 1;
-	}
-
-	preempt_enable_no_resched();
-	local_bh_enable_ip((unsigned long)__builtin_return_address(0));
-	return 0;
+	return __spin_trylock_bh(lock);
 }
 EXPORT_SYMBOL(_spin_trylock_bh);
 

-- 


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 12:34   ` Heiko Carstens
@ 2009-08-14 16:04     ` Linus Torvalds
  2009-08-14 17:13       ` Heiko Carstens
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2009-08-14 16:04 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt, David S. Miller



On Fri, 14 Aug 2009, Heiko Carstens wrote:
> 
> All of this started when we compared a 2.6.27 based distro kernel
> to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
> cycles than 2.6.16 does for the same workloads.
> 
> As an example: we took a simple one connection ping pong network load.
> (client sends packet to server and waits until packet from server
>  returns and game starts again). It uses more cpu cycles on 2.6.27.
> 
> Using ftrace we figured out that on 2.6.27 it takes more than 500 function
> calls on the client for a single roundtrip while on 2.6.16 it took only
> ~230 calls.

Hmm. But the spinlock part of this seems to not really have changed.

If I remove everything but the actual callchain info and then do a "diff" 
between your two roundtrips, they look very different, and basically 
none of the difference seems to be due to spinlocks.

It seems to be due to a number of different things, but the bulk of the 
new costs seem to be in networking (and to some degree scheduling). There 
are smaller differences elsewhere, but the networking code _really_ blows 
up.

I don't know how many of these are "real" kernel changes and how much 
is just less inlining, but spinlocks don't seem to be the issue. 
Here's a quick walk-through (some of these get repeated a couple of times 
in your traces)

 * Apparmor going through the generic security code and rw_verify_area:

	apparmor_file_permission <-sys_write

   becomes

	security_file_permission <-rw_verify_area
	apparmor_file_permission <-security_file_permission
	aa_file_permission <-apparmor_file_permission

 * the vfs write itself being deeper

	sock_aio_write <-sys_write

   becomes

	vfs_write <-SyS_write
	rw_verify_area <-vfs_write
	do_sync_write <-vfs_write
	sock_aio_write <-do_sync_write

 * the socket code does totally new apparmor security things that were not 
   in 2.6.16 at all:

	security_socket_sendmsg <-sock_aio_write
	apparmor_socket_sendmsg <-security_socket_sendmsg
	aa_revalidate_sk <-apparmor_socket_sendmsg

 * TCP doing more:

	__alloc_skb <-tcp_sendmsg
	..
	..
	ip_output <-ip_queue_xmit
	dev_queue_xmit <-ip_output

   becomes

	tcp_established_options <-tcp_current_mss
	sk_stream_alloc_skb <-tcp_sendmsg
	__alloc_skb <-sk_stream_alloc_skb
	..
	tcp_init_tso_segs <-__tcp_push_pending_frames
	tcp_set_skb_tso_segs <-tcp_init_tso_segs
	..
	__skb_clone <-skb_clone
	__copy_skb_header <-__skb_clone
	tcp_established_options <-tcp_transmit_skb
	skb_push <-tcp_transmit_skb
	__tcp_select_window <-tcp_transmit_skb
	tcp_options_write <-tcp_transmit_skb
	..
	__sk_dst_check <-ip_queue_xmit
	skb_push <-ip_queue_xmit
	ip_local_out <-ip_queue_xmit
	__ip_local_out <-ip_local_out
	ip_output <-ip_local_out
	ip_finish_output <-ip_output
	skb_push <-ip_finish_output
	dev_queue_xmit <-ip_finish_output
	local_bh_disable <-dev_queue_xmit

 * More TCP changes at tcp_ack time:

	kfree <-__kfree_skb
	kmem_cache_free <-__kfree_skb
	tcp_reno_cong_avoid <-tcp_ack
	tcp_current_mss <-tcp_rcv_established
	__tcp_push_pending_frames <-tcp_rcv_established
	tcp_send_delayed_ack <-tcp_rcv_established

   becomes

	skb_release_all <-__kfree_skb
	dst_release <-skb_release_all
	skb_release_data <-skb_release_all
	kfree <-skb_release_data
	kmem_cache_free <-__kfree_skb
	tcp_rtt_estimator <-tcp_ack
	bictcp_acked <-tcp_ack
	bictcp_cong_avoid <-tcp_ack
	tcp_is_cwnd_limited <-bictcp_cong_avoid
	tcp_current_mss <-tcp_rcv_established
	tcp_established_options <-tcp_current_mss
	__tcp_push_pending_frames <-tcp_rcv_established
	__tcp_ack_snd_check <-tcp_rcv_established
	tcp_send_delayed_ack <-__tcp_ack_snd_check

 * The network driver doing more:

	qeth_hard_start_xmit <-dev_hard_start_xmit
	qeth_fill_header <-qeth_hard_start_xmit
	memset <-qeth_fill_header
	memset <-qeth_fill_header
	skb_queue_tail <-qeth_hard_start_xmit
	..
	mod_timer <-do_QDIO
	_spin_lock_irqsave <-mod_timer
	_spin_unlock_irqrestore <-mod_timer
	raise_softirq_irqoff <-qeth_hard_start_xmit
	..

   becomes

	qeth_l3_hard_start_xmit <-dev_hard_start_xmit
	qeth_get_cast_type <-qeth_l3_hard_start_xmit
	qeth_get_priority_queue <-qeth_l3_hard_start_xmit
	skb_realloc_headroom <-qeth_l3_hard_start_xmit
	pskb_copy <-skb_realloc_headroom
	__alloc_skb <-pskb_copy
	kmem_cache_alloc <-__alloc_skb
	__kmalloc <-__alloc_skb
	skb_put <-pskb_copy
	copy_skb_header <-pskb_copy
	__copy_skb_header <-copy_skb_header
	skb_pull <-qeth_l3_hard_start_xmit
	skb_push <-qeth_l3_hard_start_xmit
	qeth_l3_fill_header <-qeth_l3_hard_start_xmit
	qeth_get_elements_no <-qeth_l3_hard_start_xmit
	qeth_do_send_packet <-qeth_l3_hard_start_xmit
	skb_queue_tail <-qeth_do_send_packet
	..
	qdio_perf_stat_inc <-do_QDIO
	qdio_kick_outbound_q <-do_QDIO
	qdio_perf_stat_inc <-qdio_kick_outbound_q
	__tasklet_schedule <-do_QDIO
	dev_kfree_skb_any <-qeth_l3_hard_start_xmit
	kfree_skb <-dev_kfree_skb_any
	__kfree_skb <-kfree_skb
	skb_release_all <-__kfree_skb
	dst_release <-skb_release_all
	sock_wfree <-skb_release_all
	skb_release_data <-skb_release_all
	__netif_schedule <-qeth_l3_hard_start_xmit
	raise_softirq_irqoff <-__netif_schedule
	..
	tasklet_action <-__do_softirq
	qdio_outbound_processing <-tasklet_action
	qdio_perf_stat_inc <-qdio_outbound_processing
	mod_timer <-qdio_outbound_processing

 * the timer code doing more:

	__timer_stats_timer_set_start_info <-mod_timer
	__mod_timer <-mod_timer
	__timer_stats_timer_set_start_info <-__mod_timer
	lock_timer_base <-__mod_timer
	_spin_lock_irqsave <-lock_timer_base
	internal_add_timer <-__mod_timer
	_spin_unlock_irqrestore <-__mod_timer
	qdio_perf_stat_inc <-qdio_outbound_processing
	account_system_vtime <-__do_softirq

 * the scheduler code doing way deeper chains:

	profile_hit <-schedule
	sched_clock <-schedule
	_spin_lock_irq <-schedule
	account_vtime <-schedule
	account_user_time <-account_vtime

   becomes

	_spin_lock <-schedule
	sched_clock_cpu <-schedule
	sched_clock <-sched_clock_cpu
	deactivate_task <-schedule
	dequeue_task <-deactivate_task
	dequeue_task_fair <-dequeue_task
	dequeue_entity <-dequeue_task_fair
	update_curr <-dequeue_entity
	hrtick_start_fair <-dequeue_task_fair
	put_prev_task_fair <-schedule
	pick_next_task_fair <-schedule
	pick_next_task_rt <-schedule
	pick_next_task_fair <-schedule
	pick_next_task_idle <-schedule
	_spin_trylock <-tracing_record_cmdline
	finish_task_switch <-schedule
	account_vtime <-finish_task_switch
	account_user_time <-account_vtime
	acct_update_integrals <-account_user_time
	jiffies_to_timeval <-acct_update_integrals

Hmm. Not pretty.

		Linus


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 16:04     ` Linus Torvalds
@ 2009-08-14 17:13       ` Heiko Carstens
  2009-08-14 18:08         ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Heiko Carstens @ 2009-08-14 17:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt, David S. Miller

On Fri, Aug 14, 2009 at 09:04:49AM -0700, Linus Torvalds wrote:
> On Fri, 14 Aug 2009, Heiko Carstens wrote:
> > 
> > All of this started when we compared a 2.6.27 based distro kernel
> > to a 2.6.16 based one. It turned out that 2.6.27 burns a lot more cpu
> > cycles than 2.6.16 does for the same workloads.
> > 
> > As an example: we took a simple one connection ping pong network load.
> > (client sends packet to server and waits until packet from server
> >  returns and game starts again). It uses more cpu cycles on 2.6.27.
> > 
> > Using ftrace we figured out that on 2.6.27 it takes more than 500 function
> > calls on the client for a single roundtrip while on 2.6.16 it took only
> > ~230 calls.
> 
> Hmm. But the spinlock part of this seems to not really have changed.
> 
> If I remove everything but the actual callchain info and then do a "diff" 
> between your two roundtrips, they look very different, and basically 
> none of the difference seems to be due to spinlocks.
> 
> It seems to be due to a number of different things, but the bulk of the 
> new costs seem to be in networking (and to some degree scheduling). There 
> are smaller differences elsewhere, but the networking code _really_ blows 
> up.
> 
> I don't know how many of these are "real" kernel changes and how much 
> is just less inlining, but spinlocks don't seem to be the issue. 
> Here's a quick walk-through (some of these get repeated a couple of times 
> in your traces)

That's true. I just wanted to give some background on how all of this started.
Spinlocks are equally good or bad for s390 on both kernel versions.

However, quite a few of the additional function calls in networking code
come from uninlining:

c2aa270a [NET]: uninline skb_push, de-bloats a lot
6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
419ae74e [NET]: uninline skb_trim, de-bloats

Christian Ehrhardt identified these and more, but hasn't posted patches
yet. Reverting these patches does increase performance.

That's when I started wondering what we would gain if we inlined the
spinlock code again.

>  * the timer code doing more:
> 
> 	__timer_stats_timer_set_start_info <-mod_timer
> 	__mod_timer <-mod_timer
> 	__timer_stats_timer_set_start_info <-__mod_timer
> 	lock_timer_base <-__mod_timer
> 	_spin_lock_irqsave <-lock_timer_base
> 	internal_add_timer <-__mod_timer
> 	_spin_unlock_irqrestore <-__mod_timer
> 	qdio_perf_stat_inc <-qdio_outbound_processing
> 	account_system_vtime <-__do_softirq

This should be better now:

507e1231 timer stats: Optimize by adding quick check to avoid function calls

We also addressed our device driver with several patches.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 17:13       ` Heiko Carstens
@ 2009-08-14 18:08         ` Linus Torvalds
  2009-08-14 20:19           ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2009-08-14 18:08 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-arch,
	Martin Schwidefsky, Arnd Bergmann, Horst Hartmann,
	Christian Ehrhardt, David S. Miller



On Fri, 14 Aug 2009, Heiko Carstens wrote:
> 
> However quite a few of the additional function calls in networking code
> come from uninlining:
> 
> c2aa270a [NET]: uninline skb_push, de-bloats a lot
> 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
> 419ae74e [NET]: uninline skb_trim, de-bloats

Hmm. Those functions are big only because of the stupid debugging, which 
is almost certainly not worth it any more. I doubt people have seen enough 
skb under/over-flows in the last years to merit the code.

I suspect we should remove the silly skb_under/over_panic functions, and 
do

 - skb_put:
	-       if (unlikely(skb->tail > skb->end))
	-               skb_over_panic(skb, len, __builtin_return_address(0));
	+       BUG_ON(skb->tail > skb->end);

 - skb_push:
	-       if (unlikely(skb->data<skb->head))
	-               skb_under_panic(skb, len, __builtin_return_address(0));
	+       BUG_ON(skb->data < skb->head);

at which point it might be worthwhile to inline them again, because the 
footprint of a BUG_ON() is really fairly small.

If this is a big deal on s390, it might be worth it.

			Linus


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 18:08         ` Linus Torvalds
@ 2009-08-14 20:19           ` David Miller
  2009-08-14 20:45             ` Linus Torvalds
From: David Miller @ 2009-08-14 20:19 UTC (permalink / raw)
  To: torvalds
  Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 14 Aug 2009 11:08:16 -0700 (PDT)

> On Fri, 14 Aug 2009, Heiko Carstens wrote:
>> 
>> However quite a few of the additional function calls in networking code
>> come from uninlining:
>> 
>> c2aa270a [NET]: uninline skb_push, de-bloats a lot
>> 6be8ac2f [NET]: uninline skb_pull, de-bloats a lot
>> 419ae74e [NET]: uninline skb_trim, de-bloats
> 
> Hmm. Those functions are big only because of the stupid debugging, which 
> is almost certainly not worth it any more. I doubt people have seen enough 
> skb under/over-flows in the last years to merit the code.

We have found a few bugs in the past year or two because of
the check, but indeed not "a lot".

> I suspect we should remove the silly skb_under/over_panic functions, and 
> do
> 
>  - skb_put:
> 	-       if (unlikely(skb->tail > skb->end))
> 	-               skb_over_panic(skb, len, __builtin_return_address(0));
> 	+       BUG_ON(skb->tail > skb->end);
> 
>  - skb_push:
> 	-       if (unlikely(skb->data<skb->head))
> 	-               skb_under_panic(skb, len, __builtin_return_address(0));
> 	+       BUG_ON(skb->data < skb->head);
> 
> at which point it might be worthwhile to inline them again, because the 
> footprint of a BUG_ON() is really fairly small.

Maybe.  Although most people turn on verbose BUG's and that
expands to the same if not more code than what's there now
in these SKB inlines.


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 20:19           ` David Miller
@ 2009-08-14 20:45             ` Linus Torvalds
  2009-08-14 21:10               ` Linus Torvalds
From: Linus Torvalds @ 2009-08-14 20:45 UTC (permalink / raw)
  To: David Miller
  Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt



On Fri, 14 Aug 2009, David Miller wrote:
> 
> Maybe.  Although most people turn on verbose BUG's and that
> expands to the same if not more code than what's there now
> in these SKB inlines.

But not inline in the code, though. So yeah, it has a memory footprint, 
but shouldn't have a cache footprint.

			Linus


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 20:45             ` Linus Torvalds
@ 2009-08-14 21:10               ` Linus Torvalds
  2009-08-14 22:23                 ` David Miller
  2009-08-16 18:27                 ` Heiko Carstens
From: Linus Torvalds @ 2009-08-14 21:10 UTC (permalink / raw)
  To: David Miller
  Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt



On Fri, 14 Aug 2009, Linus Torvalds wrote:
> 
> But not inline in the code, though. So yeah, it has a memory footprint, 
> but shouldn't have a cache footprint.

An example of this: stack usage. This is the code with BUG_ON():

	skb_push:
		push   %rbp
		mov    %esi,%eax
		add    %esi,0x68(%rdi)
		neg    %rax
		mov    %rsp,%rbp
		add    0xd8(%rdi),%rax
		mov    %rax,0xd8(%rdi)
		cmp    0xd0(%rdi),%rax
		jae    2f
		ud2a   
	1:	jmp    1b
	2:	leaveq 
		retq   

and note how in the code, we're just jumping over something like four
bytes ("ud2" plus that silly endless loop-jump just to make gcc happy
are both 2 bytes on x86[-64]). 

Here's the same code with that horrid skb_under_panic():

	skb_push:
		push   %rbp
		mov    %esi,%eax
		mov    %rsp,%rbp
		neg    %rax
		push   %rbx
		mov    %rdi,%rbx
		sub    $0x8,%rsp
		add    0xd8(%rdi),%rax
		add    %esi,0x68(%rdi)
		mov    %rax,0xd8(%rdi)
		cmp    0xd0(%rdi),%rax
		jae    1f
		mov    0x8(%rbp),%rdx
		callq  skb_under_panic
	1:	mov    0xd8(%rbx),%rax
		pop    %r11
		pop    %rbx
		leaveq 
		retq   

and notice how it creates a stack frame due to the call (even when the
call is never done!) and the dead space is now 11 bytes. 

(Total size of functions: 41 bytes vs 64 bytes)

So things like that do make a difference.  It's surprisingly expensive
to add a function call to what otherwise could be a leaf function.  And
BUG_ON() avoids that. 

Of course, even at just 41 bytes, it's not _entirely_ obvious that 
inlining is the right thing.  Of those 41 bytes, four are just function 
overhead, and the "call" is five bytes in the caller and a few bytes of 
spills and argument setups, but inlining is still probably going to change 
each call-site from ~10 bytes to ~35 bytes. 

		Linus

---
 include/linux/skbuff.h |    4 ---
 net/core/skbuff.c      |   51 +----------------------------------------------
 2 files changed, 2 insertions(+), 53 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f2c69a2..2af92b4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -483,10 +483,6 @@ extern int	       skb_cow_data(struct sk_buff *skb, int tailbits,
 extern int	       skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	consume_skb(a)
 #define dev_consume_skb(a)	kfree_skb_clean(a)
-extern void	      skb_over_panic(struct sk_buff *skb, int len,
-				     void *here);
-extern void	      skb_under_panic(struct sk_buff *skb, int len,
-				      void *here);
 
 extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
 			int getfrag(void *from, char *to, int offset,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9e0597d..ae63473 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -103,51 +103,6 @@ static struct pipe_buf_operations sock_pipe_buf_ops = {
 	.get = sock_pipe_buf_get,
 };
 
-/*
- *	Keep out-of-line to prevent kernel bloat.
- *	__builtin_return_address is not used because it is not always
- *	reliable.
- */
-
-/**
- *	skb_over_panic	- 	private function
- *	@skb: buffer
- *	@sz: size
- *	@here: address
- *
- *	Out of line support code for skb_put(). Not user callable.
- */
-void skb_over_panic(struct sk_buff *skb, int sz, void *here)
-{
-	printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
-			  "data:%p tail:%#lx end:%#lx dev:%s\n",
-	       here, skb->len, sz, skb->head, skb->data,
-	       (unsigned long)skb->tail, (unsigned long)skb->end,
-	       skb->dev ? skb->dev->name : "<NULL>");
-	BUG();
-}
-EXPORT_SYMBOL(skb_over_panic);
-
-/**
- *	skb_under_panic	- 	private function
- *	@skb: buffer
- *	@sz: size
- *	@here: address
- *
- *	Out of line support code for skb_push(). Not user callable.
- */
-
-void skb_under_panic(struct sk_buff *skb, int sz, void *here)
-{
-	printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
-			  "data:%p tail:%#lx end:%#lx dev:%s\n",
-	       here, skb->len, sz, skb->head, skb->data,
-	       (unsigned long)skb->tail, (unsigned long)skb->end,
-	       skb->dev ? skb->dev->name : "<NULL>");
-	BUG();
-}
-EXPORT_SYMBOL(skb_under_panic);
-
 /* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
  *	'private' fields and also do memory statistics to find all the
  *	[BEEP] leaks.
@@ -1014,8 +969,7 @@ unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
 	SKB_LINEAR_ASSERT(skb);
 	skb->tail += len;
 	skb->len  += len;
-	if (unlikely(skb->tail > skb->end))
-		skb_over_panic(skb, len, __builtin_return_address(0));
+	BUG_ON(skb->tail > skb->end);
 	return tmp;
 }
 EXPORT_SYMBOL(skb_put);
@@ -1033,8 +987,7 @@ unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
 {
 	skb->data -= len;
 	skb->len  += len;
-	if (unlikely(skb->data<skb->head))
-		skb_under_panic(skb, len, __builtin_return_address(0));
+	BUG_ON(skb->data < skb->head);
 	return skb->data;
 }
 EXPORT_SYMBOL(skb_push);


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 21:10               ` Linus Torvalds
@ 2009-08-14 22:23                 ` David Miller
  2009-08-16 18:27                 ` Heiko Carstens
From: David Miller @ 2009-08-14 22:23 UTC (permalink / raw)
  To: torvalds
  Cc: heiko.carstens, akpm, a.p.zijlstra, mingo, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 14 Aug 2009 14:10:45 -0700 (PDT)

> On Fri, 14 Aug 2009, Linus Torvalds wrote:
>> 
>> But not inline in the code, though. So yeah, it has a memory footprint, 
>> but shouldn't have a cache footprint.
> 
> An example of this: stack usage. This is the code with BUG_ON():
 ...
> and note how in the code, we're just jumping over something like four
> bytes ("ud2" plus that silly endless loop-jump just to make gcc happy
> are both 2 bytes on x86[-64]). 
> 
> Here's the same code with that horrid skb_under_panic():

Yes on x86 it's better even with verbose debugging because
of the inlining of the debug message pointer etc.

I'll have to do something more intelligent than I currently do on
sparc64, and I had always been meaning to do that. :-)


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-14 21:10               ` Linus Torvalds
  2009-08-14 22:23                 ` David Miller
@ 2009-08-16 18:27                 ` Heiko Carstens
  2009-08-16 18:45                   ` Linus Torvalds
  2009-08-17 21:26                   ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
From: Heiko Carstens @ 2009-08-16 18:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky,
	arnd, horsth, ehrhardt

On Fri, Aug 14, 2009 at 02:10:45PM -0700, Linus Torvalds wrote:
> On Fri, 14 Aug 2009, Linus Torvalds wrote:
> > 
> > But not inline in the code, though. So yeah, it has a memory footprint, 
> > but shouldn't have a cache footprint.
> 
> An example of this: stack usage. This is the code with BUG_ON():
> 
> 	skb_push:
> 		push   %rbp
> 		mov    %esi,%eax
> 		add    %esi,0x68(%rdi)
> 		neg    %rax
> 		mov    %rsp,%rbp
> 		add    0xd8(%rdi),%rax
> 		mov    %rax,0xd8(%rdi)
> 		cmp    0xd0(%rdi),%rax
> 		jae    2f
> 		ud2a   
> 	1:	jmp    1b
> 	2:	leaveq 
> 		retq   

An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 results in
the following vmlinux sizes:

574152755 2009-08-16 20:05 vmlinux-orig
574862000 2009-08-16 20:13 vmlinux-skb-inline

The skb-inline variant was created by using your patch and on top of it the
patch below. Size increased by 692KB (0.12%).

Dunno.. is that acceptable?

---
 include/linux/skbuff.h |   35 +++++++++++++++++++++++++++++++++--
 net/core/skbuff.c      |   38 --------------------------------------
 2 files changed, 33 insertions(+), 40 deletions(-)

Index: linux-2.6/include/linux/skbuff.h
===================================================================
--- linux-2.6.orig/include/linux/skbuff.h
+++ linux-2.6/include/linux/skbuff.h
@@ -1116,7 +1116,6 @@ static inline void skb_set_tail_pointer(
 /*
  *	Add data to an sk_buff
  */
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
 {
 	unsigned char *tmp = skb_tail_pointer(skb);
@@ -1126,7 +1125,23 @@ static inline unsigned char *__skb_put(s
 	return tmp;
 }
 
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ *	skb_put - add data to a buffer
+ *	@skb: buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the buffer. If this would
+ *	exceed the total buffer size the kernel will panic. A pointer to the
+ *	first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+	unsigned char *tmp = __skb_put(skb, len);
+
+	BUG_ON(skb->tail > skb->end);
+	return tmp;
+}
+
 static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
 {
 	skb->data -= len;
@@ -1134,6 +1149,22 @@ static inline unsigned char *__skb_push(
 	return skb->data;
 }
 
+/**
+ *	skb_push - add data to the start of a buffer
+ *	@skb: buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the buffer at the buffer
+ *	start. If this would exceed the total buffer headroom the kernel will
+ *	panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+	__skb_push(skb, len);
+	BUG_ON(skb->data < skb->head);
+	return skb->data;
+}
+
 extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
 {
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c
+++ linux-2.6/net/core/skbuff.c
@@ -955,44 +955,6 @@ free_skb:
 EXPORT_SYMBOL(skb_pad);
 
 /**
- *	skb_put - add data to a buffer
- *	@skb: buffer to use
- *	@len: amount of data to add
- *
- *	This function extends the used data area of the buffer. If this would
- *	exceed the total buffer size the kernel will panic. A pointer to the
- *	first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
-	unsigned char *tmp = skb_tail_pointer(skb);
-	SKB_LINEAR_ASSERT(skb);
-	skb->tail += len;
-	skb->len  += len;
-	BUG_ON(skb->tail > skb->end);
-	return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- *	skb_push - add data to the start of a buffer
- *	@skb: buffer to use
- *	@len: amount of data to add
- *
- *	This function extends the used data area of the buffer at the buffer
- *	start. If this would exceed the total buffer headroom the kernel will
- *	panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
-	skb->data -= len;
-	skb->len  += len;
-	BUG_ON(skb->data < skb->head);
-	return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
  *	skb_pull - remove data from the start of a buffer
  *	@skb: buffer to use
  *	@len: amount of data to remove


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-16 18:27                 ` Heiko Carstens
@ 2009-08-16 18:45                   ` Linus Torvalds
  2009-08-16 20:36                     ` Ingo Molnar
  2009-08-17 21:26                   ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
From: Linus Torvalds @ 2009-08-16 18:45 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: David Miller, akpm, a.p.zijlstra, mingo, linux-arch, schwidefsky,
	arnd, horsth, ehrhardt



On Sun, 16 Aug 2009, Heiko Carstens wrote:
> 
> An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 results in
> the following vmlinux sizes:
> 
> 574152755 2009-08-16 20:05 vmlinux-orig
> 574862000 2009-08-16 20:13 vmlinux-skb-inline

It's more relevant to look at just code size. BUG_ON() has more of an 
impact on data sizes, because it does something like 10-12 bytes of 
exception data for each bug-on. But that exception data is out-of-line and 
not interesting.

If the code size blows up a lot, that's worse than if the data size 
expands a bit.

			Linus


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-16 18:45                   ` Linus Torvalds
@ 2009-08-16 20:36                     ` Ingo Molnar
  2009-08-17 10:26                       ` Heiko Carstens
From: Ingo Molnar @ 2009-08-16 20:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, David Miller, akpm, a.p.zijlstra, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, 16 Aug 2009, Heiko Carstens wrote:
> > 
> > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 
> > 4.3.2 results in the following vmlinux sizes:
> > 
> > 574152755 2009-08-16 20:05 vmlinux-orig
> > 574862000 2009-08-16 20:13 vmlinux-skb-inline
> 
> It's more relevant to look at just code size. BUG_ON() has more of 
> an impact on data sizes, because it does something like 10-12 
> bytes of exception data for each bug-on. But that exception data 
> is out-of-line and not interesting.
> 
> If the code size blows up a lot, that's worse than if the data 
> size expands a bit.

Also, allyesconfig isn't really relevant to inlining cost in general 
as it enables everything. The defconfigs are better (at least on 
x86) because they tend to be some distro-alike config - i.e. they 
are closer to what most Linux users will run in the end.

	Ingo


* Re: [patch 0/3] Allow inlined spinlocks again V3
  2009-08-16 20:36                     ` Ingo Molnar
@ 2009-08-17 10:26                       ` Heiko Carstens
From: Heiko Carstens @ 2009-08-17 10:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt

On Sun, Aug 16, 2009 at 10:36:50PM +0200, Ingo Molnar wrote:
> > > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 
> > > 4.3.2 results in the following vmlinux sizes:
> > > 
> > > 574152755 2009-08-16 20:05 vmlinux-orig
> > > 574862000 2009-08-16 20:13 vmlinux-skb-inline
> > 
> > It's more relevant to look at just code size. BUG_ON() has more of 
> > an impact on data sizes, because it does something like 10-12 
> > bytes of exception data for each bug-on. But that exception data 
> > is out-of-line and not interesting.
> > 
> > If the code size blows up a lot, that's worse than if the data 
> > size expands a bit.
> 
> Also, allyesconfig isnt really relevant to inlining cost in general 
> as it enables everything. The defconfigs are better (at least on 
> x86) because they tend to be some distro-alike config - i.e. they 
> are closer to what most Linux users will run in the end.

Ok, I'm still using allyesconfig, but these numbers should be
good enough:

For the unpatched kernel we have

ffffffff81009000 T _stext
ffffffff83614130 T _etext
size: 39891248

ffffffff8460e250 R __start___bug_table
ffffffff8462e9ec R __stop___bug_table
size: 133020

And for the patched kernel (inlined functions) we get

ffffffff81009000 T _stext
ffffffff83627250 T _etext
size: 39969360

ffffffff84621080 R __start___bug_table
ffffffff84649c4c R __stop___bug_table
size: 166860

We have CONFIG_GENERIC_BUG, CONFIG_GENERIC_BUG_RELATIVE_POINTERS and
CONFIG_DEBUG_BUGVERBOSE all enabled. So the size of each bug table entry
is 10 bytes.
We end up with 3384 additional bug table entries caused by inlining.
Text size increased by 78112 bytes which makes an average extra 23 bytes
for each call site.


* [numbers] Re: [patch] more skb ops inlining
  2009-08-16 18:27                 ` Heiko Carstens
  2009-08-16 18:45                   ` Linus Torvalds
@ 2009-08-17 21:26                   ` Ingo Molnar
  2009-08-18 11:34                     ` Heiko Carstens
From: Ingo Molnar @ 2009-08-17 21:26 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 
> results in the following vmlinux sizes:
> 
> 574152755 2009-08-16 20:05 vmlinux-orig
> 574862000 2009-08-16 20:13 vmlinux-skb-inline
> 
> The skb-inline variant was created by using your patch and on top 
> of it the patch below. Size increased by 692KB (0.12%).
> 
> Dunno.. is that acceptable?

i ported your patch to latest mainline (see it attached below) and did 
x86(-64) defconfig measurements (with CONFIG_OPTIMIZE_INLINING 
disabled); vmlinux:

   text	   data	    bss	     dec	    hex	filename
8455329	1192304	 989908	10637541	 a250e5	vmlinux.before
8471036	1192304	 989908	10653248	 a28e40	vmlinux.after

that's a +0.18% text size increase - quite significant.

I also did performance measurements, trying to figure out whether 
this increase in size can be measured via any runtime performance 
metrics such as cycles, instructions or cache-misses.

To do that i picked a micro-benchmark, lat_tcp.

( Note that microbenchmarks are biased in favor of inlining patches
  like yours, because they disproportionately favor the speed
  benefit of it while not adequately showing the cache footprint
  disadvantage of increased kernel text size. )

Without your patch i got:

titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost
TCP latency using localhost: 17.5198 microseconds
TCP latency using localhost: 17.5060 microseconds
TCP latency using localhost: 17.4684 microseconds
TCP latency using localhost: 17.5416 microseconds
TCP latency using localhost: 17.4881 microseconds
TCP latency using localhost: 17.4832 microseconds
TCP latency using localhost: 17.4830 microseconds
TCP latency using localhost: 17.4723 microseconds
TCP latency using localhost: 17.4733 microseconds
TCP latency using localhost: 17.5255 microseconds

 Performance counter stats for './lat_tcp localhost' (10 runs):

    1494.625043  task-clock-msecs         #      0.564 CPUs    ( +-   0.077% )
     3665868023  cycles                   #   2452.701 M/sec   ( +-   0.070% )
     2163212606  instructions             #      0.590 IPC     ( +-   0.090% )
     2449056216  L1-icache-loads          #   1638.576 M/sec   ( +-   0.056% )
       61081182  L1-icache-load-misses    #     40.867 M/sec   ( +-   0.192% )

    2.649159316  seconds time elapsed   ( +-   0.090% )

With your patch i got:

titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost
TCP latency using localhost: 17.6913 microseconds
TCP latency using localhost: 17.7218 microseconds
TCP latency using localhost: 17.6882 microseconds
TCP latency using localhost: 17.7196 microseconds
TCP latency using localhost: 17.6411 microseconds
TCP latency using localhost: 17.7136 microseconds
TCP latency using localhost: 17.6488 microseconds
TCP latency using localhost: 17.6663 microseconds
TCP latency using localhost: 17.7445 microseconds
TCP latency using localhost: 17.6851 microseconds

 Performance counter stats for './lat_tcp localhost' (10 runs):

    1497.479984  task-clock-msecs         #      0.564 CPUs    ( +-   0.106% )
     3690097505  cycles                   #   2464.205 M/sec   ( +-   0.109% )
     2162432402  instructions             #      0.586 IPC     ( +-   0.127% )
     2437974192  L1-icache-loads          #   1628.051 M/sec   ( +-   0.105% )
       63064588  L1-icache-load-misses    #     42.114 M/sec   ( +-   0.278% )

    2.656952976  seconds time elapsed   ( +-   0.110% )

Summarized (for lat_tcp):

|  Before:
|
|    1494.625043  task-clock-msecs         #      0.564 CPUs    ( +-   0.077% )
|     3665868023  cycles                   #   2452.701 M/sec   ( +-   0.070% )
|     2163212606  instructions             #      0.590 IPC     ( +-   0.090% )
|
|  After:
|
|    1497.479984  task-clock-msecs         #      0.564 CPUs    ( +-   0.106% )
|     3690097505  cycles                   #   2464.205 M/sec   ( +-   0.109% )
|     2162432402  instructions             #      0.586 IPC     ( +-   0.127% )

So we've got a 0.2% slowdown, which is provably outside the noise 
level of 0.1%. The instruction count remained almost the same.

The reason for the slowdown can be seen in the L1 icache-fetch 
stats:

     2449056216  L1-icache-loads          #   1638.576 M/sec   ( +-   0.056% )
       61081182  L1-icache-load-misses    #     40.867 M/sec   ( +-   0.192% )

     2437974192  L1-icache-loads          #   1628.051 M/sec   ( +-   0.105% )
       63064588  L1-icache-load-misses    #     42.114 M/sec   ( +-   0.278% )

The icache-misses went up from 40.867 M/sec to 42.114 M/sec, a 
3.05% worsening - well beyond the statistical significance of 
0.28%.

So unless i botched the measurements or some accidental code layout 
differences punish the patch artificially, this patch is a small 
but measurable performance regression on x86. 

The slowdown itself is barely provable with a test repeat count of 
10 iterations (not a surprise really - such patches were not really 
provable before via /usr/bin/time kinds of measurements), on an 
otherwise totally idle system.

The icache-miss stats do support the notion that what is causing 
the slowdown is the critical path falling out of the icache.

( By all means please repeat these measurements - they are very
  easy to do via 'perf stat --repeat 10' - see the examples
  above. )

To build the kernel i used gcc 4.3.2. The CPU i used is friendly to 
inlining patches as well: it's an Extreme Edition Core2 Duo (booted 
to a single core, to stabilize the numbers):

  Intel(R) Core(TM)2 CPU E6800  @ 2.93GHz

With 4MB of L2 cache. I suspect smaller CPUs would feel the 
negative effects of inlining in a more pronounced way.

Thanks,

	Ingo

---
 include/linux/skbuff.h |   37 +++++++++++++++++++++++++++++++++++--
 net/core/skbuff.c      |   40 ----------------------------------------
 2 files changed, 35 insertions(+), 42 deletions(-)

Index: linux/include/linux/skbuff.h
===================================================================
--- linux.orig/include/linux/skbuff.h
+++ linux/include/linux/skbuff.h
@@ -1120,7 +1120,6 @@ static inline void skb_set_tail_pointer(
 /*
  *	Add data to an sk_buff
  */
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
 {
 	unsigned char *tmp = skb_tail_pointer(skb);
@@ -1130,7 +1129,24 @@ static inline unsigned char *__skb_put(s
 	return tmp;
 }
 
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ *	skb_put - add data to a buffer
+ *	@skb: buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the buffer. If this would
+ *	exceed the total buffer size the kernel will panic. A pointer to the
+ *	first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+	unsigned char *tmp = __skb_put(skb, len);
+
+	if (unlikely(skb->tail > skb->end))
+		skb_over_panic(skb, len, __builtin_return_address(0));
+	return tmp;
+}
+
 static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
 {
 	skb->data -= len;
@@ -1138,6 +1154,23 @@ static inline unsigned char *__skb_push(
 	return skb->data;
 }
 
+/**
+ *	skb_push - add data to the start of a buffer
+ *	@skb: buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the buffer at the buffer
+ *	start. If this would exceed the total buffer headroom the kernel will
+ *	panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+	__skb_push(skb, len);
+	if (unlikely(skb->data<skb->head))
+		skb_under_panic(skb, len, __builtin_return_address(0));
+	return skb->data;
+}
+
 extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
 {
Index: linux/net/core/skbuff.c
===================================================================
--- linux.orig/net/core/skbuff.c
+++ linux/net/core/skbuff.c
@@ -1000,46 +1000,6 @@ free_skb:
 EXPORT_SYMBOL(skb_pad);
 
 /**
- *	skb_put - add data to a buffer
- *	@skb: buffer to use
- *	@len: amount of data to add
- *
- *	This function extends the used data area of the buffer. If this would
- *	exceed the total buffer size the kernel will panic. A pointer to the
- *	first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
-	unsigned char *tmp = skb_tail_pointer(skb);
-	SKB_LINEAR_ASSERT(skb);
-	skb->tail += len;
-	skb->len  += len;
-	if (unlikely(skb->tail > skb->end))
-		skb_over_panic(skb, len, __builtin_return_address(0));
-	return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- *	skb_push - add data to the start of a buffer
- *	@skb: buffer to use
- *	@len: amount of data to add
- *
- *	This function extends the used data area of the buffer at the buffer
- *	start. If this would exceed the total buffer headroom the kernel will
- *	panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
-	skb->data -= len;
-	skb->len  += len;
-	if (unlikely(skb->data<skb->head))
-		skb_under_panic(skb, len, __builtin_return_address(0));
-	return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
  *	skb_pull - remove data from the start of a buffer
  *	@skb: buffer to use
  *	@len: amount of data to remove


* Re: [numbers] Re: [patch] more skb ops inlining
  2009-08-17 21:26                   ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
@ 2009-08-18 11:34                     ` Heiko Carstens
From: Heiko Carstens @ 2009-08-18 11:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, David Miller, akpm, a.p.zijlstra, linux-arch,
	schwidefsky, arnd, horsth, ehrhardt

On Mon, Aug 17, 2009 at 11:26:29PM +0200, Ingo Molnar wrote:
> * Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> 
> > An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2 
> > results in the following vmlinux sizes:
> > 
> > 574152755 2009-08-16 20:05 vmlinux-orig
> > 574862000 2009-08-16 20:13 vmlinux-skb-inline
> > 
> > The skb-inline variant was created by using your patch and on top 
> > of it the patch below. Size increased by 692KB (0.12%).
> > 
> > Dunno.. is that acceptable?
> 
> i ported your patch to latest mainline (see it attached below) i've 
> done x86(-64) defconfig (with disabled CONFIG_OPTIMIZE_INLINING) 
> measurements vmlinux:

> +static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
> +{
> +	unsigned char *tmp = __skb_put(skb, len);
> +
> +	if (unlikely(skb->tail > skb->end))
> +		skb_over_panic(skb, len, __builtin_return_address(0));
> +	return tmp;
> +}

No, the proposed patch replaces skb_over_panic() with BUG_ON() (see below).
With that I get slightly different results. However, that's on my Thinkpad,
booted with maxcpus=1 and a 32 bit kernel. The kernel is the latest git as of today.

Note that this is a completely different machine than yours, with a different
config, so the machine actually boots. But the numbers are strange.

Part of /proc/cpuinfo :

model name      : Intel(R) Core(TM)2 CPU         T7600  @ 2.33GHz
stepping        : 6
cache size      : 4096 KB

Without the patch the result is:

TCP latency using localhost: 0.2576 microseconds
TCP latency using localhost: 0.2571 microseconds
TCP latency using localhost: 0.2579 microseconds
TCP latency using localhost: 0.2573 microseconds
TCP latency using localhost: 0.2574 microseconds
TCP latency using localhost: 0.2583 microseconds
TCP latency using localhost: 0.2572 microseconds
TCP latency using localhost: 0.2582 microseconds
TCP latency using localhost: 0.2583 microseconds
TCP latency using localhost: 0.2584 microseconds

 Performance counter stats for './lat_tcp localhost' (10 runs):

   24578.750193  task-clock-msecs         #      0.994 CPUs    ( +-   0.295% )
    57166815217  cycles                   #   2325.863 M/sec   ( +-   0.296% )
    24994309721  instructions             #      0.437 IPC     ( +-   0.288% )
    53691289530  L1-icache-loads          #   2184.460 M/sec   ( +-   0.304% )
         424027  L1-icache-load-misses    #      0.017 M/sec   ( +-   0.824% )

   24.714691813  seconds time elapsed   ( +-   0.285% )


With the patch I get this:

TCP latency using localhost: 0.2588 microseconds
TCP latency using localhost: 0.2589 microseconds
TCP latency using localhost: 0.2581 microseconds
TCP latency using localhost: 0.2586 microseconds
TCP latency using localhost: 0.2584 microseconds
TCP latency using localhost: 0.2588 microseconds
TCP latency using localhost: 0.2586 microseconds
TCP latency using localhost: 0.2589 microseconds
TCP latency using localhost: 0.2582 microseconds
TCP latency using localhost: 0.2586 microseconds

 Performance counter stats for './lat_tcp localhost' (10 runs):

   24464.554706  task-clock-msecs         #      0.995 CPUs    ( +-   0.063% )
    56903708182  cycles                   #   2325.965 M/sec   ( +-   0.063% )
    24831456567  instructions             #      0.436 IPC     ( +-   0.048% )
    53068943303  L1-icache-loads          #   2169.218 M/sec   ( +-   0.081% )
         418802  L1-icache-load-misses    #      0.017 M/sec   ( +-   0.637% )

   24.596311828  seconds time elapsed   ( +-   0.097% )

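Computing the relative change between the two perf stat runs above makes the
trend easier to see (a quick sanity check of the raw counter values, nothing more):

```python
# Counter values copied from the two "perf stat" runs above:
# without the patch (before) and with the patch (after).
before = {"task-clock": 24578.750193, "cycles": 57166815217,
          "instructions": 24994309721, "icache-loads": 53691289530}
after = {"task-clock": 24464.554706, "cycles": 56903708182,
         "instructions": 24831456567, "icache-loads": 53068943303}

for key in before:
    delta = (after[key] - before[key]) / before[key] * 100
    print(f"{key}: {delta:+.2f}%")
```

Every counter drops by a fraction of a percent, while the reported TCP latency
moves the other way (roughly 0.2578 to 0.2586 microseconds on average).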
So latency increases while all the other numbers decrease.

---
 include/linux/skbuff.h |   39 +++++++++++++++++++---
 net/core/skbuff.c      |   85 -------------------------------------------------
 2 files changed, 33 insertions(+), 91 deletions(-)

Index: linux-2.6/include/linux/skbuff.h
===================================================================
--- linux-2.6.orig/include/linux/skbuff.h
+++ linux-2.6/include/linux/skbuff.h
@@ -483,10 +483,6 @@ extern int	       skb_cow_data(struct sk
 extern int	       skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	consume_skb(a)
 #define dev_consume_skb(a)	kfree_skb_clean(a)
-extern void	      skb_over_panic(struct sk_buff *skb, int len,
-				     void *here);
-extern void	      skb_under_panic(struct sk_buff *skb, int len,
-				      void *here);
 
 extern int skb_append_datato_frags(struct sock *sk, struct sk_buff *skb,
 			int getfrag(void *from, char *to, int offset,
@@ -1120,7 +1116,6 @@ static inline void skb_set_tail_pointer(
 /*
  *	Add data to an sk_buff
  */
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
 {
 	unsigned char *tmp = skb_tail_pointer(skb);
@@ -1130,7 +1125,23 @@ static inline unsigned char *__skb_put(s
 	return tmp;
 }
 
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ *	skb_put - add data to a buffer
+ *	@skb: buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the buffer. If this would
+ *	exceed the total buffer size the kernel will panic. A pointer to the
+ *	first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+	unsigned char *tmp = __skb_put(skb, len);
+
+	BUG_ON(skb->tail > skb->end);
+	return tmp;
+}
+
 static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
 {
 	skb->data -= len;
@@ -1138,6 +1149,22 @@ static inline unsigned char *__skb_push(
 	return skb->data;
 }
 
+/**
+ *	skb_push - add data to the start of a buffer
+ *	@skb: buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the buffer at the buffer
+ *	start. If this would exceed the total buffer headroom the kernel will
+ *	panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+	__skb_push(skb, len);
+	BUG_ON(skb->data < skb->head);
+	return skb->data;
+}
+
 extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
 {
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c
+++ linux-2.6/net/core/skbuff.c
@@ -103,51 +103,6 @@ static struct pipe_buf_operations sock_p
 	.get = sock_pipe_buf_get,
 };
 
-/*
- *	Keep out-of-line to prevent kernel bloat.
- *	__builtin_return_address is not used because it is not always
- *	reliable.
- */
-
-/**
- *	skb_over_panic	- 	private function
- *	@skb: buffer
- *	@sz: size
- *	@here: address
- *
- *	Out of line support code for skb_put(). Not user callable.
- */
-void skb_over_panic(struct sk_buff *skb, int sz, void *here)
-{
-	printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
-			  "data:%p tail:%#lx end:%#lx dev:%s\n",
-	       here, skb->len, sz, skb->head, skb->data,
-	       (unsigned long)skb->tail, (unsigned long)skb->end,
-	       skb->dev ? skb->dev->name : "<NULL>");
-	BUG();
-}
-EXPORT_SYMBOL(skb_over_panic);
-
-/**
- *	skb_under_panic	- 	private function
- *	@skb: buffer
- *	@sz: size
- *	@here: address
- *
- *	Out of line support code for skb_push(). Not user callable.
- */
-
-void skb_under_panic(struct sk_buff *skb, int sz, void *here)
-{
-	printk(KERN_EMERG "skb_under_panic: text:%p len:%d put:%d head:%p "
-			  "data:%p tail:%#lx end:%#lx dev:%s\n",
-	       here, skb->len, sz, skb->head, skb->data,
-	       (unsigned long)skb->tail, (unsigned long)skb->end,
-	       skb->dev ? skb->dev->name : "<NULL>");
-	BUG();
-}
-EXPORT_SYMBOL(skb_under_panic);
-
 /* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
  *	'private' fields and also do memory statistics to find all the
  *	[BEEP] leaks.
@@ -1000,46 +955,6 @@ free_skb:
 EXPORT_SYMBOL(skb_pad);
 
 /**
- *	skb_put - add data to a buffer
- *	@skb: buffer to use
- *	@len: amount of data to add
- *
- *	This function extends the used data area of the buffer. If this would
- *	exceed the total buffer size the kernel will panic. A pointer to the
- *	first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
-	unsigned char *tmp = skb_tail_pointer(skb);
-	SKB_LINEAR_ASSERT(skb);
-	skb->tail += len;
-	skb->len  += len;
-	if (unlikely(skb->tail > skb->end))
-		skb_over_panic(skb, len, __builtin_return_address(0));
-	return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- *	skb_push - add data to the start of a buffer
- *	@skb: buffer to use
- *	@len: amount of data to add
- *
- *	This function extends the used data area of the buffer at the buffer
- *	start. If this would exceed the total buffer headroom the kernel will
- *	panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
-	skb->data -= len;
-	skb->len  += len;
-	if (unlikely(skb->data<skb->head))
-		skb_under_panic(skb, len, __builtin_return_address(0));
-	return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
  *	skb_pull - remove data from the start of a buffer
  *	@skb: buffer to use
  *	@len: amount of data to remove

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-08-18 11:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
2009-08-13 18:34   ` Ingo Molnar
2009-08-13 18:43     ` Ingo Molnar
2009-08-14 12:34   ` Heiko Carstens
2009-08-14 16:04     ` Linus Torvalds
2009-08-14 17:13       ` Heiko Carstens
2009-08-14 18:08         ` Linus Torvalds
2009-08-14 20:19           ` David Miller
2009-08-14 20:45             ` Linus Torvalds
2009-08-14 21:10               ` Linus Torvalds
2009-08-14 22:23                 ` David Miller
2009-08-16 18:27                 ` Heiko Carstens
2009-08-16 18:45                   ` Linus Torvalds
2009-08-16 20:36                     ` Ingo Molnar
2009-08-17 10:26                       ` Heiko Carstens
2009-08-17 21:26                   ` [numbers] Re: [patch] more skb ops inlining Ingo Molnar
2009-08-18 11:34                     ` Heiko Carstens
  -- strict thread matches above, loose matches on Subject: below --
2009-08-14 12:58 [patch 0/3] Allow inlined spinlocks again V4 Heiko Carstens
2009-08-14 12:58 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
