* Re-tune x86 uaccess code for PREEMPT_VOLUNTARY v2
@ 2013-08-14  0:07 Andi Kleen
  2013-08-14  0:07 ` [PATCH 1/8] x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic Andi Kleen
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Andi Kleen @ 2013-08-14  0:07 UTC
  To: linux-kernel; +Cc: x86, torvalds

The x86 user access functions (*_user) were originally very well tuned,
with partial inline code and other optimizations.

Then over time various new checks -- particularly the sleep checks for
a voluntary preempt kernel -- destroyed a lot of that tuning.

A typical user access operation now makes multiple useless
function calls. Also, without forced inlining, gcc's inlining
policy makes it even worse by adding more unnecessary calls.

Here's a typical example from ftrace:

     10)               |    might_fault() {
     10)               |      _cond_resched() {
     10)               |        should_resched() {
     10)               |          need_resched() {
     10)   0.063 us    |            test_ti_thread_flag();
     10)   0.643 us    |          }
     10)   1.238 us    |        }
     10)   1.845 us    |      }
     10)   2.438 us    |    }

So we spend about 2.5us doing nothing (OK, it's a bit less without
ftrace, but still pretty bad).
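
As a rough illustration of how such a call chain can be flattened, here is a
minimal userspace sketch in which the flag test is done inline in the caller
and only the rare slow path stays out of line. All names (thread_state,
RESCHED_BIT, resched_slow, maybe_resched) are hypothetical stand-ins, not the
kernel's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical per-thread state; bit 0 means "reschedule requested". */
struct thread_state {
    unsigned long flags;
};
#define RESCHED_BIT 0x1UL

/* Counts slow-path entries so the behaviour can be observed. */
static int slow_path_calls;

/* Slow path, intended to stay out of line: handles the rare case. */
static void resched_slow(struct thread_state *ts)
{
    slow_path_calls++;
    ts->flags &= ~RESCHED_BIT;  /* pretend we rescheduled */
}

/*
 * Fast path: a single flag test in the caller. The function call
 * only happens when the flag is actually set.
 */
static inline bool maybe_resched(struct thread_state *ts)
{
    if (!(ts->flags & RESCHED_BIT))
        return false;
    resched_slow(ts);
    return true;
}
```

In the common case this costs one load and one branch instead of four nested
calls.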

In other cases we would have an out-of-line function,
but would actually do the might_sleep() checks in the inlined
caller. This doesn't make any sense at all.

There were also a few other problems. For example, the x86-64 uaccess
code regularly falls back to string functions, even though a simple
mov would be enough. Every futex access to the lock variable
would actually use string instructions, even though
it's just 4 bytes.
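
The idea of special-casing small sizes can be sketched in plain userspace C
as below. This is only an illustration under assumptions, not the code from
the patches: copy_generic and copy_small are hypothetical names, and the
sketch leaves out everything a real user copy needs (access checks, fault
handling).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generic out-of-line copy routine. */
static int copy_generic(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/*
 * Dispatch 1/2/4/8 byte copies to fixed-size copies. With a constant
 * size, an optimizing compiler can typically reduce each fixed-size
 * memcpy to a single load and store instead of a string operation.
 */
static inline int copy_small(void *dst, const void *src, size_t n)
{
    switch (n) {
    case 1: memcpy(dst, src, 1); return 0;
    case 2: memcpy(dst, src, 2); return 0;
    case 4: memcpy(dst, src, 4); return 0;
    case 8: memcpy(dst, src, 8); return 0;
    default:
        return copy_generic(dst, src, n);  /* larger or odd sizes */
    }
}
```

When the size is a compile-time constant at the call site, as with a 4 byte
futex word, the switch itself should fold away after inlining.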

This patch kit is an attempt to get us back to sane code,
mostly by doing proper inlining and doing sleep checks in the right
place. Unfortunately I had to add one tree sweep to avoid a nasty
include loop.

v2: Now completely remove reschedule checks for uaccess functions.

Thread overview: 11+ messages
2013-08-14  0:07 Re-tune x86 uaccess code for PREEMPT_VOLUNTARY v2 Andi Kleen
2013-08-14  0:07 ` [PATCH 1/8] x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic Andi Kleen
2013-08-14  0:17   ` Linus Torvalds
2013-08-14  0:07 ` [PATCH 2/8] x86: Include linux/sched.h in asm/uaccess.h Andi Kleen
2013-08-14  0:07 ` [PATCH 3/8] tree-sweep: Include linux/sched.h for might_sleep users Andi Kleen
2013-08-14  0:07 ` [PATCH 4/8] Move might_sleep and friends from kernel.h to sched.h Andi Kleen
2013-08-14  0:07 ` [PATCH 5/8] sched: mark should_resched() __always_inline Andi Kleen
2013-08-14  0:07 ` [PATCH 6/8] Add might_fault_debug_only() Andi Kleen
2013-08-14  0:07 ` [PATCH 7/8] x86: Remove cond_resched() from uaccess code Andi Kleen
2013-08-14  0:07 ` [PATCH 8/8] sched: Inline the need_resched test into the caller for _cond_resched Andi Kleen
2013-08-14  9:56 ` Re-tune x86 uaccess code for PREEMPT_VOLUNTARY v2 Ingo Molnar
