linux-kernel.vger.kernel.org archive mirror
* Re-tune x86 uaccess code for PREEMPT_VOLUNTARY
@ 2013-08-09 23:04 Andi Kleen
  2013-08-09 23:04 ` [PATCH 01/13] x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic Andi Kleen
                   ` (15 more replies)
  0 siblings, 16 replies; 40+ messages in thread
From: Andi Kleen @ 2013-08-09 23:04 UTC
  To: linux-kernel; +Cc: x86, mingo, torvalds

The x86 user access functions (*_user) were originally very well tuned,
with partial inline code and other optimizations.

Then over time various new checks -- particularly the sleep checks for
a voluntary preempt kernel -- destroyed a lot of the tuning.

A typical user access operation now makes multiple useless
function calls. Without forced inlining, gcc's inlining
policy makes it even worse by adding more unnecessary calls.

Here's a typical example from ftrace:

     10)               |    might_fault() {
     10)               |      _cond_resched() {
     10)               |        should_resched() {
     10)               |          need_resched() {
     10)   0.063 us    |            test_ti_thread_flag();
     10)   0.643 us    |          }
     10)   1.238 us    |        }
     10)   1.845 us    |      }
     10)   2.438 us    |    }

So we spent about 2.4us doing nothing (ok, it's a bit less without
ftrace, but still pretty bad).

Then in other cases we would have an out of line function,
but would actually do the might_sleep() checks in the inlined
caller. This doesn't make any sense at all.

There were also a few other problems, for example the x86-64 uaccess
code regularly falls back to string functions, even though a simple
mov would be enough. For example, every futex access to the lock
variable would actually use string instructions, even though
it's just 4 bytes.
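
To illustrate the idea (this is only a userspace sketch, not the code from
the patch), a copy helper can dispatch on the size and handle 1/2/4/8 bytes
with a fixed-size move, leaving everything else to the generic path. The
names copy_sized(), copy_generic() and generic_calls are hypothetical, and
the fault handling that real uaccess code needs for user pointers is left
out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter so a caller can see which path was taken. */
static unsigned long generic_calls;

/* Hypothetical stand-in for the generic (string instruction) copy path. */
static int copy_generic(void *dst, const void *src, size_t n)
{
	generic_calls++;
	memcpy(dst, src, n);
	return 0;
}

/*
 * Sketch of the 1/2/4/8 byte special case: a memcpy() with a small
 * constant length is normally compiled to a single load and store,
 * so the common small sizes never reach the generic path.
 * No fault handling here; real uaccess code must handle bad pointers.
 */
static inline int copy_sized(void *dst, const void *src, size_t n)
{
	switch (n) {
	case 1: memcpy(dst, src, 1); return 0;
	case 2: memcpy(dst, src, 2); return 0;
	case 4: memcpy(dst, src, 4); return 0;
	case 8: memcpy(dst, src, 8); return 0;
	default: return copy_generic(dst, src, n);
	}
}
```

With this shape, a 4 byte access such as the futex lock word takes the
"case 4" branch, and only odd or large sizes pay for the generic copy.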

This patch kit is an attempt to get us back to sane code,
mostly by doing proper inlining and doing sleep checks in the right
place. Unfortunately I had to add one tree sweep to avoid a nasty
include loop.
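
As a rough sketch of what "proper inlining" means here (again userspace
only, not the actual kernel code): the cheap flag test is forced inline
into the caller, and only the rare slow path stays an out of line call.
resched_flag, resched_slow(), slow_calls and cond_resched_sketch() are
hypothetical names, and the attributes assume gcc or clang.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread "need resched" flag. */
static volatile bool resched_flag;

/* Hypothetical counter of how often the slow path really ran. */
static unsigned long slow_calls;

/* Rare slow path: kept out of line so callers stay small. */
__attribute__((noinline)) static int resched_slow(void)
{
	slow_calls++;
	resched_flag = false;	/* pretend we rescheduled */
	return 1;
}

/*
 * Fast path: a single flag test, forced inline so the common case
 * costs one load and one branch instead of a chain of calls.
 */
static inline __attribute__((always_inline)) int cond_resched_sketch(void)
{
	if (!resched_flag)
		return 0;
	return resched_slow();
}
```

In the common case where the flag is clear, the caller executes no call
instruction at all, which is the opposite of the five-deep call chain in
the ftrace output above.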

It costs a bit of text space, but I think it's worth it
(if only to keep my blood pressure down while reading ftrace logs...)

I haven't done any particular benchmarks, but important low level
functions just ought to be fast.

64bit:
    text           data     bss      dec            hex filename
13249492        1881328 1159168 16289988         f890c4 vmlinux-before-uaccess
13260877        1877232 1159168 16297277         f8ad3d vmlinux-uaccess
+ 11k text, +0.08%

32bit:
    text           data     bss      dec            hex filename
11223248         899512 1916928 14039688         d63a88 vmlinux-before-uaccess
11230358         895416 1916928 14042702         d6464e vmlinux-uaccess
+ 7k text, +0.06%



end of thread, other threads:[~2013-08-20 21:03 UTC | newest]

Thread overview: 40+ messages
2013-08-09 23:04 Re-tune x86 uaccess code for PREEMPT_VOLUNTARY Andi Kleen
2013-08-09 23:04 ` [PATCH 01/13] x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic Andi Kleen
2013-08-09 23:04 ` [PATCH 02/13] x86: Include linux/sched.h in asm/uaccess.h Andi Kleen
2013-08-09 23:04 ` [PATCH 03/13] tree-sweep: Include linux/sched.h for might_sleep users Andi Kleen
2013-08-09 23:04 ` [PATCH 04/13] Move might_sleep and friends from kernel.h to sched.h Andi Kleen
2013-08-09 23:04 ` [PATCH 05/13] sched: mark should_resched() __always_inline Andi Kleen
2013-08-09 23:04 ` [PATCH 06/13] x86: Add 32bit versions of SAVE_ALL/RESTORE_ALL to calling.h Andi Kleen
2013-08-09 23:04 ` [PATCH 07/13] Add might_fault_debug_only() Andi Kleen
2013-08-14 18:24   ` Michael S. Tsirkin
2013-08-09 23:04 ` [PATCH 08/13] x86: Move cond_resched into the out of line put_user code Andi Kleen
2013-08-09 23:04 ` [PATCH 09/13] x86: Move cond_resched into the out of line get_user code Andi Kleen
2013-08-09 23:04 ` [PATCH 10/13] x86: Move cond resched for copy_{from,to}_user into low level code 64bit Andi Kleen
2013-08-10 15:42   ` Linus Torvalds
2013-08-10 16:10     ` Andi Kleen
2013-08-10 16:27       ` Linus Torvalds
2013-08-10 18:23         ` Borislav Petkov
2013-08-10 19:05           ` Jörn Engel
2013-08-20 21:03         ` KOSAKI Motohiro
2013-08-15  5:04     ` Michael S. Tsirkin
2013-08-09 23:04 ` [PATCH 11/13] sched: Inline the need_resched test into the caller for _cond_resched Andi Kleen
2013-08-09 23:04 ` [PATCH 12/13] x86: move __copy_*_nocache might fault check out of line Andi Kleen
2013-08-09 23:04 ` [PATCH 13/13] x86: drop cond rescheds from __copy_{from,to}_user Andi Kleen
2013-08-10  4:42 ` Re-tune x86 uaccess code for PREEMPT_VOLUNTARY H. Peter Anvin
2013-08-10  5:55   ` Mike Galbraith
2013-08-10 16:09     ` H. Peter Anvin
2013-08-10 16:43       ` Linus Torvalds
2013-08-10 17:18         ` H. Peter Anvin
2013-08-10 18:51           ` Linus Torvalds
2013-08-10 19:18             ` H. Peter Anvin
2013-08-10 20:26             ` H. Peter Anvin
2013-08-10 23:00             ` H. Peter Anvin
2013-08-11  4:17       ` Mike Galbraith
2013-08-11  4:27         ` H. Peter Anvin
2013-08-11  4:36           ` Mike Galbraith
2013-08-11  4:57             ` H. Peter Anvin
2013-08-11  5:58               ` Mike Galbraith
2013-08-13 18:09 ` H. Peter Anvin
2013-08-13 18:12   ` Andi Kleen
2013-08-14 18:27 ` Michael S. Tsirkin
2013-08-14 22:08   ` Andi Kleen
