* [RFC] speeding up the stat() family of system calls...
From: Linus Torvalds @ 2013-12-21 20:27 UTC (permalink / raw)
To: Peter Anvin, Ingo Molnar, Thomas Gleixner, Al Viro
Cc: the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1774 bytes --]
Here's both x86 people and filesystem people involved, because this
hacky RFC patch touches both.
NOTE NOTE NOTE! I've modified "cp_new_stat()" in place, in a way that
is x86-64 specific. So the attached patch *only* works on x86-64, and
will very actively break on anything else. That's intentional, because
that way it's more obvious how the code changes, but a real patch
would create a *new* cp_new_stat() for x86-64, and conditionalize the
existing generic "cp_new_stat()" on not already having an
architecture-optimized one.
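Concretely, that conditionalization might look something like the sketch
below (ARCH_HAS_CP_NEW_STAT is a made-up name for illustration, it's not
in the attached patch):

/* An x86-64 header would provide its own version, e.g.: */
#define ARCH_HAS_CP_NEW_STAT
int cp_new_stat(struct kstat *stat, struct stat __user *statbuf);

/* ...and fs/stat.c would compile the generic one out: */
#ifndef ARCH_HAS_CP_NEW_STAT
static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
{
        /* the existing generic double-copy implementation */
}
#endif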
Basically, under some filesystem loads, "stat()" and friends are the
most common ops (think tree traversal, but also things like git
verifying the index). And our "cp_new_stat()" function (which is the
common interface, ignoring 32-bit legacy stuff) is generic, but
actually pretty disgusting. It copies things to a temporary 'struct
stat' buffer on the kernel stack, and then uses copy_to_user() to copy
it to user space. The double copy is quite noticeable in profiles, and
it generates a big stack frame too.
By doing an architecture-specific cp_new_stat() function, we can
improve on that.
HOWEVER. On x86, doing an efficient field-at-a-time copy also requires
us to use put_user_try() and put_user_catch() in order to not have
tons of clac/stac instructions for the extended permission testing.
And the implementation of that was actually fairly non-optimal, so to
actually get the code I wanted, I had to change how that all worked
too, using "asm_volatile_goto()".
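For anyone who hasn't seen the compiler feature: "asm goto" (which the
kernel wraps as asm_volatile_goto()) lets inline assembly branch to a C
label, which is what allows an exception-table entry to point straight
at a C-level error label instead of at in-line fixup code. A minimal
user-space sketch of just the language feature, nothing kernel-specific:

#include <stdio.h>

static int bit_set(unsigned long word, int bit)
{
        /* "asm goto" takes no outputs; after the clobbers comes the
         * list of C labels the assembly is allowed to jump to */
        asm goto("bt %1, %0\n\t"
                 "jc %l[yes]"
                 : /* no outputs */
                 : "r" (word), "r" ((unsigned long)bit)
                 : "cc"
                 : yes);
        return 0;
yes:
        return 1;
}

int main(void)
{
        printf("%d %d\n", bit_set(5, 0), bit_set(5, 1)); /* prints "1 0" */
        return 0;
}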
Thus both x86 and FS people on the list.
Comments? This would obviously be a 3.14 issue, I'm not suggesting
we'd do this now. I just want to lay the ground-work..
It's tested in the sense that "it works for me", and profiles look nice, but..
Linus
[-- Attachment #2: vfs-stat-improvement --]
[-- Type: application/octet-stream, Size: 6155 bytes --]
arch/x86/include/asm/uaccess.h | 60 +++++++++++++++++++++----------------
fs/stat.c | 67 +++++++++++++++++++++++++-----------------
2 files changed, 74 insertions(+), 53 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 8ec57c07b125..b6f6816b9bba 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -197,13 +197,12 @@ __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))
: "A" (x), "r" (addr), "i" (errret), "0" (err))
#define __put_user_asm_ex_u64(x, addr) \
- asm volatile(ASM_STAC "\n" \
+ asm_volatile_goto( \
"1: movl %%eax,0(%1)\n" \
"2: movl %%edx,4(%1)\n" \
- "3: " ASM_CLAC "\n" \
- _ASM_EXTABLE_EX(1b, 2b) \
- _ASM_EXTABLE_EX(2b, 3b) \
- : : "A" (x), "r" (addr))
+ _ASM_EXTABLE_EX(1b, %l[put_user_fail]) \
+ _ASM_EXTABLE_EX(2b, %l[put_user_fail]) \
+ : : "A" (x), "r" (addr) : : put_user_fail)
#define __put_user_x8(x, ptr, __ret_pu) \
asm volatile("call __put_user_8" : "=a" (__ret_pu) \
@@ -424,23 +423,9 @@ struct __large_struct { unsigned long buf[100]; };
: ltype(x), "m" (__m(addr)), "i" (errret), "0" (err))
#define __put_user_asm_ex(x, addr, itype, rtype, ltype) \
- asm volatile("1: mov"itype" %"rtype"0,%1\n" \
- "2:\n" \
- _ASM_EXTABLE_EX(1b, 2b) \
- : : ltype(x), "m" (__m(addr)))
-
-/*
- * uaccess_try and catch
- */
-#define uaccess_try do { \
- current_thread_info()->uaccess_err = 0; \
- stac(); \
- barrier();
-
-#define uaccess_catch(err) \
- clac(); \
- (err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0); \
-} while (0)
+ asm_volatile_goto("1: mov"itype" %"rtype"0,%1\n" \
+ _ASM_EXTABLE_EX(1b, %l[put_user_fail]) \
+ : : ltype(x), "m" (__m(addr)) : : put_user_fail)
/**
* __get_user: - Get a simple variable from user space, with less checking.
@@ -499,8 +484,18 @@ struct __large_struct { unsigned long buf[100]; };
* get_user_ex(...);
* } get_user_catch(err)
*/
-#define get_user_try uaccess_try
-#define get_user_catch(err) uaccess_catch(err)
+/*
+ * uaccess_try and catch
+ */
+#define get_user_try do { \
+ current_thread_info()->uaccess_err = 0; \
+ stac(); \
+ barrier();
+
+#define get_user_catch(err) \
+ clac(); \
+ (err) |= (current_thread_info()->uaccess_err ? -EFAULT : 0); \
+} while (0)
#define get_user_ex(x, ptr) do { \
unsigned long __gue_val; \
@@ -508,8 +503,21 @@ struct __large_struct { unsigned long buf[100]; };
(x) = (__force __typeof__(*(ptr)))__gue_val; \
} while (0)
-#define put_user_try uaccess_try
-#define put_user_catch(err) uaccess_catch(err)
+/*
+ * uaccess_try and catch
+ */
+#define put_user_try do { \
+ stac(); \
+ barrier();
+
+#define put_user_catch(err) \
+ if (0) { \
+put_user_fail: \
+ (err) = -EFAULT; \
+ } \
+ clac(); \
+} while (0)
+
#define put_user_ex(x, ptr) \
__put_user_size_ex((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
diff --git a/fs/stat.c b/fs/stat.c
index ae0c3cef9927..5bfa3dae5999 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -228,39 +228,52 @@ SYSCALL_DEFINE2(fstat, unsigned int, fd, struct __old_kernel_stat __user *, stat
static int cp_new_stat(struct kstat *stat, struct stat __user *statbuf)
{
- struct stat tmp;
+ int err;
+ typeof(statbuf->st_ino) st_ino;
+ typeof(statbuf->st_nlink) st_nlink;
+
+ if (!access_ok(VERIFY_WRITE, statbuf, sizeof(*statbuf)))
+ return -EFAULT;
if (!valid_dev(stat->dev) || !valid_dev(stat->rdev))
return -EOVERFLOW;
-#if BITS_PER_LONG == 32
- if (stat->size > MAX_NON_LFS)
- return -EOVERFLOW;
-#endif
- INIT_STRUCT_STAT_PADDING(tmp);
- tmp.st_dev = encode_dev(stat->dev);
- tmp.st_ino = stat->ino;
- if (sizeof(tmp.st_ino) < sizeof(stat->ino) && tmp.st_ino != stat->ino)
+ st_ino = stat->ino;
+ if (sizeof(st_ino) < sizeof(stat->ino) && st_ino != stat->ino)
return -EOVERFLOW;
- tmp.st_mode = stat->mode;
- tmp.st_nlink = stat->nlink;
- if (tmp.st_nlink != stat->nlink)
+ st_nlink = stat->nlink;
+ if (st_nlink != stat->nlink)
return -EOVERFLOW;
- SET_UID(tmp.st_uid, from_kuid_munged(current_user_ns(), stat->uid));
- SET_GID(tmp.st_gid, from_kgid_munged(current_user_ns(), stat->gid));
- tmp.st_rdev = encode_dev(stat->rdev);
- tmp.st_size = stat->size;
- tmp.st_atime = stat->atime.tv_sec;
- tmp.st_mtime = stat->mtime.tv_sec;
- tmp.st_ctime = stat->ctime.tv_sec;
-#ifdef STAT_HAVE_NSEC
- tmp.st_atime_nsec = stat->atime.tv_nsec;
- tmp.st_mtime_nsec = stat->mtime.tv_nsec;
- tmp.st_ctime_nsec = stat->ctime.tv_nsec;
-#endif
- tmp.st_blocks = stat->blocks;
- tmp.st_blksize = stat->blksize;
- return copy_to_user(statbuf,&tmp,sizeof(tmp)) ? -EFAULT : 0;
+
+ err = 0;
+ put_user_try {
+ typeof(statbuf->st_uid) st_uid;
+ typeof(statbuf->st_gid) st_gid;
+
+ put_user_ex(encode_dev(stat->dev), &statbuf->st_dev);
+ put_user_ex(st_ino, &statbuf->st_ino);
+ put_user_ex(st_nlink, &statbuf->st_nlink);
+ put_user_ex(stat->mode, &statbuf->st_mode);
+ SET_UID(st_uid, from_kuid_munged(current_user_ns(), stat->uid));
+ put_user_ex(st_uid, &statbuf->st_uid);
+ SET_GID(st_gid, from_kgid_munged(current_user_ns(), stat->gid));
+ put_user_ex(st_gid, &statbuf->st_gid);
+ put_user_ex(0, &statbuf->__pad0);
+ put_user_ex(encode_dev(stat->rdev), &statbuf->st_rdev);
+ put_user_ex(stat->size, &statbuf->st_size);
+ put_user_ex(stat->blksize, &statbuf->st_blksize);
+ put_user_ex(stat->blocks, &statbuf->st_blocks);
+ put_user_ex(stat->atime.tv_sec, &statbuf->st_atime);
+ put_user_ex(stat->atime.tv_nsec, &statbuf->st_atime_nsec);
+ put_user_ex(stat->mtime.tv_sec, &statbuf->st_mtime);
+ put_user_ex(stat->mtime.tv_nsec, &statbuf->st_mtime_nsec);
+ put_user_ex(stat->ctime.tv_sec, &statbuf->st_ctime);
+ put_user_ex(stat->ctime.tv_nsec, &statbuf->st_ctime_nsec);
+ put_user_ex(0, &statbuf->__unused[0]);
+ put_user_ex(0, &statbuf->__unused[1]);
+ put_user_ex(0, &statbuf->__unused[2]);
+ } put_user_catch(err);
+ return err;
}
SYSCALL_DEFINE2(newstat, const char __user *, filename,
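For readers following the uaccess.h changes above: the net effect is that
a put_user_try { ... } put_user_catch(err) block now expands to roughly
this shape (a simplified sketch, shown with the __label__ declaration
that comes up later in this thread; the patch as posted doesn't have it
yet):

do {
        __label__ put_user_fail;        /* makes the label block-local */

        stac();                         /* open the user-access window */
        barrier();

        /*
         * ... the put_user_ex() stores go here; the exception-table
         * entry of each store points at put_user_fail, so a faulting
         * store jumps straight to the error path below ...
         */

        if (0) {                        /* never entered on the normal path */
put_user_fail:
                err = -EFAULT;          /* reached only via fault fixup */
        }
        clac();                         /* close the user-access window */
} while (0);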
* Re: [RFC] speeding up the stat() family of system calls...
From: John Stoffel @ 2013-12-21 22:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Anvin, Ingo Molnar, Thomas Gleixner, Al Viro,
the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
Linus> Here's both x86 people and filesystem people involved, because this
Linus> hacky RFC patch touches both.
Linus> NOTE NOTE NOTE! I've modified "cp_new_stat()" in place, in a way that
Linus> is x86-64 specific. So the attached patch *only* works on x86-64, and
Linus> will very actively break on anything else. That's intentional, because
Linus> that way it's more obvious how the code changes, but a real patch
Linus> would create a *new* cp_new_stat() for x86-64, and conditionalize the
Linus> existing generic "cp_new_stat()" on not already having an
Linus> architecture-optimized one.
As a SysAdmin, I'm always interested in any speedups to filesystem
ops, since I tend to do lots of trawling through filesystems with find
looking for data on filesystem usage, largest files, etc. So this is
good news. Any numbers on how much better this is? I'm travelling
tomorrow, so I won't have time to spin up a VM and play, though it's
tempting to do so.
Thanks,
John
* Re: [RFC] speeding up the stat() family of system calls...
From: Linus Torvalds @ 2013-12-22 0:11 UTC (permalink / raw)
To: John Stoffel
Cc: Peter Anvin, Ingo Molnar, Thomas Gleixner, Al Viro,
the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
On Sat, Dec 21, 2013 at 2:54 PM, John Stoffel <john@stoffel.org> wrote:
>
> Any numbers on how much better this is? I'm travelling
> tomorrow, so I won't have time to spin up a VM and play, though it's
> tempting to do so.
On most _real_ loads, the kernel footprint is pretty small, and
obviously if you do IO or other stuff, the cost of some wasted CPU
cycles in stat() is negligible. And even if you loop doing normal
'stat()' calls over a filesystem (with everything cached), the actual
path walk itself is still dominant, although at that point the CPU
overhead of the stat data copy starts to at least show up in profiles.
Showing up in profiles is why I started looking at it.
And then, for the extreme case, my test program went from 0.92s to
0.80s. But that stupid test program just does ten million fstat(0,&st)
calls, in order to show the highest cp_new_stat() CPU overhead (with
minimal cache footprint). So at that point it's about a 13%
difference, when the only other overhead really is just the system
call itself. The profile went from
10.42% a.out [k] copy_user_enhanced_fast_string
5.91% a.out [k] cp_new_stat
(the "copy_user_enhanced_fast_string" is the "rep movsb" that copies
things to user space) to
6.69% a.out [k] cp_new_stat
(and here there is no separate user-copy, since it's all done directly
inside the optimized cp_new_stat).
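A reconstruction of the test program from the description above (the
original wasn't posted with this mail):

#include <sys/stat.h>

int main(void)
{
        struct stat st;
        int i;

        /* ten million fstat() calls on stdin: all CPU, minimal cache
         * footprint, so cp_new_stat() dominates the kernel side */
        for (i = 0; i < 10000000; i++)
                fstat(0, &st);
        return 0;
}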
It's worth pointing out that the 5.91% -> 6.69% profile change of
cp_new_stat() is *not* because cp_new_stat() got slower: it's simply a
direct result of the 13% performance improvement: 13% of the cycles
went away, and so 5.91% becomes 6.69%.
But the above 13% is really the extreme case with hot caches etc.
Normally it's not really noticeable in any bigger picture load. For
example, on a big "git diff", the cp_new_stat() function shows up at
0.63% of the load, so making it faster isn't going to make any very
noticeable difference.
The reason I really care is that the stat overhead always does show
up on the kinds of profiles I care about, even if it's never all that
high. In other words, it's more of an annoyance over how stupid that
code was than a huge performance increase.
Linus
* Re: [RFC] speeding up the stat() family of system calls...
From: H. Peter Anvin @ 2013-12-24 0:00 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar, Thomas Gleixner, Al Viro
Cc: the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
On 12/21/2013 12:27 PM, Linus Torvalds wrote:
>
> HOWEVER. On x86, doing an efficient field-at-a-time copy also requires
> us to use put_user_try() and put_user_catch() in order to not have
> tons of clac/stac instructions for the extended permission testing.
> And the implementation of that was actually fairly non-optimal, so to
> actually get the code I wanted, I had to change how that all worked
> too, using "asm_volatile_goto()".
>
I guess I'm a bit puzzled... the current code should be just fine if
everything is present, and do we really care about the performance if we
actually have an error condition?
I'm a bit concerned about the put_user_fail: label having a uniqueness
problem, which I know some versions of gcc at least get very noisy over.
I like the overall approach, however.
-hpa
* Re: [RFC] speeding up the stat() family of system calls...
From: Linus Torvalds @ 2013-12-24 0:12 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers,
linux-fsdevel, Linux Kernel Mailing List
On Mon, Dec 23, 2013 at 4:00 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> I guess I'm a bit puzzled... the current code should be just fine if
> everything is present, and do we really care about the performance if we
> actually have an error condition?
I think we should. You could make it do something like eighteen
expensive page faults in a row for EFAULT, and that's just disgusting,
when there is no reason to do it.
But to be honest, the resulting assembly is also easier to read,
because it doesn't have those annoying bogus branch targets all over
in the middle of the code. That was actually my main issue - looking
at the generated fs/stat.s file and not puking ;)
(it's still hard to read with all the fixup section stuff, but it's
better. And it really does generate better code, so..)
> I'm a bit concerned about the put_user_fail: label having a uniqueness
> problem, which I know some versions of gcc at least get very noisy over.
Oh, you're right, I forgot to actually declare the label so that gcc
sees that it's a local one.
So it needs a
__label__ put_user_fail;
in the put_user_try() (and yes, maybe the label name should have
underscores prepended or something, just to make sure it's internal).
But gcc is perfectly fine with multiple labels in different scopes if
you do that. We already use that in a few places, so this isn't even a
new pattern for us.
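For reference, the pattern in a nutshell (a stand-alone user-space
sketch with made-up names):

#include <stdio.h>

/* Each expansion declares its own block-local "fail" label, so the
 * macro can be used several times in one function without the labels
 * colliding. */
#define checked_div(res, a, b) do {             \
        __label__ fail;                         \
        if ((b) == 0)                           \
                goto fail;                      \
        (res) = (a) / (b);                      \
        break;  /* skip the error path */       \
fail:                                           \
        (res) = -1;                             \
} while (0)

int main(void)
{
        int x, y;

        checked_div(x, 10, 2);          /* first expansion */
        checked_div(y, 10, 0);          /* second one, same function */
        printf("%d %d\n", x, y);        /* prints "5 -1" */
        return 0;
}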
Linus
* Re: [RFC] speeding up the stat() family of system calls...
From: H. Peter Anvin @ 2013-12-24 6:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers,
linux-fsdevel, Linux Kernel Mailing List
Right, the __label__ declaration should take care of it.
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Mon, Dec 23, 2013 at 4:00 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> I guess I'm a bit puzzled... the current code should be just fine if
>> everything is present, and do we really care about the performance
>> if we actually have an error condition?
>
>I think we should. You could make it do something like eighteen
>expensive page faults in a row for EFAULT, and that's just disgusting,
>when there is no reason to do it.
>
>But to be honest, the resulting assembly is also easier to read,
>because it doesn't have those annoying bogus branch targets all over
>in the middle of the code. That was actually my main issue - looking
>at the generated fs/stat.s file and not puking ;)
>
>(it's still hard to read with all the fixup section stuff, but it's
>better. And it really does generate better code, so..)
>
>> I'm a bit concerned about the put_user_fail: label having a
>> uniqueness problem, which I know some versions of gcc at least get
>> very noisy over.
>
>Oh, you're right, I forgot to actually declare the label so that gcc
>sees that it's a local one.
>
>So it needs a
>
> __label__ put_user_fail;
>
>in the put_user_try() (and yes, maybe the label name should have
>underscores prepended or something, just to make sure it's internal).
>
>But gcc is perfectly fine with multiple labels in different scopes if
>you do that. We already use that in a few places, so this isn't even a
>new pattern for us.
>
> Linus
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
* Re: [RFC] speeding up the stat() family of system calls...
From: Ingo Molnar @ 2013-12-24 20:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Anvin, Ingo Molnar, Thomas Gleixner, Al Viro,
the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Here's both x86 people and filesystem people involved, because this
> hacky RFC patch touches both.
>
> NOTE NOTE NOTE! I've modified "cp_new_stat()" in place, in a way that
> is x86-64 specific. So the attached patch *only* works on x86-64, and
> will very actively break on anything else. That's intentional, because
> that way it's more obvious how the code changes, but a real patch
> would create a *new* cp_new_stat() for x86-64, and conditionalize the
> existing generic "cp_new_stat()" on not already having an
> architecture-optimized one.
>
> Basically, under some filesystem loads, "stat()" and friends are the
> most common ops (think tree traversal, but also things like git
> verifying the index). And our "cp_new_stat()" function (which is the
> common interface, ignoring 32-bit legacy stuff) is generic, but
> actually pretty disgusting. It copies things to a temporary 'struct
> stat' buffer on the kernel stack, and then uses copy_to_user() to copy
> it to user space. The double copy is quite noticeable in profiles, and
> it generates a big stack frame too.
>
> By doing an architecture-specific cp_new_stat() function, we can
> improve on that.
>
> HOWEVER. On x86, doing an efficient field-at-a-time copy also requires
> us to use put_user_try() and put_user_catch() in order to not have
> tons of clac/stac instructions for the extended permission testing.
> And the implementation of that was actually fairly non-optimal, so to
> actually get the code I wanted, I had to change how that all worked
> too, using "asm_volatile_goto()".
>
> Thus both x86 and FS people on the list.
>
> Comments? This would obviously be a 3.14 issue, I'm not suggesting
> we'd do this now. I just want to lay the ground-work..
>
> It's tested in the sense that "it works for me", and profiles look
> nice, but..
Looks cool - it looks rather similar to the try/catch model Richard
Henderson came up with many eons ago when we implemented the original
exception mechanism for Linux, which IIRC we decided wasn't safe due
to lack of compiler support.
Now we have compiler support ... written by Richard Henderson ;-)
Thanks,
Ingo
* Re: [RFC] speeding up the stat() family of system calls...
From: Linus Torvalds @ 2013-12-26 19:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Anvin, Ingo Molnar, Thomas Gleixner, Al Viro,
the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1534 bytes --]
On Tue, Dec 24, 2013 at 12:46 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Looks cool - it looks rather similar to the try/catch model Richard
> Henderson came up with many eons ago when we implemented the original
> exception mechanism for Linux, which IIRC we decided wasn't safe due
> to lack of compiler support.
>
> Now we have compiler support ... written by Richard Henderson ;-)
Interestingly, looking at the cp_new_stat() profiles, the games we
play to get efficient range checking seem to actually hurt us. Maybe
it's the "sbb" that is just expensive, or maybe it's turning a (very
predictable) conditional branch into a data dependency chain instead.
Or maybe it's just random noise in my profiles that happened to make
those sbb's look bad.
What do people think about this access_ok() simplification that gets
rid of inline asm, and instead optimizes the (reasonably common) case
of a constant size? The complex case that requires overflow checking
*might* get a bit slower (it's not clear: it really wasn't generating
wonderful code), but the simple case of a known constant size actually
gets simpler.
Random aside: this simplification made the special spurious
access_ok() check in kernel/futex.c (the NULL user pointer check in
futex_init()) just go away entirely, because the compiler could now
see that it can never trigger for a NULL pointer: with addr == 0 and a
constant size, the new "addr > limit - size" test is an unsigned
"0 > x" comparison, which can never be true. Not a performance
issue, but it was kind of funny to notice how getting rid of the
inline asm actually made the compiler able to notice that.
Linus
[-- Attachment #2: vfs-simplify-uaccess-range --]
[-- Type: application/octet-stream, Size: 1573 bytes --]
arch/x86/include/asm/uaccess.h | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 8ec57c07b125..84ecf1df2ac6 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -40,22 +40,28 @@
/*
* Test whether a block of memory is a valid user space address.
* Returns 0 if the range is valid, nonzero otherwise.
- *
- * This is equivalent to the following test:
- * (u33)addr + (u33)size > (u33)current->addr_limit.seg (u65 for x86_64)
- *
- * This needs 33-bit (65-bit for x86_64) arithmetic. We have a carry...
*/
+static inline int __chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
+{
+ /*
+ * If we have used "sizeof()" for the size,
+ * we know it won't overflow the limit (but
+ * it might overflow the 'addr', so it's
+ * important to subtract the size from the
+ * limit, not add it to the address).
+ */
+ if (__builtin_constant_p(size))
+ return addr > limit - size;
+
+ /* Arbitrary sizes? Be careful about overflow */
+ addr += size;
+ return (addr < size) || (addr > limit);
+}
#define __range_not_ok(addr, size, limit) \
({ \
- unsigned long flag, roksum; \
__chk_user_ptr(addr); \
- asm("add %3,%1 ; sbb %0,%0 ; cmp %1,%4 ; sbb $0,%0" \
- : "=&r" (flag), "=r" (roksum) \
- : "1" (addr), "g" ((long)(size)), \
- "rm" (limit)); \
- flag; \
+ __chk_range_not_ok((unsigned long __force)(addr), size, limit); \
})
/**
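To make the overflow concern concrete, here is the same logic as a
stand-alone user-space sketch, including the wraparound case that the
"addr < size" test catches:

#include <limits.h>
#include <stdio.h>

/* the variable-size path from the patch above */
static int range_not_ok(unsigned long addr, unsigned long size,
                        unsigned long limit)
{
        addr += size;           /* may wrap past zero... */
        return (addr < size) || (addr > limit);
}

/* the constant-size fast path: a single compare, relying on a
 * sizeof() size being far smaller than the limit */
static int range_not_ok_const(unsigned long addr, unsigned long size,
                              unsigned long limit)
{
        return addr > limit - size;
}

int main(void)
{
        unsigned long limit = 0x00007ffffffff000UL;     /* illustrative */

        printf("%d\n", range_not_ok(ULONG_MAX - 8, 16, limit)); /* 1: wraps */
        printf("%d\n", range_not_ok(0x1000, 16, limit));        /* 0: fine */
        printf("%d\n", range_not_ok_const(0x1000, 16, limit));  /* 0: fine */
        return 0;
}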
* Re: [RFC] speeding up the stat() family of system calls...
1 sibling, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2013-12-27 0:45 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers,
linux-fsdevel, Linux Kernel Mailing List
On 12/26/2013 11:00 AM, Linus Torvalds wrote:
>
> Interestingly, looking at the cp_new_stat() profiles, the games we
> play to get efficient range checking seem to actually hurt us. Maybe
> it's the "sbb" that is just expensive, or maybe it's turning a (very
> predictable) conditional branch into a data dependency chain instead.
> Or maybe it's just random noise in my profiles that happened to make
> those sbb's look bad.
>
I'm not at all surprised... there is a pretty serious data dependency
chain here and in the end we end up manifesting a value in a register
that has to be tested even though it is available in the flags. Inline
assembly also means the compiler can't optimize it at all.
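For reference, the sequence being deleted is the one from the patch in
the parent mail:

        add %3,%1       # roksum = addr + size; CF=1 if the add wrapped
        sbb %0,%0       # flag = CF ? -1 : 0
        cmp %1,%4       # CF=1 if limit < roksum, i.e. out of range
        sbb $0,%0       # fold that borrow into flag as well

Each instruction depends on the one before it, and the zero/nonzero
result still has to be tested with a compare of its own afterwards.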
I have to wonder if we actually have to test the upper limit, though: we
can always guarantee a guard zone between user space and kernel space,
and thus guarantee either a #PF or #GP if someone tries to overflow user
space. Testing just the lower limit would be much cheaper, especially
on 64 bits where we can simply test the sign bit.
What do you think?
-hpa
* Re: [RFC] speeding up the stat() family of system calls...
From: H. Peter Anvin @ 2013-12-27 3:18 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers,
linux-fsdevel, Linux Kernel Mailing List
Ok the sign bit doesn't really make any sense on second thought... to work with set_fs() we have to load something from memory anyway and then we might as well do a compare...
"H. Peter Anvin" <hpa@zytor.com> wrote:
>On 12/26/2013 11:00 AM, Linus Torvalds wrote:
>>
>> Interestingly, looking at the cp_new_stat() profiles, the games we
>> play to get efficient range checking seem to actually hurt us. Maybe
>> it's the "sbb" that is just expensive, or maybe it's turning a (very
>> predictable) conditional branch into a data dependency chain instead.
>> Or maybe it's just random noise in my profiles that happened to make
>> those sbb's look bad.
>>
>
>I'm not at all surprised... there is a pretty serious data dependency
>chain here and in the end we end up manifesting a value in a register
>that has to be tested even though it is available in the flags. Inline
>assembly also means the compiler can't optimize it at all.
>
>I have to wonder if we actually have to test the upper limit, though:
>we can always guarantee a guard zone between user space and kernel
>space, and thus guarantee either a #PF or #GP if someone tries to
>overflow user space. Testing just the lower limit would be much
>cheaper, especially on 64 bits where we can simply test the sign bit.
>
>What do you think?
>
> -hpa
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
* Re: [RFC] speeding up the stat() family of system calls...
From: H. Peter Anvin @ 2013-12-27 6:09 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers,
linux-fsdevel, Linux Kernel Mailing List
On 12/26/2013 11:00 AM, Linus Torvalds wrote:
>
> Interestingly, looking at the cp_new_stat() profiles, the games we
> play to get efficient range checking seem to actually hurt us. Maybe
> it's the "sbb" that is just expensive, or maybe it's turning a (very
> predictable) conditional branch into a data dependency chain instead.
> Or maybe it's just random noise in my profiles that happened to make
> those sbb's look bad.
>
Much to my surprise, this patch adds almost 10K of text to an
"allyesconfig" build. I wouldn't have expected it. I'll look at it
some more tomorrow.
-hpa
* Re: [RFC] speeding up the stat() family of system calls...
From: H. Peter Anvin @ 2013-12-27 23:30 UTC (permalink / raw)
To: Linus Torvalds, Ingo Molnar
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers,
linux-fsdevel, Linux Kernel Mailing List
On 12/26/2013 10:09 PM, H. Peter Anvin wrote:
> On 12/26/2013 11:00 AM, Linus Torvalds wrote:
>>
>> Interestingly, looking at the cp_new_stat() profiles, the games we
>> play to get efficient range checking seem to actually hurt us. Maybe
>> it's the "sbb" that is just expensive, or maybe it's turning a (very
>> predictable) conditional branch into a data dependency chain instead.
>> Or maybe it's just random noise in my profiles that happened to make
>> those sbb's look bad.
>>
>
> Much to my surprise, this patch adds almost 10K of text to an
> "allyesconfig" build. I wouldn't have expected it. I'll look at it
> some more tomorrow.
>
Mystery solved... it is all code added by gcov &c because a new (inline)
function is added to the code base. So it is fluff, not real.
-hpa
* Re: [RFC] speeding up the stat() family of system calls...
From: Ingo Molnar @ 2014-01-12 17:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, Al Viro,
the arch/x86 maintainers, linux-fsdevel,
Linux Kernel Mailing List
* H. Peter Anvin <hpa@zytor.com> wrote:
> On 12/26/2013 10:09 PM, H. Peter Anvin wrote:
> > On 12/26/2013 11:00 AM, Linus Torvalds wrote:
> >>
> >> Interestingly, looking at the cp_new_stat() profiles, the games we
> >> play to get efficient range checking seem to actually hurt us. Maybe
> >> it's the "sbb" that is just expensive, or maybe it's turning a (very
> >> predictable) conditional branch into a data dependency chain instead.
> >> Or maybe it's just random noise in my profiles that happened to make
> >> those sbb's look bad.
> >>
> >
> > Much to my surprise, this patch adds almost 10K of text to an
> > "allyesconfig" build. I wouldn't have expected it. I'll look at
> > it some more tomorrow.
>
> Mystery solved... it is all code added by gcov &c because a new
> (inline) function is added to the code base. So it is fluff, not
> real.
Yeah, defconfig builds are better for size comparisons; at least on
x86 they are distro-config derived, so they're a lot more relevant to
real life than allyesconfig or allmodconfig.
Thanks,
Ingo