Mostly portable strnlen

linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Mostly portable strnlen_user()
@ 2012-05-25 22:35 Linus Torvalds
  2012-05-25 23:06 ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2012-05-25 22:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-arch


Ok, so I'm going to merge the appended commit into mainline at some point 
here soon, and I thought maybe the BE people wanted to pipe up.

This does *not* use the generic "find zero" or "count bytes" logic that 
David used for strncpy_from_user(), because quite frankly, that one is 
suboptimal on x86 and causes conditionals and crap.

So it's not truly generic, but it's in lib/strnlen_user.c and under the 
CONFIG_GENERIC_STRNLEN_USER config option, because I'm convinced it should 
be possible to do something that does the right thing for everybody. 
Eventually.

But "the right thing" definitely involves arch-specific helpers for doing 
the byte counting, because it *is* going to occasionally be a popcount 
instruction or similar. And it *is* going to take advantage of the fact 
that things are sometimes simply easier on little-endian.

So this isn't truly generic, but without a BE machine I simply cannot care 
enough about the BE case. So I'm posting this as a heads-up before I 
actually merge it, to see.. Maybe the existing lib/strncpy_from_user.c 
could also be improved sufficiently that it would generate the fairly 
optimal code on x86 that our x86 code currently does.. I'll take a look at 
that at some point.

Comments? What's the appropriate *clean* interface to allow architectures 
to tweak the little details?

                      Linus
---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 15:02:46 -0700
Subject: [PATCH] lib: add a generic str[n]len_user() implementation, use it on x86

This is reasonably generic, but unlike the strncpy_from_user() function
it requires the bytemask counting functions from word-at-a-time.h that
only x86 does right now, and still does some things that only work on
little-endian.

With a bit of tweaking, the BE people could use it though, and unlike
the dcache hashing or the strncpy implementation, this has no issues
with architectures that require aligned accesses, since doing
pre-alignment is simple and thus worthwhile even on architectures that
do efficient aligned word accesses.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/Kconfig                  |   1 +
 arch/x86/include/asm/uaccess.h    |   4 ++
 arch/x86/include/asm/uaccess_32.h |  17 -----
 arch/x86/include/asm/uaccess_64.h |   3 -
 arch/x86/lib/usercopy.c           |   2 +-
 arch/x86/lib/usercopy_32.c        |  41 -----------
 arch/x86/lib/usercopy_64.c        |  48 -------------
 lib/Kconfig                       |   3 +
 lib/Makefile                      |   1 +
 lib/strnlen_user.c                | 143 ++++++++++++++++++++++++++++++++++++++
 10 files changed, 153 insertions(+), 110 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 81c3e8be789a..54bbfac5b91a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,6 +93,7 @@ config X86
 	select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
 	select GENERIC_TIME_VSYSCALL if X86_64
 	select KTIME_SCALAR if X86_32
+	select GENERIC_STRNLEN_USER
 
 config INSTRUCTION_DECODER
 	def_bool (KPROBES || PERF_EVENTS || UPROBES)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 851fe0dc13bc..04cd6882308e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -32,6 +32,7 @@
 
 #define segment_eq(a, b)	((a).seg == (b).seg)
 
+#define user_addr_max() (current_thread_info()->addr_limit.seg)
 #define __addr_ok(addr)					\
 	((unsigned long __force)(addr) <		\
 	 (current_thread_info()->addr_limit.seg))
@@ -565,6 +566,9 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n);
 extern __must_check long
 strncpy_from_user(char *dst, const char __user *src, long count);
 
+extern __must_check long strlen_user(const char __user *str);
+extern __must_check long strnlen_user(const char __user *str, long n);
+
 /*
  * movsl can be slow when source and dest are not both 8-byte aligned
  */
diff --git a/arch/x86/include/asm/uaccess_32.h b/arch/x86/include/asm/uaccess_32.h
index 8084bc73b18c..576e39bca6ad 100644
--- a/arch/x86/include/asm/uaccess_32.h
+++ b/arch/x86/include/asm/uaccess_32.h
@@ -213,23 +213,6 @@ static inline unsigned long __must_check copy_from_user(void *to,
 	return n;
 }
 
-/**
- * strlen_user: - Get the size of a string in user space.
- * @str: The string to measure.
- *
- * Context: User context only.  This function may sleep.
- *
- * Get the size of a NUL-terminated string in user space.
- *
- * Returns the size of the string INCLUDING the terminating NUL.
- * On exception, returns 0.
- *
- * If there is a limit on the length of a valid string, you may wish to
- * consider using strnlen_user() instead.
- */
-#define strlen_user(str) strnlen_user(str, LONG_MAX)
-
-long strnlen_user(const char __user *str, long n);
 unsigned long __must_check clear_user(void __user *mem, unsigned long len);
 unsigned long __must_check __clear_user(void __user *mem, unsigned long len);
 
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index fcd4b6f3ef02..8e796fbbf9c6 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -208,9 +208,6 @@ int __copy_in_user(void __user *dst, const void __user *src, unsigned size)
 	}
 }
 
-__must_check long strnlen_user(const char __user *str, long n);
-__must_check long __strnlen_user(const char __user *str, long n);
-__must_check long strlen_user(const char __user *str);
 __must_check unsigned long clear_user(void __user *mem, unsigned long len);
 __must_check unsigned long __clear_user(void __user *mem, unsigned long len);
 
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 2e4e4b02c37a..4239f063f6ac 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -131,7 +131,7 @@ strncpy_from_user(char *dst, const char __user *src, long count)
 	if (unlikely(count <= 0))
 		return 0;
 
-	max_addr = current_thread_info()->addr_limit.seg;
+	max_addr = user_addr_max();
 	src_addr = (unsigned long)src;
 	if (likely(src_addr < max_addr)) {
 		unsigned long max = max_addr - src_addr;
diff --git a/arch/x86/lib/usercopy_32.c b/arch/x86/lib/usercopy_32.c
index 883b216c60b2..1781b2f950e2 100644
--- a/arch/x86/lib/usercopy_32.c
+++ b/arch/x86/lib/usercopy_32.c
@@ -95,47 +95,6 @@ __clear_user(void __user *to, unsigned long n)
 }
 EXPORT_SYMBOL(__clear_user);
 
-/**
- * strnlen_user: - Get the size of a string in user space.
- * @s: The string to measure.
- * @n: The maximum valid length
- *
- * Get the size of a NUL-terminated string in user space.
- *
- * Returns the size of the string INCLUDING the terminating NUL.
- * On exception, returns 0.
- * If the string is too long, returns a value greater than @n.
- */
-long strnlen_user(const char __user *s, long n)
-{
-	unsigned long mask = -__addr_ok(s);
-	unsigned long res, tmp;
-
-	might_fault();
-
-	__asm__ __volatile__(
-		"	testl %0, %0\n"
-		"	jz 3f\n"
-		"	andl %0,%%ecx\n"
-		"0:	repne; scasb\n"
-		"	setne %%al\n"
-		"	subl %%ecx,%0\n"
-		"	addl %0,%%eax\n"
-		"1:\n"
-		".section .fixup,\"ax\"\n"
-		"2:	xorl %%eax,%%eax\n"
-		"	jmp 1b\n"
-		"3:	movb $1,%%al\n"
-		"	jmp 1b\n"
-		".previous\n"
-		_ASM_EXTABLE(0b,2b)
-		:"=&r" (n), "=&D" (s), "=&a" (res), "=&c" (tmp)
-		:"0" (n), "1" (s), "2" (0), "3" (mask)
-		:"cc");
-	return res & mask;
-}
-EXPORT_SYMBOL(strnlen_user);
-
 #ifdef CONFIG_X86_INTEL_USERCOPY
 static unsigned long
 __copy_user_intel(void __user *to, const void *from, unsigned long size)
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 0d0326f388c0..e5b130bc2d0e 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -52,54 +52,6 @@ unsigned long clear_user(void __user *to, unsigned long n)
 }
 EXPORT_SYMBOL(clear_user);
 
-/*
- * Return the size of a string (including the ending 0)
- *
- * Return 0 on exception, a value greater than N if too long
- */
-
-long __strnlen_user(const char __user *s, long n)
-{
-	long res = 0;
-	char c;
-
-	while (1) {
-		if (res>n)
-			return n+1;
-		if (__get_user(c, s))
-			return 0;
-		if (!c)
-			return res+1;
-		res++;
-		s++;
-	}
-}
-EXPORT_SYMBOL(__strnlen_user);
-
-long strnlen_user(const char __user *s, long n)
-{
-	if (!access_ok(VERIFY_READ, s, 1))
-		return 0;
-	return __strnlen_user(s, n);
-}
-EXPORT_SYMBOL(strnlen_user);
-
-long strlen_user(const char __user *s)
-{
-	long res = 0;
-	char c;
-
-	for (;;) {
-		if (get_user(c, s))
-			return 0;
-		if (!c)
-			return res+1;
-		res++;
-		s++;
-	}
-}
-EXPORT_SYMBOL(strlen_user);
-
 unsigned long copy_in_user(void __user *to, const void __user *from, unsigned len)
 {
 	if (access_ok(VERIFY_WRITE, to, len) && access_ok(VERIFY_READ, from, len)) { 
diff --git a/lib/Kconfig b/lib/Kconfig
index 98230ac3db29..64ddc44d0b81 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -19,6 +19,9 @@ config RATIONAL
 config GENERIC_STRNCPY_FROM_USER
 	bool
 
+config GENERIC_STRNLEN_USER
+	bool
+
 config GENERIC_FIND_FIRST_BIT
 	bool
 
diff --git a/lib/Makefile b/lib/Makefile
index b98df505f335..77937a7dd5ce 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -126,6 +126,7 @@ obj-$(CONFIG_CLZ_TAB) += clz_tab.o
 obj-$(CONFIG_DDR) += jedec_ddr_data.o
 
 obj-$(CONFIG_GENERIC_STRNCPY_FROM_USER) += strncpy_from_user.o
+obj-$(CONFIG_GENERIC_STRNLEN_USER) += strnlen_user.o
 
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
new file mode 100644
index 000000000000..3dc78a846b1a
--- /dev/null
+++ b/lib/strnlen_user.c
@@ -0,0 +1,143 @@
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/uaccess.h>
+
+/*
+ * Currently this requires the word-at-a-time
+ * macros for the "generic" part.
+ */
+#include <asm/word-at-a-time.h>
+
+/* Set bits in the first 'n' bytes when loaded from memory */
+#ifdef __LITTLE_ENDIAN
+#  define aligned_byte_mask(n) ((1ul << 8*(n))-1)
+#else
+#  define aligned_byte_mask(n) (~0xfful << 8*(7-(n)))
+#endif
+
+/*
+ * Do a strnlen, return length of string *with* final '\0'.
+ * 'count' is the user-supplied count, while 'max' is the
+ * address space maximum.
+ *
+ * Return 0 for exceptions (which includes hitting the address
+ * space maximum), or 'count+1' if hitting the user-supplied
+ * maximum count.
+ *
+ * NOTE! We can sometimes overshoot the user-supplied maximum
+ * if it fits in a aligned 'long'. The caller needs to check
+ * the return value against "> max".
+ */
+static inline long do_strnlen_user(const char __user *src, unsigned long count, unsigned long max)
+{
+	long align, res = 0;
+	unsigned long c;
+
+	/*
+	 * Truncate 'max' to the user-specified limit, so that
+	 * we only have one limit we need to check in the loop
+	 */
+	if (max > count)
+		max = count;
+
+	/*
+	 * Do everything aligned. But that means that we
+	 * need to also expand the maximum..
+	 */
+	align = (sizeof(long) - 1) & (unsigned long)src;
+	src -= align;
+	max += align;
+
+	if (unlikely(__get_user(c,(unsigned long __user *)src)))
+		return 0;
+	c |= aligned_byte_mask(align);
+
+	for (;;) {
+		unsigned long mask;
+
+		mask = has_zero(c);
+		if (mask) {
+			mask = (mask - 1) & ~mask;
+			mask >>= 7;
+			return res + count_masked_bytes(mask) + 1 - align;
+		}
+		res += sizeof(unsigned long);
+		if (unlikely(max < sizeof(unsigned long)))
+			break;
+		max -= sizeof(unsigned long);
+		if (unlikely(__get_user(c,(unsigned long __user *)(src+res))))
+			return 0;
+	}
+	res -= align;
+
+	/*
+	 * Uhhuh. We hit 'max'. But was that the user-specified maximum
+	 * too? If so, return the marker for "too long".
+	 */
+	if (res >= count)
+		return count+1;
+
+	/*
+	 * Nope: we hit the address space limit, and we still had more
+	 * characters the caller would have wanted. That's 0.
+	 */
+	return 0;
+}
+
+/**
+ * strnlen_user: - Get the size of a user string INCLUDING final NUL.
+ * @str: The string to measure.
+ * @count: Maximum count (including NUL character)
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+ * Returns the size of the string INCLUDING the terminating NUL.
+ * If the string is too long, returns 'count+1'.
+ * On exception (or invalid count), returns 0.
+ */
+long strnlen_user(const char __user *str, long count)
+{
+	unsigned long max_addr, src_addr;
+
+	if (unlikely(count <= 0))
+		return 0;
+
+	max_addr = user_addr_max();
+	src_addr = (unsigned long)str;
+	if (likely(src_addr < max_addr)) {
+		unsigned long max = max_addr - src_addr;
+		return do_strnlen_user(str, count, max);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(strnlen_user);
+
+/**
+ * strlen_user: - Get the size of a user string INCLUDING final NUL.
+ * @str: The string to measure.
+ *
+ * Context: User context only.  This function may sleep.
+ *
+ * Get the size of a NUL-terminated string in user space.
+ *
+ * Returns the size of the string INCLUDING the terminating NUL.
+ * On exception, returns 0.
+ *
+ * If there is a limit on the length of a valid string, you may wish to
+ * consider using strnlen_user() instead.
+ */
+long strlen_user(const char __user *str)
+{
+	unsigned long max_addr, src_addr;
+
+	max_addr = user_addr_max();
+	src_addr = (unsigned long)str;
+	if (likely(src_addr < max_addr)) {
+		unsigned long max = max_addr - src_addr;
+		return do_strnlen_user(str, ~0ul, max);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(strlen_user);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-25 22:35 Mostly portable strnlen_user() Linus Torvalds
@ 2012-05-25 23:06 ` David Miller
  2012-05-25 23:11   ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-25 23:06 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 15:35:38 -0700 (PDT)

> So it's not truly generic, but it's in lib/strnlen_user.c and under the 
> CONFIG_GENERIC_STRNLEN_USER config option, because I'm convinced it should 
> be possible to do something that does the right thing for everybody. 

It should be relatively easy, especially since you work with aligned
words.

> What's the appropriate *clean* interface to allow architectures 
> to tweak the little details?

Maybe something like:

#define HAS_ZERO_CONST_DECLARE	...
#define HAS_ZERO_VAR_DECLARE	...
#define HAS_ZERO_ARGS		...

static inline unsigned long has_zero(unsigned long c ...)
{
}

#define FIND_ZERO_ARGS		...

static inline long find_zero(unsigned long c, unsigned long mask ...)
{
}

So at the use sites it's something like:

long do_strwhatever_user()
{
	HAS_ZERO_CONST_DECLARE;

	...

	while (max >= sizeof(unsigned long)) {
		unsigned long mask, c;
		HAS_ZERO_VAR_DECLARE;

		mask = has_zero(c HAS_ZERO_ARGS);
		if (mask)
			return res + find_zero(c, mask FIND_ZERO_ARGS);
	...
	}
	...
}

A little ugly.  The main problem is the magic args macros.

We can kill off the HAS_ZERO_CONST_DECLARE booger by instead
exposing the magic constants used by the particular has_zero()
implementation.  Then we'll have:

#define HAVE_ZERO_CONST1	REPEAT_BYTE(X)
#define HAVE_ZERO_CONST2	REPEAT_BYTE(X)

static inline unsigned long has_zero(unsigned long c,
				     const unsigned long magic1,
				     const unsigned long magic2);

and then it becomes something like:

long do_strwhatever_user()
{
	const unsigned long magic1 = HAVE_ZERO_CONST1;
	const unsigned long magic2 = HAVE_ZERO_CONST2;

	...

	while (max >= sizeof(unsigned long)) {
		unsigned long mask, c;
		HAS_ZERO_VAR_DECLARE;

		mask = has_zero(c, magic1, magic2);
		if (mask)
			return res + find_zero(c, mask, magic1, magic2);

The remaining problem, which I was trying to accomodate with the funky
ARGS macros, is propagating part of the has_zero() calculation into the
find_zero() call.

Hmmm...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-25 23:06 ` David Miller
@ 2012-05-25 23:11   ` Linus Torvalds
  2012-05-25 23:14     ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2012-05-25 23:11 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

On Fri, May 25, 2012 at 4:06 PM, David Miller <davem@davemloft.net> wrote:
>
> Maybe something like:
>
> #define HAS_ZERO_CONST_DECLARE  ...
> #define HAS_ZERO_VAR_DECLARE    ...
> #define HAS_ZERO_ARGS           ...

Ugh. I'd rather then have a opaque typedef that may be a single word
or a couple of words. Then it just declares that (without knowing what
it is), and passes it around to the "find zeroes" and "get length from
mask" functions.

For x86, it would be that single "mask" variable, for sparc it would
be a struct of the "v" and "rhs" variables.

Hmm?

                             Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-25 23:11   ` Linus Torvalds
@ 2012-05-25 23:14     ` David Miller
  2012-05-25 23:37       ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-25 23:14 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 16:11:36 -0700

> Ugh. I'd rather then have a opaque typedef that may be a single word
> or a couple of words. Then it just declares that (without knowing what
> it is), and passes it around to the "find zeroes" and "get length from
> mask" functions.
> 
> For x86, it would be that single "mask" variable, for sparc it would
> be a struct of the "v" and "rhs" variables.
> 
> Hmm?

That might be better.

I suppose then I'd need to make BE's has_zero() a macro instead of a
function.  Either that or we pass a pointer to this opaque typedef
thing.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-25 23:14     ` David Miller
@ 2012-05-25 23:37       ` Linus Torvalds
  2012-05-25 23:41         ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2012-05-25 23:37 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

On Fri, May 25, 2012 at 4:14 PM, David Miller <davem@davemloft.net> wrote:
>
> I suppose then I'd need to make BE's has_zero() a macro instead of a
> function.  Either that or we pass a pointer to this opaque typedef
> thing.

Gcc is *usually* pretty good about optimizing small structures on the
stack, even if you pass a pointer (if the pointer then always gets
dereferenced within that function). So I think you could try the
"pointer to opaque thing" approach and see.

But yeah, the macro approach obviously puts much less reliance on the
optimizer getting things right, so it might be the way to go if it
turns out that gcc screws up code generation.

                  Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-25 23:37       ` Linus Torvalds
@ 2012-05-25 23:41         ` David Miller
  2012-05-26  0:41           ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-25 23:41 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 16:37:40 -0700

> On Fri, May 25, 2012 at 4:14 PM, David Miller <davem@davemloft.net> wrote:
>>
>> I suppose then I'd need to make BE's has_zero() a macro instead of a
>> function.  Either that or we pass a pointer to this opaque typedef
>> thing.
> 
> Gcc is *usually* pretty good about optimizing small structures on the
> stack, even if you pass a pointer (if the pointer then always gets
> dereferenced within that function). So I think you could try the
> "pointer to opaque thing" approach and see.
> 
> But yeah, the macro approach obviously puts much less reliance on the
> optimizer getting things right, so it might be the way to go if it
> turns out that gcc screws up code generation.

I'm playing around with this now, I should have something for you
to look at in the next few hours.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-25 23:41         ` David Miller
@ 2012-05-26  0:41           ` David Miller
  2012-05-26  1:09             ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-26  0:41 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: David Miller <davem@davemloft.net>
Date: Fri, 25 May 2012 19:41:25 -0400 (EDT)

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 25 May 2012 16:37:40 -0700
> 
>> On Fri, May 25, 2012 at 4:14 PM, David Miller <davem@davemloft.net> wrote:
>>>
>>> I suppose then I'd need to make BE's has_zero() a macro instead of a
>>> function.  Either that or we pass a pointer to this opaque typedef
>>> thing.
>> 
>> Gcc is *usually* pretty good about optimizing small structures on the
>> stack, even if you pass a pointer (if the pointer then always gets
>> dereferenced within that function). So I think you could try the
>> "pointer to opaque thing" approach and see.
>> 
>> But yeah, the macro approach obviously puts much less reliance on the
>> optimizer getting things right, so it might be the way to go if it
>> turns out that gcc screws up code generation.
> 
> I'm playing around with this now, I should have something for you
> to look at in the next few hours.

Ok, something like the following, just to give you an idea.  Just the
sparc word-at-a-time.h header addition and the lib/strnlen_user.c
changes.

Probably I should add little-endian code to the header and put it
under asm-generic instead since there isn't really anything sparc
specific about it and this will make it easier for other ports to use
this stuff.

This is untested and of course we'd need to tweak the x86
word-at-a-time header and adjust the other call sites in
the DCACHE and elsewhere.

In fact, one thing I need to do next is see how suitable these
interface adjustments are for those uses.  I've only really made sure
that they work out for the new generic lib/str*.c routines.

Looking at the assembler, gcc does a reasonable job with the
aggregates, optimizing them just as if they were scalar local
variables.

diff --git a/arch/sparc/include/asm/word-at-a-time.h b/arch/sparc/include/asm/word-at-a-time.h
new file mode 100644
index 0000000..7fb32e1
--- /dev/null
+++ b/arch/sparc/include/asm/word-at-a-time.h
@@ -0,0 +1,49 @@
+#ifndef _ASM_WORD_AT_A_TIME_H
+#define _ASM_WORD_AT_A_TIME_H
+
+#include <linux/kernel.h>
+
+struct word_op_data {
+	const unsigned long high_bits;
+	const unsigned long low_bits;
+	unsigned long rhs;
+};
+
+typedef struct word_op_data word_op_data_t;
+
+#define INIT_WORD_OP_DATA			\
+{	.high_bits = REPEAT_BYTE(0xfe) + 1,	\
+	.low_bits  = REPEAT_BYTE(0x7f),		\
+}
+
+#define DEFINE_WORD_OP_DATA(X)	\
+	word_op_data_t X = INIT_WORD_OP_DATA
+
+static inline long find_zero(unsigned long c, word_op_data_t *p)
+{
+	unsigned long mask = (c & p->low_bits) + p->low_bits;
+	long byte;
+
+	mask = ~(mask | p->rhs);
+
+	byte = 0;
+#ifdef CONFIG_64BIT
+	if (mask >> 32)
+		mask >>= 32;
+	else
+		byte = 4;
+#endif
+	if (mask >> 16)
+		mask >>= 16;
+	else
+		byte += 2;
+	return (mask >> 8) ? byte : byte + 1;
+}
+
+static inline bool has_zero(unsigned long c, word_op_data_t *p)
+{
+	p->rhs = c | p->low_bits;
+	return (c + p->high_bits) & ~p->rhs;
+}
+
+#endif /* _ASM_WORD_AT_A_TIME_H */
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 3dc78a8..d5fdd26 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -30,6 +30,7 @@
  */
 static inline long do_strnlen_user(const char __user *src, unsigned long count, unsigned long max)
 {
+	DEFINE_WORD_OP_DATA(wd);
 	long align, res = 0;
 	unsigned long c;
 
@@ -53,14 +54,8 @@ static inline long do_strnlen_user(const char __user *src, unsigned long count,
 	c |= aligned_byte_mask(align);
 
 	for (;;) {
-		unsigned long mask;
-
-		mask = has_zero(c);
-		if (mask) {
-			mask = (mask - 1) & ~mask;
-			mask >>= 7;
-			return res + count_masked_bytes(mask) + 1 - align;
-		}
+		if (has_zero(c, &wd))
+			return res + find_zero(c, &wd) + 1 - align;
 		res += sizeof(unsigned long);
 		if (unlikely(max < sizeof(unsigned long)))
 			break;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  0:41           ` David Miller
@ 2012-05-26  1:09             ` Linus Torvalds
  2012-05-26  1:21               ` David Miller
  2012-05-26  2:13               ` Linus Torvalds
  0 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26  1:09 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

On Fri, May 25, 2012 at 5:41 PM, David Miller <davem@davemloft.net> wrote:
>
> Ok, something like the following, just to give you an idea.  Just the
> sparc word-at-a-time.h header addition and the lib/strnlen_user.c
> changes.

Looks good to me.

> Probably I should add little-endian code to the header and put it
> under asm-generic instead since there isn't really anything sparc
> specific about it and this will make it easier for other ports to use
> this stuff.

Actually, I think it's better if we just do two separate ones: a
"generic little-endian" and a "generic big-endian" one. Rather than
mixing both into one file that is then included from some arch-file
that statically knows its endianness instead.

> This is untested and of course we'd need to tweak the x86
> word-at-a-time header and adjust the other call sites in
> the DCACHE and elsewhere.

I can do that. Give me a few minutes. Or more. Let's see how it ends up..

                 Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  1:09             ` Linus Torvalds
@ 2012-05-26  1:21               ` David Miller
  2012-05-26  2:13               ` Linus Torvalds
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2012-05-26  1:21 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 18:09:33 -0700

> On Fri, May 25, 2012 at 5:41 PM, David Miller <davem@davemloft.net> wrote:
>> Probably I should add little-endian code to the header and put it
>> under asm-generic instead since there isn't really anything sparc
>> specific about it and this will make it easier for other ports to use
>> this stuff.
> 
> Actually, I think it's better if we just do two separate ones: a
> "generic little-endian" and a "generic big-endian" one. Rather than
> mixing both into one file that is then included from some arch-file
> that statically knows its endianness instead.

Agreed, it will look a lot nicer that way.

>> This is untested and of course we'd need to tweak the x86
>> word-at-a-time header and adjust the other call sites in
>> the DCACHE and elsewhere.
> 
> I can do that. Give me a few minutes. Or more. Let's see how it ends
> up..

Ok.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  1:09             ` Linus Torvalds
  2012-05-26  1:21               ` David Miller
@ 2012-05-26  2:13               ` Linus Torvalds
  2012-05-26  2:43                 ` David Miller
  1 sibling, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26  2:13 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

On Fri, May 25, 2012 at 6:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I can do that. Give me a few minutes. Or more. Let's see how it ends up..

Ugh. So this was not at all as trivial as I thought it would be.

One issue is just syntactic: the fact that you want to initialize
those constants just results in nasty syntax. Is gcc really so bad on
sparc that it doesn't do the obvious CSE etc on the constants? That
surprises me, because it does it on x86-64..

But the more annoying issue is that the dentry code really does depend
on the fact that we can just combine the masks for two values (the
zero bytes and the '/' bytes) and test and munge them together. And
that's simply not true of your big-endian version.

I think that instead of the "find_zero()" function, we need a
"generate_mask()" function that actually generates the reliable mask
of "these bytes have a bit set if the byte was zero". That's the

  unsigned long mask = (c & p->low_bits) + p->low_bits;
  mask = (mask | p->rhs);

phase of the big-endian logic (and would be a no-op on little endian).

So it looks like we'd really want to have three phases:

 - the "quick check if it has a zero" (and save that value away as the
preliminary mask)

   This is what the loop generates and tests end on, and could
(obviously) be or'ed together in the *test* to check two cases at the
same time.

      preliminary = has_zero(val);

 - the "reliable mask" that is generated from the quick mask and the last value

   This would be a "prep" function right after the loop exit, and it
could be or'ed together to have just one value for the final phase:

        reliable = prep_zero_mask(val, preliminary);

 - the find zero byte (and for some users, find "mask" that goes with
it) from the single value generated above.

        offset = find_zero(reliable);

But quite frankly, your three-value struct makes this much uglier than
I think it would need to be. That's *especially* true since the
constants should be shared, even if you have two different values
going on concurrently for the dentry case (the NUL and '/' values).

Because if I read the sparc code correctly, it really never needs more
than one single word at any time. The only reason you have that opaque
data structure is that the other two words are those constants that
you preload, and that really the compiler *should* be able to CSE just
fine. Do you *really* need to CSE them by hand?

I think we could get rid of that structure entirely, and just have
everybody use a single "unsigned long". If you just used the constants
directly in the code and could rely on the compiler. Hmm?

So how bad does the code look on sparc if you just get rid of the
"low_bits" and "high_bits" and just replace them with the constants
they are?

                        Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  2:13               ` Linus Torvalds
@ 2012-05-26  2:43                 ` David Miller
  2012-05-26  3:56                   ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-26  2:43 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 19:13:31 -0700

> One issue is just syntactic: the fact that you want to initialize
> those constants just results in nasty syntax. Is gcc really so bad on
> sparc that it doesn't do the obvious CSE etc on the constants? That
> surprises me, because it does it on x86-64..

It CSE's them into the loop, but like I said a few days
ago it doesn't CSE them into the find_zero() code block.

> Do you *really* need to CSE them by hand?

Yes, GCC refused to do it.

It CSEs it into the loop, but not the tail code.

> So how bad does the code look on sparc if you just get rid of the
> "low_bits" and "high_bits" and just replace them with the constants
> they are?

It looks like GCC reconstituting the constants for the zero byte
determination, which on 64-bit is nearly half of the instructions
of the exit path.

But you shouldn't really need to care about this, we can surely
abstract it behind something.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  2:43                 ` David Miller
@ 2012-05-26  3:56                   ` Linus Torvalds
  2012-05-26  4:15                     ` David Miller
  2012-05-26  4:16                     ` Linus Torvalds
  0 siblings, 2 replies; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26  3:56 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On Fri, May 25, 2012 at 7:43 PM, David Miller <davem@davemloft.net> wrote:
>
> It CSE's them into the loop, but like I said a few days
> ago it doesn't CSE them into the find_zero() code block.

Ok, I think I have a solution.

This is *totally* untested, but it compiles on x86. And I think it's
"close to the right thing".

It makes those constants explicit, so that sharing them is easy when
we have two different users, and actually does that in fs/namei.c.

The interface is a bit odd, but the rules are:

 - has_zero -> prep_zero_mask -> create_zero_mask -> { zero_bytemask ,
find_zero }

where two masks that have been created by prep_zero_mask can be or'ed
together to create one "one or the other" mask.

So the has_zero -> prep_zero_mask boundary is due to your BE
efficiency issue (ie "has_zero()" goes inside the loop, and
"prep_zero_mask()" goes outside).

The prep_zero_mask -> create_zero_mask boundary is due to that "we can
combine multiple masks at this level" issue.

And the create_zero_mask -> { zero_bytemask , find_zero } boundary is
because we actually want to create a bytemask for some things, and
just find the zero for others.

Does this *work*? I don't know. But it generates code that looks *roughly* sane.

Btw, this patch replaces the one you sent - the interdiff didn't look
sane (but it goes on top of the one I sent out earlier). And I just
put your BE-specific header into asm-generic after all.

Does this look sane to you?

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 11569 bytes --]

 arch/openrisc/include/asm/Kbuild      |  1 +
 arch/sparc/include/asm/Kbuild         |  1 +
 arch/x86/Kconfig                      |  1 +
 arch/x86/include/asm/word-at-a-time.h | 33 ++++++++++--
 arch/x86/lib/usercopy.c               | 97 -----------------------------------
 fs/namei.c                            | 22 ++++----
 include/asm-generic/word-at-a-time.h  | 44 ++++++++++++++++
 lib/strncpy_from_user.c               | 47 +++--------------
 lib/strnlen_user.c                    | 13 +++--
 9 files changed, 102 insertions(+), 157 deletions(-)

diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index c936483bc8e2..3f35c38d7b64 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -66,3 +66,4 @@ generic-y += topology.h
 generic-y += types.h
 generic-y += ucontext.h
 generic-y += user.h
+generic-y += word-at-a-time.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 2c2e38821f60..67f83e0a0d68 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -21,3 +21,4 @@ generic-y += div64.h
 generic-y += local64.h
 generic-y += irq_regs.h
 generic-y += local.h
+generic-y += word-at-a-time.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 54bbfac5b91a..d700811785ea 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,6 +93,7 @@ config X86
 	select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
 	select GENERIC_TIME_VSYSCALL if X86_64
 	select KTIME_SCALAR if X86_32
+	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 
 config INSTRUCTION_DECODER
diff --git a/arch/x86/include/asm/word-at-a-time.h b/arch/x86/include/asm/word-at-a-time.h
index ae03facfadd6..0a993cae80a8 100644
--- a/arch/x86/include/asm/word-at-a-time.h
+++ b/arch/x86/include/asm/word-at-a-time.h
@@ -13,6 +13,12 @@
 
 #ifdef CONFIG_64BIT
 
+struct word_at_a_time {
+	const unsigned long one_bits, high_bits;
+};
+
+#define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0x01), REPEAT_BYTE(0x80) }
+
 /*
  * Jan Achrenius on G+: microoptimized version of
  * the simpler "(mask & ONEBYTES) * ONEBYTES >> 56"
@@ -37,10 +43,31 @@ static inline long count_masked_bytes(long mask)
 
 #endif
 
-/* Return the high bit set in the first byte that is a zero */
-static inline unsigned long has_zero(unsigned long a)
+/* Return nonzero if it has a zero */
+static inline unsigned long has_zero(unsigned long a, unsigned long *bits, const struct word_at_a_time *c)
+{
+	unsigned long mask = ((a - c->one_bits) & ~a) & c->high_bits;
+	*bits = mask;
+	return mask;
+}
+
+static inline unsigned long prep_zero_mask(unsigned long a, unsigned long bits, const struct word_at_a_time *c)
+{
+	return bits;
+}
+
+static inline unsigned long create_zero_mask(unsigned long bits)
+{
+	bits = (bits - 1) & ~bits;
+	return bits >> 7;
+}
+
+/* The mask we created is directly usable as a bytemask */
+#define zero_bytemask(mask) (mask)
+
+static inline unsigned long find_zero(unsigned long mask)
 {
-	return ((a - REPEAT_BYTE(0x01)) & ~a) & REPEAT_BYTE(0x80);
+	return count_masked_bytes(mask);
 }
 
 /*
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 4239f063f6ac..f61ee67ec00f 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -43,100 +43,3 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	return len;
 }
 EXPORT_SYMBOL_GPL(copy_from_user_nmi);
-
-/*
- * Do a strncpy, return length of string without final '\0'.
- * 'count' is the user-supplied count (return 'count' if we
- * hit it), 'max' is the address space maximum (and we return
- * -EFAULT if we hit it).
- */
-static inline long do_strncpy_from_user(char *dst, const char __user *src, long count, unsigned long max)
-{
-	long res = 0;
-
-	/*
-	 * Truncate 'max' to the user-specified limit, so that
-	 * we only have one limit we need to check in the loop
-	 */
-	if (max > count)
-		max = count;
-
-	while (max >= sizeof(unsigned long)) {
-		unsigned long c, mask;
-
-		/* Fall back to byte-at-a-time if we get a page fault */
-		if (unlikely(__get_user(c,(unsigned long __user *)(src+res))))
-			break;
-		mask = has_zero(c);
-		if (mask) {
-			mask = (mask - 1) & ~mask;
-			mask >>= 7;
-			*(unsigned long *)(dst+res) = c & mask;
-			return res + count_masked_bytes(mask);
-		}
-		*(unsigned long *)(dst+res) = c;
-		res += sizeof(unsigned long);
-		max -= sizeof(unsigned long);
-	}
-
-	while (max) {
-		char c;
-
-		if (unlikely(__get_user(c,src+res)))
-			return -EFAULT;
-		dst[res] = c;
-		if (!c)
-			return res;
-		res++;
-		max--;
-	}
-
-	/*
-	 * Uhhuh. We hit 'max'. But was that the user-specified maximum
-	 * too? If so, that's ok - we got as much as the user asked for.
-	 */
-	if (res >= count)
-		return res;
-
-	/*
-	 * Nope: we hit the address space limit, and we still had more
-	 * characters the caller would have wanted. That's an EFAULT.
-	 */
-	return -EFAULT;
-}
-
-/**
- * strncpy_from_user: - Copy a NUL terminated string from userspace.
- * @dst:   Destination address, in kernel space.  This buffer must be at
- *         least @count bytes long.
- * @src:   Source address, in user space.
- * @count: Maximum number of bytes to copy, including the trailing NUL.
- *
- * Copies a NUL-terminated string from userspace to kernel space.
- *
- * On success, returns the length of the string (not including the trailing
- * NUL).
- *
- * If access to userspace fails, returns -EFAULT (some data may have been
- * copied).
- *
- * If @count is smaller than the length of the string, copies @count bytes
- * and returns @count.
- */
-long
-strncpy_from_user(char *dst, const char __user *src, long count)
-{
-	unsigned long max_addr, src_addr;
-
-	if (unlikely(count <= 0))
-		return 0;
-
-	max_addr = user_addr_max();
-	src_addr = (unsigned long)src;
-	if (likely(src_addr < max_addr)) {
-		unsigned long max = max_addr - src_addr;
-		return do_strncpy_from_user(dst, src, count, max);
-	}
-	return -EFAULT;
-}
-EXPORT_SYMBOL(strncpy_from_user);
diff --git a/fs/namei.c b/fs/namei.c
index 93ff12b1a1de..c651f02c9fec 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1452,7 +1452,8 @@ EXPORT_SYMBOL(full_name_hash);
  */
 static inline unsigned long hash_name(const char *name, unsigned int *hashp)
 {
-	unsigned long a, mask, hash, len;
+	unsigned long a, b, adata, bdata, mask, hash, len;
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 
 	hash = a = 0;
 	len = -sizeof(unsigned long);
@@ -1460,17 +1461,18 @@ static inline unsigned long hash_name(const char *name, unsigned int *hashp)
 		hash = (hash + a) * 9;
 		len += sizeof(unsigned long);
 		a = load_unaligned_zeropad(name+len);
-		/* Do we have any NUL or '/' bytes in this word? */
-		mask = has_zero(a) | has_zero(a ^ REPEAT_BYTE('/'));
-	} while (!mask);
-
-	/* The mask *below* the first high bit set */
-	mask = (mask - 1) & ~mask;
-	mask >>= 7;
-	hash += a & mask;
+		b = a ^ REPEAT_BYTE('/');
+	} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
+
+	adata = prep_zero_mask(a, adata, &constants);
+	bdata = prep_zero_mask(b, bdata, &constants);
+
+	mask = create_zero_mask(adata | bdata);
+
+	hash += a & zero_bytemask(mask);
 	*hashp = fold_hash(hash);
 
-	return len + count_masked_bytes(mask);
+	return len + find_zero(mask);
 }
 
 #else
diff --git a/include/asm-generic/word-at-a-time.h b/include/asm-generic/word-at-a-time.h
new file mode 100644
index 000000000000..90c3d41188da
--- /dev/null
+++ b/include/asm-generic/word-at-a-time.h
@@ -0,0 +1,44 @@
+#ifndef _ASM_WORD_AT_A_TIME_H
+#define _ASM_WORD_AT_A_TIME_H
+
+#include <linux/kernel.h>
+
+struct word_at_a_time {
+	const unsigned long high_bits, low_bits;
+};
+
+#define DECLARE_CONST \
+	const struct word_at_a_time constants = { REPEAT_BYTE(0xfe) + 1, REPEAT_BYTE(0x7f) };
+#define USE_CONST , &constants
+
+/* Bit set in the bytes that have a zero */
+static inline long create_zero_mask(unsigned long val, unsigned long rhs, const struct word_at_a_time *c)
+{
+	unsigned long mask = (val & c->low_bits) + c->low_bits;
+	return ~(mask | rhs);
+}
+
+static inline long find_zero(unsigned long mask)
+{
+	long byte = 0;
+#ifdef CONFIG_64BIT
+	if (mask >> 32)
+		mask >>= 32;
+	else
+		byte = 4;
+#endif
+	if (mask >> 16)
+		mask >>= 16;
+	else
+		byte += 2;
+	return (mask >> 8) ? byte : byte + 1;
+}
+
+static inline bool has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c)
+{
+	unsigned long rhs = val | c->low_bits;
+	*data = rhs;
+	return (val + c->high_bits) & ~rhs;
+}
+
+#endif /* _ASM_WORD_AT_A_TIME_H */
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index c4c09b0e96ba..bb2b201d6ad0 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -4,37 +4,7 @@
 #include <linux/errno.h>
 
 #include <asm/byteorder.h>
-
-static inline long find_zero(unsigned long mask)
-{
-	long byte = 0;
-
-#ifdef __BIG_ENDIAN
-#ifdef CONFIG_64BIT
-	if (mask >> 32)
-		mask >>= 32;
-	else
-		byte = 4;
-#endif
-	if (mask >> 16)
-		mask >>= 16;
-	else
-		byte += 2;
-	return (mask >> 8) ? byte : byte + 1;
-#else
-#ifdef CONFIG_64BIT
-	if (!((unsigned int) mask)) {
-		mask >>= 32;
-		byte = 4;
-	}
-#endif
-	if (!(mask & 0xffff)) {
-		mask >>= 16;
-		byte += 2;
-	}
-	return (mask & 0xff) ? byte : byte + 1;
-#endif
-}
+#include <asm/word-at-a-time.h>
 
 #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 #define IS_UNALIGNED(src, dst)	0
@@ -51,8 +21,7 @@ static inline long find_zero(unsigned long mask)
  */
 static inline long do_strncpy_from_user(char *dst, const char __user *src, long count, unsigned long max)
 {
-	const unsigned long high_bits = REPEAT_BYTE(0xfe) + 1;
-	const unsigned long low_bits = REPEAT_BYTE(0x7f);
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 	long res = 0;
 
 	/*
@@ -66,18 +35,16 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long
 		goto byte_at_a_time;
 
 	while (max >= sizeof(unsigned long)) {
-		unsigned long c, v, rhs;
+		unsigned long c, data;
 
 		/* Fall back to byte-at-a-time if we get a page fault */
 		if (unlikely(__get_user(c,(unsigned long __user *)(src+res))))
 			break;
-		rhs = c | low_bits;
-		v = (c + high_bits) & ~rhs;
 		*(unsigned long *)(dst+res) = c;
-		if (v) {
-			v = (c & low_bits) + low_bits;
-			v = ~(v | rhs);
-			return res + find_zero(v);
+		if (has_zero(c, &data, &constants)) {
+			data = prep_zero_mask(c, data, &constants);
+			data = create_zero_mask(data);
+			return res + find_zero(data);
 		}
 		res += sizeof(unsigned long);
 		max -= sizeof(unsigned long);
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 3dc78a846b1a..735f91ee9b94 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -30,6 +30,7 @@
  */
 static inline long do_strnlen_user(const char __user *src, unsigned long count, unsigned long max)
 {
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 	long align, res = 0;
 	unsigned long c;
 
@@ -53,13 +54,11 @@ static inline long do_strnlen_user(const char __user *src, unsigned long count,
 	c |= aligned_byte_mask(align);
 
 	for (;;) {
-		unsigned long mask;
-
-		mask = has_zero(c);
-		if (mask) {
-			mask = (mask - 1) & ~mask;
-			mask >>= 7;
-			return res + count_masked_bytes(mask) + 1 - align;
+		unsigned long data;
+		if (has_zero(c, &data, &constants)) {
+			data = prep_zero_mask(c, data, &constants);
+			data = create_zero_mask(data);
+			return res + find_zero(data) + 1 - align;
 		}
 		res += sizeof(unsigned long);
 		if (unlikely(max < sizeof(unsigned long)))

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  3:56                   ` Linus Torvalds
@ 2012-05-26  4:15                     ` David Miller
  2012-05-26  4:19                       ` Linus Torvalds
  2012-05-26  4:16                     ` Linus Torvalds
  1 sibling, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-26  4:15 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 20:56:37 -0700

> Does this look sane to you?

At first glance, it does.

I play with this and try to get it working on sparc tonight, thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  3:56                   ` Linus Torvalds
  2012-05-26  4:15                     ` David Miller
@ 2012-05-26  4:16                     ` Linus Torvalds
  1 sibling, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26  4:16 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

[-- Attachment #1: Type: text/plain, Size: 668 bytes --]

On Fri, May 25, 2012 at 8:56 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This is *totally* untested, but it compiles on x86. And I think it's
> "close to the right thing".

It actually works on x86 - I just booted it, and it all seems ok. The
extra abstraction generates *one* extra instruction in fs/namei.c, but
on the whole that all seems ok.

But the asm-generic.h thing was in a half-arsed half-way-state between
your version and my final interface, so it sure as hell wouldn't have
worked on sparc or openrisc.

This is the "fix up that header file to actually do what it is
supposed to do" version. Sorry about that,

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 11542 bytes --]

 arch/openrisc/include/asm/Kbuild      |  1 +
 arch/sparc/include/asm/Kbuild         |  1 +
 arch/x86/Kconfig                      |  1 +
 arch/x86/include/asm/word-at-a-time.h | 33 ++++++++++--
 arch/x86/lib/usercopy.c               | 97 -----------------------------------
 fs/namei.c                            | 22 ++++----
 include/asm-generic/word-at-a-time.h  | 44 ++++++++++++++++
 lib/strncpy_from_user.c               | 47 +++--------------
 lib/strnlen_user.c                    | 13 +++--
 9 files changed, 102 insertions(+), 157 deletions(-)

diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index c936483bc8e2..3f35c38d7b64 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -66,3 +66,4 @@ generic-y += topology.h
 generic-y += types.h
 generic-y += ucontext.h
 generic-y += user.h
+generic-y += word-at-a-time.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 2c2e38821f60..67f83e0a0d68 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -21,3 +21,4 @@ generic-y += div64.h
 generic-y += local64.h
 generic-y += irq_regs.h
 generic-y += local.h
+generic-y += word-at-a-time.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 54bbfac5b91a..d700811785ea 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,6 +93,7 @@ config X86
 	select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
 	select GENERIC_TIME_VSYSCALL if X86_64
 	select KTIME_SCALAR if X86_32
+	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 
 config INSTRUCTION_DECODER
diff --git a/arch/x86/include/asm/word-at-a-time.h b/arch/x86/include/asm/word-at-a-time.h
index ae03facfadd6..0a993cae80a8 100644
--- a/arch/x86/include/asm/word-at-a-time.h
+++ b/arch/x86/include/asm/word-at-a-time.h
@@ -13,6 +13,12 @@
 
 #ifdef CONFIG_64BIT
 
+struct word_at_a_time {
+	const unsigned long one_bits, high_bits;
+};
+
+#define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0x01), REPEAT_BYTE(0x80) }
+
 /*
  * Jan Achrenius on G+: microoptimized version of
  * the simpler "(mask & ONEBYTES) * ONEBYTES >> 56"
@@ -37,10 +43,31 @@ static inline long count_masked_bytes(long mask)
 
 #endif
 
-/* Return the high bit set in the first byte that is a zero */
-static inline unsigned long has_zero(unsigned long a)
+/* Return nonzero if it has a zero */
+static inline unsigned long has_zero(unsigned long a, unsigned long *bits, const struct word_at_a_time *c)
+{
+	unsigned long mask = ((a - c->one_bits) & ~a) & c->high_bits;
+	*bits = mask;
+	return mask;
+}
+
+static inline unsigned long prep_zero_mask(unsigned long a, unsigned long bits, const struct word_at_a_time *c)
+{
+	return bits;
+}
+
+static inline unsigned long create_zero_mask(unsigned long bits)
+{
+	bits = (bits - 1) & ~bits;
+	return bits >> 7;
+}
+
+/* The mask we created is directly usable as a bytemask */
+#define zero_bytemask(mask) (mask)
+
+static inline unsigned long find_zero(unsigned long mask)
 {
-	return ((a - REPEAT_BYTE(0x01)) & ~a) & REPEAT_BYTE(0x80);
+	return count_masked_bytes(mask);
 }
 
 /*
diff --git a/arch/x86/lib/usercopy.c b/arch/x86/lib/usercopy.c
index 4239f063f6ac..f61ee67ec00f 100644
--- a/arch/x86/lib/usercopy.c
+++ b/arch/x86/lib/usercopy.c
@@ -43,100 +43,3 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
 	return len;
 }
 EXPORT_SYMBOL_GPL(copy_from_user_nmi);
-
-/*
- * Do a strncpy, return length of string without final '\0'.
- * 'count' is the user-supplied count (return 'count' if we
- * hit it), 'max' is the address space maximum (and we return
- * -EFAULT if we hit it).
- */
-static inline long do_strncpy_from_user(char *dst, const char __user *src, long count, unsigned long max)
-{
-	long res = 0;
-
-	/*
-	 * Truncate 'max' to the user-specified limit, so that
-	 * we only have one limit we need to check in the loop
-	 */
-	if (max > count)
-		max = count;
-
-	while (max >= sizeof(unsigned long)) {
-		unsigned long c, mask;
-
-		/* Fall back to byte-at-a-time if we get a page fault */
-		if (unlikely(__get_user(c,(unsigned long __user *)(src+res))))
-			break;
-		mask = has_zero(c);
-		if (mask) {
-			mask = (mask - 1) & ~mask;
-			mask >>= 7;
-			*(unsigned long *)(dst+res) = c & mask;
-			return res + count_masked_bytes(mask);
-		}
-		*(unsigned long *)(dst+res) = c;
-		res += sizeof(unsigned long);
-		max -= sizeof(unsigned long);
-	}
-
-	while (max) {
-		char c;
-
-		if (unlikely(__get_user(c,src+res)))
-			return -EFAULT;
-		dst[res] = c;
-		if (!c)
-			return res;
-		res++;
-		max--;
-	}
-
-	/*
-	 * Uhhuh. We hit 'max'. But was that the user-specified maximum
-	 * too? If so, that's ok - we got as much as the user asked for.
-	 */
-	if (res >= count)
-		return res;
-
-	/*
-	 * Nope: we hit the address space limit, and we still had more
-	 * characters the caller would have wanted. That's an EFAULT.
-	 */
-	return -EFAULT;
-}
-
-/**
- * strncpy_from_user: - Copy a NUL terminated string from userspace.
- * @dst:   Destination address, in kernel space.  This buffer must be at
- *         least @count bytes long.
- * @src:   Source address, in user space.
- * @count: Maximum number of bytes to copy, including the trailing NUL.
- *
- * Copies a NUL-terminated string from userspace to kernel space.
- *
- * On success, returns the length of the string (not including the trailing
- * NUL).
- *
- * If access to userspace fails, returns -EFAULT (some data may have been
- * copied).
- *
- * If @count is smaller than the length of the string, copies @count bytes
- * and returns @count.
- */
-long
-strncpy_from_user(char *dst, const char __user *src, long count)
-{
-	unsigned long max_addr, src_addr;
-
-	if (unlikely(count <= 0))
-		return 0;
-
-	max_addr = user_addr_max();
-	src_addr = (unsigned long)src;
-	if (likely(src_addr < max_addr)) {
-		unsigned long max = max_addr - src_addr;
-		return do_strncpy_from_user(dst, src, count, max);
-	}
-	return -EFAULT;
-}
-EXPORT_SYMBOL(strncpy_from_user);
diff --git a/fs/namei.c b/fs/namei.c
index 93ff12b1a1de..c651f02c9fec 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1452,7 +1452,8 @@ EXPORT_SYMBOL(full_name_hash);
  */
 static inline unsigned long hash_name(const char *name, unsigned int *hashp)
 {
-	unsigned long a, mask, hash, len;
+	unsigned long a, b, adata, bdata, mask, hash, len;
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 
 	hash = a = 0;
 	len = -sizeof(unsigned long);
@@ -1460,17 +1461,18 @@ static inline unsigned long hash_name(const char *name, unsigned int *hashp)
 		hash = (hash + a) * 9;
 		len += sizeof(unsigned long);
 		a = load_unaligned_zeropad(name+len);
-		/* Do we have any NUL or '/' bytes in this word? */
-		mask = has_zero(a) | has_zero(a ^ REPEAT_BYTE('/'));
-	} while (!mask);
-
-	/* The mask *below* the first high bit set */
-	mask = (mask - 1) & ~mask;
-	mask >>= 7;
-	hash += a & mask;
+		b = a ^ REPEAT_BYTE('/');
+	} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
+
+	adata = prep_zero_mask(a, adata, &constants);
+	bdata = prep_zero_mask(b, bdata, &constants);
+
+	mask = create_zero_mask(adata | bdata);
+
+	hash += a & zero_bytemask(mask);
 	*hashp = fold_hash(hash);
 
-	return len + count_masked_bytes(mask);
+	return len + find_zero(mask);
 }
 
 #else
diff --git a/include/asm-generic/word-at-a-time.h b/include/asm-generic/word-at-a-time.h
new file mode 100644
index 000000000000..266e0afd88ac
--- /dev/null
+++ b/include/asm-generic/word-at-a-time.h
@@ -0,0 +1,44 @@
+#ifndef _ASM_WORD_AT_A_TIME_H
+#define _ASM_WORD_AT_A_TIME_H
+
+#include <linux/kernel.h>
+
+struct word_at_a_time {
+	const unsigned long high_bits, low_bits;
+};
+
+#define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0xfe) + 1, REPEAT_BYTE(0x7f) }
+
+/* Bit set in the bytes that have a zero */
+static inline long prep_zero_mask(unsigned long val, unsigned long rhs, const struct word_at_a_time *c)
+{
+	unsigned long mask = (val & c->low_bits) + c->low_bits;
+	return ~(mask | rhs);
+}
+
+#define create_zero_mask(mask) (mask)
+
+static inline long find_zero(unsigned long mask)
+{
+	long byte = 0;
+#ifdef CONFIG_64BIT
+	if (mask >> 32)
+		mask >>= 32;
+	else
+		byte = 4;
+#endif
+	if (mask >> 16)
+		mask >>= 16;
+	else
+		byte += 2;
+	return (mask >> 8) ? byte : byte + 1;
+}
+
+static inline bool has_zero(unsigned long val, unsigned long *data, const struct word_at_a_time *c)
+{
+	unsigned long rhs = val | c->low_bits;
+	*data = rhs;
+	return (val + c->high_bits) & ~rhs;
+}
+
+#endif /* _ASM_WORD_AT_A_TIME_H */
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index c4c09b0e96ba..bb2b201d6ad0 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -4,37 +4,7 @@
 #include <linux/errno.h>
 
 #include <asm/byteorder.h>
-
-static inline long find_zero(unsigned long mask)
-{
-	long byte = 0;
-
-#ifdef __BIG_ENDIAN
-#ifdef CONFIG_64BIT
-	if (mask >> 32)
-		mask >>= 32;
-	else
-		byte = 4;
-#endif
-	if (mask >> 16)
-		mask >>= 16;
-	else
-		byte += 2;
-	return (mask >> 8) ? byte : byte + 1;
-#else
-#ifdef CONFIG_64BIT
-	if (!((unsigned int) mask)) {
-		mask >>= 32;
-		byte = 4;
-	}
-#endif
-	if (!(mask & 0xffff)) {
-		mask >>= 16;
-		byte += 2;
-	}
-	return (mask & 0xff) ? byte : byte + 1;
-#endif
-}
+#include <asm/word-at-a-time.h>
 
 #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 #define IS_UNALIGNED(src, dst)	0
@@ -51,8 +21,7 @@ static inline long find_zero(unsigned long mask)
  */
 static inline long do_strncpy_from_user(char *dst, const char __user *src, long count, unsigned long max)
 {
-	const unsigned long high_bits = REPEAT_BYTE(0xfe) + 1;
-	const unsigned long low_bits = REPEAT_BYTE(0x7f);
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 	long res = 0;
 
 	/*
@@ -66,18 +35,16 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src, long
 		goto byte_at_a_time;
 
 	while (max >= sizeof(unsigned long)) {
-		unsigned long c, v, rhs;
+		unsigned long c, data;
 
 		/* Fall back to byte-at-a-time if we get a page fault */
 		if (unlikely(__get_user(c,(unsigned long __user *)(src+res))))
 			break;
-		rhs = c | low_bits;
-		v = (c + high_bits) & ~rhs;
 		*(unsigned long *)(dst+res) = c;
-		if (v) {
-			v = (c & low_bits) + low_bits;
-			v = ~(v | rhs);
-			return res + find_zero(v);
+		if (has_zero(c, &data, &constants)) {
+			data = prep_zero_mask(c, data, &constants);
+			data = create_zero_mask(data);
+			return res + find_zero(data);
 		}
 		res += sizeof(unsigned long);
 		max -= sizeof(unsigned long);
diff --git a/lib/strnlen_user.c b/lib/strnlen_user.c
index 3dc78a846b1a..735f91ee9b94 100644
--- a/lib/strnlen_user.c
+++ b/lib/strnlen_user.c
@@ -30,6 +30,7 @@
  */
 static inline long do_strnlen_user(const char __user *src, unsigned long count, unsigned long max)
 {
+	const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
 	long align, res = 0;
 	unsigned long c;
 
@@ -53,13 +54,11 @@ static inline long do_strnlen_user(const char __user *src, unsigned long count,
 	c |= aligned_byte_mask(align);
 
 	for (;;) {
-		unsigned long mask;
-
-		mask = has_zero(c);
-		if (mask) {
-			mask = (mask - 1) & ~mask;
-			mask >>= 7;
-			return res + count_masked_bytes(mask) + 1 - align;
+		unsigned long data;
+		if (has_zero(c, &data, &constants)) {
+			data = prep_zero_mask(c, data, &constants);
+			data = create_zero_mask(data);
+			return res + find_zero(data) + 1 - align;
 		}
 		res += sizeof(unsigned long);
 		if (unlikely(max < sizeof(unsigned long)))

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  4:15                     ` David Miller
@ 2012-05-26  4:19                       ` Linus Torvalds
  2012-05-26  4:34                         ` David Miller
                                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26  4:19 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

On Fri, May 25, 2012 at 9:15 PM, David Miller <davem@davemloft.net> wrote:
>
> At first glance, it does.
>
> I play with this and try to get it working on sparc tonight, thanks.

Make sure to take the second patch, the first was scrogged for the
asm-generic case (and thus sparc). It would never have compiled.

The second one should actually hopefully come close to working. At
least I tested the compile by building the lib/ files with that,
instead of the x86 native version.

Obviously the asm-generic version cannot actually work with the dentry
hashing routines (it needs the whole bytemask generation at a minimum
for that), so it may be broken in the sense of "it doesn't actually
work", but it is *closer*. And the concept seems to be ok, since it
seems to work on x86 (I'm running it right now)

                Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  4:19                       ` Linus Torvalds
@ 2012-05-26  4:34                         ` David Miller
  2012-05-26  4:44                         ` David Miller
  2012-05-26  8:32                         ` Jonas Bonn
  2 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2012-05-26  4:34 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 21:19:01 -0700

> Make sure to take the second patch, the first was scrogged for the
> asm-generic case (and thus sparc). It would never have compiled.

Will do.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  4:19                       ` Linus Torvalds
  2012-05-26  4:34                         ` David Miller
@ 2012-05-26  4:44                         ` David Miller
  2012-05-26  5:06                           ` Linus Torvalds
  2012-05-26  8:32                         ` Jonas Bonn
  2 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2012-05-26  4:44 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 21:19:01 -0700

> The second one should actually hopefully come close to working. At
> least I tested the compile by building the lib/ files with that,
> instead of the x86 native version.

Ok, with the following patch, it seems to work on sparc too.

I'll beat it up with some glibc testsuite runs and stuff like that,
but so far it looks great.

 arch/sparc/Kconfig                      |    1 +
 arch/sparc/include/asm/uaccess_32.h     |   22 ++-----
 arch/sparc/include/asm/uaccess_64.h     |    8 +--
 arch/sparc/include/asm/word-at-a-time.h |    1 +
 arch/sparc/lib/Makefile                 |    1 -
 arch/sparc/lib/ksyms.c                  |    2 -
 arch/sparc/lib/strlen_user_32.S         |  109 -------------------------------
 arch/sparc/lib/strlen_user_64.S         |   97 ---------------------------
 8 files changed, 10 insertions(+), 231 deletions(-)
 create mode 100644 arch/sparc/include/asm/word-at-a-time.h
 delete mode 100644 arch/sparc/lib/strlen_user_32.S
 delete mode 100644 arch/sparc/lib/strlen_user_64.S

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 15e9e05..83bd051 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -35,6 +35,7 @@ config SPARC
 	select GENERIC_CMOS_UPDATE
 	select GENERIC_CLOCKEVENTS
 	select GENERIC_STRNCPY_FROM_USER
+	select GENERIC_STRNLEN_USER
 
 config SPARC32
 	def_bool !64BIT
diff --git a/arch/sparc/include/asm/uaccess_32.h b/arch/sparc/include/asm/uaccess_32.h
index 59586b5..53a28dd 100644
--- a/arch/sparc/include/asm/uaccess_32.h
+++ b/arch/sparc/include/asm/uaccess_32.h
@@ -16,6 +16,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
+
 #define ARCH_HAS_SORT_EXTABLE
 #define ARCH_HAS_SEARCH_EXTABLE
 
@@ -304,24 +306,8 @@ static inline unsigned long clear_user(void __user *addr, unsigned long n)
 		return n;
 }
 
-extern long __strlen_user(const char __user *);
-extern long __strnlen_user(const char __user *, long len);
-
-static inline long strlen_user(const char __user *str)
-{
-	if (!access_ok(VERIFY_READ, str, 0))
-		return 0;
-	else
-		return __strlen_user(str);
-}
-
-static inline long strnlen_user(const char __user *str, long len)
-{
-	if (!access_ok(VERIFY_READ, str, 0))
-		return 0;
-	else
-		return __strnlen_user(str, len);
-}
+extern __must_check long strlen_user(const char __user *str);
+extern __must_check long strnlen_user(const char __user *str, long n);
 
 #endif  /* __ASSEMBLY__ */
 
diff --git a/arch/sparc/include/asm/uaccess_64.h b/arch/sparc/include/asm/uaccess_64.h
index dcdfb89..7c831d8 100644
--- a/arch/sparc/include/asm/uaccess_64.h
+++ b/arch/sparc/include/asm/uaccess_64.h
@@ -17,6 +17,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
+
 /*
  * Sparc64 is segmented, though more like the M68K than the I386.
  * We use the secondary ASI to address user memory, which references a
@@ -257,11 +259,9 @@ extern unsigned long __must_check __clear_user(void __user *, unsigned long);
 
 #define clear_user __clear_user
 
-extern long __strlen_user(const char __user *);
-extern long __strnlen_user(const char __user *, long len);
+extern __must_check long strlen_user(const char __user *str);
+extern __must_check long strnlen_user(const char __user *str, long n);
 
-#define strlen_user __strlen_user
-#define strnlen_user __strnlen_user
 #define __copy_to_user_inatomic ___copy_to_user
 #define __copy_from_user_inatomic ___copy_from_user
 
diff --git a/arch/sparc/include/asm/word-at-a-time.h b/arch/sparc/include/asm/word-at-a-time.h
new file mode 100644
index 0000000..b522cc3
--- /dev/null
+++ b/arch/sparc/include/asm/word-at-a-time.h
@@ -0,0 +1 @@
+#include <asm-generic/word-at-a-time.h>
diff --git a/arch/sparc/lib/Makefile b/arch/sparc/lib/Makefile
index 943d98d..dff4096 100644
--- a/arch/sparc/lib/Makefile
+++ b/arch/sparc/lib/Makefile
@@ -10,7 +10,6 @@ lib-y                 += strlen.o
 lib-y                 += checksum_$(BITS).o
 lib-$(CONFIG_SPARC32) += blockops.o
 lib-y                 += memscan_$(BITS).o memcmp.o strncmp_$(BITS).o
-lib-y                 += strlen_user_$(BITS).o
 lib-$(CONFIG_SPARC32) += divdi3.o udivdi3.o
 lib-$(CONFIG_SPARC32) += copy_user.o locks.o
 lib-$(CONFIG_SPARC64) += atomic_64.o
diff --git a/arch/sparc/lib/ksyms.c b/arch/sparc/lib/ksyms.c
index 6b278ab..3b31218 100644
--- a/arch/sparc/lib/ksyms.c
+++ b/arch/sparc/lib/ksyms.c
@@ -15,8 +15,6 @@
 
 /* string functions */
 EXPORT_SYMBOL(strlen);
-EXPORT_SYMBOL(__strlen_user);
-EXPORT_SYMBOL(__strnlen_user);
 EXPORT_SYMBOL(strncmp);
 
 /* mem* functions */
diff --git a/arch/sparc/lib/strlen_user_32.S b/arch/sparc/lib/strlen_user_32.S
deleted file mode 100644
index 8c8a371..0000000
--- a/arch/sparc/lib/strlen_user_32.S
+++ /dev/null
@@ -1,109 +0,0 @@
-/* strlen_user.S: Sparc optimized strlen_user code
- *
- * Return length of string in userspace including terminating 0
- * or 0 for error
- *
- * Copyright (C) 1991,1996 Free Software Foundation
- * Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
- * Copyright (C) 1996 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
- */
-
-#define LO_MAGIC 0x01010101
-#define HI_MAGIC 0x80808080
-
-10:
-	ldub	[%o0], %o5
-	cmp	%o5, 0
-	be	1f
-	 add	%o0, 1, %o0
-	andcc	%o0, 3, %g0
-	be	4f
-	 or	%o4, %lo(HI_MAGIC), %o3
-11:
-	ldub	[%o0], %o5
-	cmp	%o5, 0
-	be	2f
-	 add	%o0, 1, %o0
-	andcc	%o0, 3, %g0
-	be	5f
-	 sethi	%hi(LO_MAGIC), %o4
-12:
-	ldub	[%o0], %o5
-	cmp	%o5, 0
-	be	3f
-	 add	%o0, 1, %o0
-	b	13f
-	 or	%o4, %lo(LO_MAGIC), %o2
-1:
-	retl
-	 mov	1, %o0
-2:
-	retl
-	 mov	2, %o0
-3:
-	retl
-	 mov	3, %o0
-
-	.align 4
-	.global __strlen_user, __strnlen_user
-__strlen_user:
-	sethi	%hi(32768), %o1
-__strnlen_user:
-	mov	%o1, %g1
-	mov	%o0, %o1
-	andcc	%o0, 3, %g0
-	bne	10b
-	 sethi	%hi(HI_MAGIC), %o4
-	or	%o4, %lo(HI_MAGIC), %o3
-4:
-	sethi	%hi(LO_MAGIC), %o4
-5:
-	or	%o4, %lo(LO_MAGIC), %o2
-13:
-	ld	[%o0], %o5
-2:
-	sub	%o5, %o2, %o4
-	andcc	%o4, %o3, %g0
-	bne	82f
-	 add	%o0, 4, %o0
-	sub	%o0, %o1, %g2
-81:	cmp	%g2, %g1
-	blu	13b
-	 mov	%o0, %o4
-	ba,a	1f
-
-	/* Check every byte. */
-82:	srl	%o5, 24, %g5
-	andcc	%g5, 0xff, %g0
-	be	1f
-	 add	%o0, -3, %o4
-	srl	%o5, 16, %g5
-	andcc	%g5, 0xff, %g0
-	be	1f
-	 add	%o4, 1, %o4
-	srl	%o5, 8, %g5
-	andcc	%g5, 0xff, %g0
-	be	1f
-	 add	%o4, 1, %o4
-	andcc	%o5, 0xff, %g0
-	bne	81b
-	 sub	%o0, %o1, %g2
-
-	add	%o4, 1, %o4
-1:
-	retl
-	 sub	%o4, %o1, %o0
-
-	.section .fixup,#alloc,#execinstr
-	.align	4
-9:
-	retl
-	 clr	%o0
-
-	.section __ex_table,#alloc
-	.align	4
-
-	.word	10b, 9b
-	.word	11b, 9b
-	.word	12b, 9b
-	.word	13b, 9b
diff --git a/arch/sparc/lib/strlen_user_64.S b/arch/sparc/lib/strlen_user_64.S
deleted file mode 100644
index c3df71f..0000000
--- a/arch/sparc/lib/strlen_user_64.S
+++ /dev/null
@@ -1,97 +0,0 @@
-/* strlen_user.S: Sparc64 optimized strlen_user code
- *
- * Return length of string in userspace including terminating 0
- * or 0 for error
- *
- * Copyright (C) 1991,1996 Free Software Foundation
- * Copyright (C) 1996,1999 David S. Miller (davem@redhat.com)
- * Copyright (C) 1996,1997 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
- */
-
-#include <linux/linkage.h>
-#include <asm/asi.h>
-
-#define LO_MAGIC 0x01010101
-#define HI_MAGIC 0x80808080
-
-	.align 4
-ENTRY(__strlen_user)
-	sethi	%hi(32768), %o1
-ENTRY(__strnlen_user)
-	mov	%o1, %g1
-	mov	%o0, %o1
-	andcc	%o0, 3, %g0
-	be,pt	%icc, 9f
-	 sethi	%hi(HI_MAGIC), %o4
-10:	lduba	[%o0] %asi, %o5
-	brz,pn	%o5, 21f
-	 add	%o0, 1, %o0
-	andcc	%o0, 3, %g0
-	be,pn	%icc, 4f
-	 or	%o4, %lo(HI_MAGIC), %o3
-11:	lduba	[%o0] %asi, %o5
-	brz,pn	%o5, 22f
-	 add	%o0, 1, %o0
-	andcc	%o0, 3, %g0
-	be,pt	%icc, 13f
-	 srl	%o3, 7, %o2
-12:	lduba	[%o0] %asi, %o5
-	brz,pn	%o5, 23f
-	 add	%o0, 1, %o0
-	ba,pt	%icc, 2f
-15:	 lda	[%o0] %asi, %o5
-9:	or	%o4, %lo(HI_MAGIC), %o3
-4:	srl	%o3, 7, %o2
-13:	lda	[%o0] %asi, %o5
-2:	sub	%o5, %o2, %o4
-	andcc	%o4, %o3, %g0
-	bne,pn	%icc, 82f
-	 add	%o0, 4, %o0
-	sub	%o0, %o1, %g2
-81:	cmp	%g2, %g1
-	blu,pt	%icc, 13b
-	 mov	%o0, %o4
-	ba,a,pt	%xcc, 1f
-
-	/* Check every byte. */
-82:	srl	%o5, 24, %g7
-	andcc	%g7, 0xff, %g0
-	be,pn	%icc, 1f
-	 add	%o0, -3, %o4
-	srl	%o5, 16, %g7
-	andcc	%g7, 0xff, %g0
-	be,pn	%icc, 1f
-	 add	%o4, 1, %o4
-	srl	%o5, 8, %g7
-	andcc	%g7, 0xff, %g0
-	be,pn	%icc, 1f
-	 add	%o4, 1, %o4
-	andcc	%o5, 0xff, %g0
-	bne,pt	%icc, 81b
-	 sub	%o0, %o1, %g2
-	add	%o4, 1, %o4
-1:	retl
-	 sub	%o4, %o1, %o0
-21:	retl
-	 mov	1, %o0
-22:	retl
-	 mov	2, %o0
-23:	retl
-	 mov	3, %o0
-ENDPROC(__strlen_user)
-ENDPROC(__strnlen_user)
-
-        .section .fixup,#alloc,#execinstr
-        .align  4
-30:
-        retl
-         clr    %o0
-
-	.section __ex_table,"a"
-	.align	4
-
-	.word	10b, 30b
-	.word	11b, 30b
-	.word	12b, 30b
-	.word	15b, 30b
-	.word	13b, 30b
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  4:44                         ` David Miller
@ 2012-05-26  5:06                           ` Linus Torvalds
  2012-05-26  5:59                             ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26  5:06 UTC (permalink / raw)
  To: David Miller; +Cc: linux-arch

On Fri, May 25, 2012 at 9:44 PM, David Miller <davem@davemloft.net> wrote:
>
> Ok, with the following patch, it seems to work on sparc too.

Good. I'll combine things and then split them up a bit differently (ie
a "create generic version" patch and then separate "start using
generic version" patches for x86 and sparc).

> diff --git a/arch/sparc/include/asm/word-at-a-time.h b/arch/sparc/include/asm/word-at-a-time.h
> new file mode 100644
> index 0000000..b522cc3
> --- /dev/null
> +++ b/arch/sparc/include/asm/word-at-a-time.h
> @@ -0,0 +1 @@
> +#include <asm-generic/word-at-a-time.h>

This part *should* be unnecessary, I already did the

   generic-y += word-at-a-time.h

in arch/sparc/include/asm/Kbuild. Can you verify? Or did something go wrong?

                    Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  5:06                           ` Linus Torvalds
@ 2012-05-26  5:59                             ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2012-05-26  5:59 UTC (permalink / raw)
  To: torvalds; +Cc: linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 25 May 2012 22:06:31 -0700

>> diff --git a/arch/sparc/include/asm/word-at-a-time.h b/arch/sparc/include/asm/word-at-a-time.h
>> new file mode 100644
>> index 0000000..b522cc3
>> --- /dev/null
>> +++ b/arch/sparc/include/asm/word-at-a-time.h
>> @@ -0,0 +1 @@
>> +#include <asm-generic/word-at-a-time.h>
> 
> This part *should* be unnecessary, I already did the
> 
>    generic-y += word-at-a-time.h
> 
> in arch/sparc/include/asm/Kbuild. Can you verify? Or did something go wrong?

Aha, I see, yes it does work without adding the sparc header.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  4:19                       ` Linus Torvalds
  2012-05-26  4:34                         ` David Miller
  2012-05-26  4:44                         ` David Miller
@ 2012-05-26  8:32                         ` Jonas Bonn
  2012-05-26 18:39                           ` Linus Torvalds
  2 siblings, 1 reply; 26+ messages in thread
From: Jonas Bonn @ 2012-05-26  8:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-arch


On 26 May 2012 06:19, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Make sure to take the second patch, the first was scrogged for the
> asm-generic case (and thus sparc). It would never have compiled.
>
> The second one should actually hopefully come close to working. At
> least I tested the compile by building the lib/ files with that,
> instead of the x86 native version.
>
> Obviously the asm-generic version cannot actually work with the dentry
> hashing routines (it needs the whole bytemask generation at a minimum
> for that), so it may be broken in the sense of "it doesn't actually
> work", but it is *closer*. And the concept seems to be ok, since it
> seems to work on x86 (I'm running it right now)

I gave this a try on OpenRISC... works fine here, too.

/Jonas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26  8:32                         ` Jonas Bonn
@ 2012-05-26 18:39                           ` Linus Torvalds
  2012-05-26 23:46                             ` David Miller
                                               ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Linus Torvalds @ 2012-05-26 18:39 UTC (permalink / raw)
  To: Jonas Bonn, David Miller; +Cc: linux-arch

On Sat, May 26, 2012 at 1:32 AM, Jonas Bonn <jonas@southpole.se> wrote:
>
> I gave this a try on OpenRISC... works fine here, too.

Ok, guys, there's a "generic-string-functions" branch in my regular
git repositories that has the cleaned-up commit series.

Can you guys double-check that everything looks good, and Jonas, I
assume you could just do a patch to remove the openrisc
str[n]len_user() functions, and then add the "select
GENERIC_STRNLEN_USER". Quite frankly, I could probably do it for you,
but I'm not going to generate a random patch for some architecture
I've never in my life compiled anything for.

David - I committed the sparc strnlen_user removal with you as author,
but without any sign-off. I don't really care, I can sign off on that
patch that just removes code, and I didn't want to add a sign-off line
for you that you had never actually sent. But it's attributed to you
anyway, since it actually came from you.

Further testing would be good, but I feel pretty comfy about it. It
looks fine, and when I once tried to boot a kernel with a broken
strnlen (due to incorrect byte-endian-testing, not actually incorrect
code otherwise), it didn't get very far at all. So this is actually
code that gets a fair amount of coverage testing from not even doing
anything special.

Other big-endian architectures could easily also test this out by
adding the appropriate few lines:

 - arch Kconfig file:
        select GENERIC_STRNCPY_FROM_USER
        select GENERIC_STRNLEN_USER

 - arch/include/asm/Kbuild:
        generic-y += word-at-a-time.h

 - remove your own strncpy_from_user/strnlen_user functions, and
replace them with the appropriate declaration in <asm/uaccess.h>

NOTE NOTE NOTE for arch people. The "asm-generic" header file is
*only* for big-endian machines. Little-endian can do a simpler version
of it. See the x86 file for an example (it looks more complex, but
that's largely because it (a) has the optimized find_zero() and (b)
also supports dcache hashing, which requires the
load_unaligned_zeropad() and zero_bytemask() functions.

You can literally copy the x86 version for other little-endian
architectures, and remove the load_unaligned_zeropad() function.
However, you *probably* also want to tweak the zero-finding logic,
especially if you have a slow multiplier or if you have a good
population-count function.

Also see the explanations in commit 36126f8f2ed8 ("word-at-a-time:
make the interfaces truly generic") which talks about the interfaces
and tries to explain why they look the way they do.

Comments? The diffstat for the series looks quite nice:

 23 files changed, 259 insertions(+), 490 deletions(-)

and the end result is actually much better code than what it used to
be before (not that I can really comment on the sparc code, since I
didn't really look at the asm, but I'm assuming it didn't do all the
clever word-wise stuff. And I *know* the x86 code didn't).

                      Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26 18:39                           ` Linus Torvalds
@ 2012-05-26 23:46                             ` David Miller
  2012-05-27  8:28                             ` Jonas Bonn
  2012-05-28  3:07                             ` Paul Mackerras
  2 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2012-05-26 23:46 UTC (permalink / raw)
  To: torvalds; +Cc: jonas, linux-arch

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 26 May 2012 11:39:10 -0700

> Ok, guys, there's a "generic-string-functions" branch in my regular
> git repositories that has the cleaned-up commit series.
> 
> Can you guys double-check that everything looks good,
 ...

Looks great and boots/works fine on sparc.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26 18:39                           ` Linus Torvalds
  2012-05-26 23:46                             ` David Miller
@ 2012-05-27  8:28                             ` Jonas Bonn
  2012-05-28  3:07                             ` Paul Mackerras
  2 siblings, 0 replies; 26+ messages in thread
From: Jonas Bonn @ 2012-05-27  8:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-arch


On Sat, 2012-05-26 at 11:39 -0700, Linus Torvalds wrote:
> On Sat, May 26, 2012 at 1:32 AM, Jonas Bonn <jonas@southpole.se> wrote:
> >
> > I gave this a try on OpenRISC... works fine here, too.
> 

> Can you guys double-check that everything looks good, and Jonas, I
> assume you could just do a patch to remove the openrisc
> str[n]len_user() functions, and then add the "select
> GENERIC_STRNLEN_USER". Quite frankly, I could probably do it for you,
> but I'm not going to generate a random patch for some architecture
> I've never in my life compiled anything for.

Yes, all tested and looks good.  I'll send you a patch for OpenRISC.

Thanks,
Jonas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-26 18:39                           ` Linus Torvalds
  2012-05-26 23:46                             ` David Miller
  2012-05-27  8:28                             ` Jonas Bonn
@ 2012-05-28  3:07                             ` Paul Mackerras
  2012-05-28  3:47                               ` Linus Torvalds
  2 siblings, 1 reply; 26+ messages in thread
From: Paul Mackerras @ 2012-05-28  3:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jonas Bonn, David Miller, linux-arch

On Sat, May 26, 2012 at 11:39:10AM -0700, Linus Torvalds wrote:

> Further testing would be good, but I feel pretty comfy about it. It
> looks fine, and when I once tried to boot a kernel with a broken
> strnlen (due to incorrect byte-endian-testing, not actually incorrect
> code otherwise), it didn't get very far at all. So this is actually
> code that gets a fair amount of coverage testing from not even doing
> anything special.

Apart from a thinko in the aligned_byte_mask definition (it's wrong
for 32-bit big-endian), it looks fine.  You're right about it not
getting very far at all; my 32-bit powerbook couldn't even mount
root.  I have sent a patch for that and one to use this stuff on
powerpc - without Ben's ack since he is off sick at the moment.

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-28  3:07                             ` Paul Mackerras
@ 2012-05-28  3:47                               ` Linus Torvalds
  2012-05-28  3:53                                 ` Linus Torvalds
  0 siblings, 1 reply; 26+ messages in thread
From: Linus Torvalds @ 2012-05-28  3:47 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Jonas Bonn, David Miller, linux-arch

On Sun, May 27, 2012 at 8:07 PM, Paul Mackerras <paulus@samba.org> wrote:
>
> Apart from a thinko in the aligned_byte_mask definition (it's wrong
> for 32-bit big-endian), it looks fine.  You're right about it not
> getting very far at all; my 32-bit powerbook couldn't even mount
> root.  I have sent a patch for that and one to use this stuff on
> powerpc - without Ben's ack since he is off sick at the moment.

Ahh. OpenRISC is 64-bit, and apparently David never tested 32-bit
sparc. I'd tested it on both x86-63 and i386, but those obviously
wouldn't have shown any big-endian issues.

Thanks,

                    Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Mostly portable strnlen_user()
  2012-05-28  3:47                               ` Linus Torvalds
@ 2012-05-28  3:53                                 ` Linus Torvalds
  0 siblings, 0 replies; 26+ messages in thread
From: Linus Torvalds @ 2012-05-28  3:53 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Jonas Bonn, David Miller, linux-arch

On Sun, May 27, 2012 at 8:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Ahh. OpenRISC is 64-bit

oops, no it isn't. But apparently the test coverage is crap ;_)

            Linus

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-05-28  3:54 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-25 22:35 Mostly portable strnlen_user() Linus Torvalds
2012-05-25 23:06 ` David Miller
2012-05-25 23:11   ` Linus Torvalds
2012-05-25 23:14     ` David Miller
2012-05-25 23:37       ` Linus Torvalds
2012-05-25 23:41         ` David Miller
2012-05-26  0:41           ` David Miller
2012-05-26  1:09             ` Linus Torvalds
2012-05-26  1:21               ` David Miller
2012-05-26  2:13               ` Linus Torvalds
2012-05-26  2:43                 ` David Miller
2012-05-26  3:56                   ` Linus Torvalds
2012-05-26  4:15                     ` David Miller
2012-05-26  4:19                       ` Linus Torvalds
2012-05-26  4:34                         ` David Miller
2012-05-26  4:44                         ` David Miller
2012-05-26  5:06                           ` Linus Torvalds
2012-05-26  5:59                             ` David Miller
2012-05-26  8:32                         ` Jonas Bonn
2012-05-26 18:39                           ` Linus Torvalds
2012-05-26 23:46                             ` David Miller
2012-05-27  8:28                             ` Jonas Bonn
2012-05-28  3:07                             ` Paul Mackerras
2012-05-28  3:47                               ` Linus Torvalds
2012-05-28  3:53                                 ` Linus Torvalds
2012-05-26  4:16                     ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).