[bug] pwritev02 hang on s390x with 4.8.0-rc7

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [bug] pwritev02 hang on s390x with 4.8.0-rc7
@ 2016-09-20 12:56 Jan Stancek
  2016-09-20 15:06 ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Stancek @ 2016-09-20 12:56 UTC (permalink / raw)
  To: viro, linux-kernel; +Cc: jstancek

Hi,

I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7.
Specifically the EFAULT case, which is passing an iovec with invalid base address:
  #define CHUNK 64
  static struct iovec wr_iovec3[] = {
  	{(char *)-1, CHUNK},
  };
hangs with 100% cpu usage and not very helpful stack trace:
  # cat /proc/28544/stack
  [<0000000000001000>] 0x1000
  [<ffffffffffffffff>] 0xffffffffffffffff

The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()".

Before this commit fault_in_pages_readable() called __get_user() on start
address which presumably failed with -EFAULT immediately.

With this commit applied fault_in_multipages_readable() appears to return 0
for the case when "start" is invalid but "end" overflows. Then according to
my traces we keep spinning inside while loop in iomap_write_actor().

Patch below makes the issue go away for me:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260b33de..e443dbd2b5df 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -600,11 +600,11 @@ static inline int fault_in_multipages_readable(const char __user *uaddr,
                                               int size)
 {
        volatile char c;
-       int ret = 0;
+       int ret = -EFAULT;
        const char __user *end = uaddr + size - 1;

        if (unlikely(size == 0))
-               return ret;
+               return 0;

        while (uaddr <= end) {
                ret = __get_user(c, uaddr);

Regards,
Jan

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/pwritev/pwritev02.c

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7
  2016-09-20 12:56 [bug] pwritev02 hang on s390x with 4.8.0-rc7 Jan Stancek
@ 2016-09-20 15:06 ` Al Viro
  2016-09-20 17:11   ` Jan Stancek
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-09-20 15:06 UTC (permalink / raw)
  To: Jan Stancek; +Cc: linux-kernel

On Tue, Sep 20, 2016 at 02:56:06PM +0200, Jan Stancek wrote:
> Hi,
> 
> I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7.
> Specifically the EFAULT case, which is passing an iovec with invalid base address:
>   #define CHUNK 64
>   static struct iovec wr_iovec3[] = {
>   	{(char *)-1, CHUNK},
>   };
> hangs with 100% cpu usage and not very helpful stack trace:
>   # cat /proc/28544/stack
>   [<0000000000001000>] 0x1000
>   [<ffffffffffffffff>] 0xffffffffffffffff
> 
> The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()".
> 
> Before this commit fault_in_pages_readable() called __get_user() on start
> address which presumably failed with -EFAULT immediately.
> 
> With this commit applied fault_in_multipages_readable() appears to return 0
> for the case when "start" is invalid but "end" overflows. Then according to
> my traces we keep spinning inside while loop in iomap_write_actor().

Cute.  Let me see if I understand what's going on there: we have a wraparound
that would've been caught by most of access_ok(), but not on an architectures
where access_ok() is a no-op; in that case the loop is skipped and we
just check the last address, which passes and we get a false positive.
Bug is real and it's definitely -stable fodder.

I'm not sure that the fix you propose is right, though.  Note that ERR_PTR()
is not a valid address on any architecture, so any wraparound automatically
means -EFAULT and we can simply check unlikely(uaddr > end) and bugger off
if it holds.  while() turns into do-while(), of course, and the same is
needed for the read side.

Could you check if the following works for you?

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260..7e3d537 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
  */
 static inline int fault_in_multipages_writeable(char __user *uaddr, int size)
 {
-	int ret = 0;
 	char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
-		return ret;
+		return 0;
 
+	if (unlikely(uaddr > end))
+		return -EFAULT;
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
 	 */
-	while (uaddr <= end) {
-		ret = __put_user(0, uaddr);
-		if (ret != 0)
-			return ret;
+	do {
+		if (unlikely(__put_user(0, uaddr) != 0))
+			return -EFAULT;
 		uaddr += PAGE_SIZE;
-	}
+	} while (uaddr <= end);
 
 	/* Check whether the range spilled into the next page. */
 	if (((unsigned long)uaddr & PAGE_MASK) ==
 			((unsigned long)end & PAGE_MASK))
-		ret = __put_user(0, end);
+		return __put_user(0, end);
 
-	return ret;
+	return 0;
 }
 
 static inline int fault_in_multipages_readable(const char __user *uaddr,
 					       int size)
 {
 	volatile char c;
-	int ret = 0;
 	const char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
-		return ret;
+		return 0;
 
-	while (uaddr <= end) {
-		ret = __get_user(c, uaddr);
-		if (ret != 0)
-			return ret;
+	if (unlikely(uaddr > end))
+		return -EFAULT;
+
+	do {
+		if (unlikely(__get_user(c, uaddr) != 0))
+			return -EFAULT;
 		uaddr += PAGE_SIZE;
-	}
+	} while (uaddr <= end);
 
 	/* Check whether the range spilled into the next page. */
 	if (((unsigned long)uaddr & PAGE_MASK) ==
 			((unsigned long)end & PAGE_MASK)) {
-		ret = __get_user(c, end);
-		(void)c;
+		return __get_user(c, end);
 	}
 
-	return ret;
+	return 0;
 }
 
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7
  2016-09-20 15:06 ` Al Viro
@ 2016-09-20 17:11   ` Jan Stancek
  2016-09-20 17:30     ` Al Viro
  2016-09-20 19:07     ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro
  0 siblings, 2 replies; 12+ messages in thread
From: Jan Stancek @ 2016-09-20 17:11 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel




----- Original Message -----
> From: "Al Viro" <viro@ZenIV.linux.org.uk>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Sent: Tuesday, 20 September, 2016 5:06:57 PM
> Subject: Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7
> 
> On Tue, Sep 20, 2016 at 02:56:06PM +0200, Jan Stancek wrote:
> > Hi,
> > 
> > I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7.
> > Specifically the EFAULT case, which is passing an iovec with invalid base
> > address:
> >   #define CHUNK 64
> >   static struct iovec wr_iovec3[] = {
> >   	{(char *)-1, CHUNK},
> >   };
> > hangs with 100% cpu usage and not very helpful stack trace:
> >   # cat /proc/28544/stack
> >   [<0000000000001000>] 0x1000
> >   [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()".
> > 
> > Before this commit fault_in_pages_readable() called __get_user() on start
> > address which presumably failed with -EFAULT immediately.
> > 
> > With this commit applied fault_in_multipages_readable() appears to return 0
> > for the case when "start" is invalid but "end" overflows. Then according to
> > my traces we keep spinning inside while loop in iomap_write_actor().
> 
> Cute.  Let me see if I understand what's going on there: we have a wraparound
> that would've been caught by most of access_ok(), but not on an architectures
> where access_ok() is a no-op; in that case the loop is skipped and we
> just check the last address, which passes and we get a false positive.
> Bug is real and it's definitely -stable fodder.
> 
> I'm not sure that the fix you propose is right, though.  Note that ERR_PTR()
> is not a valid address on any architecture, so any wraparound automatically
> means -EFAULT and we can simply check unlikely(uaddr > end) and bugger off
> if it holds.  while() turns into do-while(), of course, and the same is
> needed for the read side.
> 
> Could you check if the following works for you?

This fixes pwritev02 hang.

I ran all syscalls tests from LTP, and I see a change in behaviour
of couple other tests (writev01, writev03 and writev04 [1]) in 4.8.0-rc7.

These call writev() with partially invalid iovecs, and now fail with
EFAULT, while with previous -rc6 kernel they returned number of bytes
written before they encountered invalid iovec record.
This should be reproducible also on x86.

Regards,
Jan

[1] https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/syscalls/writev

> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 66a1260..7e3d537 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char
> __user *uaddr, int size)
>   */
>  static inline int fault_in_multipages_writeable(char __user *uaddr, int
>  size)
>  {
> -	int ret = 0;
>  	char __user *end = uaddr + size - 1;
>  
>  	if (unlikely(size == 0))
> -		return ret;
> +		return 0;
>  
> +	if (unlikely(uaddr > end))
> +		return -EFAULT;
>  	/*
>  	 * Writing zeroes into userspace here is OK, because we know that if
>  	 * the zero gets there, we'll be overwriting it.
>  	 */
> -	while (uaddr <= end) {
> -		ret = __put_user(0, uaddr);
> -		if (ret != 0)
> -			return ret;
> +	do {
> +		if (unlikely(__put_user(0, uaddr) != 0))
> +			return -EFAULT;
>  		uaddr += PAGE_SIZE;
> -	}
> +	} while (uaddr <= end);
>  
>  	/* Check whether the range spilled into the next page. */
>  	if (((unsigned long)uaddr & PAGE_MASK) ==
>  			((unsigned long)end & PAGE_MASK))
> -		ret = __put_user(0, end);
> +		return __put_user(0, end);
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static inline int fault_in_multipages_readable(const char __user *uaddr,
>  					       int size)
>  {
>  	volatile char c;
> -	int ret = 0;
>  	const char __user *end = uaddr + size - 1;
>  
>  	if (unlikely(size == 0))
> -		return ret;
> +		return 0;
>  
> -	while (uaddr <= end) {
> -		ret = __get_user(c, uaddr);
> -		if (ret != 0)
> -			return ret;
> +	if (unlikely(uaddr > end))
> +		return -EFAULT;
> +
> +	do {
> +		if (unlikely(__get_user(c, uaddr) != 0))
> +			return -EFAULT;
>  		uaddr += PAGE_SIZE;
> -	}
> +	} while (uaddr <= end);
>  
>  	/* Check whether the range spilled into the next page. */
>  	if (((unsigned long)uaddr & PAGE_MASK) ==
>  			((unsigned long)end & PAGE_MASK)) {
> -		ret = __get_user(c, end);
> -		(void)c;
> +		return __get_user(c, end);
>  	}
>  
> -	return ret;
> +	return 0;
>  }
>  
>  int add_to_page_cache_locked(struct page *page, struct address_space
>  *mapping,
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7
  2016-09-20 17:11   ` Jan Stancek
@ 2016-09-20 17:30     ` Al Viro
  2016-09-20 19:07     ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro
  1 sibling, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-09-20 17:30 UTC (permalink / raw)
  To: Jan Stancek; +Cc: linux-kernel

On Tue, Sep 20, 2016 at 01:11:41PM -0400, Jan Stancek wrote:

> I ran all syscalls tests from LTP, and I see a change in behaviour
> of couple other tests (writev01, writev03 and writev04 [1]) in 4.8.0-rc7.
> 
> These call writev() with partially invalid iovecs, and now fail with
> EFAULT, while with previous -rc6 kernel they returned number of bytes
> written before they encountered invalid iovec record.
> This should be reproducible also on x86.

Known, discussed and considered legitimate.  It's not so much EFAULT vs short
write, it's how far do we shorten the write.  Change consists of removing
an accidental (and undocumented) property of iovec boundaries wrt write
shortening.  Usually an invalid address anywhere in the data we are asked to
write leads to write shortened to the last pagecache boundary (i.e file
position multiple of page size) entirely covered by valid data.  It is
filesystem-dependent and already deep in nasal demon territory.  writev,
pretty much by accident, never shortened past an iovec boundary.  That's
what got changed - now the rules are same as they are for all writes.

Having an LTP test (as opposed to actual real-world code) deliberately
stepping into that and checking how far does the shortening go means just
one thing: update the test.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 17:11   ` Jan Stancek
  2016-09-20 17:30     ` Al Viro
@ 2016-09-20 19:07     ` Al Viro
  2016-09-20 20:24       ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-09-20 19:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around.  Normally access_ok() done by callers
will prevent such (and it's a guaranteed EFAULT - ERR_PTR() values
fall into such a range and they should not point to any valid objects).
However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().

Since any wraparound means EFAULT there, the fix is trivial - turn
those
    while (uaddr <= end)
	    ...
into
    if (unlikely(uaddr > end))
	    return -EFAULT;
    do
	    ...
    while (uaddr <= end);

Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260..7e3d537 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
  */
 static inline int fault_in_multipages_writeable(char __user *uaddr, int size)
 {
-	int ret = 0;
 	char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
-		return ret;
+		return 0;
 
+	if (unlikely(uaddr > end))
+		return -EFAULT;
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
 	 */
-	while (uaddr <= end) {
-		ret = __put_user(0, uaddr);
-		if (ret != 0)
-			return ret;
+	do {
+		if (unlikely(__put_user(0, uaddr) != 0))
+			return -EFAULT;
 		uaddr += PAGE_SIZE;
-	}
+	} while (uaddr <= end);
 
 	/* Check whether the range spilled into the next page. */
 	if (((unsigned long)uaddr & PAGE_MASK) ==
 			((unsigned long)end & PAGE_MASK))
-		ret = __put_user(0, end);
+		return __put_user(0, end);
 
-	return ret;
+	return 0;
 }
 
 static inline int fault_in_multipages_readable(const char __user *uaddr,
 					       int size)
 {
 	volatile char c;
-	int ret = 0;
 	const char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
-		return ret;
+		return 0;
 
-	while (uaddr <= end) {
-		ret = __get_user(c, uaddr);
-		if (ret != 0)
-			return ret;
+	if (unlikely(uaddr > end))
+		return -EFAULT;
+
+	do {
+		if (unlikely(__get_user(c, uaddr) != 0))
+			return -EFAULT;
 		uaddr += PAGE_SIZE;
-	}
+	} while (uaddr <= end);
 
 	/* Check whether the range spilled into the next page. */
 	if (((unsigned long)uaddr & PAGE_MASK) ==
 			((unsigned long)end & PAGE_MASK)) {
-		ret = __get_user(c, end);
-		(void)c;
+		return __get_user(c, end);
 	}
 
-	return ret;
+	return 0;
 }
 
 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 19:07     ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro
@ 2016-09-20 20:24       ` Linus Torvalds
  2016-09-20 20:38         ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2016-09-20 20:24 UTC (permalink / raw)
  To: Al Viro, Heiko Carstens, Martin Schwidefsky, Jan Stancek,
	Arnd Bergmann, Greg Ungerer
  Cc: Linux Kernel Mailing List

On Tue, Sep 20, 2016 at 12:07 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Switching iov_iter fault-in to multipages variants has exposed an old
> bug in underlying fault_in_multipages_...(); they break if the range
> passed to them wraps around.  Normally access_ok() done by callers
> will prevent such (and it's a guaranteed EFAULT - ERR_PTR() values
> fall into such a range and they should not point to any valid objects).
> However, on architectures where userland and kernel live in different
> MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
> with a wraparound can reach fault_in_multipages_...().

Quite frankly, I think it is access_ok() that should be fixed for s390.

A wrapping user access is *not* ok, not even if kernel and user memory
are separate.

It is insane to make fault_in_multipages..() return EFAULT if a normal
wrapping user access wouldn't. So the fix is not to change
fault_in_multipage_xyz, but to make sure any op that tries to wrap
will properly return EFAULT.

So I really think that we should just say "a no-op access_ok() is a
buggy access_ok()", and fix the problem at the source, rather than
make excuses for it in some random place.

A quick look seems to say that s390 and no-mmu ARM are the only
affected cases, but maybe I missed something.

              Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 20:24       ` Linus Torvalds
@ 2016-09-20 20:38         ` Al Viro
  2016-09-20 20:48           ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-09-20 20:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann,
	Greg Ungerer, Linux Kernel Mailing List

On Tue, Sep 20, 2016 at 01:24:25PM -0700, Linus Torvalds wrote:

> Quite frankly, I think it is access_ok() that should be fixed for s390.
> 
> A wrapping user access is *not* ok, not even if kernel and user memory
> are separate.
> 
> It is insane to make fault_in_multipages..() return EFAULT if a normal
> wrapping user access wouldn't. So the fix is not to change
> fault_in_multipage_xyz, but to make sure any op that tries to wrap
> will properly return EFAULT.

Not the point.  Of course it *would* fail; the problem is that the loop
that would ping each page is never executed.  What happens is
	while (uaddr <= end) 
		touch uaddr
		uaddr += PAGE_SIZE
	if uaddr and end point to different pages
		ping end

What happens if uaddr is greater than end, thanks to wraparound?  Right,
we skip the loop entirely and all we do is one ping of the end.  Which
might very well succeed, leaving us with false positive.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 20:38         ` Al Viro
@ 2016-09-20 20:48           ` Linus Torvalds
  2016-09-20 21:03             ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2016-09-20 20:48 UTC (permalink / raw)
  To: Al Viro
  Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann,
	Greg Ungerer, Linux Kernel Mailing List

On Tue, Sep 20, 2016 at 1:38 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Not the point.  Of course it *would* fail; the problem is that the loop
> that would ping each page is never executed.

You're missing the point.

If "access_ok()" is fine with wrapping, then some otehr system call
that wraps will successfully do a memcpy_from/to_user() with a wrapped
address (and the proper mappings). Which is completely bogus.

So access_ok() should be fixed regardless. An access_ok() that accepts
a wrapping address is a bug. End of story.

And once that bug is fixed, the fault_in_multipages..() issue is moot.
So it shouldn't be an issue.

             Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 20:48           ` Linus Torvalds
@ 2016-09-20 21:03             ` Al Viro
  2016-09-20 21:37               ` Al Viro
  2016-09-20 23:43               ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2016-09-20 21:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann,
	Greg Ungerer, Linux Kernel Mailing List

On Tue, Sep 20, 2016 at 01:48:10PM -0700, Linus Torvalds wrote:
> On Tue, Sep 20, 2016 at 1:38 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Not the point.  Of course it *would* fail; the problem is that the loop
> > that would ping each page is never executed.
> 
> You're missing the point.
> 
> If "access_ok()" is fine with wrapping, then some otehr system call
> that wraps will successfully do a memcpy_from/to_user() with a wrapped
> address (and the proper mappings).

What proper mappings?  If you manage to mmap something at (void*)-PAGE_SIZE,
you are very deep in trouble regardless.  We use IS_ERR() on userland
pointers and any architecture where that would be possible would be confused
as hell.

The testcase here is uaddr = (void *)-1, len = (unsigned long)valid_addr + 2.
If it tried to do __put_user(uaddr, 0) it would immediately fail, just as
__copy_to_user(uaddr, len); the problem is, that call will only do
__put_user(valid_addr, 0) and succeed.

Again, if get_user/put_user/copy_{to,from}_user() anywhere near ERR_PTR(...)
would succeed, we'd get trouble without any wraparounds.  That page should
be absent, and it really is.  In all cases, s390 included.  Wraparound is
irrelevant here.  The reason why it got spotted was persistent failure of
copy_{to,from}_user after successful fault-ins.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 21:03             ` Al Viro
@ 2016-09-20 21:37               ` Al Viro
  2016-09-20 23:43               ` Linus Torvalds
  1 sibling, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-09-20 21:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann,
	Greg Ungerer, Linux Kernel Mailing List

On Tue, Sep 20, 2016 at 10:03:26PM +0100, Al Viro wrote:

> The testcase here is uaddr = (void *)-1, len = (unsigned long)valid_addr + 2.
> If it tried to do __put_user(uaddr, 0) it would immediately fail, just as
> __copy_to_user(uaddr, len); the problem is, that call will only do
> __put_user(valid_addr, 0) and succeed.
> 
> Again, if get_user/put_user/copy_{to,from}_user() anywhere near ERR_PTR(...)
> would succeed, we'd get trouble without any wraparounds.  That page should
> be absent, and it really is.  In all cases, s390 included.  Wraparound is
> irrelevant here.  The reason why it got spotted was persistent failure of
> copy_{to,from}_user after successful fault-ins.

PS: s390 is far from the only such architecture - at least m68k, parisc
and sparc64 are the same way.

Sure, we can make all of them check for wraparounds, but what's the point,
when actual attempts to copy to/from such range will fail anyway and
for absolute majority of the calls the check will do nothing.  What's the
point?

Note that we need to compare uaddr and end in these loops anyway, so we
are not going to save anything there...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 21:03             ` Al Viro
  2016-09-20 21:37               ` Al Viro
@ 2016-09-20 23:43               ` Linus Torvalds
  2016-09-21  0:38                 ` Al Viro
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2016-09-20 23:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann,
	Greg Ungerer, Linux Kernel Mailing List

[ Sorry, AFK there for a while ]

On Tue, Sep 20, 2016 at 2:03 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> What proper mappings?  If you manage to mmap something at (void*)-PAGE_SIZE,
> you are very deep in trouble regardless.  We use IS_ERR() on userland
> pointers and any architecture where that would be possible would be confused
> as hell.

Ack, you've convinced me. Will apply the patch.

             Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok()
  2016-09-20 23:43               ` Linus Torvalds
@ 2016-09-21  0:38                 ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-09-21  0:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann,
	Greg Ungerer, Linux Kernel Mailing List

On Tue, Sep 20, 2016 at 04:43:18PM -0700, Linus Torvalds wrote:
> [ Sorry, AFK there for a while ]
> 
> On Tue, Sep 20, 2016 at 2:03 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > What proper mappings?  If you manage to mmap something at (void*)-PAGE_SIZE,
> > you are very deep in trouble regardless.  We use IS_ERR() on userland
> > pointers and any architecture where that would be possible would be confused
> > as hell.
> 
> Ack, you've convinced me. Will apply the patch.

BTW, folks, there's an interesting part of that story in commit message
of af2e84:
-----------------------------------------------------------------------
pagemap.h: fix warning about possibly used before init var

Commit f56f821feb7b ("mm: extend prefault helpers to fault in more than
PAGE_SIZE") added in the new functions: fault_in_multipages_writeable()
and fault_in_multipages_readable().

However, we currently see:

include/linux/pagemap.h:492: warning: 'ret' may be used uninitialized in this function
include/linux/pagemap.h:492: note: 'ret' was declared here

Unlike a lot of gcc nags, this one appears somewhat legit.  i.e.  passing
in an invalid negative value of "size" does make it look like all the
conditionals in there would be bypassed and the uninitialized value would
be returned.
-----------------------------------------------------------------------

It wasn't just "somewhat" legit - it was entirely accurate and pointing to
real bug.  The real reason why the loop could be bypassed wasn't the negative
'size' (that really couldn't happen), it was the possibility of wraparound.
Which had been masked on most of the architectures by checks that had been
pretty far upstream from that function (entire range passing access_ok())
and architecture-dependent on top of that.  Some architectures would always
prevent wraparounds in access_ok() (x86, for example), some would only do
so outside of set_fd(KERNEL_DS) (where the caller is responsible for validity
of addresses and lengths) and some did not prevent wraparounds on access_ok()
level at all.  IOW, it was OK on a lot of architectures, but only for rather
subtle reasons.  Making ret initialized to 0 did make cc(1) STFU, but didn't
fix the bug...

It was used only in intel-specific drivers (i915 is x86-only), arm-specific
one (armada) and in NTFS; the last one isn't arch-specific, but I doubt that
anyone has really mounted NTFS on e.g. sparc, let alone hit it with LTP or
fuzzing there.  Multipage path in BTRFS write would benefit from it, and
BTRFS probably did get tested on sparc and/or s390, but BTRFS kept using
plain fault-in (and paid for that with more frequent fallbacks to singlepage
codepath).  So it went unnoticed, until it did show up on a path from
generic_perform_write()...

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-09-21  0:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-20 12:56 [bug] pwritev02 hang on s390x with 4.8.0-rc7 Jan Stancek
2016-09-20 15:06 ` Al Viro
2016-09-20 17:11   ` Jan Stancek
2016-09-20 17:30     ` Al Viro
2016-09-20 19:07     ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro
2016-09-20 20:24       ` Linus Torvalds
2016-09-20 20:38         ` Al Viro
2016-09-20 20:48           ` Linus Torvalds
2016-09-20 21:03             ` Al Viro
2016-09-20 21:37               ` Al Viro
2016-09-20 23:43               ` Linus Torvalds
2016-09-21  0:38                 ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).