* [bug] pwritev02 hang on s390x with 4.8.0-rc7
@ 2016-09-20 12:56 Jan Stancek
  2016-09-20 15:06 ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Stancek @ 2016-09-20 12:56 UTC (permalink / raw)
  To: viro, linux-kernel; +Cc: jstancek
Hi,
I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7.
Specifically the EFAULT case, which is passing an iovec with invalid base address:
  #define CHUNK 64
  static struct iovec wr_iovec3[] = {
  	{(char *)-1, CHUNK},
  };
hangs with 100% cpu usage and not very helpful stack trace:
  # cat /proc/28544/stack
  [<0000000000001000>] 0x1000
  [<ffffffffffffffff>] 0xffffffffffffffff
The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()".
Before this commit fault_in_pages_readable() called __get_user() on start
address which presumably failed with -EFAULT immediately.
With this commit applied fault_in_multipages_readable() appears to return 0
for the case when "start" is invalid but "end" overflows. Then according to
my traces we keep spinning inside while loop in iomap_write_actor().
Patch below makes the issue go away for me:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 66a1260b33de..e443dbd2b5df 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -600,11 +600,11 @@ static inline int fault_in_multipages_readable(const char __user *uaddr,
                                               int size)
 {
        volatile char c;
-       int ret = 0;
+       int ret = -EFAULT;
        const char __user *end = uaddr + size - 1;
        if (unlikely(size == 0))
-               return ret;
+               return 0;
        while (uaddr <= end) {
                ret = __get_user(c, uaddr);
Regards,
Jan
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/pwritev/pwritev02.c
^ permalink raw reply related	[flat|nested] 12+ messages in thread* Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7 2016-09-20 12:56 [bug] pwritev02 hang on s390x with 4.8.0-rc7 Jan Stancek @ 2016-09-20 15:06 ` Al Viro 2016-09-20 17:11 ` Jan Stancek 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2016-09-20 15:06 UTC (permalink / raw) To: Jan Stancek; +Cc: linux-kernel On Tue, Sep 20, 2016 at 02:56:06PM +0200, Jan Stancek wrote: > Hi, > > I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7. > Specifically the EFAULT case, which is passing an iovec with invalid base address: > #define CHUNK 64 > static struct iovec wr_iovec3[] = { > {(char *)-1, CHUNK}, > }; > hangs with 100% cpu usage and not very helpful stack trace: > # cat /proc/28544/stack > [<0000000000001000>] 0x1000 > [<ffffffffffffffff>] 0xffffffffffffffff > > The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()". > > Before this commit fault_in_pages_readable() called __get_user() on start > address which presumably failed with -EFAULT immediately. > > With this commit applied fault_in_multipages_readable() appears to return 0 > for the case when "start" is invalid but "end" overflows. Then according to > my traces we keep spinning inside while loop in iomap_write_actor(). Cute. Let me see if I understand what's going on there: we have a wraparound that would've been caught by most of access_ok(), but not on an architectures where access_ok() is a no-op; in that case the loop is skipped and we just check the last address, which passes and we get a false positive. Bug is real and it's definitely -stable fodder. I'm not sure that the fix you propose is right, though. Note that ERR_PTR() is not a valid address on any architecture, so any wraparound automatically means -EFAULT and we can simply check unlikely(uaddr > end) and bugger off if it holds. while() turns into do-while(), of course, and the same is needed for the read side. Could you check if the following works for you? diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 66a1260..7e3d537 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size) */ static inline int fault_in_multipages_writeable(char __user *uaddr, int size) { - int ret = 0; char __user *end = uaddr + size - 1; if (unlikely(size == 0)) - return ret; + return 0; + if (unlikely(uaddr > end)) + return -EFAULT; /* * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - while (uaddr <= end) { - ret = __put_user(0, uaddr); - if (ret != 0) - return ret; + do { + if (unlikely(__put_user(0, uaddr) != 0)) + return -EFAULT; uaddr += PAGE_SIZE; - } + } while (uaddr <= end); /* Check whether the range spilled into the next page. */ if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) - ret = __put_user(0, end); + return __put_user(0, end); - return ret; + return 0; } static inline int fault_in_multipages_readable(const char __user *uaddr, int size) { volatile char c; - int ret = 0; const char __user *end = uaddr + size - 1; if (unlikely(size == 0)) - return ret; + return 0; - while (uaddr <= end) { - ret = __get_user(c, uaddr); - if (ret != 0) - return ret; + if (unlikely(uaddr > end)) + return -EFAULT; + + do { + if (unlikely(__get_user(c, uaddr) != 0)) + return -EFAULT; uaddr += PAGE_SIZE; - } + } while (uaddr <= end); /* Check whether the range spilled into the next page. */ if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) { - ret = __get_user(c, end); - (void)c; + return __get_user(c, end); } - return ret; + return 0; } int add_to_page_cache_locked(struct page *page, struct address_space *mapping, ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7 2016-09-20 15:06 ` Al Viro @ 2016-09-20 17:11 ` Jan Stancek 2016-09-20 17:30 ` Al Viro 2016-09-20 19:07 ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro 0 siblings, 2 replies; 12+ messages in thread From: Jan Stancek @ 2016-09-20 17:11 UTC (permalink / raw) To: Al Viro; +Cc: linux-kernel ----- Original Message ----- > From: "Al Viro" <viro@ZenIV.linux.org.uk> > To: "Jan Stancek" <jstancek@redhat.com> > Cc: linux-kernel@vger.kernel.org > Sent: Tuesday, 20 September, 2016 5:06:57 PM > Subject: Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7 > > On Tue, Sep 20, 2016 at 02:56:06PM +0200, Jan Stancek wrote: > > Hi, > > > > I'm hitting a regression with LTP's pwritev02 [1] on s390x with 4.8.0-rc7. > > Specifically the EFAULT case, which is passing an iovec with invalid base > > address: > > #define CHUNK 64 > > static struct iovec wr_iovec3[] = { > > {(char *)-1, CHUNK}, > > }; > > hangs with 100% cpu usage and not very helpful stack trace: > > # cat /proc/28544/stack > > [<0000000000001000>] 0x1000 > > [<ffffffffffffffff>] 0xffffffffffffffff > > > > The problem starts with d4690f1e1cda "fix iov_iter_fault_in_readable()". > > > > Before this commit fault_in_pages_readable() called __get_user() on start > > address which presumably failed with -EFAULT immediately. > > > > With this commit applied fault_in_multipages_readable() appears to return 0 > > for the case when "start" is invalid but "end" overflows. Then according to > > my traces we keep spinning inside while loop in iomap_write_actor(). > > Cute. Let me see if I understand what's going on there: we have a wraparound > that would've been caught by most of access_ok(), but not on an architectures > where access_ok() is a no-op; in that case the loop is skipped and we > just check the last address, which passes and we get a false positive. > Bug is real and it's definitely -stable fodder. > > I'm not sure that the fix you propose is right, though. Note that ERR_PTR() > is not a valid address on any architecture, so any wraparound automatically > means -EFAULT and we can simply check unlikely(uaddr > end) and bugger off > if it holds. while() turns into do-while(), of course, and the same is > needed for the read side. > > Could you check if the following works for you? This fixes pwritev02 hang. I ran all syscalls tests from LTP, and I see a change in behaviour of couple other tests (writev01, writev03 and writev04 [1]) in 4.8.0-rc7. These call writev() with partially invalid iovecs, and now fail with EFAULT, while with previous -rc6 kernel they returned number of bytes written before they encountered invalid iovec record. This should be reproducible also on x86. Regards, Jan [1] https://github.com/linux-test-project/ltp/tree/master/testcases/kernel/syscalls/writev > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index 66a1260..7e3d537 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char > __user *uaddr, int size) > */ > static inline int fault_in_multipages_writeable(char __user *uaddr, int > size) > { > - int ret = 0; > char __user *end = uaddr + size - 1; > > if (unlikely(size == 0)) > - return ret; > + return 0; > > + if (unlikely(uaddr > end)) > + return -EFAULT; > /* > * Writing zeroes into userspace here is OK, because we know that if > * the zero gets there, we'll be overwriting it. > */ > - while (uaddr <= end) { > - ret = __put_user(0, uaddr); > - if (ret != 0) > - return ret; > + do { > + if (unlikely(__put_user(0, uaddr) != 0)) > + return -EFAULT; > uaddr += PAGE_SIZE; > - } > + } while (uaddr <= end); > > /* Check whether the range spilled into the next page. */ > if (((unsigned long)uaddr & PAGE_MASK) == > ((unsigned long)end & PAGE_MASK)) > - ret = __put_user(0, end); > + return __put_user(0, end); > > - return ret; > + return 0; > } > > static inline int fault_in_multipages_readable(const char __user *uaddr, > int size) > { > volatile char c; > - int ret = 0; > const char __user *end = uaddr + size - 1; > > if (unlikely(size == 0)) > - return ret; > + return 0; > > - while (uaddr <= end) { > - ret = __get_user(c, uaddr); > - if (ret != 0) > - return ret; > + if (unlikely(uaddr > end)) > + return -EFAULT; > + > + do { > + if (unlikely(__get_user(c, uaddr) != 0)) > + return -EFAULT; > uaddr += PAGE_SIZE; > - } > + } while (uaddr <= end); > > /* Check whether the range spilled into the next page. */ > if (((unsigned long)uaddr & PAGE_MASK) == > ((unsigned long)end & PAGE_MASK)) { > - ret = __get_user(c, end); > - (void)c; > + return __get_user(c, end); > } > > - return ret; > + return 0; > } > > int add_to_page_cache_locked(struct page *page, struct address_space > *mapping, > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [bug] pwritev02 hang on s390x with 4.8.0-rc7 2016-09-20 17:11 ` Jan Stancek @ 2016-09-20 17:30 ` Al Viro 2016-09-20 19:07 ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro 1 sibling, 0 replies; 12+ messages in thread From: Al Viro @ 2016-09-20 17:30 UTC (permalink / raw) To: Jan Stancek; +Cc: linux-kernel On Tue, Sep 20, 2016 at 01:11:41PM -0400, Jan Stancek wrote: > I ran all syscalls tests from LTP, and I see a change in behaviour > of couple other tests (writev01, writev03 and writev04 [1]) in 4.8.0-rc7. > > These call writev() with partially invalid iovecs, and now fail with > EFAULT, while with previous -rc6 kernel they returned number of bytes > written before they encountered invalid iovec record. > This should be reproducible also on x86. Known, discussed and considered legitimate. It's not so much EFAULT vs short write, it's how far do we shorten the write. Change consists of removing an accidental (and undocumented) property of iovec boundaries wrt write shortening. Usually an invalid address anywhere in the data we are asked to write leads to write shortened to the last pagecache boundary (i.e file position multiple of page size) entirely covered by valid data. It is filesystem-dependent and already deep in nasal demon territory. writev, pretty much by accident, never shortened past an iovec boundary. That's what got changed - now the rules are same as they are for all writes. Having an LTP test (as opposed to actual real-world code) deliberately stepping into that and checking how far does the shortening go means just one thing: update the test. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 17:11 ` Jan Stancek 2016-09-20 17:30 ` Al Viro @ 2016-09-20 19:07 ` Al Viro 2016-09-20 20:24 ` Linus Torvalds 1 sibling, 1 reply; 12+ messages in thread From: Al Viro @ 2016-09-20 19:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel Switching iov_iter fault-in to multipages variants has exposed an old bug in underlying fault_in_multipages_...(); they break if the range passed to them wraps around. Normally access_ok() done by callers will prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into such a range and they should not point to any valid objects). However, on architectures where userland and kernel live in different MMU contexts (e.g. s390) access_ok() is a no-op and on those a range with a wraparound can reach fault_in_multipages_...(). Since any wraparound means EFAULT there, the fix is trivial - turn those while (uaddr <= end) ... into if (unlikely(uaddr > end)) return -EFAULT; do ... while (uaddr <= end); Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 66a1260..7e3d537 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -571,56 +571,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size) */ static inline int fault_in_multipages_writeable(char __user *uaddr, int size) { - int ret = 0; char __user *end = uaddr + size - 1; if (unlikely(size == 0)) - return ret; + return 0; + if (unlikely(uaddr > end)) + return -EFAULT; /* * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - while (uaddr <= end) { - ret = __put_user(0, uaddr); - if (ret != 0) - return ret; + do { + if (unlikely(__put_user(0, uaddr) != 0)) + return -EFAULT; uaddr += PAGE_SIZE; - } + } while (uaddr <= end); /* Check whether the range spilled into the next page. */ if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) - ret = __put_user(0, end); + return __put_user(0, end); - return ret; + return 0; } static inline int fault_in_multipages_readable(const char __user *uaddr, int size) { volatile char c; - int ret = 0; const char __user *end = uaddr + size - 1; if (unlikely(size == 0)) - return ret; + return 0; - while (uaddr <= end) { - ret = __get_user(c, uaddr); - if (ret != 0) - return ret; + if (unlikely(uaddr > end)) + return -EFAULT; + + do { + if (unlikely(__get_user(c, uaddr) != 0)) + return -EFAULT; uaddr += PAGE_SIZE; - } + } while (uaddr <= end); /* Check whether the range spilled into the next page. */ if (((unsigned long)uaddr & PAGE_MASK) == ((unsigned long)end & PAGE_MASK)) { - ret = __get_user(c, end); - (void)c; + return __get_user(c, end); } - return ret; + return 0; } int add_to_page_cache_locked(struct page *page, struct address_space *mapping, ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 19:07 ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro @ 2016-09-20 20:24 ` Linus Torvalds 2016-09-20 20:38 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Linus Torvalds @ 2016-09-20 20:24 UTC (permalink / raw) To: Al Viro, Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer Cc: Linux Kernel Mailing List On Tue, Sep 20, 2016 at 12:07 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > Switching iov_iter fault-in to multipages variants has exposed an old > bug in underlying fault_in_multipages_...(); they break if the range > passed to them wraps around. Normally access_ok() done by callers > will prevent such (and it's a guaranteed EFAULT - ERR_PTR() values > fall into such a range and they should not point to any valid objects). > However, on architectures where userland and kernel live in different > MMU contexts (e.g. s390) access_ok() is a no-op and on those a range > with a wraparound can reach fault_in_multipages_...(). Quite frankly, I think it is access_ok() that should be fixed for s390. A wrapping user access is *not* ok, not even if kernel and user memory are separate. It is insane to make fault_in_multipages..() return EFAULT if a normal wrapping user access wouldn't. So the fix is not to change fault_in_multipage_xyz, but to make sure any op that tries to wrap will properly return EFAULT. So I really think that we should just say "a no-op access_ok() is a buggy access_ok()", and fix the problem at the source, rather than make excuses for it in some random place. A quick look seems to say that s390 and no-mmu ARM are the only affected cases, but maybe I missed something. Linus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 20:24 ` Linus Torvalds @ 2016-09-20 20:38 ` Al Viro 2016-09-20 20:48 ` Linus Torvalds 0 siblings, 1 reply; 12+ messages in thread From: Al Viro @ 2016-09-20 20:38 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer, Linux Kernel Mailing List On Tue, Sep 20, 2016 at 01:24:25PM -0700, Linus Torvalds wrote: > Quite frankly, I think it is access_ok() that should be fixed for s390. > > A wrapping user access is *not* ok, not even if kernel and user memory > are separate. > > It is insane to make fault_in_multipages..() return EFAULT if a normal > wrapping user access wouldn't. So the fix is not to change > fault_in_multipage_xyz, but to make sure any op that tries to wrap > will properly return EFAULT. Not the point. Of course it *would* fail; the problem is that the loop that would ping each page is never executed. What happens is while (uaddr <= end) touch uaddr uaddr += PAGE_SIZE if uaddr and end point to different pages ping end What happens if uaddr is greater than end, thanks to wraparound? Right, we skip the loop entirely and all we do is one ping of the end. Which might very well succeed, leaving us with false positive. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 20:38 ` Al Viro @ 2016-09-20 20:48 ` Linus Torvalds 2016-09-20 21:03 ` Al Viro 0 siblings, 1 reply; 12+ messages in thread From: Linus Torvalds @ 2016-09-20 20:48 UTC (permalink / raw) To: Al Viro Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer, Linux Kernel Mailing List On Tue, Sep 20, 2016 at 1:38 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Not the point. Of course it *would* fail; the problem is that the loop > that would ping each page is never executed. You're missing the point. If "access_ok()" is fine with wrapping, then some otehr system call that wraps will successfully do a memcpy_from/to_user() with a wrapped address (and the proper mappings). Which is completely bogus. So access_ok() should be fixed regardless. An access_ok() that accepts a wrapping address is a bug. End of story. And once that bug is fixed, the fault_in_multipages..() issue is moot. So it shouldn't be an issue. Linus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 20:48 ` Linus Torvalds @ 2016-09-20 21:03 ` Al Viro 2016-09-20 21:37 ` Al Viro 2016-09-20 23:43 ` Linus Torvalds 0 siblings, 2 replies; 12+ messages in thread From: Al Viro @ 2016-09-20 21:03 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer, Linux Kernel Mailing List On Tue, Sep 20, 2016 at 01:48:10PM -0700, Linus Torvalds wrote: > On Tue, Sep 20, 2016 at 1:38 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Not the point. Of course it *would* fail; the problem is that the loop > > that would ping each page is never executed. > > You're missing the point. > > If "access_ok()" is fine with wrapping, then some otehr system call > that wraps will successfully do a memcpy_from/to_user() with a wrapped > address (and the proper mappings). What proper mappings? If you manage to mmap something at (void*)-PAGE_SIZE, you are very deep in trouble regardless. We use IS_ERR() on userland pointers and any architecture where that would be possible would be confused as hell. The testcase here is uaddr = (void *)-1, len = (unsigned long)valid_addr + 2. If it tried to do __put_user(uaddr, 0) it would immediately fail, just as __copy_to_user(uaddr, len); the problem is, that call will only do __put_user(valid_addr, 0) and succeed. Again, if get_user/put_user/copy_{to,from}_user() anywhere near ERR_PTR(...) would succeed, we'd get trouble without any wraparounds. That page should be absent, and it really is. In all cases, s390 included. Wraparound is irrelevant here. The reason why it got spotted was persistent failure of copy_{to,from}_user after successful fault-ins. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 21:03 ` Al Viro @ 2016-09-20 21:37 ` Al Viro 2016-09-20 23:43 ` Linus Torvalds 1 sibling, 0 replies; 12+ messages in thread From: Al Viro @ 2016-09-20 21:37 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer, Linux Kernel Mailing List On Tue, Sep 20, 2016 at 10:03:26PM +0100, Al Viro wrote: > The testcase here is uaddr = (void *)-1, len = (unsigned long)valid_addr + 2. > If it tried to do __put_user(uaddr, 0) it would immediately fail, just as > __copy_to_user(uaddr, len); the problem is, that call will only do > __put_user(valid_addr, 0) and succeed. > > Again, if get_user/put_user/copy_{to,from}_user() anywhere near ERR_PTR(...) > would succeed, we'd get trouble without any wraparounds. That page should > be absent, and it really is. In all cases, s390 included. Wraparound is > irrelevant here. The reason why it got spotted was persistent failure of > copy_{to,from}_user after successful fault-ins. PS: s390 is far from the only such architecture - at least m68k, parisc and sparc64 are the same way. Sure, we can make all of them check for wraparounds, but what's the point, when actual attempts to copy to/from such range will fail anyway and for absolute majority of the calls the check will do nothing. What's the point? Note that we need to compare uaddr and end in these loops anyway, so we are not going to save anything there... ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 21:03 ` Al Viro 2016-09-20 21:37 ` Al Viro @ 2016-09-20 23:43 ` Linus Torvalds 2016-09-21 0:38 ` Al Viro 1 sibling, 1 reply; 12+ messages in thread From: Linus Torvalds @ 2016-09-20 23:43 UTC (permalink / raw) To: Al Viro Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer, Linux Kernel Mailing List [ Sorry, AFK there for a while ] On Tue, Sep 20, 2016 at 2:03 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > What proper mappings? If you manage to mmap something at (void*)-PAGE_SIZE, > you are very deep in trouble regardless. We use IS_ERR() on userland > pointers and any architecture where that would be possible would be confused > as hell. Ack, you've convinced me. Will apply the patch. Linus ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() 2016-09-20 23:43 ` Linus Torvalds @ 2016-09-21 0:38 ` Al Viro 0 siblings, 0 replies; 12+ messages in thread From: Al Viro @ 2016-09-21 0:38 UTC (permalink / raw) To: Linus Torvalds Cc: Heiko Carstens, Martin Schwidefsky, Jan Stancek, Arnd Bergmann, Greg Ungerer, Linux Kernel Mailing List On Tue, Sep 20, 2016 at 04:43:18PM -0700, Linus Torvalds wrote: > [ Sorry, AFK there for a while ] > > On Tue, Sep 20, 2016 at 2:03 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > What proper mappings? If you manage to mmap something at (void*)-PAGE_SIZE, > > you are very deep in trouble regardless. We use IS_ERR() on userland > > pointers and any architecture where that would be possible would be confused > > as hell. > > Ack, you've convinced me. Will apply the patch. BTW, folks, there's an interesting part of that story in commit message of af2e84: ----------------------------------------------------------------------- pagemap.h: fix warning about possibly used before init var Commit f56f821feb7b ("mm: extend prefault helpers to fault in more than PAGE_SIZE") added in the new functions: fault_in_multipages_writeable() and fault_in_multipages_readable(). However, we currently see: include/linux/pagemap.h:492: warning: 'ret' may be used uninitialized in this function include/linux/pagemap.h:492: note: 'ret' was declared here Unlike a lot of gcc nags, this one appears somewhat legit. i.e. passing in an invalid negative value of "size" does make it look like all the conditionals in there would be bypassed and the uninitialized value would be returned. ----------------------------------------------------------------------- It wasn't just "somewhat" legit - it was entirely accurate and pointing to real bug. The real reason why the loop could be bypassed wasn't the negative 'size' (that really couldn't happen), it was the possibility of wraparound. Which had been masked on most of the architectures by checks that had been pretty far upstream from that function (entire range passing access_ok()) and architecture-dependent on top of that. Some architectures would always prevent wraparounds in access_ok() (x86, for example), some would only do so outside of set_fd(KERNEL_DS) (where the caller is responsible for validity of addresses and lengths) and some did not prevent wraparounds on access_ok() level at all. IOW, it was OK on a lot of architectures, but only for rather subtle reasons. Making ret initialized to 0 did make cc(1) STFU, but didn't fix the bug... It was used only in intel-specific drivers (i915 is x86-only), arm-specific one (armada) and in NTFS; the last one isn't arch-specific, but I doubt that anyone has really mounted NTFS on e.g. sparc, let alone hit it with LTP or fuzzing there. Multipage path in BTRFS write would benefit from it, and BTRFS probably did get tested on sparc and/or s390, but BTRFS kept using plain fault-in (and paid for that with more frequent fallbacks to singlepage codepath). So it went unnoticed, until it did show up on a path from generic_perform_write()... ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-09-21 0:39 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-09-20 12:56 [bug] pwritev02 hang on s390x with 4.8.0-rc7 Jan Stancek 2016-09-20 15:06 ` Al Viro 2016-09-20 17:11 ` Jan Stancek 2016-09-20 17:30 ` Al Viro 2016-09-20 19:07 ` [PATCH] fix fault_in_multipages_...() on architectures with no-op access_ok() Al Viro 2016-09-20 20:24 ` Linus Torvalds 2016-09-20 20:38 ` Al Viro 2016-09-20 20:48 ` Linus Torvalds 2016-09-20 21:03 ` Al Viro 2016-09-20 21:37 ` Al Viro 2016-09-20 23:43 ` Linus Torvalds 2016-09-21 0:38 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).