linux-sh.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
@ 2016-09-16 19:12 Guenter Roeck
  2016-09-16 19:45 ` Al Viro
  2016-09-16 20:03 ` Al Viro
  0 siblings, 2 replies; 18+ messages in thread
From: Guenter Roeck @ 2016-09-16 19:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Al Viro, Yoshinori Sato, linux-sh, Rich Felker

Hi,

I see the following runtime failure when running a 'sh' image with qemu in -next.

[ ... ]

sd 0:0:0:0: [sda] Attached SCSI disk
EXT2-fs (sda): warning: mounting unchecked fs, running e2fsck is recommended
VFS: Mounted root (ext2 filesystem) on device 8:0.
Freeing unused kernel memory: 124K (8c48a000 - 8c4a9000)
This architecture does not have kernel memory protection.
random: fast init done
Starting logging: OK
usb 1-1: new full-speed USB device number 2 using sm501-usb
Initializing random number generator... done.
Starting network...
ip: OVERRUN: Invalid argument
ip: OVERRUN: Bad address
ip: OVERRUN: Bad address
ip: OVERRUN: Bad address
ip: OVERRUN: Bad address
[repeats until the test aborts]

Bisect points to commit 6e050503a150 ("sh: fix copy_from_user()"). Bisect log is
attached.

Guenter

# bad: [c96ee5192be2bd160b209e063647c0a504247805] Add linux-next specific files for 20160915
# good: [9395452b4aab7bc2475ef8935b4a4fb99d778d70] Linux 4.8-rc6
git bisect start 'HEAD' 'v4.8-rc6'
# bad: [b9382cef0c0edb556ce9914fef0c3ccc3288fe93] Merge remote-tracking branch 'crypto/master'
git bisect bad b9382cef0c0edb556ce9914fef0c3ccc3288fe93
# bad: [c2c9f3960657219312f95baa1f140f3e0f257ebf] Merge branch 'dmi/master'
git bisect bad c2c9f3960657219312f95baa1f140f3e0f257ebf
# bad: [d3b5a0406cf8b7f5c655639c1eadbbc65ba3a56c] Merge remote-tracking branch 'arm-soc/for-next'
git bisect bad d3b5a0406cf8b7f5c655639c1eadbbc65ba3a56c
# good: [cd42d7672665ef51ae895631ec79ba9b4801cc85] Merge branch 'next/dt64' into for-next
git bisect good cd42d7672665ef51ae895631ec79ba9b4801cc85
# bad: [84b017489a6a09ffc394efe637346b598cef68a5] Merge remote-tracking branch 'input-current/for-linus'
git bisect bad 84b017489a6a09ffc394efe637346b598cef68a5
# good: [440f895aa97f81a2bdc02993da5360a1f6da2fb5] drivers: net: phy: xgene: Fix 'remove' function
git bisect good 440f895aa97f81a2bdc02993da5360a1f6da2fb5
# good: [b8f26e880c8166604e0da741eccd9fe6d8e1b9fb] Merge tag 'for-linus-4.8b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good b8f26e880c8166604e0da741eccd9fe6d8e1b9fb
# bad: [77e5bdf9f7b2d20939c8d807f3e68778d6e1557a] Merge branch 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 77e5bdf9f7b2d20939c8d807f3e68778d6e1557a
# good: [2e29f50ad5e23db37dde9be71410d95d50241ecd] nios2: fix __get_user()
git bisect good 2e29f50ad5e23db37dde9be71410d95d50241ecd
# good: [c6852389228df9fb3067f94f3b651de2a7921b36] sh64: failing __get_user() should zero
git bisect good c6852389228df9fb3067f94f3b651de2a7921b36
# bad: [c90a3bc5061d57e7931a9b7ad14784e1a0ed497d] m32r: fix __get_user()
git bisect bad c90a3bc5061d57e7931a9b7ad14784e1a0ed497d
# bad: [917400cecb4b52b5cde5417348322bb9c8272fa6] sparc32: fix copy_from_user()
git bisect bad 917400cecb4b52b5cde5417348322bb9c8272fa6
# bad: [6e050503a150b2126620c1a1e9b3a368fcd51eac] sh: fix copy_from_user()
git bisect bad 6e050503a150b2126620c1a1e9b3a368fcd51eac
# first bad commit: [6e050503a150b2126620c1a1e9b3a368fcd51eac] sh: fix copy_from_user()

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 19:12 Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()' Guenter Roeck
@ 2016-09-16 19:45 ` Al Viro
  2016-09-16 20:59   ` Guenter Roeck
  2016-09-16 20:03 ` Al Viro
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2016-09-16 19:45 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 12:12:18PM -0700, Guenter Roeck wrote:
> Hi,
> 
> I see the following runtime failure when running a 'sh' image with qemu in -next.

> Bisect points to commit 6e050503a150 ("sh: fix copy_from_user()"). Bisect log is
> attached.

Does reverting it recover the thing?

The change in question is
        if (__copy_size && __access_ok(__copy_from, __copy_size))
-               return __copy_user(to, from, __copy_size);
+               __copy_size = __copy_user(to, from, __copy_size);
+
+       if (unlikely(__copy_size))
+               memset(to + (n - __copy_size), 0, __copy_size);
 
        return __copy_size;

so the only difference is zeroing the tail of destination; return value
remains the same in all cases (what used to be return foo(); becomes
__copy_size = foo(); /* operations not modifying __copy_size */
return __copy_size;) and that memset is 100% legitimate -
copy_from_user(to, from, n) returning m means that the last m bytes of
[to .. to + n - 1] have not been copied into and must be zeroed.

If it affects anything at all, we have a serious problem somewhere in the
caller.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 19:12 Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()' Guenter Roeck
  2016-09-16 19:45 ` Al Viro
@ 2016-09-16 20:03 ` Al Viro
  2016-09-16 20:32   ` Guenter Roeck
  1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2016-09-16 20:03 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 12:12:18PM -0700, Guenter Roeck wrote:
> Hi,
> 
> I see the following runtime failure when running a 'sh' image with qemu in -next.
> 
> [ ... ]
> 
> sd 0:0:0:0: [sda] Attached SCSI disk
> EXT2-fs (sda): warning: mounting unchecked fs, running e2fsck is recommended
> VFS: Mounted root (ext2 filesystem) on device 8:0.
> Freeing unused kernel memory: 124K (8c48a000 - 8c4a9000)
> This architecture does not have kernel memory protection.
> random: fast init done
> Starting logging: OK
> usb 1-1: new full-speed USB device number 2 using sm501-usb
> Initializing random number generator... done.
> Starting network...
> ip: OVERRUN: Invalid argument
> ip: OVERRUN: Bad address
> ip: OVERRUN: Bad address
> ip: OVERRUN: Bad address
> ip: OVERRUN: Bad address
> [repeats until the test aborts]
> 
> Bisect points to commit 6e050503a150 ("sh: fix copy_from_user()"). Bisect log is
> attached.

BTW, could you post your .config and information about your userland?  E.g.
is that ip(8) a busybox one, etc.  If it's busybox, this smells like EINVAL
and EFAULT resp. coming from recvmsg() on netlink sockets, with nothing
extraordinary in iovecs, AFAICS...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 20:03 ` Al Viro
@ 2016-09-16 20:32   ` Guenter Roeck
  2016-09-16 20:43     ` Lennart Sorensen
  2016-09-16 21:20     ` Al Viro
  0 siblings, 2 replies; 18+ messages in thread
From: Guenter Roeck @ 2016-09-16 20:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 09:03:20PM +0100, Al Viro wrote:
> On Fri, Sep 16, 2016 at 12:12:18PM -0700, Guenter Roeck wrote:
> > Hi,
> > 
> > I see the following runtime failure when running a 'sh' image with qemu in -next.
> > 
> > [ ... ]
> > 
> > sd 0:0:0:0: [sda] Attached SCSI disk
> > EXT2-fs (sda): warning: mounting unchecked fs, running e2fsck is recommended
> > VFS: Mounted root (ext2 filesystem) on device 8:0.
> > Freeing unused kernel memory: 124K (8c48a000 - 8c4a9000)
> > This architecture does not have kernel memory protection.
> > random: fast init done
> > Starting logging: OK
> > usb 1-1: new full-speed USB device number 2 using sm501-usb
> > Initializing random number generator... done.
> > Starting network...
> > ip: OVERRUN: Invalid argument
> > ip: OVERRUN: Bad address
> > ip: OVERRUN: Bad address
> > ip: OVERRUN: Bad address
> > ip: OVERRUN: Bad address
> > [repeats until the test aborts]
> > 
> > Bisect points to commit 6e050503a150 ("sh: fix copy_from_user()"). Bisect log is
> > attached.
> 
> BTW, could you post your .config and information about your userland?  E.g.
> is that ip(8) a busybox one, etc.  If it's busybox, this smells like EINVAL
> and EFAULT resp. coming from recvmsg() on netlink sockets, with nothing
> extraordinary in iovecs, AFAICS...

Please see https://github.com/groeck/linux-build-test/tree/master/rootfs/sh.
The rootfs is some three years old. I don't remember how I created it :-).

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 20:32   ` Guenter Roeck
@ 2016-09-16 20:43     ` Lennart Sorensen
  2016-09-16 21:20     ` Al Viro
  1 sibling, 0 replies; 18+ messages in thread
From: Lennart Sorensen @ 2016-09-16 20:43 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Al Viro, linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 01:32:05PM -0700, Guenter Roeck wrote:
> On Fri, Sep 16, 2016 at 09:03:20PM +0100, Al Viro wrote:
> > On Fri, Sep 16, 2016 at 12:12:18PM -0700, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > I see the following runtime failure when running a 'sh' image with qemu in -next.
> > > 
> > > [ ... ]
> > > 
> > > sd 0:0:0:0: [sda] Attached SCSI disk
> > > EXT2-fs (sda): warning: mounting unchecked fs, running e2fsck is recommended
> > > VFS: Mounted root (ext2 filesystem) on device 8:0.
> > > Freeing unused kernel memory: 124K (8c48a000 - 8c4a9000)
> > > This architecture does not have kernel memory protection.
> > > random: fast init done
> > > Starting logging: OK
> > > usb 1-1: new full-speed USB device number 2 using sm501-usb
> > > Initializing random number generator... done.
> > > Starting network...
> > > ip: OVERRUN: Invalid argument
> > > ip: OVERRUN: Bad address
> > > ip: OVERRUN: Bad address
> > > ip: OVERRUN: Bad address
> > > ip: OVERRUN: Bad address
> > > [repeats until the test aborts]
> > > 
> > > Bisect points to commit 6e050503a150 ("sh: fix copy_from_user()"). Bisect log is
> > > attached.
> > 
> > BTW, could you post your .config and information about your userland?  E.g.
> > is that ip(8) a busybox one, etc.  If it's busybox, this smells like EINVAL
> > and EFAULT resp. coming from recvmsg() on netlink sockets, with nothing
> > extraordinary in iovecs, AFAICS...
> 
> Please see https://github.com/groeck/linux-build-test/tree/master/rootfs/sh.
> The rootfs is some three years old. I don't remember how I created it :-).

Well it certainly says in it that ip is busybox, and the version is 1.21.1

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 19:45 ` Al Viro
@ 2016-09-16 20:59   ` Guenter Roeck
  2016-09-16 21:31     ` Al Viro
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2016-09-16 20:59 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 08:45:33PM +0100, Al Viro wrote:
> On Fri, Sep 16, 2016 at 12:12:18PM -0700, Guenter Roeck wrote:
> > Hi,
> > 
> > I see the following runtime failure when running a 'sh' image with qemu in -next.
> 
> > Bisect points to commit 6e050503a150 ("sh: fix copy_from_user()"). Bisect log is
> > attached.
> 
> Does reverting it recover the thing?
> 
Yes, reverting 6e050503a150 fixes the problem.

I added a BUG() into the "if (unlikely())" below, but it doesn't catch,
and I still get the ip: OVERRUN errors. Which leaves me a bit puzzled.

Guenter

> The change in question is
>         if (__copy_size && __access_ok(__copy_from, __copy_size))
> -               return __copy_user(to, from, __copy_size);
> +               __copy_size = __copy_user(to, from, __copy_size);
> +
> +       if (unlikely(__copy_size))
> +               memset(to + (n - __copy_size), 0, __copy_size);
>  
>         return __copy_size;
> 
> so the only difference is zeroing the tail of destination; return value
> remains the same in all cases (what used to be return foo(); becomes
> __copy_size = foo(); /* operations not modifying __copy_size */
> return __copy_size;) and that memset is 100% legitimate -
> copy_from_user(to, from, n) returning m means that the last m bytes of
> [to .. to + n - 1] have not been copied into and must be zeroed.
> 
> If it affects anything at all, we have a serious problem somewhere in the
> caller.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 20:32   ` Guenter Roeck
  2016-09-16 20:43     ` Lennart Sorensen
@ 2016-09-16 21:20     ` Al Viro
  1 sibling, 0 replies; 18+ messages in thread
From: Al Viro @ 2016-09-16 21:20 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 01:32:05PM -0700, Guenter Roeck wrote:
> > BTW, could you post your .config and information about your userland?  E.g.
> > is that ip(8) a busybox one, etc.  If it's busybox, this smells like EINVAL
> > and EFAULT resp. coming from recvmsg() on netlink sockets, with nothing
> > extraordinary in iovecs, AFAICS...
> 
> Please see https://github.com/groeck/linux-build-test/tree/master/rootfs/sh.
> The rootfs is some three years old. I don't remember how I created it :-).

Will do.  In the meanwhile, could you verify that reverting that commit
really fixes the problem (and, if possible, check if the mainline has
the same problem)?

This is really very strange - I don't see anything in that commit to
alter the return value or to stomp on anything that shouldn't be modified,
error or no error.  The rules from copy_from_user(to, from, n) are
	* if everything in n-bytes range starting at from is readable userland
memory, its contents should be copied into n-bytes range starting at to and
0 should be returned.
	* otherwise a poitive number between 1 and n should be returned;
if it returns n - m, everything in m-bytes range starting at from should be
readable userland memory and its contents should be copied into m-bytes
range starting at to, with remaining n - m bytes of n-bytes range starting
at to getting zeroed.  That includes the case of m equal to 0 (i.e. return
value being equal to n) - then nothing is copied and the entire n-bytes
range starting at to is zeroed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 20:59   ` Guenter Roeck
@ 2016-09-16 21:31     ` Al Viro
  2016-09-16 21:39       ` Rich Felker
  2016-09-16 22:47       ` Guenter Roeck
  0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2016-09-16 21:31 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 01:59:38PM -0700, Guenter Roeck wrote:
> Yes, reverting 6e050503a150 fixes the problem.
> 
> I added a BUG() into the "if (unlikely())" below, but it doesn't catch,
> and I still get the ip: OVERRUN errors. Which leaves me a bit puzzled.
> 
> Guenter
> 
> > The change in question is
> >         if (__copy_size && __access_ok(__copy_from, __copy_size))
> > -               return __copy_user(to, from, __copy_size);
> > +               __copy_size = __copy_user(to, from, __copy_size);
> > +
> > +       if (unlikely(__copy_size))
> > +               memset(to + (n - __copy_size), 0, __copy_size);
> >  
> >         return __copy_size;

So we don't even hit that memset()?  What the hell?  __copy_user() is
declared as
__kernel_size_t __copy_user(void *to, const void *from, __kernel_size_t n);

and __copy_size copy_from_user() is

        __kernel_size_t __copy_size = (__kernel_size_t) n;

So
	return __copy_user(to, from, __copy_size);
and
	__copy_size = __copy_user(to, from, __copy_size);
	return __copy_size;
ought to be doing exactly the same thing.  At that point it's starting to
smell like a compiler bug somewhere in there.

Try to remove that (not triggered) if (unlikely(__copy_size)) memset(...)
and see if that's enough to recover.  And it would be nice to see what
all three variants (as it is, with commit reverted and with just that if
removed) generate in e.g. sys_utimensat() (fs/utimes.s)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 21:31     ` Al Viro
@ 2016-09-16 21:39       ` Rich Felker
  2016-09-16 22:47         ` Guenter Roeck
  2016-09-16 22:47       ` Guenter Roeck
  1 sibling, 1 reply; 18+ messages in thread
From: Rich Felker @ 2016-09-16 21:39 UTC (permalink / raw)
  To: Al Viro; +Cc: Guenter Roeck, linux-kernel, Yoshinori Sato, linux-sh

On Fri, Sep 16, 2016 at 10:31:41PM +0100, Al Viro wrote:
> On Fri, Sep 16, 2016 at 01:59:38PM -0700, Guenter Roeck wrote:
> > Yes, reverting 6e050503a150 fixes the problem.
> > 
> > I added a BUG() into the "if (unlikely())" below, but it doesn't catch,
> > and I still get the ip: OVERRUN errors. Which leaves me a bit puzzled.
> > 
> > Guenter
> > 
> > > The change in question is
> > >         if (__copy_size && __access_ok(__copy_from, __copy_size))
> > > -               return __copy_user(to, from, __copy_size);
> > > +               __copy_size = __copy_user(to, from, __copy_size);
> > > +
> > > +       if (unlikely(__copy_size))
> > > +               memset(to + (n - __copy_size), 0, __copy_size);
> > >  
> > >         return __copy_size;
> 
> So we don't even hit that memset()?  What the hell?  __copy_user() is
> declared as
> __kernel_size_t __copy_user(void *to, const void *from, __kernel_size_t n);
> 
> and __copy_size copy_from_user() is
> 
>         __kernel_size_t __copy_size = (__kernel_size_t) n;
> 
> So
> 	return __copy_user(to, from, __copy_size);
> and
> 	__copy_size = __copy_user(to, from, __copy_size);
> 	return __copy_size;
> ought to be doing exactly the same thing.  At that point it's starting to
> smell like a compiler bug somewhere in there.
> 
> Try to remove that (not triggered) if (unlikely(__copy_size)) memset(...)
> and see if that's enough to recover.  And it would be nice to see what
> all three variants (as it is, with commit reverted and with just that if
> removed) generate in e.g. sys_utimensat() (fs/utimes.s)

It would be useful to know what compiler version was used to build the
kernel. I wouldn't be surprised if some are buggy.

Rich

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 21:31     ` Al Viro
  2016-09-16 21:39       ` Rich Felker
@ 2016-09-16 22:47       ` Guenter Roeck
  2016-09-16 23:00         ` Al Viro
  1 sibling, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2016-09-16 22:47 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 10:31:41PM +0100, Al Viro wrote:
> On Fri, Sep 16, 2016 at 01:59:38PM -0700, Guenter Roeck wrote:
> > Yes, reverting 6e050503a150 fixes the problem.
> > 
> > I added a BUG() into the "if (unlikely())" below, but it doesn't catch,
> > and I still get the ip: OVERRUN errors. Which leaves me a bit puzzled.
> > 
> > Guenter
> > 
> > > The change in question is
> > >         if (__copy_size && __access_ok(__copy_from, __copy_size))
> > > -               return __copy_user(to, from, __copy_size);
> > > +               __copy_size = __copy_user(to, from, __copy_size);
> > > +
> > > +       if (unlikely(__copy_size))
> > > +               memset(to + (n - __copy_size), 0, __copy_size);
> > >  
> > >         return __copy_size;
> 
> So we don't even hit that memset()?  What the hell?  __copy_user() is
> declared as
> __kernel_size_t __copy_user(void *to, const void *from, __kernel_size_t n);
> 
> and __copy_size copy_from_user() is
> 
>         __kernel_size_t __copy_size = (__kernel_size_t) n;
> 
> So
> 	return __copy_user(to, from, __copy_size);
> and
> 	__copy_size = __copy_user(to, from, __copy_size);
> 	return __copy_size;
> ought to be doing exactly the same thing.  At that point it's starting to
> smell like a compiler bug somewhere in there.
> 
> Try to remove that (not triggered) if (unlikely(__copy_size)) memset(...)
> and see if that's enough to recover.  And it would be nice to see what
> all three variants (as it is, with commit reverted and with just that if
> removed) generate in e.g. sys_utimensat() (fs/utimes.s)

Adding pr_info() just before the "if (unlikely..." fixes the problem.

Commenting out the "if (unlikely())" code fixes the problem. 

Using a new variable "unsigned long x" for the return code instead of
re-using __copy_size fixes the problem.

Replacing "return __copy_size;" with "return __copy_size & 0xffffffff;"
fixes the problem.

The problem only seems to be seen if the copy size length is odd (more
specifically, the failing copy always has a length of 25 bytes).

No idea what is going on. Bug in __copy_user() ? Compiler bug ?

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 21:39       ` Rich Felker
@ 2016-09-16 22:47         ` Guenter Roeck
  2016-09-16 23:32           ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2016-09-16 22:47 UTC (permalink / raw)
  To: Rich Felker; +Cc: Al Viro, linux-kernel, Yoshinori Sato, linux-sh

On Fri, Sep 16, 2016 at 05:39:18PM -0400, Rich Felker wrote:
> On Fri, Sep 16, 2016 at 10:31:41PM +0100, Al Viro wrote:
> > On Fri, Sep 16, 2016 at 01:59:38PM -0700, Guenter Roeck wrote:
> > > Yes, reverting 6e050503a150 fixes the problem.
> > > 
> > > I added a BUG() into the "if (unlikely())" below, but it doesn't catch,
> > > and I still get the ip: OVERRUN errors. Which leaves me a bit puzzled.
> > > 
> > > Guenter
> > > 
> > > > The change in question is
> > > >         if (__copy_size && __access_ok(__copy_from, __copy_size))
> > > > -               return __copy_user(to, from, __copy_size);
> > > > +               __copy_size = __copy_user(to, from, __copy_size);
> > > > +
> > > > +       if (unlikely(__copy_size))
> > > > +               memset(to + (n - __copy_size), 0, __copy_size);
> > > >  
> > > >         return __copy_size;
> > 
> > So we don't even hit that memset()?  What the hell?  __copy_user() is
> > declared as
> > __kernel_size_t __copy_user(void *to, const void *from, __kernel_size_t n);
> > 
> > and __copy_size copy_from_user() is
> > 
> >         __kernel_size_t __copy_size = (__kernel_size_t) n;
> > 
> > So
> > 	return __copy_user(to, from, __copy_size);
> > and
> > 	__copy_size = __copy_user(to, from, __copy_size);
> > 	return __copy_size;
> > ought to be doing exactly the same thing.  At that point it's starting to
> > smell like a compiler bug somewhere in there.
> > 
> > Try to remove that (not triggered) if (unlikely(__copy_size)) memset(...)
> > and see if that's enough to recover.  And it would be nice to see what
> > all three variants (as it is, with commit reverted and with just that if
> > removed) generate in e.g. sys_utimensat() (fs/utimes.s)
> 
> It would be useful to know what compiler version was used to build the
> kernel. I wouldn't be surprised if some are buggy.
> 
4.6.3 from kernel.org.

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 22:47       ` Guenter Roeck
@ 2016-09-16 23:00         ` Al Viro
  0 siblings, 0 replies; 18+ messages in thread
From: Al Viro @ 2016-09-16 23:00 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linux-kernel, Yoshinori Sato, linux-sh, Rich Felker

On Fri, Sep 16, 2016 at 03:47:04PM -0700, Guenter Roeck wrote:

> Adding pr_info() just before the "if (unlikely..." fixes the problem.
> 
> Commenting out the "if (unlikely())" code fixes the problem. 
> 
> Using a new variable "unsigned long x" for the return code instead of
> re-using __copy_size fixes the problem.
> 
> Replacing "return __copy_size;" with "return __copy_size & 0xffffffff;"
> fixes the problem.
> 
> The problem only seems to be seen if the copy size length is odd (more
> specifically, the failing copy always has a length of 25 bytes).
> 
> No idea what is going on. Bug in __copy_user() ? Compiler bug ?

Definitely a compiler bug.  __kernel_size_t is 32 bits on sh; that & 0xffffffff
should've been optimized away, for crying out loud!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 22:47         ` Guenter Roeck
@ 2016-09-16 23:32           ` Rich Felker
  2016-09-17  2:23             ` Guenter Roeck
  2016-09-17  2:28             ` Guenter Roeck
  0 siblings, 2 replies; 18+ messages in thread
From: Rich Felker @ 2016-09-16 23:32 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Al Viro, linux-kernel, Yoshinori Sato, linux-sh

On Fri, Sep 16, 2016 at 03:47:44PM -0700, Guenter Roeck wrote:
> On Fri, Sep 16, 2016 at 05:39:18PM -0400, Rich Felker wrote:
> > On Fri, Sep 16, 2016 at 10:31:41PM +0100, Al Viro wrote:
> > > On Fri, Sep 16, 2016 at 01:59:38PM -0700, Guenter Roeck wrote:
> > > > Yes, reverting 6e050503a150 fixes the problem.
> > > > 
> > > > I added a BUG() into the "if (unlikely())" below, but it doesn't catch,
> > > > and I still get the ip: OVERRUN errors. Which leaves me a bit puzzled.
> > > > 
> > > > Guenter
> > > > 
> > > > > The change in question is
> > > > >         if (__copy_size && __access_ok(__copy_from, __copy_size))
> > > > > -               return __copy_user(to, from, __copy_size);
> > > > > +               __copy_size = __copy_user(to, from, __copy_size);
> > > > > +
> > > > > +       if (unlikely(__copy_size))
> > > > > +               memset(to + (n - __copy_size), 0, __copy_size);
> > > > >  
> > > > >         return __copy_size;
> > > 
> > > So we don't even hit that memset()?  What the hell?  __copy_user() is
> > > declared as
> > > __kernel_size_t __copy_user(void *to, const void *from, __kernel_size_t n);
> > > 
> > > and __copy_size copy_from_user() is
> > > 
> > >         __kernel_size_t __copy_size = (__kernel_size_t) n;
> > > 
> > > So
> > > 	return __copy_user(to, from, __copy_size);
> > > and
> > > 	__copy_size = __copy_user(to, from, __copy_size);
> > > 	return __copy_size;
> > > ought to be doing exactly the same thing.  At that point it's starting to
> > > smell like a compiler bug somewhere in there.
> > > 
> > > Try to remove that (not triggered) if (unlikely(__copy_size)) memset(...)
> > > and see if that's enough to recover.  And it would be nice to see what
> > > all three variants (as it is, with commit reverted and with just that if
> > > removed) generate in e.g. sys_utimensat() (fs/utimes.s)
> > 
> > It would be useful to know what compiler version was used to build the
> > kernel. I wouldn't be surprised if some are buggy.
> > 
> 4.6.3 from kernel.org.

That is utterly ancient and probaby very buggy. I would recommend 5.x+
or at the very least 4.7 or 4.8.

Rich

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 23:32           ` Rich Felker
@ 2016-09-17  2:23             ` Guenter Roeck
  2016-09-18  4:40               ` Rob Landley
  2016-09-17  2:28             ` Guenter Roeck
  1 sibling, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2016-09-17  2:23 UTC (permalink / raw)
  To: Rich Felker; +Cc: Al Viro, linux-kernel, Yoshinori Sato, linux-sh

On 09/16/2016 04:32 PM, Rich Felker wrote:
>> 4.6.3 from kernel.org.
>
> That is utterly ancient and probaby very buggy. I would recommend 5.x+
> or at the very least 4.7 or 4.8.
>
Unfortunately that is the latest one available from kernel.org :-(.
I'll try to build one myself.

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-16 23:32           ` Rich Felker
  2016-09-17  2:23             ` Guenter Roeck
@ 2016-09-17  2:28             ` Guenter Roeck
  1 sibling, 0 replies; 18+ messages in thread
From: Guenter Roeck @ 2016-09-17  2:28 UTC (permalink / raw)
  To: Rich Felker; +Cc: Al Viro, linux-kernel, Yoshinori Sato, linux-sh

On 09/16/2016 04:32 PM, Rich Felker wrote:
[ ... ]

>>> It would be useful to know what compiler version was used to build the
>>> kernel. I wouldn't be surprised if some are buggy.
>>>
>> 4.6.3 from kernel.org.
>
> That is utterly ancient and probaby very buggy. I would recommend 5.x+
> or at the very least 4.7 or 4.8.
>
I found a toolchain with 5.3.0, and using it the problem disappears. Oh well.
Sorry for the noise.

Guenter


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-17  2:23             ` Guenter Roeck
@ 2016-09-18  4:40               ` Rob Landley
  2016-09-18 15:17                 ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: Rob Landley @ 2016-09-18  4:40 UTC (permalink / raw)
  To: Guenter Roeck, Rich Felker
  Cc: Al Viro, linux-kernel, Yoshinori Sato, linux-sh



On 09/16/2016 09:23 PM, Guenter Roeck wrote:
> On 09/16/2016 04:32 PM, Rich Felker wrote:
>>> 4.6.3 from kernel.org.
>>
>> That is utterly ancient and probaby very buggy. I would recommend 5.x+
>> or at the very least 4.7 or 4.8.
>>
> Unfortunately that is the latest one available from kernel.org :-(.
> I'll try to build one myself.

Rich, you really, really need to get an actual release version of
https://github.com/richfelker/musl-cross-make posted.

Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-18  4:40               ` Rob Landley
@ 2016-09-18 15:17                 ` Rich Felker
  2016-09-29  2:36                   ` Rob Landley
  0 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2016-09-18 15:17 UTC (permalink / raw)
  To: Rob Landley
  Cc: Guenter Roeck, Al Viro, linux-kernel, Yoshinori Sato, linux-sh

On Sat, Sep 17, 2016 at 11:40:28PM -0500, Rob Landley wrote:
> 
> 
> On 09/16/2016 09:23 PM, Guenter Roeck wrote:
> > On 09/16/2016 04:32 PM, Rich Felker wrote:
> >>> 4.6.3 from kernel.org.
> >>
> >> That is utterly ancient and probaby very buggy. I would recommend 5.x+
> >> or at the very least 4.7 or 4.8.
> >>
> > Unfortunately that is the latest one available from kernel.org :-(.
> > I'll try to build one myself.
> 
> Rich, you really, really need to get an actual release version of
> https://github.com/richfelker/musl-cross-make posted.

What do you mean? Binaries? There are release tags, though it would
probably be a good time to make another one.

But this project (musl-cross-make) is not needed for building kernels
-- stock gcc, any modern-ish version, should work fine. The canonical
way (from prior to my involvement) to build sh* kernels is to use a
gcc that supports any ISA level, and this can be done without multilib
libgcc since the kernel provides its own libgcc replacement functions.

Rich

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()'
  2016-09-18 15:17                 ` Rich Felker
@ 2016-09-29  2:36                   ` Rob Landley
  0 siblings, 0 replies; 18+ messages in thread
From: Rob Landley @ 2016-09-29  2:36 UTC (permalink / raw)
  To: Rich Felker
  Cc: Guenter Roeck, Al Viro, linux-kernel, Yoshinori Sato, linux-sh

On 09/18/2016 10:17 AM, Rich Felker wrote:
> On Sat, Sep 17, 2016 at 11:40:28PM -0500, Rob Landley wrote:
>>
>>
>> On 09/16/2016 09:23 PM, Guenter Roeck wrote:
>>> On 09/16/2016 04:32 PM, Rich Felker wrote:
>>>>> 4.6.3 from kernel.org.
>>>>
>>>> That is utterly ancient and probaby very buggy. I would recommend 5.x+
>>>> or at the very least 4.7 or 4.8.
>>>>
>>> Unfortunately that is the latest one available from kernel.org :-(.
>>> I'll try to build one myself.
>>
>> Rich, you really, really need to get an actual release version of
>> https://github.com/richfelker/musl-cross-make posted.
> 
> What do you mean? Binaries? There are release tags, though it would
> probably be a good time to make another one.
> 
> But this project (musl-cross-make) is not needed for building kernels
> -- stock gcc, any modern-ish version, should work fine. The canonical
> way (from prior to my involvement) to build sh* kernels is to use a
> gcc that supports any ISA level, and this can be done without multilib
> libgcc since the kernel provides its own libgcc replacement functions.

The above was an example of somebody using a broken toolchain because
there isn't a known-good reference toolchain for the architecture, which
the kernel maintainer is known to regression test against. Having such a
thing might help people distinguish "bug in kernel" from "bug in gcc".

> Rich

Rob

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-09-29  2:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-16 19:12 Runtime failure running sh:qemu in -next due to 'sh: fix copy_from_user()' Guenter Roeck
2016-09-16 19:45 ` Al Viro
2016-09-16 20:59   ` Guenter Roeck
2016-09-16 21:31     ` Al Viro
2016-09-16 21:39       ` Rich Felker
2016-09-16 22:47         ` Guenter Roeck
2016-09-16 23:32           ` Rich Felker
2016-09-17  2:23             ` Guenter Roeck
2016-09-18  4:40               ` Rob Landley
2016-09-18 15:17                 ` Rich Felker
2016-09-29  2:36                   ` Rob Landley
2016-09-17  2:28             ` Guenter Roeck
2016-09-16 22:47       ` Guenter Roeck
2016-09-16 23:00         ` Al Viro
2016-09-16 20:03 ` Al Viro
2016-09-16 20:32   ` Guenter Roeck
2016-09-16 20:43     ` Lennart Sorensen
2016-09-16 21:20     ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).