* [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite @ 2016-05-27 11:17 Nick Alcock 2016-05-27 13:19 ` Nick Alcock 0 siblings, 1 reply; 12+ messages in thread From: Nick Alcock @ 2016-05-27 11:17 UTC (permalink / raw) To: linux-kernel, linux-sparc; +Cc: David S. Miller, Florian Weimer So I've been working on a patch series (see below) that applies GCC's -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic linker. In trying to upstream it, one review commenter queried one SPARC-specific patch in the series; the absence of this patch triggers a BUG in the SPARC kernel when glibc is tested as an unprivileged user, on all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on the ldoms I have access to and presumably on bare hardware too. This is clearly a bug, and equally clearly I think it needs fixing before we can upstream the series, which it would be nice to do because it would have prevented most of the recent spate of glibc stack overflows from escalating to arbitrary code execution. First, a representative sample of the BUG, as seen on 4.6.0: ld-linux.so.2[36805]: segfault at 7ff ip (null) (rpc (null)) sp (null) error 30001 in tst-kill6[100000+4000] ld-linux.so.2[36806]: segfault at 7ff ip (null) (rpc (null)) sp (null) error 30001 in tst-kill6[100000+4000] ld-linux.so.2[36807]: segfault at 7ff ip (null) (rpc (null)) sp (null) error 30001 in tst-kill6[100000+4000] kernel BUG at arch/sparc/mm/fault_64.c:299! \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c Disabling lock debugging due to kernel taint Caller[0000000000404d4c]: user_rtt_fill_fixup+0x6c/0x7c Caller[0000000000000000]: (null) Instruction DUMP: 9210212b 7fe84179 901222e8 <91d02005> 90102002 92102001 94100018 7fecd033 96100010 Kernel panic - not syncing: Fatal exception Press Stop-A (L1-A) to return to the boot prom ---[ end Kernel panic - not syncing: Fatal exception The crash moves around, and can even be seen striking in completely random userspace processes that aren't part of the glibc under test (e.g. I've seen it happen inside awk and GCC). The backtrace is always the same, though. It seems this is an unexpected TLB fault from this BUG in do_sparc64_fault(): if ((fault_code & FAULT_CODE_ITLB) && (fault_code & FAULT_CODE_DTLB)) BUG(); which certainly explains the randomness to some extent. Now, some details for replication. It's easy to replicate if you can build and test glibc using a GCC that supports -fstack-protector-all on Linux/SPARC: I used 4.9.3. (You don't need to *install* the glibc or anything, and getting to the crash on reasonable hardware takes only a few minutes.) The patch series itself, in the hopefully-not-too-inconvenient form of a pair of git bundles based on glibc commit a5df3210a641c175138052037fcdad34298bfa4d (near the glibc-2.23 release), though this happens on glibc trunk with these bundles merged in too: <http://www.esperi.org.uk/~nix/src/glibc-crashes.bundle> <http://www.esperi.org.uk/~nix/src/glibc-workaround.bundle> You'll need to run autoconf-2.69 in the source tree after checkout, since I haven't regenerated configure in either of them. To configure/build/test, I used ../../glibc/configure --enable-stackguard-randomization \ --enable-stack-protector=all --prefix=/usr --enable-shared \ --enable-bind-now --enable-maintainer-mode --enable-obsolete-rpc \ --enable-add-ons=libidn --enable-kernel=4.1 --enable-check-abi=warn \ && make -j 5 && make -j 5 check TIMEOUTFACTOR=5 though most of the configure flags are probably unnecessary and you'll probably want to adjust the -j numbers. The crucial one is --enable-stack-protector=all; without it, the first patch series is equivalent to the second. The crash almost invariably happens during the make check run, usually during or after string/; both 32-bit and 64-bit glibc builds are affected (the above configure line is for 64-bit). I have not yet completed as many as four runs without a crash, and it almost always happens in one or two. You can probably trigger one reliably by simply rerunning make check in a loop without doing any of the rest of the rebuilding (but I was reconfiguring and rebuilding because all of that was scripted). The only difference between the two series above is that in the crashing series, the ka_restorer stub functions __rt_sigreturn_stub and __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get stack-protected; in the non-crashing series, they do not; the same is true without --enable-stack-protector=all, because the functions have no local variables at all, so without -fstack-protector-all they don't get stack-protected in any case. Passing such a stack-protected function in as the ka_restorer stub seems to suffice to cause this crash at some later date. I'm wondering if the stack canary is clobbering something that the caller does not expect to be clobbered: we saw this cause trouble on x86 in a different context (see upstream commit 7a25d6a84df9fea56963569ceccaaf7c2a88f161). It is clearly acceptable to say "restorer stubs are incompatible with stack-protector canaries: turn them off" -- there are plenty of places that are incompatible with canaries for good reason, and quite a lot of the glibc patch series has been identifying these and turning the stack- protector off for them -- but it is probably less acceptable to crash the kernel if they don't do that! So at least some extra armouring seems to be called for. But where that extra armouring needs to go, I don't know (probably not in do_sparc64_fault(), since I guess the underlying bug is way upstream of this somewhere). I really have no idea what the underlying bug might *be*. setup_rt_frame() might be a good place to start looking, only of course that can't on its own explain how the explosion happens at a later date, or how TLB faults get involved. Anyway, I hope this is enough to at least replicate the bug: if it's not -- if I've forgotten some detail, or if there is an environmental dependence beyond "it's a SPARC" that I don't know about -- feel free to ask for more info. I'm a mere userspace guy, and barely know about any variation that may exist in the SPARC world these days. It's quite possible that the hardware I'm using to test this (on the other side of the world) is some sort of weird preproduction silicon and I don't know it and this only happens there: it's certain that its firmware is three years old... if nobody else can reproduce it, I'll try to dig out some more hosts with different characteristics and see if it happens on them too. Kernel .config for this host (it's huge because it's derived from an enterprise distro config): <http://www.esperi.org.uk/~nix/src/config-4.6-sparc> Very few of those modules are loaded, to wit: Module Size Used by ipt_REJECT 1853 2 nf_reject_ipv4 3645 1 ipt_REJECT nf_conntrack_ipv4 11179 2 nf_defrag_ipv4 1849 1 nf_conntrack_ipv4 iptable_filter 2108 1 ip_tables 20683 1 iptable_filter ip6t_REJECT 1857 2 nf_reject_ipv6 5205 1 ip6t_REJECT nf_conntrack_ipv6 11359 2 nf_defrag_ipv6 26774 1 nf_conntrack_ipv6 xt_state 1570 4 nf_conntrack 100343 3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state ip6table_filter 2050 1 ip6_tables 19814 1 ip6table_filter ipv6 411857 153 nf_reject_ipv6,nf_conntrack_ipv6,nf_defrag_ipv6,[permanent] openprom 6699 0 ext4 608323 2 mbcache 6913 3 ext4 jbd2 108713 1 ext4 des_generic 20873 0 sunvnet 6897 0 sunvdc 10861 4 dm_mirror 14985 0 dm_region_hash 11360 1 dm_mirror dm_log 10973 2 dm_mirror,dm_region_hash dm_mod 108820 9 dm_mirror,dm_log ... though I doubt the set of loaded modules is likely to affect reproduction of this bug much. ^ permalink raw reply [flat|nested] 12+ messages in thread
* [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 11:17 [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite Nick Alcock @ 2016-05-27 13:19 ` Nick Alcock 2016-05-27 13:34 ` John Paul Adrian Glaubitz ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Nick Alcock @ 2016-05-27 13:19 UTC (permalink / raw) To: linux-kernel; +Cc: sparclinux, David S. Miller, Florian Weimer [Resent with fixed address for sparclinux@; sorry!] So I've been working on a patch series (see below) that applies GCC's -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic linker. In trying to upstream it, one review commenter queried one SPARC-specific patch in the series; the absence of this patch triggers a BUG in the SPARC kernel when glibc is tested as an unprivileged user, on all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on the ldoms I have access to and presumably on bare hardware too. This is clearly a bug, and equally clearly I think it needs fixing before we can upstream the series, which it would be nice to do because it would have prevented most of the recent spate of glibc stack overflows from escalating to arbitrary code execution. First, a representative sample of the BUG, as seen on 4.6.0: ld-linux.so.2[36805]: segfault at 7ff ip (null) (rpc (null)) sp (null) error 30001 in tst-kill6[100000+4000] ld-linux.so.2[36806]: segfault at 7ff ip (null) (rpc (null)) sp (null) error 30001 in tst-kill6[100000+4000] ld-linux.so.2[36807]: segfault at 7ff ip (null) (rpc (null)) sp (null) error 30001 in tst-kill6[100000+4000] kernel BUG at arch/sparc/mm/fault_64.c:299! \|/ ____ \|/ "@'/ .. \`@" /_| \__/ |_\ \__U_/ ld-linux.so.2(36808): Kernel bad sw trap 5 [#1] CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34 task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000 TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002 Not tainted TPC: <do_sparc64_fault+0x5c4/0x700> g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001 g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001 o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358 o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c RPC: <do_sparc64_fault+0x5bc/0x700> l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000 l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000 i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000 i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c I7: <user_rtt_fill_fixup+0x6c/0x7c> Call Trace: [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c Disabling lock debugging due to kernel taint Caller[0000000000404d4c]: user_rtt_fill_fixup+0x6c/0x7c Caller[0000000000000000]: (null) Instruction DUMP: 9210212b 7fe84179 901222e8 <91d02005> 90102002 92102001 94100018 7fecd033 96100010 Kernel panic - not syncing: Fatal exception Press Stop-A (L1-A) to return to the boot prom ---[ end Kernel panic - not syncing: Fatal exception The crash moves around, and can even be seen striking in completely random userspace processes that aren't part of the glibc under test (e.g. I've seen it happen inside awk and GCC). The backtrace is always the same, though. It seems this is an unexpected TLB fault from this BUG in do_sparc64_fault(): if ((fault_code & FAULT_CODE_ITLB) && (fault_code & FAULT_CODE_DTLB)) BUG(); which certainly explains the randomness to some extent. Now, some details for replication. It's easy to replicate if you can build and test glibc using a GCC that supports -fstack-protector-all on Linux/SPARC: I used 4.9.3. (You don't need to *install* the glibc or anything, and getting to the crash on reasonable hardware takes only a few minutes.) The patch series itself, in the hopefully-not-too-inconvenient form of a pair of git bundles based on glibc commit a5df3210a641c175138052037fcdad34298bfa4d (near the glibc-2.23 release), though this happens on glibc trunk with these bundles merged in too: <http://www.esperi.org.uk/~nix/src/glibc-crashes.bundle> <http://www.esperi.org.uk/~nix/src/glibc-workaround.bundle> You'll need to run autoconf-2.69 in the source tree after checkout, since I haven't regenerated configure in either of them. To configure/build/test, I used ../../glibc/configure --enable-stackguard-randomization \ --enable-stack-protector=all --prefix=/usr --enable-shared \ --enable-bind-now --enable-maintainer-mode --enable-obsolete-rpc \ --enable-add-ons=libidn --enable-kernel=4.1 --enable-check-abi=warn \ && make -j 5 && make -j 5 check TIMEOUTFACTOR=5 though most of the configure flags are probably unnecessary and you'll probably want to adjust the -j numbers. The crucial one is --enable-stack-protector=all; without it, the first patch series is equivalent to the second. The crash almost invariably happens during the make check run, usually during or after string/; both 32-bit and 64-bit glibc builds are affected (the above configure line is for 64-bit). I have not yet completed as many as four runs without a crash, and it almost always happens in one or two. You can probably trigger one reliably by simply rerunning make check in a loop without doing any of the rest of the rebuilding (but I was reconfiguring and rebuilding because all of that was scripted). The only difference between the two series above is that in the crashing series, the ka_restorer stub functions __rt_sigreturn_stub and __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get stack-protected; in the non-crashing series, they do not; the same is true without --enable-stack-protector=all, because the functions have no local variables at all, so without -fstack-protector-all they don't get stack-protected in any case. Passing such a stack-protected function in as the ka_restorer stub seems to suffice to cause this crash at some later date. I'm wondering if the stack canary is clobbering something that the caller does not expect to be clobbered: we saw this cause trouble on x86 in a different context (see upstream commit 7a25d6a84df9fea56963569ceccaaf7c2a88f161). It is clearly acceptable to say "restorer stubs are incompatible with stack-protector canaries: turn them off" -- there are plenty of places that are incompatible with canaries for good reason, and quite a lot of the glibc patch series has been identifying these and turning the stack- protector off for them -- but it is probably less acceptable to crash the kernel if they don't do that! So at least some extra armouring seems to be called for. But where that extra armouring needs to go, I don't know (probably not in do_sparc64_fault(), since I guess the underlying bug is way upstream of this somewhere). I really have no idea what the underlying bug might *be*. setup_rt_frame() might be a good place to start looking, only of course that can't on its own explain how the explosion happens at a later date, or how TLB faults get involved. Anyway, I hope this is enough to at least replicate the bug: if it's not -- if I've forgotten some detail, or if there is an environmental dependence beyond "it's a SPARC" that I don't know about -- feel free to ask for more info. I'm a mere userspace guy, and barely know about any variation that may exist in the SPARC world these days. It's quite possible that the hardware I'm using to test this (on the other side of the world) is some sort of weird preproduction silicon and I don't know it and this only happens there: it's certain that its firmware is three years old... if nobody else can reproduce it, I'll try to dig out some more hosts with different characteristics and see if it happens on them too. Kernel .config for this host (it's huge because it's derived from an enterprise distro config): <http://www.esperi.org.uk/~nix/src/config-4.6-sparc> Very few of those modules are loaded, to wit: Module Size Used by ipt_REJECT 1853 2 nf_reject_ipv4 3645 1 ipt_REJECT nf_conntrack_ipv4 11179 2 nf_defrag_ipv4 1849 1 nf_conntrack_ipv4 iptable_filter 2108 1 ip_tables 20683 1 iptable_filter ip6t_REJECT 1857 2 nf_reject_ipv6 5205 1 ip6t_REJECT nf_conntrack_ipv6 11359 2 nf_defrag_ipv6 26774 1 nf_conntrack_ipv6 xt_state 1570 4 nf_conntrack 100343 3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state ip6table_filter 2050 1 ip6_tables 19814 1 ip6table_filter ipv6 411857 153 nf_reject_ipv6,nf_conntrack_ipv6,nf_defrag_ipv6,[permanent] openprom 6699 0 ext4 608323 2 mbcache 6913 3 ext4 jbd2 108713 1 ext4 des_generic 20873 0 sunvnet 6897 0 sunvdc 10861 4 dm_mirror 14985 0 dm_region_hash 11360 1 dm_mirror dm_log 10973 2 dm_mirror,dm_region_hash dm_mod 108820 9 dm_mirror,dm_log ... though I doubt the set of loaded modules is likely to affect reproduction of this bug much. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 13:19 ` Nick Alcock @ 2016-05-27 13:34 ` John Paul Adrian Glaubitz 2016-05-27 13:43 ` Nick Alcock 2016-05-27 19:37 ` David Miller 2016-05-29 6:02 ` David Miller 2 siblings, 1 reply; 12+ messages in thread From: John Paul Adrian Glaubitz @ 2016-05-27 13:34 UTC (permalink / raw) To: Nick Alcock, linux-kernel Cc: Sparc kernel list, David S. Miller, Florian Weimer, Jose E. Marchesi Hi Nick! On 05/27/2016 03:19 PM, Nick Alcock wrote: > So I've been working on a patch series (see below) that applies GCC's > -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic > linker. In trying to upstream it, one review commenter queried one > SPARC-specific patch in the series; the absence of this patch triggers a > BUG in the SPARC kernel when glibc is tested as an unprivileged user, on > all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on > the ldoms I have access to and presumably on bare hardware too. I apologize for hijacking this thread but since you are mentioning glibc, there are actually a couple of tests in the glibc testsuite [1]. Jose Marchesi has started looking at them and most of them seem to be related to unaligned accesses. Would be great if those could be resolved in the near future, too. We're working towards getting Debian's sparc64 port into a releasable state, but there are some packages like glibc or util-linux that need work since their testsuites fail. Cheers, Adrian > [1] https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=sparc64&ver=2.22-9&stamp=1464023539 -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer - glaubitz@debian.org `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 13:34 ` John Paul Adrian Glaubitz @ 2016-05-27 13:43 ` Nick Alcock 0 siblings, 0 replies; 12+ messages in thread From: Nick Alcock @ 2016-05-27 13:43 UTC (permalink / raw) To: John Paul Adrian Glaubitz Cc: linux-kernel, Sparc kernel list, David S. Miller, Florian Weimer, Jose E. Marchesi On 27 May 2016, John Paul Adrian Glaubitz outgrape: > Hi Nick! > > On 05/27/2016 03:19 PM, Nick Alcock wrote: >> So I've been working on a patch series (see below) that applies GCC's >> -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic >> linker. In trying to upstream it, one review commenter queried one >> SPARC-specific patch in the series; the absence of this patch triggers a >> BUG in the SPARC kernel when glibc is tested as an unprivileged user, on >> all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on >> the ldoms I have access to and presumably on bare hardware too. > > I apologize for hijacking this thread but since you are mentioning glibc, > there are actually a couple of tests in the glibc testsuite [1]. At least one of those failures is spurious: FAIL: nptl/tst-cond11 original exit status 1 clock = 0 Timed out: killed the child process You want to pass in a higher TIMEOUTFACTOR to the make check run, and that problem at least should go away. (The TIMEOUTFACTOR you need depends on how sluggish your test machine is.) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 13:19 ` Nick Alcock 2016-05-27 13:34 ` John Paul Adrian Glaubitz @ 2016-05-27 19:37 ` David Miller 2016-05-27 21:44 ` Nick Alcock 2016-05-29 6:02 ` David Miller 2 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2016-05-27 19:37 UTC (permalink / raw) To: nix; +Cc: linux-kernel, sparclinux, fweimer From: Nick Alcock <nix@esperi.org.uk> Date: Fri, 27 May 2016 14:19:27 +0100 > The only difference between the two series above is that in the crashing > series, the ka_restorer stub functions __rt_sigreturn_stub and > __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get > stack-protected; in the non-crashing series, they do not; the same is > true without --enable-stack-protector=all, because the functions have no > local variables at all, so without -fstack-protector-all they don't get > stack-protected in any case. Passing such a stack-protected function in > as the ka_restorer stub seems to suffice to cause this crash at some > later date. I'm wondering if the stack canary is clobbering something > that the caller does not expect to be clobbered: we saw this cause > trouble on x86 in a different context (see upstream commit > 7a25d6a84df9fea56963569ceccaaf7c2a88f161). This is amazing that it makes a difference since the sigreturn stub is implemented entirely in inline assembler :-) Normally the 64-bit stub is emitted as: __rt_sigreturn_stub: mov 101, %g1 ta 0x6d and with -fstack-protector-all we get: __rt_sigreturn_stub: save %sp, -192, %sp ldx [%g7+40], %g1 stx %g1, [%fp+2039] mov 0, %g1 mov 101, %g1 ta 0x6d ldx [%fp+2039], %g1 ldx [%g7+40], %g2 xor %g1, %g2, %g1 mov 0, %g2 brnz,pn %g1, .LL4 nop return %i7+8 nop .LL4: call __stack_chk_fail, 0 nop nop That 'save' is the problem. One can't change the register window or the stack pointer in this function, as the kernel has setup the restore frame at a precise location relative to the stack pointer when the stub is invoked. Basically, do_rt_sigreturn is restoring garbage into the cpu registers. It obviously shouldn't crash, which I'll look into, but it is clear that we can't enable -fstack-protector-all for this function. So far I'm playing with the patch below to do some basic sanity checks on the values inside of the sigreturn frame: diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c index 3c25241..6eb39a7 100644 --- a/arch/sparc/kernel/signal32.c +++ b/arch/sparc/kernel/signal32.c @@ -138,6 +138,18 @@ int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from) return 0; } +/* Checks if the fp is valid. We always build signal frames which are + * 16-byte aligned, therefore we can always enforce that the restore + * frame has that property as well. + */ +static bool invalid_frame_pointer(void __user *fp, int fplen) +{ + if ((((unsigned long) fp) & 15) || + ((unsigned long)fp) > 0x100000000ULL - fplen) + return true; + return false; +} + void do_sigreturn32(struct pt_regs *regs) { struct signal_frame32 __user *sf; @@ -158,8 +170,10 @@ void do_sigreturn32(struct pt_regs *regs) sf = (struct signal_frame32 __user *) regs->u_regs[UREG_FP]; /* 1. Make sure we are not getting garbage from the user */ - if (!access_ok(VERIFY_READ, sf, sizeof(*sf)) || - (((unsigned long) sf) & 3)) + if (invalid_frame_pointer(sf, sizeof(*sf))) + goto segv; + + if (sf->info.si_regs.u_regs[UREG_FP] & 3) goto segv; if (get_user(pc, &sf->info.si_regs.pc) || @@ -242,8 +256,10 @@ asmlinkage void do_rt_sigreturn32(struct pt_regs *regs) sf = (struct rt_signal_frame32 __user *) regs->u_regs[UREG_FP]; /* 1. Make sure we are not getting garbage from the user */ - if (!access_ok(VERIFY_READ, sf, sizeof(*sf)) || - (((unsigned long) sf) & 3)) + if (invalid_frame_pointer(sf, sizeof(*sf))) + goto segv; + + if (sf->regs.u_regs[UREG_FP] & 3) goto segv; if (get_user(pc, &sf->regs.pc) || @@ -307,14 +323,6 @@ segv: force_sig(SIGSEGV, current); } -/* Checks if the fp is valid */ -static int invalid_frame_pointer(void __user *fp, int fplen) -{ - if ((((unsigned long) fp) & 7) || ((unsigned long)fp) > 0x100000000ULL - fplen) - return 1; - return 0; -} - static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, unsigned long framesize) { unsigned long sp; diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c index 39aaec1..a8f0019 100644 --- a/arch/sparc/kernel/signal_64.c +++ b/arch/sparc/kernel/signal_64.c @@ -234,6 +234,17 @@ do_sigsegv: goto out; } +/* Checks if the fp is valid. We always build rt signal frames which + * are 16-byte aligned, therefore we can always enforce that the + * restore frame has that property as well. + */ +static bool invalid_frame_pointer(void __user *fp) +{ + if (((unsigned long) fp) & 15) + return true; + return false; +} + struct rt_signal_frame { struct sparc_stackf ss; siginfo_t info; @@ -261,7 +272,10 @@ void do_rt_sigreturn(struct pt_regs *regs) (regs->u_regs [UREG_FP] + STACK_BIAS); /* 1. Make sure we are not getting garbage from the user */ - if (((unsigned long) sf) & 3) + if (invalid_frame_pointer(sf)) + goto segv; + + if ((sf->regs.u_regs[UREG_FP] + STACK_BIAS) & 7) goto segv; err = get_user(tpc, &sf->regs.tpc); @@ -308,14 +322,6 @@ segv: force_sig(SIGSEGV, current); } -/* Checks if the fp is valid */ -static int invalid_frame_pointer(void __user *fp) -{ - if (((unsigned long) fp) & 15) - return 1; - return 0; -} - static inline void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, unsigned long framesize) { unsigned long sp = regs->u_regs[UREG_FP] + STACK_BIAS; ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 19:37 ` David Miller @ 2016-05-27 21:44 ` Nick Alcock 2016-05-27 22:51 ` David Miller 0 siblings, 1 reply; 12+ messages in thread From: Nick Alcock @ 2016-05-27 21:44 UTC (permalink / raw) To: David Miller; +Cc: linux-kernel, sparclinux, fweimer On 27 May 2016, David Miller stated: > From: Nick Alcock <nix@esperi.org.uk> > Date: Fri, 27 May 2016 14:19:27 +0100 > >> The only difference between the two series above is that in the crashing >> series, the ka_restorer stub functions __rt_sigreturn_stub and >> __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get >> stack-protected; in the non-crashing series, they do not; the same is >> true without --enable-stack-protector=all, because the functions have no >> local variables at all, so without -fstack-protector-all they don't get >> stack-protected in any case. Passing such a stack-protected function in >> as the ka_restorer stub seems to suffice to cause this crash at some >> later date. I'm wondering if the stack canary is clobbering something >> that the caller does not expect to be clobbered: we saw this cause >> trouble on x86 in a different context (see upstream commit >> 7a25d6a84df9fea56963569ceccaaf7c2a88f161). > > This is amazing that it makes a difference since the sigreturn stub is > implemented entirely in inline assembler :-) I was fairly surprised as well, but not shocked, because people who write a function that consists of one single inline assembler instruction might well be rather surprised to find a massive pile of prologue and epilogue code dumped around it! > Normally the 64-bit stub is emitted as: > > __rt_sigreturn_stub: > mov 101, %g1 > ta 0x6d > > and with -fstack-protector-all we get: > > __rt_sigreturn_stub: > save %sp, -192, %sp > ldx [%g7+40], %g1 > stx %g1, [%fp+2039] > mov 0, %g1 > > mov 101, %g1 > ta 0x6d > > ldx [%fp+2039], %g1 > ldx [%g7+40], %g2 > xor %g1, %g2, %g1 > mov 0, %g2 > brnz,pn %g1, .LL4 > nop > return %i7+8 > nop > .LL4: > call __stack_chk_fail, 0 > nop > nop > > That 'save' is the problem. > > One can't change the register window or the stack pointer in this > function, as the kernel has setup the restore frame at a precise > location relative to the stack pointer when the stub is invoked. Oops! > Basically, do_rt_sigreturn is restoring garbage into the cpu > registers. Oh gods is it supposed to do register restoration? i.e. the usual ABI rules in re stack changes, etc just don't apply to it? Right, that's a disaster for stack-protection, obviously. The stack-protector prologue/epilogue does rather assume that it's being wrapped around a function, and in a very real sense this thing isn't a function in the normal sense at all. This is exactly what I thought was going on with the x86 code, but in the end that turned out to be a simple case of the (assembly) caller assuming a call-clobbered register had survived unchanged when the stack-protector epilogue had clobbered it (as it was quite within its rights to). > It obviously shouldn't crash, which I'll look into, but it is clear > that we can't enable -fstack-protector-all for this function. And now I have a good explanation of why that is for the commit log. Thank you! > So far I'm playing with the patch below to do some basic sanity > checks on the values inside of the sigreturn frame: Good move. Segfaulting the process is fine! :) Any process that does this sort of thing is clearly either terminally buggy, written by an idiot who doesn't know what he's doing (i.e. my original patch) or malicious. These all deserve SEGVs. (I still don't understand why this leads to spurious TLB faults, though. Filling the userland CPU registers with garbage is bad, but should still be reasonably harmless to the kernel, surely?) -- NULL && (void) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 21:44 ` Nick Alcock @ 2016-05-27 22:51 ` David Miller 2016-05-29 4:24 ` David Miller 0 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2016-05-27 22:51 UTC (permalink / raw) To: nix; +Cc: linux-kernel, sparclinux, fweimer From: Nick Alcock <nix@esperi.org.uk> Date: Fri, 27 May 2016 22:44:56 +0100 > Good move. Segfaulting the process is fine! :) Any process that does > this sort of thing is clearly either terminally buggy, written by an > idiot who doesn't know what he's doing (i.e. my original patch) or > malicious. These all deserve SEGVs. > > (I still don't understand why this leads to spurious TLB faults, though. > Filling the userland CPU registers with garbage is bad, but should still > be reasonably harmless to the kernel, surely?) I'm trying to figure out the same thing myself. Even the unaligned stack pointer should be gracefully handled by the kernel, so I think it has to be some other element of the register state restore sequence. The one area that deserves auditing is %tstate. This is a privileged register which we treat partially as non-privileged. Specifically we allow the user to modify the condition codes and the %asi register which is encoded into here. But I just went over that a few times. We are really careful to mask and only change those specific fields. I'll keep plugging away at this and also play with your patches to reproduce the bug. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 22:51 ` David Miller @ 2016-05-29 4:24 ` David Miller 2016-05-29 17:30 ` Sam Ravnborg 0 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2016-05-29 4:24 UTC (permalink / raw) To: nix; +Cc: linux-kernel, sparclinux, fweimer From: David Miller <davem@davemloft.net> Date: Fri, 27 May 2016 15:51:37 -0700 (PDT) > I'm trying to figure out the same thing myself. Ok, mystery solved. The basic problem is that we don't handle unaligned stacks on return to userspace %100 properly. We would also no handle a stack whose access would trigger a bus error or similar. A little background: The window trap handlers are slightly clever, the trap table entries for them are composed of two pieces of code. First comes the code that actually performs the window fill or spill trap handling, and then there are three instructions at the end which are for exception processing. The userland register window fill handler is: #define FILL_1_GENERIC(ASI) \ add %sp, STACK_BIAS + 0x00, %g1; \ ldxa [%g1 + %g0] ASI, %l0; \ mov 0x08, %g2; \ mov 0x10, %g3; \ ldxa [%g1 + %g2] ASI, %l1; \ mov 0x18, %g5; \ ldxa [%g1 + %g3] ASI, %l2; \ ldxa [%g1 + %g5] ASI, %l3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %l4; \ ldxa [%g1 + %g2] ASI, %l5; \ ldxa [%g1 + %g3] ASI, %l6; \ ldxa [%g1 + %g5] ASI, %l7; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i0; \ ldxa [%g1 + %g2] ASI, %i1; \ ldxa [%g1 + %g3] ASI, %i2; \ ldxa [%g1 + %g5] ASI, %i3; \ add %g1, 0x20, %g1; \ ldxa [%g1 + %g0] ASI, %i4; \ ldxa [%g1 + %g2] ASI, %i5; \ ldxa [%g1 + %g3] ASI, %i6; \ ldxa [%g1 + %g5] ASI, %i7; \ restored; \ retry; nop; nop; nop; nop; \ b,a,pt %xcc, fill_fixup_dax; \ b,a,pt %xcc, fill_fixup_mna; \ b,a,pt %xcc, fill_fixup; And the way this works is that if any of those loads or stores generate an exception, the exception handler can revector to one of those final three branch instructions depending upon which kind of exception the memory access took. For example, for a regular fault, the code goes: winfix_trampoline: rdpr %tpc, %g3 ! Prepare winfixup TNPC or %g3, 0x7c, %g3 ! Compute branch offset wrpr %g3, %tnpc ! Write it into TNPC done ! Trap return All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the trap time program counter, we'll get that final instruction in the trap handler. On return from trap, we have to pull the register window in but we do this by hand instead of just executing a "restore" instruction for several reasons. The largest being that from Niagara and onward we simply don't have enough levels in the trap stack to fully resolve all possible exception cases of a window fault when we are already at trap level 1 (which we enter to get ready to return from the original trap). This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's code branches directly to these to do the window fill by hand if necessary. Now if you look at them, we'll see at the end: ba,a,pt %xcc, user_rtt_fill_fixup; \ ba,a,pt %xcc, user_rtt_fill_fixup; \ ba,a,pt %xcc, user_rtt_fill_fixup; oops all three cases are handled like a fault. This doesn't work because each of these trap types (data access exception, memory address unaligned, and faults) store their auxiliary info in different registers to pass on to the C handler which does the real work. So in the case where the stack was unaligned, the unaligned trap handler setup the arg registers one way, and then we branched to the fault handler. So the FAULT_TYPE_* value was basically garbage, and randomly would generate the backtrace that you saw. This is the fix I am testing right now: ==================== >From d645a0376b0010cd60ae5651b3b84a56f5b0e5a4 Mon Sep 17 00:00:00 2001 From: "David S. Miller" <davem@davemloft.net> Date: Sat, 28 May 2016 20:41:12 -0700 Subject: [PATCH 2/2] sparc64: Fix return from trap window fill crashes. We must handle data access exception as well as memory address unaligned exceptions from return from trap window fill faults, not just normal TLB misses. Signed-off-by: David S. Miller <davem@davemloft.net> --- arch/sparc/include/asm/ttable.h | 8 ++-- arch/sparc/kernel/Makefile | 1 + arch/sparc/kernel/rtrap_64.S | 57 ++++-------------------- arch/sparc/kernel/urtt_fill.S | 98 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 112 insertions(+), 52 deletions(-) create mode 100644 arch/sparc/kernel/urtt_fill.S diff --git a/arch/sparc/include/asm/ttable.h b/arch/sparc/include/asm/ttable.h index 71b5a67..781b9f1 100644 --- a/arch/sparc/include/asm/ttable.h +++ b/arch/sparc/include/asm/ttable.h @@ -589,8 +589,8 @@ user_rtt_fill_64bit: \ restored; \ nop; nop; nop; nop; nop; nop; \ nop; nop; nop; nop; nop; \ - ba,a,pt %xcc, user_rtt_fill_fixup; \ - ba,a,pt %xcc, user_rtt_fill_fixup; \ + ba,a,pt %xcc, user_rtt_fill_fixup_dax; \ + ba,a,pt %xcc, user_rtt_fill_fixup_mna; \ ba,a,pt %xcc, user_rtt_fill_fixup; @@ -652,8 +652,8 @@ user_rtt_fill_32bit: \ restored; \ nop; nop; nop; nop; nop; \ nop; nop; nop; \ - ba,a,pt %xcc, user_rtt_fill_fixup; \ - ba,a,pt %xcc, user_rtt_fill_fixup; \ + ba,a,pt %xcc, user_rtt_fill_fixup_dax; \ + ba,a,pt %xcc, user_rtt_fill_fixup_mna; \ ba,a,pt %xcc, user_rtt_fill_fixup; diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile index 7cf9c6e..fdb1332 100644 --- a/arch/sparc/kernel/Makefile +++ b/arch/sparc/kernel/Makefile @@ -21,6 +21,7 @@ CFLAGS_REMOVE_perf_event.o := -pg CFLAGS_REMOVE_pcr.o := -pg endif +obj-$(CONFIG_SPARC64) += urtt_fill.o obj-$(CONFIG_SPARC32) += entry.o wof.o wuf.o obj-$(CONFIG_SPARC32) += etrap_32.o obj-$(CONFIG_SPARC32) += rtrap_32.o diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S index d08bdaf..216948c 100644 --- a/arch/sparc/kernel/rtrap_64.S +++ b/arch/sparc/kernel/rtrap_64.S @@ -14,10 +14,6 @@ #include <asm/visasm.h> #include <asm/processor.h> -#define RTRAP_PSTATE (PSTATE_TSO|PSTATE_PEF|PSTATE_PRIV|PSTATE_IE) -#define RTRAP_PSTATE_IRQOFF (PSTATE_TSO|PSTATE_PEF|PSTATE_PRIV) -#define RTRAP_PSTATE_AG_IRQOFF (PSTATE_TSO|PSTATE_PEF|PSTATE_PRIV|PSTATE_AG) - #ifdef CONFIG_CONTEXT_TRACKING # define SCHEDULE_USER schedule_user #else @@ -242,52 +238,17 @@ rt_continue: ldx [%sp + PTREGS_OFF + PT_V9_G1], %g1 wrpr %g1, %cwp ba,a,pt %xcc, user_rtt_fill_64bit -user_rtt_fill_fixup: - rdpr %cwp, %g1 - add %g1, 1, %g1 - wrpr %g1, 0x0, %cwp - - rdpr %wstate, %g2 - sll %g2, 3, %g2 - wrpr %g2, 0x0, %wstate - - /* We know %canrestore and %otherwin are both zero. */ - - sethi %hi(sparc64_kern_pri_context), %g2 - ldx [%g2 + %lo(sparc64_kern_pri_context)], %g2 - mov PRIMARY_CONTEXT, %g1 - -661: stxa %g2, [%g1] ASI_DMMU - .section .sun4v_1insn_patch, "ax" - .word 661b - stxa %g2, [%g1] ASI_MMU - .previous - - sethi %hi(KERNBASE), %g1 - flush %g1 +user_rtt_fill_fixup_dax: + ba,pt %xcc, user_rtt_fill_fixup_common + mov 1, %g3 - or %g4, FAULT_CODE_WINFIXUP, %g4 - stb %g4, [%g6 + TI_FAULT_CODE] - stx %g5, [%g6 + TI_FAULT_ADDR] +user_rtt_fill_fixup_mna: + ba,pt %xcc, user_rtt_fill_fixup_common + mov 2, %g3 - mov %g6, %l1 - wrpr %g0, 0x0, %tl - -661: nop - .section .sun4v_1insn_patch, "ax" - .word 661b - SET_GL(0) - .previous - - wrpr %g0, RTRAP_PSTATE, %pstate - - mov %l1, %g6 - ldx [%g6 + TI_TASK], %g4 - LOAD_PER_CPU_BASE(%g5, %g6, %g1, %g2, %g3) - call do_sparc64_fault - add %sp, PTREGS_OFF, %o0 - ba,pt %xcc, rtrap - nop +user_rtt_fill_fixup: + ba,pt %xcc, user_rtt_fill_fixup_common + clr %g3 user_rtt_pre_restore: add %g1, 1, %g1 diff --git a/arch/sparc/kernel/urtt_fill.S b/arch/sparc/kernel/urtt_fill.S new file mode 100644 index 0000000..5604a2b --- /dev/null +++ b/arch/sparc/kernel/urtt_fill.S @@ -0,0 +1,98 @@ +#include <asm/thread_info.h> +#include <asm/trap_block.h> +#include <asm/spitfire.h> +#include <asm/ptrace.h> +#include <asm/head.h> + + .text + .align 8 + .globl user_rtt_fill_fixup_common +user_rtt_fill_fixup_common: + rdpr %cwp, %g1 + add %g1, 1, %g1 + wrpr %g1, 0x0, %cwp + + rdpr %wstate, %g2 + sll %g2, 3, %g2 + wrpr %g2, 0x0, %wstate + + /* We know %canrestore and %otherwin are both zero. */ + + sethi %hi(sparc64_kern_pri_context), %g2 + ldx [%g2 + %lo(sparc64_kern_pri_context)], %g2 + mov PRIMARY_CONTEXT, %g1 + +661: stxa %g2, [%g1] ASI_DMMU + .section .sun4v_1insn_patch, "ax" + .word 661b + stxa %g2, [%g1] ASI_MMU + .previous + + sethi %hi(KERNBASE), %g1 + flush %g1 + + mov %g4, %l4 + mov %g5, %l5 + brnz,pn %g3, 1f + mov %g3, %l3 + + or %g4, FAULT_CODE_WINFIXUP, %g4 + stb %g4, [%g6 + TI_FAULT_CODE] + stx %g5, [%g6 + TI_FAULT_ADDR] +1: + mov %g6, %l1 + wrpr %g0, 0x0, %tl + +661: nop + .section .sun4v_1insn_patch, "ax" + .word 661b + SET_GL(0) + .previous + + wrpr %g0, RTRAP_PSTATE, %pstate + + mov %l1, %g6 + ldx [%g6 + TI_TASK], %g4 + LOAD_PER_CPU_BASE(%g5, %g6, %g1, %g2, %g3) + + brnz,pn %l3, 1f + nop + + call do_sparc64_fault + add %sp, PTREGS_OFF, %o0 + ba,pt %xcc, rtrap + nop + +1: cmp %g3, 2 + bne,pn %xcc, 2f + nop + + sethi %hi(tlb_type), %g1 + lduw [%g1 + %lo(tlb_type)], %g1 + cmp %g1, 3 + bne,pt %icc, 1f + add %sp, PTREGS_OFF, %o0 + mov %l4, %o2 + call sun4v_do_mna + mov %l5, %o1 + ba,a,pt %xcc, rtrap +1: mov %l4, %o1 + mov %l5, %o2 + call mem_address_unaligned + nop + ba,a,pt %xcc, rtrap + +2: sethi %hi(tlb_type), %g1 + mov %l4, %o1 + lduw [%g1 + %lo(tlb_type)], %g1 + mov %l5, %o2 + cmp %g1, 3 + bne,pt %icc, 1f + add %sp, PTREGS_OFF, %o0 + call sun4v_data_access_exception + nop + ba,a,pt %xcc, rtrap + +1: call spitfire_data_access_exception + nop + ba,a,pt %xcc, rtrap -- 2.1.2.532.g19b5d50 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-29 4:24 ` David Miller @ 2016-05-29 17:30 ` Sam Ravnborg 2016-05-30 2:15 ` David Miller 0 siblings, 1 reply; 12+ messages in thread From: Sam Ravnborg @ 2016-05-29 17:30 UTC (permalink / raw) To: David Miller; +Cc: nix, linux-kernel, sparclinux, fweimer Hi Dave. > Ok, mystery solved. Super good explanation... > > ==================== > >From d645a0376b0010cd60ae5651b3b84a56f5b0e5a4 Mon Sep 17 00:00:00 2001 > From: "David S. Miller" <davem@davemloft.net> > Date: Sat, 28 May 2016 20:41:12 -0700 > Subject: [PATCH 2/2] sparc64: Fix return from trap window fill crashes. > > We must handle data access exception as well as memory address unaligned > exceptions from return from trap window fill faults, not just normal > TLB misses. It would be nice to include the explanation from the mail in the changelog. It gives a good background information and it is more accessible in the commit log rather than mails only. Sam ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-29 17:30 ` Sam Ravnborg @ 2016-05-30 2:15 ` David Miller 0 siblings, 0 replies; 12+ messages in thread From: David Miller @ 2016-05-30 2:15 UTC (permalink / raw) To: sam; +Cc: nix, linux-kernel, sparclinux, fweimer From: Sam Ravnborg <sam@ravnborg.org> Date: Sun, 29 May 2016 19:30:38 +0200 > It would be nice to include the explanation from the mail in the changelog. > It gives a good background information and it is more accessible in the > commit log rather than mails only. Yep, will supplement the commit message as well as include the backtraces and reported-by: tags for Nick. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-27 13:19 ` Nick Alcock 2016-05-27 13:34 ` John Paul Adrian Glaubitz 2016-05-27 19:37 ` David Miller @ 2016-05-29 6:02 ` David Miller 2016-05-30 12:43 ` Nix 2 siblings, 1 reply; 12+ messages in thread From: David Miller @ 2016-05-29 6:02 UTC (permalink / raw) To: nix; +Cc: linux-kernel, sparclinux, fweimer From: Nick Alcock <nix@esperi.org.uk> Date: Fri, 27 May 2016 14:19:27 +0100 > The only difference between the two series above is that in the crashing > series, the ka_restorer stub functions __rt_sigreturn_stub and > __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get > stack-protected; in the non-crashing series, they do not; the same is > true without --enable-stack-protector=all, because the functions have no > local variables at all, so without -fstack-protector-all they don't get > stack-protected in any case. Passing such a stack-protected function in > as the ka_restorer stub seems to suffice to cause this crash at some > later date. I'm wondering if the stack canary is clobbering something > that the caller does not expect to be clobbered: we saw this cause > trouble on x86 in a different context (see upstream commit > 7a25d6a84df9fea56963569ceccaaf7c2a88f161). > > It is clearly acceptable to say "restorer stubs are incompatible with > stack-protector canaries: turn them off" -- there are plenty of places > that are incompatible with canaries for good reason, and quite a lot of > the glibc patch series has been identifying these and turning the stack- > protector off for them -- but it is probably less acceptable to crash > the kernel if they don't do that! So at least some extra armouring seems > to be called for. BTW Nick, in thinking through all of this, I want to strongly encourage you to disable stack protector for all sigreturn stubs in the GLIBC tree. The reason is that any change in the code generated is going to break critical pieces of infrastructure. For example, one of the ways in which gdb knows that it is unwinding through a signal frame is that it parses the code at the program counter for that frame and looks at the instruction sequence to see if matches the signature for a sigreturn stub. So a glibc with stack protector enabled for sigreturn stubs would break backtracing in gdb. The unwinder in libgcc does the same exact thing. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite 2016-05-29 6:02 ` David Miller @ 2016-05-30 12:43 ` Nix 0 siblings, 0 replies; 12+ messages in thread From: Nix @ 2016-05-30 12:43 UTC (permalink / raw) To: David Miller; +Cc: linux-kernel, sparclinux, fweimer On 29 May 2016, David Miller spake thusly: > BTW Nick, in thinking through all of this, I want to strongly encourage > you to disable stack protector for all sigreturn stubs in the GLIBC tree. I completely concur, and have already written (but not committed) a patch to do this: I'll augment the existing sparc-only patch into a sigreturn-stubs patch. I *think* I spotted all the stubs. (Many of them are in assembler, but not all.) (If there's anything else which involves calling functions with a precisely-aligned stack and an expectation of no stack pointer movement in the prologue or epilogue, I'd be interested to know about it, since that'll need inhibit_stack_protector'ing too.) -- NULL && (void) ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-05-30 12:43 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-05-27 11:17 [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite Nick Alcock 2016-05-27 13:19 ` Nick Alcock 2016-05-27 13:34 ` John Paul Adrian Glaubitz 2016-05-27 13:43 ` Nick Alcock 2016-05-27 19:37 ` David Miller 2016-05-27 21:44 ` Nick Alcock 2016-05-27 22:51 ` David Miller 2016-05-29 4:24 ` David Miller 2016-05-29 17:30 ` Sam Ravnborg 2016-05-30 2:15 ` David Miller 2016-05-29 6:02 ` David Miller 2016-05-30 12:43 ` Nix
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).