[4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
@ 2016-05-27 11:17 Nick Alcock
  2016-05-27 13:19 ` Nick Alcock
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Alcock @ 2016-05-27 11:17 UTC (permalink / raw)
  To: linux-kernel, linux-sparc; +Cc: David S. Miller, Florian Weimer

So I've been working on a patch series (see below) that applies GCC's
-fstack-protector{-all,-strong} to almost all of glibc bar the dynamic
linker. In trying to upstream it, one review commenter queried one
SPARC-specific patch in the series; the absence of this patch triggers a
BUG in the SPARC kernel when glibc is tested as an unprivileged user, on
all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on
the ldoms I have access to and presumably on bare hardware too.

This is clearly a bug, and equally clearly I think it needs fixing
before we can upstream the series, which it would be nice to do because
it would have prevented most of the recent spate of glibc stack
overflows from escalating to arbitrary code execution.

First, a representative sample of the BUG, as seen on 4.6.0:

ld-linux.so.2[36805]: segfault at 7ff ip           (null) (rpc           (null)) sp           (null) error 30001 in tst-kill6[100000+4000]
ld-linux.so.2[36806]: segfault at 7ff ip           (null) (rpc           (null)) sp           (null) error 30001 in tst-kill6[100000+4000]
ld-linux.so.2[36807]: segfault at 7ff ip           (null) (rpc           (null)) sp           (null) error 30001 in tst-kill6[100000+4000]
kernel BUG at arch/sparc/mm/fault_64.c:299!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
 [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c
Disabling lock debugging due to kernel taint
Caller[0000000000404d4c]: user_rtt_fill_fixup+0x6c/0x7c
Caller[0000000000000000]:           (null)
Instruction DUMP: 9210212b  7fe84179  901222e8 <91d02005> 90102002  92102001  94100018  7fecd033  96100010
Kernel panic - not syncing: Fatal exception
Press Stop-A (L1-A) to return to the boot prom
---[ end Kernel panic - not syncing: Fatal exception

The crash moves around, and can even be seen striking in completely
random userspace processes that aren't part of the glibc under test
(e.g. I've seen it happen inside awk and GCC). The backtrace is always
the same, though.

It seems this is an unexpected TLB fault from this BUG in
do_sparc64_fault():

	if ((fault_code & FAULT_CODE_ITLB) &&
            (fault_code & FAULT_CODE_DTLB))
                BUG();

which certainly explains the randomness to some extent.

Now, some details for replication. It's easy to replicate if you can
build and test glibc using a GCC that supports -fstack-protector-all on
Linux/SPARC: I used 4.9.3. (You don't need to *install* the glibc or
anything, and getting to the crash on reasonable hardware takes only a
few minutes.)

The patch series itself, in the hopefully-not-too-inconvenient form of a
pair of git bundles based on glibc commit
a5df3210a641c175138052037fcdad34298bfa4d (near the glibc-2.23 release),
though this happens on glibc trunk with these bundles merged in too:

<http://www.esperi.org.uk/~nix/src/glibc-crashes.bundle>

<http://www.esperi.org.uk/~nix/src/glibc-workaround.bundle>

You'll need to run autoconf-2.69 in the source tree after checkout,
since I haven't regenerated configure in either of them.

To configure/build/test, I used

../../glibc/configure --enable-stackguard-randomization \
    --enable-stack-protector=all --prefix=/usr --enable-shared \
    --enable-bind-now --enable-maintainer-mode --enable-obsolete-rpc \
    --enable-add-ons=libidn --enable-kernel=4.1 --enable-check-abi=warn \
    && make -j 5 && make -j 5 check TIMEOUTFACTOR=5

though most of the configure flags are probably unnecessary and you'll
probably want to adjust the -j numbers. The crucial one is
--enable-stack-protector=all; without it, the first patch series is
equivalent to the second.

The crash almost invariably happens during the make check run, usually
during or after string/; both 32-bit and 64-bit glibc builds are
affected (the above configure line is for 64-bit). I have not yet
completed as many as four runs without a crash, and it almost always
happens in one or two. You can probably trigger one reliably by simply
rerunning make check in a loop without doing any of the rest of the
rebuilding (but I was reconfiguring and rebuilding because all of that
was scripted).

The only difference between the two series above is that in the crashing
series, the ka_restorer stub functions __rt_sigreturn_stub and
__sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get
stack-protected; in the non-crashing series, they do not; the same is
true without --enable-stack-protector=all, because the functions have no
local variables at all, so without -fstack-protector-all they don't get
stack-protected in any case. Passing such a stack-protected function in
as the ka_restorer stub seems to suffice to cause this crash at some
later date. I'm wondering if the stack canary is clobbering something
that the caller does not expect to be clobbered: we saw this cause
trouble on x86 in a different context (see upstream commit
7a25d6a84df9fea56963569ceccaaf7c2a88f161).

It is clearly acceptable to say "restorer stubs are incompatible with
stack-protector canaries: turn them off" -- there are plenty of places
that are incompatible with canaries for good reason, and quite a lot of
the glibc patch series has been identifying these and turning the stack-
protector off for them -- but it is probably less acceptable to crash
the kernel if they don't do that! So at least some extra armouring seems
to be called for.

But where that extra armouring needs to go, I don't know (probably not
in do_sparc64_fault(), since I guess the underlying bug is way upstream
of this somewhere). I really have no idea what the underlying bug might
*be*. setup_rt_frame() might be a good place to start looking, only of
course that can't on its own explain how the explosion happens at a
later date, or how TLB faults get involved.

Anyway, I hope this is enough to at least replicate the bug: if it's not
-- if I've forgotten some detail, or if there is an environmental
dependence beyond "it's a SPARC" that I don't know about -- feel free to
ask for more info. I'm a mere userspace guy, and barely know about any
variation that may exist in the SPARC world these days. It's quite
possible that the hardware I'm using to test this (on the other side of
the world) is some sort of weird preproduction silicon and I don't know
it and this only happens there: it's certain that its firmware is three
years old... if nobody else can reproduce it, I'll try to dig out some
more hosts with different characteristics and see if it happens on them
too.

Kernel .config for this host (it's huge because it's derived from an
enterprise distro config):

<http://www.esperi.org.uk/~nix/src/config-4.6-sparc>

Very few of those modules are loaded, to wit:

Module                  Size  Used by
ipt_REJECT              1853  2
nf_reject_ipv4          3645  1 ipt_REJECT
nf_conntrack_ipv4      11179  2
nf_defrag_ipv4          1849  1 nf_conntrack_ipv4
iptable_filter          2108  1
ip_tables              20683  1 iptable_filter
ip6t_REJECT             1857  2
nf_reject_ipv6          5205  1 ip6t_REJECT
nf_conntrack_ipv6      11359  2
nf_defrag_ipv6         26774  1 nf_conntrack_ipv6
xt_state                1570  4
nf_conntrack          100343  3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state
ip6table_filter         2050  1
ip6_tables             19814  1 ip6table_filter
ipv6                  411857  153 nf_reject_ipv6,nf_conntrack_ipv6,nf_defrag_ipv6,[permanent]
openprom                6699  0
ext4                  608323  2
mbcache                 6913  3 ext4
jbd2                  108713  1 ext4
des_generic            20873  0
sunvnet                 6897  0
sunvdc                 10861  4
dm_mirror              14985  0
dm_region_hash         11360  1 dm_mirror
dm_log                 10973  2 dm_mirror,dm_region_hash
dm_mod                108820  9 dm_mirror,dm_log

... though I doubt the set of loaded modules is likely to affect
reproduction of this bug much.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 11:17 [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite Nick Alcock
@ 2016-05-27 13:19 ` Nick Alcock
  2016-05-27 13:34   ` John Paul Adrian Glaubitz
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Nick Alcock @ 2016-05-27 13:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: sparclinux, David S. Miller, Florian Weimer

[Resent with fixed address for sparclinux@; sorry!]

So I've been working on a patch series (see below) that applies GCC's
-fstack-protector{-all,-strong} to almost all of glibc bar the dynamic
linker. In trying to upstream it, one review commenter queried one
SPARC-specific patch in the series; the absence of this patch triggers a
BUG in the SPARC kernel when glibc is tested as an unprivileged user, on
all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on
the ldoms I have access to and presumably on bare hardware too.

This is clearly a bug, and equally clearly I think it needs fixing
before we can upstream the series, which it would be nice to do because
it would have prevented most of the recent spate of glibc stack
overflows from escalating to arbitrary code execution.

First, a representative sample of the BUG, as seen on 4.6.0:

ld-linux.so.2[36805]: segfault at 7ff ip           (null) (rpc           (null)) sp           (null) error 30001 in tst-kill6[100000+4000]
ld-linux.so.2[36806]: segfault at 7ff ip           (null) (rpc           (null)) sp           (null) error 30001 in tst-kill6[100000+4000]
ld-linux.so.2[36807]: segfault at 7ff ip           (null) (rpc           (null)) sp           (null) error 30001 in tst-kill6[100000+4000]
kernel BUG at arch/sparc/mm/fault_64.c:299!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
ld-linux.so.2(36808): Kernel bad sw trap 5 [#1]
CPU: 1 PID: 36808 Comm: ld-linux.so.2 Not tainted 4.6.0 #34
task: fff8000303be5c60 ti: fff8000301344000 task.ti: fff8000301344000
TSTATE: 0000004410001601 TPC: 0000000000a1a784 TNPC: 0000000000a1a788 Y: 00000002    Not tainted
TPC: <do_sparc64_fault+0x5c4/0x700>
g0: fff8000024fc8248 g1: 0000000000db04dc g2: 0000000000000000 g3: 0000000000000001
g4: fff8000303be5c60 g5: fff800030e672000 g6: fff8000301344000 g7: 0000000000000001
o0: 0000000000b95ee8 o1: 000000000000012b o2: 0000000000000000 o3: 0000000200b9b358
o4: 0000000000000000 o5: fff8000301344040 sp: fff80003013475c1 ret_pc: 0000000000a1a77c
RPC: <do_sparc64_fault+0x5bc/0x700>
l0: 00000000000007ff l1: 0000000000000000 l2: 000000000000005f l3: 0000000000000000
l4: fff8000301347e98 l5: fff8000024ff3060 l6: 0000000000000000 l7: 0000000000000000
i0: fff8000301347f60 i1: 0000000000102400 i2: 0000000000000000 i3: 0000000000000000
i4: 0000000000000000 i5: 0000000000000000 i6: fff80003013476a1 i7: 0000000000404d4c
I7: <user_rtt_fill_fixup+0x6c/0x7c>
Call Trace:
 [0000000000404d4c] user_rtt_fill_fixup+0x6c/0x7c
Disabling lock debugging due to kernel taint
Caller[0000000000404d4c]: user_rtt_fill_fixup+0x6c/0x7c
Caller[0000000000000000]:           (null)
Instruction DUMP: 9210212b  7fe84179  901222e8 <91d02005> 90102002  92102001  94100018  7fecd033  96100010
Kernel panic - not syncing: Fatal exception
Press Stop-A (L1-A) to return to the boot prom
---[ end Kernel panic - not syncing: Fatal exception

The crash moves around, and can even be seen striking in completely
random userspace processes that aren't part of the glibc under test
(e.g. I've seen it happen inside awk and GCC). The backtrace is always
the same, though.

It seems this is an unexpected TLB fault from this BUG in
do_sparc64_fault():

	if ((fault_code & FAULT_CODE_ITLB) &&
            (fault_code & FAULT_CODE_DTLB))
                BUG();

which certainly explains the randomness to some extent.

Now, some details for replication. It's easy to replicate if you can
build and test glibc using a GCC that supports -fstack-protector-all on
Linux/SPARC: I used 4.9.3. (You don't need to *install* the glibc or
anything, and getting to the crash on reasonable hardware takes only a
few minutes.)

The patch series itself, in the hopefully-not-too-inconvenient form of a
pair of git bundles based on glibc commit
a5df3210a641c175138052037fcdad34298bfa4d (near the glibc-2.23 release),
though this happens on glibc trunk with these bundles merged in too:

<http://www.esperi.org.uk/~nix/src/glibc-crashes.bundle>

<http://www.esperi.org.uk/~nix/src/glibc-workaround.bundle>

You'll need to run autoconf-2.69 in the source tree after checkout,
since I haven't regenerated configure in either of them.

To configure/build/test, I used

../../glibc/configure --enable-stackguard-randomization \
    --enable-stack-protector=all --prefix=/usr --enable-shared \
    --enable-bind-now --enable-maintainer-mode --enable-obsolete-rpc \
    --enable-add-ons=libidn --enable-kernel=4.1 --enable-check-abi=warn \
    && make -j 5 && make -j 5 check TIMEOUTFACTOR=5

though most of the configure flags are probably unnecessary and you'll
probably want to adjust the -j numbers. The crucial one is
--enable-stack-protector=all; without it, the first patch series is
equivalent to the second.

The crash almost invariably happens during the make check run, usually
during or after string/; both 32-bit and 64-bit glibc builds are
affected (the above configure line is for 64-bit). I have not yet
completed as many as four runs without a crash, and it almost always
happens in one or two. You can probably trigger one reliably by simply
rerunning make check in a loop without doing any of the rest of the
rebuilding (but I was reconfiguring and rebuilding because all of that
was scripted).

The only difference between the two series above is that in the crashing
series, the ka_restorer stub functions __rt_sigreturn_stub and
__sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get
stack-protected; in the non-crashing series, they do not; the same is
true without --enable-stack-protector=all, because the functions have no
local variables at all, so without -fstack-protector-all they don't get
stack-protected in any case. Passing such a stack-protected function in
as the ka_restorer stub seems to suffice to cause this crash at some
later date. I'm wondering if the stack canary is clobbering something
that the caller does not expect to be clobbered: we saw this cause
trouble on x86 in a different context (see upstream commit
7a25d6a84df9fea56963569ceccaaf7c2a88f161).

It is clearly acceptable to say "restorer stubs are incompatible with
stack-protector canaries: turn them off" -- there are plenty of places
that are incompatible with canaries for good reason, and quite a lot of
the glibc patch series has been identifying these and turning the stack-
protector off for them -- but it is probably less acceptable to crash
the kernel if they don't do that! So at least some extra armouring seems
to be called for.

But where that extra armouring needs to go, I don't know (probably not
in do_sparc64_fault(), since I guess the underlying bug is way upstream
of this somewhere). I really have no idea what the underlying bug might
*be*. setup_rt_frame() might be a good place to start looking, only of
course that can't on its own explain how the explosion happens at a
later date, or how TLB faults get involved.

Anyway, I hope this is enough to at least replicate the bug: if it's not
-- if I've forgotten some detail, or if there is an environmental
dependence beyond "it's a SPARC" that I don't know about -- feel free to
ask for more info. I'm a mere userspace guy, and barely know about any
variation that may exist in the SPARC world these days. It's quite
possible that the hardware I'm using to test this (on the other side of
the world) is some sort of weird preproduction silicon and I don't know
it and this only happens there: it's certain that its firmware is three
years old... if nobody else can reproduce it, I'll try to dig out some
more hosts with different characteristics and see if it happens on them
too.

Kernel .config for this host (it's huge because it's derived from an
enterprise distro config):

<http://www.esperi.org.uk/~nix/src/config-4.6-sparc>

Very few of those modules are loaded, to wit:

Module                  Size  Used by
ipt_REJECT              1853  2
nf_reject_ipv4          3645  1 ipt_REJECT
nf_conntrack_ipv4      11179  2
nf_defrag_ipv4          1849  1 nf_conntrack_ipv4
iptable_filter          2108  1
ip_tables              20683  1 iptable_filter
ip6t_REJECT             1857  2
nf_reject_ipv6          5205  1 ip6t_REJECT
nf_conntrack_ipv6      11359  2
nf_defrag_ipv6         26774  1 nf_conntrack_ipv6
xt_state                1570  4
nf_conntrack          100343  3 nf_conntrack_ipv4,nf_conntrack_ipv6,xt_state
ip6table_filter         2050  1
ip6_tables             19814  1 ip6table_filter
ipv6                  411857  153 nf_reject_ipv6,nf_conntrack_ipv6,nf_defrag_ipv6,[permanent]
openprom                6699  0
ext4                  608323  2
mbcache                 6913  3 ext4
jbd2                  108713  1 ext4
des_generic            20873  0
sunvnet                 6897  0
sunvdc                 10861  4
dm_mirror              14985  0
dm_region_hash         11360  1 dm_mirror
dm_log                 10973  2 dm_mirror,dm_region_hash
dm_mod                108820  9 dm_mirror,dm_log

... though I doubt the set of loaded modules is likely to affect
reproduction of this bug much.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 13:19 ` Nick Alcock
@ 2016-05-27 13:34   ` John Paul Adrian Glaubitz
  2016-05-27 13:43     ` Nick Alcock
  2016-05-27 19:37   ` David Miller
  2016-05-29  6:02   ` David Miller
  2 siblings, 1 reply; 12+ messages in thread
From: John Paul Adrian Glaubitz @ 2016-05-27 13:34 UTC (permalink / raw)
  To: Nick Alcock, linux-kernel
  Cc: Sparc kernel list, David S. Miller, Florian Weimer,
	Jose E. Marchesi

Hi Nick!

On 05/27/2016 03:19 PM, Nick Alcock wrote:
> So I've been working on a patch series (see below) that applies GCC's
> -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic
> linker. In trying to upstream it, one review commenter queried one
> SPARC-specific patch in the series; the absence of this patch triggers a
> BUG in the SPARC kernel when glibc is tested as an unprivileged user, on
> all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on
> the ldoms I have access to and presumably on bare hardware too.

I apologize for hijacking this thread but since you are mentioning glibc,
there are actually a couple of tests in the glibc testsuite [1].

Jose Marchesi has started looking at them and most of them seem to be
related to unaligned accesses. Would be great if those could be resolved
in the near future, too.

We're working towards getting Debian's sparc64 port into a releasable
state, but there are some packages like glibc or util-linux that need
work since their testsuites fail.

Cheers,
Adrian

> [1] https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=sparc64&ver=2.22-9&stamp=1464023539

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 13:34   ` John Paul Adrian Glaubitz
@ 2016-05-27 13:43     ` Nick Alcock
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Alcock @ 2016-05-27 13:43 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-kernel, Sparc kernel list, David S. Miller, Florian Weimer,
	Jose E. Marchesi

On 27 May 2016, John Paul Adrian Glaubitz outgrape:

> Hi Nick!
>
> On 05/27/2016 03:19 PM, Nick Alcock wrote:
>> So I've been working on a patch series (see below) that applies GCC's
>> -fstack-protector{-all,-strong} to almost all of glibc bar the dynamic
>> linker. In trying to upstream it, one review commenter queried one
>> SPARC-specific patch in the series; the absence of this patch triggers a
>> BUG in the SPARC kernel when glibc is tested as an unprivileged user, on
>> all versions tested from Oracle UEK 4.1 right up to 4.6.0, at least on
>> the ldoms I have access to and presumably on bare hardware too.
>
> I apologize for hijacking this thread but since you are mentioning glibc,
> there are actually a couple of tests in the glibc testsuite [1].

At least one of those failures is spurious:

FAIL: nptl/tst-cond11
original exit status 1
clock = 0
Timed out: killed the child process

You want to pass in a higher TIMEOUTFACTOR to the make check run, and
that problem at least should go away. (The TIMEOUTFACTOR you need
depends on how sluggish your test machine is.)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 13:19 ` Nick Alcock
  2016-05-27 13:34   ` John Paul Adrian Glaubitz
@ 2016-05-27 19:37   ` David Miller
  2016-05-27 21:44     ` Nick Alcock
  2016-05-29  6:02   ` David Miller
  2 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2016-05-27 19:37 UTC (permalink / raw)
  To: nix; +Cc: linux-kernel, sparclinux, fweimer

From: Nick Alcock <nix@esperi.org.uk>
Date: Fri, 27 May 2016 14:19:27 +0100

> The only difference between the two series above is that in the crashing
> series, the ka_restorer stub functions __rt_sigreturn_stub and
> __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get
> stack-protected; in the non-crashing series, they do not; the same is
> true without --enable-stack-protector=all, because the functions have no
> local variables at all, so without -fstack-protector-all they don't get
> stack-protected in any case. Passing such a stack-protected function in
> as the ka_restorer stub seems to suffice to cause this crash at some
> later date. I'm wondering if the stack canary is clobbering something
> that the caller does not expect to be clobbered: we saw this cause
> trouble on x86 in a different context (see upstream commit
> 7a25d6a84df9fea56963569ceccaaf7c2a88f161).

This is amazing that it makes a difference since the sigreturn stub is
implemented entirely in inline assembler :-)

Normally the 64-bit stub is emitted as:

__rt_sigreturn_stub:
        mov 101, %g1
        ta 0x6d

and with -fstack-protector-all we get:

__rt_sigreturn_stub:
        save    %sp, -192, %sp
        ldx     [%g7+40], %g1
        stx     %g1, [%fp+2039]
        mov     0, %g1

        mov 101, %g1
        ta 0x6d

        ldx     [%fp+2039], %g1
        ldx     [%g7+40], %g2
        xor     %g1, %g2, %g1
        mov     0, %g2
        brnz,pn %g1, .LL4
         nop
        return  %i7+8
         nop
.LL4:
        call    __stack_chk_fail, 0
         nop
        nop

That 'save' is the problem.

One can't change the register window or the stack pointer in this
function, as the kernel has setup the restore frame at a precise
location relative to the stack pointer when the stub is invoked.

Basically, do_rt_sigreturn is restoring garbage into the cpu
registers.

It obviously shouldn't crash, which I'll look into, but it is clear
that we can't enable -fstack-protector-all for this function.

So far I'm playing with the patch below to do some basic sanity
checks on the values inside of the sigreturn frame:

diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index 3c25241..6eb39a7 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -138,6 +138,18 @@ int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
 	return 0;
 }
 
+/* Checks if the fp is valid.  We always build signal frames which are
+ * 16-byte aligned, therefore we can always enforce that the restore
+ * frame has that property as well.
+ */
+static bool invalid_frame_pointer(void __user *fp, int fplen)
+{
+	if ((((unsigned long) fp) & 15) ||
+	    ((unsigned long)fp) > 0x100000000ULL - fplen)
+		return true;
+	return false;
+}
+
 void do_sigreturn32(struct pt_regs *regs)
 {
 	struct signal_frame32 __user *sf;
@@ -158,8 +170,10 @@ void do_sigreturn32(struct pt_regs *regs)
 	sf = (struct signal_frame32 __user *) regs->u_regs[UREG_FP];
 
 	/* 1. Make sure we are not getting garbage from the user */
-	if (!access_ok(VERIFY_READ, sf, sizeof(*sf)) ||
-	    (((unsigned long) sf) & 3))
+	if (invalid_frame_pointer(sf, sizeof(*sf)))
+		goto segv;
+
+	if (sf->info.si_regs.u_regs[UREG_FP] & 3)
 		goto segv;
 
 	if (get_user(pc, &sf->info.si_regs.pc) ||
@@ -242,8 +256,10 @@ asmlinkage void do_rt_sigreturn32(struct pt_regs *regs)
 	sf = (struct rt_signal_frame32 __user *) regs->u_regs[UREG_FP];
 
 	/* 1. Make sure we are not getting garbage from the user */
-	if (!access_ok(VERIFY_READ, sf, sizeof(*sf)) ||
-	    (((unsigned long) sf) & 3))
+	if (invalid_frame_pointer(sf, sizeof(*sf)))
+		goto segv;
+
+	if (sf->regs.u_regs[UREG_FP] & 3)
 		goto segv;
 
 	if (get_user(pc, &sf->regs.pc) || 
@@ -307,14 +323,6 @@ segv:
 	force_sig(SIGSEGV, current);
 }
 
-/* Checks if the fp is valid */
-static int invalid_frame_pointer(void __user *fp, int fplen)
-{
-	if ((((unsigned long) fp) & 7) || ((unsigned long)fp) > 0x100000000ULL - fplen)
-		return 1;
-	return 0;
-}
-
 static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, unsigned long framesize)
 {
 	unsigned long sp;
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 39aaec1..a8f0019 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -234,6 +234,17 @@ do_sigsegv:
 	goto out;
 }
 
+/* Checks if the fp is valid.  We always build rt signal frames which
+ * are 16-byte aligned, therefore we can always enforce that the
+ * restore frame has that property as well.
+ */
+static bool invalid_frame_pointer(void __user *fp)
+{
+	if (((unsigned long) fp) & 15)
+		return true;
+	return false;
+}
+
 struct rt_signal_frame {
 	struct sparc_stackf	ss;
 	siginfo_t		info;
@@ -261,7 +272,10 @@ void do_rt_sigreturn(struct pt_regs *regs)
 		(regs->u_regs [UREG_FP] + STACK_BIAS);
 
 	/* 1. Make sure we are not getting garbage from the user */
-	if (((unsigned long) sf) & 3)
+	if (invalid_frame_pointer(sf))
+		goto segv;
+
+	if ((sf->regs.u_regs[UREG_FP] + STACK_BIAS) & 7)
 		goto segv;
 
 	err = get_user(tpc, &sf->regs.tpc);
@@ -308,14 +322,6 @@ segv:
 	force_sig(SIGSEGV, current);
 }
 
-/* Checks if the fp is valid */
-static int invalid_frame_pointer(void __user *fp)
-{
-	if (((unsigned long) fp) & 15)
-		return 1;
-	return 0;
-}
-
 static inline void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs, unsigned long framesize)
 {
 	unsigned long sp = regs->u_regs[UREG_FP] + STACK_BIAS;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 19:37   ` David Miller
@ 2016-05-27 21:44     ` Nick Alcock
  2016-05-27 22:51       ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Alcock @ 2016-05-27 21:44 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, sparclinux, fweimer

On 27 May 2016, David Miller stated:

> From: Nick Alcock <nix@esperi.org.uk>
> Date: Fri, 27 May 2016 14:19:27 +0100
>
>> The only difference between the two series above is that in the crashing
>> series, the ka_restorer stub functions __rt_sigreturn_stub and
>> __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get
>> stack-protected; in the non-crashing series, they do not; the same is
>> true without --enable-stack-protector=all, because the functions have no
>> local variables at all, so without -fstack-protector-all they don't get
>> stack-protected in any case. Passing such a stack-protected function in
>> as the ka_restorer stub seems to suffice to cause this crash at some
>> later date. I'm wondering if the stack canary is clobbering something
>> that the caller does not expect to be clobbered: we saw this cause
>> trouble on x86 in a different context (see upstream commit
>> 7a25d6a84df9fea56963569ceccaaf7c2a88f161).
>
> This is amazing that it makes a difference since the sigreturn stub is
> implemented entirely in inline assembler :-)

I was fairly surprised as well, but not shocked, because people who
write a function that consists of one single inline assembler
instruction might well be rather surprised to find a massive pile of
prologue and epilogue code dumped around it!

> Normally the 64-bit stub is emitted as:
>
> __rt_sigreturn_stub:
>         mov 101, %g1
>         ta 0x6d
>
> and with -fstack-protector-all we get:
>
> __rt_sigreturn_stub:
>         save    %sp, -192, %sp
>         ldx     [%g7+40], %g1
>         stx     %g1, [%fp+2039]
>         mov     0, %g1
>
>         mov 101, %g1
>         ta 0x6d
>
>         ldx     [%fp+2039], %g1
>         ldx     [%g7+40], %g2
>         xor     %g1, %g2, %g1
>         mov     0, %g2
>         brnz,pn %g1, .LL4
>          nop
>         return  %i7+8
>          nop
> .LL4:
>         call    __stack_chk_fail, 0
>          nop
>         nop
>
> That 'save' is the problem.
>
> One can't change the register window or the stack pointer in this
> function, as the kernel has setup the restore frame at a precise
> location relative to the stack pointer when the stub is invoked.

Oops!

> Basically, do_rt_sigreturn is restoring garbage into the cpu
> registers.

Oh gods is it supposed to do register restoration? i.e. the usual ABI
rules in re stack changes, etc just don't apply to it?

Right, that's a disaster for stack-protection, obviously. The
stack-protector prologue/epilogue does rather assume that it's being
wrapped around a function, and in a very real sense this thing isn't a
function in the normal sense at all. This is exactly what I thought was
going on with the x86 code, but in the end that turned out to be a
simple case of the (assembly) caller assuming a call-clobbered register
had survived unchanged when the stack-protector epilogue had clobbered
it (as it was quite within its rights to).

> It obviously shouldn't crash, which I'll look into, but it is clear
> that we can't enable -fstack-protector-all for this function.

And now I have a good explanation of why that is for the commit log.
Thank you!

> So far I'm playing with the patch below to do some basic sanity
> checks on the values inside of the sigreturn frame:

Good move. Segfaulting the process is fine! :) Any process that does
this sort of thing is clearly either terminally buggy, written by an
idiot who doesn't know what he's doing (i.e. my original patch) or
malicious. These all deserve SEGVs.

(I still don't understand why this leads to spurious TLB faults, though.
Filling the userland CPU registers with garbage is bad, but should still
be reasonably harmless to the kernel, surely?)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 21:44     ` Nick Alcock
@ 2016-05-27 22:51       ` David Miller
  2016-05-29  4:24         ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2016-05-27 22:51 UTC (permalink / raw)
  To: nix; +Cc: linux-kernel, sparclinux, fweimer

From: Nick Alcock <nix@esperi.org.uk>
Date: Fri, 27 May 2016 22:44:56 +0100

> Good move. Segfaulting the process is fine! :) Any process that does
> this sort of thing is clearly either terminally buggy, written by an
> idiot who doesn't know what he's doing (i.e. my original patch) or
> malicious. These all deserve SEGVs.
> 
> (I still don't understand why this leads to spurious TLB faults, though.
> Filling the userland CPU registers with garbage is bad, but should still
> be reasonably harmless to the kernel, surely?)

I'm trying to figure out the same thing myself.

Even the unaligned stack pointer should be gracefully handled by the
kernel, so I think it has to be some other element of the register
state restore sequence.

The one area that deserves auditing is %tstate.  This is a privileged
register which we treat partially as non-privileged.  Specifically we
allow the user to modify the condition codes and the %asi register
which is encoded into here.

But I just went over that a few times.  We are really careful to mask
and only change those specific fields.

I'll keep plugging away at this and also play with your patches to
reproduce the bug.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 22:51       ` David Miller
@ 2016-05-29  4:24         ` David Miller
  2016-05-29 17:30           ` Sam Ravnborg
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2016-05-29  4:24 UTC (permalink / raw)
  To: nix; +Cc: linux-kernel, sparclinux, fweimer

From: David Miller <davem@davemloft.net>
Date: Fri, 27 May 2016 15:51:37 -0700 (PDT)

> I'm trying to figure out the same thing myself.

Ok, mystery solved.

The basic problem is that we don't handle unaligned stacks on return to userspace
%100 properly.  We would also no handle a stack whose access would trigger a bus
error or similar.

A little background:

The window trap handlers are slightly clever, the trap table entries for them are
composed of two pieces of code.  First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.

The userland register window fill handler is:

#define FILL_1_GENERIC(ASI)				\
	add	%sp, STACK_BIAS + 0x00, %g1;		\
	ldxa	[%g1 + %g0] ASI, %l0;			\
	mov	0x08, %g2;				\
	mov	0x10, %g3;				\
	ldxa	[%g1 + %g2] ASI, %l1;			\
	mov	0x18, %g5;				\
	ldxa	[%g1 + %g3] ASI, %l2;			\
	ldxa	[%g1 + %g5] ASI, %l3;			\
	add	%g1, 0x20, %g1;				\
	ldxa	[%g1 + %g0] ASI, %l4;			\
	ldxa	[%g1 + %g2] ASI, %l5;			\
	ldxa	[%g1 + %g3] ASI, %l6;			\
	ldxa	[%g1 + %g5] ASI, %l7;			\
	add	%g1, 0x20, %g1;				\
	ldxa	[%g1 + %g0] ASI, %i0;			\
	ldxa	[%g1 + %g2] ASI, %i1;			\
	ldxa	[%g1 + %g3] ASI, %i2;			\
	ldxa	[%g1 + %g5] ASI, %i3;			\
	add	%g1, 0x20, %g1;				\
	ldxa	[%g1 + %g0] ASI, %i4;			\
	ldxa	[%g1 + %g2] ASI, %i5;			\
	ldxa	[%g1 + %g3] ASI, %i6;			\
	ldxa	[%g1 + %g5] ASI, %i7;			\
	restored;					\
	retry; nop; nop; nop; nop;			\
	b,a,pt	%xcc, fill_fixup_dax;			\
	b,a,pt	%xcc, fill_fixup_mna;			\
	b,a,pt	%xcc, fill_fixup;

And the way this works is that if any of those loads or stores
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took.

For example, for a regular fault, the code goes:

winfix_trampoline:
	rdpr	%tpc, %g3			! Prepare winfixup TNPC
	or	%g3, 0x7c, %g3			! Compute branch offset
	wrpr	%g3, %tnpc			! Write it into TNPC
	done					! Trap return

All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.

On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons.  The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).

This is executed inline via the FILL_*_RTRAP handlers.  rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary.  Now if you look at them, we'll see at the end:

	ba,a,pt	%xcc, user_rtt_fill_fixup;		\
	ba,a,pt	%xcc, user_rtt_fill_fixup;		\
	ba,a,pt	%xcc, user_rtt_fill_fixup;

oops all three cases are handled like a fault.

This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.

So in the case where the stack was unaligned, the unaligned trap
handler setup the arg registers one way, and then we branched to
the fault handler.

So the FAULT_TYPE_* value was basically garbage, and randomly would
generate the backtrace that you saw.

This is the fix I am testing right now:

====================
>From d645a0376b0010cd60ae5651b3b84a56f5b0e5a4 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Sat, 28 May 2016 20:41:12 -0700
Subject: [PATCH 2/2] sparc64: Fix return from trap window fill crashes.

We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/asm/ttable.h |  8 ++--
 arch/sparc/kernel/Makefile      |  1 +
 arch/sparc/kernel/rtrap_64.S    | 57 ++++--------------------
 arch/sparc/kernel/urtt_fill.S   | 98 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 112 insertions(+), 52 deletions(-)
 create mode 100644 arch/sparc/kernel/urtt_fill.S

diff --git a/arch/sparc/include/asm/ttable.h b/arch/sparc/include/asm/ttable.h
index 71b5a67..781b9f1 100644
--- a/arch/sparc/include/asm/ttable.h
+++ b/arch/sparc/include/asm/ttable.h
@@ -589,8 +589,8 @@ user_rtt_fill_64bit:					\
 	 restored;					\
 	nop; nop; nop; nop; nop; nop;			\
 	nop; nop; nop; nop; nop;			\
-	ba,a,pt	%xcc, user_rtt_fill_fixup;		\
-	ba,a,pt	%xcc, user_rtt_fill_fixup;		\
+	ba,a,pt	%xcc, user_rtt_fill_fixup_dax;		\
+	ba,a,pt	%xcc, user_rtt_fill_fixup_mna;		\
 	ba,a,pt	%xcc, user_rtt_fill_fixup;
 
 
@@ -652,8 +652,8 @@ user_rtt_fill_32bit:					\
 	 restored;					\
 	nop; nop; nop; nop; nop;			\
 	nop; nop; nop;					\
-	ba,a,pt	%xcc, user_rtt_fill_fixup;		\
-	ba,a,pt	%xcc, user_rtt_fill_fixup;		\
+	ba,a,pt	%xcc, user_rtt_fill_fixup_dax;		\
+	ba,a,pt	%xcc, user_rtt_fill_fixup_mna;		\
 	ba,a,pt	%xcc, user_rtt_fill_fixup;
 
 
diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
index 7cf9c6e..fdb1332 100644
--- a/arch/sparc/kernel/Makefile
+++ b/arch/sparc/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_perf_event.o := -pg
 CFLAGS_REMOVE_pcr.o := -pg
 endif
 
+obj-$(CONFIG_SPARC64)   += urtt_fill.o
 obj-$(CONFIG_SPARC32)   += entry.o wof.o wuf.o
 obj-$(CONFIG_SPARC32)   += etrap_32.o
 obj-$(CONFIG_SPARC32)   += rtrap_32.o
diff --git a/arch/sparc/kernel/rtrap_64.S b/arch/sparc/kernel/rtrap_64.S
index d08bdaf..216948c 100644
--- a/arch/sparc/kernel/rtrap_64.S
+++ b/arch/sparc/kernel/rtrap_64.S
@@ -14,10 +14,6 @@
 #include <asm/visasm.h>
 #include <asm/processor.h>
 
-#define		RTRAP_PSTATE		(PSTATE_TSO|PSTATE_PEF|PSTATE_PRIV|PSTATE_IE)
-#define		RTRAP_PSTATE_IRQOFF	(PSTATE_TSO|PSTATE_PEF|PSTATE_PRIV)
-#define		RTRAP_PSTATE_AG_IRQOFF	(PSTATE_TSO|PSTATE_PEF|PSTATE_PRIV|PSTATE_AG)
-
 #ifdef CONFIG_CONTEXT_TRACKING
 # define SCHEDULE_USER schedule_user
 #else
@@ -242,52 +238,17 @@ rt_continue:	ldx			[%sp + PTREGS_OFF + PT_V9_G1], %g1
 		 wrpr			%g1, %cwp
 		ba,a,pt			%xcc, user_rtt_fill_64bit
 
-user_rtt_fill_fixup:
-		rdpr	%cwp, %g1
-		add	%g1, 1, %g1
-		wrpr	%g1, 0x0, %cwp
-
-		rdpr	%wstate, %g2
-		sll	%g2, 3, %g2
-		wrpr	%g2, 0x0, %wstate
-
-		/* We know %canrestore and %otherwin are both zero.  */
-
-		sethi	%hi(sparc64_kern_pri_context), %g2
-		ldx	[%g2 + %lo(sparc64_kern_pri_context)], %g2
-		mov	PRIMARY_CONTEXT, %g1
-
-661:		stxa	%g2, [%g1] ASI_DMMU
-		.section .sun4v_1insn_patch, "ax"
-		.word	661b
-		stxa	%g2, [%g1] ASI_MMU
-		.previous
-
-		sethi	%hi(KERNBASE), %g1
-		flush	%g1
+user_rtt_fill_fixup_dax:
+		ba,pt	%xcc, user_rtt_fill_fixup_common
+		 mov	1, %g3
 
-		or	%g4, FAULT_CODE_WINFIXUP, %g4
-		stb	%g4, [%g6 + TI_FAULT_CODE]
-		stx	%g5, [%g6 + TI_FAULT_ADDR]
+user_rtt_fill_fixup_mna:
+		ba,pt	%xcc, user_rtt_fill_fixup_common
+		 mov	2, %g3
 
-		mov	%g6, %l1
-		wrpr	%g0, 0x0, %tl
-
-661:		nop
-		.section		.sun4v_1insn_patch, "ax"
-		.word			661b
-		SET_GL(0)
-		.previous
-
-		wrpr	%g0, RTRAP_PSTATE, %pstate
-
-		mov	%l1, %g6
-		ldx	[%g6 + TI_TASK], %g4
-		LOAD_PER_CPU_BASE(%g5, %g6, %g1, %g2, %g3)
-		call	do_sparc64_fault
-		 add	%sp, PTREGS_OFF, %o0
-		ba,pt	%xcc, rtrap
-		 nop
+user_rtt_fill_fixup:
+		ba,pt	%xcc, user_rtt_fill_fixup_common
+		 clr	%g3
 
 user_rtt_pre_restore:
 		add			%g1, 1, %g1
diff --git a/arch/sparc/kernel/urtt_fill.S b/arch/sparc/kernel/urtt_fill.S
new file mode 100644
index 0000000..5604a2b
--- /dev/null
+++ b/arch/sparc/kernel/urtt_fill.S
@@ -0,0 +1,98 @@
+#include <asm/thread_info.h>
+#include <asm/trap_block.h>
+#include <asm/spitfire.h>
+#include <asm/ptrace.h>
+#include <asm/head.h>
+
+		.text
+		.align	8
+		.globl	user_rtt_fill_fixup_common
+user_rtt_fill_fixup_common:
+		rdpr	%cwp, %g1
+		add	%g1, 1, %g1
+		wrpr	%g1, 0x0, %cwp
+
+		rdpr	%wstate, %g2
+		sll	%g2, 3, %g2
+		wrpr	%g2, 0x0, %wstate
+
+		/* We know %canrestore and %otherwin are both zero.  */
+
+		sethi	%hi(sparc64_kern_pri_context), %g2
+		ldx	[%g2 + %lo(sparc64_kern_pri_context)], %g2
+		mov	PRIMARY_CONTEXT, %g1
+
+661:		stxa	%g2, [%g1] ASI_DMMU
+		.section .sun4v_1insn_patch, "ax"
+		.word	661b
+		stxa	%g2, [%g1] ASI_MMU
+		.previous
+
+		sethi	%hi(KERNBASE), %g1
+		flush	%g1
+
+		mov	%g4, %l4
+		mov	%g5, %l5
+		brnz,pn	%g3, 1f
+		 mov	%g3, %l3
+
+		or	%g4, FAULT_CODE_WINFIXUP, %g4
+		stb	%g4, [%g6 + TI_FAULT_CODE]
+		stx	%g5, [%g6 + TI_FAULT_ADDR]
+1:
+		mov	%g6, %l1
+		wrpr	%g0, 0x0, %tl
+
+661:		nop
+		.section		.sun4v_1insn_patch, "ax"
+		.word			661b
+		SET_GL(0)
+		.previous
+
+		wrpr	%g0, RTRAP_PSTATE, %pstate
+
+		mov	%l1, %g6
+		ldx	[%g6 + TI_TASK], %g4
+		LOAD_PER_CPU_BASE(%g5, %g6, %g1, %g2, %g3)
+
+		brnz,pn	%l3, 1f
+		 nop
+
+		call	do_sparc64_fault
+		 add	%sp, PTREGS_OFF, %o0
+		ba,pt	%xcc, rtrap
+		 nop
+
+1:		cmp	%g3, 2
+		bne,pn	%xcc, 2f
+		 nop
+
+		sethi	%hi(tlb_type), %g1
+		lduw	[%g1 + %lo(tlb_type)], %g1
+		cmp	%g1, 3
+		bne,pt	%icc, 1f
+		 add	%sp, PTREGS_OFF, %o0
+		mov	%l4, %o2
+		call	sun4v_do_mna
+		 mov	%l5, %o1
+		ba,a,pt	%xcc, rtrap
+1:		mov	%l4, %o1
+		mov	%l5, %o2
+		call	mem_address_unaligned
+		 nop
+		ba,a,pt	%xcc, rtrap
+
+2:		sethi	%hi(tlb_type), %g1
+		mov	%l4, %o1
+		lduw	[%g1 + %lo(tlb_type)], %g1
+		mov	%l5, %o2
+		cmp	%g1, 3
+		bne,pt	%icc, 1f
+		 add	%sp, PTREGS_OFF, %o0
+		call	sun4v_data_access_exception
+		 nop
+		ba,a,pt	%xcc, rtrap
+
+1:		call	spitfire_data_access_exception
+		 nop
+		ba,a,pt	%xcc, rtrap
-- 
2.1.2.532.g19b5d50

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-29  4:24         ` David Miller
@ 2016-05-29 17:30           ` Sam Ravnborg
  2016-05-30  2:15             ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Sam Ravnborg @ 2016-05-29 17:30 UTC (permalink / raw)
  To: David Miller; +Cc: nix, linux-kernel, sparclinux, fweimer

Hi Dave.

> Ok, mystery solved.

Super good explanation...
> 
> ====================
> >From d645a0376b0010cd60ae5651b3b84a56f5b0e5a4 Mon Sep 17 00:00:00 2001
> From: "David S. Miller" <davem@davemloft.net>
> Date: Sat, 28 May 2016 20:41:12 -0700
> Subject: [PATCH 2/2] sparc64: Fix return from trap window fill crashes.
> 
> We must handle data access exception as well as memory address unaligned
> exceptions from return from trap window fill faults, not just normal
> TLB misses.

It would be nice to include the explanation from the mail in the changelog.
It gives a good background information and it is more accessible in the
commit log rather than mails only.

	Sam

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-29 17:30           ` Sam Ravnborg
@ 2016-05-30  2:15             ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2016-05-30  2:15 UTC (permalink / raw)
  To: sam; +Cc: nix, linux-kernel, sparclinux, fweimer

From: Sam Ravnborg <sam@ravnborg.org>
Date: Sun, 29 May 2016 19:30:38 +0200

> It would be nice to include the explanation from the mail in the changelog.
> It gives a good background information and it is more accessible in the
> commit log rather than mails only.

Yep, will supplement the commit message as well as include the backtraces
and reported-by: tags for Nick.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-27 13:19 ` Nick Alcock
  2016-05-27 13:34   ` John Paul Adrian Glaubitz
  2016-05-27 19:37   ` David Miller
@ 2016-05-29  6:02   ` David Miller
  2016-05-30 12:43     ` Nix
  2 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2016-05-29  6:02 UTC (permalink / raw)
  To: nix; +Cc: linux-kernel, sparclinux, fweimer

From: Nick Alcock <nix@esperi.org.uk>
Date: Fri, 27 May 2016 14:19:27 +0100

> The only difference between the two series above is that in the crashing
> series, the ka_restorer stub functions __rt_sigreturn_stub and
> __sigreturn_stub (on sparc32) and __rt_sigreturn_stub (on sparc64) get
> stack-protected; in the non-crashing series, they do not; the same is
> true without --enable-stack-protector=all, because the functions have no
> local variables at all, so without -fstack-protector-all they don't get
> stack-protected in any case. Passing such a stack-protected function in
> as the ka_restorer stub seems to suffice to cause this crash at some
> later date. I'm wondering if the stack canary is clobbering something
> that the caller does not expect to be clobbered: we saw this cause
> trouble on x86 in a different context (see upstream commit
> 7a25d6a84df9fea56963569ceccaaf7c2a88f161).
> 
> It is clearly acceptable to say "restorer stubs are incompatible with
> stack-protector canaries: turn them off" -- there are plenty of places
> that are incompatible with canaries for good reason, and quite a lot of
> the glibc patch series has been identifying these and turning the stack-
> protector off for them -- but it is probably less acceptable to crash
> the kernel if they don't do that! So at least some extra armouring seems
> to be called for.

BTW Nick, in thinking through all of this, I want to strongly encourage
you to disable stack protector for all sigreturn stubs in the GLIBC tree.

The reason is that any change in the code generated is going to break
critical pieces of infrastructure.

For example, one of the ways in which gdb knows that it is unwinding
through a signal frame is that it parses the code at the program
counter for that frame and looks at the instruction sequence to see if
matches the signature for a sigreturn stub.

So a glibc with stack protector enabled for sigreturn stubs would break
backtracing in gdb.

The unwinder in libgcc does the same exact thing.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite
  2016-05-29  6:02   ` David Miller
@ 2016-05-30 12:43     ` Nix
  0 siblings, 0 replies; 12+ messages in thread
From: Nix @ 2016-05-30 12:43 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, sparclinux, fweimer

On 29 May 2016, David Miller spake thusly:

> BTW Nick, in thinking through all of this, I want to strongly encourage
> you to disable stack protector for all sigreturn stubs in the GLIBC tree.

I completely concur, and have already written (but not committed) a
patch to do this: I'll augment the existing sparc-only patch into a
sigreturn-stubs patch. I *think* I spotted all the stubs. (Many of them
are in assembler, but not all.)

(If there's anything else which involves calling functions with a
precisely-aligned stack and an expectation of no stack pointer movement
in the prologue or epilogue, I'd be interested to know about it, since
that'll need inhibit_stack_protector'ing too.)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-05-30 12:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-27 11:17 [4.1.x -- 4.6.x and probably HEAD] Reproducible unprivileged panic/TLB BUG on sparc via a stack-protected rt_sigaction() ka_restorer, courtesy of the glibc testsuite Nick Alcock
2016-05-27 13:19 ` Nick Alcock
2016-05-27 13:34   ` John Paul Adrian Glaubitz
2016-05-27 13:43     ` Nick Alcock
2016-05-27 19:37   ` David Miller
2016-05-27 21:44     ` Nick Alcock
2016-05-27 22:51       ` David Miller
2016-05-29  4:24         ` David Miller
2016-05-29 17:30           ` Sam Ravnborg
2016-05-30  2:15             ` David Miller
2016-05-29  6:02   ` David Miller
2016-05-30 12:43     ` Nix

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).