* CET shadow stack app compatibility
@ 2022-11-14 23:15 Edgecombe, Rick P
  2022-11-15 2:01 ` Linus Torvalds
  2022-11-15 9:43 ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Edgecombe, Rick P @ 2022-11-14 23:15 UTC (permalink / raw)
  To: Torvalds, Linus
  Cc: keescook@chromium.org, linux-kernel@vger.kernel.org, fweimer@redhat.com, hjl.tools@gmail.com, x86@kernel.org

Hi Linus,

Could you weigh in on some brewing ecosystem compatibility issues around x86 CET[0] shadow stacks? This is the CPU security feature that keeps a separate protected stack to record return addresses, and verifies them on return. Support for this feature is not upstream in the kernel and so the issues discussed here are future problems that have not happened yet.

The issues all have a root cause of support for CET in tools spreading widely while kernel support was still in development. This has led to:
1. Some existing binaries (node.js, PyPy, CRIU) that will break when glibc updates to use the kernel CET APIs.
2. GCC C++ exception stack unwinding code expecting old development versions of the kernel ABI.

On the first issue, once there is kernel support, glibc plans to immediately update in such a way that some existing distro binaries will break against it. So the scenario is existing distro binaries being used with future versions of glibc. The known extent of breakage is limited to some packages of node.js and PyPy, and any version of CRIU, but it’s reasonable to assume that there are undetected breakages based on how it came about.

The breakage derives from how the decision is made on whether to enable shadow stack enforcement. Glibc will do this by checking a bit in the elf header of the binary. It then tells the kernel to turn CET on via a separate kernel API. But instead of this elf bit being selected by application developers, it was mostly applied in various automated ways (mostly default on) by distro builds for years.
This huge amount of untested enablement has not generated any visible issues for users yet, because without kernel support the presence of this bit has not generated any actual CET enforcement.

In some ways it is a variation of past compatibility problems around distros overriding package defaults for compiler hardening. But the difference is that the kernel support is involved in doing the enforcement in this case, leading to the issues going undetected.

For the second issue, there are also problems lurking in gcc. The gcc CET support has preceded the kernel changes and the unwinding code assumes things about the kernel shadow stack signal frame ABI that have changed over the course of CET kernel development. It is compatible by luck for now, but old GCCs that apply the existing elf bit (going back to gcc-8) can generate future binaries that would constrain the shadow stack signal frame from expanding, which there are already plans to do.

I would like to make this go smoother all around by having the kernel detect the existing elf bit and refuse to enable CET for these applications, like this[1]. Then the binaries derived from the pre-kernel support era would just continue to run normally without CET enforcement. The intention would be to force tools to pick a new elf bit to denote compatibility for this feature. With a tools reset, this time the upstream kernel would have shadow stack support ahead of tools and so any issues would likely show up earlier.

The best place to exclude the old binaries from shadow stack support would be in the glibc loader, but developers of that (on CC) are against creating new CET elf bits. So the kernel would be taking a stand here and would essentially burn this bit from the kernel side. Are you generally ok with the kernel reaching out and getting involved in this shadow stack enablement decision like this?
Thanks,

Rick

[0] https://lwn.net/Articles/885220/
[1] https://lore.kernel.org/lkml/20221104223604.29615-38-rick.p.edgecombe@intel.com/

^ permalink raw reply [flat|nested] 9+ messages in thread
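Editorial illustration of the "elf bit" discussed in the message above. On x86-64 the marker is carried in a GNU property note (the .note.gnu.property section), and the sketch below decodes one such note from raw bytes and tests the shadow stack bit. The constant values are the ones I understand the x86-64 psABI and elf.h to define; the helper names (x86_feature_1, has_shstk, make_note) are hypothetical, and this is not a copy of how glibc or the kernel perform the check.

```python
import struct

# Values as I understand them from the x86-64 psABI / elf.h (assumption:
# verify against your own headers before relying on them).
NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
GNU_PROPERTY_X86_FEATURE_1_IBT = 1 << 0
GNU_PROPERTY_X86_FEATURE_1_SHSTK = 1 << 1


def _align_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align (align is a power of two)."""
    return (value + align - 1) & ~(align - 1)


def x86_feature_1(note: bytes, align: int = 8) -> int:
    """Return the X86_FEATURE_1_AND bitmask of one little-endian GNU property
    note, or 0 if the note is malformed or carries no such property."""
    if len(note) < 12:
        return 0
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_end = 12 + namesz
    if ntype != NT_GNU_PROPERTY_TYPE_0 or note[12:name_end] != b"GNU\0":
        return 0
    pos = _align_up(name_end, align)
    end = pos + descsz
    if end > len(note):
        return 0
    # The descriptor is a sequence of (type, size, data) properties,
    # each padded to `align` bytes.
    while pos + 8 <= end:
        pr_type, pr_datasz = struct.unpack_from("<II", note, pos)
        pos += 8
        if pos + pr_datasz > end:
            break
        if pr_type == GNU_PROPERTY_X86_FEATURE_1_AND and pr_datasz == 4:
            return struct.unpack_from("<I", note, pos)[0]
        pos += _align_up(pr_datasz, align)
    return 0


def has_shstk(note: bytes) -> bool:
    """True if the note marks the object as shadow stack compatible."""
    return bool(x86_feature_1(note) & GNU_PROPERTY_X86_FEATURE_1_SHSTK)


def make_note(features: int) -> bytes:
    """Build a synthetic 64-bit-style property note (hypothetical test helper)."""
    desc = struct.pack("<III", GNU_PROPERTY_X86_FEATURE_1_AND, 4, features) + b"\0" * 4
    return struct.pack("<III", 4, len(desc), NT_GNU_PROPERTY_TYPE_0) + b"GNU\0" + desc
```

For example, has_shstk(make_note(GNU_PROPERTY_X86_FEATURE_1_SHSTK)) is true, while a note that only sets the IBT bit is reported as not shadow stack compatible. The point relevant to the thread is that the marker is a single static bit set at build time, so it says nothing about whether the binary was ever tested with enforcement on.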
* Re: CET shadow stack app compatibility
  2022-11-14 23:15 CET shadow stack app compatibility Edgecombe, Rick P
@ 2022-11-15 2:01 ` Linus Torvalds
  2022-11-15 7:33 ` Florian Weimer
  2022-11-15 9:43 ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2022-11-15 2:01 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: keescook@chromium.org, linux-kernel@vger.kernel.org, fweimer@redhat.com, hjl.tools@gmail.com, x86@kernel.org

On Mon, Nov 14, 2022 at 3:15 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
>
> I would like to make this go smoother all around by having the kernel
> detect the existing elf bit and refuse to enable CET for these
> applications, like this[1].

Honestly, I don't want to preemptively say "this won't work".

That said, once CET is enabled in the kernel, and it turns out that people complain that it breaks existing binaries, at that point I guess it gets disabled again. Possibly at that point using something like your suggested patch.

But I'm not doing it until actual problems appear, and until we actually have this code in the kernel.

I'm disgusted by glibc being willing to just upgrade and break existing binaries and take the "you shouldn't upgrade glibc if you have old binaries" approach. But hey, I guess that's par for the course for glibc, and there's nothing I can do about that.

But yes, once people complain, I'll just make sure that old binaries continue to work, and at that point the glibc and tooling people will presumably have to fix their broken situation to get CET at all.

Because no, the kernel doesn't enable CET if it breaks binaries. That's how we roll.

Linus

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: CET shadow stack app compatibility
  2022-11-15 2:01 ` Linus Torvalds
@ 2022-11-15 7:33 ` Florian Weimer
  2022-11-15 16:57 ` Edgecombe, Rick P
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Weimer @ 2022-11-15 7:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Edgecombe, Rick P, keescook@chromium.org, linux-kernel@vger.kernel.org, hjl.tools@gmail.com, x86@kernel.org

* Linus Torvalds:

> I'm disgusted by glibc being willing to just upgrade and break
> existing binaries and take the "you shouldn't upgrade glibc if you
> have old binaries" approach.

We've been in this position for years. Every time we use a new system call to implement existing functionality in glibc, some applications break. Mostly due to seccomp filters. They break even if there would be no observable differences for applications in the way the new system calls would be invoked if the seccomp filter wouldn't block them.

I proposed a new ENOSYS handshake between userspace and kernel to reduce the amount of breakage (but not all of it). Senior kernel developers rejected it, so we didn't implement it in glibc.

  [PATCH] syscalls: Document OCI seccomp filter interactions & workaround
  <https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/>

(It deals with OCI because it's well-documented, but the same principle would have applied to browser sandboxes, too.)

Instead, we work with distributions and upstreams to make sure the applications are ready before the next distribution glibc update. Fortunately, there seems to be a pretty broad overlap between seccomp-using applications and applications with frequent, more-or-less mandatory updates, so the transition periods are relatively short. You didn't seem to have noticed, so maybe we aren't doing such a bad job after all.

I don't see why CET or x86 shadow stack support could not be handled in the same way. (There is probably a similar overlap.)
At least we should try how far we can get with the existing binaries, and if things turn out not working after all, we will have to start over with different markers. But the kernel shouldn't have to care.

Based on what we have seen so far (and since fixed), it's mostly shared objects that weren't marked up correctly. The posted hack didn't even deal with that case. If the main executable has the current markers, the kernel will not disable shadow stack, and the process will still crash after loading the incorrectly marked shared object.

Someone has to step in and fix things for real (so that they don't break again just after rebuild with a current toolchain adding the current markers). The kernel patch makes this harder because it's not possible anymore to use an existing distribution for this kind of work. Instead, we'd have to wait for a rebuild with the new markers, and of course this rebuild will put us in exactly the same position as before: the incompatibilities will be back because they are no longer masked by the kernel.

Fortunately, we are in a way better situation on x86 than where we are with PAC on AArch64: there you have to reboot with a custom kernel option to disable PAC and restore compatibility with applications. (As far as I know, PAC state isn't process-switched, which I find rather flabbergasting.) Furthermore, the way it was deployed in applications and libraries was largely unconditional (hard-coded into hand-written assembly, without preprocessor conditionals to see if PAC was enabled during the build). At least the presence of CET features depends on CET compiler flags, and we can easily turn it off on a per-process basis if there are any incompatibilities.

Thanks,
Florian

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: CET shadow stack app compatibility 2022-11-15 7:33 ` Florian Weimer @ 2022-11-15 16:57 ` Edgecombe, Rick P 2022-12-02 18:48 ` Florian Weimer 0 siblings, 1 reply; 9+ messages in thread From: Edgecombe, Rick P @ 2022-11-15 16:57 UTC (permalink / raw) To: Torvalds, Linus, fweimer@redhat.com Cc: keescook@chromium.org, linux-kernel@vger.kernel.org, hjl.tools@gmail.com, x86@kernel.org On Tue, 2022-11-15 at 08:33 +0100, Florian Weimer wrote: > Based on what we have seen so far (and since fixed), it's mostly > shared > objects that weren't marked up correctly. For the benefit of anyone that is not involved in CET... As PeterZ was just discussing, "CET" consists of two mostly independent features: "IBT" and "Shadow Stack". I am currently trying to enable userspace shadow stack in the kernel. No IBT enforcement will happen in userspace for the time being. For IBT, which seems to be in worse shape than shadow stack from an existing userspace perspective, I have also seen shared objects with issues. For shadow stack, it was just JITing binaries. Of course if glibc is compiled in non-permissive mode there is an additional category of issues around dlopen()ing that we haven't even discussed yet. And the past issues around makecontext() we have already worked around from the kernel. If you are aware of any other specific compatibility problems, please share so we can discuss the extent. > The posted hack didn't even > deal with that case. If the main executable has the current markers, > the kernel will not disable shadow stack, and the process will still > crash after loading the incorrectly marked shared object. The proposed glibc changes would not enable shadow stack unless the execing binary has the elf bit marked. So if we block those binaries (which the kernel can easily check) from enabling shadow stack, none of the linked shared objects will have shadow stack either. 
So I think we are ok to hold this in our back pocket to resolve the known issues if anyone complains. Where the shared objects could come into play is, in the event that we have to block the old elf bit from the kernel, and a new one is properly marked on a new executable, future glibcs could decide to honor the old bits when checking shared libraries. So you could have an executable with SHSTK2 bit loading a problem SO with just SHSTK1 bit. It would indeed be more difficult for the kernel to detect this, especially in the dlopen() case, but it should not prevent simply blocking any day 1 kernel support binaries. Please, please, don't do this in the future if it comes up though. If the kernel can't find any good options, it risks shadow stack getting reverted for everyone. > Someone has > to step in and fix things for real (so that they don't break again > just > after rebuild with a current toolchain adding the current markers). > The > kernel patch makes this harder because it's not possible anymore to > use > an existing distribution for this kind of work. There was an EXPERT config for things like this, and I was mulling a runtime sysctl. But I think now the idea is that the patch could serve a "better than a full revert" purpose. Not an ideal solution. But I still don't see why doing the order: 1. kernel support 2. libc support 3. compiler support ...wouldn't have generated a more normal situation where old binaries don't break against new kernels and testing can easily happen to reduce issues further. So we could still reset and do exactly that. > Instead, we'd have to > wait for a rebuild with the new markers, and of course this rebuild > will > put is in exactly the same position as before: the incompatibilities > will be back because they are no longer masked by the kernel. People building new apps and testing them against upstream kernels and finding issues sounds like business as usual. 
I'm not trying to solve all possible userspace mistakes from the kernel.

^ permalink raw reply [flat|nested] 9+ messages in thread
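Editorial illustration of the scenarios argued over in the message above. The sketch is a toy model only: SHSTK1 stands for the existing marker and SHSTK2 for the hypothetical replacement bit named in the thread, and every name in the code (ElfObject, exec_gets_shstk, loader_accepts, enforcement_stays_on) is invented for illustration. It does not reproduce glibc or kernel logic; in particular, an object the loader does not accept simply drops enforcement here, which is only a rough stand-in for a permissive loader mode.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ElfObject:
    """An executable or shared object with its (modelled) marker bits."""
    name: str
    shstk1: bool = False  # existing marker, widely applied by distro builds
    shstk2: bool = False  # hypothetical new marker discussed in the thread


def exec_gets_shstk(exe: ElfObject, kernel_blocks_shstk1: bool) -> bool:
    """Would shadow stack be enabled at exec time for this executable?"""
    if exe.shstk2:
        return True
    return exe.shstk1 and not kernel_blocks_shstk1


def loader_accepts(dso: ElfObject, honor_shstk1: bool) -> bool:
    """Does the (modelled) loader treat this shared object as compatible?"""
    return dso.shstk2 or (honor_shstk1 and dso.shstk1)


def enforcement_stays_on(
    exe: ElfObject,
    dsos: Iterable[ElfObject],
    kernel_blocks_shstk1: bool,
    loader_honors_shstk1: bool,
) -> bool:
    """True if shadow stack is enabled and remains enabled after loading dsos."""
    if not exec_gets_shstk(exe, kernel_blocks_shstk1):
        return False
    return all(loader_accepts(d, loader_honors_shstk1) for d in dsos)
```

With an old SHSTK1 executable and a mis-marked SHSTK1 JIT library, the model keeps enforcement on when the kernel does not filter (the crash case) and off when it does. With a SHSTK2 executable and a loader that still honors SHSTK1 on shared objects, enforcement stays on even with the kernel filter in place, which is the case the message asks future glibc versions to avoid.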
* Re: CET shadow stack app compatibility 2022-11-15 16:57 ` Edgecombe, Rick P @ 2022-12-02 18:48 ` Florian Weimer 2022-12-05 19:02 ` Edgecombe, Rick P 0 siblings, 1 reply; 9+ messages in thread From: Florian Weimer @ 2022-12-02 18:48 UTC (permalink / raw) To: Edgecombe, Rick P Cc: Torvalds, Linus, keescook@chromium.org, linux-kernel@vger.kernel.org, hjl.tools@gmail.com, x86@kernel.org * Rick P. Edgecombe: > For IBT, which seems to be in worse shape than shadow stack from an > existing userspace perspective, I have also seen shared objects with > issues. > > For shadow stack, it was just JITing binaries. Except that the actual JITters are usually in shared objects, too, and you just assume here that they get loaded by a main program from the same build. 8-) I think most of them are reusable independently, or are bundled into applications built with a different toolchain. > Of course if glibc is compiled in non-permissive mode there is an > additional category of issues around dlopen()ing that we haven't even > discussed yet. And the past issues around makecontext() we have > already worked around from the kernel. If you are aware of any other > specific compatibility problems, please share so we can discuss the > extent. H.J. ran most of the experiments on Fedora. We did some early validation many years ago, using the first ABI iteration. We didn't have as much reach as we liked in terms of hardening at the time, if I recall correctly, but there were only very few cases where something did not work and was also not marked as incompatible. >> The posted hack didn't even >> deal with that case. If the main executable has the current markers, >> the kernel will not disable shadow stack, and the process will still >> crash after loading the incorrectly marked shared object. > > The proposed glibc changes would not enable shadow stack unless the > execing binary has the elf bit marked. 
So if we block those binaries > (which the kernel can easily check) from enabling shadow stack, none of > the linked shared objects will have shadow stack either. So I think we > are ok to hold this in our back pocket to resolve the known issues if > anyone complains. See above, the assumption that the JITter and the main program come from the same build that is implicit in this is not actually true in practice. > Where the shared objects could come into play is, in the event that we > have to block the old elf bit from the kernel, and a new one is > properly marked on a new executable, future glibcs could decide to > honor the old bits when checking shared libraries. So you could have an > executable with SHSTK2 bit loading a problem SO with just SHSTK1 bit. Right. But we can also have policies in userspace to paper over this. I'm not worried about it. I want to see how far we can get before making the flip in an upstream version of glibc, but if the kernel enforces SHSTK2 (even just on executables), I need a toolchain update plus a rebuild of a large chunk of the distribution. So with reusing SHSTK1 markup, it goes like this: 1. Get a Fedora rawhide kernel with userspace SHSTK support. 2. Get the glibc patches from H.J., and gate them behind a tunable (off by default). Kernel behavior should not change with this new glibc because the required arch_prctl does not happen (and the old ones currently in glibc have different numbers). 3. Run the Fedora graphical desktop with the tunable switched on and a few key third-party applications to see where we stand in terms of compatibility. 3b Do the same thing with RHEL and some enterprise applications (using the kernel and glibc from 1 & 2 for a start). 4. (Optional.) Flip the default of the tunable to on. I don't know how quickly we can get past step 1, but it seems fairly soon, maybe three months, considering the upcoming end-of-year break. With SHSTK2 markup required by the kernel, it goes like this: 1. 
Get a Fedora rawhide kernel with userspace SHSTK support. 2. Get a SHSTK2-enabled toolchain. GCC is currently freezing for the 13 release, so this is not a good time of the year for that. It's probably going to be a custom compiler, unless we want to wait a couple of months, and even then it's got to be a downstream-only backport at first because to upstream, this will have a “not finished” whiff (it's the umpteenth ABI change). 3. Get the glibc patches from H.J. We would probably put it behind a tunable as well. 4. Rebuild key parts of Fedora, probably directly in rawhide (the rolling integration distribution). 5. Run the Fedora rawhide graphical desktop etc. 6. RHEL testing will require a SHSTK2 port to a different compiler and another mass rebuild. ISV application testing is not meaningful until the ISVs have switched to a newer compiler. That's going to take much longer than three months. Maybe we have to do this in the end, but even then, we have no way of forcing developers to test on SHSTK-capable hardware on new-enough before turning on the SHSTK2 bit. In the end, we might still need SHSTK2, but we don't know that yet, and the first approach is quite cheap, so I really want to try it. Keep in mind that just because some useful interface is provided by the kernel, we can't necessarily use it in glibc immediately because with all those seccomp filters out there (and other dependencies on internal glibc/kernel interface details), too much would break if we exposed it into existing applications without some coordination. SHSTK isn't *that* different, except that we have some binary markup to guide us at run time. > But I still don't see why doing the order: > 1. kernel support > 2. libc support > 3. compiler support > > ...wouldn't have generated a more normal situation where old binaries > don't break against new kernels and testing can easily happen to reduce > issues further. So we could still reset and do exactly that. 
No matter in which order you do it, some group will want to change ABI or semantics. We actually had multiple different iterations in different orders, and everybody wanted to put their mark onto this feature, changing the ABI. I don't care at all about the internal ABI between glibc and the kernel, but the markup of the binaries (besides glibc itself) is quite important to me. In retrospect, separating SHSTK from IBT from the start would have helped a lot because I think we could have done that in libc without compiler support. But I don't think anyone expected this to take four to five years to implement (or probably longer for IBT). >> Instead, we'd have to >> wait for a rebuild with the new markers, and of course this rebuild >> will >> put is in exactly the same position as before: the incompatibilities >> will be back because they are no longer masked by the kernel. > > People building new apps and testing them against upstream kernels and > finding issues sounds like business as usual. I'm not trying to solve > all possible userspace mistakes from the kernel. They also have to test on the right hardware and with a new/unreleased glibc. I think it would be helpful to those developers if we could give them an existing distribution early on they can use for experiments. Not just getting SHSTK going, but also playing with the perf integration (which to me is the real goal here). Thanks, Florian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: CET shadow stack app compatibility 2022-12-02 18:48 ` Florian Weimer @ 2022-12-05 19:02 ` Edgecombe, Rick P 0 siblings, 0 replies; 9+ messages in thread From: Edgecombe, Rick P @ 2022-12-05 19:02 UTC (permalink / raw) To: fweimer@redhat.com Cc: Torvalds, Linus, keescook@chromium.org, linux-kernel@vger.kernel.org, hjl.tools@gmail.com, x86@kernel.org On Fri, 2022-12-02 at 19:48 +0100, Florian Weimer wrote: > * Rick P. Edgecombe: > > > For IBT, which seems to be in worse shape than shadow stack from an > > existing userspace perspective, I have also seen shared objects > > with > > issues. > > > > For shadow stack, it was just JITing binaries. > > Except that the actual JITters are usually in shared objects, too, > and > you just assume here that they get loaded by a main program from the > same build. 8-) I think most of them are reusable independently, or > are > bundled into applications built with a different toolchain. So I guess the situation must be a SHSTK2 binary dlopen()s a broken SHSTK1 DSO (broken because of JITing or whatever) using a future version of glibc. It would depend on how the future implementation of SHSTK2 in glibc would handle this. I can only hope glibc would do the right thing to avoid whatever situation caused the creation of SHSTK2. If the scenario is SHSTK1 binary dlopen()s a broken SHSTK1 DSO, it would already not have shadow stack because SHSTK1 was blocked from getting shadow stack enabled. > > > Of course if glibc is compiled in non-permissive mode there is an > > additional category of issues around dlopen()ing that we haven't > > even > > discussed yet. And the past issues around makecontext() we have > > already worked around from the kernel. If you are aware of any > > other > > specific compatibility problems, please share so we can discuss the > > extent. > > H.J. ran most of the experiments on Fedora. We did some early > validation many years ago, using the first ABI iteration. 
We didn't > have as much reach as we liked in terms of hardening at the time, if > I > recall correctly, but there were only very few cases where something > did > not work and was also not marked as incompatible. I think most binaries will work automatically. The problem is the standard is not "doesn't break *too many* binaries". > > > > The posted hack didn't even > > > deal with that case. If the main executable has the current > > > markers, > > > the kernel will not disable shadow stack, and the process will > > > still > > > crash after loading the incorrectly marked shared object. > > > > The proposed glibc changes would not enable shadow stack unless the > > execing binary has the elf bit marked. So if we block those > > binaries > > (which the kernel can easily check) from enabling shadow stack, > > none of > > the linked shared objects will have shadow stack either. So I think > > we > > are ok to hold this in our back pocket to resolve the known issues > > if > > anyone complains. > > See above, the assumption that the JITter and the main program come > from > the same build that is implicit in this is not actually true in > practice. Hmm, not sure I understand your point. Are you saying that the kernel can't resolve the found issues by blocking SHSTK1 execing binaries? I think it can by depending on nice future glibc behavior. In general, the point that the kernel can't fully stop userspace from breaking itself is well taken. > > > Where the shared objects could come into play is, in the event that > > we > > have to block the old elf bit from the kernel, and a new one is > > properly marked on a new executable, future glibcs could decide to > > honor the old bits when checking shared libraries. So you could > > have an > > executable with SHSTK2 bit loading a problem SO with just SHSTK1 > > bit. > > Right. But we can also have policies in userspace to paper over > this. > I'm not worried about it. 
I want to see how far we can get before > making the flip in an upstream version of glibc, but if the kernel > enforces SHSTK2 (even just on executables), I need a toolchain update > plus a rebuild of a large chunk of the distribution. The existing gcc's assume wrong ABI as well, so it's probably safest to use an updated toolchain in any case. I wasn't able to find any binaries that broke because of the GCC issues, but it wasn't an exhaustive search. But remember, even that filter patch had a Kconfig to disable it. Distros with the resources to test everything on SHSTK hardware and users that don't build their own glibcs could probably minimize the impact. But smaller distros or users could at least not be surprised or wait for SHSTK2 to make its way through. > > So with reusing SHSTK1 markup, it goes like this: > > 1. Get a Fedora rawhide kernel with userspace SHSTK support. > 2. Get the glibc patches from H.J., and gate them behind a tunable > (off by default). Kernel behavior should not change with this > new glibc because the required arch_prctl does not happen > (and the old ones currently in glibc have different numbers). > 3. Run the Fedora graphical desktop with the tunable switched on and > a few key > third-party applications to see where we stand in terms of > compatibility. > 3b Do the same thing with RHEL and some enterprise applications > (using the kernel and glibc from 1 & 2 for a start). > 4. (Optional.) Flip the default of the tunable to on. > > I don't know how quickly we can get past step 1, but it seems fairly > soon, maybe three months, considering the upcoming end-of-year break. > > With SHSTK2 markup required by the kernel, it goes like this: > > 1. Get a Fedora rawhide kernel with userspace SHSTK support. > 2. Get a SHSTK2-enabled toolchain. GCC is currently freezing for the > 13 > release, so this is not a good time of the year for that. 
It's > probably going to be a custom compiler, unless we want to wait a > couple of months, and even then it's got to be a downstream-only > backport at first because to upstream, this will have a “not > finished” whiff (it's the umpteenth ABI change). > 3. Get the glibc patches from H.J. We would probably put it behind > a tunable as well. > 4. Rebuild key parts of Fedora, probably directly in rawhide (the > rolling integration distribution). > 5. Run the Fedora rawhide graphical desktop etc. > 6. RHEL testing will require a SHSTK2 port to a different compiler > and another mass rebuild. ISV application testing is not > meaningful > until the ISVs have switched to a newer compiler. > > That's going to take much longer than three months. Maybe we have to > do > this in the end, but even then, we have no way of forcing developers > to > test on SHSTK-capable hardware on new-enough before turning on the > SHSTK2 bit. > > In the end, we might still need SHSTK2, but we don't know that yet, > and > the first approach is quite cheap, so I really want to try it. Yes, this is the working plan at this point. I removed the elf header bit filter in the latest revision. I still personally would favor starting over with SHSTK2 from the beginning, even if it led to slower roll out. That would be a feature, not a bug, in my view. If we do end up needing SHSTK2 though, then it resets the clock and the rollout is the slowest of the possibilities. > > Keep in mind that just because some useful interface is provided by > the > kernel, we can't necessarily use it in glibc immediately because with > all those seccomp filters out there (and other dependencies on > internal > glibc/kernel interface details), too much would break if we exposed > it > into existing applications without some coordination. SHSTK isn't > *that* different, except that we have some binary markup to guide us > at > run time. 
The thing that is rare is that the way that it has been rolled out restricts existing behavior under the nose of the application developers AND it depends on kernel/HW support.

In the analogy of forced compiler hardening options, as best I can tell (I'm educating myself on this history only recently), larger distros started doing this and found and fixed the issues. Then smaller ones picked it up after that. With shadow stack, we seem to be well down this path already because of the lack of kernel support.

>
> > But I still don't see why doing the order:
> > 1. kernel support
> > 2. libc support
> > 3. compiler support
> >
> > ...wouldn't have generated a more normal situation where old binaries
> > don't break against new kernels and testing can easily happen to reduce
> > issues further. So we could still reset and do exactly that.
>
> No matter in which order you do it, some group will want to change ABI
> or semantics. We actually had multiple different iterations in
> different orders, and everybody wanted to put their mark onto this
> feature, changing the ABI. I don't care at all about the internal ABI
> between glibc and the kernel, but the markup of the binaries (besides
> glibc itself) is quite important to me.

I'm late to this project, but for my changes to the enablement ABI I really had no choice. I preferred SHSTK2 to resolve the boot problems too and we did this other ABI change after extreme resistance from the glibc side. So it was really trying to prevent an insta-revert rather than putting any marks on anything.

Whatever the spec, we really need to prevent compatibility sensitive features like this from making it upstream in userspace before the kernel changes. The kernel has high backwards compatibility standards. To try to achieve this, it should have flexibility to design its own ABI. Putting the userspace changes upstream ahead of time for a feature like this constrains the kernel.
The idea that userspace can finalize on all the bits and ABI for future features and then wait lurking to cause kernel regressions if the kernel doesn't match is wrong. It also caused these concrete issues. So hopefully everyone is on the same page about this for the future. Just want to be clear in case. > > In retrospect, separating SHSTK from IBT from the start would have > helped a lot because I think we could have done that in libc without > compiler support. But I don't think anyone expected this to take > four > to five years to implement (or probably longer for IBT). > > > > Instead, we'd have to > > > wait for a rebuild with the new markers, and of course this > > > rebuild > > > will > > > put is in exactly the same position as before: the > > > incompatibilities > > > will be back because they are no longer masked by the kernel. > > > > People building new apps and testing them against upstream kernels > > and > > finding issues sounds like business as usual. I'm not trying to > > solve > > all possible userspace mistakes from the kernel. > > They also have to test on the right hardware and with a > new/unreleased > glibc. > > I think it would be helpful to those developers if we could give them > an > existing distribution early on they can use for experiments. Not > just > getting SHSTK going, but also playing with the perf integration > (which > to me is the real goal here). > > Agreed. A Kconfig or sysctl would have worked fine for this purpose though. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: CET shadow stack app compatibility
  2022-11-14 23:15 CET shadow stack app compatibility Edgecombe, Rick P
  2022-11-15 2:01 ` Linus Torvalds
@ 2022-11-15 9:43 ` Peter Zijlstra
  2022-11-15 17:04 ` Edgecombe, Rick P
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2022-11-15 9:43 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Torvalds, Linus, keescook@chromium.org, linux-kernel@vger.kernel.org, fweimer@redhat.com, hjl.tools@gmail.com, x86@kernel.org

Let me hijack this and go off on a tangent..

On Mon, Nov 14, 2022 at 11:15:44PM +0000, Edgecombe, Rick P wrote:

> The breakage derives from how the decision is made on whether to enable
> shadow stack enforcement. Glibc will do this by checking a bit in the
> elf header of the binary. It then tells the kernel to turn CET on via a
> separate kernel API. But instead of this elf bit being selected by
> application developers, it was mostly applied in various automated ways
> (mostly default on) by distro builds for years. This huge amount of
> untested enablement has not generated any visible issues for users yet,
> because without kernel support the presence of this bit has not
> generated any actual CET enforcement.

CET is two things, ideally we'd fully eradicate the term CET, never again mention CET, ever. Whoever at Intel decided to push that term has created so much confusion it's not funny :/

The feature at hand here is backward edge control flow -- or shadow stacks (the means to implement this). Be explicit about this, do *NOT* use CET ever again.

The other thing CET has is forward edge control flow -- or indirect branch tracking, this is a completely different and independent feature and not advertised or implemented here.

These things are obviously related, but since they're two independent features there's the endless confusion as to which is actually meant.
(go (re)watch the last plumbers conf talks on the subject -- there's always someone who gets is wrong) The only things that should have CET in their name are the CR4 bit and the two MSRs, nothing more. ELF bits should not, must not, be called CET. API, not CET, Compiler features, also not CET. (and I know it's too late to eradicate some of it, but please, at least make sure the kernel doesn't propagate this nonsense). ^ permalink raw reply [flat|nested] 9+ messages in thread
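To make the shadow stack / IBT split above concrete, here is a minimal C
sketch of how the two features show up as separate bits in the x86
feature word (GNU_PROPERTY_X86_FEATURE_1_AND) of an ELF GNU property
note. The bit values follow GNU_PROPERTY_X86_FEATURE_1_IBT and
GNU_PROPERTY_X86_FEATURE_1_SHSTK from the x86-64 psABI. The macro,
struct and function names are hypothetical, chosen only for this
illustration, and locating the note inside the ELF file is left out.

```c
#include <stdint.h>

/*
 * Bit values mirror GNU_PROPERTY_X86_FEATURE_1_IBT and
 * GNU_PROPERTY_X86_FEATURE_1_SHSTK. The local names are illustrative
 * and deliberately avoid the term "CET".
 */
#define X86_FEATURE_1_IBT   (1U << 0)  /* forward edge: indirect branch tracking */
#define X86_FEATURE_1_SHSTK (1U << 1)  /* backward edge: shadow stack */

/* Hypothetical result type: one flag per independent feature. */
struct cf_features {
	int ibt;    /* 1 if the binary is marked IBT compatible */
	int shstk;  /* 1 if the binary is marked shadow stack compatible */
};

/*
 * Decode the 32-bit feature word taken from the GNU property note.
 * Each feature is tested on its own; neither bit implies the other.
 */
static struct cf_features decode_feature_1_and(uint32_t word)
{
	struct cf_features f;

	f.ibt = (word & X86_FEATURE_1_IBT) != 0;
	f.shstk = (word & X86_FEATURE_1_SHSTK) != 0;
	return f;
}
```

A word of 0x2 decodes as shadow stack only, which is the marking under
discussion in this thread; 0x1 would be IBT only, and 0x3 both.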
* Re: CET shadow stack app compatibility
  2022-11-15 9:43 ` Peter Zijlstra
@ 2022-11-15 17:04 ` Edgecombe, Rick P
  2022-11-15 18:45 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Edgecombe, Rick P @ 2022-11-15 17:04 UTC
To: peterz@infradead.org
Cc: Torvalds, Linus, keescook@chromium.org, linux-kernel@vger.kernel.org, fweimer@redhat.com, hjl.tools@gmail.com, x86@kernel.org

On Tue, 2022-11-15 at 10:43 +0100, Peter Zijlstra wrote:

> CET is two things, ideally we'd fully eradicate the term CET, never
> again mention CET, ever. Whoever at Intel decided to push that term has
> created so much confusion it's not funny :/
>
> The feature at hand here is backward edge control flow -- or shadow
> stacks (the means to implement this). Be explicit about this, do *NOT*
> use CET ever again.
>
> The other thing CET has is forward edge control flow -- or indirect
> branch tracking, this is a completely different and independent feature
> and not advertised or implemented here.
>
> These things are obviously related, but since they're two independent
> features there's the endless confusion as to which is actually meant.
>
> (go (re)watch the last plumbers conf talks on the subject -- there's
> always someone who gets it wrong)
>
> The only things that should have CET in their name are the CR4 bit and
> the two MSRs, nothing more.

The only other place in the kernel where it has to be that way is the
"control protection" fault handler.

I agree it's confusing, but when you talk about "shadow stacks", a lot
of people don't connect it to the HW feature, whereas they have heard
of CET. So for contexts like this, I thought it was useful to jog
memories. I could put more distance between the two, e.g. "x86 shadow
stacks (you may have heard of CET)".

> ELF bits should not, must not, be called CET. API, not CET, Compiler
> features, also not CET.

So the arch_prctl()s can't be shared between shadow stack and IBT? They
don't have to be, but this is a new thing after a fair amount of
earlier discussion.

> (and I know it's too late to eradicate some of it, but please, at least
> make sure the kernel doesn't propagate this nonsense).
* Re: CET shadow stack app compatibility
  2022-11-15 17:04 ` Edgecombe, Rick P
@ 2022-11-15 18:45 ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2022-11-15 18:45 UTC
To: Edgecombe, Rick P
Cc: Torvalds, Linus, keescook@chromium.org, linux-kernel@vger.kernel.org, fweimer@redhat.com, hjl.tools@gmail.com, x86@kernel.org

On Tue, Nov 15, 2022 at 05:04:40PM +0000, Edgecombe, Rick P wrote:

> > ELF bits should not, must not, be called CET. API, not CET, Compiler
> > features, also not CET.
>
> So the arch_prctl()s can't be shared between shadow stack and IBT? They
> don't have to be, but this is a new thing after a fair amount of
> earlier discussion.

I would very strongly suggest IBT not use that interface and that we
instead follow ARM64 BTI's lead -- such that application developers
don't go insane trying to use two nearly identical solutions.

I mean, the toolchain folks made a godawful mess of things, but we
don't have to.
end of thread, other threads:[~2022-12-05 19:03 UTC | newest]

Thread overview: 9+ messages
2022-11-14 23:15 CET shadow stack app compatibility Edgecombe, Rick P
2022-11-15 2:01 ` Linus Torvalds
2022-11-15 7:33 ` Florian Weimer
2022-11-15 16:57 ` Edgecombe, Rick P
2022-12-02 18:48 ` Florian Weimer
2022-12-05 19:02 ` Edgecombe, Rick P
2022-11-15 9:43 ` Peter Zijlstra
2022-11-15 17:04 ` Edgecombe, Rick P
2022-11-15 18:45 ` Peter Zijlstra