* AF_ALG hardening [not found] ` <20260502033556.GA3872267@google.com> @ 2026-05-02 4:52 ` Demi Marie Obenour 2026-05-02 8:19 ` Simon Richter 2026-05-02 19:16 ` Eric Biggers [not found] ` <20260502035402.GB3872267@google.com> 1 sibling, 2 replies; 10+ messages in thread From: Demi Marie Obenour @ 2026-05-02 4:52 UTC (permalink / raw) To: Eric Biggers Cc: Jan Schaumann, iwd, Linux kernel mailing list, linux-crypto, Herbert Xu [-- Attachment #1.1.1: Type: text/plain, Size: 5269 bytes --] On 5/1/26 23:35, Eric Biggers wrote: > On Fri, May 01, 2026 at 08:21:27PM -0400, Demi Marie Obenour wrote: >> I think the single biggest hardening win for AF_ALG would be to move >> to the crypto library. The recent CVEs you mentioned mostly seem >> to relate to the crypto API, and with a hard-coded list of allowed >> algorithms there's no need to use the crypto API anymore. I'm not >> familiar enough with kernel code to do this easily, but for anyone >> with basic knowledge of the existing code it should (hopefully) be >> straightforward. >> >> In the meantime, only using synchronous algorithms and not using >> hardware drivers would also be a useful simplification. The latter >> would make it especially clear that AF_ALG is deprecated, because >> its one potential advantage (being able to use hardware acceleration) >> would no longer be present. > > The kernel's crypto library > (https://docs.kernel.org/crypto/libcrypto.html) does greatly simplify a > lot of kernel code that needs to use crypto algorithms. Yes, AF_ALG > doesn't use it directly yet. Currently AF_ALG puts all the data in > (zero-copy) scatterlists, then invokes the "traditional crypto API" > which is very complex and has full scatterlist support, asynchronous > execution support, an algorithm template system, etc. In some cases the > crypto library is then used internally, but it's not called directly. > > So the idea would be something along the lines of: > > - Add an algorithm allowlist to AF_ALG. 
It would include only what the > small set of userspace programs that uses it actually needs. Bizarre > stuff like "authencesn" wouldn't be included. > > - Change AF_ALG to make it copy any data written to an AF_ALG file > descriptor into an internal kernel buffer. Put the output in another > internal kernel buffer, then copy it to userspace. No zero-copy, and > no scatterlists. Both restrictions would greatly reduce the chance of > bugs: the actual crypto algorithms would operate only on these > internal buffers, not on pagecache data (e.g. the contents of 'su') or > buffers that userspace can concurrently modify. The use of simple > virtual addresses would eliminate all the scatterlist complexity. > > - AF_ALG would implement each algorithm by invoking the corresponding > the crypto library functions > (https://docs.kernel.org/crypto/libcrypto.html#api-documentation). No > asynchronous execution, no buggy hardware crypto drivers, etc. Yup! That's exactly the idea. > It sounds good to me. For people who feel like the su binary on their > system is a bit too restrictive and would like to fix that, these > changes might not be all that great for them. But for the rest of us, > they should work rather well. > > Of course, it'll also be a fair a bit of work, and unfortunately I also > expect pushback from people who (incorrectly IMO) think that AF_ALG > performance is important, even moreso than security. If one cares about crypto offload performance, they would be better served by creating a better interface to it than AF_ALG. AF_ALG is a horrible API with (presumably) tons of overhead. I know the QAT driver and an Nvidia BlueField DPU accelerator driver both bypass it. Furthermore, AF_ALG only supports symmetric algorithms. These algorithms are inexpensive in software, so the cost of going to an accelerator and back is enormous compared to the cost of a single operation. 
For offload to even a very fast accelerator to make sense, one must be able to deeply pipeline requests. However, this creates a huge amount of additional complexity for software. On the other hand, asymmetric cryptography performs far more work per operation. This might (no benchmarks!) mean that offloading asymmetric algorithms makes more sense than offloading symmetric ones. The cost of sending the work to the accelerator and waiting for completion is less than the time needed to perform the operation, so even a synchronous interface could still be faster than running the algorithm on the CPU. Furthermore, long-term keys are very often asymmetric, so DPA and EMA protections are much more likely to be relevant here. Asymmetric accelerators also don't have a better alternative in the form of inline encryption hardware. I think a high performance interface to hardware cryptography (and, more importantly, compression) would look much more like RDMA. There would be a kernel driver that did the bare minimum to provide isolation between userspace programs, and a userspace driver that was responsible for abstracting over the hardware. > Either way, the first step will be to create the algorithm allowlist, > which should happen anyway, regardless of the other changes. The simplest changes I can see are: 1. Get rid of zero-copy support (splice()). 2. Get rid of AIO support. 3. Only allow software implementations. All of these are really simple. I can send patches, but be warned that they would only be compile-tested, as I don't know how to test the code. I removed oss-security from CC as this is now a Linux kernel development discussion. -- Sincerely, Demi Marie Obenour (she/her/hers) [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 7253 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
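[Editorial sketch] The allowlist step discussed in the message above could look roughly like the following. This is a minimal userspace model, not kernel code and not a proposed patch: the function name `af_alg_name_allowed` is hypothetical, and the three listed algorithm names are placeholders, since the thread does not say which algorithms would actually be allowed. The only thing taken from the thread is that templates such as "authencesn" would not be listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allowlist. The entries are placeholders chosen for
 * illustration; the real list would be derived from what existing
 * userspace programs actually request.
 */
static const char *const af_alg_allowlist[] = {
    "sha256",
    "hmac(sha256)",
    "cbc(aes)",
};

/*
 * Exact-match lookup. Because only verbatim names match, any template
 * instantiation that is not listed (for example an "authencesn(...)"
 * construction) is rejected without parsing the template syntax.
 */
static bool af_alg_name_allowed(const char *name)
{
    size_t i;

    if (name == NULL)
        return false;

    for (i = 0; i < sizeof(af_alg_allowlist) / sizeof(af_alg_allowlist[0]); i++) {
        if (strcmp(name, af_alg_allowlist[i]) == 0)
            return true;
    }
    return false;
}
```

An exact-match table is the simplest possible design; it trades flexibility for a lookup that is easy to audit, which seems to be the point of the proposal.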
* Re: AF_ALG hardening 2026-05-02 4:52 ` AF_ALG hardening Demi Marie Obenour @ 2026-05-02 8:19 ` Simon Richter 2026-05-02 20:42 ` Demi Marie Obenour 2026-05-02 19:16 ` Eric Biggers 1 sibling, 1 reply; 10+ messages in thread From: Simon Richter @ 2026-05-02 8:19 UTC (permalink / raw) To: Demi Marie Obenour, Eric Biggers Cc: Jan Schaumann, iwd, Linux kernel mailing list, linux-crypto, Herbert Xu [-- Attachment #1.1: Type: text/plain, Size: 4788 bytes --] Hi, On 5/2/26 13:52, Demi Marie Obenour wrote: >> Of course, it'll also be a fair a bit of work, and unfortunately I also >> expect pushback from people who (incorrectly IMO) think that AF_ALG >> performance is important, even moreso than security. AF_ALG performance (time/power) is important in the way that it's literally the only point to its existence. If all it provides is extra overhead over a software implementation, then it makes no sense to keep it. > If one cares about crypto offload performance, they would be better > served by creating a better interface to it than AF_ALG. AF_ALG is > a horrible API with (presumably) tons of overhead. I know the QAT > driver and an Nvidia BlueField DPU accelerator driver both bypass it. The API is designed to be zerocopy, that's why it's this horrible combination of socket API and splice(). The general assumption here is that it does not make sense to offload small requests in the first place, and application programmers are aware of that. The use case is "I have a file or pipe full of data and a device with a kernel driver that should process it, can we somehow avoid copying the data to userspace only to immediately copy it back to kernelspace?" This copying is even more silly if the actual question I have in userspace is "what is the SHA256 checksum of this file?" or "what is the SHA256 checksum of the string 'blob 8794311528\0' followed by this file?" 
(where you can see why anyone would ask such a silly question and prefer to use the dedicated hardware that processes 24 GB/s over the CPU at 100 MB/s) > Furthermore, AF_ALG only supports symmetric algorithms. These > algorithms are inexpensive in software, so the cost of going to an > accelerator and back is enormous compared to the cost of a single > operation. Yes, initial setup cost is high, so this only makes sense for large requests or batches (submitting individual requests is generally cheap, the difficulty is ensuring the data is accessible to the hardware). That's also why there are no asymmetric algorithms: these aren't generally used on large amounts of data, so it's never worth it to offload these. It would make sense to offload asymmetric algorithms if there was a secure key storage inside the device, but AFAIK the API does not support that, or even the notion of on-device contexts. It is not a good API, and it sits on top of the ahash/acomp/acrypt interfaces which are also unfriendly to accelerator hardware. > For offload to even a very fast accelerator to make sense, > one must be able to deeply pipeline requests. However, this creates > a huge amount of additional complexity for software. Software that has requirements like that is already complex -- if I have a few thousand workload packets, I need a worker pool. If I don't have these requirements, then indeed I am better off with a software-only solution in userspace, because it is not relevant from a performance standpoint. > Asymmetric accelerators also don't have a better alternative in the > form of inline encryption hardware. Quite a number of architectures do not have inline encryption support, and these are more likely to use offload hardware even for smaller requests (e.g. for power saving). > I think a high performance interface to hardware cryptography (and, > more importantly, compression) would look much more like RDMA. 
> There would be a kernel driver that did the bare minimum to provide > isolation between userspace programs, and a userspace driver that > was responsible for abstracting over the hardware. Offload hardware comes in two flavours: the high-throughput kind, built into devices where no one cares about power, and the lower-power-than-the-CPU-doing-it kind. The former can easily provide user contexts even in virtualized environments, but the latter is generally found in systems that do not even have an IOMMU. Either we have two distinct interfaces for these, or we need one that can handle either. My feeling is that no one is happy with either AF_ALG or the asynchronous interfaces in general, so I think they should be removed completely, and there should be a separate "offload" SIG that creates new interfaces that are actually usable with current hardware. > 1. Get rid of zero-copy support (splice()). > 2. Get rid of AIO support. > 3. Only allow software implementations. That makes sense if we're forced to keep the interface for now, but it means that offload support through the crypto subsystem is completely dead, and anyone wanting to support offload hardware needs to go elsewhere. Can we get a definitive statement that this is intended? Simon [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
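[Editorial sketch] The argument above about setup cost versus request size can be written as a simple break-even model: offload pays off only when a fixed per-request overhead plus the accelerator's streaming time is less than the CPU's streaming time. The sketch below assumes a linear cost model, and the function names are hypothetical. The throughput figures in the usage note are the ones quoted in the message (24 GB/s and 100 MB/s); the per-request overhead is an invented illustrative value, not a measurement.

```c
#include <stdbool.h>

/* Seconds for the CPU to process n bytes at cpu_rate bytes per second. */
static double cpu_time(double n, double cpu_rate)
{
    return n / cpu_rate;
}

/* Seconds for the accelerator: fixed per-request overhead plus streaming. */
static double offload_time(double n, double overhead, double accel_rate)
{
    return overhead + n / accel_rate;
}

/* True when sending n bytes to the accelerator is faster than the CPU. */
static bool offload_wins(double n, double overhead,
                         double cpu_rate, double accel_rate)
{
    return offload_time(n, overhead, accel_rate) < cpu_time(n, cpu_rate);
}

/*
 * Request size (bytes) at which both paths take the same time.
 * Returns -1.0 when the accelerator is not faster than the CPU, in which
 * case offload never wins under this model.
 */
static double break_even_bytes(double overhead,
                               double cpu_rate, double accel_rate)
{
    if (accel_rate <= cpu_rate)
        return -1.0;
    return overhead / (1.0 / cpu_rate - 1.0 / accel_rate);
}
```

For example, `break_even_bytes(50e-6, 100e6, 24e9)` uses a hypothetical 50 microsecond overhead and puts the break-even point at roughly 5 KB. The model says nothing about power, pipelining, or contention, which are the points the replies in this thread dispute.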
* Re: AF_ALG hardening 2026-05-02 8:19 ` Simon Richter @ 2026-05-02 20:42 ` Demi Marie Obenour 0 siblings, 0 replies; 10+ messages in thread From: Demi Marie Obenour @ 2026-05-02 20:42 UTC (permalink / raw) To: Simon Richter, Eric Biggers Cc: Jan Schaumann, iwd, Linux kernel mailing list, linux-crypto, Herbert Xu [-- Attachment #1.1.1: Type: text/plain, Size: 7336 bytes --] On 5/2/26 04:19, Simon Richter wrote: > Hi, > > On 5/2/26 13:52, Demi Marie Obenour wrote: > >>> Of course, it'll also be a fair a bit of work, and unfortunately I also >>> expect pushback from people who (incorrectly IMO) think that AF_ALG >>> performance is important, even moreso than security. > > AF_ALG performance (time/power) is important in the way that it's > literally the only point to its existence. If all it provides is extra > overhead over a software implementation, then it makes no sense to keep it. The only reason for keeping it is for compatibility with existing userspace. >> If one cares about crypto offload performance, they would be better >> served by creating a better interface to it than AF_ALG. AF_ALG is >> a horrible API with (presumably) tons of overhead. I know the QAT >> driver and an Nvidia BlueField DPU accelerator driver both bypass it. > > The API is designed to be zerocopy, that's why it's this horrible > combination of socket API and splice(). The general assumption here is > that it does not make sense to offload small requests in the first > place, and application programmers are aware of that. > > The use case is "I have a file or pipe full of data and a device with a > kernel driver that should process it, can we somehow avoid copying the > data to userspace only to immediately copy it back to kernelspace?" > > This copying is even more silly if the actual question I have in > userspace is "what is the SHA256 checksum of this file?" or "what is the > SHA256 checksum of the string 'blob 8794311528\0' followed by this > file?" 
(where you can see why anyone would ask such a silly question and > prefer to use the dedicated hardware that processes 24 GB/s over the CPU > at 100 MB/s) Do you have a specific device that has such hardware and can use an upstream kernel? I have yet to see any concrete examples. >> Furthermore, AF_ALG only supports symmetric algorithms. These >> algorithms are inexpensive in software, so the cost of going to an >> accelerator and back is enormous compared to the cost of a single >> operation. > > Yes, initial setup cost is high, so this only makes sense for large > requests or batches (submitting individual requests is generally cheap, > the difficulty is ensuring the data is accessible to the hardware). > > That's also why there are no asymmetric algorithms: these aren't > generally used on large amounts of data, so it's never worth it to > offload these. Asymmetric cryptography is far more expensive than symmetric cryptography. Tens of microseconds or more on a high-end CPU I believe. That is more than enough time to justify going to an accelerator and back, if the accelerator can do the job significantly faster. > It would make sense to offload asymmetric algorithms if there was a > secure key storage inside the device, but AFAIK the API does not support > that, or even the notion of on-device contexts. > > It is not a good API, and it sits on top of the ahash/acomp/acrypt > interfaces which are also unfriendly to accelerator hardware. Not surprised. >> For offload to even a very fast accelerator to make sense, >> one must be able to deeply pipeline requests. However, this creates >> a huge amount of additional complexity for software. > > Software that has requirements like that is already complex -- if I have > a few thousand workload packets, I need a worker pool. Or a thread-per-core architecture. 
> If I don't have these requirements, then indeed I am better off with a > software-only solution in userspace, because it is not relevant from a > performance standpoint. > >> Asymmetric accelerators also don't have a better alternative in the >> form of inline encryption hardware. > > Quite a number of architectures do not have inline encryption support, > and these are more likely to use offload hardware even for smaller > requests (e.g. for power saving). Please provide a real-world example where using the accelerator really does save power compared to running the cryptography on the CPU. >> I think a high performance interface to hardware cryptography (and, >> more importantly, compression) would look much more like RDMA. >> There would be a kernel driver that did the bare minimum to provide >> isolation between userspace programs, and a userspace driver that >> was responsible for abstracting over the hardware. > > Offload hardware comes in two flavours: the high-throughput kind, built > into devices where no one cares about power, and the > lower-power-than-the-CPU-doing-it kind. Again, please provide benchmarks. I have yet to see a real-world example where the accelerator is faster for short (read: realistic) message sizes. Eric Biggers has provided many where it is far slower, and didn't find any situation where it saved power. For very long messages, yes, it can be faster. But I have yet to see a situation where (a) performance for large files matters and (b) there is an accelerator worth using. Network and storage encryption are obviously performance-critical, but they use small messages. Furthermore, both of them are well-suited to inline cryptographic engines, which are much more efficient. I mostly associate large file encryption and hashing with things like verifying software updates. On large systems, this matters because a human is waiting. However, these systems are also ones for which software cryptography is very fast.
On small systems, I expect update validation to be much less performance-critical. > The former can easily provide user contexts even in virtualized > environments, but the latter is generally found in systems that do not > even have an IOMMU. Either we have two distinct interfaces for these, or > we need one that can handle either. > > My feeling is that no one is happy with either AF_ALG or the > asynchronous interfaces in general, so I think they should be removed > completely, and there should be a separate "offload" SIG that creates > new interfaces that are actually usable with current hardware. > > > 1. Get rid of zero-copy support (splice()). > > 2. Get rid of AIO support. > > 3. Only allow software implementations. > > That makes sense if we're forced to keep the interface for now, but it > means that offload support through the crypto subsystem is completely > dead, and anyone wanting to support offload hardware needs to go > elsewhere. Can we get a definitive statement that this is intended? AF_ALG is dead. Much of the rest of the kernel is moving from the crypto API to the software-only crypto library. Offload is far more complex than software cryptography, so there needs to be a substantial benefit to justify using it. Have you seen any real-world cases of this? Inline encryption hardware (both in storage controllers and in NICs) is definitely a win, but it doesn't use the crypto API at all. It's easy to provide synthetic benchmarks where offload is a win, but synthetic benchmarks don't justify a giant CVE magnet. If offload is a win in the real world, then it should be possible to demonstrate this. -- Sincerely, Demi Marie Obenour (she/her/hers) [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 7253 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: AF_ALG hardening
  2026-05-02  4:52 ` AF_ALG hardening Demi Marie Obenour
  2026-05-02  8:19   ` Simon Richter
@ 2026-05-02 19:16   ` Eric Biggers
  2026-05-04 19:01     ` Simon Richter
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Biggers @ 2026-05-02 19:16 UTC
To: Demi Marie Obenour
Cc: Jan Schaumann, iwd, Linux kernel mailing list, linux-crypto, Herbert Xu

On Sat, May 02, 2026 at 12:52:57AM -0400, Demi Marie Obenour wrote:
> > Either way, the first step will be to create the algorithm allowlist,
> > which should happen anyway, regardless of the other changes.
>
> The simplest changes I can see are:
>
> 1. Get rid of zero-copy support (splice()).
> 2. Get rid of AIO support.
> 3. Only allow software implementations.
>
> All of these are really simple. I can send patches, but be warned
> that they would only be compile-tested, as I don't know how to test
> the code.

If you're interested, please send patches, and we'll see where things go
from there. We need to get more people helping with this stuff.

For (1), it probably should work like the way the zero-copy support was
disabled in the 6.1 LTS kernel last year, where (I think) the splice()
syscall still succeeds but it just copies the data.

For (2) and (3), you can find examples of disabling asynchronous crypto
API stuff at
https://lore.kernel.org/linux-fscrypt/20250704070322.20692-1-ebiggers@kernel.org/
and
https://lore.kernel.org/linux-fscrypt/20250708181313.66961-1-ebiggers@kernel.org/.
Note that to request a synchronous algorithm you have to pass
CRYPTO_ALG_ASYNC (yes, really).

I think there are a few test scripts for AF_ALG in libkcapi. Besides
that AF_ALG is barely tested. So you're in good company.

- Eric
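[Editorial sketch] The remark above that one requests a synchronous algorithm by passing CRYPTO_ALG_ASYNC makes more sense once the lookup is viewed as a (type, mask) comparison: the mask selects which flag bits are checked, and the type gives the values those bits must have. Passing CRYPTO_ALG_ASYNC as the mask with a zero type therefore means "the ASYNC bit must be clear". The code below is a userspace model written under that assumption. `model_alg_matches` and `MODEL_CRYPTO_ALG_ASYNC` are hypothetical names, and the constant is believed to mirror the kernel's definition but is reproduced here only so the model compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's CRYPTO_ALG_ASYNC flag bit. The value is an
 * assumption of this sketch; check include/linux/crypto.h before relying
 * on it.
 */
#define MODEL_CRYPTO_ALG_ASYNC 0x00000080u

/*
 * Model of the lookup rule: an implementation whose flags are alg_flags
 * satisfies a request (type, mask) when every bit selected by mask has
 * the same value in alg_flags and in type.
 */
static bool model_alg_matches(uint32_t alg_flags, uint32_t type, uint32_t mask)
{
    return ((alg_flags ^ type) & mask) == 0;
}
```

Under this model, a request with type 0 and mask `MODEL_CRYPTO_ALG_ASYNC` rejects an implementation that sets the async bit and accepts one that does not, while a mask of 0 accepts both.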
* Re: AF_ALG hardening
  2026-05-02 19:16 ` Eric Biggers
@ 2026-05-04 19:01   ` Simon Richter
  2026-05-04 19:54     ` Eric Biggers
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Richter @ 2026-05-04 19:01 UTC
To: Eric Biggers, Demi Marie Obenour
Cc: Jan Schaumann, iwd, Linux kernel mailing list, linux-crypto, Herbert Xu

Hi,

On 5/3/26 04:16, Eric Biggers wrote:
> On Sat, May 02, 2026 at 12:52:57AM -0400, Demi Marie Obenour wrote:

>> The simplest changes I can see are:

>> 1. Get rid of zero-copy support (splice()).
>> 2. Get rid of AIO support.
>> 3. Only allow software implementations.

> For (2) and (3), you can find examples of disabling asynchronous crypto

I think we need to make up our minds here.

This thread is about removing asynchronous implementations and
accelerator support from AF_ALG, so it can support legacy applications
with known-good implementations, while the other thread[1] is about
removing everything *but* accelerator support from AF_ALG -- and as
accelerators are typically asynchronous, this aspect has to stay as
well.

At least with the opposite proposals, it would be good to know which one
is official policy.

At the same time, the third thread[2] deprecates AF_ALG because of its
wonky security posture, while newer accelerators are implementing their
own userspace interfaces because AF_ALG is too limited, so we're already
replacing one CVE magnet with several independent ones, and deprecating
AF_ALG means that future drivers will add even more of those because
there is no longer a common framework to attach to.

Also, if AF_ALG is deprecated and the kernel no longer uses
ahash/acrypt/acomp internally, there is no point in accelerator cards
even registering with the crypto subsystem. Should that be an explicit
policy "accelerator cards are outside the scope of the crypto subsystem,
even if they implement a cryptographic algorithm"?

   Simon

[1] https://lore.kernel.org/linux-crypto/112bf0af-1551-4d3e-ab15-e5dea3fc2435@app.fastmail.com/
[2] https://lore.kernel.org/linux-crypto/20260430011544.31823-1-ebiggers@kernel.org/
* Re: AF_ALG hardening 2026-05-04 19:01 ` Simon Richter @ 2026-05-04 19:54 ` Eric Biggers 0 siblings, 0 replies; 10+ messages in thread From: Eric Biggers @ 2026-05-04 19:54 UTC (permalink / raw) To: Simon Richter Cc: Demi Marie Obenour, Jan Schaumann, iwd, Linux kernel mailing list, linux-crypto, Herbert Xu On Tue, May 05, 2026 at 04:01:47AM +0900, Simon Richter wrote: > Hi, > > On 5/3/26 04:16, Eric Biggers wrote: > > > On Sat, May 02, 2026 at 12:52:57AM -0400, Demi Marie Obenour wrote: > > > > The simplest changes I can see are: > > > > 1. Get rid of zero-copy support (splice()). > > > 2. Get rid of AIO support. > > > 3. Only allow software implementations. > > > For (2) and (3), you can find examples of disabling asynchronous crypto > > I think we need to make up our minds here. > > This thread is about removing asynchronous implementations and accelerator > support from AF_ALG, so it can support legacy applications with known-good > implementations, while the other thread[1] is about removing everything > *but* accelerator support from AF_ALG -- and as accelerators are typically > asynchronous, this aspect has to stay as well. > Thread [1] is a patch that removes the kernel's last architecture-optimized implementation of MD5, which is a broken and deprecated algorithm anyway. So it's not just AF_ALG that's motivating that particular patch, but also a desire to focus effort on modern algorithms and keep the different Linux architectures consistent. So I think the scope of that thread is more narrow than what you're claiming. Also, it's already been established that for now AF_ALG will have to keep the software code used by a small set of userspace programs such as iwd. So no, it cannot be completely removed yet (except on systems that don't use any of these programs, where it can be already). However, that doesn't mean that we shouldn't be nudging people towards better solutions, with an eye towards future attack surface reductions. 
> At least with the opposite proposals, it would be good to know which one is > official policy. > > At the same time, the third thread[2] deprecates AF_ALG because of its wonky > security posture, while newer accelerators are implementing their own > userspace interfaces because AF_ALG is too limited, so we're already > replacing one CVE magnet with several independent ones, and deprecating > AF_ALG means that future drivers will add even more of those because there > is no longer a common framework to attach to. It's long been clear that by far the best way to accelerate symmetric crypto is to just put it in the CPU, or in-line in the storage or network controller. Indeed, that's what almost everyone does now. So I would expect the demand for this kind of interface to symmetric crypto to continue to decline, as it already has been for a long time. And as you pointed out, AF_ALG doesn't work well for it anyway, which makes AF_ALG increasingly kind of beside the point. > Also, if AF_ALG is deprecated and the kernel no longer uses > ahash/acrypt/acomp internally, there is no point in accelerator cards even > registering with the crypto subsystem. Should that be an explicit policy > "accelerator cards are outside the scope of the crypto subsystem, even if > they implement a cryptographic algorithm"? There are still some in-kernel users of the asynchronous crypto APIs, for example IPsec and dm-crypt. So I think your prediction of the demise of these APIs is a bit premature as well. But yes, at least for the symmetric crypto, kernel subsystems have been repeatedly seeing that the async support just isn't worth it. We'll see more kernel subsystems switching to sync-only. But in practice this is a gradual transition. Anyway, I don't think I'm proposing conflicting things.
We can and should document a general deprecation of AF_ALG, while also helping update userspace programs to no longer use it, while also applying various hardening measures to reduce AF_ALG's attack surface as best we can in the meantime. There are multiple independent hardening measures that could be applied, and they will be up for discussion on the individual patches that implement them. - Eric ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20260502035402.GB3872267@google.com>]
* Re: [oss-security] CVE-2026-31431: CopyFail: linux local privilege scalation
  [not found] ` <20260502035402.GB3872267@google.com>
@ 2026-05-02  6:39 ` Demi Marie Obenour
  [not found]   ` <CAM=PXV4q2i13W8Z_AZGDfdxbqWANJ=U4Sw3FTcv5mH_QUrrSfA@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Demi Marie Obenour @ 2026-05-02 6:39 UTC
To: Eric Biggers
Cc: oss-security, Jan Schaumann, iwd, Linux kernel mailing list, Linux kernel mailing list

On 5/1/26 23:54, Eric Biggers wrote:
> On Sat, May 02, 2026 at 03:35:58AM +0000, Eric Biggers wrote:
>> So the idea would be something along the lines of:
>
> And just to make sure no one gets the wrong impression: just because
> there seem to be ways in which the attack surface of AF_ALG could/should
> be reduced doesn't mean that userspace should keep using it (or even
> worse, start to use it). Fixing programs like iwd needs to proceed
> concurrently, so that eventually (some years down the line) the problem
> can finally be fully solved by removing AF_ALG from the kernel source.
>
> - Eric

Can AF_ALG be emulated using LD_PRELOAD? That would allow it to be
eliminated from the kernel much more quickly, as one would not need to
get rid of all of its existing users. It would even work for those who
need AF_ALG because of closed source binaries, who otherwise will have
no alternative other than running an old kernel in a VM.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
[parent not found: <CAM=PXV4q2i13W8Z_AZGDfdxbqWANJ=U4Sw3FTcv5mH_QUrrSfA@mail.gmail.com>]
[parent not found: <afcqxCv58YrhbtVr@definition.pseudorandom.co.uk>]
* Re: [oss-security] CVE-2026-31431: CopyFail: linux local privilege scalation [not found] ` <afcqxCv58YrhbtVr@definition.pseudorandom.co.uk> @ 2026-05-03 19:20 ` Greg Dahlman 0 siblings, 0 replies; 10+ messages in thread From: Greg Dahlman @ 2026-05-03 19:20 UTC (permalink / raw) To: oss-security, linux-crypto, Linux kernel mailing list Note: re-adding the other lists so that people have an opportunity to correct my errors. "CAP_FOO in the init namespace" doesn't matter if "CAP_FOO" is the gate in the default namespace. Namespaces are a facade pattern and not isolation, and unix abstract sockets, af_inet, vsock, af_alg, etc. do not currently use credentials at all IIRC. LD_PRELOAD, as a way to (transparently) replace this functionality without user intervention, involves putting in an interposer to directly intercept all socket() calls at a system scale in this case, when it is typically a thread scope concept. I think socket() hasn't been interposable for at least a decade (in glibc); you will weaken overall security by reintroducing the PLT or... Note that many people want to avoid adding an `/etc/ld.so.preload`, because fighting dynamic linker hijacking is not easy due to Unix-like systems having zero security boundary between the parent and child process. The bigger problem is that the embedded users are not where most of the friction is going to come from. While the motivations are similar, FIPS 140-3 validation, and downstream vendors that used distros' validations, incorporated into regulatory, compliance, and governance obligations, represent a large unidentified user base. Searching for "Kernel Crypto API" in the Module name on this site will show some of the upstream validations.
https://csrc.nist.gov/projects/cryptographic-module-validation-program/validated-modules/search In the case of non-path-backed sockets, userns provides zero protections and only adds to the attack surface. The only credential use for non-path-backed sockets currently is the restriction of ports below 1024 on af_inet. Remember that namespace support is not implicit, and all af_family calls outside of those specific families that have namespace support stay in the default namespace. If you dig through the $distro openssl security documents from the vendors at the NIST link above, you will see why people liked the contract that af_alg offered: they were depending on the kernel team's stable API and reputation, and they could simplify their compliance because it is easier to ensure no openssl installs exist at all on a system than to try and maintain compliance, governance, and regulatory obligations. There are some use cases, like firmware images on some embedded systems, where having the DMA pipe into a crypto engine avoided Von Neumann bottleneck issues, CPU usage, etc. No matter how flawed it is to use af_alg, it provided a simple zero-dependency interface that their tools already supported (socket) and reduced their lifecycle costs. The reduced performance of the hw crypto engines for smaller data sizes was acceptable as a trade-off, not as the primary driver in many cases. I should be 100% clear: namespaces _are not_ a security feature, but they can be leveraged to lower privileges and improve a security posture. But when you have interfaces like sockets (non unix like), the main advantage of network namespaces is that they allow you to constrain something that, due to historical reasons, has almost zero controls (except tcp ports < 1024). But the default is for any new, legacy, or other subsystem to only live in the default namespace. The friction is when ~4 out of the 40+ af_families are namespaced but the rest are not.
There is a very real problem with people overestimating the isolation
capabilities of namespaces in general, but paying attention to the
official documentation may help here:

https://www.kernel.org/doc/html/latest/admin-guide/namespaces/compatibility-list.html

    The same is true for the IPC namespaces being shared - two users
    from different user namespaces should not access the same IPC
    objects even having equal UIDs. But currently this is not so.

The "should not access" is a very different contract than most people
expect.

The FIPS/ISO compliance issue mostly invalidates what I hoped was an
easy fix, and putting in a kernel call interposer via ld_preload will
still add friction that is likely to block the aspirations of removing
af_alg from the kernel.

I think that there is a path to do so, and I think it would be best in
the long run. But the friction here is not just from code changes,
which are far easier to accomplish than resolving the regulatory
issues. The compliance-based user base is often far more challenging.

I do still think that both userland and kernel would benefit from some
mechanism that would make it easier for security teams, admins, and
users to run with lower privileges. IMHO thinking about enabling that
control will also be critical to the kernel team's ability to remain
effective. Different use cases will always conflict, and non-namespace
users would also benefit from ways to restrict access to af_families.

IMHO if the team thinks af_alg is unfixable, it is maybe one of the
rare cases where breaking changes are necessary. It may be more
productive to help compliance-based users migrate than to provide a
brittle shim that still invalidates all their authorizations anyway.

I am not an expert on FIPS/ISO compliance, but I do know that providing
guidance that helps users migrate would go a long way. You could, say,
have a userland process that provides a socket-like interface, with
guidance on how to wrap or create a their_socket() to migrate.
I still think that for socket af families other than af_inet and
(file-backed) unix, there needs to be a credentials mechanism. People
are building systems on top of vsock and other non-unix/if-based
systems that are just as vulnerable. Like af_alg, vsock is known to
have serious issues and was designed for a trusted environment. Without
an effective way to limit exposure from either userland or the kernel,
there is enough that is simply unexplored that it will be expensive.

On Sun, May 3, 2026 at 5:00 AM Simon McVittie <smcv@debian.org> wrote:
>
> On Sat, 02 May 2026 at 14:21:57 -0600, Greg Dahlman wrote:
> >LD_PRELOAD and capabilities
>
> These seem orthogonal, rather than being part of the same idea.
>
> LD_PRELOAD is discretionary (cooperative) so it would only be useful if
> used in a design something like this:
>
> - at the kernel level, AF_ALG just doesn't work (fails with a
>   permission-related error), at least for unprivileged processes
> - but in user-space, an opt-in LD_PRELOAD module intercepts the socket(),
>   etc. calls for AF_ALG, and emulates the behaviour of current kernels
>   by calling into a user-space crypto library
>
> It can't be a security boundary, but it can be a mitigation for the
> regressions that a new security boundary (or complete feature removal)
> would otherwise cause, similar to the way LD_PRELOADs like aoss and
> padsp mitigated the regressions for older binaries when distro kernels
> disabled OSS audio.
>
> Meanwhile capabilities are a way to let trusted, privileged processes
> have access to things that unprivileged processes do not, for example
> making AF_ALG available to a few system services that need it but not
> available to all of user-space.
>
> >You should expect any UID (even nobody) to be able to gain the
> >privileges in their bounding set
>
> The kernel can distinguish between "CAP_FOO in the init namespace" and
> "CAP_FOO in any other userns" if it wants to, and some kernel features
> are already gated by having a capability in the init namespace
> specifically. For example CAP_SYS_ADMIN in the init namespace allows
> mounting block-device-backed filesystems like ext4, but CAP_SYS_ADMIN in
> a different userns only allows a few "safe" mount operations
> (bind-mounts, overlayfs, FUSE).
>
> smcv
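
The fallback behaviour discussed in this message (the kernel refuses AF_ALG, user space does the work instead) can also be sketched inside an application, without any LD_PRELOAD interposer. The following is only a minimal illustration in Python, assuming a SHA-256 use case; the function names `sha256_via_af_alg` and `sha256_digest` are hypothetical and do not come from the thread. It relies on the AF_ALG support in Python's standard `socket` module (Linux only).

```python
import hashlib
import socket


def sha256_via_af_alg(data: bytes) -> bytes:
    """Hash data through the kernel's AF_ALG socket interface (Linux only)."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # Non-Linux platform, or a Python build without AF_ALG support.
        raise OSError("AF_ALG is not exposed by this Python build")
    with socket.socket(family, socket.SOCK_SEQPACKET, 0) as algo:
        # Select the algorithm type and name; this fails with an OSError if
        # the kernel does not allow or does not provide it.
        algo.bind(("hash", "sha256"))
        op, _ = algo.accept()
        with op:
            op.sendall(data)
            digest = op.recv(32)
    if len(digest) != 32:
        raise OSError("short digest returned by AF_ALG")
    return digest


def sha256_digest(data: bytes) -> bytes:
    """Try AF_ALG first; fall back to a user-space library on any OS error."""
    try:
        return sha256_via_af_alg(data)
    except OSError:
        # Covers EPERM, EAFNOSUPPORT, ENOENT and similar refusals.
        return hashlib.sha256(data).digest()
```

A caller only ever uses `sha256_digest()`, so the program behaves the same whether or not the kernel still exposes AF_ALG to it. This is a cooperative mitigation, not a security boundary, and it says nothing about whether the result would satisfy a FIPS validation.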
[parent not found: <cfe5a1f5-f7fe-44a5-8af9-8e4c8d68b3d7@terraraq.uk>]
* Re: [oss-security] CVE-2026-31431: CopyFail: linux local privilege scalation
       [not found] ` <cfe5a1f5-f7fe-44a5-8af9-8e4c8d68b3d7@terraraq.uk>
@ 2026-05-02 22:32   ` Demi Marie Obenour
  2026-05-03  6:30     ` Peter Gutmann
  0 siblings, 1 reply; 10+ messages in thread
From: Demi Marie Obenour @ 2026-05-02 22:32 UTC
To: oss-security, Richard Kettlewell, Eric Biggers, Linux kernel mailing list, linux-crypto

On 5/2/26 15:13, Richard Kettlewell wrote:
> On 01/05/2026 16:30, Demi Marie Obenour wrote:
>> On 4/30/26 03:19, Eric Biggers wrote:
>>> But I also hope this finally provides some more impetus for AF_ALG to be
>>> deprecated and removed. It's a massive, largely pointless attack
>>> surface which has been causing problems, including regular CVEs, ever
>>> since it was added to the kernel in 2010. And of course it's gotten
>>> even worse lately, with LLMs now being able to find the bugs.
>>>
>>> Userspace crypto libraries exist. There's no need to escalate to kernel
>>> mode just to do some math.
>>
>> The only reason I can think of to keep it is for embedded systems
>> with weak CPUs and crypto accelerators that are actually worth using.
>> However, those seem to be very rare outside of things like routers,
>> which run specialized distros like OpenWRT. Even when the accelerator
>> exists and is worth using, AF_ALG is certainly not an efficient way
>> to access it.
>
> I have that use case, although fortunately it's in a context where
> splice() is disabled. But the requirement is for access to the SoC's
> accelerator - the interface doesn't need to be via AF_ALG in particular,
> it doesn't have to offer software crypto (and it might be better if it
> didn't), and it needn't be independent of the specific hardware
> (although in the bigger picture it'd be a shame if it wasn't).
>
> ttfn/rjk

Can you provide benchmarks showing that the accelerator is faster than
the CPU on realistic workloads?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: [oss-security] CVE-2026-31431: CopyFail: linux local privilege scalation
  2026-05-02 22:32   ` Demi Marie Obenour
@ 2026-05-03  6:30     ` Peter Gutmann
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Gutmann @ 2026-05-03 6:30 UTC
To: oss-security@lists.openwall.com, Richard Kettlewell, Eric Biggers, Linux kernel mailing list, linux-crypto@vger.kernel.org

Demi Marie Obenour writes:

>Can you provide benchmarks showing that the accelerator is faster than the
>CPU on realistic workloads?

That could be tricky. The accelerator uses more hardware crypto than the
CPU on realistic workloads, would that do?

The following is from playing around on a few bits of hardware that were
to hand some years ago, so don't take it as gospel, but:

  /* Check for the presence of crypto hardware support. This is something
     of an exercise in futility because the crypto hardware is anything
     from slightly slower (large data blocks) to much, much slower (more
     standard small data blocks) than software due to the overhead of
     getting the data through the API to and from the cryptologic, the
     cryptologic startup/shutdown overhead, and in the case of
     /dev/crypto, in and out of the kernel. The only place where it does
     matter is things like Cortex M3-level SoCs, so a combination of
     lower-power CPUs, no instruction-level assist for crypto, and direct
     hardware access from the RTOS with no overhead where you just point
     the cryptologic at a block of memory and say "process this".

     However, people really want to see the fancy crypto hardware used
     even if it yields a net loss in performance so we try and enable it
     if possible unless it really is pointless, just a software emulation
     (many /dev/crypto instances) where, assuming the crypto is provided
     by OpenSSL, you can end up in a situation where OpenSSL is calling
     into a kernel interface that then provides access to another, older
     and possibly unpatched, copy of OpenSSL code that's doing the
     crypto.

     [...]

So one solution would be to get the fingers-of-one-hand applications
still using the interface off it onto user-mode software-only and then
make it kernel-only, closing the door on the entire attack surface from
user space.

Peter.
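
The per-call overhead described in this message, and the benchmark request it answers, could be explored with a small timing harness. The sketch below is an assumption-laden illustration, not a method proposed by anyone in the thread: the helper names (`time_per_call`, `open_af_alg_sha256`, `compare`) are hypothetical, SHA-256 is chosen arbitrarily, and on a machine without a hardware engine it only measures the cost of going through the AF_ALG interface versus calling a user-space library directly.

```python
import hashlib
import socket
import time


def time_per_call(fn, payload, rounds=200):
    """Average wall-clock seconds for one fn(payload) call."""
    start = time.perf_counter()
    for _ in range(rounds):
        fn(payload)
    return (time.perf_counter() - start) / rounds


def userspace_sha256(data):
    return hashlib.sha256(data).digest()


def open_af_alg_sha256():
    """Return a bound AF_ALG hash/sha256 socket, or None if unavailable."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return None
    try:
        algo = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        return None
    try:
        algo.bind(("hash", "sha256"))
    except OSError:
        algo.close()
        return None
    return algo


def compare(sizes=(64, 4096, 65536), rounds=200):
    """Return (size, userspace_seconds, af_alg_seconds_or_None) per size."""
    algo = open_af_alg_sha256()

    def kernel_sha256(data):
        # One accept/send/recv round trip per message, as a one-shot
        # caller of the interface would do.
        op, _ = algo.accept()
        with op:
            op.sendall(data)
            return op.recv(32)

    rows = []
    try:
        for size in sizes:
            payload = b"\x00" * size
            user = time_per_call(userspace_sha256, payload, rounds)
            kernel = None
            if algo is not None:
                try:
                    kernel = time_per_call(kernel_sha256, payload, rounds)
                except OSError:
                    kernel = None  # interface refused mid-run
            rows.append((size, user, kernel))
    finally:
        if algo is not None:
            algo.close()
    return rows


if __name__ == "__main__":
    for size, user, kernel in compare():
        kernel_text = "n/a" if kernel is None else f"{kernel * 1e6:.1f} us"
        print(f"{size:>6} bytes  userspace {user * 1e6:.1f} us  AF_ALG {kernel_text}")
```

Comparing the small and large block sizes gives a rough view of how much of the time is fixed per-call cost. Results depend entirely on the hardware and kernel configuration, so no expected numbers are given here.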
end of thread, other threads:[~2026-05-04 19:56 UTC | newest]
Thread overview: 10+ messages
[not found] <afJorKIje4O6dXbH@netmeister.org>
[not found] ` <d6111caa-db61-498a-92cb-ea7a0aa0a5e2@ehuk.net>
[not found] ` <87se8dgicq.fsf@gentoo.org>
[not found] ` <afL-QhLfEKqHZqka@eldamar.lan>
[not found] ` <20260430071917.GB54208@sol>
[not found] ` <177abb5d-8ba9-4bb9-8b23-9fbc868ed3cd@gmail.com>
[not found] ` <20260501180028.GA2260@sol>
[not found] ` <19837ef5-e5b6-45f4-8336-3ce07423dfb1@gmail.com>
[not found] ` <20260501201841.GA2540@quark>
[not found] ` <c13dd3c5-ddc1-431e-bc7d-2de39c551f8e@gmail.com>
[not found] ` <20260502033556.GA3872267@google.com>
2026-05-02 4:52 ` AF_ALG hardening Demi Marie Obenour
2026-05-02 8:19 ` Simon Richter
2026-05-02 20:42 ` Demi Marie Obenour
2026-05-02 19:16 ` Eric Biggers
2026-05-04 19:01 ` Simon Richter
2026-05-04 19:54 ` Eric Biggers
[not found] ` <20260502035402.GB3872267@google.com>
2026-05-02 6:39 ` [oss-security] CVE-2026-31431: CopyFail: linux local privilege scalation Demi Marie Obenour
[not found] ` <CAM=PXV4q2i13W8Z_AZGDfdxbqWANJ=U4Sw3FTcv5mH_QUrrSfA@mail.gmail.com>
[not found] ` <afcqxCv58YrhbtVr@definition.pseudorandom.co.uk>
2026-05-03 19:20 ` Greg Dahlman
[not found] ` <cfe5a1f5-f7fe-44a5-8af9-8e4c8d68b3d7@terraraq.uk>
2026-05-02 22:32 ` Demi Marie Obenour
2026-05-03 6:30 ` Peter Gutmann