* bpf kernel code leaks internal error codes to userspace @ 2024-05-29 10:08 Lennart Poettering 2024-05-30 12:18 ` Jiri Olsa 0 siblings, 1 reply; 6+ messages in thread From: Lennart Poettering @ 2024-05-29 10:08 UTC (permalink / raw) To: bpf Hi! It seems that the bpf code in the kernel sometimes leaks kernel-internal error codes, i.e. those from include/linux/errno.h into userspace (as opposed to those from include/uapi/asm-generic/errno.h which are public userspace facing API). According to the comments from that internal header file: "These should never be seen by user programs." Specifically, this is about ENOTSUPP, which userspace simply cannot handle, there's no error 524 defined in glibc or anywhere else. We ran into this in systemd recently: https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761 (a google search reveals others were hit by this too) We commited a work-around for this for now: https://github.com/systemd/systemd/pull/33067 But it really sucks to work around this in userspace, this is a kernel internal definition after all, conflicting with userspace (where ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable. ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much equally expressive, and something userspace can actually handle. Various kernel subsystems have been fixed over the years in similar situations. For example: https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/ or https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/ or https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/ I think BPF should really fix that, too. Or if you insist on leaking internals to userspace then please move the definition to the public "uapi" headers, but yikes you are in for some pain then given the conflicting userspace definitions. Lennart ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bpf kernel code leaks internal error codes to userspace 2024-05-29 10:08 bpf kernel code leaks internal error codes to userspace Lennart Poettering @ 2024-05-30 12:18 ` Jiri Olsa 2024-05-30 14:17 ` Lennart Poettering 0 siblings, 1 reply; 6+ messages in thread From: Jiri Olsa @ 2024-05-30 12:18 UTC (permalink / raw) To: Lennart Poettering; +Cc: bpf On Wed, May 29, 2024 at 12:08:34PM +0200, Lennart Poettering wrote: > Hi! > > It seems that the bpf code in the kernel sometimes leaks > kernel-internal error codes, i.e. those from include/linux/errno.h > into userspace (as opposed to those from > include/uapi/asm-generic/errno.h which are public userspace facing > API). > > According to the comments from that internal header file: "These > should never be seen by user programs." > > Specifically, this is about ENOTSUPP, which userspace simply cannot > handle, there's no error 524 defined in glibc or anywhere else. > > We ran into this in systemd recently: > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761 > > (a google search reveals others were hit by this too) > > We commited a work-around for this for now: > > https://github.com/systemd/systemd/pull/33067 > > But it really sucks to work around this in userspace, this is a kernel > internal definition after all, conflicting with userspace (where > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable. > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much > equally expressive, and something userspace can actually handle. > > Various kernel subsystems have been fixed over the years in similar > situations. For example: > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/ > > or > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/ > > or > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/ > > I think BPF should really fix that, too. hm, I don't think we can change that, user space already depends on those values and we'd break it with new value it's unfortunate, but I don't think we can do much about that, other than enforcing EOPNOTSUPP for new code jirka > > Or if you insist on leaking internals to userspace then please move > the definition to the public "uapi" headers, but yikes you are in for > some pain then given the conflicting userspace definitions. > > Lennart > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bpf kernel code leaks internal error codes to userspace 2024-05-30 12:18 ` Jiri Olsa @ 2024-05-30 14:17 ` Lennart Poettering 2024-05-30 17:09 ` Linus Torvalds 2024-05-31 7:16 ` Jiri Olsa 0 siblings, 2 replies; 6+ messages in thread From: Lennart Poettering @ 2024-05-30 14:17 UTC (permalink / raw) To: Jiri Olsa; +Cc: bpf, torvalds On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@gmail.com) wrote: > > It seems that the bpf code in the kernel sometimes leaks > > kernel-internal error codes, i.e. those from include/linux/errno.h > > into userspace (as opposed to those from > > include/uapi/asm-generic/errno.h which are public userspace facing > > API). > > > > According to the comments from that internal header file: "These > > should never be seen by user programs." > > > > Specifically, this is about ENOTSUPP, which userspace simply cannot > > handle, there's no error 524 defined in glibc or anywhere else. > > > > We ran into this in systemd recently: > > > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761 > > > > (a google search reveals others were hit by this too) > > > > We commited a work-around for this for now: > > > > https://github.com/systemd/systemd/pull/33067 > > > > But it really sucks to work around this in userspace, this is a kernel > > internal definition after all, conflicting with userspace (where > > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable. > > > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much > > equally expressive, and something userspace can actually handle. > > > > Various kernel subsystems have been fixed over the years in similar > > situations. For example: > > > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/ > > > > or > > > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/ > > > > or > > > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/ > > > > I think BPF should really fix that, too. > > hm, I don't think we can change that, user space already depends > on those values and we'd break it with new value Are you sure about that? To be able to handle this situation that userspace program whose existance you are indicating would have had to go the extra mile to literally handle error code 524 that is not known to userspace otherwise and handle it. If somebody goes the extra mile to do that, what makes you think that they didn't just handle it as equivalent to regular EOPNOSTUPP? In systemd at least that's what we are doing. Also: if various other subsystems (I linked examples from wireless, media, mtd above) just fixed this, why not bpf, why is it special in this regard? AFAICS the man pages of most syscalls just list errnos that *can* be returned, but usually doesn't list the precise conditions or makes guarantees about it. This specific error is not listed at all on any man page for the bpf() syscall, hence are you really sure this is actually as set in stone as you think it is? I mean, bpf() is still a bit of bleeding edge tech, and people playing around with this probably tend to have quite new toolchains even, not old, hard to fix code? > it's unfortunate, but I don't think we can do much about that, > other than enforcing EOPNOTSUPP for new code I took the liberty to CC Linus on this: Linus, what's the policy if some subsystem by mistake is leaking internal kernel error codes (such as ENOTSUP) to userspace? Leave it be as is (i.e. error number not defined in include/uapi/, not documented, but still returned), or fix it to the closest matching public error code (which is probably EOPNOTSUPP in this case) accepting a – mild, I would say – compat break? [BTW, wouldn't it make sense to add a BUG_ON or so on syscalls that return an error number > 133 or so? This kind of issue is quite a recurring theme, see the patches above, and such a BUG_ON would probably catch 95% of all cases like this.] Lennart ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bpf kernel code leaks internal error codes to userspace 2024-05-30 14:17 ` Lennart Poettering @ 2024-05-30 17:09 ` Linus Torvalds 2024-05-31 7:16 ` Jiri Olsa 1 sibling, 0 replies; 6+ messages in thread From: Linus Torvalds @ 2024-05-30 17:09 UTC (permalink / raw) To: Lennart Poettering; +Cc: Jiri Olsa, bpf On Thu, 30 May 2024 at 07:17, Lennart Poettering <mzxreary@0pointer.net> wrote: > > Linus, what's the policy if some subsystem by mistake is leaking > internal kernel error codes (such as ENOTSUP) to userspace? Try to fix it, and stop leaking silly garbage in the hope that nobody cared. And in most cases, people don't actually care, because they don't know to even test for non-standard errno values. In fact it is _very_ rare that some actual program - outside of some test suite - cares about the errno values outside of the really special ones So things like "EAGAIN" and "EINTR" have real semantic meaning and people test for them and do something explicitly different. Sometimes a few others too, but those are the two obvious ones. The rest are *almost* always entirely interchangeable and only result in different error messages, not different _behavior_. But then occasionally if it turns out that user space then breaks because some random case actually tested for a _particular_ error, and we have to put the silly garbage back. Congrats, you have now invented a new ABI that you will support until the end of time (or until user space stops caring, whichever comes first). That said, while I think ENOTSUPP should not be encouraged because it's so non-standard, I think the ship on ENOTSUPP has long since sailed. Yes, yes, the comment may say "These should never be seen by user programs." but that comment is historically about ERESTARTSYS and friends, that are supposed to be turned into EINTR (and a restart etc depending on process flags and signal details). And nobody reads comments anyway, and perhaps more importantly, they by definition have no semantic meaning, so... We have a ton of code that returns ENOTSUPP - not in the least limited to bpf - and while we have a checkpatch rule for it, it's more of a "don't add new ones". Even that one is likely not really relevant. And yes, some of them may get translated, but the very first one I randomly looked at was a proc_handler thing for some driver, and read -> ...->read_iter() -> proc_sys_read() -> proc_sys_call_handler() -> ...->proc_handler() most definitely does not. ENOTSUPP in particular is very understandable, because the "standard" error names are garbage that nobody would ever use. EOPNOTSUPP? Never heard of it. The standard errno for that is EINVAL, and people actively avoid it because it's _so_ common, so everybody goes "I want to make it clear that this is somethign else" and then they pick some other random name that sounds likely. And ENOTSUPP is right there and sounds very likely indeed. If we really cared, we should have made them have a very different naming syntax, I'm afraid. Something explicitly inconvenient to make people not want to use it. Like __INTERNAL_ONLY_ENOTSUPP. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bpf kernel code leaks internal error codes to userspace 2024-05-30 14:17 ` Lennart Poettering 2024-05-30 17:09 ` Linus Torvalds @ 2024-05-31 7:16 ` Jiri Olsa 2024-05-31 17:10 ` Alexei Starovoitov 1 sibling, 1 reply; 6+ messages in thread From: Jiri Olsa @ 2024-05-31 7:16 UTC (permalink / raw) To: Lennart Poettering; +Cc: Jiri Olsa, bpf, torvalds On Thu, May 30, 2024 at 04:17:03PM +0200, Lennart Poettering wrote: > On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@gmail.com) wrote: > > > > It seems that the bpf code in the kernel sometimes leaks > > > kernel-internal error codes, i.e. those from include/linux/errno.h > > > into userspace (as opposed to those from > > > include/uapi/asm-generic/errno.h which are public userspace facing > > > API). > > > > > > According to the comments from that internal header file: "These > > > should never be seen by user programs." > > > > > > Specifically, this is about ENOTSUPP, which userspace simply cannot > > > handle, there's no error 524 defined in glibc or anywhere else. > > > > > > We ran into this in systemd recently: > > > > > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761 > > > > > > (a google search reveals others were hit by this too) > > > > > > We commited a work-around for this for now: > > > > > > https://github.com/systemd/systemd/pull/33067 > > > > > > But it really sucks to work around this in userspace, this is a kernel > > > internal definition after all, conflicting with userspace (where > > > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable. > > > > > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much > > > equally expressive, and something userspace can actually handle. > > > > > > Various kernel subsystems have been fixed over the years in similar > > > situations. For example: > > > > > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/ > > > > > > or > > > > > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/ > > > > > > or > > > > > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/ > > > > > > I think BPF should really fix that, too. > > > > hm, I don't think we can change that, user space already depends > > on those values and we'd break it with new value > > Are you sure about that? To be able to handle this situation that > userspace program whose existance you are indicating would have had to > go the extra mile to literally handle error code 524 that is not known > to userspace otherwise and handle it. If somebody goes the extra mile > to do that, what makes you think that they didn't just handle it as > equivalent to regular EOPNOSTUPP? In systemd at least that's what we > are doing. cilium/ebpf [1] library is checking return values just for ENOTSUPP(524) on multiple places, libbpf has one place to check on that value for program type detection AFAICS jirka [1] https://github.com/cilium/ebpf/ > > Also: if various other subsystems (I linked examples from wireless, > media, mtd above) just fixed this, why not bpf, why is it special in > this regard? > > AFAICS the man pages of most syscalls just list errnos that *can* be > returned, but usually doesn't list the precise conditions or makes > guarantees about it. This specific error is not listed at all on any > man page for the bpf() syscall, hence are you really sure this is > actually as set in stone as you think it is? > > I mean, bpf() is still a bit of bleeding edge tech, and people playing > around with this probably tend to have quite new toolchains even, not > old, hard to fix code? > > > it's unfortunate, but I don't think we can do much about that, > > other than enforcing EOPNOTSUPP for new code > > I took the liberty to CC Linus on this: > > Linus, what's the policy if some subsystem by mistake is leaking > internal kernel error codes (such as ENOTSUP) to userspace? Leave it > be as is (i.e. error number not defined in include/uapi/, not > documented, but still returned), or fix it to the closest matching > public error code (which is probably EOPNOTSUPP in this case) > accepting a – mild, I would say – compat break? > > [BTW, wouldn't it make sense to add a BUG_ON or so on syscalls that > return an error number > 133 or so? This kind of issue is quite a > recurring theme, see the patches above, and such a BUG_ON would > probably catch 95% of all cases like this.] > > Lennart ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bpf kernel code leaks internal error codes to userspace 2024-05-31 7:16 ` Jiri Olsa @ 2024-05-31 17:10 ` Alexei Starovoitov 0 siblings, 0 replies; 6+ messages in thread From: Alexei Starovoitov @ 2024-05-31 17:10 UTC (permalink / raw) To: Jiri Olsa; +Cc: Lennart Poettering, bpf, Linus Torvalds On Fri, May 31, 2024 at 12:16 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > On Thu, May 30, 2024 at 04:17:03PM +0200, Lennart Poettering wrote: > > On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@gmail.com) wrote: > > > > > > It seems that the bpf code in the kernel sometimes leaks > > > > kernel-internal error codes, i.e. those from include/linux/errno.h > > > > into userspace (as opposed to those from > > > > include/uapi/asm-generic/errno.h which are public userspace facing > > > > API). > > > > > > > > According to the comments from that internal header file: "These > > > > should never be seen by user programs." > > > > > > > > Specifically, this is about ENOTSUPP, which userspace simply cannot > > > > handle, there's no error 524 defined in glibc or anywhere else. > > > > > > > > We ran into this in systemd recently: > > > > > > > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761 > > > > > > > > (a google search reveals others were hit by this too) > > > > > > > > We commited a work-around for this for now: > > > > > > > > https://github.com/systemd/systemd/pull/33067 > > > > > > > > But it really sucks to work around this in userspace, this is a kernel > > > > internal definition after all, conflicting with userspace (where > > > > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable. > > > > > > > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much > > > > equally expressive, and something userspace can actually handle. > > > > > > > > Various kernel subsystems have been fixed over the years in similar > > > > situations. For example: > > > > > > > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/ > > > > > > > > or > > > > > > > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/ > > > > > > > > or > > > > > > > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/ > > > > > > > > I think BPF should really fix that, too. > > > > > > hm, I don't think we can change that, user space already depends > > > on those values and we'd break it with new value > > > > Are you sure about that? To be able to handle this situation that > > userspace program whose existance you are indicating would have had to > > go the extra mile to literally handle error code 524 that is not known > > to userspace otherwise and handle it. If somebody goes the extra mile > > to do that, what makes you think that they didn't just handle it as > > equivalent to regular EOPNOSTUPP? In systemd at least that's what we > > are doing. > > cilium/ebpf [1] library is checking return values just for ENOTSUPP(524) > on multiple places, libbpf has one place to check on that value for > program type detection AFAICS > > jirka > > > [1] https://github.com/cilium/ebpf/ fwiw both libraries can be changed to check for both error codes ENOTSUPP and EOPNOTSUPP, but, as noted earlier, the problem is not limited to bpf. cilium library mentions commit cb9a19fe4aa5 ("uprobes: Introduce prepare_uprobe()") back from 2012. Where installing uprobe would return ENOTSUPP to user space. On bpf side we've switched to EOPNOTSUPP for any new code long ago, but didn't adjust old code, because the damage was done before bpf existed. $ git grep EOPNOTSUPP kernel/bpf|wc -l 56 $ git grep ENOTSUPP kernel/bpf|wc -l 61 ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-05-31 17:11 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-29 10:08 bpf kernel code leaks internal error codes to userspace Lennart Poettering 2024-05-30 12:18 ` Jiri Olsa 2024-05-30 14:17 ` Lennart Poettering 2024-05-30 17:09 ` Linus Torvalds 2024-05-31 7:16 ` Jiri Olsa 2024-05-31 17:10 ` Alexei Starovoitov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox