BPF List
 help / color / mirror / Atom feed
* bpf kernel code leaks internal error codes to userspace
@ 2024-05-29 10:08 Lennart Poettering
  2024-05-30 12:18 ` Jiri Olsa
  0 siblings, 1 reply; 6+ messages in thread
From: Lennart Poettering @ 2024-05-29 10:08 UTC (permalink / raw)
  To: bpf

Hi!

It seems that the bpf code in the kernel sometimes leaks
kernel-internal error codes, i.e. those from include/linux/errno.h
into userspace (as opposed to those from
include/uapi/asm-generic/errno.h which are public userspace facing
API).

According to the comments from that internal header file: "These
should never be seen by user programs."

Specifically, this is about ENOTSUPP, which userspace simply cannot
handle, there's no error 524 defined in glibc or anywhere else.

We ran into this in systemd recently:

https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761

(a google search reveals others were hit by this too)

We commited a work-around for this for now:

https://github.com/systemd/systemd/pull/33067

But it really sucks to work around this in userspace, this is a kernel
internal definition after all, conflicting with userspace (where
ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable.

ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much
equally expressive, and something userspace can actually handle.

Various kernel subsystems have been fixed over the years in similar
situations. For example:

https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/

or

https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/

or

https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/

I think BPF should really fix that, too.

Or if you insist on leaking internals to userspace then please move
the definition to the public "uapi" headers, but yikes you are in for
some pain then given the conflicting userspace definitions.

Lennart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bpf kernel code leaks internal error codes to userspace
  2024-05-29 10:08 bpf kernel code leaks internal error codes to userspace Lennart Poettering
@ 2024-05-30 12:18 ` Jiri Olsa
  2024-05-30 14:17   ` Lennart Poettering
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Olsa @ 2024-05-30 12:18 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: bpf

On Wed, May 29, 2024 at 12:08:34PM +0200, Lennart Poettering wrote:
> Hi!
> 
> It seems that the bpf code in the kernel sometimes leaks
> kernel-internal error codes, i.e. those from include/linux/errno.h
> into userspace (as opposed to those from
> include/uapi/asm-generic/errno.h which are public userspace facing
> API).
> 
> According to the comments from that internal header file: "These
> should never be seen by user programs."
> 
> Specifically, this is about ENOTSUPP, which userspace simply cannot
> handle, there's no error 524 defined in glibc or anywhere else.
> 
> We ran into this in systemd recently:
> 
> https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761
> 
> (a google search reveals others were hit by this too)
> 
> We commited a work-around for this for now:
> 
> https://github.com/systemd/systemd/pull/33067
> 
> But it really sucks to work around this in userspace, this is a kernel
> internal definition after all, conflicting with userspace (where
> ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable.
> 
> ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much
> equally expressive, and something userspace can actually handle.
> 
> Various kernel subsystems have been fixed over the years in similar
> situations. For example:
> 
> https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/
> 
> or
> 
> https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/
> 
> or
> 
> https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/
> 
> I think BPF should really fix that, too.

hm, I don't think we can change that, user space already depends
on those values and we'd break it with new value

it's unfortunate, but I don't think we can do much about that,
other than enforcing EOPNOTSUPP for new code

jirka

> 
> Or if you insist on leaking internals to userspace then please move
> the definition to the public "uapi" headers, but yikes you are in for
> some pain then given the conflicting userspace definitions.
> 
> Lennart
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bpf kernel code leaks internal error codes to userspace
  2024-05-30 12:18 ` Jiri Olsa
@ 2024-05-30 14:17   ` Lennart Poettering
  2024-05-30 17:09     ` Linus Torvalds
  2024-05-31  7:16     ` Jiri Olsa
  0 siblings, 2 replies; 6+ messages in thread
From: Lennart Poettering @ 2024-05-30 14:17 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: bpf, torvalds

On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@gmail.com) wrote:

> > It seems that the bpf code in the kernel sometimes leaks
> > kernel-internal error codes, i.e. those from include/linux/errno.h
> > into userspace (as opposed to those from
> > include/uapi/asm-generic/errno.h which are public userspace facing
> > API).
> >
> > According to the comments from that internal header file: "These
> > should never be seen by user programs."
> >
> > Specifically, this is about ENOTSUPP, which userspace simply cannot
> > handle, there's no error 524 defined in glibc or anywhere else.
> >
> > We ran into this in systemd recently:
> >
> > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761
> >
> > (a google search reveals others were hit by this too)
> >
> > We commited a work-around for this for now:
> >
> > https://github.com/systemd/systemd/pull/33067
> >
> > But it really sucks to work around this in userspace, this is a kernel
> > internal definition after all, conflicting with userspace (where
> > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable.
> >
> > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much
> > equally expressive, and something userspace can actually handle.
> >
> > Various kernel subsystems have been fixed over the years in similar
> > situations. For example:
> >
> > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/
> >
> > or
> >
> > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/
> >
> > or
> >
> > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/
> >
> > I think BPF should really fix that, too.
>
> hm, I don't think we can change that, user space already depends
> on those values and we'd break it with new value

Are you sure about that? To be able to handle this situation that
userspace program whose existance you are indicating would have had to
go the extra mile to literally handle error code 524 that is not known
to userspace otherwise and handle it. If somebody goes the extra mile
to do that, what makes you think that they didn't just handle it as
equivalent to regular EOPNOSTUPP? In systemd at least that's what we
are doing.

Also: if various other subsystems (I linked examples from wireless,
media, mtd above) just fixed this, why not bpf, why is it special in
this regard?

AFAICS the man pages of most syscalls just list errnos that *can* be
returned, but usually doesn't list the precise conditions or makes
guarantees about it. This specific error is not listed at all on any
man page for the bpf() syscall, hence are you really sure this is
actually as set in stone as you think it is?

I mean, bpf() is still a bit of bleeding edge tech, and people playing
around with this probably tend to have quite new toolchains even, not
old, hard to fix code?

> it's unfortunate, but I don't think we can do much about that,
> other than enforcing EOPNOTSUPP for new code

I took the liberty to CC Linus on this:

Linus, what's the policy if some subsystem by mistake is leaking
internal kernel error codes (such as ENOTSUP) to userspace? Leave it
be as is (i.e. error number not defined in include/uapi/, not
documented, but still returned), or fix it to the closest matching
public error code (which is probably EOPNOTSUPP in this case)
accepting a – mild, I would say – compat break?

[BTW, wouldn't it make sense to add a BUG_ON or so on syscalls that
return an error number > 133 or so? This kind of issue is quite a
recurring theme, see the patches above, and such a BUG_ON would
probably catch 95% of all cases like this.]

Lennart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bpf kernel code leaks internal error codes to userspace
  2024-05-30 14:17   ` Lennart Poettering
@ 2024-05-30 17:09     ` Linus Torvalds
  2024-05-31  7:16     ` Jiri Olsa
  1 sibling, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2024-05-30 17:09 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: Jiri Olsa, bpf

On Thu, 30 May 2024 at 07:17, Lennart Poettering <mzxreary@0pointer.net> wrote:
>
> Linus, what's the policy if some subsystem by mistake is leaking
> internal kernel error codes (such as ENOTSUP) to userspace?

Try to fix it, and stop leaking silly garbage in the hope that nobody cared.

And in most cases, people don't actually care, because they don't know
to even test for non-standard errno values.

In fact it is _very_ rare that some actual program - outside of some
test suite - cares about the errno values outside of the really
special ones

So things like "EAGAIN" and "EINTR" have real semantic meaning and
people test for them and do something explicitly different. Sometimes
a few others too, but those are the two obvious ones.

The rest are *almost* always entirely interchangeable and only result
in different error messages, not different _behavior_.

But then occasionally if it turns out that user space then breaks
because some random case actually tested for a _particular_ error, and
we have to put the silly garbage back.

Congrats, you have now invented a new ABI that you will support until
the end of time (or until user space stops caring, whichever comes
first).

That said, while I think ENOTSUPP should not be encouraged because
it's so non-standard, I think the ship on ENOTSUPP has long since
sailed.

Yes, yes, the comment may say "These should never be seen by user
programs." but that comment is historically about ERESTARTSYS and
friends, that are supposed to be turned into EINTR (and a restart etc
depending on process flags and signal details).

And nobody reads comments anyway, and perhaps more importantly, they
by definition have no semantic meaning, so...

We have a ton of code that returns ENOTSUPP - not in the least limited
to bpf - and while we have a checkpatch rule for it, it's more of a
"don't add new ones".  Even that one is likely not really relevant.

And yes, some of them may get translated, but the very first one I
randomly looked at was a proc_handler thing for some driver, and read
-> ...->read_iter() -> proc_sys_read() -> proc_sys_call_handler() ->
...->proc_handler() most definitely does not.

ENOTSUPP in particular is very understandable, because the "standard"
error names are garbage that nobody would ever use. EOPNOTSUPP? Never
heard of it.

The standard errno for that is EINVAL, and people actively avoid it
because it's _so_ common, so everybody goes "I want to make it clear
that this is somethign else" and then they pick some other random name
that sounds likely. And ENOTSUPP is right there and sounds very likely
indeed.

If we really cared, we should have made them have a very different
naming syntax, I'm afraid. Something explicitly inconvenient to make
people not want to use it. Like __INTERNAL_ONLY_ENOTSUPP.

                   Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bpf kernel code leaks internal error codes to userspace
  2024-05-30 14:17   ` Lennart Poettering
  2024-05-30 17:09     ` Linus Torvalds
@ 2024-05-31  7:16     ` Jiri Olsa
  2024-05-31 17:10       ` Alexei Starovoitov
  1 sibling, 1 reply; 6+ messages in thread
From: Jiri Olsa @ 2024-05-31  7:16 UTC (permalink / raw)
  To: Lennart Poettering; +Cc: Jiri Olsa, bpf, torvalds

On Thu, May 30, 2024 at 04:17:03PM +0200, Lennart Poettering wrote:
> On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@gmail.com) wrote:
> 
> > > It seems that the bpf code in the kernel sometimes leaks
> > > kernel-internal error codes, i.e. those from include/linux/errno.h
> > > into userspace (as opposed to those from
> > > include/uapi/asm-generic/errno.h which are public userspace facing
> > > API).
> > >
> > > According to the comments from that internal header file: "These
> > > should never be seen by user programs."
> > >
> > > Specifically, this is about ENOTSUPP, which userspace simply cannot
> > > handle, there's no error 524 defined in glibc or anywhere else.
> > >
> > > We ran into this in systemd recently:
> > >
> > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761
> > >
> > > (a google search reveals others were hit by this too)
> > >
> > > We commited a work-around for this for now:
> > >
> > > https://github.com/systemd/systemd/pull/33067
> > >
> > > But it really sucks to work around this in userspace, this is a kernel
> > > internal definition after all, conflicting with userspace (where
> > > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable.
> > >
> > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much
> > > equally expressive, and something userspace can actually handle.
> > >
> > > Various kernel subsystems have been fixed over the years in similar
> > > situations. For example:
> > >
> > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/
> > >
> > > or
> > >
> > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/
> > >
> > > or
> > >
> > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/
> > >
> > > I think BPF should really fix that, too.
> >
> > hm, I don't think we can change that, user space already depends
> > on those values and we'd break it with new value
> 
> Are you sure about that? To be able to handle this situation that
> userspace program whose existance you are indicating would have had to
> go the extra mile to literally handle error code 524 that is not known
> to userspace otherwise and handle it. If somebody goes the extra mile
> to do that, what makes you think that they didn't just handle it as
> equivalent to regular EOPNOSTUPP? In systemd at least that's what we
> are doing.

cilium/ebpf [1] library is checking return values just for ENOTSUPP(524)
on multiple places, libbpf has one place to check on that value for
program type detection AFAICS

jirka


[1] https://github.com/cilium/ebpf/

> 
> Also: if various other subsystems (I linked examples from wireless,
> media, mtd above) just fixed this, why not bpf, why is it special in
> this regard?
> 
> AFAICS the man pages of most syscalls just list errnos that *can* be
> returned, but usually doesn't list the precise conditions or makes
> guarantees about it. This specific error is not listed at all on any
> man page for the bpf() syscall, hence are you really sure this is
> actually as set in stone as you think it is?
> 
> I mean, bpf() is still a bit of bleeding edge tech, and people playing
> around with this probably tend to have quite new toolchains even, not
> old, hard to fix code?
> 
> > it's unfortunate, but I don't think we can do much about that,
> > other than enforcing EOPNOTSUPP for new code
> 
> I took the liberty to CC Linus on this:
> 
> Linus, what's the policy if some subsystem by mistake is leaking
> internal kernel error codes (such as ENOTSUP) to userspace? Leave it
> be as is (i.e. error number not defined in include/uapi/, not
> documented, but still returned), or fix it to the closest matching
> public error code (which is probably EOPNOTSUPP in this case)
> accepting a – mild, I would say – compat break?
> 
> [BTW, wouldn't it make sense to add a BUG_ON or so on syscalls that
> return an error number > 133 or so? This kind of issue is quite a
> recurring theme, see the patches above, and such a BUG_ON would
> probably catch 95% of all cases like this.]
> 
> Lennart

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bpf kernel code leaks internal error codes to userspace
  2024-05-31  7:16     ` Jiri Olsa
@ 2024-05-31 17:10       ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2024-05-31 17:10 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Lennart Poettering, bpf, Linus Torvalds

On Fri, May 31, 2024 at 12:16 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, May 30, 2024 at 04:17:03PM +0200, Lennart Poettering wrote:
> > On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@gmail.com) wrote:
> >
> > > > It seems that the bpf code in the kernel sometimes leaks
> > > > kernel-internal error codes, i.e. those from include/linux/errno.h
> > > > into userspace (as opposed to those from
> > > > include/uapi/asm-generic/errno.h which are public userspace facing
> > > > API).
> > > >
> > > > According to the comments from that internal header file: "These
> > > > should never be seen by user programs."
> > > >
> > > > Specifically, this is about ENOTSUPP, which userspace simply cannot
> > > > handle, there's no error 524 defined in glibc or anywhere else.
> > > >
> > > > We ran into this in systemd recently:
> > > >
> > > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761
> > > >
> > > > (a google search reveals others were hit by this too)
> > > >
> > > > We commited a work-around for this for now:
> > > >
> > > > https://github.com/systemd/systemd/pull/33067
> > > >
> > > > But it really sucks to work around this in userspace, this is a kernel
> > > > internal definition after all, conflicting with userspace (where
> > > > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable.
> > > >
> > > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much
> > > > equally expressive, and something userspace can actually handle.
> > > >
> > > > Various kernel subsystems have been fixed over the years in similar
> > > > situations. For example:
> > > >
> > > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/
> > > >
> > > > or
> > > >
> > > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@mess.org/
> > > >
> > > > or
> > > >
> > > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@canonical.com/
> > > >
> > > > I think BPF should really fix that, too.
> > >
> > > hm, I don't think we can change that, user space already depends
> > > on those values and we'd break it with new value
> >
> > Are you sure about that? To be able to handle this situation that
> > userspace program whose existance you are indicating would have had to
> > go the extra mile to literally handle error code 524 that is not known
> > to userspace otherwise and handle it. If somebody goes the extra mile
> > to do that, what makes you think that they didn't just handle it as
> > equivalent to regular EOPNOSTUPP? In systemd at least that's what we
> > are doing.
>
> cilium/ebpf [1] library is checking return values just for ENOTSUPP(524)
> on multiple places, libbpf has one place to check on that value for
> program type detection AFAICS
>
> jirka
>
>
> [1] https://github.com/cilium/ebpf/

fwiw both libraries can be changed to check for both error codes
ENOTSUPP and EOPNOTSUPP,
but, as noted earlier, the problem is not limited to bpf.
cilium library mentions
commit cb9a19fe4aa5 ("uprobes: Introduce prepare_uprobe()")
back from 2012.
Where installing uprobe would return ENOTSUPP to user space.

On bpf side we've switched to EOPNOTSUPP for any new code long ago,
but didn't adjust old code, because the damage was done before bpf existed.

$ git grep EOPNOTSUPP kernel/bpf|wc -l
56
$ git grep ENOTSUPP kernel/bpf|wc -l
61

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-31 17:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-29 10:08 bpf kernel code leaks internal error codes to userspace Lennart Poettering
2024-05-30 12:18 ` Jiri Olsa
2024-05-30 14:17   ` Lennart Poettering
2024-05-30 17:09     ` Linus Torvalds
2024-05-31  7:16     ` Jiri Olsa
2024-05-31 17:10       ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox