public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Viktor Malik <vmalik@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
	Jordan Rome <linux@jordanrome.com>,
	Qais Yousef <qyousef@layalina.io>, Hou Tao <houtao1@huawei.com>
Subject: Re: [PATCH bpf-next v3 0/3] Separate tests that need error injection
Date: Mon, 16 Feb 2026 08:46:28 +0100	[thread overview]
Message-ID: <fa67d3db-273f-4e4e-90a2-ba805f484bd1@redhat.com> (raw)
In-Reply-To: <CAADnVQLT7cGgExtaqusyk+XwrD=-x15kL0veJB5YPewjPo=PUg@mail.gmail.com>

On 2/14/26 00:13, Alexei Starovoitov wrote:
> On Fri, Feb 13, 2026 at 1:22 AM Viktor Malik <vmalik@redhat.com> wrote:
>>
>> On 2/12/26 17:35, Alexei Starovoitov wrote:
>>> On Wed, Feb 11, 2026 at 11:22 PM Viktor Malik <vmalik@redhat.com> wrote:
>>>>
>>>> On 2/12/26 02:42, Alexei Starovoitov wrote:
>>>>> On Wed, Feb 11, 2026 at 5:55 AM Viktor Malik <vmalik@redhat.com> wrote:
>>>>>>
>>>>>> On 2/11/26 03:42, Alexei Starovoitov wrote:
>>>>>>> On Tue, Feb 10, 2026 at 7:14 AM Viktor Malik <vmalik@redhat.com> wrote:
>>>>>>>>
>>>>>>>> Some enterprise kernels (such as RHEL) do not enable error injection via
>>>>>>>> BPF (CONFIG_FUNCTION_ERROR_INJECTION and CONFIG_BPF_KPROBE_OVERRIDE).
>>>>>>>> When running test_progs on such kernels, a lot of test cases fail since
>>>>>>>> they use sleepable fentry or fmod_ret program types which require error
>>>>>>>> injection to be enabled. While it is possible to skip these via custom
>>>>>>>> DENYLIST, some test_progs are not properly split into subtests and
>>>>>>>> therefore must be entirely skipped.
>>>>>>>
>>>>>>> Something is wrong.
>>>>>>> -SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>>>>>>> -int do_probe_read(void *ctx)
>>>>>>> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
>>>>>>> +int probe_read(void *ctx)
>>>>>>>  {
>>>>>>>         char buf[8];
>>>>>>>
>>>>>>> @@ -37,19 +37,19 @@ int do_probe_read(void *ctx)
>>>>>>>         return 0;
>>>>>>>  }
>>>>>>>
>>>>>>> -SEC("fentry.s/" SYS_PREFIX "sys_nanosleep")
>>>>>>> -int do_copy_from_user(void *ctx)
>>>>>>> +SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
>>>>>>>
>>>>>>> sleepable should have nothing to do with CONFIG_FUNCTION_ERROR_INJECTION.
>>>>>>
>>>>>> I may be missing something but I think it is directly dependent, at
>>>>>> least for fentry.
>>>>>>
>>>>>> In bpf_check_attach_target (kernel/bpf/verifier.c), we have:
>>>>>>
>>>>>>     if (prog->sleepable) {
>>>>>>             ret = -EINVAL;
>>>>>>             switch (prog->type) {
>>>>>>             case BPF_PROG_TYPE_TRACING:
>>>>>>
>>>>>>                 /* fentry/fexit/fmod_ret progs can be sleepable if they are
>>>>>>                  * attached to ALLOW_ERROR_INJECTION and are not in denylist.
>>>>>>                  */
>>>>>>                 if (!check_non_sleepable_error_inject(btf_id) &&
>>>>>>                     within_error_injection_list(addr))
>>>>>
>>>>> Oh. That's an annoying side effect of disabling error inject.
>>>>> fentry progs are not injecting any errors. They just use that
>>>>> list as allowlist of good attach points.
>>>>> Let's explicitly allow all syscalls here.
>>>>
>>>> Ok, I'll post a patch.
>>>>
>>>>>
>>>>>>                     ret = 0;
>>>>>>                 /* fentry/fexit/fmod_ret progs can also be sleepable if they are
>>>>>>                  * in the fmodret id set with the KF_SLEEPABLE flag.
>>>>>>                  */
>>>>>>                 else {
>>>>>>                     u32 *flags = btf_kfunc_is_modify_return(btf, btf_id,
>>>>>>                                         prog);
>>>>>>
>>>>>>                     if (flags && (*flags & KF_SLEEPABLE))
>>>>>>                         ret = 0;
>>>>>>                 }
>>>>>>                 break;
>>>>>>             [...]
>>>>>>             }
>>>>>>             if (ret) {
>>>>>>                 module_put(mod);
>>>>>>                 bpf_log(log, "%s is not sleepable\n", tname);
>>>>>>                 return ret;
>>>>>>             }
>>>>>>     }
>>>>>>
>>>>>> If I read that correctly, fentry must be on the error injection list
>>>>>> (which is populated by functions marked with ALLOW_ERROR_INJECTION when
>>>>>> CONFIG_FUNCTION_ERROR_INJECTION=y) to be allowed as sleepable.
>>>>>>
>>>>>> The other branch for fmodret id set seems to be only relevant for
>>>>>> kernel modules which explicily register their functions into
>>>>>> BTF_KFUNC_HOOK_FMODRET set.
>>>>>>
>>>>>>> While CONFIG_BPF_KPROBE_OVERRIDE affects a single helper.
>>>>>>>
>>>>>>> Both fentry and fentry.s should work fine on sys_nanosleep.
>>>>>>> fmod_ret is also completely independent of these two flags.
>>>>>>
>>>>>> For fmod_ret, we have a very similar code:
>>>>>>
>>>>>>     } else if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
>>>>>>             [...]
>>>>>>             ret = -EINVAL;
>>>>>>             if (btf_kfunc_is_modify_return(btf, btf_id, prog) ||
>>>>>>                 !check_attach_modify_return(addr, tname))
>>>>>>                 ret = 0;
>>>>>>             if (ret) {
>>>>>>                 module_put(mod);
>>>>>>                 bpf_log(log, "%s() is not modifiable\n", tname);
>>>>>>                 return ret;
>>>>>>             }
>>>>>>     }
>>>>>>
>>>>>> with
>>>>>>
>>>>>>     static int check_attach_modify_return(unsigned long addr, const char *func_name)
>>>>>>     {
>>>>>>         if (within_error_injection_list(addr) ||
>>>>>>             !strncmp(SECURITY_PREFIX, func_name, sizeof(SECURITY_PREFIX) - 1))
>>>>>
>>>>> Same thing here.
>>>>> Let's wrap within_error_injection_list() with bpf specific helper
>>>>> that in case of !CONFIG_FUNCTION_ERROR_INJECTION
>>>>> will allow syscalls anyway.
>>>>
>>>> I would argue that fmod_ret should remain gated behind
>>>> CONFIG_ERROR_INJECTION. It's a similar concept to kprobe override - it
>>>> allows to override the execution of the original syscall and return a
>>>> custom error code.
>>>
>>> Definitely not. fmod_ret and error injection are two orthogonal concepts.
>>> fmod_ret was introduced to be independent of error injection in the
>>> first place. fmod_ret is a deliberate spot to allow bpf to change
>>> return code
>>> in production.
>>
>> I'm not sure that was the original intention of fmod_ret. Even the
>> original patch set mentions that error injection needs to be explicitly
>> enabled [1,2]. Changing that is IMHO a significant shift which needs
>> discussion.
> 
> That discussion already happened when error_injection was forked
> as a config flag and went to default N.

Then I'm a bit confused about the purpose of error_injection. The
description of the config option says:

    Add fault injections into various functions that are annotated with
    ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
    value of these functions. This is useful to test error paths of code.

    If unsure, say N

But if you say that fmod_ret and error_injection are orthogonal
concepts, then what is error_injection good for without fmod_ret?

>>
>>> While error injection is a debug feature to stress test the kernel.
>>> syscalls are in fmod_ret territory, since this is not debugging
>>> and not stressing the kernel. ptrace can change syscall returns without
>>> bpf, without fmod_ret, and without error injection.
>>
>> The thing I'm worried about here is that allowing BPF programs override
>> syscalls will esentially allow to implement a BPF-based rootkit. I'm
>> well aware that BPF is root-only (although there are ways to allow
>> unprivileged users run BPF, such as tokens) but if a potential attacker
>> gains root access even for a short time, they can install the rootkit
>> which may live unnoticed from that moment. Which is not something we
>> really want to enable in production. For ptrace, my understanding is
>> that it is significantly easier to detect if it is used.
>>
>> If you don't mind, I'd start with only allowing sleepable syscalls (when
>> error injection is disabled) for now.
> 
> I mind and I hate the reasoning that in kernels with error_injection=y
> bpf is prone to be a rootkit. This is just a security circus
> detached from reality.

Could you share a bit more details why? I'm not a security expert myself
but it seems to me that intercepting syscalls could be used to implement
a rootkit. For instance [1] does something similar, although they use
bpf_override_return in syscall kprobes. So, I'd love to understand why
this is detached from reality.

Thanks!
Viktor

[1] https://github.com/Gui774ume/ebpfkit


  reply	other threads:[~2026-02-16  7:46 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 15:13 [PATCH bpf-next v3 0/3] Separate tests that need error injection Viktor Malik
2026-02-10 15:14 ` [PATCH bpf-next v3 1/3] selftests/bpf: Split module_attach into subtests Viktor Malik
2026-02-10 15:14 ` [PATCH bpf-next v3 2/3] selftests/bpf: Split read_vsyscall " Viktor Malik
2026-02-10 15:14 ` [PATCH bpf-next v3 3/3] selftests/bpf: Split sleepable fentry from LSM test Viktor Malik
2026-02-11  2:42 ` [PATCH bpf-next v3 0/3] Separate tests that need error injection Alexei Starovoitov
2026-02-11 13:54   ` Viktor Malik
2026-02-12  1:42     ` Alexei Starovoitov
2026-02-12  7:22       ` Viktor Malik
2026-02-12 16:35         ` Alexei Starovoitov
2026-02-13  9:21           ` Viktor Malik
2026-02-13 23:13             ` Alexei Starovoitov
2026-02-16  7:46               ` Viktor Malik [this message]
2026-02-23 11:52                 ` Viktor Malik
2026-02-24  2:58                   ` Alexei Starovoitov
2026-02-24  8:03                     ` Viktor Malik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa67d3db-273f-4e4e-90a2-ba805f484bd1@redhat.com \
    --to=vmalik@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux@jordanrome.com \
    --cc=martin.lau@linux.dev \
    --cc=qyousef@layalina.io \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox