Re: [PATCH bpf-next v3 0/3] Separate tests that need error injection

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Viktor Malik <vmalik@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
	Jordan Rome <linux@jordanrome.com>,
	Qais Yousef <qyousef@layalina.io>, Hou Tao <houtao1@huawei.com>
Subject: Re: [PATCH bpf-next v3 0/3] Separate tests that need error injection
Date: Mon, 16 Feb 2026 08:46:28 +0100	[thread overview]
Message-ID: <fa67d3db-273f-4e4e-90a2-ba805f484bd1@redhat.com> (raw)
In-Reply-To: <CAADnVQLT7cGgExtaqusyk+XwrD=-x15kL0veJB5YPewjPo=PUg@mail.gmail.com>

On 2/14/26 00:13, Alexei Starovoitov wrote:
> On Fri, Feb 13, 2026 at 1:22 AM Viktor Malik <vmalik@redhat.com> wrote:
>>
>> On 2/12/26 17:35, Alexei Starovoitov wrote:
>>> On Wed, Feb 11, 2026 at 11:22 PM Viktor Malik <vmalik@redhat.com> wrote:
>>>>
>>>> On 2/12/26 02:42, Alexei Starovoitov wrote:
>>>>> On Wed, Feb 11, 2026 at 5:55 AM Viktor Malik <vmalik@redhat.com> wrote:
>>>>>>
>>>>>> On 2/11/26 03:42, Alexei Starovoitov wrote:
>>>>>>> On Tue, Feb 10, 2026 at 7:14 AM Viktor Malik <vmalik@redhat.com> wrote:
>>>>>>>>
>>>>>>>> Some enterprise kernels (such as RHEL) do not enable error injection via
>>>>>>>> BPF (CONFIG_FUNCTION_ERROR_INJECTION and CONFIG_BPF_KPROBE_OVERRIDE).
>>>>>>>> When running test_progs on such kernels, a lot of test cases fail since
>>>>>>>> they use sleepable fentry or fmod_ret program types which require error
>>>>>>>> injection to be enabled. While it is possible to skip these via custom
>>>>>>>> DENYLIST, some test_progs are not properly split into subtests and
>>>>>>>> therefore must be entirely skipped.
>>>>>>>
>>>>>>> Something is wrong.
>>>>>>> -SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>>>>>>> -int do_probe_read(void *ctx)
>>>>>>> +SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
>>>>>>> +int probe_read(void *ctx)
>>>>>>>  {
>>>>>>>         char buf[8];
>>>>>>>
>>>>>>> @@ -37,19 +37,19 @@ int do_probe_read(void *ctx)
>>>>>>>         return 0;
>>>>>>>  }
>>>>>>>
>>>>>>> -SEC("fentry.s/" SYS_PREFIX "sys_nanosleep")
>>>>>>> -int do_copy_from_user(void *ctx)
>>>>>>> +SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
>>>>>>>
>>>>>>> sleepable should have nothing to do with CONFIG_FUNCTION_ERROR_INJECTION.
>>>>>>
>>>>>> I may be missing something but I think it is directly dependent, at
>>>>>> least for fentry.
>>>>>>
>>>>>> In bpf_check_attach_target (kernel/bpf/verifier.c), we have:
>>>>>>
>>>>>>     if (prog->sleepable) {
>>>>>>             ret = -EINVAL;
>>>>>>             switch (prog->type) {
>>>>>>             case BPF_PROG_TYPE_TRACING:
>>>>>>
>>>>>>                 /* fentry/fexit/fmod_ret progs can be sleepable if they are
>>>>>>                  * attached to ALLOW_ERROR_INJECTION and are not in denylist.
>>>>>>                  */
>>>>>>                 if (!check_non_sleepable_error_inject(btf_id) &&
>>>>>>                     within_error_injection_list(addr))
>>>>>
>>>>> Oh. That's an annoying side effect of disabling error inject.
>>>>> fentry progs are not injecting any errors. They just use that
>>>>> list as allowlist of good attach points.
>>>>> Let's explicitly allow all syscalls here.
>>>>
>>>> Ok, I'll post a patch.
>>>>
>>>>>
>>>>>>                     ret = 0;
>>>>>>                 /* fentry/fexit/fmod_ret progs can also be sleepable if they are
>>>>>>                  * in the fmodret id set with the KF_SLEEPABLE flag.
>>>>>>                  */
>>>>>>                 else {
>>>>>>                     u32 *flags = btf_kfunc_is_modify_return(btf, btf_id,
>>>>>>                                         prog);
>>>>>>
>>>>>>                     if (flags && (*flags & KF_SLEEPABLE))
>>>>>>                         ret = 0;
>>>>>>                 }
>>>>>>                 break;
>>>>>>             [...]
>>>>>>             }
>>>>>>             if (ret) {
>>>>>>                 module_put(mod);
>>>>>>                 bpf_log(log, "%s is not sleepable\n", tname);
>>>>>>                 return ret;
>>>>>>             }
>>>>>>     }
>>>>>>
>>>>>> If I read that correctly, fentry must be on the error injection list
>>>>>> (which is populated by functions marked with ALLOW_ERROR_INJECTION when
>>>>>> CONFIG_FUNCTION_ERROR_INJECTION=y) to be allowed as sleepable.
>>>>>>
>>>>>> The other branch for fmodret id set seems to be only relevant for
>>>>>> kernel modules which explicily register their functions into
>>>>>> BTF_KFUNC_HOOK_FMODRET set.
>>>>>>
>>>>>>> While CONFIG_BPF_KPROBE_OVERRIDE affects a single helper.
>>>>>>>
>>>>>>> Both fentry and fentry.s should work fine on sys_nanosleep.
>>>>>>> fmod_ret is also completely independent of these two flags.
>>>>>>
>>>>>> For fmod_ret, we have a very similar code:
>>>>>>
>>>>>>     } else if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
>>>>>>             [...]
>>>>>>             ret = -EINVAL;
>>>>>>             if (btf_kfunc_is_modify_return(btf, btf_id, prog) ||
>>>>>>                 !check_attach_modify_return(addr, tname))
>>>>>>                 ret = 0;
>>>>>>             if (ret) {
>>>>>>                 module_put(mod);
>>>>>>                 bpf_log(log, "%s() is not modifiable\n", tname);
>>>>>>                 return ret;
>>>>>>             }
>>>>>>     }
>>>>>>
>>>>>> with
>>>>>>
>>>>>>     static int check_attach_modify_return(unsigned long addr, const char *func_name)
>>>>>>     {
>>>>>>         if (within_error_injection_list(addr) ||
>>>>>>             !strncmp(SECURITY_PREFIX, func_name, sizeof(SECURITY_PREFIX) - 1))
>>>>>
>>>>> Same thing here.
>>>>> Let's wrap within_error_injection_list() with bpf specific helper
>>>>> that in case of !CONFIG_FUNCTION_ERROR_INJECTION
>>>>> will allow syscalls anyway.
>>>>
>>>> I would argue that fmod_ret should remain gated behind
>>>> CONFIG_ERROR_INJECTION. It's a similar concept to kprobe override - it
>>>> allows to override the execution of the original syscall and return a
>>>> custom error code.
>>>
>>> Definitely not. fmod_ret and error injection are two orthogonal concepts.
>>> fmod_ret was introduced to be independent of error injection in the
>>> first place. fmod_ret is a deliberate spot to allow bpf to change
>>> return code
>>> in production.
>>
>> I'm not sure that was the original intention of fmod_ret. Even the
>> original patch set mentions that error injection needs to be explicitly
>> enabled [1,2]. Changing that is IMHO a significant shift which needs
>> discussion.
> 
> That discussion already happened when error_injection was forked
> as a config flag and went to default N.

Then I'm a bit confused about the purpose of error_injection. The
description of the config option says:

    Add fault injections into various functions that are annotated with
    ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
    value of these functions. This is useful to test error paths of code.

    If unsure, say N

But if you say that fmod_ret and error_injection are orthogonal
concepts, then what is error_injection good for without fmod_ret?

>>
>>> While error injection is a debug feature to stress test the kernel.
>>> syscalls are in fmod_ret territory, since this is not debugging
>>> and not stressing the kernel. ptrace can change syscall returns without
>>> bpf, without fmod_ret, and without error injection.
>>
>> The thing I'm worried about here is that allowing BPF programs override
>> syscalls will esentially allow to implement a BPF-based rootkit. I'm
>> well aware that BPF is root-only (although there are ways to allow
>> unprivileged users run BPF, such as tokens) but if a potential attacker
>> gains root access even for a short time, they can install the rootkit
>> which may live unnoticed from that moment. Which is not something we
>> really want to enable in production. For ptrace, my understanding is
>> that it is significantly easier to detect if it is used.
>>
>> If you don't mind, I'd start with only allowing sleepable syscalls (when
>> error injection is disabled) for now.
> 
> I mind and I hate the reasoning that in kernels with error_injection=y
> bpf is prone to be a rootkit. This is just a security circus
> detached from reality.

Could you share a bit more details why? I'm not a security expert myself
but it seems to me that intercepting syscalls could be used to implement
a rootkit. For instance [1] does something similar, although they use
bpf_override_return in syscall kprobes. So, I'd love to understand why
this is detached from reality.

Thanks!
Viktor

[1] https://github.com/Gui774ume/ebpfkit

next prev parent reply	other threads:[~2026-02-16  7:46 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 15:13 [PATCH bpf-next v3 0/3] Separate tests that need error injection Viktor Malik
2026-02-10 15:14 ` [PATCH bpf-next v3 1/3] selftests/bpf: Split module_attach into subtests Viktor Malik
2026-02-10 15:14 ` [PATCH bpf-next v3 2/3] selftests/bpf: Split read_vsyscall " Viktor Malik
2026-02-10 15:14 ` [PATCH bpf-next v3 3/3] selftests/bpf: Split sleepable fentry from LSM test Viktor Malik
2026-02-11  2:42 ` [PATCH bpf-next v3 0/3] Separate tests that need error injection Alexei Starovoitov
2026-02-11 13:54   ` Viktor Malik
2026-02-12  1:42     ` Alexei Starovoitov
2026-02-12  7:22       ` Viktor Malik
2026-02-12 16:35         ` Alexei Starovoitov
2026-02-13  9:21           ` Viktor Malik
2026-02-13 23:13             ` Alexei Starovoitov
2026-02-16  7:46               ` Viktor Malik [this message]
2026-02-23 11:52                 ` Viktor Malik
2026-02-24  2:58                   ` Alexei Starovoitov
2026-02-24  8:03                     ` Viktor Malik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa67d3db-273f-4e4e-90a2-ba805f484bd1@redhat.com \
    --to=vmalik@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=linux@jordanrome.com \
    --cc=martin.lau@linux.dev \
    --cc=qyousef@layalina.io \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.