public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next v4 0/4] bpf, x64: Fix tailcall infinite loop
@ 2023-09-03 15:14 Leon Hwang
  2023-09-03 15:14 ` [RFC PATCH bpf-next v4 1/4] bpf, x64: Comment tail_call_cnt initialisation Leon Hwang
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Leon Hwang @ 2023-09-03 15:14 UTC (permalink / raw)
  To: ast, daniel, andrii, maciej.fijalkowski; +Cc: song, iii, jakub, hffilwlqm, bpf

This patch series fixes a tailcall infinite loop on x64.

From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), the tailcall on x64 works better than before.

From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcall is able to run in BPF subprograms on x64.

From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), BPF program is able to trace other BPF programs.

How about combining them all together?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.

As a result, a tailcall infinite loop comes up. And the loop would halt
the machine.

As we know, in tail call context, the tail_call_cnt propagates by stack
and rax register between BPF subprograms. So do in trampolines.

How did I discover the bug?

From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256 for
subprogs with tailcalls"), the total stack size limits to around 8KiB.
Then, I write some bpf progs to validate the stack consuming, that are
tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf[1].

At that time, accidently, I made a tailcall loop. And then the loop halted
my VM. Without the loop, the bpf progs would consume over 8KiB stack size.
But the _stack-overflow_ did not halt my VM.

With bpf_printk(), I confirmed that the tailcall count limit did not work
expectedly. Next, read the code and fix it.

Finally, unfortunately, I only fix it on x64 but other arches. As a
result, CI tests failed because this bug hasn't been fixed on s390x.

Some helps on s390x are requested.

v3 -> v4:
  * Suggestions from Maciej:
    * Separate tail_call_cnt initialisation to a single patch.
    * Unnecessary to check subprog.

v2 -> v3:
  * Suggestions from Alexei:
    * Fix comment style.
    * Remove FIXME comment.
    * Remove arch_prepare_bpf_trampoline() function comment update.
    * Remove the unnecessary 'tgt_prog->aux->func[subprog]->is_func' check.

[1]: https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/tailcall-stackoverflow

Leon Hwang (4):
  bpf, x64: Comment tail_call_cnt initialisation
  bpf, x64: Fix tailcall infinite loop
  selftests/bpf: Correct map_fd to data_fd in tailcalls
  selftests/bpf: Add testcases for tailcall infinite loop fixing

 arch/x86/net/bpf_jit_comp.c                   |  32 +-
 include/linux/bpf.h                           |   5 +
 kernel/bpf/trampoline.c                       |   4 +-
 kernel/bpf/verifier.c                         |   3 +
 .../selftests/bpf/prog_tests/tailcalls.c      | 311 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 +
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 +
 7 files changed, 377 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c


base-commit: 9930e4af4b509bcf6f060b09b16884f26102d110
-- 
2.41.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-09-07  3:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-03 15:14 [RFC PATCH bpf-next v4 0/4] bpf, x64: Fix tailcall infinite loop Leon Hwang
2023-09-03 15:14 ` [RFC PATCH bpf-next v4 1/4] bpf, x64: Comment tail_call_cnt initialisation Leon Hwang
2023-09-05 19:26   ` Maciej Fijalkowski
2023-09-06  2:23     ` Leon Hwang
2023-09-03 15:14 ` [RFC PATCH bpf-next v4 2/4] bpf, x64: Fix tailcall infinite loop Leon Hwang
2023-09-06 20:50   ` Maciej Fijalkowski
2023-09-03 15:14 ` [RFC PATCH bpf-next v4 3/4] selftests/bpf: Correct map_fd to data_fd in tailcalls Leon Hwang
2023-09-05 19:22   ` Maciej Fijalkowski
2023-09-06  2:29     ` Leon Hwang
2023-09-03 15:14 ` [RFC PATCH bpf-next v4 4/4] selftests/bpf: Add testcases for tailcall infinite loop fixing Leon Hwang
2023-09-06 21:18   ` Maciej Fijalkowski
2023-09-07  3:53     ` Leon Hwang
2023-09-04 13:10 ` [RFC PATCH bpf-next v4 0/4] bpf, x64: Fix tailcall infinite loop Ilya Leoshkevich
2023-09-04 15:15   ` Leon Hwang
2023-09-06  0:57     ` Ilya Leoshkevich
2023-09-06  2:39       ` Leon Hwang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox