public inbox for linux-kernel@vger.kernel.org
* [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
@ 2024-07-05 14:50 Puranjay Mohan
  2024-07-08 14:52 ` Daniel Borkmann
  2024-07-08 20:30 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-05 14:50 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, bpf, linux-kselftest,
	linux-kernel, puranjay12

fexit_sleep test runs successfully now on the CI so remove it from the
deny list.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
 tools/testing/selftests/bpf/DENYLIST.aarch64 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index e865451e90d2..2bf981c80180 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -1,6 +1,5 @@
 bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
 bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
-fexit_sleep                                      # The test never returns. The remaining tests cannot start.
 kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
 kprobe_multi_test                                # needs CONFIG_FPROBE
 module_attach                                    # prog 'kprobe_multi': failed to auto-attach: -95
-- 
2.40.1
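
For readers unfamiliar with the file being patched: each line of DENYLIST.aarch64 holds a test (or test/subtest) name, optionally followed by a `#` comment giving the reason. Below is a minimal Python sketch of reading that layout, assuming nothing beyond what the diff shows. `parse_denylist` is a hypothetical helper written for illustration; it is not the code that the BPF CI or test_progs actually uses.

```python
def parse_denylist(text):
    """Return {test_name: reason} for every entry in a DENYLIST-style file.

    Assumed layout (taken from the diff above): 'name   # reason'.
    """
    entries = {}
    for raw in text.splitlines():
        name, _, reason = raw.partition("#")
        name = name.strip()
        if not name:
            continue  # blank line or comment-only line
        entries[name] = reason.strip()
    return entries


# Three of the entries shown in the diff, before the patch is applied.
BEFORE = """\
bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
fexit_sleep                                      # The test never returns. The remaining tests cannot start.
kprobe_multi_test                                # needs CONFIG_FPROBE
"""

before = parse_denylist(BEFORE)
# The patch deletes exactly one entry.
after = {name: why for name, why in before.items() if name != "fexit_sleep"}
print(sorted(set(before) - set(after)))
```

With the entry gone, fexit_sleep is no longer skipped on aarch64, which is why a hang in that test can stall the tests that follow it.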



* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-05 14:50 [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep Puranjay Mohan
@ 2024-07-08 14:52 ` Daniel Borkmann
  2024-07-08 15:00   ` Puranjay Mohan
  2024-07-08 20:30 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-08 14:52 UTC (permalink / raw)
  To: Puranjay Mohan, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest, linux-kernel, puranjay12

On 7/5/24 4:50 PM, Puranjay Mohan wrote:
> fexit_sleep test runs successfully now on the CI so remove it from the
> deny list.

Do you happen to know which commit fixed it? If yes, might be nice to have it
documented in the commit message.

> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>   tools/testing/selftests/bpf/DENYLIST.aarch64 | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
> index e865451e90d2..2bf981c80180 100644
> --- a/tools/testing/selftests/bpf/DENYLIST.aarch64
> +++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
> @@ -1,6 +1,5 @@
>   bpf_cookie/multi_kprobe_attach_api               # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
>   bpf_cookie/multi_kprobe_link_api                 # kprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3
> -fexit_sleep                                      # The test never returns. The remaining tests cannot start.
>   kprobe_multi_bench_attach                        # needs CONFIG_FPROBE
>   kprobe_multi_test                                # needs CONFIG_FPROBE
>   module_attach                                    # prog 'kprobe_multi': failed to auto-attach: -95
> 



* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 14:52 ` Daniel Borkmann
@ 2024-07-08 15:00   ` Puranjay Mohan
  2024-07-08 15:26     ` KP Singh
  0 siblings, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-08 15:00 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Mykola Lysenko, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, bpf, linux-kselftest,
	linux-kernel, Manu Bretelle

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>> fexit_sleep test runs successfully now on the CI so remove it from the
>> deny list.
>
> Do you happen to know which commit fixed it? If yes, might be nice to have it
> documented in the commit message.

Actually, I never saw this test fail on my local setup, and yesterday
I tried running it on the CI, where it passed as well. So I assumed that
it had been fixed by some commit. I am not sure which exact commit
might have fixed it.

Manu, Martin

When this was added to the deny list, was it failing every time, and did
you have a reproducer for it? If there is a reproducer, I can try
fixing it, but when run normally this test never fails for me.

Thanks,
Puranjay


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 15:00   ` Puranjay Mohan
@ 2024-07-08 15:26     ` KP Singh
  2024-07-08 15:29       ` Daniel Borkmann
  0 siblings, 1 reply; 24+ messages in thread
From: KP Singh @ 2024-07-08 15:26 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Mykola Lysenko, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Shuah Khan, bpf, linux-kselftest, linux-kernel,
	Manu Bretelle, Florent Revest

On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Daniel Borkmann <daniel@iogearbox.net> writes:
>
> > On 7/5/24 4:50 PM, Puranjay Mohan wrote:
> >> fexit_sleep test runs successfully now on the CI so remove it from the
> >> deny list.
> >
> > Do you happen to know which commit fixed it? If yes, might be nice to have it
> > documented in the commit message.
>
> Actually, I never saw this test failing on my local setup and yesterday
> I tried running it on the CI where it passed as well. So, I assumed that
> this would be fixed by some commit. I am not sure which exact commit
> might have fixed this.
>
> Manu, Martin
>
> When this was added to the deny list was this failing every time and did
> you have some reproducer for this. If there is a reproducer, I can try
> fixing it but when ran normally this test never fails for me.
>

I think this never worked until
https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
was merged. Missing FTrace direct call support was blocking tracing
programs on ARM; since then it has always worked.

- KP

> Thanks,
> Puranjay


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 15:26     ` KP Singh
@ 2024-07-08 15:29       ` Daniel Borkmann
  2024-07-08 15:31         ` Florent Revest
  2024-07-08 15:35         ` Puranjay Mohan
  0 siblings, 2 replies; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-08 15:29 UTC (permalink / raw)
  To: KP Singh, Puranjay Mohan
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest, linux-kernel, Manu Bretelle,
	Florent Revest

On 7/8/24 5:26 PM, KP Singh wrote:
> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>
>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>>>> fexit_sleep test runs successfully now on the CI so remove it from the
>>>> deny list.
>>>
>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
>>> documented in the commit message.
>>
>> Actually, I never saw this test failing on my local setup and yesterday
>> I tried running it on the CI where it passed as well. So, I assumed that
>> this would be fixed by some commit. I am not sure which exact commit
>> might have fixed this.
>>
>> Manu, Martin
>>
>> When this was added to the deny list was this failing every time and did
>> you have some reproducer for this. If there is a reproducer, I can try
>> fixing it but when ran normally this test never fails for me.
> 
> I think this never worked until
> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
> was merged, FTrace direct calls was blocking tracing programs on ARM,
> since then it has always worked.

Awesome, thanks! I'll add this to the commit desc then when applying.


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 15:29       ` Daniel Borkmann
@ 2024-07-08 15:31         ` Florent Revest
  2024-07-08 15:35         ` Puranjay Mohan
  1 sibling, 0 replies; 24+ messages in thread
From: Florent Revest @ 2024-07-08 15:31 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: KP Singh, Puranjay Mohan, Andrii Nakryiko, Eduard Zingerman,
	Mykola Lysenko, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Shuah Khan, bpf, linux-kselftest, linux-kernel,
	Manu Bretelle

On Mon, Jul 8, 2024 at 5:29 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/8/24 5:26 PM, KP Singh wrote:
> > On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
> >>
> >> Daniel Borkmann <daniel@iogearbox.net> writes:
> >>
> >>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
> >>>> fexit_sleep test runs successfully now on the CI so remove it from the
> >>>> deny list.
> >>>
> >>> Do you happen to know which commit fixed it? If yes, might be nice to have it
> >>> documented in the commit message.
> >>
> >> Actually, I never saw this test failing on my local setup and yesterday
> >> I tried running it on the CI where it passed as well. So, I assumed that
> >> this would be fixed by some commit. I am not sure which exact commit
> >> might have fixed this.
> >>
> >> Manu, Martin
> >>
> >> When this was added to the deny list was this failing every time and did
> >> you have some reproducer for this. If there is a reproducer, I can try
> >> fixing it but when ran normally this test never fails for me.
> >
> > I think this never worked until
> > https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
> > was merged, FTrace direct calls was blocking tracing programs on ARM,
> > since then it has always worked.
>
> Awesome, thanks! I'll add this to the commit desc then when applying.


I originally removed that test from the denylist:
https://lore.kernel.org/lkml/20230405180250.2046566-6-revest@chromium.org/

But then it was re-introduced by Martin here:
https://lore.kernel.org/bpf/CABRcYmJZ2uUQ4S9rqm+H0N9otjDBv5v45tGjRGKfX2+GZ9gxbw@mail.gmail.com/T/#mc65a794a852bb8b6850cc98be09a90cdc8c76c06

I moved on to work on different things and never investigated it. I
don't know what would have broken it and what would have fixed it. I
don't remember having observed this "test never returns" problem
myself.


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 15:29       ` Daniel Borkmann
  2024-07-08 15:31         ` Florent Revest
@ 2024-07-08 15:35         ` Puranjay Mohan
  2024-07-08 16:09           ` Daniel Borkmann
  1 sibling, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-08 15:35 UTC (permalink / raw)
  To: Daniel Borkmann, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest, linux-kernel, Manu Bretelle,
	Florent Revest

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 7/8/24 5:26 PM, KP Singh wrote:
>> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>>>
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>
>>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>>>>> fexit_sleep test runs successfully now on the CI so remove it from the
>>>>> deny list.
>>>>
>>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
>>>> documented in the commit message.
>>>
>>> Actually, I never saw this test failing on my local setup and yesterday
>>> I tried running it on the CI where it passed as well. So, I assumed that
>>> this would be fixed by some commit. I am not sure which exact commit
>>> might have fixed this.
>>>
>>> Manu, Martin
>>>
>>> When this was added to the deny list was this failing every time and did
>>> you have some reproducer for this. If there is a reproducer, I can try
>>> fixing it but when ran normally this test never fails for me.
>> 
>> I think this never worked until
>> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
>> was merged, FTrace direct calls was blocking tracing programs on ARM,
>> since then it has always worked.
>
> Awesome, thanks! I'll add this to the commit desc then when applying.

The commit that added this to the deny list said:
31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")

```
It is reported that the fexit_sleep never returns in aarch64.
The remaining tests cannot start.
```

So, if the lack of ftrace direct calls were the reason, the failure
would be due to fexit programs not being supported on arm64.

But this says that the selftest never returns, so is it not related to
ftrace direct call support but to another bug?

Thanks,
Puranjay


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 15:35         ` Puranjay Mohan
@ 2024-07-08 16:09           ` Daniel Borkmann
  2024-07-08 16:42             ` KP Singh
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-08 16:09 UTC (permalink / raw)
  To: Puranjay Mohan, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest, linux-kernel, Manu Bretelle,
	Florent Revest

On 7/8/24 5:35 PM, Puranjay Mohan wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
> 
>> On 7/8/24 5:26 PM, KP Singh wrote:
>>> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>>>>
>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>
>>>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>>>>>> fexit_sleep test runs successfully now on the CI so remove it from the
>>>>>> deny list.
>>>>>
>>>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
>>>>> documented in the commit message.
>>>>
>>>> Actually, I never saw this test failing on my local setup and yesterday
>>>> I tried running it on the CI where it passed as well. So, I assumed that
>>>> this would be fixed by some commit. I am not sure which exact commit
>>>> might have fixed this.
>>>>
>>>> Manu, Martin
>>>>
>>>> When this was added to the deny list was this failing every time and did
>>>> you have some reproducer for this. If there is a reproducer, I can try
>>>> fixing it but when ran normally this test never fails for me.
>>>
>>> I think this never worked until
>>> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
>>> was merged, FTrace direct calls was blocking tracing programs on ARM,
>>> since then it has always worked.
>>
>> Awesome, thanks! I'll add this to the commit desc then when applying.
> 
> The commit that added this to the deny list said:
> 31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")
> 
> ```
> It is reported that the fexit_sleep never returns in aarch64.
> The remaining tests cannot start.
> ```
> 
> So, if the lack of Ftrace direct calls would be the reason then the
> failure would be due to fexit programs not being supported on arm64.
> 
> But this says that the selftest never returns therefore is not related
> to ftrace direct call support but another bug?

Fwiw, at least it is passing in the BPF CI now.

https://github.com/kernel-patches/bpf/actions/runs/9841781347/job/27169610006


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 16:09           ` Daniel Borkmann
@ 2024-07-08 16:42             ` KP Singh
  2024-07-09 17:44               ` Daniel Borkmann
  0 siblings, 1 reply; 24+ messages in thread
From: KP Singh @ 2024-07-08 16:42 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Puranjay Mohan, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest, linux-kernel, Manu Bretelle,
	Florent Revest

On Mon, Jul 8, 2024 at 6:09 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 7/8/24 5:35 PM, Puranjay Mohan wrote:
> > Daniel Borkmann <daniel@iogearbox.net> writes:
> >
> >> On 7/8/24 5:26 PM, KP Singh wrote:
> >>> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
> >>>>
> >>>> Daniel Borkmann <daniel@iogearbox.net> writes:
> >>>>
> >>>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
> >>>>>> fexit_sleep test runs successfully now on the CI so remove it from the
> >>>>>> deny list.
> >>>>>
> >>>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
> >>>>> documented in the commit message.
> >>>>
> >>>> Actually, I never saw this test failing on my local setup and yesterday
> >>>> I tried running it on the CI where it passed as well. So, I assumed that
> >>>> this would be fixed by some commit. I am not sure which exact commit
> >>>> might have fixed this.
> >>>>
> >>>> Manu, Martin
> >>>>
> >>>> When this was added to the deny list was this failing every time and did
> >>>> you have some reproducer for this. If there is a reproducer, I can try
> >>>> fixing it but when ran normally this test never fails for me.
> >>>
> >>> I think this never worked until
> >>> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
> >>> was merged, FTrace direct calls was blocking tracing programs on ARM,
> >>> since then it has always worked.
> >>
> >> Awesome, thanks! I'll add this to the commit desc then when applying.
> >
> > The commit that added this to the deny list said:
> > 31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")
> >
> > ```
> > It is reported that the fexit_sleep never returns in aarch64.
> > The remaining tests cannot start.
> > ```

It may also have something to do with sleepable programs. But I think
it is generally in the category of "BPF tracing was catching up with
ARM", and it now has.

- KP

> >
> > So, if the lack of Ftrace direct calls would be the reason then the
> > failure would be due to fexit programs not being supported on arm64.
> >
> > But this says that the selftest never returns therefore is not related
> > to ftrace direct call support but another bug?
>
> Fwiw, at least it is passing in the BPF CI now.
>
> https://github.com/kernel-patches/bpf/actions/runs/9841781347/job/27169610006


* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-05 14:50 [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep Puranjay Mohan
  2024-07-08 14:52 ` Daniel Borkmann
@ 2024-07-08 20:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 24+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-07-08 20:30 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: andrii, eddyz87, mykolal, ast, daniel, martin.lau, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah,
	bpf, linux-kselftest, linux-kernel, puranjay12

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Fri,  5 Jul 2024 14:50:09 +0000 you wrote:
> fexit_sleep test runs successfully now on the CI so remove it from the
> deny list.
> 
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  tools/testing/selftests/bpf/DENYLIST.aarch64 | 1 -
>  1 file changed, 1 deletion(-)

Here is the summary with links:
  - [bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
    https://git.kernel.org/bpf/bpf-next/c/90dc946059b7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-08 16:42             ` KP Singh
@ 2024-07-09 17:44               ` Daniel Borkmann
  2024-07-09 19:06                 ` Manu Bretelle
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-09 17:44 UTC (permalink / raw)
  To: KP Singh
  Cc: Puranjay Mohan, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest, linux-kernel, Manu Bretelle,
	Florent Revest

On 7/8/24 6:42 PM, KP Singh wrote:
> On Mon, Jul 8, 2024 at 6:09 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 7/8/24 5:35 PM, Puranjay Mohan wrote:
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>> On 7/8/24 5:26 PM, KP Singh wrote:
>>>>> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>>>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>>>>>>>> fexit_sleep test runs successfully now on the CI so remove it from the
>>>>>>>> deny list.
>>>>>>>
>>>>>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
>>>>>>> documented in the commit message.
>>>>>>
>>>>>> Actually, I never saw this test failing on my local setup and yesterday
>>>>>> I tried running it on the CI where it passed as well. So, I assumed that
>>>>>> this would be fixed by some commit. I am not sure which exact commit
>>>>>> might have fixed this.
>>>>>>
>>>>>> Manu, Martin
>>>>>>
>>>>>> When this was added to the deny list was this failing every time and did
>>>>>> you have some reproducer for this. If there is a reproducer, I can try
>>>>>> fixing it but when ran normally this test never fails for me.
>>>>>
>>>>> I think this never worked until
>>>>> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
>>>>> was merged, FTrace direct calls was blocking tracing programs on ARM,
>>>>> since then it has always worked.
>>>>
>>>> Awesome, thanks! I'll add this to the commit desc then when applying.
>>>
>>> The commit that added this to the deny list said:
>>> 31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")
>>>
>>> ```
>>> It is reported that the fexit_sleep never returns in aarch64.
>>> The remaining tests cannot start.
>>> ```
> 
> It may also have something to do with sleepable programs. But I think
> it's generally in the category of "BPF tracing was catching up with
> ARM", it has now.

Hm, the latest run actually hangs in fexit_sleep (which is the test right after
fexit_bpf2bpf). So looks like this was too early. It seems some CI runs pass on
arm64 but others fail:

   https://github.com/kernel-patches/bpf/actions/runs/9859826851/job/27224868398 (fail)
   https://github.com/kernel-patches/bpf/actions/runs/9859837213/job/27224955045 (pass)

Puranjay, do you have a chance to look into this again?

>>> So, if the lack of Ftrace direct calls would be the reason then the
>>> failure would be due to fexit programs not being supported on arm64.
>>>
>>> But this says that the selftest never returns therefore is not related
>>> to ftrace direct call support but another bug?
>>
>> Fwiw, at least it is passing in the BPF CI now.
>>
>> https://github.com/kernel-patches/bpf/actions/runs/9841781347/job/27169610006
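
Since some runs pass and others hang, a reproducer has to repeat the test and treat "never returns" as its own outcome. The following is a minimal sketch of such a loop, assuming the hang stays in user space so that a timeout can end it (a kernel panic would take the whole VM down instead). `run_repeatedly` is a hypothetical helper, not part of the selftests or the BPF CI. The demo uses a stand-in command; on an arm64 test machine the command would be something like `["./test_progs", "-t", "fexit_sleep"]`.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs, timeout_s):
    """Run cmd `runs` times and count how often it passes, fails, or hangs."""
    counts = {"pass": 0, "fail": 0, "hang": 0}
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            counts["hang"] += 1
            continue
        counts["pass" if proc.returncode == 0 else "fail"] += 1
    return counts


if __name__ == "__main__":
    # Stand-in command that always succeeds, so the sketch runs anywhere.
    stand_in = [sys.executable, "-c", "raise SystemExit(0)"]
    print(run_repeatedly(stand_in, runs=3, timeout_s=30))
```

Counting hangs separately from failures matters here because the original deny-list comment describes a test that never returns, not one that reports an error.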



* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-09 17:44               ` Daniel Borkmann
@ 2024-07-09 19:06                 ` Manu Bretelle
  2024-07-10  7:18                   ` Puranjay Mohan
  2024-07-11 14:00                   ` Puranjay Mohan
  0 siblings, 2 replies; 24+ messages in thread
From: Manu Bretelle @ 2024-07-09 19:06 UTC (permalink / raw)
  To: Daniel Borkmann, KP Singh
  Cc: Puranjay Mohan, Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest



On Tuesday, July 9, 2024 10:44 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 7/8/24 6:42 PM, KP Singh wrote:
> > On Mon, Jul 8, 2024 at 6:09 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 7/8/24 5:35 PM, Puranjay Mohan wrote:
> >>> Daniel Borkmann <daniel@iogearbox.net> writes:
> >>>> On 7/8/24 5:26 PM, KP Singh wrote:
> >>>>> On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan <puranjay@kernel.org> wrote:
> >>>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
> >>>>>>> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
> >>>>>>>> fexit_sleep test runs successfully now on the CI so remove it from the
> >>>>>>>> deny list.
> >>>>>>>
> >>>>>>> Do you happen to know which commit fixed it? If yes, might be nice to have it
> >>>>>>> documented in the commit message.
> >>>>>>
> >>>>>> Actually, I never saw this test failing on my local setup and yesterday
> >>>>>> I tried running it on the CI where it passed as well. So, I assumed that
> >>>>>> this would be fixed by some commit. I am not sure which exact commit
> >>>>>> might have fixed this.
> >>>>>>
> >>>>>> Manu, Martin
> >>>>>>
> >>>>>> When this was added to the deny list was this failing every time and did
> >>>>>> you have some reproducer for this. If there is a reproducer, I can try
> >>>>>> fixing it but when ran normally this test never fails for me.
> >>>>>
> >>>>> I think this never worked until
> >>>>> https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/
> >>>>> was merged, FTrace direct calls was blocking tracing programs on ARM,
> >>>>> since then it has always worked.
> >>>>
> >>>> Awesome, thanks! I'll add this to the commit desc then when applying.
> >>>
> >>> The commit that added this to the deny list said:
> >>> 31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")
> >>>
> >>> ```
> >>> It is reported that the fexit_sleep never returns in aarch64.
> >>> The remaining tests cannot start.
> >>> ```
> >
> > It may also have something to do with sleepable programs. But I think
> > it's generally in the category of "BPF tracing was catching up with
> > ARM", it has now.
>
> Hm, the latest run actually hangs in fexit_sleep (which is the test right after
> fexit_bpf2bpf). So looks like this was too early. It seems some CI runs pass on
> arm64 but others fail:
>
>    https://github.com/kernel-patches/bpf/actions/runs/9859826851/job/27224868398 (fail)
>    https://github.com/kernel-patches/bpf/actions/runs/9859837213/job/27224955045 (pass)
>
> Puranjay, do you have a chance to look into this again?

Probably unrelated, but when I tried to reproduce this using qemu in full emulation mode [0], I got a kernel crash for fexit_sleep, and also for fexit_bpf2bpf and fentry_fexit.

The stack trace for fentry_fexit looks like this:


root@(none):/mnt/vmtest/selftests/bpf# ./test_progs -v -t fentry_fexit
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_fentry_fexit:PASS:fentry_skel_load 0 nsec
test_fentry_fexit:PASS:fexit_skel_load 0 nsec

test_fentry_fexit:PASS:fentry_attach 0 nsec
test_fentry_fexit:PASS:fexit_attach 0 nsec
Unable to handle kernel paging request at virtual address ffff0000c2a80e68
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c2a80e68] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
CPU: 0 PID: 97 Comm: test_progs Tainted: G           OE      6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084afbc10
x29: ffff800084afbc10 x28: fff00000c28c2e80 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000050 x24: 0000000000000000
x23: 000000000000000a x22: fff00000c28c2e80 x21: 0000ffffed100070
x20: ffff800082032938 x19: ffff0000c2a80c00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffed100070
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
x2 : 0000000000000001 x1 : fff00000c28c2e80 x0 : 0000000000000001
Call trace:
 __bpf_tramp_enter+0x58/0x190
 bpf_trampoline_6442499844+0x44/0x158
 bpf_fentry_test1+0x8/0x10
 bpf_prog_test_run_tracing+0x190/0x328
 __sys_bpf+0x844/0x2148
 __arm64_sys_bpf+0x2c/0x48
 invoke_syscall+0x4c/0x118
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x4c/0x120
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
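
The "Mem abort info" fields in the oops above are all derived from the single ESR value. As a worked check, the sketch below splits 0x96000004 into its fields using the architectural layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0; for a data abort, WnR is ISS bit 6 and the fault status code is ISS bits 5:0). `decode_esr` and the small name table are illustrative helpers, not kernel code, and the table covers only translation faults.

```python
# Fault status codes for translation faults only (illustrative subset).
TRANSLATION_FAULTS = {
    0x04: "level 0 translation fault",
    0x05: "level 1 translation fault",
    0x06: "level 2 translation fault",
    0x07: "level 3 translation fault",
}


def decode_esr(esr):
    """Split an arm64 ESR value, as printed in an oops, into its main fields."""
    iss = esr & 0x1FFFFFF              # bits 24:0, instruction specific syndrome
    return {
        "EC": (esr >> 26) & 0x3F,      # bits 31:26, exception class
        "IL": (esr >> 25) & 0x1,       # bit 25, 1 means a 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # data aborts: 0 = read, 1 = write
        "FSC": iss & 0x3F,             # data aborts: fault status code
    }


fields = decode_esr(0x96000004)
print({name: hex(value) for name, value in fields.items()})
print(TRANSLATION_FAULTS.get(fields["FSC"], "not a translation fault"))
```

The decoded values agree with the log: EC = 0x25 (data abort taken at the current EL), IL = 1, WnR = 0 (a read), and FSC = 0x04.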

For "fexit_sleep" and "fexit_bpf2bpf" respectively:


 $ ( cd  9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_sleep' )
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
Unable to handle kernel paging request at virtual address ffff0000c19c2668
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c19c2668] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE)
CPU: 1 PID: 91 Comm: test_progs Tainted: G           OE      6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084c4bda0
x29: ffff800084c4bda0 x28: fff00000c274ae80 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: 0000000060001000 x22: 0000ffffa36b7a54 x21: 00000000ffffffff
x20: ffff800082032938 x19: ffff0000c19c2400 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
x2 : 0000000000000001 x1 : fff00000c274ae80 x0 : 0000000000000001
Call trace:
 __bpf_tramp_enter+0x58/0x190
 bpf_trampoline_6442487232+0x44/0x158
 __arm64_sys_nanosleep+0x8/0xf0
 invoke_syscall+0x4c/0x118
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x4c/0x120
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Broken pipe (os error 32)
    3: Broken pipe (os error 32)
[11:46:14] chantra@devvm17937:scratchpad $
[11:47:56] chantra@devvm17937:scratchpad $
[11:47:57] chantra@devvm17937:scratchpad $ ( cd  9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_bpf2bpf' )
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
Unable to handle kernel paging request at virtual address ffff0000c278de68
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c278de68] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE)
CPU: 1 PID: 87 Comm: test_progs Tainted: G           OE      6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084c4ba90
x29: ffff800084c4ba90 x28: ffff800080a32d10 x27: ffff800080a32d80
x26: ffff8000813e0ad8 x25: ffff800084c4bce4 x24: ffff800082fbd048
x23: 0000000000000001 x22: fff00000c2732e80 x21: fff00000c18a3200
x20: ffff800082032938 x19: ffff0000c278dc00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaabcc22aa0
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000000000 x10: 000000000ac0d5af x9 : 000000000ac0d5af
x8 : 00000000a4d7a457 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000002 x4 : 0000000006fa0785 x3 : 0000000081d7cd4c
x2 : 0000000000000202 x1 : fff00000c2732e80 x0 : 0000000000000001
Call trace:
 __bpf_tramp_enter+0x58/0x190
 bpf_trampoline_34359738386+0x44/0xf8
 bpf_prog_3b052b77318ab7c4_test_pkt_md_access+0x8/0x118
 bpf_test_run+0x200/0x3a0
 bpf_prog_test_run_skb+0x328/0x6d8
 __sys_bpf+0x844/0x2148
 __arm64_sys_bpf+0x2c/0x48
 invoke_syscall+0x4c/0x118
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x4c/0x120
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Broken pipe (os error 32)
    3: Broken pipe (os error 32)


[0] https://chantra.github.io/bpfcitools/bpfci-troubleshooting.html

>>> So, if the lack of Ftrace direct calls would be the reason then the
>>> failure would be due to fexit programs not being supported on arm64.
>>>
>>> But this says that the selftest never returns therefore is not related
>>> to ftrace direct call support but another bug?
>>
>> Fwiw, at least it is passing in the BPF CI now.
>>
>> https://github.com/kernel-patches/bpf/actions/runs/9841781347/job/27169610006




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-09 19:06                 ` Manu Bretelle
@ 2024-07-10  7:18                   ` Puranjay Mohan
  2024-07-11 14:00                   ` Puranjay Mohan
  1 sibling, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-10  7:18 UTC (permalink / raw)
  To: Manu Bretelle, Daniel Borkmann, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

[-- Attachment #1: Type: text/plain, Size: 9791 bytes --]


[SNIP]

>
> Hm, the latest run actually hangs in fexit_sleep (which is the test right after
> fexit_bpf2bpf). So looks like this was too early. It seems some CI runs pass on
> arm64 but others fail:
>
>    https://github.com/kernel-patches/bpf/actions/runs/9859826851/job/27224868398 (fail)
>    https://github.com/kernel-patches/bpf/actions/runs/9859837213/job/27224955045 (pass)
>
> Puranjay, do you have a chance to look into this again?
>
> Probably unrelated, but when I tried to reproduce this using QEMU in full emulation mode [0], I got a kernel crash for fexit_sleep, and also for fexit_bpf2bpf and fentry_fexit.
>
> The stack trace looks like this (for fentry_fexit):
>
>
> root@(none):/mnt/vmtest/selftests/bpf# ./test_progs -v -t fentry_fexit
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_fentry_fexit:PASS:fentry_skel_load 0 nsec
> test_fentry_fexit:PASS:fexit_skel_load 0 nsec
>
> test_fentry_fexit:PASS:fentry_attach 0 nsec
> test_fentry_fexit:PASS:fexit_attach 0 nsec
> Unable to handle kernel paging request at virtual address ffff0000c2a80e68
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
> [ffff0000c2a80e68] pgd=1000000042f28003, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] SMP
> Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
> CPU: 0 PID: 97 Comm: test_progs Tainted: G           OE      6.10.0-rc6-gb0eedd920017-dirty #67
> Hardware name: linux,dummy-virt (DT)
> pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> pc : __bpf_tramp_enter+0x58/0x190
> lr : __bpf_tramp_enter+0xd8/0x190
> sp : ffff800084afbc10
> x29: ffff800084afbc10 x28: fff00000c28c2e80 x27: 0000000000000000
> x26: 0000000000000000 x25: 0000000000000050 x24: 0000000000000000
> x23: 000000000000000a x22: fff00000c28c2e80 x21: 0000ffffed100070
> x20: ffff800082032938 x19: ffff0000c2a80c00 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffed100070
> x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
> x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
> x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
> x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
> x2 : 0000000000000001 x1 : fff00000c28c2e80 x0 : 0000000000000001
> Call trace:
>  __bpf_tramp_enter+0x58/0x190
>  bpf_trampoline_6442499844+0x44/0x158
>  bpf_fentry_test1+0x8/0x10
>  bpf_prog_test_run_tracing+0x190/0x328
>  __sys_bpf+0x844/0x2148
>  __arm64_sys_bpf+0x2c/0x48
>  invoke_syscall+0x4c/0x118
>  el0_svc_common.constprop.0+0x48/0xf0
>  do_el0_svc+0x24/0x38
>  el0_svc+0x4c/0x120
>  el0t_64_sync_handler+0xc0/0xc8
>  el0t_64_sync+0x190/0x198
> Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x00,00000006,8c13bd78,576676af
> Memory Limit: none
>
> For "fexit_sleep" and "fexit_bpf2bpf" respectively:
>
>
>  $ ( cd  9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_sleep' )
> => Image.gz
> ===> Booting
> ===> Setting up VM
> ===> Running command
> root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
> bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
> Unable to handle kernel paging request at virtual address ffff0000c19c2668
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
> [ffff0000c19c2668] pgd=1000000042f28003, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] SMP
> Modules linked in: bpf_testmod(OE)
> CPU: 1 PID: 91 Comm: test_progs Tainted: G           OE      6.10.0-rc6-gb0eedd920017-dirty #67
> Hardware name: linux,dummy-virt (DT)
> pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> pc : __bpf_tramp_enter+0x58/0x190
> lr : __bpf_tramp_enter+0xd8/0x190
> sp : ffff800084c4bda0
> x29: ffff800084c4bda0 x28: fff00000c274ae80 x27: 0000000000000000
> x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> x23: 0000000060001000 x22: 0000ffffa36b7a54 x21: 00000000ffffffff
> x20: ffff800082032938 x19: ffff0000c19c2400 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
> x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
> x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
> x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
> x2 : 0000000000000001 x1 : fff00000c274ae80 x0 : 0000000000000001
> Call trace:
>  __bpf_tramp_enter+0x58/0x190
>  bpf_trampoline_6442487232+0x44/0x158
>  __arm64_sys_nanosleep+0x8/0xf0
>  invoke_syscall+0x4c/0x118
>  el0_svc_common.constprop.0+0x48/0xf0
>  do_el0_svc+0x24/0x38
>  el0_svc+0x4c/0x120
>  el0t_64_sync_handler+0xc0/0xc8
>  el0t_64_sync+0x190/0x198
> Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x00,00000006,8c13bd78,576676af
> Memory Limit: none
> Failed to run command
>
> Caused by:
>     0: Failed to QGA guest-exec-status
>     1: error running guest_exec_status
>     2: Broken pipe (os error 32)
>     3: Broken pipe (os error 32)
> [11:46:14] chantra@devvm17937:scratchpad $
> [11:47:56] chantra@devvm17937:scratchpad $
> [11:47:57] chantra@devvm17937:scratchpad $ ( cd  9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_bpf2bpf' )
> => Image.gz
> ===> Booting
> ===> Setting up VM
> ===> Running command
> root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
> bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
> Unable to handle kernel paging request at virtual address ffff0000c278de68
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
> [ffff0000c278de68] pgd=1000000042f28003, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] SMP
> Modules linked in: bpf_testmod(OE)
> CPU: 1 PID: 87 Comm: test_progs Tainted: G           OE      6.10.0-rc6-gb0eedd920017-dirty #67
> Hardware name: linux,dummy-virt (DT)
> pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> pc : __bpf_tramp_enter+0x58/0x190
> lr : __bpf_tramp_enter+0xd8/0x190
> sp : ffff800084c4ba90
> x29: ffff800084c4ba90 x28: ffff800080a32d10 x27: ffff800080a32d80
> x26: ffff8000813e0ad8 x25: ffff800084c4bce4 x24: ffff800082fbd048
> x23: 0000000000000001 x22: fff00000c2732e80 x21: fff00000c18a3200
> x20: ffff800082032938 x19: ffff0000c278dc00 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaabcc22aa0
> x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
> x11: 0000000000000000 x10: 000000000ac0d5af x9 : 000000000ac0d5af
> x8 : 00000000a4d7a457 x7 : ffff80008218fa78 x6 : 0000000000000000
> x5 : 0000000000000002 x4 : 0000000006fa0785 x3 : 0000000081d7cd4c
> x2 : 0000000000000202 x1 : fff00000c2732e80 x0 : 0000000000000001
> Call trace:
>  __bpf_tramp_enter+0x58/0x190
>  bpf_trampoline_34359738386+0x44/0xf8
>  bpf_prog_3b052b77318ab7c4_test_pkt_md_access+0x8/0x118
>  bpf_test_run+0x200/0x3a0
>  bpf_prog_test_run_skb+0x328/0x6d8
>  __sys_bpf+0x844/0x2148
>  __arm64_sys_bpf+0x2c/0x48
>  invoke_syscall+0x4c/0x118
>  el0_svc_common.constprop.0+0x48/0xf0
>  do_el0_svc+0x24/0x38
>  el0_svc+0x4c/0x120
>  el0t_64_sync_handler+0xc0/0xc8
>  el0t_64_sync+0x190/0x198
> Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception in interrupt
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x00,00000006,8c13bd78,576676af
> Memory Limit: none
> Failed to run command
>
> Caused by:
>     0: Failed to QGA guest-exec-status
>     1: error running guest_exec_status
>     2: Broken pipe (os error 32)
>     3: Broken pipe (os error 32)
>
>
> [0] https://chantra.github.io/bpfcitools/bpfci-troubleshooting.html


Thanks for sharing the logs,
I will try to reproduce this and find the root cause.

Thanks,
Puranjay

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 255 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-09 19:06                 ` Manu Bretelle
  2024-07-10  7:18                   ` Puranjay Mohan
@ 2024-07-11 14:00                   ` Puranjay Mohan
  2024-07-11 15:55                     ` Daniel Borkmann
                                       ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-11 14:00 UTC (permalink / raw)
  To: Manu Bretelle, Daniel Borkmann, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]


Hi,
I was able to find the root cause of this bug and will send a fix soon!

> Unable to handle kernel paging request at virtual address ffff0000c2a80e68

We are running this test on QEMU with '-cpu max', which means 52-bit
virtual addresses are being used.

The trampoline generation code has the following two lines:

		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
		emit_call((const u64)__bpf_tramp_enter, ctx);

Here the address of struct bpf_tramp_image is moved to R0 and passed as
an argument to __bpf_tramp_enter().

emit_addr_mov_i64() assumes that the address passed to it is in the
vmalloc space and uses at most 48 bits. It sets all the remaining bits
to 1.

But struct bpf_tramp_image is allocated using kzalloc(), and when 52-bit
VAs are used its address is not guaranteed to fit in 48 bits. Therefore we
see this bug, where 0xfff[0]0000c2a80e68 is converted to
0xfff[f]0000c2a80e68 when the trampoline is generated.

The fix would be to use emit_a64_mov_i64() for moving this address into R0.
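
To make the corruption concrete, here is a minimal userspace sketch of the
behaviour described above. It is not the kernel's JIT code: the helper names
model_addr_mov_48(), addr_survives_48bit_mov() and check_thread_example() are
hypothetical, and the model only assumes what is stated in this mail, namely
that the low 48 bits are taken from the address and all remaining bits are
set to 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model (not kernel code): the value that would
 * land in the register if only bits [47:0] of the address are kept and
 * bits [63:48] are forced to 1, as described in the mail.
 */
static uint64_t model_addr_mov_48(uint64_t addr)
{
	return (addr & 0x0000ffffffffffffULL) | 0xffff000000000000ULL;
}

/* True when the 48-bit shortcut reproduces the address exactly. */
static bool addr_survives_48bit_mov(uint64_t addr)
{
	return model_addr_mov_48(addr) == addr;
}

/* Self-check using addresses taken from the oops logs in this thread. */
static void check_thread_example(void)
{
	/* kzalloc()'ed object with 52-bit VAs: bits [51:48] are zero. */
	assert(model_addr_mov_48(0xfff00000c2a80e68ULL) ==
	       0xffff0000c2a80e68ULL);
	assert(!addr_survives_48bit_mov(0xfff00000c2a80e68ULL));

	/* An address whose top 16 bits are already all ones is unchanged. */
	assert(addr_survives_48bit_mov(0xffff800082032938ULL));
}
```

Calling check_thread_example() from a small main() reproduces the faulting
address from the first oops. A move that emits all 64 bits of the immediate
would leave the first address untouched.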

Thanks,
Puranjay

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 255 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-11 14:00                   ` Puranjay Mohan
@ 2024-07-11 15:55                     ` Daniel Borkmann
  2024-07-12 13:50                     ` Daniel Borkmann
  2024-07-12 17:27                     ` Manu Bretelle
  2 siblings, 0 replies; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-11 15:55 UTC (permalink / raw)
  To: Puranjay Mohan, Manu Bretelle, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

On 7/11/24 4:00 PM, Puranjay Mohan wrote:
> 
> Hi,
> I was able to find the root cause of this bug and will send a fix soon!
> 
>> Unable to handle kernel paging request at virtual address ffff0000c2a80e68
> 
> We are running this test on Qemu with '-cpu max', this means 52-bit
> virtual addresses are being used.
> 
> The trampolines generation code has the following two lines:
> 
> 		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
> 		emit_call((const u64)__bpf_tramp_enter, ctx);
> 
> here the address of struct bpf_tramp_image is moved to R0 and passed as
> an argument to __bpf_tramp_enter().
> 
> emit_addr_mov_i64() assumes that the address passed to it is in the
> vmalloc space and uses at most 48 bits. It sets all the remaining bits
> to 1.
> 
> but struct bpf_tramp_image is allocated using kzalloc() and when 52-bit
> VAs are used, its address is not guaranteed to be 48-bit, therefore we
> see this bug, where  0xfff[0]0000c2a80e68 is converted to
> 0xfff[f]0000c2a80e68 when the trampoline is generated.
> 
> The fix would be to use emit_a64_mov_i64() for moving this address into R0.

Excellent find!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-11 14:00                   ` Puranjay Mohan
  2024-07-11 15:55                     ` Daniel Borkmann
@ 2024-07-12 13:50                     ` Daniel Borkmann
  2024-07-12 16:07                       ` Alexei Starovoitov
  2024-07-15 16:31                       ` Puranjay Mohan
  2024-07-12 17:27                     ` Manu Bretelle
  2 siblings, 2 replies; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-12 13:50 UTC (permalink / raw)
  To: Puranjay Mohan, Manu Bretelle, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

Hi Puranjay,

On 7/11/24 4:00 PM, Puranjay Mohan wrote:
[...]
> I was able to find the root cause of this bug and will send a fix soon!
> 
>> Unable to handle kernel paging request at virtual address ffff0000c2a80e68
> 
> We are running this test on Qemu with '-cpu max', this means 52-bit
> virtual addresses are being used.
> 
> The trampolines generation code has the following two lines:
> 
> 		emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
> 		emit_call((const u64)__bpf_tramp_enter, ctx);
> 
> here the address of struct bpf_tramp_image is moved to R0 and passed as
> an argument to __bpf_tramp_enter().
> 
> emit_addr_mov_i64() assumes that the address passed to it is in the
> vmalloc space and uses at most 48 bits. It sets all the remaining bits
> to 1.
> 
> but struct bpf_tramp_image is allocated using kzalloc() and when 52-bit
> VAs are used, its address is not guaranteed to be 48-bit, therefore we
> see this bug, where  0xfff[0]0000c2a80e68 is converted to
> 0xfff[f]0000c2a80e68 when the trampoline is generated.
> 
> The fix would be to use emit_a64_mov_i64() for moving this address into R0.

It looks like there is still an issue left. A recent CI run on bpf-next is
still hitting the same on arm64:

Base:

   https://github.com/kernel-patches/bpf/commits/series/870746%3D%3Ebpf-next/

CI:

   https://github.com/kernel-patches/bpf/actions/runs/9905842936/job/27366435436

   [...]
   #89/11   fexit_bpf2bpf/func_replace_global_func:OK
   #89/12   fexit_bpf2bpf/fentry_to_cgroup_bpf:OK
   #89/13   fexit_bpf2bpf/func_replace_progmap:OK
   #89      fexit_bpf2bpf:OK
   Error: The operation was canceled.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-12 13:50                     ` Daniel Borkmann
@ 2024-07-12 16:07                       ` Alexei Starovoitov
  2024-07-12 16:19                         ` Daniel Borkmann
  2024-07-15 16:31                       ` Puranjay Mohan
  1 sibling, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-07-12 16:07 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Puranjay Mohan, Manu Bretelle, KP Singh, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

On Fri, Jul 12, 2024 at 6:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Hi Puranjay,
>
> On 7/11/24 4:00 PM, Puranjay Mohan wrote:
> [...]
> > I was able to find the root cause of this bug and will send a fix soon!
> >
> >> Unable to handle kernel paging request at virtual address ffff0000c2a80e68
> >
> > We are running this test on Qemu with '-cpu max', this means 52-bit
> > virtual addresses are being used.
> >
> > The trampolines generation code has the following two lines:
> >
> >               emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
> >               emit_call((const u64)__bpf_tramp_enter, ctx);
> >
> > here the address of struct bpf_tramp_image is moved to R0 and passed as
> > an argument to __bpf_tramp_enter().
> >
> > emit_addr_mov_i64() assumes that the address passed to it is in the
> > vmalloc space and uses at most 48 bits. It sets all the remaining bits
> > to 1.
> >
> > but struct bpf_tramp_image is allocated using kzalloc() and when 52-bit
> > VAs are used, its address is not guaranteed to be 48-bit, therefore we
> > see this bug, where  0xfff[0]0000c2a80e68 is converted to
> > 0xfff[f]0000c2a80e68 when the trampoline is generated.
> >
> > The fix would be to use emit_a64_mov_i64() for moving this address into R0.
>
> It looks like there is still an issue left. A recent CI run on bpf-next is
> still hitting the same on arm64:
>
> Base:
>
>    https://github.com/kernel-patches/bpf/commits/series/870746%3D%3Ebpf-next/
>
> CI:
>
>    https://github.com/kernel-patches/bpf/actions/runs/9905842936/job/27366435436
>
>    [...]
>    #89/11   fexit_bpf2bpf/func_replace_global_func:OK
>    #89/12   fexit_bpf2bpf/fentry_to_cgroup_bpf:OK
>    #89/13   fexit_bpf2bpf/func_replace_progmap:OK
>    #89      fexit_bpf2bpf:OK
>    Error: The operation was canceled.

Let's denylist that test again for now?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-12 16:07                       ` Alexei Starovoitov
@ 2024-07-12 16:19                         ` Daniel Borkmann
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Borkmann @ 2024-07-12 16:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Puranjay Mohan, Manu Bretelle, KP Singh, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

On 7/12/24 6:07 PM, Alexei Starovoitov wrote:
> On Fri, Jul 12, 2024 at 6:50 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 7/11/24 4:00 PM, Puranjay Mohan wrote:
>> [...]
>>> I was able to find the root cause of this bug and will send a fix soon!
>>>
>>>> Unable to handle kernel paging request at virtual address ffff0000c2a80e68
>>>
>>> We are running this test on Qemu with '-cpu max', this means 52-bit
>>> virtual addresses are being used.
>>>
>>> The trampolines generation code has the following two lines:
>>>
>>>                emit_addr_mov_i64(A64_R(0), (const u64)im, ctx);
>>>                emit_call((const u64)__bpf_tramp_enter, ctx);
>>>
>>> here the address of struct bpf_tramp_image is moved to R0 and passed as
>>> an argument to __bpf_tramp_enter().
>>>
>>> emit_addr_mov_i64() assumes that the address passed to it is in the
>>> vmalloc space and uses at most 48 bits. It sets all the remaining bits
>>> to 1.
>>>
>>> but struct bpf_tramp_image is allocated using kzalloc() and when 52-bit
>>> VAs are used, its address is not guaranteed to be 48-bit, therefore we
>>> see this bug, where  0xfff[0]0000c2a80e68 is converted to
>>> 0xfff[f]0000c2a80e68 when the trampoline is generated.
>>>
>>> The fix would be to use emit_a64_mov_i64() for moving this address into R0.
>>
>> It looks like there is still an issue left. A recent CI run on bpf-next is
>> still hitting the same on arm64:
>>
>> Base:
>>
>>     https://github.com/kernel-patches/bpf/commits/series/870746%3D%3Ebpf-next/
>>
>> CI:
>>
>>     https://github.com/kernel-patches/bpf/actions/runs/9905842936/job/27366435436
>>
>>     [...]
>>     #89/11   fexit_bpf2bpf/func_replace_global_func:OK
>>     #89/12   fexit_bpf2bpf/fentry_to_cgroup_bpf:OK
>>     #89/13   fexit_bpf2bpf/func_replace_progmap:OK
>>     #89      fexit_bpf2bpf:OK
>>     Error: The operation was canceled.
> 
> Let's denylist that test again for now?

Agree, done/pushed now.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-11 14:00                   ` Puranjay Mohan
  2024-07-11 15:55                     ` Daniel Borkmann
  2024-07-12 13:50                     ` Daniel Borkmann
@ 2024-07-12 17:27                     ` Manu Bretelle
  2024-07-12 18:08                       ` Puranjay Mohan
  2 siblings, 1 reply; 24+ messages in thread
From: Manu Bretelle @ 2024-07-12 17:27 UTC (permalink / raw)
  To: Puranjay Mohan, Daniel Borkmann, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest






On Thursday, July 11, 2024 7:00 AM, Puranjay Mohan wrote:
> Hi,
>
> I was able to find the root cause of this bug and will send a fix soon!
>
>> Unable to handle kernel paging request at virtual address ffff0000c2a80e68

I was able to confirm the fix using the artifacts from https://github.com/kernel-patches/bpf/actions/runs/9905842936

Thanks,
Manu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-12 17:27                     ` Manu Bretelle
@ 2024-07-12 18:08                       ` Puranjay Mohan
  2024-07-12 19:59                         ` Manu Bretelle
  0 siblings, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-12 18:08 UTC (permalink / raw)
  To: Manu Bretelle
  Cc: Puranjay Mohan, Daniel Borkmann, KP Singh, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

Hi Manu,

>
> I was able to confirm the fix using the artifacts from https://github.com/kernel-patches/bpf/actions/runs/9905842936
> Thanks
>

Thanks for testing the fix.

This bug has been resolved now, but the test still hangs sometimes.
Unfortunately, I am not able to reproduce this hang using vmtest. Can you
extract some logs from the CI somehow? If it is hanging in the kernel,
there should be some soft lockup or RCU lockup related messages.

I was talking about this with Kumar, and we think that this test is
hanging in userspace in the following loop:

while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);

Could it be that fentry_cnt is somehow already > 2 before we reach this?
This is only a guess, though.
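
To illustrate the guess only: the sketch below is not the selftest code, and
wait_until_equal(), wait_until_at_least() and check_overshoot_example() are
hypothetical names. The max_spins bound is an assumption added so that the
example always terminates. It shows that a "!= 2" style wait can never
succeed once the counter has gone past 2, whereas a ">= 2" style wait still
would.

```c
#include <assert.h>
#include <stdbool.h>

/* Poll until *cnt == target; give up after max_spins polls. */
static bool wait_until_equal(const volatile int *cnt, int target,
			     long max_spins)
{
	for (long i = 0; i < max_spins; i++)
		if (*cnt == target)
			return true;
	return false;
}

/* Poll until *cnt >= target; give up after max_spins polls. */
static bool wait_until_at_least(const volatile int *cnt, int target,
				long max_spins)
{
	for (long i = 0; i < max_spins; i++)
		if (*cnt >= target)
			return true;
	return false;
}

/* Hypothetical scenario: the counter has already overshot the target. */
static void check_overshoot_example(void)
{
	int overshoot = 3;
	int exact = 2;

	/* An equality wait would spin forever without the bound. */
	assert(!wait_until_equal(&overshoot, 2, 1000));
	/* A threshold wait completes on the first poll. */
	assert(wait_until_at_least(&overshoot, 2, 1000));
	/* With the exact value, both forms complete. */
	assert(wait_until_equal(&exact, 2, 1000));
}
```

Whether the counter can actually overshoot in the real test is not
established in this thread.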

Thanks,
Puranjay

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-12 18:08                       ` Puranjay Mohan
@ 2024-07-12 19:59                         ` Manu Bretelle
  0 siblings, 0 replies; 24+ messages in thread
From: Manu Bretelle @ 2024-07-12 19:59 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: Puranjay Mohan, Daniel Borkmann, KP Singh, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest



> On Jul 12, 2024, at 11:08 AM, Puranjay Mohan <puranjay12@gmail.com> wrote:
> 
> Thanks for testing the fix.
> 
> This bug has been resolved now but the test still hangs sometimes.
> Unfortunately, I am not able to reproduce this hang
> using vmtest.

I have not been able to reproduce the original error either. I will try to reproduce it on the actual CI hosts next week (they are native arm64 hosts, whereas my local setup uses full emulation).

> Can you extract some logs from the CI somehow?? If it is
> hanging in the kernel there should be some
> soft lockup or RCU lockup related messages.

I think once we execute the test, vmtest does not track the console logs anymore, so we won’t see those. That should be fixable, but for now, I won’t be able to get more logs than you get from the UI currently.
> 
> I was talking about this with Kumar and we think that this test is
> hanging in the userspace in the following loop:
> 
> while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
> 
> Could it be that fentry_cnt is > 2 somehow before we reach this?? This
> is only a random guess though.
> 


Manu


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-12 13:50                     ` Daniel Borkmann
  2024-07-12 16:07                       ` Alexei Starovoitov
@ 2024-07-15 16:31                       ` Puranjay Mohan
  2024-07-15 17:07                         ` Alexei Starovoitov
  1 sibling, 1 reply; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-15 16:31 UTC (permalink / raw)
  To: Daniel Borkmann, Manu Bretelle, KP Singh
  Cc: Andrii Nakryiko, Eduard Zingerman, Mykola Lysenko,
	Alexei Starovoitov, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

[-- Attachment #1: Type: text/plain, Size: 1778 bytes --]


Hi Daniel, Manu
I was able to reproduce this issue on KVM and found the root cause for
this hang! The other issue that we fixed is unrelated to this hang and
doesn't occur on self-hosted GitHub runners as they use 48-bit VAs.

The userspace test code has:

    #define STACK_SIZE (1024 * 1024)
    static char child_stack[STACK_SIZE];

    cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);

arm64 requires the stack pointer to be 16-byte aligned; otherwise an
SPAlignmentFault occurs, which appears as a Bus error in userspace.

The stack provided to the clone system call is not guaranteed to be
aligned properly in this selftest.
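
For illustration only (this is not code from the selftest, and the helper
names sp_is_aligned and sp_align_down are made up), the alignment condition
can be expressed as:

```c
#include <stdint.h>

#define SP_ALIGN 16	/* stack pointer alignment required on arm64 */

/* Return 1 if sp satisfies the 16-byte alignment requirement, else 0. */
static int sp_is_aligned(const void *sp)
{
	return ((uintptr_t)sp % SP_ALIGN) == 0;
}

/* Round sp down to the nearest 16-byte boundary (stacks grow down). */
static void *sp_align_down(void *sp)
{
	return (void *)((uintptr_t)sp & ~(uintptr_t)(SP_ALIGN - 1));
}
```

The C language only guarantees an alignment of 1 for a static char array, so
child_stack + STACK_SIZE may or may not pass sp_is_aligned(), depending on
where the linker happens to place the array.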

The test hangs on the following line:
    while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);

Because the child process is killed due to SPAlignmentFault, the
fentry_cnt remains at 0!

According to the man page of the clone system call, the correct way to
allocate a stack for this call is with mmap, like this:

stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

This fixes the issue. I will send a patch that uses this and once again
removes this test from DENYLIST, and I hope this time it fixes it for good.
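
A minimal sketch of that approach, not the actual patch: child_fn and
run_child_on_mmap_stack are hypothetical names, and the real test passes
do_sleep and fexit_skel and also sets CLONE_FILES.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* mmap returns page-aligned memory, so base + STACK_SIZE stays aligned. */
static_assert(STACK_SIZE % 16 == 0, "stack top must be 16-byte aligned");

/* Placeholder child body (assumption); stands in for do_sleep(). */
static int child_fn(void *arg)
{
	(void)arg;
	return 0;
}

/* Returns the child's exit code, or -1 if anything went wrong. */
static int run_child_on_mmap_stack(void)
{
	int status, ret = -1;
	char *stack;
	pid_t pid;

	stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return -1;

	/* The stack grows down, so pass the top of the mapping. */
	pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, NULL);
	if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		ret = WEXITSTATUS(status);

	munmap(stack, STACK_SIZE);
	return ret;
}
```

Checking the wait status this way also lets the parent notice a child that
died from a signal instead of waiting on it indefinitely.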

> It looks like there is still an issue left. A recent CI run on bpf-next is
> still hitting the same on arm64:
>
> Base:
>
>    https://github.com/kernel-patches/bpf/commits/series/870746%3D%3Ebpf-next/
>
> CI:
>
>    https://github.com/kernel-patches/bpf/actions/runs/9905842936/job/27366435436
>
>    [...]
>    #89/11   fexit_bpf2bpf/func_replace_global_func:OK
>    #89/12   fexit_bpf2bpf/fentry_to_cgroup_bpf:OK
>    #89/13   fexit_bpf2bpf/func_replace_progmap:OK
>    #89      fexit_bpf2bpf:OK
>    Error: The operation was canceled.

Thanks,
Puranjay

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 255 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-15 16:31                       ` Puranjay Mohan
@ 2024-07-15 17:07                         ` Alexei Starovoitov
  2024-07-15 17:32                           ` Puranjay Mohan
  0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2024-07-15 17:07 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: Daniel Borkmann, Manu Bretelle, KP Singh, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

On Mon, Jul 15, 2024 at 9:32 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>
>
> Hi Daniel, Manu
> I was able to reproduce this issue on KVM and found the root cause for
> this hang! The other issue that we fixed is unrelated to this hang and
> doesn't occur on self hosted github runners as they use 48-bit VAs.
>
> The userspace test code has:
>
>     #define STACK_SIZE (1024 * 1024)
>     static char child_stack[STACK_SIZE];
>
>     cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);
>
> arm64 requires the stack pointer to be 16 byte aligned otherwise
> SPAlignmentFault occurs, this appears as Bus error in the userspace.
>
> The stack provided to the clone system call is not guaranteed to be
> aligned properly in this selftest.
>
> The test hangs on the following line:
>     while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
>
> Because the child process is killed due to SPAlignmentFault, the
> fentry_cnt remains at 0!
>
> Reading the man page of clone system call, the correct way to allocate
> stack for this call is using mmap like this:
>
> stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>
> This fixes the issue, I will send a patch to use this and once again
> remove this test from DENYLIST and I hope this time it fixes it for good.

Wow. Great find. Good to know.
prog_tests/ns_current_pid_tgid.c has the same issue probably.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
  2024-07-15 17:07                         ` Alexei Starovoitov
@ 2024-07-15 17:32                           ` Puranjay Mohan
  0 siblings, 0 replies; 24+ messages in thread
From: Puranjay Mohan @ 2024-07-15 17:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Manu Bretelle, KP Singh, Andrii Nakryiko,
	Eduard Zingerman, Mykola Lysenko, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florent Revest

[-- Attachment #1: Type: text/plain, Size: 1781 bytes --]

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Jul 15, 2024 at 9:32 AM Puranjay Mohan <puranjay@kernel.org> wrote:
>>
>>
>> Hi Daniel, Manu
>> I was able to reproduce this issue on KVM and found the root cause for
>> this hang! The other issue that we fixed is unrelated to this hang and
>> doesn't occur on self hosted github runners as they use 48-bit VAs.
>>
>> The userspace test code has:
>>
>>     #define STACK_SIZE (1024 * 1024)
>>     static char child_stack[STACK_SIZE];
>>
>>     cpid = clone(do_sleep, child_stack + STACK_SIZE, CLONE_FILES | SIGCHLD, fexit_skel);
>>
>> arm64 requires the stack pointer to be 16 byte aligned otherwise
>> SPAlignmentFault occurs, this appears as Bus error in the userspace.
>>
>> The stack provided to the clone system call is not guaranteed to be
>> aligned properly in this selftest.
>>
>> The test hangs on the following line:
>>     while (READ_ONCE(fexit_skel->bss->fentry_cnt) != 2);
>>
>> Because the child process is killed due to SPAlignmentFault, the
>> fentry_cnt remains at 0!
>>
>> Reading the man page of clone system call, the correct way to allocate
>> stack for this call is using mmap like this:
>>
>> stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>>
>> This fixes the issue, I will send a patch to use this and once again
>> remove this test from DENYLIST and I hope this time it fixes it for good.
>
> Wow. Great find. Good to know.
> prog_tests/ns_current_pid_tgid.c has the same issue probably.

Yes, I checked that test as well using gdb, and fortunately it gets a
16-byte aligned stack pointer, but this is just luck, so I will send a
patch to fix that test as well.

Thanks,
Puranjay

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 255 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-07-15 17:32 UTC | newest]

Thread overview: 24+ messages
2024-07-05 14:50 [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep Puranjay Mohan
2024-07-08 14:52 ` Daniel Borkmann
2024-07-08 15:00   ` Puranjay Mohan
2024-07-08 15:26     ` KP Singh
2024-07-08 15:29       ` Daniel Borkmann
2024-07-08 15:31         ` Florent Revest
2024-07-08 15:35         ` Puranjay Mohan
2024-07-08 16:09           ` Daniel Borkmann
2024-07-08 16:42             ` KP Singh
2024-07-09 17:44               ` Daniel Borkmann
2024-07-09 19:06                 ` Manu Bretelle
2024-07-10  7:18                   ` Puranjay Mohan
2024-07-11 14:00                   ` Puranjay Mohan
2024-07-11 15:55                     ` Daniel Borkmann
2024-07-12 13:50                     ` Daniel Borkmann
2024-07-12 16:07                       ` Alexei Starovoitov
2024-07-12 16:19                         ` Daniel Borkmann
2024-07-15 16:31                       ` Puranjay Mohan
2024-07-15 17:07                         ` Alexei Starovoitov
2024-07-15 17:32                           ` Puranjay Mohan
2024-07-12 17:27                     ` Manu Bretelle
2024-07-12 18:08                       ` Puranjay Mohan
2024-07-12 19:59                         ` Manu Bretelle
2024-07-08 20:30 ` patchwork-bot+netdevbpf
