From: Yonghong Song <yonghong.song@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Eduard Zingerman <eddyz87@gmail.com>
Cc: Puranjay Mohan <puranjay@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
bpf <bpf@vger.kernel.org>,
kkd@meta.com, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Song Liu <song@kernel.org>, Steven Rostedt <rostedt@goodmis.org>,
Jiri Olsa <olsajiri@gmail.com>,
Juri Lelli <juri.lelli@redhat.com>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments
Date: Fri, 8 Nov 2024 13:57:04 -0800 [thread overview]
Message-ID: <0f763686-2b79-40ee-b8c2-0fcfe78d7520@linux.dev> (raw)
In-Reply-To: <CAADnVQJNnqpoF2sNL76_Newyve8NVD2PLdq=tJyiA=tXkn_G4Q@mail.gmail.com>
On 11/8/24 12:13 PM, Alexei Starovoitov wrote:
> On Thu, Nov 7, 2024 at 9:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>> On Fri, 2024-11-01 at 17:32 -0700, Eduard Zingerman wrote:
>>> On Fri, 2024-11-01 at 17:29 -0700, Alexei Starovoitov wrote:
>>>
>>> [...]
>>>
>>>> Hmm.
>>>> Puranjay touched it last with extra logic.
>>>>
>>>> And before that David Vernet tried to address flakiness
>>>> in commit 4a54de65964d.
>>>> Yonghong also noticed lockups in paravirt
>>>> and added workaround 7015843afc.
>>>>
>>>> Your additional timeout/workaround makes sense to me,
>>>> but would be good to bisect whether Puranjay's change caused it.
>>> I'll debug what's going on some time later today or on Sat.
>> I finally had time to investigate this a bit.
>> First, here is how to trigger lockup:
>>
>> t1=send_signal/send_signal_perf_thread_remote; \
>> t2=send_signal/send_signal_nmi_thread_remote; \
>> for i in $(seq 1 100); do ./test_progs -t $t1,$t2; done
>>
>> Must be both tests for whatever reason.
>> The failing test is 'send_signal_nmi_thread_remote'.
>>
>> The test is organized as parent and child processes communicating
>> various events to each other. The intended sequence of events:
>> - child:
>> - install SIGUSR1 handler
>> - notify parent
>> - wait for parent
>> - parent:
>> - open PERF_COUNT_SW_CPU_CLOCK event
>> - attach BPF program to the event
>> - notify child
>> - enter busy loop for 10^8 iterations
>> - wait for child
>> - BPF program:
>> - send SIGUSR1 to child
>> - child:
>> - poll for SIGUSR1 in a busy loop
>> - notify parent
>> - parent:
>> - check value communicated by child,
>> terminate test.
>>
>> The lockup happens because on every other test run perf event is not
>> triggered, child does not receive SIGUSR1 and thus both parent and
>> child are stuck.
>>
>> For 'send_signal_nmi_thread_remote' perf event is defined as:
>>
>> struct perf_event_attr attr = {
>> .sample_period = 1,
>> .type = PERF_TYPE_HARDWARE,
>> .config = PERF_COUNT_HW_CPU_CYCLES,
>> };
>>
>> And is opened for parent process pid.
>>
>> Apparently, the perf event is not always triggered between lines
>> send_signal.c:165-180. And at line 180 parent enters system call,
>> so cpu cycles stop ticking for 'parent', thus if perf event
>> had not been triggered already it won't be triggered at all
>> (as far as I understand).
>>
>> Applying same fix as Yonghong did in 7015843afc is sufficient to
>> reliably trigger perf event:
>>
>> --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
>> @@ -223,7 +223,8 @@ static void test_send_signal_perf(bool signal_thread, bool remote)
>> static void test_send_signal_nmi(bool signal_thread, bool remote)
>> {
>> struct perf_event_attr attr = {
>> - .sample_period = 1,
>> + .freq = 1,
>> + .sample_freq = 1000,
>> .type = PERF_TYPE_HARDWARE,
>> .config = PERF_COUNT_HW_CPU_CYCLES,
>> };
>>
>> But I don't understand why.
>> As far as I can figure from kernel source code,
>> sample_period is measured in nanoseconds (is it?),
> I believe sample_period is a number of samples.
> 1 means that perf suppose to generate event very often.
> It means nanoseconds only for SW cpu_cycles.
>
> let's apply above workaround and move on. Pls send a patch.
sample_period = 1 intends to make sure we can get at one sample
since many samples will be generated. But too many samples may
cause *rare* issues in internal kernel logic in certain *corner*
cases. Agree with Alexei, let us just use a reasonable sample_freq
to fix the issue. Thanks!
prev parent reply other threads:[~2024-11-08 21:57 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-01 0:00 [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2024-11-01 0:00 ` [PATCH bpf-next v1 1/2] bpf: Mark raw_tp arguments with PTR_MAYBE_NULL Kumar Kartikeya Dwivedi
2024-11-01 19:16 ` Andrii Nakryiko
2024-11-01 22:55 ` Alexei Starovoitov
2024-11-03 16:16 ` Kumar Kartikeya Dwivedi
2024-11-03 16:40 ` Kumar Kartikeya Dwivedi
2024-11-03 17:00 ` Kumar Kartikeya Dwivedi
2024-11-03 17:37 ` Alexei Starovoitov
2024-11-06 20:17 ` Andrii Nakryiko
2024-11-01 0:00 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add tests for raw_tp null handling Kumar Kartikeya Dwivedi
2024-11-01 19:19 ` Andrii Nakryiko
2024-11-03 15:58 ` Kumar Kartikeya Dwivedi
2024-11-01 13:18 ` [PATCH bpf-next v1 0/2] Handle possible NULL trusted raw_tp arguments Kumar Kartikeya Dwivedi
2024-11-02 0:21 ` Eduard Zingerman
2024-11-02 0:29 ` Alexei Starovoitov
2024-11-02 0:32 ` Eduard Zingerman
2024-11-08 5:08 ` Eduard Zingerman
2024-11-08 20:13 ` Alexei Starovoitov
2024-11-08 21:57 ` Yonghong Song [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0f763686-2b79-40ee-b8c2-0fcfe78d7520@linux.dev \
--to=yonghong.song@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=eddyz87@gmail.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@fb.com \
--cc=kkd@meta.com \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=olsajiri@gmail.com \
--cc=puranjay@kernel.org \
--cc=rostedt@goodmis.org \
--cc=song@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox