From: Nam Cao <namcaov@gmail.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: linux-riscv@lists.infradead.org, Guo Ren <guoren@kernel.org>,
bpf@vger.kernel.org, Hou Tao <houtao@huaweicloud.com>,
yonghong.song@linux.dev,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Puranjay Mohan <puranjay12@gmail.com>
Subject: Re: RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342)
Date: Sat, 26 Aug 2023 20:31:43 +0200 [thread overview]
Message-ID: <ZOpFD3W3RfiqOoWn@nam-dell> (raw)
In-Reply-To: <ZOpAjkTtA4jYtuIa@nam-dell>
On Sat, Aug 26, 2023 at 08:12:30PM +0200, Nam Cao wrote:
> On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
> > Björn Töpel <bjorn@kernel.org> writes:
> >
> > > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> > > selftests on bpf-next 9e3b47abeb8f.
> > >
> > > I'm able to reproduce the hang by multiple runs of:
> > > | ./test_progs -a link_api -a linked_list
> > > I'm currently investigating that.
> >
> > +Guo for uprobe
> >
> > This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
> > that puts the kernel into an infinite loop.
> >
> > To reproduce, simply run the BPF selftest:
> > ./test_progs -v -a link_api -a linked_list
> >
> > First the link_api test is being run, which exercises the uprobe
> > functionality. The link_api test completes, and test_progs will still
> > have the uprobe active/enabled. Next the linked_list test triggered a
> > WARN_ON (which is implemented via ebreak as well).
> >
> > Now, handle_break() is entered, and the uprobe_breakpoint_handler()
> > returns true exiting the handle_break(), which returns to the WARN
> > ebreak, and we have merry-go-round.
> >
> > Lucky for the RISC-V folks, the BPF memory handler had a WARN that
> > surfaced the bug! ;-)
>
> Thanks for the analysis.
>
> I couldn't reproduce the problem, so I am just taking a guess here. The problem
> is bebcause uprobes didn't find a probe point at that ebreak instruction. However,
> it also doesn't think a ebreak instruction is there, then it got confused and just
> return back to the ebreak instruction, then everything repeats.
>
> The reason why uprobes didn't think there is a ebreak instruction is because
> is_trap_insn() only returns true if it is a 32-bit ebreak, or 16-bit c.ebreak if
> C extension is available, not both. So a 32-bit ebreak is not correctly recognized
> as a trap instruction.
I feel like I wasn't very clear with this: I was talking about handle_swbp() in
kernel/events/uprobes.c. In this function, the call to find_active_uprobe() should
return false. Then uprobes check if the trap instruction is still there by
calling is_trap_insn(), who correctly says "no". So uprobes assume it is safe to
just comeback to that address. If is_trap_insn() correctly returns true, then
uprobes would know that this is a ebreak, but not a probe, and handle thing correctly.
Best regards,
Nam
WARNING: multiple messages have this Message-ID (diff)
From: Nam Cao <namcaov@gmail.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: linux-riscv@lists.infradead.org, Guo Ren <guoren@kernel.org>,
bpf@vger.kernel.org, Hou Tao <houtao@huaweicloud.com>,
yonghong.song@linux.dev,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Puranjay Mohan <puranjay12@gmail.com>
Subject: Re: RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342)
Date: Sat, 26 Aug 2023 20:31:43 +0200 [thread overview]
Message-ID: <ZOpFD3W3RfiqOoWn@nam-dell> (raw)
In-Reply-To: <ZOpAjkTtA4jYtuIa@nam-dell>
On Sat, Aug 26, 2023 at 08:12:30PM +0200, Nam Cao wrote:
> On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
> > Björn Töpel <bjorn@kernel.org> writes:
> >
> > > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> > > selftests on bpf-next 9e3b47abeb8f.
> > >
> > > I'm able to reproduce the hang by multiple runs of:
> > > | ./test_progs -a link_api -a linked_list
> > > I'm currently investigating that.
> >
> > +Guo for uprobe
> >
> > This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
> > that puts the kernel into an infinite loop.
> >
> > To reproduce, simply run the BPF selftest:
> > ./test_progs -v -a link_api -a linked_list
> >
> > First the link_api test is being run, which exercises the uprobe
> > functionality. The link_api test completes, and test_progs will still
> > have the uprobe active/enabled. Next the linked_list test triggered a
> > WARN_ON (which is implemented via ebreak as well).
> >
> > Now, handle_break() is entered, and the uprobe_breakpoint_handler()
> > returns true exiting the handle_break(), which returns to the WARN
> > ebreak, and we have merry-go-round.
> >
> > Lucky for the RISC-V folks, the BPF memory handler had a WARN that
> > surfaced the bug! ;-)
>
> Thanks for the analysis.
>
> I couldn't reproduce the problem, so I am just taking a guess here. The problem
> is bebcause uprobes didn't find a probe point at that ebreak instruction. However,
> it also doesn't think a ebreak instruction is there, then it got confused and just
> return back to the ebreak instruction, then everything repeats.
>
> The reason why uprobes didn't think there is a ebreak instruction is because
> is_trap_insn() only returns true if it is a 32-bit ebreak, or 16-bit c.ebreak if
> C extension is available, not both. So a 32-bit ebreak is not correctly recognized
> as a trap instruction.
I feel like I wasn't very clear with this: I was talking about handle_swbp() in
kernel/events/uprobes.c. In this function, the call to find_active_uprobe() should
return false. Then uprobes check if the trap instruction is still there by
calling is_trap_insn(), who correctly says "no". So uprobes assume it is safe to
just comeback to that address. If is_trap_insn() correctly returns true, then
uprobes would know that this is a ebreak, but not a probe, and handle thing correctly.
Best regards,
Nam
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2023-08-26 18:31 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-25 10:32 WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342 Björn Töpel
2023-08-25 10:32 ` Björn Töpel
2023-08-25 15:28 ` Yonghong Song
2023-08-25 15:28 ` Yonghong Song
2023-08-25 18:53 ` Alexei Starovoitov
2023-08-25 18:53 ` Alexei Starovoitov
2023-08-25 19:49 ` Alexei Starovoitov
2023-08-25 19:49 ` Alexei Starovoitov
2023-08-25 21:31 ` Andrii Nakryiko
2023-08-25 21:31 ` Andrii Nakryiko
2023-08-26 22:49 ` Kumar Kartikeya Dwivedi
2023-08-26 22:49 ` Kumar Kartikeya Dwivedi
2023-08-26 3:48 ` Hou Tao
2023-08-26 3:48 ` Hou Tao
2023-08-26 9:23 ` Björn Töpel
2023-08-26 9:23 ` Björn Töpel
2023-08-26 10:27 ` Hou Tao
2023-08-26 10:27 ` Hou Tao
2023-08-26 10:49 ` Björn Töpel
2023-08-26 10:49 ` Björn Töpel
2023-08-27 8:37 ` Björn Töpel
2023-08-27 8:37 ` Björn Töpel
2023-08-27 14:53 ` Yonghong Song
2023-08-27 14:53 ` Yonghong Song
2023-08-28 13:57 ` Hou Tao
2023-08-28 13:57 ` Hou Tao
2023-08-29 0:54 ` Yonghong Song
2023-08-29 0:54 ` Yonghong Song
2023-08-29 7:26 ` Björn Töpel
2023-08-29 7:26 ` Björn Töpel
2023-08-29 11:46 ` Björn Töpel
2023-08-29 11:46 ` Björn Töpel
2023-08-30 12:15 ` Hou Tao
2023-08-30 12:15 ` Hou Tao
2023-08-29 12:54 ` Björn Töpel
2023-08-29 12:54 ` Björn Töpel
2023-08-29 15:26 ` Alexei Starovoitov
2023-08-29 15:26 ` Alexei Starovoitov
2023-08-30 12:08 ` Hou Tao
2023-08-30 12:08 ` Hou Tao
2023-08-30 21:05 ` Alexei Starovoitov
2023-08-30 21:05 ` Alexei Starovoitov
2023-08-26 13:44 ` RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342) Björn Töpel
2023-08-26 13:44 ` Björn Töpel
2023-08-26 18:12 ` Nam Cao
2023-08-26 18:12 ` Nam Cao
2023-08-26 18:31 ` Nam Cao [this message]
2023-08-26 18:31 ` Nam Cao
2023-08-27 8:11 ` Björn Töpel
2023-08-27 8:11 ` Björn Töpel
2023-08-27 8:35 ` Nam Cao
2023-08-27 8:35 ` Nam Cao
2023-08-27 9:04 ` Björn Töpel
2023-08-27 9:04 ` Björn Töpel
2023-08-27 9:39 ` Nam Cao
2023-08-27 9:39 ` Nam Cao
2023-08-27 19:20 ` Björn Töpel
2023-08-27 19:20 ` Björn Töpel
2023-08-27 19:41 ` Nam Cao
2023-08-27 19:41 ` Nam Cao
2023-08-27 20:15 ` Nam Cao
2023-08-27 20:15 ` Nam Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZOpFD3W3RfiqOoWn@nam-dell \
--to=namcaov@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=guoren@kernel.org \
--cc=houtao@huaweicloud.com \
--cc=linux-riscv@lists.infradead.org \
--cc=puranjay12@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.