From: Steven Rostedt <rostedt@goodmis.org>
To: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
linux-arm-kernel@lists.infradead.org,
Mark Rutland <mark.rutland@arm.com>
Subject: Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
Date: Fri, 29 Aug 2025 08:26:04 -0400
Message-ID: <20250829082604.1e3fd06e@gandalf.local.home>
In-Reply-To: <436e4fa7-f8c7-4c23-a28a-4e5eebe2f854@huaweicloud.com>
[ Adding arm64 maintainers ]
On Fri, 29 Aug 2025 16:29:07 +0800
Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> On 2025/8/20 1:50, Steven Rostedt wrote:
> > On Tue, 19 Aug 2025 10:51:52 +0000
> > Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> >
> >> Both tracing_mark_write and tracing_mark_raw_write call
> >> __copy_from_user_inatomic during preempt_disable. But in some cases,
> >> __copy_from_user_inatomic may trigger a page fault and end up calling
> >> schedule(). If the task is then migrated to another CPU, the following warning will
> > Wait! What?
> >
> > __copy_from_user_inatomic() is allowed to be called from in atomic context.
> > Hence the name it has. How the hell can it sleep? If it does, it's totally
> > broken!
> >
> > Now, I'm not against using nofault() as it is better named, but I want to
> > know why you are suggesting this change. Did you actually trigger a bug here?
>
> Yes, I triggered this bug on arm64.
And I still think this is an arm64 bug.
>
> >
> >> be triggered:
> >> if (RB_WARN_ON(cpu_buffer,
> >> !local_read(&cpu_buffer->committing)))
> >>
> >> An example can illustrate this issue:
> >>
> >> process flow CPU
> >> ---------------------------------------------------------------------
> >>
> >> tracing_mark_raw_write(): cpu:0
> >> ...
> >> ring_buffer_lock_reserve(): cpu:0
> >> ...
> >> cpu = raw_smp_processor_id() cpu:0
> >> cpu_buffer = buffer->buffers[cpu] cpu:0
> >> ...
> >> ...
> >> __copy_from_user_inatomic(): cpu:0
> >> ...
> >> # page fault
> >> do_mem_abort(): cpu:0
> > Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
> >
> >> ...
> >> # Call schedule
> >> schedule() cpu:0
> >> ...
> >> # the task is scheduled to cpu1
> >> __buffer_unlock_commit(): cpu:1
> >> ...
> >> ring_buffer_unlock_commit(): cpu:1
> >> ...
> >> cpu = raw_smp_processor_id() cpu:1
> >> cpu_buffer = buffer->buffers[cpu] cpu:1
> >>
> >> As shown above, the process will acquire cpuid twice and the return values
> >> are not the same.
> >>
> >> To fix this problem, use copy_from_user_nofault instead of
> >> __copy_from_user_inatomic, as the former performs 'access_ok' before
> >> copying.
> >>
> >> Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
> > The above commit was introduced in 2016. copy_from_user_nofault() was
> > introduced in 2020. I don't think this would be the fix for that kernel.
> >
> > So no, I'm not taking this patch. If you see __copy_from_user_inatomic()
> > sleeping, its users are not the issue. That function is.
> >
> > -- Steve
> >
> >
> I noticed that in most places where __copy_from_user_inatomic() is used,
"most" but not all?
> it is within the pagefault_disable/enable() section. When pagefault_disable()
> is called, user access methods will not sleep. So I'm going to send a v2
> patch which uses pagefault_disable/enable() to fix this problem.
>
> -- Gengkun
No, I don't want that either. __copy_from_user_inatomic() SHOULD NOT SLEEP!
If it does, then it is a bug!

If it can sleep, "inatomic" is a very bad name. The point of being
"inatomic" is that you are in a location that IS NOT ALLOWED TO SLEEP!

I don't want to fix a symptom and leave a bug around.

BTW, the reason not to fault is because this might be called in code that is
already doing a fault and could cause deadlocks. The no sleeping part is a
side effect.

-- Steve
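
For readers following the quoted process flow, the CPU mismatch can be
modeled in plain user-space C. This is only a rough sketch, not kernel code:
the struct and function names (cpu_buffer_model, model_reserve, model_commit,
simulate_write) and the current_cpu variable are hypothetical stand-ins for
the per-CPU ring buffer, ring_buffer_lock_reserve(),
ring_buffer_unlock_commit() and raw_smp_processor_id(). The "migration" is
simply a change of current_cpu between reserve and commit.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-in for the per-CPU ring buffer state. */
struct cpu_buffer_model {
	int committing;	/* models cpu_buffer->committing */
};

struct buffer_model {
	struct cpu_buffer_model buffers[NR_CPUS];
};

/* Models raw_smp_processor_id(): the CPU the task is running on. */
static int current_cpu;

static struct cpu_buffer_model *this_cpu_buffer(struct buffer_model *b)
{
	assert(current_cpu >= 0 && current_cpu < NR_CPUS);
	return &b->buffers[current_cpu];
}

/* Models the reserve step: look up the CPU, bump its commit count. */
static void model_reserve(struct buffer_model *b)
{
	this_cpu_buffer(b)->committing++;
}

/*
 * Models the commit step: look up the CPU again.
 * Returns 1 if the warning condition (committing == 0) would be hit.
 */
static int model_commit(struct buffer_model *b)
{
	struct cpu_buffer_model *cb = this_cpu_buffer(b);

	if (cb->committing == 0)
		return 1;
	cb->committing--;
	return 0;
}

/*
 * Models one trace_marker write. If 'migrate' is nonzero, the task
 * moves from CPU 0 to CPU 1 between reserve and commit, as it would
 * if the user copy slept and the scheduler moved the task.
 */
static int simulate_write(int migrate)
{
	struct buffer_model b = { 0 };

	current_cpu = 0;
	model_reserve(&b);
	if (migrate)
		current_cpu = 1;
	return model_commit(&b);
}
```

In this model simulate_write(0) returns 0 and simulate_write(1) returns 1,
because the commit looks at CPU 1's counter, which was never incremented.
That is why the copy between reserve and commit must not sleep.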