linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
@ 2025-08-19 10:51 Luo Gengkun
  2025-08-19 17:50 ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Luo Gengkun @ 2025-08-19 10:51 UTC (permalink / raw)
  To: rostedt
  Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
	luogengkun

Both tracing_mark_write and tracing_mark_raw_write call
__copy_from_user_inatomic with preemption disabled. But in some cases,
__copy_from_user_inatomic may trigger a page fault and subtly end up
calling schedule(). If the task is then migrated to another CPU, the
following warning is triggered:
        if (RB_WARN_ON(cpu_buffer,
                       !local_read(&cpu_buffer->committing)))

An example can illustrate this issue:

process flow						CPU
---------------------------------------------------------------------

tracing_mark_raw_write():				cpu:0
   ...
   ring_buffer_lock_reserve():				cpu:0
      ...
      cpu = raw_smp_processor_id()			cpu:0
      cpu_buffer = buffer->buffers[cpu]			cpu:0
      ...
   ...
   __copy_from_user_inatomic():				cpu:0
      ...
      # page fault
      do_mem_abort():					cpu:0
         ...
         # Call schedule
         schedule()					cpu:0
	 ...
   # the task schedule to cpu1
   __buffer_unlock_commit():				cpu:1
      ...
      ring_buffer_unlock_commit():			cpu:1
	 ...
	 cpu = raw_smp_processor_id()			cpu:1
	 cpu_buffer = buffer->buffers[cpu]		cpu:1

As shown above, the task reads the CPU id twice and the two reads return
different values.

To fix this problem, use copy_from_user_nofault() instead of
__copy_from_user_inatomic(): the former performs an access_ok() check and
disables page faults around the copy, so it can never sleep.

Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
---
 kernel/trace/trace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 95ae7c4e5835..a41d1acf15a0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7271,7 +7271,7 @@ static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user
 	entry = ring_buffer_event_data(event);
 	entry->ip = ip;
 
-	len = __copy_from_user_inatomic(&entry->buf, ubuf, cnt);
+	len = copy_from_user_nofault(&entry->buf, ubuf, cnt);
 	if (len) {
 		memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
 		cnt = FAULTED_SIZE;
@@ -7368,7 +7368,7 @@ static ssize_t write_raw_marker_to_buffer(struct trace_array *tr,
 
 	entry = ring_buffer_event_data(event);
 
-	len = __copy_from_user_inatomic(&entry->id, ubuf, cnt);
+	len = copy_from_user_nofault(&entry->id, ubuf, cnt);
 	if (len) {
 		entry->id = -1;
 		memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-19 10:51 [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable Luo Gengkun
@ 2025-08-19 17:50 ` Steven Rostedt
  2025-08-29  8:29   ` Luo Gengkun
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2025-08-19 17:50 UTC (permalink / raw)
  To: Luo Gengkun; +Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel

On Tue, 19 Aug 2025 10:51:52 +0000
Luo Gengkun <luogengkun@huaweicloud.com> wrote:

> Both tracing_mark_write and tracing_mark_raw_write call
> __copy_from_user_inatomic during preempt_disable. But in some case,
> __copy_from_user_inatomic may trigger page fault, and will call schedule()
> subtly. And if a task is migrated to other cpu, the following warning will

Wait! What?

__copy_from_user_inatomic() is allowed to be called from in atomic context.
Hence the name it has. How the hell can it sleep? If it does, it's totally
broken!

Now, I'm not against using nofault() as it is better named, but I want to
know why you are suggesting this change. Did you actually trigger a bug here?

> be trigger:
>         if (RB_WARN_ON(cpu_buffer,
>                        !local_read(&cpu_buffer->committing)))
> 
> An example can illustrate this issue:
> 
> process flow						CPU
> ---------------------------------------------------------------------
> 
> tracing_mark_raw_write():				cpu:0
>    ...
>    ring_buffer_lock_reserve():				cpu:0
>       ...
>       cpu = raw_smp_processor_id()			cpu:0
>       cpu_buffer = buffer->buffers[cpu]			cpu:0
>       ...
>    ...
>    __copy_from_user_inatomic():				cpu:0
>       ...
>       # page fault
>       do_mem_abort():					cpu:0

Sounds to me that arm64 __copy_from_user_inatomic() may be broken.

>          ...
>          # Call schedule
>          schedule()					cpu:0
> 	 ...
>    # the task schedule to cpu1
>    __buffer_unlock_commit():				cpu:1
>       ...
>       ring_buffer_unlock_commit():			cpu:1
> 	 ...
> 	 cpu = raw_smp_processor_id()			cpu:1
> 	 cpu_buffer = buffer->buffers[cpu]		cpu:1
> 
> As shown above, the process will acquire cpuid twice and the return values
> are not the same.
> 
> To fix this problem using copy_from_user_nofault instead of
> __copy_from_user_inatomic, as the former performs 'access_ok' before
> copying.
> 
> Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")

The above commit was introduced in 2016. copy_from_user_nofault() was
introduced in 2020. I don't think this would be the fix for that kernel.

So no, I'm not taking this patch. If you see __copy_from_user_inatomic()
sleeping, its users are not the issue. That function is.

-- Steve



> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-19 17:50 ` Steven Rostedt
@ 2025-08-29  8:29   ` Luo Gengkun
  2025-08-29 12:26     ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Luo Gengkun @ 2025-08-29  8:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel


On 2025/8/20 1:50, Steven Rostedt wrote:
> On Tue, 19 Aug 2025 10:51:52 +0000
> Luo Gengkun <luogengkun@huaweicloud.com> wrote:
>
>> Both tracing_mark_write and tracing_mark_raw_write call
>> __copy_from_user_inatomic during preempt_disable. But in some case,
>> __copy_from_user_inatomic may trigger page fault, and will call schedule()
>> subtly. And if a task is migrated to other cpu, the following warning will
> Wait! What?
>
> __copy_from_user_inatomic() is allowed to be called from in atomic context.
> Hence the name it has. How the hell can it sleep? If it does, it's totally
> broken!
>
> Now, I'm not against using nofault() as it is better named, but I want to
> know why you are suggesting this change. Did you actually trigger a bug here?

Yes, I triggered this bug on arm64.

>
>> be trigger:
>>          if (RB_WARN_ON(cpu_buffer,
>>                         !local_read(&cpu_buffer->committing)))
>>
>> An example can illustrate this issue:
>>
>> process flow						CPU
>> ---------------------------------------------------------------------
>>
>> tracing_mark_raw_write():				cpu:0
>>     ...
>>     ring_buffer_lock_reserve():				cpu:0
>>        ...
>>        cpu = raw_smp_processor_id()			cpu:0
>>        cpu_buffer = buffer->buffers[cpu]			cpu:0
>>        ...
>>     ...
>>     __copy_from_user_inatomic():				cpu:0
>>        ...
>>        # page fault
>>        do_mem_abort():					cpu:0
> Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
>
>>           ...
>>           # Call schedule
>>           schedule()					cpu:0
>> 	 ...
>>     # the task schedule to cpu1
>>     __buffer_unlock_commit():				cpu:1
>>        ...
>>        ring_buffer_unlock_commit():			cpu:1
>> 	 ...
>> 	 cpu = raw_smp_processor_id()			cpu:1
>> 	 cpu_buffer = buffer->buffers[cpu]		cpu:1
>>
>> As shown above, the process will acquire cpuid twice and the return values
>> are not the same.
>>
>> To fix this problem using copy_from_user_nofault instead of
>> __copy_from_user_inatomic, as the former performs 'access_ok' before
>> copying.
>>
>> Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
> The above commit was intorduced in 2016. copy_from_user_nofault() was
> introduced in 2020. I don't think this would be the fix for that kernel.
>
> So no, I'm not taking this patch. If you see __copy_from_user_inatomic()
> sleeping, it's users are not the issue. That function is.
>
> -- Steve
>
>
I noticed that in most places where __copy_from_user_inatomic() is used,
it is within a pagefault_disable()/enable() section. When pagefault_disable()
is called, the user access methods will not sleep. So I'm going to send a v2
patch which uses pagefault_disable()/enable() to fix this problem.

-- Gengkun

>
>> Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>



* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29  8:29   ` Luo Gengkun
@ 2025-08-29 12:26     ` Steven Rostedt
  2025-08-29 12:36       ` Steven Rostedt
  2025-09-01 15:56       ` Masami Hiramatsu
  0 siblings, 2 replies; 18+ messages in thread
From: Steven Rostedt @ 2025-08-29 12:26 UTC (permalink / raw)
  To: Luo Gengkun
  Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
	Catalin Marinas, Will Deacon, linux-arm-kernel, Mark Rutland


[ Adding arm64 maintainers ]

On Fri, 29 Aug 2025 16:29:07 +0800
Luo Gengkun <luogengkun@huaweicloud.com> wrote:

> On 2025/8/20 1:50, Steven Rostedt wrote:
> > On Tue, 19 Aug 2025 10:51:52 +0000
> > Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> >  
> >> Both tracing_mark_write and tracing_mark_raw_write call
> >> __copy_from_user_inatomic during preempt_disable. But in some case,
> >> __copy_from_user_inatomic may trigger page fault, and will call schedule()
> >> subtly. And if a task is migrated to other cpu, the following warning will  
> > Wait! What?
> >
> > __copy_from_user_inatomic() is allowed to be called from in atomic context.
> > Hence the name it has. How the hell can it sleep? If it does, it's totally
> > broken!
> >
> > Now, I'm not against using nofault() as it is better named, but I want to
> > know why you are suggesting this change. Did you actually trigger a bug here?  
> 
> yes, I trigger this bug in arm64.

And I still think this is an arm64 bug.

> 
> >  
> >> be trigger:
> >>          if (RB_WARN_ON(cpu_buffer,
> >>                         !local_read(&cpu_buffer->committing)))
> >>
> >> An example can illustrate this issue:
> >>
> >> process flow						CPU
> >> ---------------------------------------------------------------------
> >>
> >> tracing_mark_raw_write():				cpu:0
> >>     ...
> >>     ring_buffer_lock_reserve():				cpu:0
> >>        ...
> >>        cpu = raw_smp_processor_id()			cpu:0
> >>        cpu_buffer = buffer->buffers[cpu]			cpu:0
> >>        ...
> >>     ...
> >>     __copy_from_user_inatomic():				cpu:0
> >>        ...
> >>        # page fault
> >>        do_mem_abort():					cpu:0  
> > Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
> >  
> >>           ...
> >>           # Call schedule
> >>           schedule()					cpu:0
> >> 	 ...
> >>     # the task schedule to cpu1
> >>     __buffer_unlock_commit():				cpu:1
> >>        ...
> >>        ring_buffer_unlock_commit():			cpu:1
> >> 	 ...
> >> 	 cpu = raw_smp_processor_id()			cpu:1
> >> 	 cpu_buffer = buffer->buffers[cpu]		cpu:1
> >>
> >> As shown above, the process will acquire cpuid twice and the return values
> >> are not the same.
> >>
> >> To fix this problem using copy_from_user_nofault instead of
> >> __copy_from_user_inatomic, as the former performs 'access_ok' before
> >> copying.
> >>
> >> Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")  
> > The above commit was intorduced in 2016. copy_from_user_nofault() was
> > introduced in 2020. I don't think this would be the fix for that kernel.
> >
> > So no, I'm not taking this patch. If you see __copy_from_user_inatomic()
> > sleeping, it's users are not the issue. That function is.
> >
> > -- Steve
> >
> >  
> I noticed that in most places where __copy_from_user_inatomic() is used,

"most" but not all?

> it is within the pagefault_disable/enable() section. When pagefault_disable()
> is called, user access methods will no sleep. So I'm going to send a v2patch which use pagefault_disable/enable()to fix this problem. -- Gengkun

No, I don't want that either. __copy_from_user_inatomic() SHOULD NOT SLEEP!
If it does, then it is a bug!

If it can sleep, "inatomic" is a very bad name. The point of being
"inatomic" is that you are in a location that IS NOT ALLOWED TO SLEEP!

I don't want to fix a symptom and leave a bug around.

BTW, the reason not to fault is because this might be called in code that is
already doing a fault and could cause deadlocks. The no sleeping part is a
side effect.

-- Steve


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 12:26     ` Steven Rostedt
@ 2025-08-29 12:36       ` Steven Rostedt
  2025-08-29 19:53         ` Catalin Marinas
  2025-09-01 15:56       ` Masami Hiramatsu
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2025-08-29 12:36 UTC (permalink / raw)
  To: Luo Gengkun
  Cc: mhiramat, mathieu.desnoyers, linux-kernel, linux-trace-kernel,
	Catalin Marinas, Will Deacon, linux-arm-kernel, Mark Rutland

On Fri, 29 Aug 2025 08:26:04 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> BTW, the reason not to fault is because this might be called in code that is
> already doing a fault and could cause deadlocks. The no sleeping part is a
> side effect.

The difference between __copy_from_user_inatomic() and
copy_from_user_nofault() is the above. It is possible to fault in memory
without sleeping. For instance, the memory is already in the page cache,
but not the user space page tables. Where that would be OK for
__copy_from_user_inatomic() but not OK with copy_from_user_nofault(), due
to the mentioned locking.

For things like trace events and kprobes, copy_from_user_nofault() must be
used, because they can be attached to code that is already handling a fault,
and that version is needed to prevent deadlocks.

But here, the __copy_from_user_inatomic() is in the code to handle writing
to the trace_marker file. It is directly called from a user space system
call, and will never be called within code that faults. Thus,
__copy_from_user_inatomic() *is* the correct operation, as there's no
problem if it needs to fault. It just can't sleep when doing so.
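
For reference, copy_from_user_nofault() is roughly the following (a
simplified sketch of mm/maccess.c; the exact checks vary by kernel
version):

	/*
	 * Simplified sketch of copy_from_user_nofault(): validate the user
	 * pointer, then disable page faults around the inatomic copy, so the
	 * access can fail but can never fault in pages or sleep.
	 */
	long copy_from_user_nofault(void *dst, const void __user *src, size_t size)
	{
		long ret = -EFAULT;

		if (!__access_ok(src, size))
			return ret;

		pagefault_disable();
		ret = __copy_from_user_inatomic(dst, src, size);
		pagefault_enable();

		return ret ? -EFAULT : 0;
	}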

-- Steve


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 12:36       ` Steven Rostedt
@ 2025-08-29 19:53         ` Catalin Marinas
  2025-08-29 22:13           ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2025-08-29 19:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luo Gengkun, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Will Deacon, linux-arm-kernel, Mark Rutland,
	Al Viro

On Fri, Aug 29, 2025 at 08:36:55AM -0400, Steven Rostedt wrote:
> On Fri, 29 Aug 2025 08:26:04 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > BTW, the reason not to fault is because this might be called in code that is
> > already doing a fault and could cause deadlocks. The no sleeping part is a
> > side effect.
> 
> The difference between __copy_from_user_inatomic() and
> copy_from_user_nofault() is the above. It is possible to fault in memory
> without sleeping. For instance, the memory is already in the page cache,
> but not the user space page tables. Where that would be OK for
> __copy_from_user_inatomic() but not OK with copy_from_user_nofault(), due
> to the mentioned locking.

The semantics of __copy_from_user_inatomic() are not entirely clear. The
name implies it is to be used in atomic contexts but the documentation
also says that the caller should ensure there's no fault (well, the
comment is further down in uaccess.h for __copy_to_user_inatomic()).

The generic implementation uses raw_copy_from_user() in both atomic and
non-atomic variants. The difference is pretty much a might_fault() call.
So there's nothing arm64-specific here.

> For things like trace events and kprobes, copy_from_user_nofault() must be
> used because they can be added to code that is doing a fault, and this version
> must be used to prevent deadlocks.
> 
> But here, the __copy_from_user_inatomic() is in the code to handle writing
> to the trace_marker file. It is directly called from a user space system
> call, and will never be called within code that faults. Thus,
> __copy_from_user_inatomic() *is* the correct operation, as there's no
> problem if it needs to fault. It just can't sleep when doing so.

The problem is that it's the responsibility of the caller to ensure it
doesn't fault. In most cases, that's a pagefault_disable(). Or you just
go for copy_from_user_nofault() instead which actually checks that it's
a valid user address.
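
In code, the usual safe pattern looks like this (a sketch, not tied to
any particular caller):

	/*
	 * Sketch of the typical __copy_from_user_inatomic() caller pattern:
	 * with page faults disabled, the copy either succeeds from pages that
	 * are already mapped or returns the number of bytes NOT copied,
	 * without ever sleeping.
	 */
	pagefault_disable();
	ret = __copy_from_user_inatomic(dst, ubuf, cnt);
	pagefault_enable();
	if (ret)
		/* handle the partial copy, e.g. store a FAULTED marker */;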

BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
suspect that's not the case here.

Adding Al Viro, since he wrote a large part of uaccess.h.

-- 
Catalin


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 19:53         ` Catalin Marinas
@ 2025-08-29 22:13           ` Steven Rostedt
  2025-08-30 10:22             ` Catalin Marinas
                               ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Steven Rostedt @ 2025-08-29 22:13 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Luo Gengkun, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Will Deacon, linux-arm-kernel, Mark Rutland,
	Al Viro

On Fri, 29 Aug 2025 20:53:40 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> suspect that's not the case here.
> 
> Adding Al Viro since since he wrote a large part of uaccess.h.
> 

So, __copy_from_user_inatomic() is supposed to be called if
pagefault_disable() has already been called? If this is the case, can we
add more comments to this code? I've been using the inatomic() version this
way in preempt disabled locations since 2016.

Looks like it needs to be converted to copy_from_user_nofault().

Luo, this version of the patch looks legit, no need for a v2.

I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
If anything, it needs to be better documented.

-- Steve


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 22:13           ` Steven Rostedt
@ 2025-08-30 10:22             ` Catalin Marinas
  2025-09-01  9:56               ` Mark Rutland
  2025-09-01  9:43             ` Mark Rutland
  2025-09-01 16:01             ` Masami Hiramatsu
  2 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2025-08-30 10:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luo Gengkun, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Will Deacon, linux-arm-kernel, Mark Rutland,
	Al Viro

On Fri, Aug 29, 2025 at 06:13:11PM -0400, Steven Rostedt wrote:
> On Fri, 29 Aug 2025 20:53:40 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> > suspect that's not the case here.
> > 
> > Adding Al Viro since since he wrote a large part of uaccess.h.
> 
> So, __copy_from_user_inatomic() is supposed to be called if
> pagefault_disable() has already been called? If this is the case, can we
> add more comments to this code? I've been using the inatomic() version this
> way in preempt disabled locations since 2016.

This should work as long as in_atomic() returns true as it's checked in
the arm64 fault code. With PREEMPT_NONE, however, I don't think this
works. __copy_from_user_inatomic() could be changed to call
pagefault_disable() if !in_atomic() but you might as well call
copy_from_user_nofault() in the trace code directly as per Luo's patch.

> I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> If anything, it needs to be better documented.

Yeah, I had no idea until I looked at the code. I guess it means it can
be safely used if in_atomic() == true (well, making it up, not sure what
the intention was).

-- 
Catalin


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 22:13           ` Steven Rostedt
  2025-08-30 10:22             ` Catalin Marinas
@ 2025-09-01  9:43             ` Mark Rutland
  2025-09-02 14:11               ` Steven Rostedt
  2025-09-01 16:01             ` Masami Hiramatsu
  2 siblings, 1 reply; 18+ messages in thread
From: Mark Rutland @ 2025-09-01  9:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Catalin Marinas, Luo Gengkun, mhiramat, mathieu.desnoyers,
	linux-kernel, linux-trace-kernel, Will Deacon, linux-arm-kernel,
	Al Viro

On Fri, Aug 29, 2025 at 06:13:11PM -0400, Steven Rostedt wrote:
> On Fri, 29 Aug 2025 20:53:40 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> > BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> > suspect that's not the case here.
> > 
> > Adding Al Viro since since he wrote a large part of uaccess.h.
> > 
> 
> So, __copy_from_user_inatomic() is supposed to be called if
> pagefault_disable() has already been called? If this is the case, can we
> add more comments to this code?

Just to check, you're asking for better comments in <linux/uaccess.h>,
right?

> I've been using the inatomic() version this
> way in preempt disabled locations since 2016.
> 
> Looks like it needs to be converted to copy_from_user_nofault().
> 
> Luo, this version of the patch looks legit, no need for a v2.
> 
> I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> If anything, it needs to be better documented.

If that had roughly the same kerneldoc comment as for
__copy_to_user_inatomic(), would that be sufficient, or do you think
both need to be made more explicit?

Mark.


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-30 10:22             ` Catalin Marinas
@ 2025-09-01  9:56               ` Mark Rutland
  2025-09-01 12:28                 ` Catalin Marinas
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Rutland @ 2025-09-01  9:56 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Steven Rostedt, Luo Gengkun, mhiramat, mathieu.desnoyers,
	linux-kernel, linux-trace-kernel, Will Deacon, linux-arm-kernel,
	Al Viro

On Sat, Aug 30, 2025 at 11:22:51AM +0100, Catalin Marinas wrote:
> On Fri, Aug 29, 2025 at 06:13:11PM -0400, Steven Rostedt wrote:
> > On Fri, 29 Aug 2025 20:53:40 +0100
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> > > suspect that's not the case here.
> > > 
> > > Adding Al Viro since since he wrote a large part of uaccess.h.
> > 
> > So, __copy_from_user_inatomic() is supposed to be called if
> > pagefault_disable() has already been called? If this is the case, can we
> > add more comments to this code? I've been using the inatomic() version this
> > way in preempt disabled locations since 2016.
> 
> This should work as long as in_atomic() returns true as it's checked in
> the arm64 fault code. With PREEMPT_NONE, however, I don't think this
> works.

Sorry, what exactly breaks for the PREEMPT_NONE case?

> __copy_from_user_inatomic() could be changed to call
> pagefault_disable() if !in_atomic() but you might as well call
> copy_from_user_nofault() in the trace code directly as per Luo's patch.

That makes sense to me. I'll go check the arm64 users of
__copy_from_user_inatomic(), in case they're doing something dodgy.

> > I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> > If anything, it needs to be better documented.
> 
> Yeah, I had no idea until I looked at the code. I guess it means it can
> be safely used if in_atomic() == true (well, making it up, not sure what
> the intention was).

I think that was the intention -- it's the caller asserting that they
know the access won't fault (and hence won't sleep), and that's why
__copy_to_user_inatomic() and __copy_to_user() only differ by the latter
calling might_sleep().

It looks like other inconsistencies have crept in by accident. AFAICT
the should_fail_usercopy() check in __copy_from_user() was accidentally
missed from __copy_from_user_inatomic() when it was introduced in
commit:

  4d0e9df5e43dba52 ("lib, uaccess: add failure injection to usercopy functions")

... and the instrumentation is ordered inconsistently w.r.t.
should_fail_usercopy() since commit:

  33b75c1d884e81ec ("instrumented.h: allow instrumenting both sides of copy_from_user()")

... so there's a bunch of scope for cleanup, and we could probably have:

	/*
	 * TODO: comment saying to only call this directly when you know
	 * that the fault handler won't sleep.
	 */
	static __always_inline __must_check unsigned long
	__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
	{
		...
	}

	static __always_inline __must_check unsigned long
	__copy_from_user(void *to, const void __user *from, unsigned long n)
	{
		might_fault();
		return __copy_from_user_inatomic(to, from, n);
	}

... to avoid the inconsistency.

Mark.


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-09-01  9:56               ` Mark Rutland
@ 2025-09-01 12:28                 ` Catalin Marinas
  2025-09-01 13:07                   ` Mark Rutland
  0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2025-09-01 12:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Steven Rostedt, Luo Gengkun, mhiramat, mathieu.desnoyers,
	linux-kernel, linux-trace-kernel, Will Deacon, linux-arm-kernel,
	Al Viro

On Mon, Sep 01, 2025 at 10:56:47AM +0100, Mark Rutland wrote:
> On Sat, Aug 30, 2025 at 11:22:51AM +0100, Catalin Marinas wrote:
> > On Fri, Aug 29, 2025 at 06:13:11PM -0400, Steven Rostedt wrote:
> > > On Fri, 29 Aug 2025 20:53:40 +0100
> > > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> > > > suspect that's not the case here.
> > > > 
> > > > Adding Al Viro since since he wrote a large part of uaccess.h.
> > > 
> > > So, __copy_from_user_inatomic() is supposed to be called if
> > > pagefault_disable() has already been called? If this is the case, can we
> > > add more comments to this code? I've been using the inatomic() version this
> > > way in preempt disabled locations since 2016.
> > 
> > This should work as long as in_atomic() returns true as it's checked in
> > the arm64 fault code. With PREEMPT_NONE, however, I don't think this
> > works.
> 
> Sorry, what exactly breaks for the PREEMPT_NONE case?

This code would trigger a warning:

	preempt_disable();
	WARN_ON(!in_atomic());
	preempt_enable();

More importantly, a faulting __copy_from_user_inatomic() between
get/put_cpu() could trigger migration.

> > > I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> > > If anything, it needs to be better documented.
> > 
> > Yeah, I had no idea until I looked at the code. I guess it means it can
> > be safely used if in_atomic() == true (well, making it up, not sure what
> > the intention was).
> 
> I think that was the intention -- it's the caller asserting that they
> know the access won't fault (and hence won't sleep), and that's why
> __copy_to_user_inatomic() and __copy_to_user() only differ by the latter
> calling might_sleep().
> 
> It looks like other inconsistencies have crept in by accident. AFAICT
> the should_fail_usercopy() check in __copy_from_user() was accidentally
> missed from __copy_from_user_inatomic() when it was introduced in
> commit:

I was wondering about that, but some code comment for the inatomic
variant states that it's the responsibility of the caller to ensure it
doesn't fault. Not sure one can do that other than by pinning the page
_and_ taking the mm lock. So I agree we should add the fault injection
here as well.

> ... so there's a bunch of scope for cleanup, and we could probably have:
> 
> 	/*
> 	 * TODO: comment saying to only call this directly when you know
> 	 * that the fault handler won't sleep.
> 	 */
> 	static __always_inline __must_check unsigned long
> 	__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
> 	{
> 		...
> 	}
> 
> 	static __always_inline __must_check unsigned long
> 	__copy_from_user(void *to, const void __user *from, unsigned long n)
> 	{
> 		might_fault();
> 		return __copy_from_user_inatomic();
> 	}
> 
> ... to avoid the inconsistency.

I think the _inatomic variant should be reserved for arch code that knows
the conditions. Generic code/drivers may not necessarily be aware of
what the arch fault handler does. The _nofault API, I think, is better
suited for generic code.

-- 
Catalin


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-09-01 12:28                 ` Catalin Marinas
@ 2025-09-01 13:07                   ` Mark Rutland
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2025-09-01 13:07 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Steven Rostedt, Luo Gengkun, mhiramat, mathieu.desnoyers,
	linux-kernel, linux-trace-kernel, Will Deacon, linux-arm-kernel,
	Al Viro

On Mon, Sep 01, 2025 at 01:28:01PM +0100, Catalin Marinas wrote:
> On Mon, Sep 01, 2025 at 10:56:47AM +0100, Mark Rutland wrote:
> > On Sat, Aug 30, 2025 at 11:22:51AM +0100, Catalin Marinas wrote:
> > > On Fri, Aug 29, 2025 at 06:13:11PM -0400, Steven Rostedt wrote:
> > > > On Fri, 29 Aug 2025 20:53:40 +0100
> > > > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > > BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> > > > > suspect that's not the case here.
> > > > > 
> > > > > Adding Al Viro since since he wrote a large part of uaccess.h.
> > > > 
> > > > So, __copy_from_user_inatomic() is supposed to be called if
> > > > pagefault_disable() has already been called? If this is the case, can we
> > > > add more comments to this code? I've been using the inatomic() version this
> > > > way in preempt disabled locations since 2016.
> > > 
> > > This should work as long as in_atomic() returns true as it's checked in
> > > the arm64 fault code. With PREEMPT_NONE, however, I don't think this
> > > works.
> > 
> > Sorry, what exactly breaks for the PREEMPT_NONE case?
> 
> This code would trigger a warning:
> 
> 	preempt_disable();
> 	WARN_ON(!in_atomic());
> 	preempt_enable();

Ah, you mean in the absence of pagefault_disable()..pagefault_enable().

The page fault handling code uses faulthandler_disabled(), which checks
whether either pagefault_disabled() or in_atomic() are true, and aborts
if either are. Given that, using pagefault_disable() should work fine on
PREEMPT_NONE.
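
For reference, that check is a one-line macro; roughly, as in
include/linux/uaccess.h:

	/*
	 * The fault handler bails out (and the uaccess fails) when page
	 * faults were explicitly disabled, or when we are in an atomic
	 * context.
	 */
	#define faulthandler_disabled() (pagefault_disabled() || in_atomic())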

> More importantly, a faulting __copy_from_user_inatomic() between
> get/put_cpu() could trigger migration.

Yep, in the absence of pagefault_disable().

> > > > I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> > > > If anything, it needs to be better documented.
> > > 
> > > Yeah, I had no idea until I looked at the code. I guess it means it can
> > > be safely used if in_atomic() == true (well, making it up, not sure what
> > > the intention was).
> > 
> > I think that was the intention -- it's the caller asserting that they
> > know the access won't fault (and hence won't sleep), and that's why
> > __copy_to_user_inatomic() and __copy_to_user() only differ by the latter
> > calling might_sleep().
> > 
> > It looks like other inconsistencies have crept in by accident. AFAICT
> > the should_fail_usercopy() check in __copy_from_user() was accidentally
> > missed from __copy_from_user_inatomic() when it was introduced in
> > commit:
> 
> I was wondering about that but some code comment for the inatomic
> variant states that it's the responsibility of the caller to ensure it
> doesn't fault.

I think you mean the kerneldoc comment for __copy_to_user_inatomic(),
which says:

| The caller should also make sure he pins the user space address
| so that we don't result in page fault and sleep.

... and I think the key aspect is to avoid the sleeping, and actually
taking a fault (and failing the uaccess) has to be fine, or the _nofault
API (which uses the _inatomic API) is broken by design.

I think the bit about pinning the address space is misleading.

> Not sure one can do other than pinning the page _and_ taking the mm
> lock. So I agree we should add the fault injection here as well.

Cool.

> > ... so there's a bunch of scope for cleanup, and we could probably have:
> > 
> > 	/*
> > 	 * TODO: comment saying to only call this directly when you know
> > 	 * that the fault handler won't sleep.
> > 	 */
> > 	static __always_inline __must_check unsigned long
> > 	__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
> > 	{
> > 		...
> > 	}
> > 
> > 	static __always_inline __must_check unsigned long
> > 	__copy_from_user(void *to, const void __user *from, unsigned long n)
> > 	{
> > 		might_fault();
> > 		return __copy_from_user_inatomic();
> > 	}
> > 
> > ... to avoid the inconsistency.
> 
> I think the _inatomic variant should be reserved to arch code that knows
> the conditions. Generic code/drivers may not necessarily be aware of
> what the arch fault handler does. The _nofault API I think is better
> suited in generic code.

I agree. In almost all situations it's better for code to use the
_nofault API.

Mark.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 12:26     ` Steven Rostedt
  2025-08-29 12:36       ` Steven Rostedt
@ 2025-09-01 15:56       ` Masami Hiramatsu
  2025-09-02  3:47         ` Luo Gengkun
  1 sibling, 1 reply; 18+ messages in thread
From: Masami Hiramatsu @ 2025-09-01 15:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Luo Gengkun, mhiramat, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Mark Rutland

On Fri, 29 Aug 2025 08:26:04 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> [ Adding arm64 maintainers ]
> 
> On Fri, 29 Aug 2025 16:29:07 +0800
> Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> 
> > On 2025/8/20 1:50, Steven Rostedt wrote:
> > > On Tue, 19 Aug 2025 10:51:52 +0000
> > > Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> > >  
> > >> Both tracing_mark_write and tracing_mark_raw_write call
> > >> __copy_from_user_inatomic during preempt_disable. But in some case,
> > >> __copy_from_user_inatomic may trigger page fault, and will call schedule()
> > >> subtly. And if a task is migrated to other cpu, the following warning will  
> > > Wait! What?
> > >
> > > __copy_from_user_inatomic() is allowed to be called from in atomic context.
> > > Hence the name it has. How the hell can it sleep? If it does, it's totally
> > > broken!
> > >
> > > Now, I'm not against using nofault() as it is better named, but I want to
> > > know why you are suggesting this change. Did you actually trigger a bug here?  
> > 
> > yes, I triggered this bug on arm64.
> 
> And I still think this is an arm64 bug.

I think it could be.

> > >  
> > >> be trigger:
> > >>          if (RB_WARN_ON(cpu_buffer,
> > >>                         !local_read(&cpu_buffer->committing)))
> > >>
> > >> An example can illustrate this issue:

You've missed an important part.

> > >>
> > >> process flow						CPU
> > >> ---------------------------------------------------------------------
> > >>
> > >> tracing_mark_raw_write():				cpu:0
> > >>     ...
> > >>     ring_buffer_lock_reserve():				cpu:0
> > >>        ...

	preempt_disable_notrace(); --> this is unlocked by ring_buffer_unlock_commit()

> > >>        cpu = raw_smp_processor_id()			cpu:0
> > >>        cpu_buffer = buffer->buffers[cpu]			cpu:0
> > >>        ...
> > >>     ...
> > >>     __copy_from_user_inatomic():				cpu:0

So this is called under preempt-disabled.

> > >>        ...
> > >>        # page fault
> > >>        do_mem_abort():					cpu:0  
> > > Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
> > >  
> > >>           ...
> > >>           # Call schedule
> > >>           schedule()					cpu:0

If this does not check the preempt flag, it is a problem.
Maybe arm64 needs to do fixup and abort instead of do_mem_abort()?

> > >> 	 ...
> > >>     # the task schedule to cpu1
> > >>     __buffer_unlock_commit():				cpu:1
> > >>        ...
> > >>        ring_buffer_unlock_commit():			cpu:1
> > >> 	 ...
> > >> 	 cpu = raw_smp_processor_id()			cpu:1
> > >> 	 cpu_buffer = buffer->buffers[cpu]		cpu:1

	preempt_enable_notrace(); <-- here we enable preempt again.

> > >>
> > >> As shown above, the process will acquire cpuid twice and the return values
> > >> are not the same.
> > >>
> > >> To fix this problem using copy_from_user_nofault instead of
> > >> __copy_from_user_inatomic, as the former performs 'access_ok' before
> > >> copying.
> > >>
> > >> Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")  
> > > The above commit was introduced in 2016. copy_from_user_nofault() was
> > > introduced in 2020. I don't think this would be the fix for that kernel.
> > >
> > > So no, I'm not taking this patch. If you see __copy_from_user_inatomic()
> > > sleeping, its users are not the issue. That function is.


BTW, the biggest difference between __copy_from_user() and
 __copy_from_user_inatomic() is `might_fault()` and `should_fail_usercopy()`.
The latter is a fault injection, so we can ignore it. But since
`might_fault()` is NOT in __copy_from_user_inatomic(), was it designed
not to cause a fault, as Steve said?

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-08-29 22:13           ` Steven Rostedt
  2025-08-30 10:22             ` Catalin Marinas
  2025-09-01  9:43             ` Mark Rutland
@ 2025-09-01 16:01             ` Masami Hiramatsu
  2 siblings, 0 replies; 18+ messages in thread
From: Masami Hiramatsu @ 2025-09-01 16:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Catalin Marinas, Luo Gengkun, mhiramat, mathieu.desnoyers,
	linux-kernel, linux-trace-kernel, Will Deacon, linux-arm-kernel,
	Mark Rutland, Al Viro

On Fri, 29 Aug 2025 18:13:11 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 29 Aug 2025 20:53:40 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> valid user address.
> > 
> > BTW, arm64 also bails out early in do_page_fault() if in_atomic() but I
> > suspect that's not the case here.
> > 
> > Adding Al Viro since he wrote a large part of uaccess.h.
> > 
> 
> So, __copy_from_user_inatomic() is supposed to be called if
> pagefault_disable() has already been called? If this is the case, can we
> add more comments to this code? I've been using the inatomic() version this
> way in preempt disabled locations since 2016.

Ah, OK, it is an internal version... please ignore my previous mail.

Thanks,

> 
> Looks like it needs to be converted to copy_from_user_nofault().
> 
> Luo, this version of the patch looks legit, no need for a v2.
> 
> I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> If anything, it needs to be better documented.
> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-09-01 15:56       ` Masami Hiramatsu
@ 2025-09-02  3:47         ` Luo Gengkun
  2025-09-02  7:35           ` Masami Hiramatsu
  2025-09-02 14:14           ` Steven Rostedt
  0 siblings, 2 replies; 18+ messages in thread
From: Luo Gengkun @ 2025-09-02  3:47 UTC (permalink / raw)
  To: Masami Hiramatsu (Google), Steven Rostedt
  Cc: mathieu.desnoyers, linux-kernel, linux-trace-kernel,
	Catalin Marinas, Will Deacon, linux-arm-kernel, Mark Rutland


On 2025/9/1 23:56, Masami Hiramatsu (Google) wrote:
> On Fri, 29 Aug 2025 08:26:04 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> [ Adding arm64 maintainers ]
>>
>> On Fri, 29 Aug 2025 16:29:07 +0800
>> Luo Gengkun <luogengkun@huaweicloud.com> wrote:
>>
>>> On 2025/8/20 1:50, Steven Rostedt wrote:
>>>> On Tue, 19 Aug 2025 10:51:52 +0000
>>>> Luo Gengkun <luogengkun@huaweicloud.com> wrote:
>>>>   
>>>>> Both tracing_mark_write and tracing_mark_raw_write call
>>>>> __copy_from_user_inatomic during preempt_disable. But in some case,
>>>>> __copy_from_user_inatomic may trigger page fault, and will call schedule()
>>>>> subtly. And if a task is migrated to other cpu, the following warning will
>>>> Wait! What?
>>>>
>>>> __copy_from_user_inatomic() is allowed to be called from in atomic context.
>>>> Hence the name it has. How the hell can it sleep? If it does, it's totally
>>>> broken!
>>>>
>>>> Now, I'm not against using nofault() as it is better named, but I want to
>>>> know why you are suggesting this change. Did you actually trigger a bug here?
>>> yes, I triggered this bug on arm64.
>> And I still think this is an arm64 bug.
> I think it could be.
>
>>>>   
>>>>> be trigger:
>>>>>           if (RB_WARN_ON(cpu_buffer,
>>>>>                          !local_read(&cpu_buffer->committing)))
>>>>>
>>>>> An example can illustrate this issue:
> You've missed an important part.
>
>>>>> process flow						CPU
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>> tracing_mark_raw_write():				cpu:0
>>>>>      ...
>>>>>      ring_buffer_lock_reserve():				cpu:0
>>>>>         ...
> 	preempt_disable_notrace(); --> this is unlocked by ring_buffer_unlock_commit()
>
>>>>>         cpu = raw_smp_processor_id()			cpu:0
>>>>>         cpu_buffer = buffer->buffers[cpu]			cpu:0
>>>>>         ...
>>>>>      ...
>>>>>      __copy_from_user_inatomic():				cpu:0
> So this is called under preempt-disabled.
>
>>>>>         ...
>>>>>         # page fault
>>>>>         do_mem_abort():					cpu:0
>>>> Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
>>>>   
>>>>>            ...
>>>>>            # Call schedule
>>>>>            schedule()					cpu:0
> If this does not check the preempt flag, it is a problem.
> Maybe arm64 needs to do fixup and abort instead of do_mem_abort()?

My kernel was built without CONFIG_PREEMPT_COUNT, so preempt_disable()
does nothing more than act as a barrier. In this case, the task can pass
the preemption check in schedule(). Perhaps this is another issue?

>
>>>>> 	 ...
>>>>>      # the task schedule to cpu1
>>>>>      __buffer_unlock_commit():				cpu:1
>>>>>         ...
>>>>>         ring_buffer_unlock_commit():			cpu:1
>>>>> 	 ...
>>>>> 	 cpu = raw_smp_processor_id()			cpu:1
>>>>> 	 cpu_buffer = buffer->buffers[cpu]		cpu:1
> 	preempt_enable_notrace(); <-- here we enable preempt again.
>
>>>>> As shown above, the process will acquire cpuid twice and the return values
>>>>> are not the same.
>>>>>
>>>>> To fix this problem using copy_from_user_nofault instead of
>>>>> __copy_from_user_inatomic, as the former performs 'access_ok' before
>>>>> copying.
>>>>>
>>>>> Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
>>>> The above commit was introduced in 2016. copy_from_user_nofault() was
>>>> introduced in 2020. I don't think this would be the fix for that kernel.
>>>>
>>>> So no, I'm not taking this patch. If you see __copy_from_user_inatomic()
>>>> sleeping, its users are not the issue. That function is.
>
> BTW, the biggest difference between __copy_from_user() and
>   __copy_from_user_inatomic() is `might_fault()` and `should_fail_usercopy()`.
> The latter is a fault injection, so we can ignore it. But since
> `might_fault()` is NOT in __copy_from_user_inatomic(), was it designed
> not to cause a fault, as Steve said?
>
> Thank you,
>



* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-09-02  3:47         ` Luo Gengkun
@ 2025-09-02  7:35           ` Masami Hiramatsu
  2025-09-02 14:14           ` Steven Rostedt
  1 sibling, 0 replies; 18+ messages in thread
From: Masami Hiramatsu @ 2025-09-02  7:35 UTC (permalink / raw)
  To: Luo Gengkun
  Cc: Steven Rostedt, mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Mark Rutland

On Tue, 2 Sep 2025 11:47:32 +0800
Luo Gengkun <luogengkun@huaweicloud.com> wrote:

> 
> On 2025/9/1 23:56, Masami Hiramatsu (Google) wrote:
> > On Fri, 29 Aug 2025 08:26:04 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> >> [ Adding arm64 maintainers ]
> >>
> >> On Fri, 29 Aug 2025 16:29:07 +0800
> >> Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> >>
> >>> On 2025/8/20 1:50, Steven Rostedt wrote:
> >>>> On Tue, 19 Aug 2025 10:51:52 +0000
> >>>> Luo Gengkun <luogengkun@huaweicloud.com> wrote:
> >>>>   
> >>>>> Both tracing_mark_write and tracing_mark_raw_write call
> >>>>> __copy_from_user_inatomic during preempt_disable. But in some case,
> >>>>> __copy_from_user_inatomic may trigger page fault, and will call schedule()
> >>>>> subtly. And if a task is migrated to other cpu, the following warning will
> >>>> Wait! What?
> >>>>
> >>>> __copy_from_user_inatomic() is allowed to be called from in atomic context.
> >>>> Hence the name it has. How the hell can it sleep? If it does, it's totally
> >>>> broken!
> >>>>
> >>>> Now, I'm not against using nofault() as it is better named, but I want to
> >>>> know why you are suggesting this change. Did you actually trigger a bug here?
> >>> yes, I triggered this bug on arm64.
> >> And I still think this is an arm64 bug.
> > I think it could be.
> >
> >>>>   
> >>>>> be trigger:
> >>>>>           if (RB_WARN_ON(cpu_buffer,
> >>>>>                          !local_read(&cpu_buffer->committing)))
> >>>>>
> >>>>> An example can illustrate this issue:
> > You've missed an important part.
> >
> >>>>> process flow						CPU
> >>>>> ---------------------------------------------------------------------
> >>>>>
> >>>>> tracing_mark_raw_write():				cpu:0
> >>>>>      ...
> >>>>>      ring_buffer_lock_reserve():				cpu:0
> >>>>>         ...
> > 	preempt_disable_notrace(); --> this is unlocked by ring_buffer_unlock_commit()
> >
> >>>>>         cpu = raw_smp_processor_id()			cpu:0
> >>>>>         cpu_buffer = buffer->buffers[cpu]			cpu:0
> >>>>>         ...
> >>>>>      ...
> >>>>>      __copy_from_user_inatomic():				cpu:0
> > So this is called under preempt-disabled.
> >
> >>>>>         ...
> >>>>>         # page fault
> >>>>>         do_mem_abort():					cpu:0
> >>>> Sounds to me that arm64 __copy_from_user_inatomic() may be broken.
> >>>>   
> >>>>>            ...
> >>>>>            # Call schedule
> >>>>>            schedule()					cpu:0
> > If this does not check the preempt flag, it is a problem.
> > Maybe arm64 needs to do fixup and abort instead of do_mem_abort()?
> 
> My kernel was built without CONFIG_PREEMPT_COUNT, so preempt_disable()
> does nothing more than act as a barrier. In this case, the task can pass
> the preemption check in schedule(). Perhaps this is another issue?

OK, I got it. Indeed, in that case, we have no way to check whether this
happens inside a preempt critical section.
Anyway, as discussed here, __copy_from_user_inatomic() is an internal
function, so I'm also OK with this patch.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

BTW, currently we just write a fault message if
__copy_from_user_*() hits a fault, but I think we can retry with a
normal __copy_from_user() to a kernel buffer and copy that into the
ring buffer as a slow path.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-09-01  9:43             ` Mark Rutland
@ 2025-09-02 14:11               ` Steven Rostedt
  0 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2025-09-02 14:11 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, Luo Gengkun, mhiramat, mathieu.desnoyers,
	linux-kernel, linux-trace-kernel, Will Deacon, linux-arm-kernel,
	Al Viro

On Mon, 1 Sep 2025 10:43:36 +0100
Mark Rutland <mark.rutland@arm.com> wrote:

> > So, __copy_from_user_inatomic() is supposed to be called if
> > pagefault_disable() has already been called? If this is the case, can we
> > add more comments to this code?  
> 
> Just to check, you're asking for better comments in <linux/uaccess.h>,
> right?

Yes.

> 
> > I've been using the inatomic() version this
> > way in preempt disabled locations since 2016.
> > 
> > Looks like it needs to be converted to copy_from_user_nofault().
> > 
> > Luo, this version of the patch looks legit, no need for a v2.
> > 
> > I just wanted to figure out why __copy_from_user_inatomic() wasn't atomic.
> > If anything, it needs to be better documented.  
> 
> If that had roughly the same kerneldoc comment as for
> __copy_to_user_inatomic(), would that be sufficient, or do you think
> both need to be made more explicit?

Yeah, it would have been very helpful to have that. Note, my use of
__copy_from_user_inatomic() was added before copy_from_user_nofault()
existed. So it would likely not have been helpful back then ;-)

I think it would even be worth adding a comment to the kerneldoc of
these functions, something like:

  You probably want to use copy_from/to_user_nofault(). These are more
  for internal handling, and should be avoided unless you really know
  what you are doing.

-- Steve



* Re: [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable
  2025-09-02  3:47         ` Luo Gengkun
  2025-09-02  7:35           ` Masami Hiramatsu
@ 2025-09-02 14:14           ` Steven Rostedt
  1 sibling, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2025-09-02 14:14 UTC (permalink / raw)
  To: Luo Gengkun
  Cc: Masami Hiramatsu (Google), mathieu.desnoyers, linux-kernel,
	linux-trace-kernel, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Mark Rutland

On Tue, 2 Sep 2025 11:47:32 +0800
Luo Gengkun <luogengkun@huaweicloud.com> wrote:

> > If this does not check the preempt flag, it is a problem.
> > Maybe arm64 needs to do fixup and abort instead of do_mem_abort()?  
> 
> My kernel was built without CONFIG_PREEMPT_COUNT, so the preempt_disable()
> does nothing more than act as a barrier. In this case, it can pass the
> check by schedule(). Perhaps this is another issue?

This is why I never triggered it. I always have PREEMPT_COUNT enabled.
I have tests that test without it, but I don't think those tests access
trace_marker, and if they do, they don't stress it.

-- Steve


end of thread, other threads:[~2025-09-02 14:15 UTC | newest]

Thread overview: 18+ messages
2025-08-19 10:51 [PATCH] tracing: Fix tracing_marker may trigger page fault during preempt_disable Luo Gengkun
2025-08-19 17:50 ` Steven Rostedt
2025-08-29  8:29   ` Luo Gengkun
2025-08-29 12:26     ` Steven Rostedt
2025-08-29 12:36       ` Steven Rostedt
2025-08-29 19:53         ` Catalin Marinas
2025-08-29 22:13           ` Steven Rostedt
2025-08-30 10:22             ` Catalin Marinas
2025-09-01  9:56               ` Mark Rutland
2025-09-01 12:28                 ` Catalin Marinas
2025-09-01 13:07                   ` Mark Rutland
2025-09-01  9:43             ` Mark Rutland
2025-09-02 14:11               ` Steven Rostedt
2025-09-01 16:01             ` Masami Hiramatsu
2025-09-01 15:56       ` Masami Hiramatsu
2025-09-02  3:47         ` Luo Gengkun
2025-09-02  7:35           ` Masami Hiramatsu
2025-09-02 14:14           ` Steven Rostedt
