* mm/memory-failure tracepoint change breaks userspace rasdaemon
@ 2026-06-03 13:11 Zhuo, Qiuxu
From: Zhuo, Qiuxu @ 2026-06-03 13:11 UTC
To: david@kernel.org, rostedt@goodmis.org, mchehab+huawei@kernel.org,
Luck, Tony, bp@alien8.de, akpm@linux-foundation.org,
linmiaohe@huawei.com, xieyuanbin1@huawei.com
Cc: Lai, Yi1, Zhuo, Qiuxu, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org
Hi,

Laiyi reported that the userspace rasdaemon fails to enable memory_failure_event on kernels >= v6.19.

Kernel commit 31807483d395 ("mm/memory-failure: remove the selection of RAS"), merged in v6.19-rc1,
moved the memory_failure_event tracepoint from the "ras" subsystem to "memory_failure".

However, rasdaemon still tries to enable:

    ras:memory_failure_event

while on v6.19+ kernels, the tracepoint is:

    memory_failure:memory_failure_event

As a result, rasdaemon fails to start:

    ...
    Can't write to set_event
    Huh! something got wrong. Aborting.
    ...

Reproducer:

    rasdaemon --enable

Could you please let me know whether the preferred solution is to revert the kernel change,
or to update rasdaemon to support both tracepoint names for backward/forward compatibility?

Thanks,
- Qiuxu
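The second option Qiuxu raises, teaching rasdaemon to accept both names, amounts to probing tracefs before writing to `set_event`. The C sketch below is one minimal way to do that under stated assumptions; it is not rasdaemon's actual code. `pick_mf_event()` and `mf_candidates` are hypothetical names, and the caller is assumed to pass the tracefs mount point (for example `/sys/kernel/tracing`). The sketch relies only on each trace event appearing as a directory `events/<system>/<event>` under tracefs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical candidate list of {system, event} pairs: "ras" is where
 * pre-v6.19 kernels expose the event, "memory_failure" is where v6.19+
 * kernels expose it after commit 31807483d395. */
static const char *const mf_candidates[][2] = {
	{ "ras", "memory_failure_event" },
	{ "memory_failure", "memory_failure_event" },
};

/*
 * Hypothetical helper: store in 'out' the first "system:event" string whose
 * directory <tracefs>/events/<system>/<event> exists. That string is the
 * form written to <tracefs>/set_event.
 * Returns 0 on success, -1 if no candidate exists or 'out' is too small.
 */
static int pick_mf_event(const char *tracefs, char *out, size_t outlen)
{
	char path[4096];
	struct stat st;
	size_t i;

	assert(tracefs != NULL && out != NULL);
	for (i = 0; i < sizeof(mf_candidates) / sizeof(mf_candidates[0]); i++) {
		const char *sys = mf_candidates[i][0];
		const char *ev = mf_candidates[i][1];
		int n;

		n = snprintf(path, sizeof(path), "%s/events/%s/%s", tracefs, sys, ev);
		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path would be truncated; skip it */
		if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
			continue;	/* this name is not exposed by the running kernel */
		n = snprintf(out, outlen, "%s:%s", sys, ev);
		return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
	}
	return -1;
}
```

A caller would then write the resulting string (for example `ras:memory_failure_event`) to `<tracefs>/set_event`. Checking the `ras` name first keeps the pre-v6.19 behavior unchanged; that ordering is a choice made for this sketch, not something the thread prescribes.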
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 13:44 UTC
To: Zhuo, Qiuxu, rostedt@goodmis.org, mchehab+huawei@kernel.org,
Luck, Tony, bp@alien8.de, akpm@linux-foundation.org,
linmiaohe@huawei.com, xieyuanbin1@huawei.com
Cc: Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org

On 6/3/26 15:11, Zhuo, Qiuxu wrote:
> Hi,
>
> Laiyi reported that the userspace rasdaemon fails to enable memory_failure_event
> on kernels >= v6.19.
>
> Kernel commit 31807483d395 ("mm/memory-failure: remove the selection of RAS"),
> merged in v6.19-rc1, moved the memory_failure_event tracepoint from the "ras"
> subsystem to "memory_failure".
>
> However, rasdaemon still tries to enable:
>
> ras:memory_failure_event
>
> while on v6.19+ kernels, the tracepoint is:
>
> memory_failure:memory_failure_event
>
> As a result, rasdaemon fails to start:
>
> …
> Can't write to set_event
> Huh! something got wrong. Aborting.
> …
>
> Reproducer:
>
> rasdaemon --enable
>
> Could you please let me know whether the preferred solution is to revert the
> kernel change, or to update rasdaemon to support both tracepoint names for
> backward/forward compatibility?

Likely the latter. BPF [1] documents:

Q: Are tracepoints part of the stable ABI?
A: NO. Tracepoints are tied to internal implementation details hence they are
subject to change and can break with newer kernels. BPF programs need to change
accordingly when this happens.

The Kernel ABI document [3] explicitly doesn't list them AFAIKS.

There were previous discussions on the stability of tracepoints [2]; I don't know
what changed in the meantime. CCing Steve.

[1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
[2] https://lwn.net/Articles/747256/
[3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html

--
Cheers,

David
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 16:17 UTC
To: David Hildenbrand (Arm)
Cc: Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony, bp@alien8.de,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On Wed, 3 Jun 2026 15:44:54 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Likely the latter. BPF [1] documents:
>
> Q: Are tracepoints part of the stable ABI?
> A: NO. Tracepoints are tied to internal implementation details hence they are
> subject to change and can break with newer kernels. BPF programs need to change
> accordingly when this happens.
>
> The Kernel ABI document [3] explicitly doesn't list them AFAIKS.
>
> There were previous discussions on the stability of tracepoints [2]; I don't know
> what changed in the meantime. CCing Steve.
>
> [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
> [2] https://lwn.net/Articles/747256/
> [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html

Tracepoints are only unstable for BPF programs. For other applications,
they are stable [1].

Adding Linus as he's the Supreme Judge on the matter.

-- Steve

[1] https://lwn.net/Articles/442113/
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Borislav Petkov @ 2026-06-03 16:19 UTC
To: Steven Rostedt
Cc: David Hildenbrand (Arm), Zhuo, Qiuxu, mchehab+huawei@kernel.org,
Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org, Linus Torvalds

On Wed, Jun 03, 2026 at 12:17:07PM -0400, Steven Rostedt wrote:
> On Wed, 3 Jun 2026 15:44:54 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
> > Likely the latter. BPF [1] documents:
> >
> > Q: Are tracepoints part of the stable ABI?
> > A: NO. Tracepoints are tied to internal implementation details hence they are
> > subject to change and can break with newer kernels. BPF programs need to change
> > accordingly when this happens.
> >
> > The Kernel ABI document [3] explicitly doesn't list them AFAIKS.
> >
> > There were previous discussions on the stability of tracepoints [2]; I don't know
> > what changed in the meantime. CCing Steve.
> >
> > [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
> > [2] https://lwn.net/Articles/747256/
> > [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
>
> Tracepoints are only unstable for BPF programs. For other applications,
> they are stable [1].
>
> Adding Linus as he's the Supreme Judge on the matter.

I *think* tools or libtraceevent can't really anticipate the TP namespace
change so we might have to revert, I'm afraid...

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 16:26 UTC
To: Borislav Petkov, Steven Rostedt
Cc: Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On 6/3/26 18:19, Borislav Petkov wrote:
> On Wed, Jun 03, 2026 at 12:17:07PM -0400, Steven Rostedt wrote:
>> On Wed, 3 Jun 2026 15:44:54 +0200
>> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>>
>>> Likely the latter. BPF [1] documents:
>>>
>>> Q: Are tracepoints part of the stable ABI?
>>> A: NO. Tracepoints are tied to internal implementation details hence they are
>>> subject to change and can break with newer kernels. BPF programs need to change
>>> accordingly when this happens.
>>>
>>> The Kernel ABI document [3] explicitly doesn't list them AFAIKS.
>>>
>>> There were previous discussions on the stability of tracepoints [2]; I don't know
>>> what changed in the meantime. CCing Steve.
>>>
>>> [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
>>> [2] https://lwn.net/Articles/747256/
>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
>>
>> Tracepoints are only unstable for BPF programs. For other applications,
>> they are stable [1].
>>
>> Adding Linus as he's the Supreme Judge on the matter.
>
> I *think* tools or libtraceevent can't really anticipate the TP namespace
> change so we might have to revert, I'm afraid...

Yeah, I was fearing that when I read in [2]:

"It has become clear in the past that this promise extends to
tracepoints, most notably in 2011 when a tracepoint change broke
powertop and had to be reverted."

Which means that I now also fully understand

"Some kernel maintainers prohibit or severely restrict the addition of
tracepoints to their subsystems out of fear that a similar thing could
happen to them."

Whatever the result of this discussion will be, I'll try to document it.

--
Cheers,

David
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 17:00 UTC
To: David Hildenbrand (Arm)
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On Wed, 3 Jun 2026 18:26:24 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Yeah, I was fearing that when I read in [2]:
>
> "It has become clear in the past that this promise extends to
> tracepoints, most notably in 2011 when a tracepoint change broke
> powertop and had to be reverted."

Technically the issue is with trace events and not tracepoints. The
difference is that a trace event is created via the TRACE_EVENT() macro
which defines what is to be collected from the tracepoint and exposes that
information to tracefs which applications can easily see.

A tracepoint is simply the hook in the code that you can attach to. Trace
events create a callback from that hook to extract the data from the
tracepoint to fill in the fields.

>
> Which means that I now also fully understand
>
> "Some kernel maintainers prohibit or severely restrict the addition of
> tracepoints to their subsystems out of fear that a similar thing could
> happen to them."
>
> Whatever the result of this discussion will be, I'll try to document it.

You can still create a tracepoint without creating a trace event by using
the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
extensively. That creates a tracepoint without exposing it to tracefs. The
runtime verifier uses these hooks to monitor the scheduler.

But you can still connect to these tracepoints from tracefs via a tprobe. A
tprobe hooks to tracepoints that you need the source code to find (just
like an fprobe hooks to any function). Thus applications *can't* rely on
them because there's nothing there to tell you it exists or not.

For example, for the given tracepoint:

 # cd /sys/kernel/tracing
 # echo 't:rfail memory_failure_event pfn=pfn type=type result=result' > dynamic_events
 # cat events/tracepoints/rfail/format
name: rfail
ID: 1894
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long __probe_ip;	offset:8;	size:8;	signed:0;
	field:u64 pfn;	offset:16;	size:8;	signed:0;
	field:s32 type;	offset:24;	size:4;	signed:1;
	field:s32 result;	offset:28;	size:4;	signed:1;

print fmt: "(%lx) pfn=%Lu type=%d result=%d", REC->__probe_ip, REC->pfn, REC->type, REC->result

It requires that BTF exists and the above doesn't annotate the result as
nicely. But you can get data directly from tracepoints this way.

-- Steve
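Steve's distinction matters because a trace event publishes its record layout in the tracefs `format` file, as in the rfail example above, and that published layout is what applications decode. As a minimal sketch, assuming only the `field:<decl>; offset:N; size:N; signed:N;` line shape seen in that output, one field line could be parsed as follows. `struct tp_field` and `parse_format_field()` are hypothetical names; real tools typically rely on a library such as libtraceevent rather than hand-written parsing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "field:" line of a tracefs events/<system>/<event>/format file. */
struct tp_field {
	char type[160];		/* e.g. "unsigned short" */
	char name[160];		/* e.g. "common_type"; array suffixes like "[16]" are kept */
	unsigned int offset;	/* byte offset of the field in the raw record */
	unsigned int size;	/* field size in bytes */
	int is_signed;		/* value of the "signed:" attribute */
};

/*
 * Hypothetical parser for one line such as
 *   "\tfield:u64 pfn;\toffset:16;\tsize:8;\tsigned:0;"
 * Returns 0 on success, -1 for lines that are not field descriptions
 * (e.g. "name:", "ID:", "format:" or "print fmt:").
 */
static int parse_format_field(const char *line, struct tp_field *f)
{
	char decl[160];
	char *sp;

	assert(line != NULL && f != NULL);
	/* Leading whitespace is skipped; the declaration runs up to the first ';'. */
	if (sscanf(line, " field:%159[^;]; offset:%u; size:%u; signed:%d",
		   decl, &f->offset, &f->size, &f->is_signed) != 4)
		return -1;
	/* The declaration is "<type> <name>": the name follows the last space. */
	sp = strrchr(decl, ' ');
	if (sp == NULL)
		return -1;
	*sp = '\0';
	strcpy(f->type, decl);		/* fits: decl holds at most 159 chars */
	strcpy(f->name, sp + 1);
	return 0;
}
```

Feeding each line of `events/tracepoints/rfail/format` to this function would recover, for instance, `pfn` at offset 16 with size 8, while the `name:`, `ID:`, `format:` and `print fmt:` lines are rejected with -1.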
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 19:13 UTC
To: Steven Rostedt
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On 6/3/26 19:00, Steven Rostedt wrote:
> On Wed, 3 Jun 2026 18:26:24 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
>> Yeah, I was fearing that when I read in [2]:
>>
>> "It has become clear in the past that this promise extends to
>> tracepoints, most notably in 2011 when a tracepoint change broke
>> powertop and had to be reverted."
>
> Technically the issue is with trace events and not tracepoints. The
> difference is that a trace event is created via the TRACE_EVENT() macro
> which defines what is to be collected from the tracepoint and exposes that
> information to tracefs which applications can easily see.
>
> A tracepoint is simply the hook in the code that you can attach to. Trace
> events create a callback from that hook to extract the data from the
> tracepoint to fill in the fields.
>
>>
>> Which means that I now also fully understand
>>
>> "Some kernel maintainers prohibit or severely restrict the addition of
>> tracepoints to their subsystems out of fear that a similar thing could
>> happen to them."
>>
>> Whatever the result of this discussion will be, I'll try to document it.
>
> You can still create a tracepoint without creating a trace event by using
> the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
> extensively. That creates a tracepoint without exposing it to tracefs. The
> runtime verifier uses these hooks to monitor the scheduler.
>
> But you can still connect to these tracepoints from tracefs via a tprobe. A
> tprobe hooks to tracepoints that you need the source code to find (just
> like an fprobe hooks to any function). Thus applications *can't* rely on
> them because there's nothing there to tell you it exists or not.

Thanks, that makes sense!

So, would it be fair to say that, in general, what's exposed through

/sys/kernel/tracing/events/

is stable ABI?

Would the following be sufficient to avoid a full revert and the dependency on CONFIG_RAS?

diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
index aa57cc8f896b..c46b17602578 100644
--- a/include/trace/events/memory-failure.h
+++ b/include/trace/events/memory-failure.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #undef TRACE_SYSTEM
-#define TRACE_SYSTEM memory_failure
+/* Some user space relies on ras/memory_failure_event */
+#define TRACE_SYSTEM ras
 #define TRACE_INCLUDE_FILE memory-failure
 
 #if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)

--
Cheers,

David
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 19:30 UTC
To: David Hildenbrand (Arm)
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On Wed, 3 Jun 2026 21:13:30 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Would the following be sufficient to avoid a full revert and the dependency on CONFIG_RAS?
>
> diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
> index aa57cc8f896b..c46b17602578 100644
> --- a/include/trace/events/memory-failure.h
> +++ b/include/trace/events/memory-failure.h
> @@ -1,6 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #undef TRACE_SYSTEM
> -#define TRACE_SYSTEM memory_failure
> +/* Some user space relies on ras/memory_failure_event */
> +#define TRACE_SYSTEM ras

If that puts back the original path then yeah, all would be good.

-- Steve

>  #define TRACE_INCLUDE_FILE memory-failure
>
>  #if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 19:31 UTC
To: David Hildenbrand (Arm)
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On Wed, 3 Jun 2026 21:13:30 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Thanks, that makes sense!
>
> So, would it be fair to say that, in general, what's exposed through
>
> /sys/kernel/tracing/events/
>
> is stable ABI?

It's only stable if something depends on it. It changes all the time.
It's only when someone complains about it that it becomes "stable"!

-- Steve
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-05 8:52 UTC
To: Steven Rostedt
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On 6/3/26 21:31, Steven Rostedt wrote:
> On Wed, 3 Jun 2026 21:13:30 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
>> Thanks, that makes sense!
>>
>> So, would it be fair to say that, in general, what's exposed through
>>
>> /sys/kernel/tracing/events/
>>
>> is stable ABI?
>
> It's only stable if something depends on it. It changes all the time.
> It's only when someone complains about it that it becomes "stable"!

Heh, so we only know that it's stable when we break it ...

Let me figure out how to document that.

--
Cheers,

David
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-05 14:13 UTC
To: David Hildenbrand (Arm)
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

On June 5, 2026 4:52:28 AM EDT, "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>On 6/3/26 21:31, Steven Rostedt wrote:
>> On Wed, 3 Jun 2026 21:13:30 +0200
>> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>>
>>> Thanks, that makes sense!
>>>
>>> So, would it be fair to say that, in general, what's exposed through
>>>
>>> /sys/kernel/tracing/events/
>>>
>>> is stable ABI?
>>
>> It's only stable if something depends on it. It changes all the time.
>> It's only when someone complains about it that it becomes "stable"!
>
>Heh, so we only know that it's stable when we break it ...
>
>Let me figure out how to document that.
>

Yep. That's basically Linus's rule. He even said we break user space API
all the time. What we don't allow is to break actual user space. The
problem is that you can break user space by fixing an API without knowing
something depended on the bug.
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Xie Yuanbin @ 2026-06-04 1:46 UTC
To: david, qiuxu.zhuo, bp, akpm, rostedt, linmiaohe
Cc: linux-edac, linux-kernel, linux-mm, linux-trace-kernel,
mchehab+huawei, tony.luck, torvalds, xieyuanbin1, yi1.lai

On Wed, 3 Jun 2026 21:13:30 +0200, David Hildenbrand (Arm) wrote:
> Would the following be sufficient to avoid a full revert and the dependency on CONFIG_RAS?
>
> diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
> index aa57cc8f896b..c46b17602578 100644
> --- a/include/trace/events/memory-failure.h
> +++ b/include/trace/events/memory-failure.h
> @@ -1,6 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #undef TRACE_SYSTEM
> -#define TRACE_SYSTEM memory_failure
> +/* Some user space relies on ras/memory_failure_event */
> +#define TRACE_SYSTEM ras
>  #define TRACE_INCLUDE_FILE memory-failure
>
>  #if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)

Yes, it should be. In fact, when I sent the V2 patch, I had already
considered this issue, and that's exactly what I did:
Link: https://lore.kernel.org/20251104072306.100738-3-xieyuanbin1@huawei.com

However, David Hildenbrand advised me at that time to completely
remove the dependence on RAS:
Link: https://lore.kernel.org/01b44e0f-ea2e-406f-9f65-b698b5504f42@kernel.org
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-04 6:42 UTC
To: Xie Yuanbin, qiuxu.zhuo, bp, akpm, rostedt, linmiaohe
Cc: linux-edac, linux-kernel, linux-mm, linux-trace-kernel,
mchehab+huawei, tony.luck, torvalds, yi1.lai

On 6/4/26 03:46, Xie Yuanbin wrote:
> On Wed, 3 Jun 2026 21:13:30 +0200, David Hildenbrand (Arm) wrote:
>> Would the following be sufficient to avoid a full revert and the dependency on CONFIG_RAS?
>>
>> diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
>> index aa57cc8f896b..c46b17602578 100644
>> --- a/include/trace/events/memory-failure.h
>> +++ b/include/trace/events/memory-failure.h
>> @@ -1,6 +1,7 @@
>>  /* SPDX-License-Identifier: GPL-2.0 */
>>  #undef TRACE_SYSTEM
>> -#define TRACE_SYSTEM memory_failure
>> +/* Some user space relies on ras/memory_failure_event */
>> +#define TRACE_SYSTEM ras
>>  #define TRACE_INCLUDE_FILE memory-failure
>>
>>  #if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)
>
> Yes, it should be. In fact, when I sent the V2 patch, I had already
> considered this issue, and that's exactly what I did:
> Link: https://lore.kernel.org/20251104072306.100738-3-xieyuanbin1@huawei.com
>
> However, David Hildenbrand advised me at that time to completely
> remove the dependence on RAS:
> Link: https://lore.kernel.org/01b44e0f-ea2e-406f-9f65-b698b5504f42@kernel.org

Yeah, if only I had known that we would break user space by changing trace
events ... now we know :)

Do you have capacity to send a fix?

--
Cheers,

David
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Xie Yuanbin @ 2026-06-04 13:42 UTC
To: david, qiuxu.zhuo, bp, akpm, rostedt, linmiaohe
Cc: linux-edac, linux-kernel, linux-mm, linux-trace-kernel,
mchehab+huawei, tony.luck, torvalds, xieyuanbin1, yi1.lai

On Thu, 4 Jun 2026 08:42:37 +0200, David Hildenbrand (Arm) wrote:
> Yeah, if only I had known that we would break user space by changing trace
> events ... now we know :)
>
> Do you have capacity to send a fix?

Sure, with pleasure.
* RE: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Zhuo, Qiuxu @ 2026-06-04 15:48 UTC
To: Xie Yuanbin, david@kernel.org, bp@alien8.de,
akpm@linux-foundation.org, rostedt@goodmis.org, linmiaohe@huawei.com
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
mchehab+huawei@kernel.org, Luck, Tony, torvalds@linux-foundation.org,
Lai, Yi1

> From: Xie Yuanbin <xieyuanbin1@huawei.com>
> Sent: Thursday, June 4, 2026 9:42 PM
> To: david@kernel.org; Zhuo, Qiuxu <qiuxu.zhuo@intel.com>; bp@alien8.de;
> akpm@linux-foundation.org; rostedt@goodmis.org; linmiaohe@huawei.com
> Cc: linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; linux-trace-kernel@vger.kernel.org;
> mchehab+huawei@kernel.org; Luck, Tony <tony.luck@intel.com>;
> torvalds@linux-foundation.org; xieyuanbin1@huawei.com; Lai, Yi1
> <yi1.lai@intel.com>
> Subject: Re: mm/memory-failure tracepoint change breaks userspace
> rasdaemon
>
> On Thu, 4 Jun 2026 08:42:37 +0200, David Hildenbrand (Arm) wrote:
> > Yeah, if only I had known that we would break user space by changing
> > trace events ... now we know :)
> >
> > Do you have capacity to send a fix?
>
> Sure, with pleasure.

Thanks Yuanbin,

When your patch is ready, we can help test it again if needed.

-Qiuxu
* RE: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Zhuo, Qiuxu @ 2026-06-04 15:43 UTC
To: David Hildenbrand (Arm), Steven Rostedt
Cc: Borislav Petkov, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com, xieyuanbin1@huawei.com,
Lai, Yi1, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Linus Torvalds

> From: David Hildenbrand (Arm) <david@kernel.org>
> [...]
> Would the following be sufficient to avoid a full revert and the dependency
> on CONFIG_RAS?
>
> diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
> index aa57cc8f896b..c46b17602578 100644
> --- a/include/trace/events/memory-failure.h
> +++ b/include/trace/events/memory-failure.h
> @@ -1,6 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #undef TRACE_SYSTEM
> -#define TRACE_SYSTEM memory_failure
> +/* Some user space relies on ras/memory_failure_event */
> +#define TRACE_SYSTEM ras
>  #define TRACE_INCLUDE_FILE memory-failure
>

Thanks all for the discussion on this issue.

We applied David's above fix to v7.1-rc3, tested it, and confirmed that
rasdaemon can again enable and receive the memory_failure event.

Rasdaemon logs:

    ...
    rasdaemon: ras:memory_failure_event event enabled
    rasdaemon: Enabled event ras:memory_failure_event
    ...
    <...>-2513 [000] ..... 0.000021 memory_failure_event [ALERT] 2026-06-04 23:30:43 +0800 pfn=0x144e6f page_type=dirty LRU page action_result=Recovered
    ...

-Qiuxu
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Andrew Morton @ 2026-06-03 19:54 UTC
To: Steven Rostedt
Cc: David Hildenbrand (Arm), Borislav Petkov, Zhuo, Qiuxu,
mchehab+huawei@kernel.org, Luck, Tony, linmiaohe@huawei.com,
xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org, Linus Torvalds

On Wed, 3 Jun 2026 13:00:06 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 3 Jun 2026 18:26:24 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
> > Yeah, I was fearing that when I read in [2]:
> >
> > "It has become clear in the past that this promise extends to
> > tracepoints, most notably in 2011 when a tracepoint change broke
> > powertop and had to be reverted."
>
> Technically the issue is with trace events and not tracepoints. The
> difference is that a trace event is created via the TRACE_EVENT() macro
> which defines what is to be collected from the tracepoint and exposes that
> information to tracefs which applications can easily see.
>
> A tracepoint is simply the hook in the code that you can attach to. Trace
> events create a callback from that hook to extract the data from the
> tracepoint to fill in the fields.

The problem here appears to be that "ras:memory_failure_event" became
"memory_failure:memory_failure_event".

Perhaps we can add infrastructure to permit aliasing "ras" onto
"memory_failure". So if we make these namespace alterations we can
easily preserve back-compatibility?