* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 17:00 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org,
Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <0c16bf3d-7c6d-4e28-b200-03b7d0ef714a@kernel.org>
On Wed, 3 Jun 2026 18:26:24 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:
> Yeah, I was fearing that when I read in [2]:
>
> "It has become clear in the past that this promise extends to
> tracepoints, most notably in 2011 when a tracepoint change broke
> powertop and had to be reverted."
Technically the issue is with trace events and not tracepoints. The
difference is that a trace event is created via the TRACE_EVENT() macro
which defines what is to be collected from the tracepoint and exposes that
information to tracefs which applications can easily see.
A tracepoint is simply the hook in the code that you can attach to. Trace
events create a callback from that hook to extract the data from the
tracepoint to fill in the fields.
>
> Which means that I now also fully understand
>
> "Some kernel maintainers prohibit or severely restrict the addition of
> tracepoints to their subsystems out of fear that a similar thing could
> happen to them. "
>
> Whatever the result of this discussion will be, I'll try to document it.
You can still create a tracepoint without creating a trace event by using
the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
extensively. That creates a tracepoint without exposing it to tracefs. The
runtime verifier uses these hooks to monitor the scheduler.
But you can still connect to these tracepoints from tracefs via a tprobe. A
tprobe hooks to tracepoints that you need the source code to find (just
like a fprobe hooks to any function). Thus applications *can't* rely on
them because there's nothing there to tell you it exists or not.
For example, for the given tracepoint:
# cd /sys/kernel/tracing
# echo 't:rfail memory_failure_event pfn=pfn type=type result=result' > dynamic_events
# cat events/tracepoints/rfail/format
name: rfail
ID: 1894
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
field:u64 pfn; offset:16; size:8; signed:0;
field:s32 type; offset:24; size:4; signed:1;
field:s32 result; offset:28; size:4; signed:1;
print fmt: "(%lx) pfn=%Lu type=%d result=%d", REC->__probe_ip, REC->pfn, REC->type, REC->result
It requires that BTF exists and the above doesn't annotate the result as
nicely. But you can get data directly from tracepoints this way.
-- Steve
^ permalink raw reply
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 16:26 UTC (permalink / raw)
To: Borislav Petkov, Steven Rostedt
Cc: Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
akpm@linux-foundation.org, linmiaohe@huawei.com,
xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <20260603161947.GBaiBUI7C8WWPwD84S@fat_crate.local>
On 6/3/26 18:19, Borislav Petkov wrote:
> On Wed, Jun 03, 2026 at 12:17:07PM -0400, Steven Rostedt wrote:
>> On Wed, 3 Jun 2026 15:44:54 +0200
>> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>>
>>> Likely the latter. BPF [1] documents:
>>>
>>> Q: Are tracepoints part of the stable ABI?
>>> A: NO. Tracepoints are tied to internal implementation details hence they are
>>> subject to change and can break with newer kernels. BPF programs need to change
>>> accordingly when this happens.
>>>
>>> The Kernel ABI document explicitly doesn't list them AFAIKS.
>>>
>>> There were previous discussions on the stability of tracepints [2], I don't know
>>> what changed in the meantime. CCing Steve
>>>
>>> [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
>>> [2] https://lwn.net/Articles/747256/
>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
>>
>> Tracepoints are not stable or BPF programs only. But other applications
>> they are[1].
>>
>> Adding Linus as he's the Supreme Judge on the matter.
>
> I *think* tools or libtraceevent can't really anticipate the TP namespace
> change so we might have to revert, I'm afraid...
Yeah, I was fearing that when I read in [2]:
"It has become clear in the past that this promise extends to
tracepoints, most notably in 2011 when a tracepoint change broke
powertop and had to be reverted."
Which means that I now also fully understand
"Some kernel maintainers prohibit or severely restrict the addition of
tracepoints to their subsystems out of fear that a similar thing could
happen to them. "
Whatever the result of this discussion will be, I'll try to document it.
--
Cheers,
David
^ permalink raw reply
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Borislav Petkov @ 2026-06-03 16:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: David Hildenbrand (Arm), Zhuo, Qiuxu, mchehab+huawei@kernel.org,
Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <20260603121707.7eccb9fb@gandalf.local.home>
On Wed, Jun 03, 2026 at 12:17:07PM -0400, Steven Rostedt wrote:
> On Wed, 3 Jun 2026 15:44:54 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
> > Likely the latter. BPF [1] documents:
> >
> > Q: Are tracepoints part of the stable ABI?
> > A: NO. Tracepoints are tied to internal implementation details hence they are
> > subject to change and can break with newer kernels. BPF programs need to change
> > accordingly when this happens.
> >
> > The Kernel ABI document explicitly doesn't list them AFAIKS.
> >
> > There were previous discussions on the stability of tracepints [2], I don't know
> > what changed in the meantime. CCing Steve
> >
> > [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
> > [2] https://lwn.net/Articles/747256/
> > [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
>
> Tracepoints are not stable or BPF programs only. But other applications
> they are[1].
>
> Adding Linus as he's the Supreme Judge on the matter.
I *think* tools or libtraceevent can't really anticipate the TP namespace
change so we might have to revert, I'm afraid...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 16:17 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony, bp@alien8.de,
akpm@linux-foundation.org, linmiaohe@huawei.com,
xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <b930678d-a1ae-458c-8705-7ca9680d4cb6@kernel.org>
On Wed, 3 Jun 2026 15:44:54 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:
> Likely the latter. BPF [1] documents:
>
> Q: Are tracepoints part of the stable ABI?
> A: NO. Tracepoints are tied to internal implementation details hence they are
> subject to change and can break with newer kernels. BPF programs need to change
> accordingly when this happens.
>
> The Kernel ABI document explicitly doesn't list them AFAIKS.
>
> There were previous discussions on the stability of tracepints [2], I don't know
> what changed in the meantime. CCing Steve
>
> [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
> [2] https://lwn.net/Articles/747256/
> [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
Tracepoints are not stable or BPF programs only. But other applications
they are[1].
Adding Linus as he's the Supreme Judge on the matter.
-- Steve
[1] https://lwn.net/Articles/442113/
^ permalink raw reply
* Re: [PATCH v2 06/13] verification/rvgen: Convert __fill_verify_guards_func() to Lark
From: Gabriele Monaco @ 2026-06-03 16:00 UTC (permalink / raw)
To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <2157c2e5fe34251c403604f0d58a3a686a6f6a8b.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> Prepare to remove self.guards and self.__parse_constraints(), convert
> __fill_verify_guards_func() to use the parsed transitions from Lark.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> v2:
> - fix up the ';' separator [Gabriele]
> - make return type consistent with the function's signature
> [Wander]
> - fix up __parse_guard_rule()'s signature
> ---
> tools/verification/rvgen/rvgen/dot2k.py | 38 +++++++++++++++++++----
> --
> 1 file changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index d9f8e1c7737a..ebaa6c9135c2 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -221,6 +221,19 @@ class ha2k(dot2k):
> def __parse_single_constraint(self, rule: dict, value: str) ->
> str:
> return f"ha_get_env(ha_mon, {rule["env"]}{self.enum_suffix},
> time_ns) {rule["op"]} {value}"
>
> + def __parse_guard_rule(self, rule) -> list[str]:
> + buff = []
> + for c, sep in rule.rules:
> + env = c.env + self.enum_suffix
> + op = c.op
> + val = self.__adjust_value(c.val, c.unit)
> +
> + cond = f"ha_get_env(ha_mon, {env}, time_ns) {op} {val}"
> + if sep:
> + cond += f" {sep}"
> + buff.append(cond)
> + return buff
> +
> def __get_constraint_env(self, constr: str) -> str:
> """Extract the second argument from an ha_ function"""
> env =
> constr.split("(")[1].split()[1].rstrip(")").rstrip(",")
> @@ -287,7 +300,7 @@ class ha2k(dot2k):
> rules = invalid_checks + rules
>
> separator = "\n\t\t " if sum(len(r) for r in rules) >
> 80 else " "
> - return ["res = " + separator.join(rules)]
> + return ["res = " + separator.join(rules) + ";"]
>
> def __validate_constraint(self, key: tuple[int, int] | int,
> constr: str,
> rule, reset) -> None:
> @@ -406,7 +419,8 @@ f"""static inline void
> ha_convert_inv_guard(struct ha_monitor *ha_mon,
>
> def __fill_verify_guards_func(self) -> list[str]:
> buff = []
> - if not self.guards:
> +
> + if not self.has_guard:
> return []
>
> buff.append(
> @@ -418,14 +432,22 @@ f"""static inline bool ha_verify_guards(struct
> ha_monitor *ha_mon,
> """)
>
> _else = ""
> - for edge, constr in sorted(self.guards.items()):
> + for transition in self.transitions:
> + if not transition.rule and not transition.reset:
> + continue
> +
> buff.append(f"\t{_else}if (curr_state == "
> - f"{self.states[edge[0]]}{self.enum_suffix}
> && "
> - f"event ==
> {self.events[edge[1]]}{self.enum_suffix})")
> - if constr.count(";") > 0:
> + f"{transition.src}{self.enum_suffix} && "
> + f"event ==
> {transition.event}{self.enum_suffix})")
> + rule = transition.rule
> + reset = transition.reset
> + if rule and reset:
> buff[-1] += " {"
> - buff += [f"\t\t{c};" for c in constr.split(";")]
> - if constr.count(";") > 0:
> + if rule:
> + buff.append("\t\t" +
> self.__format_guard_rules(self.__parse_guard_rule(rule))[0])
> + if reset:
> + buff.append(f"\t\tha_reset_env(ha_mon,
> {reset.env}{self.enum_suffix}, time_ns);")
> + if rule and reset:
> _else = "} else "
> else:
> _else = "else "
^ permalink raw reply
* Re: [PATCH v2 13/13] verification/rvgen: Remove dead code
From: Gabriele Monaco @ 2026-06-03 15:36 UTC (permalink / raw)
To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <9120d2a5987b79b646dde4001381581b703c0dd7.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:28 +0200, Nam Cao wrote:
> The conversion to use Lark left some dead code behind. Remove them.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
You might want to remove unused imports (linters should help you with
that too):
* re, typing.Iterator, and itertools.islice from automata.py
* deque and ConstraintRule from dot2k
Other than that
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> tools/verification/rvgen/rvgen/automata.py | 154 -------------------
> --
> tools/verification/rvgen/rvgen/dot2k.py | 28 +---
> 2 files changed, 1 insertion(+), 181 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/automata.py
> b/tools/verification/rvgen/rvgen/automata.py
> index a3be327c2a73..b6ff5fceb820 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -356,19 +356,6 @@ class State:
> self.name = name
> self.inv = inv
>
> -class _ConstraintKey:
> - """Base class for constraint keys."""
> -
> -class _StateConstraintKey(_ConstraintKey, int):
> - """Key for a state constraint. Under the hood just state_id."""
> - def __new__(cls, state_id: int):
> - return super().__new__(cls, state_id)
> -
> -class _EventConstraintKey(_ConstraintKey, tuple):
> - """Key for an event constraint. Under the hood just
> tuple(state_id,event_id)."""
> - def __new__(cls, state_id: int, event_id: int):
> - return super().__new__(cls, (state_id, event_id))
> -
> class AutomataError(Exception):
> """Exception raised for errors in automata parsing and
> validation.
>
> @@ -387,28 +374,10 @@ class Automata:
>
> invalid_state_str = "INVALID_STATE"
> init_marker = "__init_"
> - node_marker = "{node"
> - # val can be numerical, uppercase (constant or macro), lowercase
> (parameter or function)
> - # only numerical values should have units
> - constraint_rule = re.compile(r"""
> - ^
> - (?P<env>[a-zA-Z_][a-zA-Z0-9_]+) # C-like identifier for the
> env var
> - (?P<op>[!<=>]{1,2}) # operator
> - (?P<val>
> - [0-9]+ | # numerical value
> - [A-Z_]+\(\) | # macro
> - [A-Z_]+ | # constant
> - [a-z_]+\(\) | # function
> - [a-z_]+ # parameter
> - )
> - (?P<unit>[a-z]{1,2})? # optional unit for
> numerical values
> - """, re.VERBOSE)
> - constraint_reset = re.compile(r"^reset\((?P<env>[a-zA-Z_][a-zA-
> Z0-9_]+)\)")
>
> def __init__(self, file_path, model_name=None):
> self.__dot_path = file_path
> self.name = model_name or self.__get_model_name()
> - self.__dot_lines = self.__open_dot()
> self.__parse_tree = ParseTree(file_path)
> self.transitions = self.__parse_transitions()
> self.states, self.initial_state, self.final_states =
> self.__parse_states()
> @@ -435,57 +404,6 @@ class Automata:
>
> return model_name
>
> - def __open_dot(self) -> list[str]:
> - dot_lines = []
> - try:
> - with open(self.__dot_path) as dot_file:
> - dot_lines = dot_file.readlines()
> - except OSError as exc:
> - raise AutomataError(exc.strerror) from exc
> -
> - if not dot_lines:
> - raise AutomataError(f"{self.__dot_path} is empty")
> -
> - # checking the first line:
> - line = dot_lines[0].split()
> -
> - if len(line) < 2 or line[0] != "digraph" or line[1] !=
> "state_automaton":
> - raise AutomataError(f"Not a valid .dot format:
> {self.__dot_path}")
> -
> - return dot_lines
> -
> - def __get_cursor_begin_states(self) -> int:
> - for cursor, line in enumerate(self.__dot_lines):
> - split_line = line.split()
> -
> - if len(split_line) and split_line[0] ==
> self.node_marker:
> - return cursor
> -
> - raise AutomataError("Could not find a beginning state")
> -
> - def __get_cursor_begin_events(self) -> int:
> - state = 0
> - cursor = 0 # make pyright happy
> -
> - for cursor, line in enumerate(self.__dot_lines):
> - line = line.split()
> - if not line:
> - continue
> -
> - if state == 0:
> - if line[0] == self.node_marker:
> - state = 1
> - elif line[0] != self.node_marker:
> - break
> - else:
> - raise AutomataError("Could not find beginning event")
> -
> - cursor += 1 # skip initial state transition
> - if cursor == len(self.__dot_lines):
> - raise AutomataError("Dot file ended after event
> beginning")
> -
> - return cursor
> -
> def __parse_transitions(self):
> transitions = []
>
> @@ -542,51 +460,6 @@ class Automata:
> states.insert(0, initial_state)
> return states, initial_state, final_states
>
> - def __get_state_variables(self) -> tuple[list[str], str,
> list[str]]:
> - # wait for node declaration
> - states = []
> - final_states = []
> - initial_state = ""
> -
> - has_final_states = False
> - cursor = self.__get_cursor_begin_states()
> -
> - # process nodes
> - for line in islice(self.__dot_lines, cursor, None):
> - split_line = line.split()
> - if not split_line or split_line[0] != self.node_marker:
> - break
> -
> - raw_state = split_line[-1]
> -
> - # "enabled_fired"}; -> enabled_fired
> - state = raw_state.replace('"', '').replace('};',
> '').replace(',', '_')
> - if state.startswith(self.init_marker):
> - initial_state = state[len(self.init_marker):]
> - else:
> - states.append(state)
> - if "doublecircle" in line:
> - final_states.append(state)
> - has_final_states = True
> -
> - if "ellipse" in line:
> - final_states.append(state)
> - has_final_states = True
> -
> - if not initial_state:
> - raise AutomataError("The automaton doesn't have an
> initial state")
> -
> - states = sorted(set(states))
> - states.remove(initial_state)
> -
> - # Insert the initial state at the beginning of the states
> - states.insert(0, initial_state)
> -
> - if not has_final_states:
> - final_states.append(initial_state)
> -
> - return states, initial_state, final_states
> -
> def __get_event_variables(self) -> tuple[list[str], list[str]]:
> events: list[str] = []
> envs: list[str] = []
> @@ -609,26 +482,6 @@ class Automata:
>
> return sorted(set(events)), sorted(set(envs))
>
> - def _split_constraint_expr(self, constr: list[str]) ->
> Iterator[tuple[str,
> -
>
> str | None]]:
> - """
> - Get a list of strings of the type constr1 && constr2 and
> returns a list of
> - constraints and separators: [[constr1,"&&"],[constr2,None]]
> - """
> - exprs = []
> - seps = []
> - for c in constr:
> - while "&&" in c or "||" in c:
> - a = c.find("&&")
> - o = c.find("||")
> - pos = a if o < 0 or 0 < a < o else o
> - exprs.append(c[:pos].replace(" ", ""))
> - seps.append(c[pos:pos + 2].replace(" ", ""))
> - c = c[pos + 2:].replace(" ", "")
> - exprs.append(c)
> - seps.append(None)
> - return zip(exprs, seps)
> -
> def __extract_env_var(self, constraint: ConstraintCondition):
> if constraint.unit:
> self.env_types[constraint.env] = constraint.unit
> @@ -697,10 +550,3 @@ class Automata:
>
> def is_hybrid_automata(self) -> bool:
> return bool(self.envs)
> -
> - def is_event_constraint(self, key: _ConstraintKey) -> bool:
> - """
> - Given the key in self.constraints return true if it is an
> event
> - constraint, false if it is a state constraint
> - """
> - return isinstance(key, _EventConstraintKey)
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index 49cb3e724a93..d6779ac6b7dd 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -11,9 +11,7 @@
> from collections import deque
> from .dot2c import Dot2c
> from .generator import Monitor
> -from .automata import _EventConstraintKey, _StateConstraintKey,
> AutomataError
> -from .automata import ConstraintRule, ConstraintCondition
> -
> +from .automata import ConstraintRule, ConstraintCondition,
> AutomataError
>
> class dot2k(Monitor, Dot2c):
> template_dir = "dot2k"
> @@ -217,9 +215,6 @@ class ha2k(dot2k):
> value *= 10**9
> return str(value) + "ull"
>
> - def __parse_single_constraint(self, rule: dict, value: str) ->
> str:
> - return f"ha_get_env(ha_mon, {rule["env"]}{self.enum_suffix},
> time_ns) {rule["op"]} {value}"
> -
> def __parse_guard_rule(self, rule) -> list[str]:
> buff = []
> for c, sep in rule.rules:
> @@ -233,12 +228,6 @@ class ha2k(dot2k):
> buff.append(cond)
> return buff
>
> - def __get_constraint_env(self, constr: str) -> str:
> - """Extract the second argument from an ha_ function"""
> - env =
> constr.split("(")[1].split()[1].rstrip(")").rstrip(",")
> - assert env.rstrip(f"_{self.name}") in self.envs
> - return env
> -
> def __start_to_invariant_check(self, inv: ConstraintCondition) -
> > str:
> # by default assume the timer has ns expiration
> clock_type = "ns"
> @@ -249,21 +238,6 @@ class ha2k(dot2k):
>
> return f"return ha_check_invariant_{clock_type}(ha_mon,
> {inv.env}_{self.name}, time_ns, {value})"
>
> - def __start_to_conv(self, constr: str) -> str:
> - """
> - Undo the storage conversion done by ha_start_timer_
> - """
> - return "ha_inv_to_guard" + constr[constr.find("("):]
> -
> - def __parse_timer_constraint(self, rule: dict, value: str) ->
> str:
> - # by default assume the timer has ns expiration
> - clock_type = "ns"
> - if self.env_types.get(rule["env"]) == "j":
> - clock_type = "jiffy"
> -
> - return (f"ha_start_timer_{clock_type}(ha_mon,
> {rule["env"]}{self.enum_suffix},"
> - f" {value}, time_ns)")
> -
> def __parse_invariant(self, inv):
> # by default assume the timer has ns expiration
> clock_type = "ns"
^ permalink raw reply
* [PATCH v3] tracing: fix CFI violation in probestub test
From: Eva Kurchatova @ 2026-06-03 15:31 UTC (permalink / raw)
To: mhiramat, rostedt
Cc: linux-trace-kernel, linux-kernel, mathieu.desnoyers, peterz,
jpoimboe, samitolvanen, eva.kurchatova
When multiple callbacks are registered on the same tracepoint,
callbacks will be indirectly called via traceiter helper.
Pointers to __probestub_* callbacks reside in __tracepoints section,
which is excluded from ENDBR checks in objtool, causing objtool to
assume those functions are never indirectly called.
Registering multiple callbacks using sched_wakeup test will result
in #CP exception due to missing ENDBR in __probestub_sched_wakeup
on a CFI-enabled machine.
Fix this by adding CFI_NOSEAL annotation to probestub declaration.
Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks")
Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>
---
include/linux/tracepoint.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 763eea4d80d8..2d2b9f8cdda4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -20,6 +20,7 @@
#include <linux/rcupdate_trace.h>
#include <linux/tracepoint-defs.h>
#include <linux/static_call.h>
+#include <linux/cfi.h>
struct module;
struct tracepoint;
@@ -389,6 +390,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
void __probestub_##_name(void *__data, proto) \
{ \
} \
+ /* \
+ * Annotate the probestub 'CFI_NOSEAL' to stop objtool from \
+ * requesting the kernel remove the ENDBR, because the only \
+ * references to the function are in the __tracepoint section, \
+ * that objtool doesn't scan. \
+ */ \
+ CFI_NOSEAL(__probestub_##_name); \
DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name); \
DEFINE_RUST_DO_TRACE(_name, TP_PROTO(proto), TP_ARGS(args))
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v2 05/13] verification/rvgen: Convert __fill_setup_invariants_func() to Lark
From: Gabriele Monaco @ 2026-06-03 15:24 UTC (permalink / raw)
To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <0b2cf5e1bb03d0e3a667b0fc2c7093123ef0a78c.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> Prepare for self.invariants and __parse_constraints() to be removed.
> convert __fill_setup_invariants_func() to use the new parsed states
> from Lark.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> v2: add missing time conversion [Sashiko]
> ---
> tools/verification/rvgen/rvgen/dot2k.py | 44 ++++++++++++++++++++---
> --
> 1 file changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index a344cbbcb346..d9f8e1c7737a 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -250,6 +250,26 @@ class ha2k(dot2k):
> return (f"ha_start_timer_{clock_type}(ha_mon,
> {rule["env"]}{self.enum_suffix},"
> f" {value}, time_ns)")
>
> + def __parse_invariant(self, inv):
> + # by default assume the timer has ns expiration
> + clock_type = "ns"
> + if inv.unit == "j":
> + clock_type = "jiffy"
> +
> + env = inv.env + self.enum_suffix
> + val = inv.val.replace("()", "(ha_mon)")
> +
> + match inv.unit:
> + case "us":
> + val *= 10**3
> + case "ms":
> + val *= 10**6
> + case "s":
> + val *= 10**9
> +
> + return (f"ha_start_timer_{clock_type}(ha_mon, {env},"
> + f" {val}, time_ns)")
> +
> def __format_guard_rules(self, rules: list[str]) -> list[str]:
> """
> Merge guard constraints as a single C return statement.
> @@ -463,15 +483,14 @@ f"""static inline bool ha_verify_guards(struct
> ha_monitor *ha_mon,
> return conflict_guards, conflict_invs
>
> def __fill_setup_invariants_func(self) -> list[str]:
> - buff = []
> - if not self.invariants:
> + if not self.has_invariant:
> return []
>
> - buff.append(
> + buff = [
> f"""static inline void ha_setup_invariants(struct ha_monitor
> *ha_mon,
> \t\t\t\t enum {self.enum_states_def} curr_state, enum
> {self.enum_events_def} event,
> \t\t\t\t enum {self.enum_states_def} next_state, u64 time_ns)
> -{{""")
> +{{"""]
>
> conditions = ["next_state == curr_state"]
> conditions += [f"event != {e}{self.enum_suffix}"
> @@ -480,13 +499,20 @@ f"""static inline void
> ha_setup_invariants(struct ha_monitor *ha_mon,
> buff.append(f"\tif ({condition_str})\n\t\treturn;")
>
> _else = ""
> - for state, constr in sorted(self.invariants.items()):
> - buff.append(f"\t{_else}if (next_state ==
> {self.states[state]}{self.enum_suffix})")
> - buff.append(f"\t\t{constr};")
> + for state in self._states:
> + inv = state.inv
> + if not inv:
> + continue
> + inv = self.__parse_invariant(inv)
> + buff.append(f"\t{_else}if (next_state ==
> {state.name}{self.enum_suffix})")
> + buff.append(f"\t\t{inv};")
> _else = "else "
>
> - for state in self.invariants:
> - buff.append(f"\telse if (curr_state ==
> {self.states[state]}{self.enum_suffix})")
> + for state in self._states:
> + inv = state.inv
> + if not inv:
> + continue
> + buff.append(f"\telse if (curr_state ==
> {state.name}{self.enum_suffix})")
> buff.append("\t\tha_cancel_timer(ha_mon);")
>
> buff.append("}\n")
^ permalink raw reply
* Re: [PATCH v2 04/13] verification/rvgen: Convert __fill_verify_invariants_func() to Lark
From: Gabriele Monaco @ 2026-06-03 14:56 UTC (permalink / raw)
To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <d1fbbafd8af64a9663b3982845ecb2c3c5a4eded.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> Convert __fill_verify_invariants_func() to use the parsed states
> information from Lark, prepare to remove the old raw text parsing
> code.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
> v2: fix up __start_to_invariant_check()'s signature [Sashiko]
> ---
> tools/verification/rvgen/rvgen/dot2k.py | 32 ++++++++++++++++-------
> --
> 1 file changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index e6f476b903b0..a344cbbcb346 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -12,6 +12,7 @@ from collections import deque
> from .dot2c import Dot2c
> from .generator import Monitor
> from .automata import _EventConstraintKey, _StateConstraintKey,
> AutomataError
> +from .automata import ConstraintRule, ConstraintCondition
>
>
> class dot2k(Monitor, Dot2c):
> @@ -177,6 +178,14 @@ class ha2k(dot2k):
> raise AutomataError("Detected deterministic automaton,
> use the 'da' class")
> self.trace_h = self._read_template_file("trace_hybrid.h")
> self.__parse_constraints()
> + self.has_invariant = False
> + self.has_guard = False
> + for state in self._states:
> + if state.inv:
> + self.has_invariant = True
> + for transition in self.transitions:
> + if transition.rule or transition.reset:
> + self.has_guard = True
>
> def fill_monitor_class_type(self) -> str:
> if self._is_id_monitor():
> @@ -218,14 +227,13 @@ class ha2k(dot2k):
> assert env.rstrip(f"_{self.name}") in self.envs
> return env
>
> - def __start_to_invariant_check(self, constr: str) -> str:
> + def __start_to_invariant_check(self, inv: ConstraintCondition) -
> > str:
> # by default assume the timer has ns expiration
> - env = self.__get_constraint_env(constr)
> clock_type = "ns"
> - if self.env_types.get(env.rstrip(f"_{self.name}")) == "j":
> + if inv.unit == "j":
> clock_type = "jiffy"
>
> - return f"return ha_check_invariant_{clock_type}(ha_mon,
> {env}, time_ns)"
> + return f"return ha_check_invariant_{clock_type}(ha_mon,
> {inv.env}_{self.name}, time_ns)"
>
> def __start_to_conv(self, constr: str) -> str:
> """
> @@ -320,20 +328,22 @@ class ha2k(dot2k):
> self.invariants[key] = rules[0]
>
> def __fill_verify_invariants_func(self) -> list[str]:
> - buff = []
> - if not self.invariants:
> + if not self.has_invariant:
> return []
>
> - buff.append(
> + buff = [
> f"""static inline bool ha_verify_invariants(struct ha_monitor
> *ha_mon,
> \t\t\t\t\tenum {self.enum_states_def} curr_state, enum
> {self.enum_events_def} event,
> \t\t\t\t\tenum {self.enum_states_def} next_state, u64 time_ns)
> -{{""")
> +{{"""]
>
> _else = ""
> - for state, constr in sorted(self.invariants.items()):
> - check_str = self.__start_to_invariant_check(constr)
> - buff.append(f"\t{_else}if (curr_state ==
> {self.states[state]}{self.enum_suffix})")
> + for state in self._states:
> + if not state.inv:
> + continue
> +
> + check_str = self.__start_to_invariant_check(state.inv)
> + buff.append(f"\t{_else}if (curr_state ==
> {state.name}{self.enum_suffix})")
> buff.append(f"\t\t{check_str};")
> _else = "else "
>
^ permalink raw reply
* Re: [PATCH v2 03/13] verification/rvgen: Implement state and transition parser based on Lark
From: Gabriele Monaco @ 2026-06-03 14:55 UTC (permalink / raw)
To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <e84c39bab831f1623142e6b83a9513ba99a702f1.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> The DOT parsing scripts directly parse the raw text and they are
> quite
> fragile. If the input dot files' formats are slightly changed (for
> instance, by breaking long some lines which is allowed by the DOT
> language), the scripts would fail.
>
> Prepare to move away from the raw text processing, implement parsers
> based
> on Lark which parse states, transitions and constraints.
>
> The parse results are not used yet. The existing scripts will be
> converted
> one by one to them, and the raw text processing will eventually be
> removed.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
> v2:
> - fixup guard grammar [Gabriele]
> - fixup inconsistent types in a list [Wander]
> - compiling the parsers only once to avoid overhead [Sashiko]
> - fix up the signature of State.__init__() [Sashiko]
> - gracefully handle node statement without label [Sashiko]
> - lark parse exception handling
> ---
> tools/verification/rvgen/rvgen/automata.py | 216
> +++++++++++++++++++++
> 1 file changed, 216 insertions(+)
>
> diff --git a/tools/verification/rvgen/rvgen/automata.py
> b/tools/verification/rvgen/rvgen/automata.py
> index 8649d982383d..b86275e7bf28 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -198,6 +198,164 @@ class ParseTree:
> self.node_attrs = attributes_parser.node_attrs
> self.edge_attrs = attributes_parser.edge_attrs
>
> +class ConstraintCondition:
> + def __init__(self, env: str, op: str, val: str, unit=None):
> + self.env = env
> + self.op = op
> + self.val = val
> + self.unit = unit
> + if unit is None:
> + # try to infer unit from constants or parameters
> + val_for_unit = val.lower().replace("()", "")
> + if val_for_unit.endswith("_ns"):
> + self.unit = "ns"
> + if val_for_unit.endswith("_jiffies"):
> + self.unit = "j"
> +
> +class ConstraintRule:
> + grammar = r'''
> + rule: condition (OP condition)*
> +
> + OP: "&&" | "||"
> +
> + condition: ENV CMP_OP VAL UNIT?
> +
> + ENV: CNAME
> +
> + CMP_OP: "==" | "<=" | "<" | ">=" | ">"
> +
> + VAL: /[0-9]+/
> + | /[A-Z_]+\(\)/
> + | /[A-Z_]+/
> + | /[a-z_]+\(\)/
> + | /[a-z_]+/
> +
> + UNIT: "ns" | "us" | "ms" | "s"
> + '''
> +
> + def __init__(self, c: ConstraintCondition):
> + '''
> + A list of pairs of
> + - the condition (e.g. is_constr_dl == 1)
> + - the logical operator ("||" or "&&") combining this
> + condition with the next one if it exists, otherwise None
> +
> + TODO: Perhaps use an abstract syntax tree instead, because
> + this representation cannot capture precedence
> + '''
> + self.rules = [[c, None]]
> +
> + def chain(self, op: str, c: ConstraintCondition):
> + self.rules[-1][1] = op
> + self.rules.append([c, None])
> +
> +class ConstraintReset:
> + def __init__(self, env):
> + self.env = env
> +
> +class StateLabelParser:
> + grammar = r'''
> + label: CNAME ("\\n" condition)?
> +
> + %import common.CNAME
> + %import common.WS
> + %ignore WS
> + ''' + ConstraintRule.grammar
> +
> + parser = lark.Lark(grammar, parser='lalr', start="label")
> +
> + def __init__(self, label: str):
> + try:
> + tree = self.parser.parse(label)
> + except lark.exceptions.UnexpectedInput as exc:
> + raise(AutomataError(f"Unrecognised state
> \"{label}\"\n{exc}"))
> +
> + self.state = tree.children[0]
> + self.constraint = None
> +
> + if len(tree.children) == 2:
> + self.constraint =
> ConstraintCondition(*tree.children[1].children)
> + if self.constraint.op not in ("<", "<="):
> + raise AutomataError("State constraints must be clock
> expirations like"
> + f" clk<N ({label})")
> +
> +class EventLabelParser:
> + grammar = r'''
> + events: event ("\\n" event)*
> +
> + event: name (";" guard)?
> +
> + guard: reset
> + | rule
> + | rule ";" reset
> + | reset ";" rule
> +
> + name: CNAME
> +
> + reset: "reset" "(" ENV ")"
> +
> + %import common.CNAME
> + %import common.WS
> + %ignore WS
> + ''' + ConstraintRule.grammar
> +
> + parser = lark.Lark(grammar, parser='lalr', start="events")
> +
> + class GetEvents(lark.visitors.Transformer):
> + def guard(self, args):
> + reset = None
> + rule = None
> + for arg in args:
> + if arg.data == "reset":
> + reset = ConstraintReset(arg.children[0])
> + elif arg.data == "rule":
> + conditions = arg.children
> + rule = ConstraintRule(conditions[0])
> + for i in range(1, len(conditions), 2):
> + rule.chain(conditions[i], conditions[i + 1])
> + return reset, rule
> +
> + def OP(self, args):
> + return args
> +
> + def condition(self, args):
> + return ConstraintCondition(*args)
> +
> + def event(self, args):
> + assert(len(args) <= 2)
> + name = args[0]
> + rule, reset = None, None
> + if len(args) == 2:
> + reset, rule = args[1]
> + return name, reset, rule
> +
> + def events(self, args):
> + return args
> +
> + def name(self, args):
> + return args[0]
> +
> + def __init__(self, label: str):
> + try:
> + tree = self.parser.parse(label)
> + self.events = self.GetEvents().transform(tree)
> + except lark.exceptions.UnexpectedInput as exc:
> + raise(AutomataError(f"Unrecognised event
> \"{label}\"\n{exc}"))
> +
> +class Transition:
> + def __init__(self, src: str, dst: str, event: str,
> + reset: ConstraintReset, rule: ConstraintRule):
> + self.src = src
> + self.dst = dst
> + self.event = event
> + self.rule = rule
> + self.reset = reset
> +
> +class State:
> + def __init__(self, name: str, inv: ConstraintCondition):
> + self.name = name
> + self.inv = inv
> +
> class _ConstraintKey:
> """Base class for constraint keys."""
>
> @@ -252,6 +410,8 @@ class Automata:
> self.name = model_name or self.__get_model_name()
> self.__dot_lines = self.__open_dot()
> self.__parse_tree = ParseTree(file_path)
> + self.transitions = self.__parse_transitions()
> + self._states, self._initial_state, self._final_states =
> self.__parse_states()
> self.states, self.initial_state, self.final_states =
> self.__get_state_variables()
> self.env_types = {}
> self.env_stored = set()
> @@ -327,6 +487,62 @@ class Automata:
>
> return cursor
>
> + def __parse_transitions(self):
> + transitions = []
> +
> + for edge in self.__parse_tree.edges:
> + attr = self.__parse_tree.edge_attrs.get(edge)
> + if not attr:
> + continue
> +
> + label = attr.get("label")
> +
> + src, dst = edge
> +
> + parser = EventLabelParser(label)
> + for event, reset, rule in parser.events:
> + transitions.append(Transition(src, dst, event,
> reset, rule))
> +
> + transitions.sort(key=lambda t : (t.src, t.event))
> + return transitions
> +
> + def __parse_states(self):
> + initial_state = ""
> + states = []
> + final_states = []
> +
> + for node in self.__parse_tree.nodes:
> + attr = self.__parse_tree.node_attrs[node]
> + label = attr.get("label")
> +
> + if node.startswith(Automata.init_marker):
> + initial_state = node[len(Automata.init_marker):]
> +
> + if not label:
> + continue
> +
> + parser = StateLabelParser(label)
> + state = State(parser.state, parser.constraint)
> +
> + states.append(state)
> +
> + shape = attr.get("shape")
> + if shape in ("doublecircle", "ellipse"):
> + final_states.append(state)
> +
> +
> + initial_state = next((s for s in states if s.name ==
> initial_state), None)
> + if not initial_state:
> + raise AutomataError("The automaton doesn't have an
> initial state")
> +
> + if not final_states:
> + final_states.append(initial_state)
> +
> + states.remove(initial_state)
> + states.sort(key=lambda s : s.name)
> + states.insert(0, initial_state)
> + return states, initial_state, final_states
> +
> def __get_state_variables(self) -> tuple[list[str], str,
> list[str]]:
> # wait for node declaration
> states = []
^ permalink raw reply
* Re: [PATCH v2 02/13] verification/rvgen: Introduce a parse tree for automata using Lark
From: Gabriele Monaco @ 2026-06-03 14:55 UTC (permalink / raw)
To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <13a2f241b74e02b13efa7fe188be3fa7e6148b4a.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> The DOT parsing scripts directly parse the raw text and they are
> quite
> fragile. If the input dot files' formats are slightly changed (for
> instance, by breaking long some lines which is allowed by the DOT
> language
> defined by graphviz), the scripts would fail.
>
> To make the scripts robust, the parser should be implemented based on
> the
> dot language specification, not based on how the existing dot files
> look.
>
> As a first step, use Lark to implement a Parser based on the graphviz
> dot
> language specification. The resulting parse tree is not used yet, but
> the
> existing scripts will be converted one by one to use this new parse
> tree in
> the follow-up commits.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> v2:
> - switch to use Lark's CNAME for identifier [Wander]
> - switch to use Lark's ESCAPED_STRING for string [Sashiko]
> - clean up variable name shadowing [Sashiko]
> - Properly catch Lark exception
> ---
> tools/verification/rvgen/rvgen/automata.py | 186
> +++++++++++++++++++++
> 1 file changed, 186 insertions(+)
>
> diff --git a/tools/verification/rvgen/rvgen/automata.py
> b/tools/verification/rvgen/rvgen/automata.py
> index b9f8149f7118..8649d982383d 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -13,6 +13,191 @@ import re
> from typing import Iterator
> from itertools import islice
>
> +import lark
> +
> +class ParseTree:
> + # based on https://graphviz.org/doc/info/lang.html
> + # with the irrelevant stuffs (port and compass) removed
> + grammar = r'''
> + start: "strict"? ("graph" | "digraph") ID? "{" stmt_list "}"
> +
> + stmt_list: (stmt ";"? stmt_list)?
> +
> + stmt: node_stmt
> + | edge_stmt
> + | attr_stmt
> + | ID "=" ID
> + | subgraph
> +
> + attr_stmt: attr_type attr_list
> +
> + attr_type: "graph" -> graph
> + | "node" -> node
> + | "edge" -> edge
> +
> + attr_list: "[" a_list? "]" attr_list?
> +
> + a_list: ID "=" ID (";" | ",")? a_list?
> +
> + edge_stmt: (node_id | subgraph) edgerhs attr_list?
> +
> + edgerhs: edgeop (node_id | subgraph) edgerhs?
> +
> + edgeop: "->" | "--"
> +
> + node_stmt: node_id attr_list?
> +
> + node_id: ID
> +
> + subgraph: ("subgraph" ID?)? "{" stmt_list "}"
> +
> + ID: CNAME
> + | /-?(\.[0-9]+|[0-9]+(\.[0-9]*))/
> + | ESCAPED_STRING
> +
> + %import common.CNAME
> + %import common.ESCAPED_STRING
> + %import common.WS
> + %ignore WS
> + '''
> +
> + @staticmethod
> + def parse_edge(tree: lark.Tree) -> tuple[str, str]:
> + # only support a simple node-to-node edge
> + nodes = []
> + for node in tree.iter_subtrees_topdown():
> + if node.data == "node_id":
> + nodes.append(node.children[0].strip('"'))
> +
> + if len(nodes) != 2:
> + raise AutomataError("Only state-to-state transition is
> supported")
> +
> + return tuple(nodes)
> +
> + class ParseNodes(lark.visitors.Visitor):
> + def __init__(self, *args, **kwargs):
> + self.nodes = set()
> + super().__init__(*args, **kwargs)
> +
> + def node_stmt(self, tree):
> + node_id = tree.children[0]
> + node = node_id.children[0].strip('"')
> + self.nodes.add(node)
> +
> + class ParseEdges(lark.visitors.Visitor):
> + def __init__(self, *args, **kwargs):
> + self.edges = set()
> + super().__init__(*args, **kwargs)
> +
> + def edge_stmt(self, tree):
> + edge = ParseTree.parse_edge(tree)
> + self.edges.add(edge)
> +
> + class ParseAttributes(lark.visitors.Interpreter):
> + def __init__(self, *args, **kwargs):
> + '''
> + Stacks of default attributes. [0] is the default
> + attributes for the outermost scope, while [-1] is the
> + default attributes for the current scope.
> + '''
> + self.default_node_attrs = [{}]
> + self.default_edge_attrs = [{}]
> +
> + self.node_attrs = {}
> + self.edge_attrs = {}
> +
> + super().__init__(*args, **kwargs)
> +
> + @staticmethod
> + def __get_attrs(stmt: lark.Tree) -> dict[str, str]:
> + attrs = {}
> +
> + for node in stmt.iter_subtrees():
> + if node.data == "a_list":
> + attrs[node.children[0]] =
> node.children[1].strip('"')
> +
> + return attrs
> +
> +
> + def subgraph(self, tree):
> + # We are entering a new scope, inherit the default
> + # attributes of the outer scope
> + self.default_node_attrs.append(self.default_node_attrs[-
> 1].copy())
> + self.default_edge_attrs.append(self.default_edge_attrs[-
> 1].copy())
> +
> + children = self.visit_children(tree)
> +
> + # Exiting the scope
> + del self.default_node_attrs[-1]
> + del self.default_edge_attrs[-1]
> +
> + return children
> +
> + def node_stmt(self, tree):
> + node_id = tree.children[0]
> + node = node_id.children[0].strip('"')
> +
> + attrs = self.default_node_attrs[-1].copy()
> + attrs |= self.__get_attrs(tree)
> +
> + if attrs:
> + if node in self.node_attrs:
> + self.node_attrs[node] = attrs |
> self.node_attrs[node]
> + else:
> + self.node_attrs[node] = attrs
> +
> + return self.visit_children(tree)
> +
> + def edge_stmt(self, tree):
> + edge = ParseTree.parse_edge(tree)
> +
> + attrs = self.default_edge_attrs[-1].copy()
> + attrs |= self.__get_attrs(tree)
> +
> + if attrs:
> + if edge in self.edge_attrs:
> + self.edge_attrs[edge] = attrs |
> self.edge_attrs[edge]
> + else:
> + self.edge_attrs[edge] = attrs
> +
> + return self.visit_children(tree)
> +
> + def attr_stmt(self, tree):
> + attr_type = tree.children[0].data
> + attrs = self.__get_attrs(tree)
> +
> + if attr_type == "node":
> + self.default_node_attrs[-1] |= attrs
> + elif attr_type == "edge":
> + self.default_edge_attrs[-1] |= attrs
> + else:
> + # graph attributes are irrelevant
> + pass
> +
> + self.visit_children(tree)
> +
> + def __init__(self, dot_file):
> + parser = lark.Lark(self.grammar, parser='lalr')
> + node_parser = self.ParseNodes()
> + edge_parser = self.ParseEdges()
> + attributes_parser = self.ParseAttributes()
> +
> + try:
> + with open(dot_file, "r") as f:
> + tree = parser.parse(f.read())
> + attributes_parser.visit(tree)
> + node_parser.visit(tree)
> + edge_parser.visit(tree)
> + except OSError as exc:
> + raise AutomataError(exc.strerror) from exc
> + except lark.exceptions.UnexpectedInput as exc:
> + raise AutomataError(str(exc))
> +
> + self.nodes = node_parser.nodes
> + self.edges = edge_parser.edges
> + self.node_attrs = attributes_parser.node_attrs
> + self.edge_attrs = attributes_parser.edge_attrs
> +
> class _ConstraintKey:
> """Base class for constraint keys."""
>
> @@ -66,6 +251,7 @@ class Automata:
> self.__dot_path = file_path
> self.name = model_name or self.__get_model_name()
> self.__dot_lines = self.__open_dot()
> + self.__parse_tree = ParseTree(file_path)
> self.states, self.initial_state, self.final_states =
> self.__get_state_variables()
> self.env_types = {}
> self.env_stored = set()
^ permalink raw reply
* Re: [PATCH v2 01/13] verification/rvgen: Switch LTL parser to Lark
From: Gabriele Monaco @ 2026-06-03 14:49 UTC (permalink / raw)
To: Nam Cao
Cc: Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
linux-kernel
In-Reply-To: <e1736ef618e32712eeb65d1714a2fea76298057b.1779956342.git.namcao@linutronix.de>
On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> The LTL parser is built using Ply. However, Ply is no longer
> maintained [1].
>
> Switch to use Lark instead. In addition to being actively maintained,
> Lark
> also offers additional features (namely, automatically creating the
> abstract syntax tree) which make the parser simpler.
>
> Link:
> https://github.com/dabeaz/ply/commit/9d7c40099e23ff78f9d86ef69a26c1e8a83e706a
> [1]
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
> v2:
> - fix identifier starting with a digit is allowed [Wander]
> - fixup ast node uid [Gabriele]
> - Fix up Literal AST node construction [Wander, Sashiko]
> - FIx up unary op error message [Sashiko]
> - Add nice exception handling [Gabriele]
> ---
> tools/verification/rvgen/__main__.py | 5 +-
> tools/verification/rvgen/rvgen/ltl2ba.py | 202 +++++++++------------
> --
> 2 files changed, 82 insertions(+), 125 deletions(-)
>
> diff --git a/tools/verification/rvgen/__main__.py
> b/tools/verification/rvgen/__main__.py
> index 5c923dc10d0f..0915cf86e43b 100644
> --- a/tools/verification/rvgen/__main__.py
> +++ b/tools/verification/rvgen/__main__.py
> @@ -14,6 +14,7 @@ if __name__ == '__main__':
> from rvgen.container import Container
> from rvgen.ltl2k import ltl2k
> from rvgen.automata import AutomataError
> + from rvgen.ltl2ba import LTLError
> import argparse
> import sys
>
> @@ -57,8 +58,8 @@ if __name__ == '__main__':
> sys.exit(1)
> else:
> monitor = Container(vars(params))
> - except AutomataError as e:
> - print(f"There was an error processing {params.spec}: {e}",
> file=sys.stderr)
> + except (AutomataError, LTLError) as e:
> + print(f"There was an error processing {params.spec}:\n{e}",
> file=sys.stderr)
> sys.exit(1)
>
> print(f"Writing the monitor into the directory {monitor.name}")
> diff --git a/tools/verification/rvgen/rvgen/ltl2ba.py
> b/tools/verification/rvgen/rvgen/ltl2ba.py
> index 016e7cf93bbb..7cebda61bce8 100644
> --- a/tools/verification/rvgen/rvgen/ltl2ba.py
> +++ b/tools/verification/rvgen/rvgen/ltl2ba.py
> @@ -7,9 +7,7 @@
> # https://doi.org/10.1007/978-0-387-34892-6_1
> # With extra optimizations
>
> -from ply.lex import lex
> -from ply.yacc import yacc
> -from .automata import AutomataError
> +import lark
>
> # Grammar:
> # ltl ::= opd | ( ltl ) | ltl binop ltl | unop ltl
> @@ -30,42 +28,41 @@ from .automata import AutomataError
> # imply
> # equivalent
>
> -tokens = (
> - 'AND',
> - 'OR',
> - 'IMPLY',
> - 'UNTIL',
> - 'ALWAYS',
> - 'EVENTUALLY',
> - 'NEXT',
> - 'VARIABLE',
> - 'LITERAL',
> - 'NOT',
> - 'LPAREN',
> - 'RPAREN',
> - 'ASSIGN',
> -)
> -
> -t_AND = r'and'
> -t_OR = r'or'
> -t_IMPLY = r'imply'
> -t_UNTIL = r'until'
> -t_ALWAYS = r'always'
> -t_NEXT = r'next'
> -t_EVENTUALLY = r'eventually'
> -t_VARIABLE = r'[A-Z_0-9]+'
> -t_LITERAL = r'true|false'
> -t_NOT = r'not'
> -t_LPAREN = r'\('
> -t_RPAREN = r'\)'
> -t_ASSIGN = r'='
> -t_ignore_COMMENT = r'\#.*'
> -t_ignore = ' \t\n'
> -
> -def t_error(t):
> - raise AutomataError(f"Illegal character '{t.value[0]}'")
> -
> -lexer = lex()
> +GRAMMAR = r'''
> +start: assign+
> +
> +assign: VARIABLE "=" _ltl
> +
> +_ltl: _opd | binop | unop
> +
> +_opd : VARIABLE
> + | LITERAL
> + | "(" _ltl ")"
> +
> +unop: UNOP _ltl
> +UNOP: "always"
> + | "eventually"
> + | "next"
> + | "not"
> +
> +binop: _opd BINOP _ltl
> +BINOP: "until"
> + | "and"
> + | "or"
> + | "imply"
> +
> +VARIABLE: /[A-Z_][A-Z0-9_]*/
> +LITERAL: "true" | "false"
> +
> +COMMENT: "#" /.*/ "\n"
> +%ignore COMMENT
> +
> +%import common.WS
> +%ignore WS
> +'''
> +
> +class LTLError(Exception):
> + "Exception raised for malformed linear temporal logic"
>
> class GraphNode:
> uid = 0
> @@ -97,7 +94,7 @@ class GraphNode:
> return self.id < other.id
>
> class ASTNode:
> - uid = 1
> + uid = 0
>
> def __init__(self, op):
> self.op = op
> @@ -433,90 +430,49 @@ class Literal:
> node.old |= {n}
> return node.expand(node_set)
>
> -def p_spec(p):
> - '''
> - spec : assign
> - | assign spec
> - '''
> - if len(p) == 3:
> - p[2].append(p[1])
> - p[0] = p[2]
> - else:
> - p[0] = [p[1]]
> -
> -def p_assign(p):
> - '''
> - assign : VARIABLE ASSIGN ltl
> - '''
> - p[0] = (p[1], p[3])
> -
> -def p_ltl(p):
> - '''
> - ltl : opd
> - | binop
> - | unop
> - '''
> - p[0] = p[1]
> -
> -def p_opd(p):
> - '''
> - opd : VARIABLE
> - | LITERAL
> - | LPAREN ltl RPAREN
> - '''
> - if p[1] == "true":
> - p[0] = ASTNode(Literal(True))
> - elif p[1] == "false":
> - p[0] = ASTNode(Literal(False))
> - elif p[1] == '(':
> - p[0] = p[2]
> - else:
> - p[0] = ASTNode(Variable(p[1]))
> -
> -def p_unop(p):
> - '''
> - unop : ALWAYS ltl
> - | EVENTUALLY ltl
> - | NEXT ltl
> - | NOT ltl
> - '''
> - if p[1] == "always":
> - op = AlwaysOp(p[2])
> - elif p[1] == "eventually":
> - op = EventuallyOp(p[2])
> - elif p[1] == "next":
> - op = NextOp(p[2])
> - elif p[1] == "not":
> - op = NotOp(p[2])
> - else:
> - raise AutomataError(f"Invalid unary operator {p[1]}")
> -
> - p[0] = ASTNode(op)
> -
> -def p_binop(p):
> - '''
> - binop : opd UNTIL ltl
> - | opd AND ltl
> - | opd OR ltl
> - | opd IMPLY ltl
> - '''
> - if p[2] == "and":
> - op = AndOp(p[1], p[3])
> - elif p[2] == "until":
> - op = UntilOp(p[1], p[3])
> - elif p[2] == "or":
> - op = OrOp(p[1], p[3])
> - elif p[2] == "imply":
> - op = ImplyOp(p[1], p[3])
> - else:
> - raise AutomataError(f"Invalid binary operator {p[2]}")
> -
> - p[0] = ASTNode(op)
> -
> -parser = yacc()
> +class Transform(lark.visitors.Transformer):
> + def unop(self, node):
> + if node[0] == "always":
> + return ASTNode(AlwaysOp(node[1]))
> + if node[0] == "eventually":
> + return ASTNode(EventuallyOp(node[1]))
> + if node[0] == "next":
> + return ASTNode(NextOp(node[1]))
> + if node[0] == "not":
> + return ASTNode(NotOp(node[1]))
> + raise ValueError("Unknown operator %s" % node[0])
> +
> + def binop(self, node):
> + if node[1] == "until":
> + return ASTNode(UntilOp(node[0], node[2]))
> + if node[1] == "and":
> + return ASTNode(AndOp(node[0], node[2]))
> + if node[1] == "or":
> + return ASTNode(OrOp(node[0], node[2]))
> + if node[1] == "imply":
> + return ASTNode(ImplyOp(node[0], node[2]))
> + raise ValueError("Unknown operator %s" % node[1])
> +
> + def VARIABLE(self, args):
> + return ASTNode(Variable(args))
> +
> + def LITERAL(self, args):
> + return ASTNode(Literal(args == "true"))
> +
> + def start(self, node):
> + return node
> +
> + def assign(self, node):
> + return node[0].op.name, node[1]
> +
> +parser = lark.Lark(GRAMMAR)
>
> def parse_ltl(s: str) -> ASTNode:
> - spec = parser.parse(s)
> + try:
> + spec = parser.parse(s)
> + except lark.exceptions.UnexpectedInput as e:
> + raise LTLError(str(e))
> + spec = Transform().transform(spec)
>
> rule = None
> subexpr = {}
> @@ -528,7 +484,7 @@ def parse_ltl(s: str) -> ASTNode:
> subexpr[assign[0]] = assign[1]
>
> if rule is None:
> - raise AutomataError("Please define your specification in the
> \"RULE = <LTL spec>\" format")
> + raise LTLError("Please define your specification in the
> \"RULE = <LTL spec>\" format")
>
> for node in rule:
> if not isinstance(node.op, Variable):
^ permalink raw reply
* Re: [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods
From: Puranjay Mohan @ 2026-06-03 14:39 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: rcu, linux-kernel, linux-trace-kernel, Paul E. McKenney,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
In-Reply-To: <ahgxkIbBLkTJyUVt@localhost.localdomain>
Hi Frederic,
Thanks a lot for the detailed feedback on all patches, I will send the
next version soon with requested changes.
On Thu, May 28, 2026 at 1:14 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Hi,
>
> Le Fri, Apr 17, 2026 at 04:11:48PM -0700, Puranjay Mohan a écrit :
> > RCU callbacks only track normal grace period sequence numbers. This
> > means callbacks must wait for normal GPs even when expedited GPs have
> > already elapsed.
> >
> > This series tracks both normal and expedited GP sequences in the
> > callback infrastructure using struct rcu_gp_oldstate, so callbacks
> > advance when either GP type completes.
>
> What is the motivation behind this? When most callbacks don't have much latency
> requirements between queue and invocation?
Right now there's an asymmetry: synchronize_rcu_expedited() callers
get fast object freeing, but call_rcu() callers never benefit from
those same grace periods even though they prove the exact same thing.
This series closes that gap, the callback infrastructure treats a
grace period as a grace period regardless of how it was driven. The
practical effect is faster memory reclaim: objects that could already
be safely freed stop sitting around waiting for the next normal GP.
I'll make this motivation clearer in the next version.
Thanks,
Puranjay
^ permalink raw reply
* Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
From: Peter Zijlstra @ 2026-06-03 14:26 UTC (permalink / raw)
To: Dmitry Ilvokhin
Cc: Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
Thomas Bogendoerfer, Juergen Gross, Ajay Kaher, Alexey Makhalov,
Broadcom internal kernel review list, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Arnd Bergmann,
Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-mips,
virtualization, linux-arch, linux-mm, linux-trace-kernel,
kernel-team, Paul E. McKenney
In-Reply-To: <aiA3dzjKVq2_cpSY@shell.ilvokhin.com>
On Wed, Jun 03, 2026 at 02:17:27PM +0000, Dmitry Ilvokhin wrote:
> > Something a little like so, which is completely untested, except to
> > build kernel/locking/spinlock.o (with clang-23).
>
> Thanks a lot for taking a look, Peter.
>
> I like the static_call idea. It's truly zero cost on x86 (and, as you
> note, even a byte smaller). The one caveat is that it relies on
> HAVE_STATIC_CALL_INLINE to stay free.
>
> So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
> available (x86), and a static branch fallback elsewhere, gated behind a
> default-off config so it imposes nothing on arches/kernels that don't
> opt in. I'm mostly interested in x86, but would like arm64 to work too,
> which would use the fallback.
(i386 doesn't have STATIC_CALL_INLINE, but nobody cares about the
performance on that target, so anything goes really ;-)
>
> Concretely:
>
> 1. Split the sleepable-lock patches out and send them separately.
> They're independent of the static call work and look far less
> controversial.
>
> 2. Convert the paravirt spinlock unlock to a static_call, as the
> foundation for the unlock tracepoint. I'm happy to take a stab at it.
> Let me know if you'd rather do it yourself.
Yeah, I think that patch as-is *should* work, but like said, I haven't
even tried it, so it could be terribly broken :-)
> 3. Build the unlock tracepoint on top: static_call where it's cheap,
> config-gated static_branch fallback where it isn't.
Right, so I think we need some sort of custom callback for tracepoint
enable/disable. Its been a minute since I dug through the tracepoint
code, but I don't think it provides that with a convenient wrapper, but
it should be doable.
One thing to note is that when you set the tracepoint unlock function,
it should either tail-call into the original function, or you have to
create two unlock_trace functions, one for native and one for paravirt
and pick the right one.
> Does this plan sound reasonable to you?
Yeah, should work.
> > Also, I think someone should go do some performance runs with
> > ARCH_INLINE_SPIN_* set for x86 just like for s390.
>
> That's a good point, I'll run benchmarks and report back with the
> results.
Thanks!
^ permalink raw reply
* Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
From: Dmitry Ilvokhin @ 2026-06-03 14:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
Thomas Bogendoerfer, Juergen Gross, Ajay Kaher, Alexey Makhalov,
Broadcom internal kernel review list, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Arnd Bergmann,
Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-mips,
virtualization, linux-arch, linux-mm, linux-trace-kernel,
kernel-team, Paul E. McKenney
In-Reply-To: <20260603120811.GW3493090@noisy.programming.kicks-ass.net>
On Wed, Jun 03, 2026 at 02:08:11PM +0200, Peter Zijlstra wrote:
> On Thu, May 14, 2026 at 12:34:55PM +0000, Dmitry Ilvokhin wrote:
>
> > Baseline, in best case scenario of least number of executed
> > instructions.
> >
> > 3e0: endbr64 ; 4 bytes (always executed)
> > 3e4: movb $0x0,(%rdi) ; 3 bytes (unlock,
> > ; always executed)
> > 3e7: decl %gs:__preempt_count ; 7 bytes (always executed)
> > 3ee: je 3f5 ; 2 bytes (always executed)
> > 3f0: jmp __x86_return_thunk ; 5 bytes (executed if above
> > ; je is not taken)
> > ; rest is not executed
> > 3f5: call __SCT__preempt_schedule ; 5 bytes
> > 3fa: jmp __x86_return_thunk ; 5 bytes
> >
> > Tracepoint (again same case of least number of executed instructions).
> >
> > bc0: endbr64 ; 4 bytes (always executed)
> > bc4: xchg %ax,%ax ; 2 bytes (always executed, this is an
> > ; only addition on the execution path).
> > bc6: movb $0x0,(%rdi) ; 3 bytes (unlock, always executed)
> > bc9: decl %gs:__preempt_count ; 7 bytes (always executed)
> > bd0: je bde ; 2 bytes (always executed)
> > bd2: jmp __x86_return_thunk ; 5 bytes (executed if above
> > ; je is not taken)
> > ; rest is not executed
> > bd7: call queued_spin_release_traced ; 5 bytes
> > bdc: jmp bc9 ; 2 bytes
> > bde: call __SCT__preempt_schedule ; 5 bytes
> > be3: jmp __x86_return_thunk ; 5 bytes
> >
>
> So I've been playing with this a bit, and it is all really sad.
>
> Now, since pretty much everybody+dog will have PARAVIRT_SPINLOCK=y, the
> 'best' solution would be changing that paravirt call with a
> static_call(), that actually shrinks the code by 1 byte.
>
> And then this tracepoint nonsense can simply use a different unlock
> function, just like paravirt.
>
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000 1d0: f3 0f 1e fa endbr64
> 0004 1d4: ff 15 00 00 00 00 call *0x0(%rip) # 1da <_raw_spin_unlock+0xa> 1d6: R_X86_64_PC32 pv_ops_lock+0x4
> 000a 1da: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # 1e1 <_raw_spin_unlock+0x11> 1dd: R_X86_64_PC32 __preempt_count-0x4
> 0011 1e1: 74 06 je 1e9 <_raw_spin_unlock+0x19>
> 0013 1e3: 2e e9 00 00 00 00 cs jmp 1e9 <_raw_spin_unlock+0x19> 1e5: R_X86_64_PLT32 __x86_return_thunk-0x4
> 0019 1e9: e8 00 00 00 00 call 1ee <_raw_spin_unlock+0x1e> 1ea: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
> 001e 1ee: 2e e9 00 00 00 00 cs jmp 1f4 <_raw_spin_unlock+0x24> 1f0: R_X86_64_PLT32 __x86_return_thunk-0x4
>
>
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000 1d0: f3 0f 1e fa endbr64
> 0004 1d4: e8 00 00 00 00 call 1d9 <_raw_spin_unlock+0x9> 1d5: R_X86_64_PLT32 __SCT__queued_spin_unlock-0x4
> 0009 1d9: 65 ff 0d 00 00 00 00 decl %gs:0x0(%rip) # 1e0 <_raw_spin_unlock+0x10> 1dc: R_X86_64_PC32 __preempt_count-0x4
> 0010 1e0: 74 06 je 1e8 <_raw_spin_unlock+0x18>
> 0012 1e2: 2e e9 00 00 00 00 cs jmp 1e8 <_raw_spin_unlock+0x18> 1e4: R_X86_64_PLT32 __x86_return_thunk-0x4
> 0018 1e8: e8 00 00 00 00 call 1ed <_raw_spin_unlock+0x1d> 1e9: R_X86_64_PLT32 __SCT__preempt_schedule-0x4
> 001d 1ed: 2e e9 00 00 00 00 cs jmp 1f3 <_raw_spin_unlock+0x23> 1ef: R_X86_64_PLT32 __x86_return_thunk-0x4
>
>
> Something a little like so, which is completely untested, except to
> build kernel/locking/spinlock.o (with clang-23).
Thanks a lot for taking a look, Peter.
I like the static_call idea. It's truly zero cost on x86 (and, as you
note, even a byte smaller). The one caveat is that it relies on
HAVE_STATIC_CALL_INLINE to stay free.
So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
available (x86), and a static branch fallback elsewhere, gated behind a
default-off config so it imposes nothing on arches/kernels that don't
opt in. I'm mostly interested in x86, but would like arm64 to work too,
which would use the fallback.
Concretely:
1. Split the sleepable-lock patches out and send them separately.
They're independent of the static call work and look far less
controversial.
2. Convert the paravirt spinlock unlock to a static_call, as the
foundation for the unlock tracepoint. I'm happy to take a stab at it.
Let me know if you'd rather do it yourself.
3. Build the unlock tracepoint on top: static_call where it's cheap,
config-gated static_branch fallback where it isn't.
Does this plan sound reasonable to you?
>
> Also, I think someone should go do some performance runs with
> ARCH_INLINE_SPIN_* set for x86 just like for s390.
That's a good point, I'll run benchmarks and report back with the
results.
^ permalink raw reply
* Re: [RFC PATCH 07/10] rcu: Wake NOCB rcuog kthreads on expedited grace period completion
From: Frederic Weisbecker @ 2026-06-03 14:07 UTC (permalink / raw)
To: Puranjay Mohan
Cc: rcu, linux-kernel, linux-trace-kernel, Paul E. McKenney,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
In-Reply-To: <20260417231203.785172-8-puranjay@kernel.org>
Le Fri, Apr 17, 2026 at 04:11:55PM -0700, Puranjay Mohan a écrit :
> When an expedited grace period completes, rcu_exp_wait_wake() wakes
> waiters on rnp->exp_wq[] but does not notify NOCB rcuog kthreads. These
> kthreads may be sleeping waiting for a grace period to complete.
> Without this wakeup, callbacks on offloaded CPUs that could benefit from
> the expedited GP must wait until the rcuog kthread wakes for some other
> reason (e.g., next normal GP completion or a timer).
>
> Add rcu_exp_wake_nocb() which wakes rcuog kthreads for leaf-node CPUs,
> deduplicating via rdp->nocb_gp_rdp since multiple CPUs share one rcuog
> kthread. Uses for_each_leaf_node_possible_cpu() because offline CPUs
> can have pending callbacks. The function is defined in tree_nocb.h with
> an empty stub for CONFIG_RCU_NOCB_CPU=n builds.
>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> kernel/rcu/tree.h | 1 +
> kernel/rcu/tree_exp.h | 1 +
> kernel/rcu/tree_nocb.h | 29 +++++++++++++++++++++++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 7dfc57e9adb1..40f778453591 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -500,6 +500,7 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
> static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
> static void rcu_init_one_nocb(struct rcu_node *rnp);
> static bool wake_nocb_gp(struct rcu_data *rdp);
> +static void rcu_exp_wake_nocb(struct rcu_node *rnp);
> static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> unsigned long j, bool lazy);
> static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 82cada459e5d..0df1009c6e97 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -708,6 +708,7 @@ static void rcu_exp_wait_wake(unsigned long s)
> }
> smp_mb(); /* All above changes before wakeup. */
> wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> + rcu_exp_wake_nocb(rnp);
> }
> trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
> mutex_unlock(&rcu_state.exp_wake_mutex);
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 7462cd5e2507..f37ee56d62a9 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -190,6 +190,31 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
> init_swait_queue_head(&rnp->nocb_gp_wq[1]);
> }
>
> +/*
> + * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
> + * callbacks that were waiting for the just-completed expedited GP.
> + * Deduplicate via nocb_gp_rdp since multiple CPUs share one rcuog
> + * kthread. Use for_each_leaf_node_possible_cpu() because offline CPUs
> + * may have pending callbacks.
> + */
> +static void rcu_exp_wake_nocb(struct rcu_node *rnp)
Please consolidate the naming to match rcu_nocb_gp_cleanup().
> +{
> + struct rcu_data *last_rdp_gp = NULL;
> + int cpu;
> +
> + if (!rcu_is_leaf_node(rnp))
> + return;
> +
> + for_each_leaf_node_possible_cpu(rnp, cpu) {
> + struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> + if (rdp->nocb_gp_rdp == last_rdp_gp)
> + continue;
> + last_rdp_gp = rdp->nocb_gp_rdp;
> + wake_nocb_gp(rdp);
> + }
There are two waitqueues for rcuog wake-ups:
1) rdp->rdp_gp->nocb_gp_wq: to wait for callbacks on the queue
2) rnp->nocb_gp_wq: to wait for grace periods
So you're waking up the wrong one.
Something like the below? (untested)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0e43866dc4cd..436e12e313c2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2193,8 +2193,13 @@ static noinline void rcu_gp_cleanup(void)
dump_blkd_tasks(rnp, 10);
WARN_ON_ONCE(rnp->qsmask);
WRITE_ONCE(rnp->gp_seq, new_gp_seq);
- if (!rnp->parent)
- smp_mb(); // Order against failing poll_state_synchronize_rcu_full().
+ if (!rnp->parent) {
+ /*
+ * Order against failing poll_state_synchronize_rcu_full().
+ * and also rcu_nocb_cleanup_wake() -> swait_active()
+ */
+ smp_mb();
+ }
rdp = this_cpu_ptr(&rcu_data);
if (rnp == rdp->mynode)
needgp = __note_gp_changes(rnp, rdp) || needgp;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 40f778453591..8f272cb4e4f3 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -253,7 +253,7 @@ struct rcu_data {
u8 nocb_gp_sleep; /* Is the nocb GP thread asleep? */
u8 nocb_gp_bypass; /* Found a bypass on last scan? */
u8 nocb_gp_gp; /* GP to wait for on last scan? */
- unsigned long nocb_gp_seq; /* If so, ->gp_seq to wait for. */
+ struct rcu_gp_oldstate nocb_gp_seq; /* If so, ->gp_seq to wait for. */
unsigned long nocb_gp_loops; /* # passes through wait code. */
struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
bool nocb_cb_sleep; /* Is the nocb CB thread asleep? */
@@ -498,9 +498,9 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
+static void rcu_nocb_exp_cleanup(struct rcu_node *rnp);
static void rcu_init_one_nocb(struct rcu_node *rnp);
static bool wake_nocb_gp(struct rcu_data *rdp);
-static void rcu_exp_wake_nocb(struct rcu_node *rnp);
static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
unsigned long j, bool lazy);
static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0df1009c6e97..43c167a8a145 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -708,7 +708,8 @@ static void rcu_exp_wait_wake(unsigned long s)
}
smp_mb(); /* All above changes before wakeup. */
wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
- rcu_exp_wake_nocb(rnp);
+ if (rcu_is_leaf_node(rnp))
+ rcu_nocb_exp_cleanup(rnp);
}
trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
mutex_unlock(&rcu_state.exp_wake_mutex);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f37ee56d62a9..60d4182b9509 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -170,13 +170,59 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
lockdep_assert_held(&rdp->nocb_lock);
}
+static void rcu_nocb_cleanup_wake(struct swait_queue_head *sq)
+{
+ if (swait_active(sq))
+ swake_up_all(sq);
+}
+
/*
* Wake up any no-CBs CPUs' kthreads that were waiting on the just-ended
* grace period.
*/
static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
{
- swake_up_all(sq);
+ /*
+ * swait active() can be checked first because:
+ *
+ * rcu_gp_cleanup() nocb_gp_wait()
+ * --------------- --------------
+ * WRITE_ONCE(root->gp_seq, new_gp_seq); swait_event_interruptible_exclusive(sq)
+ * smp_mb() prepare_to_swait()
+ * if swait_active(sq) list_add_tail(&wait->task_list, &q->task_list);
+ * swake_up_all(sq) set_current_state()
+ * smp_mb()
+ * if (poll_state_synchronize_rcu_full())
+ * if (rcu_seq_done_exact(&root->gp_seq, rgosp->rgos_norm))
+ * ...
+ */
+ rcu_nocb_cleanup_wake(sq);
+}
+
+/*
+ * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
+ * callbacks that were waiting for the just-completed expedited GP.
+ * Wake-up waitqueues for both even and odd GP numbers because exp and
+ * normal sequences don't match.
+ */
+static void rcu_nocb_exp_cleanup(struct rcu_node *rnp)
+{
+/*
+ * swait active() can be checked first because:
+ *
+ * rcu_exp_wait_wake() nocb_gp_wait()
+ * --------------- --------------
+ * rcu_seq_end(&rcu_state.expedited_sequence); swait_event_interruptible_exclusive(sq)
+ * smp_mb() prepare_to_swait()
+ * if swait_active(sq) list_add_tail(&wait->task_list, &q->task_list);
+ * swake_up_all(sq) set_current_state()
+ * smp_mb()
+ * if (poll_state_synchronize_rcu_full())
+ * if (rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp))
+ * ...
+ */
+ rcu_nocb_cleanup_wake(&rnp->nocb_gp_wq[0]);
+ rcu_nocb_cleanup_wake(&rnp->nocb_gp_wq[1]);
}
static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
@@ -190,31 +236,6 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
init_swait_queue_head(&rnp->nocb_gp_wq[1]);
}
-/*
- * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
- * callbacks that were waiting for the just-completed expedited GP.
- * Deduplicate via nocb_gp_rdp since multiple CPUs share one rcuog
- * kthread. Use for_each_leaf_node_possible_cpu() because offline CPUs
- * may have pending callbacks.
- */
-static void rcu_exp_wake_nocb(struct rcu_node *rnp)
-{
- struct rcu_data *last_rdp_gp = NULL;
- int cpu;
-
- if (!rcu_is_leaf_node(rnp))
- return;
-
- for_each_leaf_node_possible_cpu(rnp, cpu) {
- struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-
- if (rdp->nocb_gp_rdp == last_rdp_gp)
- continue;
- last_rdp_gp = rdp->nocb_gp_rdp;
- wake_nocb_gp(rdp);
- }
-}
-
/* Clear any pending deferred wakeup timer (nocb_gp_lock must be held). */
static void nocb_defer_wakeup_cancel(struct rcu_data *rdp_gp)
{
@@ -684,7 +705,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
{
bool bypass = false;
int __maybe_unused cpu = my_rdp->cpu;
- struct rcu_gp_oldstate cur_gp_seq_full;
+ struct rcu_gp_oldstate wait_gp_seq = {0}; //remove uninitialized warning
unsigned long flags;
bool gotcbs = false;
unsigned long j = jiffies;
@@ -694,7 +715,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
bool needwake_gp;
struct rcu_data *rdp, *rdp_toggling = NULL;
struct rcu_node *rnp;
- unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
bool wasempty = false;
/*
@@ -718,6 +738,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
* won't be ignored for long.
*/
list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
+ struct rcu_gp_oldstate cur_gp_seq;
long bypass_ncbs;
bool flush_bypass = false;
long lazy_ncbs;
@@ -755,8 +776,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
needwake_gp = false;
if (!rcu_segcblist_restempty(&rdp->cblist,
RCU_NEXT_READY_TAIL) ||
- (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
- poll_state_synchronize_rcu_full(&cur_gp_seq_full))) {
+ (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+ poll_state_synchronize_rcu_full(&cur_gp_seq))) {
raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
needwake_gp = rcu_advance_cbs(rnp, rdp);
wasempty = rcu_segcblist_restempty(&rdp->cblist,
@@ -777,11 +798,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
* numbers from rcu_accelerate_cbs() inside
* rcu_advance_cbs() and will be handled on the next pass.
*/
- if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
- !poll_state_synchronize_rcu_full(&cur_gp_seq_full)) {
+ if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+ !poll_state_synchronize_rcu_full(&cur_gp_seq)) {
+ if (!needwait_gp ||
+ ULONG_CMP_LT(cur_gp_seq.rgos_norm, wait_gp_seq.rgos_norm))
+ wait_gp_seq.rgos_norm = cur_gp_seq.rgos_norm;
if (!needwait_gp ||
- ULONG_CMP_LT(cur_gp_seq_full.rgos_norm, wait_gp_seq))
- wait_gp_seq = cur_gp_seq_full.rgos_norm;
+ ULONG_CMP_LT(cur_gp_seq.rgos_exp, wait_gp_seq.rgos_exp))
+ wait_gp_seq.rgos_exp = cur_gp_seq.rgos_exp;
+
needwait_gp = true;
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("NeedWaitGP"));
@@ -803,7 +828,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
my_rdp->nocb_gp_bypass = bypass;
my_rdp->nocb_gp_gp = needwait_gp;
- my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
+ if (needwait_gp)
+ my_rdp->nocb_gp_seq = wait_gp_seq;
// At least one child with non-empty ->nocb_bypass, so set
// timer in order to avoid stranding its callbacks.
@@ -838,12 +864,12 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
nocb_gp_sleep(my_rdp, cpu);
} else {
rnp = my_rdp->mynode;
- trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("StartWait"));
+ trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq.rgos_norm, TPS("StartWait"));
swait_event_interruptible_exclusive(
- rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
- rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
+ rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq.rgos_norm) & 0x1],
+ poll_state_synchronize_rcu_full(&wait_gp_seq) ||
!READ_ONCE(my_rdp->nocb_gp_sleep));
- trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait"));
+ trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq.rgos_norm, TPS("EndWait"));
}
if (!rcu_nocb_poll) {
@@ -877,7 +903,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
swake_up_one(&rdp_toggling->nocb_state_wq);
}
- my_rdp->nocb_gp_seq = -1;
+ my_rdp->nocb_gp_seq.rgos_norm = -1;
+ my_rdp->nocb_gp_seq.rgos_exp = -1;
WARN_ON(signal_pending(current));
}
@@ -1561,7 +1588,7 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
{
struct rcu_node *rnp = rdp->mynode;
- pr_info("nocb GP %d %c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n",
+ pr_info("nocb GP %d %c%c%c%c%c %c[%c%c] %c%c:%ld/%ld rnp %d:%d %lu %c CPU %d%s\n",
rdp->cpu,
"kK"[!!rdp->nocb_gp_kthread],
"lL"[raw_spin_is_locked(&rdp->nocb_gp_lock)],
@@ -1573,7 +1600,8 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
".W"[swait_active(&rnp->nocb_gp_wq[1])],
".B"[!!rdp->nocb_gp_bypass],
".G"[!!rdp->nocb_gp_gp],
- (long)rdp->nocb_gp_seq,
+ (long)rdp->nocb_gp_seq.rgos_norm,
+ (long)rdp->nocb_gp_seq.rgos_exp,
rnp->grplo, rnp->grphi, READ_ONCE(rdp->nocb_gp_loops),
rdp->nocb_gp_kthread ? task_state_to_char(rdp->nocb_gp_kthread) : '.',
rdp->nocb_gp_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
@@ -1684,16 +1712,16 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
{
}
-static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
+static void rcu_nocb_exp_cleanup(struct rcu_node *rnp)
{
- return NULL;
}
-static void rcu_init_one_nocb(struct rcu_node *rnp)
+static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
{
+ return NULL;
}
-static void rcu_exp_wake_nocb(struct rcu_node *rnp)
+static void rcu_init_one_nocb(struct rcu_node *rnp)
{
}
^ permalink raw reply related
* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Michael Roth @ 2026-06-03 13:54 UTC (permalink / raw)
To: Ackerley Tng
Cc: Suzuki K Poulose, aik, andrew.jones, binbin.wu, brauner,
chao.p.peng, david, ira.weiny, jmattson, jthoughton, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <CAEvNRgGzOnA34WyOHtkOx5MZDZhOHaXAe+nD75AiJsZ-PsTSFQ@mail.gmail.com>
On Tue, Jun 02, 2026 at 01:46:09PM -0700, Ackerley Tng wrote:
> Suzuki K Poulose <suzuki.poulose@arm.com> writes:
>
> > On 23/05/2026 01:17, Ackerley Tng via B4 Relay wrote:
> >> From: Ackerley Tng <ackerleytng@google.com>
> >>
> >> All-shared guest_memfd used to be only supported for non-CoCo VMs where
> >> preparation doesn't apply. INIT_SHARED is about to be supported for
> >> non-CoCo VMs in a later patch in this series.
> >
> > nit: s/non-CoCo/CoCo ?
> >
>
> Yes, thanks!
>
> >>
> >> In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
> >> guest_memfd in a later patch in this series.
> >>
> >> This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
> >> shared folio for a CoCo VM where preparation applies.
> >>
> >> Add a check to make sure that preparation is only performed for private
> >> folios.
> >>
> >> Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
> >> conversion to shared.
> >>
> >> Signed-off-by: Michael Roth <michael.roth@amd.com>
> >
> > nit: Missing Co-Developed-by: ?
> >
>
> IIRC this should have been
>
> Suggested-by: Michael Roth <michael.roth@amd.com>
>
> IIRC Michael suggested this on one of the guest_memfd calls, Michael
> please let me know if you remember otherwise!
That rings a bell. Feel free to add, or just drop the stray SoB, either
way.
-Mike
>
> >>
> >> [...snip...]
> >>
^ permalink raw reply
* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Michael Roth @ 2026-06-03 13:51 UTC (permalink / raw)
To: Suzuki K Poulose
Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
david, ira.weiny, jmattson, jthoughton, oupton, pankaj.gupta,
qperret, rick.p.edgecombe, rientjes, shivankg, steven.price,
tabba, willy, wyihan, yan.y.zhao, forkloop, pratyush,
aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco
In-Reply-To: <88cae738-18e9-4ed3-8414-506a1ad8fb18@arm.com>
On Wed, Jun 03, 2026 at 09:58:45AM +0100, Suzuki K Poulose wrote:
> On 02/06/2026 23:41, Ackerley Tng wrote:
> > Suzuki K Poulose <suzuki.poulose@arm.com> writes:
> >
> > >
> > > [...snip...]
> > >
> > > > > @@ -914,7 +916,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct
> > > > > kvm_memory_slot *slot,
> > > > > folio_mark_uptodate(folio);
> > > > > }
> > > > > - r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
> > > > > + if (kvm_gmem_is_private_mem(inode, index))
> > > >
> > > > Don't we need to make sure the entire folio is private ? Not just the
> > > > page at the index ?
> > > > if (kvm_gmem_range_is_private(, index, folio_nr_pages(folio)) ?
> >
> > I was thinking to fix this when I do huge pages, for now guest_memfd is
> > always just PAGE_SIZE, so just looking up index is fine.
> >
> > Is that okay?
>
> Thats fine, but would be good to enforce that here, so that we don't miss
> out when we add support for multi page folios.
We sort of already enforce that in kvm_gmem_get_folio():
/*
* External interfaces like kvm_gmem_get_pfn() support dealing
* with hugepages to a degree, but internally, guest_memfd currently
* assumes that all folios are order-0 and handling would need
* to be updated for anything otherwise (e.g. page-clearing
* operations).
*/
WARN_ON_ONCE(!IS_ERR(folio) && folio_order(folio));
which was done as part of:
commit 6538b6221cc2feda415ca1946e66a5ef02dc6a0a
Author: Michael Roth <michael.roth@amd.com>
Date: Thu Jan 8 15:46:18 2026 -0600
KVM: guest_memfd: Remove partial hugepage handling from kvm_gmem_populate()
and that should trigger before you even reach the prepare path, so I think
that's covered.
In general, there some previous discussion where we decided we would stop wasting
time guessing at what we'll need to do for hugepages and instead just strip out
the partial support. Sean wanted the folio order kept at part of the internal API
since we know MMU will need that one way or another, but elsewhere within
guest_memfd we are okay to assume 4K. If we *know* certain points that will need
to change then a comment mentioning it isn't a bad idea, but even those comments
have tended to be wrong so far about exactly what changes are supposed to happen.
I'm not sure where the original discussion happened but there's some aftermath
discussion here[1] that I think summarizes current [non-]plans around
prepare+hugepages.
[1] https://lore.kernel.org/kvm/20250711163440.kwjebnzd7zeb4bxt@amd.com/
>
> >
> > >
> > > Or rather, we should go through the individual pages and apply the
> > > prepare for ones that are private ?
> > >
> > > Suzuki
> > >
> >
> > IIRC the plan was to make kvm_gmem_prepare_folio() idempotent, as in, if
> > a page is already private, just skip. Currently sev_gmem_prepare() does
> > a pr_debug(), which I guess is technically still idempotent.
> >
> > I'm thinking that the information tha needs tracking to make
> > .gmem_prepare() idempotent should be tracked by arch code.
> >
> > Does this work for ARM CCA?
>
> We don't hook into the prepare yet, but have plans to do that. We should
> be able to handle the pages that are already private. (For CCA context,
> RMI_GRANULE_DELEGATE_RANGE can skip over already REALM pages). So this
> should be fine.
>
> My point is, in a given folio, there may be pages that are shared.
> Like you said, this could be dealt with when we support hugepages.
Sounds good, that's also what SNP will do once hugepages come along.
-Mike
>
> Suzuki
>
>
> >
> > > >
> > > > [...snip...]
> > > >
>
^ permalink raw reply
* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 13:44 UTC (permalink / raw)
To: Zhuo, Qiuxu, rostedt@goodmis.org, mchehab+huawei@kernel.org,
Luck, Tony, bp@alien8.de, akpm@linux-foundation.org,
linmiaohe@huawei.com, xieyuanbin1@huawei.com
Cc: Lai, Yi1, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-mm@kvack.org,
linux-trace-kernel@vger.kernel.org
In-Reply-To: <CY8PR11MB7134346A3E4BB28ECA28D6E989132@CY8PR11MB7134.namprd11.prod.outlook.com>
On 6/3/26 15:11, Zhuo, Qiuxu wrote:
> Hi,
>
>
>
> Laiyi reported that the userspace rasdaemon fails to enable memory_failure_event
> on kernels >= v6.19.
>
>
>
> Kernel commit 31807483d395 ("mm/memory-failure: remove the selection of RAS"),
> merged in v6.19-rc1,
>
> moved the memory_failure_event tracepoint from the "ras" subsystem to
> "memory_failure".
>
> However, rasdaemon still tries to enable:
>
>
>
> ras:memory_failure_event
>
>
>
> while on v6.19+ kernels, the tracepoint is:
>
>
>
> memory_failure:memory_failure_event
>
>
>
> As a result, rasdaemon fails to start:
>
>
>
> …
>
> Can't write to set_event
>
> Huh! something got wrong. Aborting.
>
> …
>
>
>
> Reproducer:
>
>
>
> rasdaemon --enable
>
>
>
> Could you please let me know whether the preferred solution is to revert the
> kernel change,
>
> or to update rasdaemon to support both tracepoint names for backward/forward
> compatibility?
Likely the latter. BPF [1] documents:
Q: Are tracepoints part of the stable ABI?
A: NO. Tracepoints are tied to internal implementation details hence they are
subject to change and can break with newer kernels. BPF programs need to change
accordingly when this happens.
The Kernel ABI document explicitly doesn't list them AFAIKS.
There were previous discussions on the stability of tracepints [2], I don't know
what changed in the meantime. CCing Steve
[1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
[2] https://lwn.net/Articles/747256/
[3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Rasmus Villemoes @ 2026-06-03 13:14 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <87zf1bhjqp.fsf@prevas.dk>
On Wed, Jun 03 2026, Rasmus Villemoes <ravi@prevas.dk> wrote:
> On Wed, Jun 03 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
>
>> On Wed, Jun 3, 2026, at 09:15, Rasmus Villemoes wrote:
>>> On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
>>>> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
>>>>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>>>
>>> May I suggest a different approach, that avoids having that extra
>>> function emitted (which presumably compiles to a single jump
>>> instruction, but still, with retpoline and CFI and all that it all adds
>>> up): Keep the declaration of __vsnprintf() in the header without the
>>> __print() attribute, but then do
>>>
>>> int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>>> __alias(vsnprintf);
>>>
>>> in vsprintf.c. Aside from reusing the same entry point, I could well
>>> imagine a compiler some day complaining about seeing the printf
>>> attribute applied in a local extra declaration but not having it in the
>>> header file.
>>>
>>> Presumably it will need its own EXPORT_SYMBOL if any of the intended
>>> users are modular, and it certainly still needs a comment.
>>
>> I had tried that earlier but given up because the attributes have to
>> match exactly.
>>
>> This definition works with all currently supported versions of gcc,
>> but may have to change when the there is a new version that adds
>> even more attributes:
>>
>> int
>> __printf(3, 0)
>> __attribute__((nothrow))
>> __attribute__((nonnull(1)))
>> __vsnprintf(char *__restrict buf, size_t size,
>> const char * __restrict fmt_str, va_list args)
>> __alias(vsnprintf);
>>
>
> Ah, I see. The documentation for the alias attribute does say that the
> types have to match, but I didn't know that the nothrow and nonnull
> attributes were considered part of the type identity. Oddly enough, if
> one does
>
> typeof(vsnprintf) __vsnprintf __alias(vsnprintf);
>
> that still fails, but only complains about nothrow, not nonnull.
>
> I don't remember what minimum gcc we currently require, but gcc 9
> introduced another attribute that is apperently meant for cases like
> this: 'copy'. This seems to build:
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 9f359b31c8d1..c1402d375429 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2988,6 +2988,9 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> }
> EXPORT_SYMBOL(vsnprintf);
>
> +int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> + __alias(vsnprintf) __attribute__((__copy__(vsnprintf)));
> +
> /**
> * vscnprintf - Format a string and place it in a buffer
> * @buf: The buffer to place the result into
>
> That at least should handle any future "gcc knows this-or-that about the
> vsnprintf function". But I don't know if clang supports that copy
> mechanism or if the minimum supported gcc is too old.
Ah, so we already have __copy in compiler-attributes.h, stating that
it's not supported by clang. Our current minimum gcc version is 8. But
judging from the commit message for c0d9782f5, perhaps it's not actually
a problem that it just expands to nothing for gcc 8, as the "less
restrictive attribute" warning was also introduced with gcc 9, and the
__copy macro is already used for module init/exit functions. Which also
suggests that it might not be a problem that clang doesn't support it.
Rasmus
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Arnd Bergmann @ 2026-06-03 13:10 UTC (permalink / raw)
To: Rasmus Villemoes
Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <87zf1bhjqp.fsf@prevas.dk>
On Wed, Jun 3, 2026, at 14:49, Rasmus Villemoes wrote:
> On Wed, Jun 03 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> On Wed, Jun 3, 2026, at 09:15, Rasmus Villemoes wrote:
> I don't remember what minimum gcc we currently require, but gcc 9
> introduced another attribute that is apperently meant for cases like
> this: 'copy'.
The minimum version is gcc-8.
> This seems to build:
>...
> +int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> + __alias(vsnprintf) __attribute__((__copy__(vsnprintf)));
> +
>
> That at least should handle any future "gcc knows this-or-that about the
> vsnprintf function". But I don't know if clang supports that copy
> mechanism or if the minimum supported gcc is too old.
clang-22 and gcc-8 don't like the attribute, but also don't complain
about the other mismatched attributes, so simply using the __copy()
macro we already have will work on all currently supported
compilers and likely all future ones as well.
Arnd
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: David Laight @ 2026-06-03 13:03 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Rasmus Villemoes, Andy Shevchenko, Arnd Bergmann, Steven Rostedt,
Masami Hiramatsu, Andrew Morton, Petr Mladek, Nathan Chancellor,
Dennis Dalessandro, Jason Gunthorpe, Leon Romanovsky,
Arend van Spriel, Miri Korenblit, Mathieu Desnoyers,
Sergey Senozhatsky, Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <aafe201a-64a6-438f-89a3-d1cd10a357a7@app.fastmail.com>
On Wed, 03 Jun 2026 10:41:18 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:
> On Wed, Jun 3, 2026, at 09:15, Rasmus Villemoes wrote:
> > On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
> >> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
> >>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
> >
> > May I suggest a different approach, that avoids having that extra
> > function emitted (which presumably compiles to a single jump
> > instruction, but still, with retpoline and CFI and all that it all adds
> > up): Keep the declaration of __vsnprintf() in the header without the
> > __print() attribute, but then do
> >
> > int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> > __alias(vsnprintf);
> >
> > in vsprintf.c. Aside from reusing the same entry point, I could well
> > imagine a compiler some day complaining about seeing the printf
> > attribute applied in a local extra declaration but not having it in the
> > header file.
> >
> > Presumably it will need its own EXPORT_SYMBOL if any of the intended
> > users are modular, and it certainly still needs a comment.
>
> I had tried that earlier but given up because the attributes have to
> match exactly.
>
> This definition works with all currently supported versions of gcc,
> but may have to change when the there is a new version that adds
> even more attributes:
>
> int
> __printf(3, 0)
> __attribute__((nothrow))
> __attribute__((nonnull(1)))
> __vsnprintf(char *__restrict buf, size_t size,
> const char * __restrict fmt_str, va_list args)
> __alias(vsnprintf);
>
> We'd probably want to also add __nothrow and __nonnull macros
> in linux/compiler-attributes.h if we do this.
>
> For reference, see below for the alternative idea I had
> that avoids adding the __vsnprintf() alias altogether by
> passing down the va_format using "%pV".
>
> I don't think I actually got this one right in the end
> since I only build-tested it, but I expect it could be done
> if someone is able to test and fix all the corner cases
> properly.
>
> Arnd
>
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 4715330c7b6b..8e44fc3e60b0 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -956,14 +956,11 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
> * gcc warns that you can not use a va_list in an inlined
> * function. But lets me make it into a macro :-/
> */
> -#define __trace_event_vstr_len(fmt, va) \
> +#define __trace_event_vstr_len(vf) \
> ({ \
> - va_list __ap; \
> int __ret; \
> \
> - va_copy(__ap, *(va)); \
> - __ret = __vsnprintf(NULL, 0, fmt, __ap) + 1; \
> - va_end(__ap); \
> + __ret = snprintf(NULL, 0, "%pV", vf) + 1; \
This adds an extra snprintf call - non-trivial and more stack.
Can't you just use the old code with vf->fmt and vf->ap ?
And does the %pV" include the va_copy()?
It isn't normally needed.
Any scheme for avoiding doing the printf processing twice
is likely to be a gain.
-- David
^ permalink raw reply
* [GIT PULL] rv fixes for v7.1
From: Gabriele Monaco @ 2026-06-03 12:50 UTC (permalink / raw)
To: Steven Rostedt, linux-kernel
Cc: linux-trace-kernel, Gabriele Monaco, unknownbbqrx, Wen Yang
Steve,
The following changes since commit e43ffb69e0438cddd72aaa30898b4dc446f664f8:
Linux 7.1-rc6 (2026-05-31 15:14:24 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/gmonaco/linux.git rv-fixes-7.1
for you to fetch changes up to 44c2e91a684132ff0e47dc1f792bbdb42d64bd64:
verification/rvgen: Fix ltl2k writing True as a literal (2026-06-03 13:20:58 +0200)
----------------------------------------------------------------
rv fixes for v7.1
Summary of changes:
- Fix reset ordering on per-task destruction
Reset the task before dropping the slot instead of after, which was
causing out-of-bound memory accesses.
- Fix HA monitor synchronization and cleanup
Ensure synchronous cleanup for HA monitors by running timer callbacks
in RCU read-side critical sections and using synchronize_rcu() during
destruction.
- Avoid armed timers after tasks exit
Add automatic cleanup for per-task HA monitors to prevent timers from
firing after task exit.
- Fix memory ordering for DA/HA monitors
Fix race conditions during monitor start by using release-acquire
semantics for the monitoring flag.
- Fix initialization for DA/HA monitors
Ensure monitors are not initialized relying on potentially corrupted
state like the monitoring flag, that is not reset by all monitors type
and may have an unknown state in monitors reusing the storage
(per-task).
- Fix memory safety in per-task and per-object monitors
Prevent use-after-free and out-of-bounds access by synchronizing with
in-flight tracepoint probes using tracepoint_synchronize_unregister()
before freeing monitor storage or releasing task slots.
- Adjust monitors for preemptible tracepoints
Fix monitors that relied on tracepoints disabling preemption.
Explicitly disable task migration when per-CPU monitors handle events
to avoid accessing the wrong state and update the opid monitor logic.
- Fix incorrect __user specifier usage
Remove __user from a non-pointer variable in the extract_params()
helper.
- Fix bugs in the rv tool
Ensure strings are NUL-terminated, fix substring matching in monitor
searches, and improve cleanup and exit status handling.
- Fix several bugs in rvgen
Fix LTL literal stringification, subparsers' options handling, and
suffix stripping in dot2k.
----------------------------------------------------------------
Gabriele Monaco (15):
rv: Fix __user specifier usage in extract_params()
rv: Reset per-task DA monitors before releasing the slot
rv: Prevent in-flight per-task handlers from using invalid slots
rv: Ensure all pending probes terminate on per-obj monitor destroy
rv: Do not rely on clean monitor when initialising HA
rv: Add automatic cleanup handlers for per-task HA monitors
rv: Ensure synchronous cleanup for HA monitors
rv: Prevent task migration while handling per-CPU events
rv: Use 0 to check preemption enabled in opid
tools/rv: Fix substring match bug in monitor name search
tools/rv: Fix substring match when listing container monitors
tools/rv: Fix cleanup after failed trace setup
verification/rvgen: Fix suffix strip in dot2k
verification/rvgen: Fix options shared among commands
verification/rvgen: Fix ltl2k writing True as a literal
Wen Yang (1):
rv: Fix monitor start ordering and memory ordering for monitoring flag
unknownbbqrx (1):
tools/rv: Ensure monitor name and desc are NUL-terminated
include/rv/da_monitor.h | 139 +++++++++++++++++----
include/rv/ha_monitor.h | 91 +++++++++++++-
include/rv/ltl_monitor.h | 1 +
kernel/trace/rv/monitors/deadline/deadline.h | 3 +-
kernel/trace/rv/monitors/nomiss/nomiss.c | 4 +-
kernel/trace/rv/monitors/opid/opid.c | 12 +-
kernel/trace/rv/monitors/stall/stall.c | 4 +-
tools/verification/rv/src/in_kernel.c | 65 +++++-----
tools/verification/rvgen/__main__.py | 10 +-
tools/verification/rvgen/rvgen/dot2k.py | 4 +-
tools/verification/rvgen/rvgen/ltl2ba.py | 9 +-
.../rvgen/rvgen/templates/dot2k/main.c | 4 +-
12 files changed, 263 insertions(+), 83 deletions(-)
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: unknownbbqrx <dev@unknownbbqr.xyz>
Cc: Wen Yang <wen.yang@linux.dev>
^ permalink raw reply
* Re: [PATCH 1/2] tracing: work around -Wmissing-format-attribute warning
From: Rasmus Villemoes @ 2026-06-03 12:49 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Andy Shevchenko, Arnd Bergmann, Steven Rostedt, Masami Hiramatsu,
Andrew Morton, Petr Mladek, Nathan Chancellor, Dennis Dalessandro,
Jason Gunthorpe, Leon Romanovsky, Arend van Spriel,
Miri Korenblit, Mathieu Desnoyers, Sergey Senozhatsky,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Vlastimil Babka (SUSE), linux-rdma, linux-kernel, linux-wireless,
brcm80211, brcm80211-dev-list.pdl, linux-trace-kernel, llvm
In-Reply-To: <aafe201a-64a6-438f-89a3-d1cd10a357a7@app.fastmail.com>
On Wed, Jun 03 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
> On Wed, Jun 3, 2026, at 09:15, Rasmus Villemoes wrote:
>> On Tue, Jun 02 2026, "Arnd Bergmann" <arnd@arndb.de> wrote:
>>> On Tue, Jun 2, 2026, at 20:59, Andy Shevchenko wrote:
>>>> On Tue, Jun 02, 2026 at 05:07:05PM +0200, Arnd Bergmann wrote:
>>
>> May I suggest a different approach, that avoids having that extra
>> function emitted (which presumably compiles to a single jump
>> instruction, but still, with retpoline and CFI and all that it all adds
>> up): Keep the declaration of __vsnprintf() in the header without the
>> __print() attribute, but then do
>>
>> int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>> __alias(vsnprintf);
>>
>> in vsprintf.c. Aside from reusing the same entry point, I could well
>> imagine a compiler some day complaining about seeing the printf
>> attribute applied in a local extra declaration but not having it in the
>> header file.
>>
>> Presumably it will need its own EXPORT_SYMBOL if any of the intended
>> users are modular, and it certainly still needs a comment.
>
> I had tried that earlier but given up because the attributes have to
> match exactly.
>
> This definition works with all currently supported versions of gcc,
> but may have to change when the there is a new version that adds
> even more attributes:
>
> int
> __printf(3, 0)
> __attribute__((nothrow))
> __attribute__((nonnull(1)))
> __vsnprintf(char *__restrict buf, size_t size,
> const char * __restrict fmt_str, va_list args)
> __alias(vsnprintf);
>
Ah, I see. The documentation for the alias attribute does say that the
types have to match, but I didn't know that the nothrow and nonnull
attributes were considered part of the type identity. Oddly enough, if
one does
typeof(vsnprintf) __vsnprintf __alias(vsnprintf);
that still fails, but only complains about nothrow, not nonnull.
I don't remember what minimum gcc we currently require, but gcc 9
introduced another attribute that is apperently meant for cases like
this: 'copy'. This seems to build:
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 9f359b31c8d1..c1402d375429 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2988,6 +2988,9 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
}
EXPORT_SYMBOL(vsnprintf);
+int __vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
+ __alias(vsnprintf) __attribute__((__copy__(vsnprintf)));
+
/**
* vscnprintf - Format a string and place it in a buffer
* @buf: The buffer to place the result into
That at least should handle any future "gcc knows this-or-that about the
vsnprintf function". But I don't know if clang supports that copy
mechanism or if the minimum supported gcc is too old.
Rasmus
^ permalink raw reply related
* Re: [PATCH mm-unstable v18 11/14] mm/khugepaged: Introduce mTHP collapse support
From: David Hildenbrand (Arm) @ 2026-06-03 12:27 UTC (permalink / raw)
To: Nico Pache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, vbabka, vishal.moola,
wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe,
Usama Arif, usamaarif642
In-Reply-To: <CAA1CXcD_Bz=Si+vP0Sg_FhAHVjbeAJ4VdvtWGM-jVYiRXFHAnw@mail.gmail.com>
On 6/3/26 14:16, Nico Pache wrote:
> On Wed, Jun 3, 2026 at 4:01 AM David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>>
>>> next_order:
>>> - if ((BIT(order) - 1) & enabled_orders) {
>>> - const u8 next_order = order - 1;
>>> - const u16 mid_offset = offset + (nr_ptes / 2);
>>> -
>>> - collapse_mthp_stack_push(cc, &stack_size, mid_offset,
>>> - next_order);
>>> - collapse_mthp_stack_push(cc, &stack_size, offset,
>>> - next_order);
>>> + if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
>>> + (BIT(order) - 1) & enabled_orders) {
>>
>> Why not a test_bit() ?
>
> The test bit is at the top of the loop. This adds a exit if the lower
> orders are all disabled or we hit the last order.
Ah, now I understand what you want to do here.
I guess you should add a () around the & and maybe add a comment.
And likely using GENMASK is clearer?
/*
* Continue with the next smaller order if there is still
* any smaller order enabled.
*/
if (order > KHUGEPAGED_MIN_MTHP_ORDER &&
(enabled_orders & GENMASK(order - 1, 0))) {
...
}
>
>>
>>
>> But, wouldn't you want to skip orders that are not enabled and try with the next
>> smaller one in any case before you advance the offset?
>
> We are currently iterating through each order (not skipping them).
> There may be optimizations to avoid iterating through every order
> (like your changes suggest), but currently, every collapse, whether it
> succeeds or fails at the bottom order, must also iterate the offset.
Right, we currently miss opportunities to just skip, that we can optimize later.
--
Cheers,
David
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox