From: Petr Mladek <pmladek@suse.com>
To: chensong_2000@189.cn
Cc: rafael@kernel.org, lenb@kernel.org, mturquette@baylibre.com,
sboyd@kernel.org, viresh.kumar@linaro.org, agk@redhat.com,
snitzer@kernel.org, mpatocka@redhat.com, bmarzins@redhat.com,
song@kernel.org, yukuai@fnnas.com, linan122@huawei.com,
jason.wessel@windriver.com, danielt@kernel.org,
dianders@chromium.org, horms@kernel.org, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
paulmck@kernel.org, frederic@kernel.org, mcgrof@kernel.org,
petr.pavlu@suse.com, da.gomez@kernel.org,
samitolvanen@google.com, atomlin@atomlin.com,
jpoimboe@kernel.org, jikos@kernel.org, mbenes@suse.cz,
joe.lawrence@redhat.com, rostedt@goodmis.org,
mhiramat@kernel.org, mark.rutland@arm.com,
mathieu.desnoyers@efficios.com, linux-modules@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-clk@vger.kernel.org,
linux-pm@vger.kernel.org, live-patching@vger.kernel.org,
dm-devel@lists.linux.dev, linux-raid@vger.kernel.org,
kgdb-bugreport@lists.sourceforge.net, netdev@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal
Date: Thu, 16 Apr 2026 12:33:43 +0200 [thread overview]
Message-ID: <aeC7ByGA5MHBcGQR@pathway.suse.cz> (raw)
In-Reply-To: <20260415070137.17860-1-chensong_2000@189.cn>
On Wed 2026-04-15 15:01:37, chensong_2000@189.cn wrote:
> From: Song Chen <chensong_2000@189.cn>
>
> The current notifier chain implementation uses a single-linked list
> (struct notifier_block *next), which only supports forward traversal
> in priority order. This makes it difficult to handle cleanup/teardown
> scenarios that require notifiers to be called in reverse priority order.
>
> A concrete example is the ordering dependency between ftrace and
> livepatch during module load/unload. see the detail here [1].
>
> This patch replaces the single-linked list in struct notifier_block
> with a struct list_head, converting the notifier chain into a
> doubly-linked list sorted in descending priority order. Based on
> this, a new function notifier_call_chain_reverse() is introduced,
> which traverses the chain in reverse (ascending priority order).
> The corresponding blocking_notifier_call_chain_reverse() is also
> added as the locking wrapper for blocking notifier chains.
>
> The internal notifier_call_chain_robust() is updated to use
> notifier_call_chain_reverse() for rollback: on error, it records
> the failing notifier (last_nb) and the count of successfully called
> notifiers (nr), then rolls back exactly those nr-1 notifiers in
> reverse order starting from last_nb's predecessor, without needing
> to know the total length of the chain.
>
> With this change, subsystems with symmetric setup/teardown ordering
> requirements can register a single notifier_block with one priority
> value, and rely on blocking_notifier_call_chain() for forward
> traversal and blocking_notifier_call_chain_reverse() for reverse
> traversal, without needing hard-coded call sequences or separate
> notifier registrations for each direction.
>
> [1]:https://lore.kernel.org/all
> /alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/
>
> --- a/include/linux/notifier.h
> +++ b/include/linux/notifier.h
> @@ -53,41 +53,41 @@ typedef int (*notifier_fn_t)(struct notifier_block *nb,
[...]
> struct notifier_block {
> notifier_fn_t notifier_call;
> - struct notifier_block __rcu *next;
> + struct list_head __rcu entry;
> int priority;
> };
[...]
> #define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \
> spin_lock_init(&(name)->lock); \
> - (name)->head = NULL; \
> + INIT_LIST_HEAD(&(name)->head); \
I would expect the RCU variant here, aka INIT_LIST_HEAD_RCU().
> --- a/kernel/notifier.c
> +++ b/kernel/notifier.c
> @@ -14,39 +14,47 @@
> * are layered on top of these, with appropriate locking added.
> */
>
> -static int notifier_chain_register(struct notifier_block **nl,
> +static int notifier_chain_register(struct list_head *nl,
> struct notifier_block *n,
> bool unique_priority)
> {
> - while ((*nl) != NULL) {
> - if (unlikely((*nl) == n)) {
> + struct notifier_block *cur;
> +
> + list_for_each_entry(cur, nl, entry) {
> + if (unlikely(cur == n)) {
> WARN(1, "notifier callback %ps already registered",
> n->notifier_call);
> return -EEXIST;
> }
> - if (n->priority > (*nl)->priority)
> - break;
> - if (n->priority == (*nl)->priority && unique_priority)
> +
> + if (n->priority == cur->priority && unique_priority)
> return -EBUSY;
> - nl = &((*nl)->next);
> +
> + if (n->priority > cur->priority) {
> + list_add_tail(&n->entry, &cur->entry);
> + goto out;
> + }
> }
> - n->next = *nl;
> - rcu_assign_pointer(*nl, n);
> +
> + list_add_tail(&n->entry, nl);
I would expect list_add_tail_rcu() here.
> @@ -59,25 +67,25 @@ static int notifier_chain_unregister(struct notifier_block **nl,
> * value of this parameter is -1.
> * @nr_calls: Records the number of notifications sent. Don't care
> * value of this field is NULL.
> + * @last_nb: Records the last called notifier block for rolling back
> * Return: notifier_call_chain returns the value returned by the
> * last notifier function called.
> */
> -static int notifier_call_chain(struct notifier_block **nl,
> +static int notifier_call_chain(struct list_head *nl,
> unsigned long val, void *v,
> - int nr_to_call, int *nr_calls)
> + int nr_to_call, int *nr_calls,
> + struct notifier_block **last_nb)
> {
> int ret = NOTIFY_DONE;
> - struct notifier_block *nb, *next_nb;
> -
> - nb = rcu_dereference_raw(*nl);
> + struct notifier_block *nb;
>
> - while (nb && nr_to_call) {
> - next_nb = rcu_dereference_raw(nb->next);
> + if (!nr_to_call)
> + return ret;
>
> + list_for_each_entry(nb, nl, entry) {
I would expect the RCU variant here, aka list_for_each_rcu()
These are just two random examples which I found by a quick look.
I guess that the notifier API is very old and it does not use all
the RCU API features which allow to track safety when
CONFIG_PROVE_RCU and CONFIG_PROVE_RCU_LIST are enabled.
It actually might be worth to audit the code and make it right.
> #ifdef CONFIG_DEBUG_NOTIFIERS
> if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
> WARN(1, "Invalid notifier called!");
> - nb = next_nb;
> continue;
> }
> #endif
That said, I am not sure if the ftrace/livepatching handlers are
the right motivation for this. Especially when I see the
complexity of the 2nd patch [*]
After thinking more about it. I am not even sure that the ftrace and
livepatching callbacks are good candidates for generic notifiers.
They are too special. It is not only about ordering them against
each other. But it is also about ordering them against other
notifiers. The ftrace/livepatching callbacks must be the first/last
during module load/release.
[*] The 2nd patch is not archived by lore for some reason.
I have found only a review but it gives a good picture, see
https://lore.kernel.org/all/1191caf5-6a61-4622-a15e-854d3701f4fc@suse.com/
Best Regards,
Petr
next prev parent reply other threads:[~2026-04-16 10:33 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 7:01 [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal chensong_2000
2026-04-15 7:40 ` Christoph Hellwig
2026-04-16 10:33 ` Petr Mladek [this message]
2026-04-16 12:30 ` David Laight
2026-04-16 14:54 ` Petr Mladek
2026-04-16 19:15 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aeC7ByGA5MHBcGQR@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=agk@redhat.com \
--cc=atomlin@atomlin.com \
--cc=bmarzins@redhat.com \
--cc=chensong_2000@189.cn \
--cc=da.gomez@kernel.org \
--cc=danielt@kernel.org \
--cc=davem@davemloft.net \
--cc=dianders@chromium.org \
--cc=dm-devel@lists.linux.dev \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=horms@kernel.org \
--cc=jason.wessel@windriver.com \
--cc=jikos@kernel.org \
--cc=joe.lawrence@redhat.com \
--cc=jpoimboe@kernel.org \
--cc=kgdb-bugreport@lists.sourceforge.net \
--cc=kuba@kernel.org \
--cc=lenb@kernel.org \
--cc=linan122@huawei.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-clk@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-modules@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=live-patching@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mbenes@suse.cz \
--cc=mcgrof@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mpatocka@redhat.com \
--cc=mturquette@baylibre.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=paulmck@kernel.org \
--cc=petr.pavlu@suse.com \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=samitolvanen@google.com \
--cc=sboyd@kernel.org \
--cc=snitzer@kernel.org \
--cc=song@kernel.org \
--cc=viresh.kumar@linaro.org \
--cc=yukuai@fnnas.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox