From: Song Chen <chensong_2000@189.cn>
To: Petr Mladek <pmladek@suse.com>
Cc: rafael@kernel.org, lenb@kernel.org, mturquette@baylibre.com,
sboyd@kernel.org, viresh.kumar@linaro.org, agk@redhat.com,
snitzer@kernel.org, mpatocka@redhat.com, bmarzins@redhat.com,
song@kernel.org, yukuai@fnnas.com, linan122@huawei.com,
jason.wessel@windriver.com, danielt@kernel.org,
dianders@chromium.org, horms@kernel.org, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
paulmck@kernel.org, frederic@kernel.org, mcgrof@kernel.org,
petr.pavlu@suse.com, da.gomez@kernel.org,
samitolvanen@google.com, atomlin@atomlin.com,
jpoimboe@kernel.org, jikos@kernel.org, mbenes@suse.cz,
joe.lawrence@redhat.com, rostedt@goodmis.org,
mhiramat@kernel.org, mark.rutland@arm.com,
mathieu.desnoyers@efficios.com, linux-modules@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-clk@vger.kernel.org,
linux-pm@vger.kernel.org, live-patching@vger.kernel.org,
dm-devel@lists.linux.dev, linux-raid@vger.kernel.org,
kgdb-bugreport@lists.sourceforge.net, netdev@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal
Date: Sun, 19 Apr 2026 08:07:27 +0800 [thread overview]
Message-ID: <7486bf8e-9713-43ed-af4d-ecefec00c980@189.cn> (raw)
In-Reply-To: <aeC7ByGA5MHBcGQR@pathway.suse.cz>
Hi,
On 4/16/26 18:33, Petr Mladek wrote:
> On Wed 2026-04-15 15:01:37, chensong_2000@189.cn wrote:
>> From: Song Chen <chensong_2000@189.cn>
>>
>> The current notifier chain implementation uses a single-linked list
>> (struct notifier_block *next), which only supports forward traversal
>> in priority order. This makes it difficult to handle cleanup/teardown
>> scenarios that require notifiers to be called in reverse priority order.
>>
>> A concrete example is the ordering dependency between ftrace and
>> livepatch during module load/unload. see the detail here [1].
>>
>> This patch replaces the single-linked list in struct notifier_block
>> with a struct list_head, converting the notifier chain into a
>> doubly-linked list sorted in descending priority order. Based on
>> this, a new function notifier_call_chain_reverse() is introduced,
>> which traverses the chain in reverse (ascending priority order).
>> The corresponding blocking_notifier_call_chain_reverse() is also
>> added as the locking wrapper for blocking notifier chains.
>>
>> The internal notifier_call_chain_robust() is updated to use
>> notifier_call_chain_reverse() for rollback: on error, it records
>> the failing notifier (last_nb) and the count of successfully called
>> notifiers (nr), then rolls back exactly those nr-1 notifiers in
>> reverse order starting from last_nb's predecessor, without needing
>> to know the total length of the chain.
>>
>> With this change, subsystems with symmetric setup/teardown ordering
>> requirements can register a single notifier_block with one priority
>> value, and rely on blocking_notifier_call_chain() for forward
>> traversal and blocking_notifier_call_chain_reverse() for reverse
>> traversal, without needing hard-coded call sequences or separate
>> notifier registrations for each direction.
>>
>> [1]:https://lore.kernel.org/all
>> /alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/
>>
>> --- a/include/linux/notifier.h
>> +++ b/include/linux/notifier.h
>> @@ -53,41 +53,41 @@ typedef int (*notifier_fn_t)(struct notifier_block *nb,
> [...]
>> struct notifier_block {
>> notifier_fn_t notifier_call;
>> - struct notifier_block __rcu *next;
>> + struct list_head __rcu entry;
>> int priority;
>> };
> [...]
>> #define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \
>> spin_lock_init(&(name)->lock); \
>> - (name)->head = NULL; \
>> + INIT_LIST_HEAD(&(name)->head); \
>
> I would expect the RCU variant here, aka INIT_LIST_HEAD_RCU().
I'm not familiar with list rcu, but i will look into it and give it a try.
>
>> --- a/kernel/notifier.c
>> +++ b/kernel/notifier.c
>> @@ -14,39 +14,47 @@
>> * are layered on top of these, with appropriate locking added.
>> */
>>
>> -static int notifier_chain_register(struct notifier_block **nl,
>> +static int notifier_chain_register(struct list_head *nl,
>> struct notifier_block *n,
>> bool unique_priority)
>> {
>> - while ((*nl) != NULL) {
>> - if (unlikely((*nl) == n)) {
>> + struct notifier_block *cur;
>> +
>> + list_for_each_entry(cur, nl, entry) {
>> + if (unlikely(cur == n)) {
>> WARN(1, "notifier callback %ps already registered",
>> n->notifier_call);
>> return -EEXIST;
>> }
>> - if (n->priority > (*nl)->priority)
>> - break;
>> - if (n->priority == (*nl)->priority && unique_priority)
>> +
>> + if (n->priority == cur->priority && unique_priority)
>> return -EBUSY;
>> - nl = &((*nl)->next);
>> +
>> + if (n->priority > cur->priority) {
>> + list_add_tail(&n->entry, &cur->entry);
>> + goto out;
>> + }
>> }
>> - n->next = *nl;
>> - rcu_assign_pointer(*nl, n);
>> +
>> + list_add_tail(&n->entry, nl);
>
> I would expect list_add_tail_rcu() here.
>
>> @@ -59,25 +67,25 @@ static int notifier_chain_unregister(struct notifier_block **nl,
>> * value of this parameter is -1.
>> * @nr_calls: Records the number of notifications sent. Don't care
>> * value of this field is NULL.
>> + * @last_nb: Records the last called notifier block for rolling back
>> * Return: notifier_call_chain returns the value returned by the
>> * last notifier function called.
>> */
>> -static int notifier_call_chain(struct notifier_block **nl,
>> +static int notifier_call_chain(struct list_head *nl,
>> unsigned long val, void *v,
>> - int nr_to_call, int *nr_calls)
>> + int nr_to_call, int *nr_calls,
>> + struct notifier_block **last_nb)
>> {
>> int ret = NOTIFY_DONE;
>> - struct notifier_block *nb, *next_nb;
>> -
>> - nb = rcu_dereference_raw(*nl);
>> + struct notifier_block *nb;
>>
>> - while (nb && nr_to_call) {
>> - next_nb = rcu_dereference_raw(nb->next);
>> + if (!nr_to_call)
>> + return ret;
>>
>> + list_for_each_entry(nb, nl, entry) {
>
> I would expect the RCU variant here, aka list_for_each_rcu()
>
> These are just two random examples which I found by a quick look.
>
> I guess that the notifier API is very old and it does not use all
> the RCU API features which allow to track safety when
> CONFIG_PROVE_RCU and CONFIG_PROVE_RCU_LIST are enabled.
>
> It actually might be worth to audit the code and make it right.
>
>> #ifdef CONFIG_DEBUG_NOTIFIERS
>> if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
>> WARN(1, "Invalid notifier called!");
>> - nb = next_nb;
>> continue;
>> }
>> #endif
>
> That said, I am not sure if the ftrace/livepatching handlers are
> the right motivation for this. Especially when I see the
> complexity of the 2nd patch [*]
>
> After thinking more about it. I am not even sure that the ftrace and
> livepatching callbacks are good candidates for generic notifiers.
> They are too special. It is not only about ordering them against
> each other. But it is also about ordering them against other
> notifiers. The ftrace/livepatching callbacks must be the first/last
> during module load/release.
>
> [*] The 2nd patch is not archived by lore for some reason.
> I have found only a review but it gives a good picture, see
> https://lore.kernel.org/all/1191caf5-6a61-4622-a15e-854d3701f4fc@suse.com/
>
> Best Regards,
> Petr
>
Thanks.
BR
Song
next prev parent reply other threads:[~2026-04-19 0:07 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 7:01 [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal chensong_2000
2026-04-15 7:40 ` Christoph Hellwig
2026-04-16 10:33 ` Petr Mladek
2026-04-19 0:07 ` Song Chen [this message]
2026-04-16 12:30 ` David Laight
2026-04-16 14:54 ` Petr Mladek
2026-04-16 19:15 ` David Laight
2026-04-19 0:21 ` Song Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7486bf8e-9713-43ed-af4d-ecefec00c980@189.cn \
--to=chensong_2000@189.cn \
--cc=agk@redhat.com \
--cc=atomlin@atomlin.com \
--cc=bmarzins@redhat.com \
--cc=da.gomez@kernel.org \
--cc=danielt@kernel.org \
--cc=davem@davemloft.net \
--cc=dianders@chromium.org \
--cc=dm-devel@lists.linux.dev \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=horms@kernel.org \
--cc=jason.wessel@windriver.com \
--cc=jikos@kernel.org \
--cc=joe.lawrence@redhat.com \
--cc=jpoimboe@kernel.org \
--cc=kgdb-bugreport@lists.sourceforge.net \
--cc=kuba@kernel.org \
--cc=lenb@kernel.org \
--cc=linan122@huawei.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-clk@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-modules@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=live-patching@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mbenes@suse.cz \
--cc=mcgrof@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mpatocka@redhat.com \
--cc=mturquette@baylibre.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=paulmck@kernel.org \
--cc=petr.pavlu@suse.com \
--cc=pmladek@suse.com \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=samitolvanen@google.com \
--cc=sboyd@kernel.org \
--cc=snitzer@kernel.org \
--cc=song@kernel.org \
--cc=viresh.kumar@linaro.org \
--cc=yukuai@fnnas.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox