From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com [209.85.128.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87E253B9DAC for ; Thu, 16 Apr 2026 13:09:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.50 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776344979; cv=none; b=Df8IqBNAsqoy8l3Y4pLlkwsEOo176IISszZYGuLu3avpG+isUNJmW/XwRKiPVyJQbQz/sFX/3h0h3mGS5hVZ2ktOvoDdx82ltDALecxyTi3+krkWvbn8oHehnpFqhDVLzC/rLA1/jDvfktx4fKmpqKztRi4WTRFDSRdbG9vc+ZM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776344979; c=relaxed/simple; bh=eVmjde+k93xycVxx5TzpLj5SCL1rx1KrvC1l3ql6brE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=AxvTaWj9W5uGkV8VUjXScpVt8EiALFIZShX0edSjOLXOKV28LkbmljlpHLtPeqvCZ8zYuu9br5NbQj9bYqbTt4nOFD8v3srQc2TMEbm9ikbrARFlasFFPPbGRvOHxNpanAI6luJoPOAE0QxBmvWecEuVMdPg222Wc2Q6BCGgu8M= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com; spf=pass smtp.mailfrom=suse.com; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b=gWVkabXn; arc=none smtp.client-ip=209.85.128.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=suse.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=suse.com header.i=@suse.com header.b="gWVkabXn" Received: by mail-wm1-f50.google.com with SMTP id 5b1f17b1804b1-488ba840146so79001835e9.1 for ; Thu, 16 Apr 2026 06:09:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1776344975; x=1776949775; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=m3piWCtfpN3U6w8bKY3YeNvzMYnQTdnw3h4KjP2i/sk=; b=gWVkabXnsOnvoM3GUthhg8haRYPkHddoedkvckM6BiaWRjb06CqxgY1ZFpHCK6p2WQ FeqGPchIhpZXb0+i22atTHjtQiVxh/L3GaX0sweeDZNZh0F0BqZNyig0cvTJIlkZOWrU iafVJOHGilIbYuSWy9/9iW9WyzDmm/DAkiMmIoAp3Kr55fvy4h49B0CstsHsF0gAOsMP k2vDVLT3F5J56OpiQi1aRT8PZh7o1ZnusmrX9FtcSuPjXgt4jpR7uV6jXUFW+6bTdZK+ bfmF+CPLNC4k007+B1ngA7VPideyNIXpDY5g7gjNDJuM/JGoIeOAqBG39/Z81HGf0hxO cXOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776344975; x=1776949775; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=m3piWCtfpN3U6w8bKY3YeNvzMYnQTdnw3h4KjP2i/sk=; b=l3em0MvUuihLPuw0KHJuhxlCNVGWH6TNcVKA6niRv07Rlhd0Gy5ZjZ1N3tBzIyxU+Y RvWJU/+e2IdIxuCPRtEjLaL+LI7QxIJ7W+4ErGgU8043JxredBH8w3CPMtadGLvS9JgN ULSxetNztHbdNJAUtYAbn+lr0hqjwekkLnqDsPw5GUCbME8zuoA/wZ9/1rLDAmiRAzg8 o0T+O6R0ER1F+czVX9fUqNrTVhAAt+CW8sxgrLIjJJ+o2uRqCIaxemly1eSruFRqMiwi SDUevgDi63oWrWZAOkiTWAmqHCqyGWCOM6UfQmJWnFhNtXZ9z2BKC+3V4KaOZEUgbHru Ip+g== X-Forwarded-Encrypted: i=1; AFNElJ89MfdlKz4DkuWb0GJBs+HMJBBC6bl0XsnE8b9jcd2cxigKkluOwdHwpmv7RiNDNiZiHuXXFVZKXxzo@vger.kernel.org X-Gm-Message-State: AOJu0YzHjZI63nSo24LDJOqPfYPHwSgeqt9r1A61WRfn6KedwF6Chs1f 55oS/sC9Qq3IWj98r+n3GlNBoE5Z/C04RBg+/Hb02+sS9Hlr8G4QzxK/tWWMXxFFHvk= X-Gm-Gg: AeBDiev7iCKjcat6QST9wp3R1jX4lyp/KHpzi1Rj9Hg2ibSvf5YVlBx7PpVl7N2DPd6 tjSlCdU3xH05Iw9Mx1OE0t7q1KBHMILrxAq2kEEgJqCucYHH8yFN6gcUVcuhyGgRa1dBJKQQohb F5eR1FSbLlJzbnJ8uUF+hTynlBx1i9whw70r3O8j4S2U4DWOxCqy6srLhhNxKO78MOt2rQXbCxB UgjCWAjZ1/+sPNly8A0yf1UKOeP7EshjBYdvfTJvFR8KNDG2fJhpLIJkJInBFls/5qJkICAVO+4 FxdXGiRd/GygjiC9G6jl6VyHZn25eR4WnlEMDGVdiMluRfN+S9iPVWHzDKW/5lZH7MVaC6wrHK6 tZnhqV+sgyH6vraf9qamWzHexmN1KKhrfhT40Mk4i2yWNWBqsotrPUu9Jkr0cCKDdhgzHitRIzt XHn4Z6550shLNZGTUP0FHEsOks+w== X-Received: by 2002:a05:600d:8449:b0:487:55c:e0c1 with SMTP id 5b1f17b1804b1-488d68364fdmr279402625e9.14.1776344974877; Thu, 16 Apr 2026 06:09:34 -0700 (PDT) Received: from pathway.suse.cz ([176.114.240.130]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-488f585cefdsm53796665e9.14.2026.04.16.06.09.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Apr 2026 06:09:34 -0700 (PDT) Date: Thu, 16 Apr 2026 15:09:31 +0200 From: Petr Mladek To: Song Chen Cc: Petr Pavlu , rafael@kernel.org, lenb@kernel.org, mturquette@baylibre.com, sboyd@kernel.org, viresh.kumar@linaro.org, agk@redhat.com, snitzer@kernel.org, mpatocka@redhat.com, bmarzins@redhat.com, song@kernel.org, yukuai@fnnas.com, linan122@huawei.com, jason.wessel@windriver.com, danielt@kernel.org, dianders@chromium.org, horms@kernel.org, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, paulmck@kernel.org, frederic@kernel.org, mcgrof@kernel.org, da.gomez@kernel.org, samitolvanen@google.com, atomlin@atomlin.com, jpoimboe@kernel.org, jikos@kernel.org, mbenes@suse.cz, joe.lawrence@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, mark.rutland@arm.com, mathieu.desnoyers@efficios.com, linux-modules@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-clk@vger.kernel.org, linux-pm@vger.kernel.org, live-patching@vger.kernel.org, dm-devel@lists.linux.dev, linux-raid@vger.kernel.org, kgdb-bugreport@lists.sourceforge.net, netdev@vger.kernel.org Subject: Re: [RFC PATCH 2/2] kernel/module: Decouple klp and ftrace from load_module Message-ID: References: <20260413080701.180976-1-chensong_2000@189.cn> <1191caf5-6a61-4622-a15e-854d3701f4fc@suse.com> Precedence: bulk X-Mailing-List: linux-raid@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Wed 2026-04-15 14:43:53, Song Chen wrote: > Hi, > > On 4/14/26 22:33, Petr Pavlu wrote: > > On 4/13/26 10:07 AM, chensong_2000@189.cn wrote: > > > From: Song Chen > > > > > > ftrace and livepatch currently have their module load/unload callbacks > > > hard-coded in the module loader as direct function calls to > > > ftrace_module_enable(), klp_module_coming(), klp_module_going() > > > and ftrace_release_mod(). This tight coupling was originally introduced > > > to enforce strict call ordering that could not be guaranteed by the > > > module notifier chain, which only supported forward traversal. Their > > > notifiers were moved in and out back and forth. see [1] and [2]. > > > > I'm unclear about what is meant by the notifiers being moved back and > > forth. The links point to patches that converted ftrace+klp from using > > module notifiers to explicit callbacks due to ordering issues, but this > > switch occurred only once. Have there been other attempts to use > > notifiers again? > > > > > diff --git a/include/linux/module.h b/include/linux/module.h > > > index 14f391b186c6..0bdd56f9defd 100644 > > > --- a/include/linux/module.h > > > +++ b/include/linux/module.h > > > @@ -308,6 +308,14 @@ enum module_state { > > > MODULE_STATE_COMING, /* Full formed, running module_init. */ > > > MODULE_STATE_GOING, /* Going away. */ > > > MODULE_STATE_UNFORMED, /* Still setting it up. */ > > > + MODULE_STATE_FORMED, > > > > I don't see a reason to add a new module state. Why is it necessary and > > how does it fit with the existing states? > > > because once notifier fails in state MODULE_STATE_UNFORMED (now only ftrace > has someting to do in this state), notifier chain will roll back by calling > blocking_notifier_call_chain_robust, i'm afraid MODULE_STATE_GOING is going > to jeopardise the notifers which don't handle it appropriately, like: > > case MODULE_STATE_COMING: > kmalloc(); > case MODULE_STATE_GOING: > kfree(); > > > > > +}; > > > + > > > +enum module_notifier_prio { > > > + MODULE_NOTIFIER_PRIO_LOW = INT_MIN, /* Low prioroty, coming last, going first */ > > > + MODULE_NOTIFIER_PRIO_MID = 0, /* Normal priority. */ > > > + MODULE_NOTIFIER_PRIO_SECOND_HIGH = INT_MAX - 1, /* Second high priorigy, coming second*/ > > > + MODULE_NOTIFIER_PRIO_HIGH = INT_MAX, /* High priorigy, coming first, going late. */ > > > > I suggest being explicit about how the notifiers are ordered. For > > example: > > > > enum module_notifier_prio { > > MODULE_NOTIFIER_PRIO_NORMAL, /* Normal priority, coming last, going first. */ > > MODULE_NOTIFIER_PRIO_LIVEPATCH, > > MODULE_NOTIFIER_PRIO_FTRACE, /* High priority, coming first, going late. */ > > }; > > I like the explicit PRIO_LIVEPATCH/FTRACE names. But I would keep the INT_MAX - 1 and INT_MAX priorities. I believe that ftrace/livepatching will always be the first/last to call. And INT_MAX would help to preserve kABI when PRIO_NORMAL is not enough for the rest of notifiers. That said, I am not sure whether this is worth the effort. This patch tries to move the explicit callbacks in a generic notifiers API. But it will still need to use some explictly defined (reserved) priorities. And it will not guarantee a misuse. Also it requires the double linked list which complicates the notifiers code. > > > }; > > > struct mod_tree_node { > > > --- a/kernel/module/main.c > > > +++ b/kernel/module/main.c > > > @@ -3281,20 +3277,14 @@ static int complete_formation(struct module *mod, struct load_info *info) > > > return err; > > > } > > > -static int prepare_coming_module(struct module *mod) > > > +static int prepare_module_state_transaction(struct module *mod, > > > + unsigned long val_up, unsigned long val_down) > > > { > > > int err; > > > - ftrace_module_enable(mod); > > > - err = klp_module_coming(mod); > > > - if (err) > > > - return err; > > > - > > > err = blocking_notifier_call_chain_robust(&module_notify_list, > > > - MODULE_STATE_COMING, MODULE_STATE_GOING, mod); > > > + val_up, val_down, mod); > > > err = notifier_to_errno(err); > > > - if (err) > > > - klp_module_going(mod); > > > return err; > > > } I personally find the name "prepare_module_state_transaction" misleading. What is the "transaction" here? If this was a "preparation" step then where is the transaction done/finished? It might be better to just opencode the blocking_notifier_call_chain_robust() instead. > > > @@ -3468,14 +3458,21 @@ static int load_module(struct load_info *info, const char __user *uargs, > > > init_build_id(mod, info); > > > /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */ > > > - ftrace_module_init(mod); > > > + err = prepare_module_state_transaction(mod, > > > + MODULE_STATE_UNFORMED, MODULE_STATE_FORMED); > > > > I believe val_down should be MODULE_STATE_GOING to reverse the > > operation. Why is the new state MODULE_STATE_FORMED needed here? > to avoid this: > > case MODULE_STATE_COMING: > kmalloc(); > case MODULE_STATE_GOING: > kfree(); Hmm, the module is in "FORMED" state here. > > > + if (err) > > > + goto ddebug_cleanup; > > > /* Finally it's fully formed, ready to start executing. */ > > > err = complete_formation(mod, info); And we call "complete_formation()" function. This sounds like it was not really "FORMED" before. => It is confusing and nono. Please, try to avoid the new state if possible. My experience with reading the module loader code is that any new state brings a lot of complexity. You need to take it into account when checking correctness of other changes, features, ... Something tells me that if the state was not needed before then we could avoid it. > > > - if (err) > > > + if (err) { > > > + blocking_notifier_call_chain_reverse(&module_notify_list, > > > + MODULE_STATE_FORMED, mod); > > > goto ddebug_cleanup; > > > + } > > > - err = prepare_coming_module(mod); > > > + err = prepare_module_state_transaction(mod, > > > + MODULE_STATE_COMING, MODULE_STATE_GOING); > > > if (err) > > > goto bug_cleanup; > > > --- a/kernel/trace/ftrace.c > > > +++ b/kernel/trace/ftrace.c > > > @@ -5241,6 +5241,44 @@ static int __init ftrace_mod_cmd_init(void) > > > } > > > core_initcall(ftrace_mod_cmd_init); > > > +static int ftrace_module_callback(struct notifier_block *nb, unsigned long op, > > > + void *module) > > > +{ > > > + struct module *mod = module; > > > + > > > + switch (op) { > > > + case MODULE_STATE_UNFORMED: > > > + ftrace_module_init(mod); > > > + break; > > > + case MODULE_STATE_COMING: > > > + ftrace_module_enable(mod); > > > + break; > > > + case MODULE_STATE_LIVE: > > > + ftrace_free_mem(mod, mod->mem[MOD_INIT_TEXT].base, > > > + mod->mem[MOD_INIT_TEXT].base + mod->mem[MOD_INIT_TEXT].size); > > > + break; > > > + case MODULE_STATE_GOING: > > > + case MODULE_STATE_FORMED: > > > + ftrace_release_mod(mod); This calls "release" in a "FORMED" state. It does not make any sense. Something looks fishy, either the code or the naming. > > > + break; > > > + default: > > > + break; > > > + } > > I am sorry for being so picky about names. I believe that good names help to prevent bugs and reduce headaches. Best Regards, Petr