From: Jessica Yu <jeyu@kernel.org>
To: Prarit Bhargava <prarit@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Heiko Carstens <heiko.carstens@de.ibm.com>,
David Arcari <darcari@redhat.com>
Subject: Re: [PATCH v2] modules: Only return -EEXIST for modules that have finished loading
Date: Thu, 9 May 2019 15:53:34 +0200
Message-ID: <20190509135333.GA9337@linux-8ccs>
In-Reply-To: <20190507145413.16297-1-prarit@redhat.com>
+++ Prarit Bhargava [07/05/19 10:54 -0400]:
>Heiko, it would still be good to get a test of this patch from you. I
>tested this here at Red Hat on some System Z machines. Without the
>modification made here in v2, the systems failed to boot ~10% of the time.
>After the modification I do not see any boot failures. I also was
>able to reproduce the boot issue with the acpi_cpufreq driver on a very
>large & fast x86 system which had closer to 100% failure rate without
>the changes in v2. After the modification in v2 the system has rebooted
>all weekend without any issues.
>
>P.
>
>---8<---
>
>Microsoft HyperV disables the X86_FEATURE_SMCA bit on AMD systems, and
>Linux guests boot with repeated errors:
>
>amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
>amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
>
>The warnings occur because the module code erroneously returns -EEXIST
>for modules that have failed to load and are in the process of being
>removed from the module list.
>
>Module amd64_edac_mod depends on module edac_mce_amd. Using
>modules.dep, systemd loads edac_mce_amd for every request of
>amd64_edac_mod. While edac_mce_amd is loading, it has state
>MODULE_STATE_UNFORMED; once the load fails, the state becomes
>MODULE_STATE_GOING. If another request for edac_mce_amd executes at
>that point, add_unformed_module() erroneously returns -EEXIST even
>though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
>Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
>fails because of unknown symbols from edac_mce_amd.
>
>add_unformed_module() must wait, rather than return -EEXIST, whenever the
>existing module is in any state other than MODULE_STATE_LIVE, to prevent a
>race between multiple loads of dependent modules.
>
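To make the state check concrete, here is a minimal userspace sketch (not
kernel code) of the decision described above. The enum reuses the kernel's
module_state names; enum load_action, decide_before() and decide_after() are
hypothetical names invented for this illustration only.

```c
/* Userspace sketch only; not the kernel implementation. */

/* Same state names as the kernel's enum module_state. */
enum module_state {
	MODULE_STATE_LIVE,	/* finished loading */
	MODULE_STATE_COMING,	/* still initializing */
	MODULE_STATE_GOING,	/* being removed, e.g. after a failed load */
	MODULE_STATE_UNFORMED,	/* still being set up */
};

/* Hypothetical: what the loader does when the name is already listed. */
enum load_action {
	ACTION_RETURN_EEXIST,
	ACTION_WAIT_AND_RETRY,
};

/* Check before the patch: wait only for COMING or UNFORMED. */
static enum load_action decide_before(enum module_state old_state)
{
	if (old_state == MODULE_STATE_COMING ||
	    old_state == MODULE_STATE_UNFORMED)
		return ACTION_WAIT_AND_RETRY;
	return ACTION_RETURN_EEXIST;
}

/* Check after the patch: wait for anything that is not LIVE. */
static enum load_action decide_after(enum module_state old_state)
{
	if (old_state != MODULE_STATE_LIVE)
		return ACTION_WAIT_AND_RETRY;
	return ACTION_RETURN_EEXIST;
}
```

The two functions differ only for MODULE_STATE_GOING: decide_before()
returns ACTION_RETURN_EEXIST there, while decide_after() returns
ACTION_WAIT_AND_RETRY.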
>v2: The initial (old->state != MODULE_STATE_LIVE) change exposed an
>additional issue in the code. wait_event_interruptible() puts each thread
>to sleep until a module finishes loading and wakes the module_wq
>wait queue. The result is a long delay during boot. Switching to
>wait_event_interruptible_timeout() resolves the sleep problem.
>
>Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>Cc: Jessica Yu <jeyu@kernel.org>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Cc: David Arcari <darcari@redhat.com>
Hi Prarit,
Thanks a lot for the revised patch. I'll queue this up right after the
merge window is over.
Thanks!
Jessica
>---
> kernel/module.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/kernel/module.c b/kernel/module.c
>index 1c429d8d2d74..6c868aabaf37 100644
>--- a/kernel/module.c
>+++ b/kernel/module.c
>@@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
> 	mutex_lock(&module_mutex);
> 	old = find_module_all(mod->name, strlen(mod->name), true);
> 	if (old != NULL) {
>-		if (old->state == MODULE_STATE_COMING
>-		    || old->state == MODULE_STATE_UNFORMED) {
>+		if (old->state != MODULE_STATE_LIVE) {
> 			/* Wait in case it fails to load. */
> 			mutex_unlock(&module_mutex);
>-			err = wait_event_interruptible(module_wq,
>-					       finished_loading(mod->name));
>+			err = wait_event_interruptible_timeout(module_wq,
>+					       finished_loading(mod->name),
>+					       HZ/1000);
> 			if (err)
> 				goto out_unlocked;
> 			goto again;
>--
>2.18.1
>
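One detail of the timeout argument in the hunk above may be worth checking:
HZ/1000 is integer division, so its value depends on the configured tick
rate. The sketch below is plain userspace C, not kernel code. The helper
names timeout_jiffies() and timeout_jiffies_round_up() are hypothetical, and
the rounding-up variant is not part of the patch; it only shows one possible
way to avoid a zero result, should that matter.

```c
#include <assert.h>

/* Hypothetical helper: the HZ/1000 expression for a given tick rate. */
static long timeout_jiffies(long hz)
{
	assert(hz > 0);
	return hz / 1000;	/* integer division truncates toward zero */
}

/* Hypothetical alternative (not in the patch): round up to >= 1 jiffy. */
static long timeout_jiffies_round_up(long hz)
{
	assert(hz > 0);
	return (hz + 999) / 1000;
}
```

For tick rates of 100, 250 and 300, timeout_jiffies() yields 0, and for 1000
it yields 1. timeout_jiffies_round_up() yields 1 for all four values.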
Thread overview:
2019-05-07 14:54 [PATCH v2] modules: Only return -EEXIST for modules that have finished loading Prarit Bhargava
2019-05-08 10:56 ` Heiko Carstens
2019-05-09 13:53 ` Jessica Yu [this message]