From: Jessica Yu <jeyu@kernel.org>
To: Prarit Bhargava <prarit@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	David Arcari <darcari@redhat.com>
Subject: Re: [PATCH v2] modules: Only return -EEXIST for modules that have finished loading
Date: Thu, 9 May 2019 15:53:34 +0200
Message-ID: <20190509135333.GA9337@linux-8ccs>
In-Reply-To: <20190507145413.16297-1-prarit@redhat.com>

+++ Prarit Bhargava [07/05/19 10:54 -0400]:
>Heiko, it would still be good to get a test of this patch from you.  I
>tested this here at Red Hat on some System Z machines.  Without the
>modification made here in v2, the systems failed to boot ~10% of the time.
>After the modification I do not see any boot failures.  I also was
>able to reproduce the boot issue with the acpi_cpufreq driver on a very
>large & fast x86 system which had closer to 100% failure rate without
>the changes in v2.  After the modification in v2 the system has rebooted
>all weekend without any issues.
>
>P.
>
>---8<---
>
>Microsoft Hyper-V disables the X86_FEATURE_SMCA bit on AMD systems, and
>Linux guests boot with repeated errors:
>
>amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
>amd64_edac_mod: Unknown symbol amd_unregister_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_register_ecc_decoder (err -2)
>amd64_edac_mod: Unknown symbol amd_report_gart_errors (err -2)
>
>The warnings occur because the module code erroneously returns -EEXIST
>for modules that have failed to load and are in the process of being
>removed from the module list.
>
>module amd64_edac_mod has a dependency on module edac_mce_amd.  Using
>modules.dep, systemd will load edac_mce_amd for every request of
>amd64_edac_mod.  When the edac_mce_amd module loads, the module has
>state MODULE_STATE_UNFORMED, and once the module load fails the state
>becomes MODULE_STATE_GOING.  Another request for the edac_mce_amd module
>executes and add_unformed_module() will erroneously return -EEXIST even
>though the previous instance of edac_mce_amd has MODULE_STATE_GOING.
>Upon receiving -EEXIST, systemd attempts to load amd64_edac_mod, which
>fails because of unknown symbols from edac_mce_amd.
>
>add_unformed_module() must wait to return for any case other than
>MODULE_STATE_LIVE to prevent a race between multiple loads of
>dependent modules.
>
>v2: The initial (old->state != MODULE_STATE_LIVE) change exposed an
>additional issue in the code.  wait_event_interruptible() puts each thread
>to sleep until a module finishes loading and wakes up the waiters on the
>module_wq wait queue.  The result is a long delay during the boot.  Switching to
>wait_event_interruptible_timeout() resolves the sleep problem.
>
>Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>Cc: Jessica Yu <jeyu@kernel.org>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Cc: David Arcari <darcari@redhat.com>
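
The state check described in the commit message above can be modelled in
plain userspace C.  This is a minimal sketch, not kernel code: the enum
names mirror the kernel's enum module_state, while must_wait_old(),
must_wait_new(), second_load_result() and check_second_load_model() are
hypothetical helpers invented for illustration.  It assumes the only
question is what a second load request does when it finds an existing
entry with the same name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Names mirror the kernel's enum module_state; this is a userspace model. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/* Hypothetical helper: the condition before the patch. */
static bool must_wait_old(enum module_state s)
{
	return s == MODULE_STATE_COMING || s == MODULE_STATE_UNFORMED;
}

/* Hypothetical helper: the condition after the patch. */
static bool must_wait_new(enum module_state s)
{
	return s != MODULE_STATE_LIVE;
}

/*
 * Hypothetical helper: outcome for a second load request that finds an
 * existing entry.  0 means "wait and retry", -EEXIST means "report the
 * module as already present".
 */
static int second_load_result(enum module_state old_state, bool patched)
{
	bool wait = patched ? must_wait_new(old_state)
			    : must_wait_old(old_state);

	return wait ? 0 : -EEXIST;
}

/* Usage: the GOING case is the one whose outcome the patch changes. */
void check_second_load_model(void)
{
	/* Before: a failed, half-removed module is reported as existing. */
	assert(second_load_result(MODULE_STATE_GOING, false) == -EEXIST);
	/* After: the second request waits instead. */
	assert(second_load_result(MODULE_STATE_GOING, true) == 0);
	/* A fully loaded module yields -EEXIST in both versions. */
	assert(second_load_result(MODULE_STATE_LIVE, false) == -EEXIST);
	assert(second_load_result(MODULE_STATE_LIVE, true) == -EEXIST);
}
```

In this model, only MODULE_STATE_GOING changes outcome between the two
conditions, which matches the race the commit message attributes to the
amd64_edac_mod / edac_mce_amd pair.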

Hi Prarit,

Thanks a lot for the revised patch. I'll queue this up right after the
merge window is over.

Thanks!

Jessica

>---
> kernel/module.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/kernel/module.c b/kernel/module.c
>index 1c429d8d2d74..6c868aabaf37 100644
>--- a/kernel/module.c
>+++ b/kernel/module.c
>@@ -3568,12 +3568,12 @@ static int add_unformed_module(struct module *mod)
> 	mutex_lock(&module_mutex);
> 	old = find_module_all(mod->name, strlen(mod->name), true);
> 	if (old != NULL) {
>-		if (old->state == MODULE_STATE_COMING
>-		    || old->state == MODULE_STATE_UNFORMED) {
>+		if (old->state != MODULE_STATE_LIVE) {
> 			/* Wait in case it fails to load. */
> 			mutex_unlock(&module_mutex);
>-			err = wait_event_interruptible(module_wq,
>-					       finished_loading(mod->name));
>+			err = wait_event_interruptible_timeout(module_wq,
>+					       finished_loading(mod->name),
>+					       HZ/1000);
> 			if (err)
> 				goto out_unlocked;
> 			goto again;
>-- 
>2.18.1
>
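
Two details of the hunk above may be worth checking when testing it; the
following userspace sketch only models them and makes no claim about how
the code behaves once merged.  First, HZ/1000 is evaluated with C integer
division, so the timeout in jiffies depends on CONFIG_HZ.  Second,
wait_event_interruptible_timeout() is documented to return 0 when the
timeout elapses with the condition still false, a positive value when the
condition is true, and -ERESTARTSYS when interrupted by a signal, so
`if (err)` is also taken for the positive case.  timeout_in_jiffies(),
leaves_retry_loop() and check_timeout_model() are hypothetical names used
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the timeout argument HZ/1000, as integer division. */
long timeout_in_jiffies(long hz)
{
	return hz / 1000;
}

/*
 * Hypothetical helper: models `if (err) goto out_unlocked;` for a given
 * return value of the wait.  Any nonzero value leaves the retry loop.
 */
bool leaves_retry_loop(long wait_ret)
{
	return wait_ret != 0;
}

void check_timeout_model(void)
{
	/* Common CONFIG_HZ choices: only HZ=1000 gives a nonzero timeout. */
	assert(timeout_in_jiffies(100) == 0);
	assert(timeout_in_jiffies(250) == 0);
	assert(timeout_in_jiffies(300) == 0);
	assert(timeout_in_jiffies(1000) == 1);

	/* Timed out with the condition still false: loop back to "again". */
	assert(!leaves_retry_loop(0));
	/* Condition became true: positive return, branch is taken. */
	assert(leaves_retry_loop(1));
	/* Stand-in for a negative error such as -ERESTARTSYS. */
	assert(leaves_retry_loop(-1));
}
```

If the intent is for only negative values to abort the load, the branch
would need to distinguish them from a positive "condition met" result;
that is an assumption about intent, not something stated in the patch.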

Thread overview: 3+ messages
2019-05-07 14:54 [PATCH v2] modules: Only return -EEXIST for modules that have finished loading Prarit Bhargava
2019-05-08 10:56 ` Heiko Carstens
2019-05-09 13:53 ` Jessica Yu [this message]
