From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Prarit Bhargava <prarit@redhat.com>
Cc: Jessica Yu <jeyu@kernel.org>,
linux-next@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-s390@vger.kernel.org, Cathy Avery <cavery@redhat.com>
Subject: Re: [-next] system hangs likely due to "modules: Only return -EEXIST for modules that have finished loading"
Date: Sat, 27 Apr 2019 12:24:40 +0200
Message-ID: <20190427102440.GA28889@osiris>
In-Reply-To: <f74996cb-3e0a-ab23-00b9-85ac782583d1@redhat.com>
On Fri, Apr 26, 2019 at 08:20:52PM -0400, Prarit Bhargava wrote:
> Heiko and Jessica,
>
> The issue doesn't appear to be with my patch AFAICT. The s390_trng fails to
> load and then the kernel occasionally hangs (as Heiko mentioned) calling
> synchronize_rcu().
>
> The call sequence is
>
> module_load()
> do_init_module()
> do_one_initcall(mod->init)
>
> which fails.
>
> The failure path in do_one_initcall() is entered and we start executing code at
> kernel/module.c:3541
>
> fail_free_freeinit:
> kfree(freeinit);
> fail:
> /* Try to protect us from buggy refcounters. */
> mod->state = MODULE_STATE_GOING;
> synchronize_rcu();
>
> ^^^ the kernel hangs here. Sometimes it's very short and other times it seems
> to hang. I've left systems that appear to be hung and come back after 10
> minutes to find that they've somehow made it through this call.
>
> Is there a known issue with RCU on s390 that is making this occur?
No, there is no known issue with RCU on s390. The reason that
synchronize_rcu() doesn't finish is that a different cpu is stuck in
an endless loop in add_unformed_module(), just like Jessica suspected.

Note: the kernel is compiled with CONFIG_PREEMPT off, so there is no
kernel preemption that would ever make the looping cpu go through
schedule() and subsequently let synchronize_rcu() finish.

To confirm Jessica's theory, looking into the dump we have:
crash> bt 742
PID: 742 TASK: 1efa6c000 CPU: 7 COMMAND: "systemd-udevd"
#0 [3e0043aba30] __schedule at abb25e
#1 [3e0043abaa0] schedule at abb6a2
#2 [3e0043abac8] schedule_timeout at abf49a
#3 [3e0043abb60] wait_for_common at abc396
#4 [3e0043abbf0] __wait_rcu_gp at 1c0136
#5 [3e0043abc48] synchronize_rcu at 1c72ea
#6 [3e0043abc98] do_init_module at 1f10be
#7 [3e0043abcf0] load_module at 1f3594
#8 [3e0043abdd0] __se_sys_init_module at 1f3af0
#9 [3e0043abea8] system_call at ac0766
This is the process waiting for synchronize_rcu() to finish. Wading
through the stack frames gives me this struct module:
struct module {
state = MODULE_STATE_GOING,
list = {
next = 0x3ff80394508,
prev = 0xe25090 <modules>
},
name = "s390_trng\000...
...
Then we have the looping task/cpu:
PID: 731 TASK: 1e79ba000 CPU: 7 COMMAND: "systemd-udevd"
LOWCORE INFO:
-psw : 0x0704c00180000000 0x0000000000ab666a
-function : memcmp at ab666a
...
-general registers:
0x0000000000000009 0x0000000000000009
0x000003ff80347321 000000000000000000
0x000003ff8034f321 000000000000000000
0x000000000000001e 0x000003ff8c592708
0x000003e0047da900 0x000003ff8034f318
0x0000000000000001 0x0000000000000009
0x000003ff80347300 0x0000000000ad81b8
0x00000000001ee062 0x000003e004357cb0
#0 [3e004357cf0] load_module at 1f1eb0
#1 [3e004357dd0] __se_sys_init_module at 1f3af0
#2 [3e004357ea8] system_call at ac0766
which is find_module_all() calling memcmp with this string:
3ff80347318: 733339305f74726e 6700000000000000 s390_trng.......
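
As a cross-check of the dump line above, here is a minimal userspace
sketch that turns the two 64-bit words back into the string. It assumes
the words are to be read most significant byte first (s390 is
big-endian). The helper names be64_to_bytes() and decode_dump() are made
up for this illustration and are not part of crash or the kernel.

```c
#include <stdint.h>

/*
 * Store a 64-bit word into out[0..7], most significant byte first.
 * This matches how a big-endian machine such as s390 lays the word
 * out in memory.
 */
static void be64_to_bytes(uint64_t word, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(word >> (56 - 8 * i));
}

/*
 * Decode two dumped words into a NUL-terminated string.
 * buf must have room for at least 17 bytes.
 */
static const char *decode_dump(uint64_t w0, uint64_t w1, char buf[17])
{
	be64_to_bytes(w0, (unsigned char *)buf);
	be64_to_bytes(w1, (unsigned char *)buf + 8);
	buf[16] = '\0';
	return buf;
}
```

Calling decode_dump(0x733339305f74726eULL, 0x6700000000000000ULL, buf)
yields "s390_trng", which is the module name memcmp is comparing.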
So it all seems to fit. A simple cond_resched() call, which enforces
an RCU quiescent state for the calling cpu, fixes the problem for me
(patch on top of linux-next 20190424 -- c392798a85ab):
diff --git a/kernel/module.c b/kernel/module.c
index 410eeb7e4f1d..48748cfec991 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3585,6 +3585,7 @@ again:
 					       finished_loading(mod->name));
 			if (err)
 				goto out_unlocked;
+			cond_resched();
 			goto again;
 		}
 		err = -EEXIST;
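
To illustrate why the one-liner helps, here is a deliberately simplified
userspace toy model, not kernel code. It assumes a grace period ends
only once every cpu has reported a quiescent state, and that on a
non-preemptible kernel a cpu spinning in the retry loop reports one only
if it calls something like cond_resched(). All toy_* names and the
two-cpu setup are hypothetical, and real RCU is far more involved.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 2	/* assumption: one waiting cpu, one looping cpu */

/* Toy stand-in for RCU's "has this cpu passed a quiescent state" state. */
struct toy_gp {
	bool seen_qs[TOY_NR_CPUS];
};

/* Begin a new grace period: no cpu has reported a quiescent state yet. */
static void toy_gp_start(struct toy_gp *gp)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		gp->seen_qs[cpu] = false;
}

/* Record that 'cpu' passed a quiescent state (e.g. it went to schedule). */
static void toy_note_qs(struct toy_gp *gp, int cpu)
{
	gp->seen_qs[cpu] = true;
}

/* The grace period is over once every cpu has reported in. */
static bool toy_gp_done(const struct toy_gp *gp)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (!gp->seen_qs[cpu])
			return false;
	return true;
}

/*
 * Model of the "goto again" retry loop running on 'cpu'.
 * resched == false: plain busy loop without preemption.
 * resched == true:  each pass reports a quiescent state, standing in
 *                   for the cond_resched() call added by the patch.
 * Returns true if the grace period completed within 'iterations' passes.
 */
static bool toy_retry_loop(struct toy_gp *gp, int cpu, int iterations,
			   bool resched)
{
	for (int i = 0; i < iterations; i++) {
		if (resched)
			toy_note_qs(gp, cpu);
		if (toy_gp_done(gp))
			return true;
	}
	return false;
}
```

In this model, after toy_gp_start() and toy_note_qs(&gp, 0) for the
sleeping waiter, toy_retry_loop(&gp, 1, 1000, false) never sees the
grace period end, while the same call with resched set to true returns
on the first pass.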