All of lore.kernel.org
 help / color / mirror / Atom feed
* 2.6.17 x86_64 regression - reboot fails due to deadlock
@ 2006-07-06 16:18 Mr. Berkley Shands
  2006-07-06 16:23 ` Arjan van de Ven
  2006-07-07  0:24 ` Andrew Morton
  0 siblings, 2 replies; 3+ messages in thread
From: Mr. Berkley Shands @ 2006-07-06 16:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: Dave Lloyd

With a SuperMicro H8DC8 (nvidia chipset), Dual Opteron 285's, 16GB, 
Centos 4.3 -

Under 2.6.16 both the tyan 2895 and the supermicro H8DC8 both will 
reboot corectly,
in kernel/sys.c machine_restart() gets called. But with the changes to 
sys.c under 2.6.17,
a new path is introduced, calling void kernel_restart_prepare(char *cmd)
which calls blocking_notifier_call_chain(&reboot_notifier_list, 
SYS_RESTART, cmd); (line 588)
Which looks at the first element of the notifier list, and blocks 
forever. But ONLY on the supermicro.
The tyan, a very similar motherboard does not deadlock. It returns and 
still calls machine_restart().
So neither reboot nor "shutdown -fh now" actually get to the bios calls.

on the supermicro, (linux-2.6.17/kernel/sys.c)

static int __kprobes notifier_call_chain(struct notifier_block **nl,
                unsigned long val, void *v)
{
        int ret = NOTIFY_DONE;
        struct notifier_block *nb;

        nb = rcu_dereference(*nl);
        while (nb) {
                ret = nb->notifier_call(nb, val, v);         /* this is 
the deadlock for the first entry */
                if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
                        break;
                nb = rcu_dereference(nb->next);
        }
        return ret;
}

I see that 2.6.18 reworks this code further.

If I want to hurt myself really, really badly, disabling the call to 
blocking_notifier_call_chain(&reboot_notifier_list,...
restores the reboot/power off functions.

In kdb, the system sits idle awaiting something to schedule, but nothing 
will schedule since there is
a deadlock on the supermicro. Any clues as to how to find which notifier 
is deadlocked?

berkley





^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 2.6.17 x86_64 regression - reboot fails due to deadlock
  2006-07-06 16:18 2.6.17 x86_64 regression - reboot fails due to deadlock Mr. Berkley Shands
@ 2006-07-06 16:23 ` Arjan van de Ven
  2006-07-07  0:24 ` Andrew Morton
  1 sibling, 0 replies; 3+ messages in thread
From: Arjan van de Ven @ 2006-07-06 16:23 UTC (permalink / raw)
  To: Mr. Berkley Shands; +Cc: linux-kernel, Dave Lloyd

> In kdb, the system sits idle awaiting something to schedule, but nothing 
> will schedule since there is
> a deadlock on the supermicro. Any clues as to how to find which notifier 
> is deadlocked?

Hi,


if it's really a deadlock, then lockdep (new in 2.6.18-rc1) ought to
find it... just enable the various locking debug options in the -rc1
kernel and... it's active.


Greetings,
  Arjan van de Ven


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 2.6.17 x86_64 regression - reboot fails due to deadlock
  2006-07-06 16:18 2.6.17 x86_64 regression - reboot fails due to deadlock Mr. Berkley Shands
  2006-07-06 16:23 ` Arjan van de Ven
@ 2006-07-07  0:24 ` Andrew Morton
  1 sibling, 0 replies; 3+ messages in thread
From: Andrew Morton @ 2006-07-07  0:24 UTC (permalink / raw)
  To: Mr. Berkley Shands; +Cc: linux-kernel, dlloyd

"Mr. Berkley Shands" <bshands@exegy.com> wrote:
>
> With a SuperMicro H8DC8 (nvidia chipset), Dual Opteron 285's, 16GB, 
> Centos 4.3 -
> 
> Under 2.6.16 both the tyan 2895 and the supermicro H8DC8 both will 
> reboot corectly,
> in kernel/sys.c machine_restart() gets called. But with the changes to 
> sys.c under 2.6.17,
> a new path is introduced, calling void kernel_restart_prepare(char *cmd)
> which calls blocking_notifier_call_chain(&reboot_notifier_list, 
> SYS_RESTART, cmd); (line 588)
> Which looks at the first element of the notifier list, and blocks 
> forever. But ONLY on the supermicro.
> The tyan, a very similar motherboard does not deadlock. It returns and 
> still calls machine_restart().
> So neither reboot nor "shutdown -fh now" actually get to the bios calls.
> 
> on the supermicro, (linux-2.6.17/kernel/sys.c)
> 
> static int __kprobes notifier_call_chain(struct notifier_block **nl,
>                 unsigned long val, void *v)
> {
>         int ret = NOTIFY_DONE;
>         struct notifier_block *nb;
> 
>         nb = rcu_dereference(*nl);
>         while (nb) {
>                 ret = nb->notifier_call(nb, val, v);         /* this is 
> the deadlock for the first entry */
>                 if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
>                         break;
>                 nb = rcu_dereference(nb->next);
>         }
>         return ret;
> }
> 
> I see that 2.6.18 reworks this code further.
> 
> If I want to hurt myself really, really badly, disabling the call to 
> blocking_notifier_call_chain(&reboot_notifier_list,...
> restores the reboot/power off functions.
> 
> In kdb, the system sits idle awaiting something to schedule, but nothing 
> will schedule since there is
> a deadlock on the supermicro. Any clues as to how to find which notifier 
> is deadlocked?
> 

Are you able to do sysrq-T when it's stuck?

Something like this...
diff -puN kernel/sys.c~a kernel/sys.c

--- a/kernel/sys.c~a
+++ a/kernel/sys.c
@@ -70,6 +70,8 @@
 int overflowuid = DEFAULT_OVERFLOWUID;
 int overflowgid = DEFAULT_OVERFLOWGID;
 
+static int foo;
+
 #ifdef CONFIG_UID16
 EXPORT_SYMBOL(overflowuid);
 EXPORT_SYMBOL(overflowgid);
@@ -141,6 +143,9 @@ static int __kprobes notifier_call_chain
 	nb = rcu_dereference(*nl);
 	while (nb) {
 		next_nb = rcu_dereference(nb->next);
+		if (foo)
+			print_symbol("calling %s()\n",
+				(unsigned long)nb->notifier_call);
 		ret = nb->notifier_call(nb, val, v);
 		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
 			break;
@@ -590,6 +595,7 @@ EXPORT_SYMBOL_GPL(emergency_restart);
 
 static void kernel_restart_prepare(char *cmd)
 {
+	foo = 1;
 	blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
 	system_state = SYSTEM_RESTART;
 	device_shutdown();
_


Be aware that there's a known lock_cpu_hotplug()-vs-cpufreq deadlock, but
afaik it's only been reported during suspend.  Disabling CONFIG_HOTPLUG_CPU
might make a difference.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2006-07-07  0:20 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-06 16:18 2.6.17 x86_64 regression - reboot fails due to deadlock Mr. Berkley Shands
2006-07-06 16:23 ` Arjan van de Ven
2006-07-07  0:24 ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.