All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Jones <davej@redhat.com>
To: Linus Torvalds <torvalds@osdl.org>,
	Michal Piotrowski <michal.k.k.piotrowski@gmail.com>,
	Dave Jones <davej@codemonkey.org.uk>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.18-rc3-g3b445eea BUG: warning at /usr/src/linux-git/kernel/cpu.c:51/unlock_cpu_hotplug()
Date: Fri, 4 Aug 2006 22:10:51 -0400	[thread overview]
Message-ID: <20060805021051.GA13393@redhat.com> (raw)
In-Reply-To: <20060805003142.GH18792@redhat.com>

On Fri, Aug 04, 2006 at 08:31:42PM -0400, Dave Jones wrote:
 > On Fri, Aug 04, 2006 at 06:24:00PM -0400, Dave Jones wrote:
 >  >  > DaveJ, any ideas?
 >  > 
 >  > So I'm at a loss to explain why this made this bug suddenly appear, but
 >  > I don't think this is anything new.   Fixing it however...
 >  > I'm looking at it, but I don't see anything obvious yet.
 >  
 > Ok, debugging this is getting ridiculous.  I dug out a core duo,
 > built a kernel and watched it spontaneously reboot before it even got to init.

Hmm, this went away after a ccache flush & a rebuild. Old crud left behind
from earlier debugging I guess.

So I added this diff..

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f230f9a..31210a0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -31,6 +31,14 @@ void lock_cpu_hotplug(void)
 {
 	struct task_struct *tsk = current;
 
+	if (jiffies > HZ * 60) {
+		printk ("CPU%d called lock_cpu_hotplug() for app %s. recursive_depth=%d\n",
+			smp_processor_id(),
+			current->comm,
+			recursive_depth);
+		dump_stack();
+	}
+
 	if (tsk == recursive) {
 		static int warnings = 10;
 		if (warnings) {
@@ -42,18 +50,30 @@ void lock_cpu_hotplug(void)
 		return;
 	}
 	mutex_lock(&cpu_bitmask_lock);
+	if (jiffies > HZ * 60)
+		printk ("%s acquired cpu_bitmask_lock\n", current->comm);
 	recursive = tsk;
 }
 EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
 
 void unlock_cpu_hotplug(void)
 {
+	if (jiffies > HZ * 120) {
+		printk ("CPU%d called unlock_cpu_hotplug() for app %s. recursive_depth=%d\n",
+			smp_processor_id(),
+			current->comm,
+			recursive_depth);
+		dump_stack();
+	}
+
 	WARN_ON(recursive != current);
 	if (recursive_depth) {
 		recursive_depth--;
 		return;
 	}
 	mutex_unlock(&cpu_bitmask_lock);
+	if (jiffies > HZ * 120)
+		printk ("%s released cpu_bitmask_lock\n", current->comm);
 	recursive = NULL;
 }
 EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);

In the hope to better understand what the hell is going on.
Here's what happened when I started the cpuspeed service

(That DWARF unwinder stuck thing is really darned annoying btw)

CPU1 called lock_cpu_hotplug() for app modprobe. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c01499c8>] stop_machine_run+0x11/0x38
 [<c0141943>] sys_init_module+0x166d/0x1847
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c01499c8>] stop_machine_run+0x11/0x38
 [<c0141943>] sys_init_module+0x166d/0x1847
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
modprobe acquired cpu_bitmask_lock

CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c029fbae>] store_scaling_governor+0x142/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c029fbae>] store_scaling_governor+0x142/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
cpuspeed acquired cpu_bitmask_lock

I have questions.

1. How come the printk at the end of unlock_cpu_hotplug() never
got hit when modprobe got to the end of stop_machine_run() ?

2. How come another process acquired this mutex when the previous
one aparently never released it ?

Moving on ...

CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
Lukewarm IQ detected in hotplug locking
BUG: warning at kernel/cpu.c:46/lock_cpu_hotplug()
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8fc>] lock_cpu_hotplug+0x72/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8fc>] lock_cpu_hotplug+0x72/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d


Ok, now that it's done insulting my IQ, recursive_depth should be 1.
However ... 


CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c029fbae>] store_scaling_governor+0x142/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d


WTF is going on here ?
In the whole log, recursive_depth is always 0.


At this stage, it gets into a death spiral printing out the same
crap over and over again.  I put the full log at
http://people.redhat.com/davej/hotplug-wtf.txt

Note how the acquire/releases don't match up at all. Scary.

		Dave

-- 
http://www.codemonkey.org.uk

  reply	other threads:[~2006-08-05  2:10 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-04 19:04 2.6.18-rc3-g3b445eea BUG: warning at /usr/src/linux-git/kernel/cpu.c:51/unlock_cpu_hotplug() Michal Piotrowski
2006-08-04 19:24 ` Linus Torvalds
2006-08-04 22:24   ` Dave Jones
2006-08-05  0:31     ` Dave Jones
2006-08-05  2:10       ` Dave Jones [this message]
2006-08-05  2:23         ` Dave Jones
2006-08-05  2:49           ` Dave Jones
2006-08-05  6:47             ` Dave Jones
2006-08-05  7:46               ` Andrew Morton
2006-08-05 19:47                 ` Dave Jones
2006-08-05 20:02                   ` Andrew Morton
2006-08-06  1:12                 ` Andrew Morton
2006-08-05 10:54               ` Michal Piotrowski
2006-08-05 11:11                 ` Michal Piotrowski
2006-08-05 11:26                   ` Michal Piotrowski
2006-08-05 18:47                     ` Dave Jones
2006-08-05 21:15                       ` Michal Piotrowski
2006-08-06 16:59               ` Michal Piotrowski
2006-08-06 18:05                 ` Dave Jones
2006-08-06 19:32                 ` Andrew Morton
2006-08-07  1:26                   ` Andi Kleen
2006-08-07  8:18                     ` Jan Beulich
2006-08-15 12:23                     ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2006-08-05 22:14 art
2006-08-06  1:23 ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060805021051.GA13393@redhat.com \
    --to=davej@redhat.com \
    --cc=davej@codemonkey.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michal.k.k.piotrowski@gmail.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.