public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Jones <davej@redhat.com>
To: Linus Torvalds <torvalds@osdl.org>,
	Michal Piotrowski <michal.k.k.piotrowski@gmail.com>,
	Dave Jones <davej@codemonkey.org.uk>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.18-rc3-g3b445eea BUG: warning at /usr/src/linux-git/kernel/cpu.c:51/unlock_cpu_hotplug()
Date: Fri, 4 Aug 2006 22:10:51 -0400	[thread overview]
Message-ID: <20060805021051.GA13393@redhat.com> (raw)
In-Reply-To: <20060805003142.GH18792@redhat.com>

On Fri, Aug 04, 2006 at 08:31:42PM -0400, Dave Jones wrote:
 > On Fri, Aug 04, 2006 at 06:24:00PM -0400, Dave Jones wrote:
 >  >  > DaveJ, any ideas?
 >  > 
 >  > So I'm at a loss to explain why this made this bug suddenly appear, but
 >  > I don't think this is anything new.   Fixing it however...
 >  > I'm looking at it, but I don't see anything obvious yet.
 >  
 > Ok, debugging this is getting ridiculous.  I dug out a core duo,
 > built a kernel and watched it spontaneously reboot before it even got to init.

Hmm, this went away after a ccache flush & a rebuild. Old crud left behind
from earlier debugging I guess.

So I added this diff..

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f230f9a..31210a0 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -31,6 +31,14 @@ void lock_cpu_hotplug(void)
 {
 	struct task_struct *tsk = current;
 
+	if (jiffies > HZ * 60) {
+		printk ("CPU%d called lock_cpu_hotplug() for app %s. recursive_depth=%d\n",
+			smp_processor_id(),
+			current->comm,
+			recursive_depth);
+		dump_stack();
+	}
+
 	if (tsk == recursive) {
 		static int warnings = 10;
 		if (warnings) {
@@ -42,18 +50,30 @@ void lock_cpu_hotplug(void)
 		return;
 	}
 	mutex_lock(&cpu_bitmask_lock);
+	if (jiffies > HZ * 60)
+		printk ("%s acquired cpu_bitmask_lock\n", current->comm);
 	recursive = tsk;
 }
 EXPORT_SYMBOL_GPL(lock_cpu_hotplug);
 
 void unlock_cpu_hotplug(void)
 {
+	if (jiffies > HZ * 120) {
+		printk ("CPU%d called unlock_cpu_hotplug() for app %s. recursive_depth=%d\n",
+			smp_processor_id(),
+			current->comm,
+			recursive_depth);
+		dump_stack();
+	}
+
 	WARN_ON(recursive != current);
 	if (recursive_depth) {
 		recursive_depth--;
 		return;
 	}
 	mutex_unlock(&cpu_bitmask_lock);
+	if (jiffies > HZ * 120)
+		printk ("%s released cpu_bitmask_lock\n", current->comm);
 	recursive = NULL;
 }
 EXPORT_SYMBOL_GPL(unlock_cpu_hotplug);

In the hope to better understand what the hell is going on.
Here's what happened when I started the cpuspeed service

(That DWARF unwinder stuck thing is really darned annoying btw)

CPU1 called lock_cpu_hotplug() for app modprobe. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c01499c8>] stop_machine_run+0x11/0x38
 [<c0141943>] sys_init_module+0x166d/0x1847
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c01499c8>] stop_machine_run+0x11/0x38
 [<c0141943>] sys_init_module+0x166d/0x1847
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
modprobe acquired cpu_bitmask_lock

CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c029fbae>] store_scaling_governor+0x142/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c029fbae>] store_scaling_governor+0x142/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
cpuspeed acquired cpu_bitmask_lock

I have questions.

1. How come the printk at the end of unlock_cpu_hotplug() never
got hit when modprobe got to the end of stop_machine_run() ?

2. How come another process acquired this mutex when the previous
one aparently never released it ?

Moving on ...

CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
Lukewarm IQ detected in hotplug locking
BUG: warning at kernel/cpu.c:46/lock_cpu_hotplug()
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8fc>] lock_cpu_hotplug+0x72/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8fc>] lock_cpu_hotplug+0x72/0xbf
 [<c0132f3c>] __create_workqueue+0x52/0x122
 [<f8faf34b>] cpufreq_governor_dbs+0x9f/0x2c3 [cpufreq_ondemand]
 [<c029f7b6>] __cpufreq_governor+0x57/0xd8
 [<c029f985>] __cpufreq_set_policy+0x14e/0x1bc
 [<c029fbc5>] store_scaling_governor+0x159/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d


Ok, now that it's done insulting my IQ, recursive_depth should be 1.
However ... 


CPU1 called lock_cpu_hotplug() for app cpuspeed. recursive_depth=0
 [<c0104edc>] show_trace_log_lvl+0x58/0x152
 [<c01054c2>] show_trace+0xd/0x10
 [<c01055db>] dump_stack+0x19/0x1b
 [<c013e8c3>] lock_cpu_hotplug+0x39/0xbf
 [<c029fbae>] store_scaling_governor+0x142/0x1a3
 [<c029f1a5>] store+0x37/0x48
 [<c01a6561>] sysfs_write_file+0xab/0xd1
 [<c016f99f>] vfs_write+0xab/0x157
 [<c016ffe4>] sys_write+0x3b/0x60
 [<c0103db9>] sysenter_past_esp+0x56/0x8d


WTF is going on here ?
In the whole log, recursive_depth is always 0.


At this stage, it gets into a death spiral printing out the same
crap over and over again.  I put the full log at
http://people.redhat.com/davej/hotplug-wtf.txt

Note how the acquire/releases don't match up at all. Scary.

		Dave

-- 
http://www.codemonkey.org.uk

  reply	other threads:[~2006-08-05  2:10 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-04 19:04 2.6.18-rc3-g3b445eea BUG: warning at /usr/src/linux-git/kernel/cpu.c:51/unlock_cpu_hotplug() Michal Piotrowski
2006-08-04 19:24 ` Linus Torvalds
2006-08-04 22:24   ` Dave Jones
2006-08-05  0:31     ` Dave Jones
2006-08-05  2:10       ` Dave Jones [this message]
2006-08-05  2:23         ` Dave Jones
2006-08-05  2:49           ` Dave Jones
2006-08-05  6:47             ` Dave Jones
2006-08-05  7:46               ` Andrew Morton
2006-08-05 19:47                 ` Dave Jones
2006-08-05 20:02                   ` Andrew Morton
2006-08-06  1:12                 ` Andrew Morton
2006-08-05 10:54               ` Michal Piotrowski
2006-08-05 11:11                 ` Michal Piotrowski
2006-08-05 11:26                   ` Michal Piotrowski
2006-08-05 18:47                     ` Dave Jones
2006-08-05 21:15                       ` Michal Piotrowski
2006-08-06 16:59               ` Michal Piotrowski
2006-08-06 18:05                 ` Dave Jones
2006-08-06 19:32                 ` Andrew Morton
2006-08-07  1:26                   ` Andi Kleen
2006-08-07  8:18                     ` Jan Beulich
2006-08-15 12:23                     ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2006-08-05 22:14 art
2006-08-06  1:23 ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060805021051.GA13393@redhat.com \
    --to=davej@redhat.com \
    --cc=davej@codemonkey.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michal.k.k.piotrowski@gmail.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox