Re: BUG during shutdown - bisected to commit e2912009

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Marc Dionne <marc.c.dionne@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Xiaotian Feng <dfeng@redhat.com>, Ingo Molnar <mingo@elte.hu>
Subject: Re: BUG during shutdown - bisected to commit e2912009
Date: Sat, 02 Jan 2010 01:42:00 +0100	[thread overview]
Message-ID: <1262392920.32223.10.camel@laptop> (raw)
In-Reply-To: <6041d2001001011627o5c494df4v37c0c466df3d444c@mail.gmail.com>

On Fri, 2010-01-01 at 19:27 -0500, Marc Dionne wrote:
> I'm getting a BUG with current kernels from
> kernel/time/clockevents.c:263 when halting the system - a restart
> behaves normally.  I don't have a good camera handy at the moment to
> capture the call stack on screen, but the call sequence is:
> 
> clockevents_notify
> hrtimer_cpu_notify
> notifier_call_chain
> raw_notifier_call_chain
> _cpu_down
> disable_nonboot_cpus
> kernel_power_off
> sys_reboot
> 
> I bisected it down to commit e2912009: sched: Ensure set_task_cpu() is
> never called on blocked tasks.  There were a few commits tested along
> the way where I got a freeze (with the power still on) instead of a
> BUG. Reverting that commit from the current kernel doesn't look
> trivial, but the commit immediately preceding this one does halt fine.

We somehow seem to trip up the below patch, which doesn't really make
sense, as I can't find how task placement would affect the below error.

It seems to purely test against the hot-unplugged cpu, not a cpu the
task is running on.

---
commit bb6eddf7676e1c1f3e637aa93c5224488d99036f
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Dec 10 15:35:10 2009 +0100

    clockevents: Prevent clockevent_devices list corruption on cpu hotplug
    
    Xiaotian Feng triggered a list corruption in the clock events list on
    CPU hotplug and debugged the root cause.
    
    If a CPU registers more than one per cpu clock event device, then only
    the active clock event device is removed on CPU_DEAD. The unused
    devices are kept in the clock events device list.
    
    On CPU up the clock event devices are registered again, which means
    that we list_add an already enqueued list_head. That results in list
    corruption.
    
    Resolve this by removing all devices which are associated to the dead
    CPU on CPU_DEAD.
    
    Reported-by: Xiaotian Feng <dfeng@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Xiaotian Feng <dfeng@redhat.com>
    Cc: stable@kernel.org

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 20a8920..91db2e3 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -238,8 +238,9 @@ void clockevents_exchange_device(struct clock_event_device *old,
  */
 void clockevents_notify(unsigned long reason, void *arg)
 {
-	struct list_head *node, *tmp;
+	struct clock_event_device *dev, *tmp;
 	unsigned long flags;
+	int cpu;
 
 	spin_lock_irqsave(&clockevents_lock, flags);
 	clockevents_do_notify(reason, arg);
@@ -250,8 +251,19 @@ void clockevents_notify(unsigned long reason, void *arg)
 		 * Unregister the clock event devices which were
 		 * released from the users in the notify chain.
 		 */
-		list_for_each_safe(node, tmp, &clockevents_released)
-			list_del(node);
+		list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
+			list_del(&dev->list);
+		/*
+		 * Now check whether the CPU has left unused per cpu devices
+		 */
+		cpu = *((int *)arg);
+		list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
+			if (cpumask_test_cpu(cpu, dev->cpumask) &&
+			    cpumask_weight(dev->cpumask) == 1) {
+				BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
+				list_del(&dev->list);
+			}
+		}
 		break;
 	default:
 		break;

next prev parent reply	other threads:[~2010-01-02  0:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-02  0:27 BUG during shutdown - bisected to commit e2912009 Marc Dionne
2010-01-02  0:42 ` Peter Zijlstra [this message]
2010-01-04 18:43   ` Marc Dionne
2010-01-05  2:56     ` Xiaotian Feng
2010-01-05  3:23       ` Marc Dionne
2010-01-05 10:18         ` Xiaotian Feng
2010-01-05 22:58           ` Marc Dionne
2010-01-06  9:42             ` Xiaotian Feng
2010-01-07  0:44               ` Marc Dionne
2010-01-07  2:51                 ` Xiaotian Feng
2010-01-07  3:07                   ` Marc Dionne
2010-01-07  3:20                     ` Marc Dionne
2010-01-07  3:24                       ` Xiaotian Feng

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:20a8920 dfblob:91db2e3 )
 OR (
bs:"Re: BUG during shutdown - bisected to commit e2912009" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1262392920.32223.10.camel@laptop \
    --to=peterz@infradead.org \
    --cc=dfeng@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.c.dionne@gmail.com \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.