All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Chiang <achiang@hp.com>
To: tony.luck@intel.com,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	stable@kernel.org, linux-ia64@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq
Date: Mon, 09 Feb 2009 21:17:43 +0000	[thread overview]
Message-ID: <20090209211743.GA3939@ldl.fc.hp.com> (raw)
In-Reply-To: <20090209181616.GE19064@ldl.fc.hp.com>

Hi Tony,

* Alex Chiang <achiang@hp.com>:
> This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
> 
> Commit e7b14036 removes the targetted disabled CPU from the
> cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

I'm currently testing the patch below as a v3.

> Paul McKenney states that the reasoning behind the patch was to
> prevent irq handlers from running on CPUs marked offline because:
> 
> 	RCU happily ignores CPUs that don't have their bits set in
> 	cpu_online_map, so if there are RCU read-side critical sections
> 	in the irq handlers being run, RCU will ignore them.  If the
> 	other CPUs were running, they might sequence through the RCU
> 	state machine, which could result in data structures being
> 	yanked out from under those irq handlers, which in turn could
> 	result in oopses or worse.
> 
> Unfortunately, both ia64 functions above look at cpu_online_map to find
> a new CPU to migrate interrupts onto. This means we can potentially
> migrate an interrupt off ourself back to... ourself. Uh oh.

v3 uses cpu_active_mask to find an interrupt migration target.
This should fix both the oops we were seeing as well as avoid the
issues with RCU that Paul mentions above.

I also think that this fix is simpler for us to think through
rather than making Paul think through the implications of
changing RCU to use cpu_active_mask. :)

So far, it's survived ~45 minutes on my simple test bed (without
any patches, it usually crashes in < 15 minutes). I'm about to
start a longer run on our complex test system that runs under
heavy load.

Hopefully I'll have some results for tomorrow, in which case I'll
send a proper patch.

Thanks.

/ac

diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
index a58f64c..9eaab3c 100644
--- a/arch/ia64/kernel/irq.c
+++ b/arch/ia64/kernel/irq.c
@@ -155,7 +155,7 @@ static void migrate_irqs(void)
 			 */
 			vectors_in_migration[irq] = irq;
 
-			new_cpu = cpumask_any(cpu_online_mask);
+			new_cpu = cpumask_any(cpu_active_mask);
 
 			/*
 			 * Al three are essential, currently WARN_ON.. maybe panic?
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index 1146399..4e8765d 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -694,7 +694,7 @@ int migrate_platform_irqs(unsigned int cpu)
 			/*
 			 * Now re-target the CPEI to a different processor
 			 */
-			new_cpei_cpu = any_online_cpu(cpu_online_map);
+			new_cpei_cpu = cpumask_any(cpu_active_mask);
 			mask = cpumask_of(new_cpei_cpu);
 			set_cpei_target_cpu(new_cpei_cpu);
 			desc = irq_desc + ia64_cpe_irq;

WARNING: multiple messages have this Message-ID (diff)
From: Alex Chiang <achiang@hp.com>
To: tony.luck@intel.com,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	stable@kernel.org, linux-ia64@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs"
Date: Mon, 9 Feb 2009 14:17:43 -0700	[thread overview]
Message-ID: <20090209211743.GA3939@ldl.fc.hp.com> (raw)
In-Reply-To: <20090209181616.GE19064@ldl.fc.hp.com>

Hi Tony,

* Alex Chiang <achiang@hp.com>:
> This reverts commit e7b140365b86aaf94374214c6f4e6decbee2eb0a.
> 
> Commit e7b14036 removes the targetted disabled CPU from the
> cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

I'm currently testing the patch below as a v3.

> Paul McKenney states that the reasoning behind the patch was to
> prevent irq handlers from running on CPUs marked offline because:
> 
> 	RCU happily ignores CPUs that don't have their bits set in
> 	cpu_online_map, so if there are RCU read-side critical sections
> 	in the irq handlers being run, RCU will ignore them.  If the
> 	other CPUs were running, they might sequence through the RCU
> 	state machine, which could result in data structures being
> 	yanked out from under those irq handlers, which in turn could
> 	result in oopses or worse.
> 
> Unfortunately, both ia64 functions above look at cpu_online_map to find
> a new CPU to migrate interrupts onto. This means we can potentially
> migrate an interrupt off ourself back to... ourself. Uh oh.

v3 uses cpu_active_mask to find an interrupt migration target.
This should fix both the oops we were seeing as well as avoid the
issues with RCU that Paul mentions above.

I also think that this fix is simpler for us to think through
rather than making Paul think through the implications of
changing RCU to use cpu_active_mask. :)

So far, it's survived ~45 minutes on my simple test bed (without
any patches, it usually crashes in < 15 minutes). I'm about to
start a longer run on our complex test system that runs under
heavy load.

Hopefully I'll have some results for tomorrow, in which case I'll
send a proper patch.

Thanks.

/ac

diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
index a58f64c..9eaab3c 100644
--- a/arch/ia64/kernel/irq.c
+++ b/arch/ia64/kernel/irq.c
@@ -155,7 +155,7 @@ static void migrate_irqs(void)
 			 */
 			vectors_in_migration[irq] = irq;
 
-			new_cpu = cpumask_any(cpu_online_mask);
+			new_cpu = cpumask_any(cpu_active_mask);
 
 			/*
 			 * Al three are essential, currently WARN_ON.. maybe panic?
diff --git a/arch/ia64/kernel/smpboot.c b/arch/ia64/kernel/smpboot.c
index 1146399..4e8765d 100644
--- a/arch/ia64/kernel/smpboot.c
+++ b/arch/ia64/kernel/smpboot.c
@@ -694,7 +694,7 @@ int migrate_platform_irqs(unsigned int cpu)
 			/*
 			 * Now re-target the CPEI to a different processor
 			 */
-			new_cpei_cpu = any_online_cpu(cpu_online_map);
+			new_cpei_cpu = cpumask_any(cpu_active_mask);
 			mask = cpumask_of(new_cpei_cpu);
 			set_cpei_target_cpu(new_cpei_cpu);
 			desc = irq_desc + ia64_cpe_irq;

  reply	other threads:[~2009-02-09 21:17 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-09 18:13 [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable Alex Chiang
2009-02-09 18:13 ` [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable path Alex Chiang
2009-02-09 18:16 ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Alex Chiang
2009-02-09 18:16   ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs" Alex Chiang
2009-02-09 21:17   ` Alex Chiang [this message]
2009-02-09 21:17     ` Alex Chiang
2009-02-09 23:33     ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Alex Chiang
2009-02-09 23:33       ` [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq handlers on offline CPUs" Alex Chiang
2009-02-09 23:52       ` Russ Anderson
2009-02-09 23:52         ` Russ Anderson
2009-11-12 22:40     ` [APPLIED] [PATCH v2 1/2] Revert "[IA64] prevent ia64 from invoking irq Tony Lindgren
2009-02-09 18:16 ` [PATCH v2 2/2] ia64: Remove redundant cpu_clear() in __cpu_disable Alex Chiang
2009-02-09 18:16   ` [PATCH v2 2/2] ia64: Remove redundant cpu_clear() in __cpu_disable path Alex Chiang
2009-02-10 12:36 ` [PATCH v2 0/2] ia64: prevent irq migration race in Paul E. McKenney
2009-02-10 12:36   ` [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable path Paul E. McKenney
2009-02-10 16:11   ` [PATCH v2 0/2] ia64: prevent irq migration race in Alex Chiang
2009-02-10 16:11     ` [PATCH v2 0/2] ia64: prevent irq migration race in __cpu_disable path Alex Chiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090209211743.GA3939@ldl.fc.hp.com \
    --to=achiang@hp.com \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=stable@kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.