public inbox for linux-kernel@vger.kernel.org
* [PATCH][KDUMP] Ignore spurious IPI
@ 2011-03-23 18:40 Takao Indoh
  2011-03-24 14:20 ` [KDUMP] " Milton Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Takao Indoh @ 2011-03-23 18:40 UTC
  To: linux-kernel, kexec

Hi all,

I found that the kdump (2nd) kernel sometimes hangs during boot. The
panic appears to occur as follows.

(1)
The 2nd kernel boots up.

(2)
A pending IPI from the 1st kernel is delivered as soon as interrupts are
unmasked at the following point.

asmlinkage void __init start_kernel(void)
{
(snip)
    time_init();
    profile_init();
    if (!irqs_disabled())
            printk(KERN_CRIT "start_kernel(): bug: interrupts were "
                             "enabled early\n");
    early_boot_irqs_disabled = false;
    local_irq_enable(); <=======================================HERE

(3)
The kernel tries to handle the interrupt, but some data structures are
not initialized yet at this point. As a result,
generic_smp_call_function_single_interrupt() hits a NULL pointer
dereference when list_replace_init() follows q->list.next, which is
still NULL.
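
To illustrate step (3), here is a minimal user-space sketch, not kernel
code. The list helpers are simplified stand-ins modelled on the kernel's
list API, and drain_queue() and demo() are hypothetical names introduced
only for this example. It assumes the uninitialized queue head is
zero-filled, which matches the !q->list.next test in the patch below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized, empty list head points at itself. */
static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Move the contents of 'old' onto 'repl', then reinitialize 'old'. */
static void list_replace_init(struct list_head *old, struct list_head *repl)
{
	repl->next = old->next;
	repl->next->prev = repl;	/* faults here if old->next is NULL */
	repl->prev = old->prev;
	repl->prev->next = repl;
	INIT_LIST_HEAD(old);
}

/*
 * Hypothetical helper: returns 0 and leaves 'q' alone when the queue head
 * was never initialized, otherwise moves its entries to 'out' and returns 1.
 */
static int drain_queue(struct list_head *q, struct list_head *out)
{
	if (q->next == NULL)
		return 0;	/* never initialized: ignore the interrupt */
	list_replace_init(q, out);
	return 1;
}

/* Hypothetical demo: an early IPI sees a zeroed head, a later one does not. */
static void demo(void)
{
	struct list_head q, out;

	memset(&q, 0, sizeof(q));		/* models the uninitialized queue */
	assert(drain_queue(&q, &out) == 0);	/* skipped, no NULL dereference */

	INIT_LIST_HEAD(&q);			/* models the later initialization */
	assert(drain_queue(&q, &out) == 1);
	assert(out.next == &out && out.prev == &out);	/* queue was empty */
}
```

Without the NULL check in drain_queue(), the first call would fault on
the marked line, which is the crash described above.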

I looked at local_apic_timer_interrupt() and found a few lines that
handle this kind of pending LAPIC interrupt (in that case, a timer
interrupt). I therefore made a patch that ignores a spurious IPI in the
same manner, and confirmed that the problem no longer occurs with it.

Any comments?

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
---
 kernel/smp.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 9910744..f2f561b 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -260,6 +260,12 @@ void generic_smp_call_function_single_interrupt(void)
 	 */
 	WARN_ON_ONCE(!cpu_online(smp_processor_id()));
 
+	if (unlikely(!q->list.next)) {
+		/* Pending interrupt from previous kernel(e.g. kdump), just ignore */
+		pr_warning("Spurious IPI on cpu %d\n", smp_processor_id());
+		return;
+	}
+
 	raw_spin_lock(&q->lock);
 	list_replace_init(&q->list, &list);
 	raw_spin_unlock(&q->lock);


* Re: [KDUMP] Ignore spurious IPI
  2011-03-23 18:40 [PATCH][KDUMP] Ignore spurious IPI Takao Indoh
@ 2011-03-24 14:20 ` Milton Miller
  2011-03-24 21:25   ` Takao Indoh
  2011-03-25  6:45   ` WANG Cong
  0 siblings, 2 replies; 4+ messages in thread
From: Milton Miller @ 2011-03-24 14:20 UTC
  To: Takao Indoh; +Cc: linux-kernel, kexec

On Wed, 23 Mar 2011 about 18:40:12 -0000, Takao Indoh wrote:
> Hi all,
> 
> I found a problem that kdump(2nd kernel) sometimes hangs up. It seems
> that system panic occurs as follows.
..
> (2)
> A pending IPI from 1st kernel comes after unmasking interrupts at the
> following point.
> 
> asmlinkage void __init start_kernel(void)
> {
> (snip)
>     time_init();
>     profile_init();
>     if (!irqs_disabled())
>             printk(KERN_CRIT "start_kernel(): bug: interrupts were "
>                              "enabled early\n");
>     early_boot_irqs_disabled = false;
>     local_irq_enable(); <=======================================HERE
> 
> (3)
> Kernel tries to handle the interrupt, but some data structures are not
> initialized yet at this point. As a result, in the
> generic_smp_call_function_single_interrupt(), NULL pointer dereference
> occurs when list_replace_init() tries to access &q->list.next.
> 
[tried to match lapic timer interrupt]
> Any comments?

So this occurs because, unlike device interrupts, this vector has its
action defined statically and there is no per-interrupt disable on your
architecture?


If so, just initialize the data structure earlier -- change
init_call_single_data from an early_initcall to an explicit call made
after the per-cpu areas are initialized.
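
The ordering can be modelled in user space as below. This is only a
sketch under stated assumptions, not the kernel change itself: NR_CPUS,
the static queues[] array, reset_queues(), ipi_handler_is_safe(), boot()
and demo() are all hypothetical names for this illustration, and the
pending IPI is assumed to arrive on CPU 0 at the moment interrupts are
enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS 4	/* assumption for this model only */

struct list_head {
	struct list_head *next, *prev;
};

struct call_single_queue {
	struct list_head list;
};

/* Stand-in for the per-cpu queues; zero-filled until initialized. */
static struct call_single_queue queues[NR_CPUS];

/* Models init_call_single_data(): make every queue an empty list. */
static void init_call_single_data(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		queues[cpu].list.next = &queues[cpu].list;
		queues[cpu].list.prev = &queues[cpu].list;
	}
}

/* Return the queues to their freshly allocated (zeroed) state. */
static void reset_queues(void)
{
	memset(queues, 0, sizeof(queues));
}

/* Models the IPI handler: 1 if the queue can be walked, 0 if it would crash. */
static int ipi_handler_is_safe(int cpu)
{
	return queues[cpu].list.next != NULL;
}

/*
 * Models the boot order. Returns what the handler finds when a pending
 * IPI is delivered at the point where interrupts are first enabled.
 */
static int boot(int init_before_irq_enable)
{
	int safe;

	reset_queues();			/* per-cpu areas have just been set up */
	if (init_before_irq_enable)
		init_call_single_data();	/* explicit early call */

	safe = ipi_handler_is_safe(0);	/* local_irq_enable(): IPI arrives */

	if (!init_before_irq_enable)
		init_call_single_data();	/* initcall stage: too late */
	return safe;
}

/* Hypothetical demo comparing the two orderings. */
static void demo(void)
{
	assert(boot(0) == 0);	/* initcall ordering: handler sees NULL */
	assert(boot(1) == 1);	/* explicit early call: handler is safe */
}
```

With the explicit early call, a pending IPI finds an empty but valid
queue, so the handler needs no special case.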

milton


* Re: [KDUMP] Ignore spurious IPI
  2011-03-24 14:20 ` [KDUMP] " Milton Miller
@ 2011-03-24 21:25   ` Takao Indoh
  2011-03-25  6:45   ` WANG Cong
  1 sibling, 0 replies; 4+ messages in thread
From: Takao Indoh @ 2011-03-24 21:25 UTC
  To: Milton Miller; +Cc: linux-kernel, kexec

On Thu, 24 Mar 2011 08:20:32 -0600, Milton Miller wrote:

>On Wed, 23 Mar 2011 about 18:40:12 -0000, Takao Indoh wrote:
>> Hi all,
>> 
>> I found a problem that kdump(2nd kernel) sometimes hangs up. It seems
>> that system panic occurs as follows.
>..
>> (2)
>> A pending IPI from 1st kernel comes after unmasking interrupts at the
>> following point.
>> 
>> asmlinkage void __init start_kernel(void)
>> {
>> (snip)
>>     time_init();
>>     profile_init();
>>     if (!irqs_disabled())
>>             printk(KERN_CRIT "start_kernel(): bug: interrupts were "
>>                              "enabled early\n");
>>     early_boot_irqs_disabled = false;
>>     local_irq_enable(); <=======================================HERE
>> 
>> (3)
>> Kernel tries to handle the interrupt, but some data structures are not
>> initialized yet at this point. As a result, in the
>> generic_smp_call_function_single_interrupt(), NULL pointer dereference
>> occurs when list_replace_init() tries to access &q->list.next.
>> 
>[tried to match lapic timer interrupt]
>> Any comments?
>
>So this occurs because unlike device interrupts, this vector has the action
>defined statically and no per-interrupt disable on your architecture?

I think there is no per-interrupt disable for IPIs.

>If so, just initialize the data structure earlier -- change
>init_call_single_data from early_initcall to an explicit call after the
>per-cpu areas are initialized.

That makes sense. I'll do this, thanks.

Thanks,
Takao Indoh


>
>milton


* Re: [KDUMP] Ignore spurious IPI
  2011-03-24 14:20 ` [KDUMP] " Milton Miller
  2011-03-24 21:25   ` Takao Indoh
@ 2011-03-25  6:45   ` WANG Cong
  1 sibling, 0 replies; 4+ messages in thread
From: WANG Cong @ 2011-03-25  6:45 UTC
  To: linux-kernel; +Cc: kexec

On Thu, 24 Mar 2011 08:20:32 -0600, Milton Miller wrote:
> If so, just initialize the data structure earlier -- change
> init_call_single_data from early_initcall to an explicit call after the
> per-cpu areas are initialized.
> 

Yeah, this is a good idea. IPI-related data structures should be
ready before interrupts are enabled. We should not assume that no IPI
is pending before init_call_single_data has run, although that is true
in a non-kdump kernel.

Thanks.


