Re: [BUG] lockup with the latest kernel


From: Ingo Molnar <mingo@elte.hu>
To: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [BUG] lockup with the latest kernel
Date: Fri, 28 Aug 2009 08:36:03 +0200
Message-ID: <20090828063603.GA21420@elte.hu>
In-Reply-To: <4A97468A.6080502@kernel.org>


* Tejun Heo <tj@kernel.org> wrote:

> Tejun Heo wrote:
> >>> Always happens where one CPU is sending an IPI and the other has the rq 
> >>> spinlock. Seems to be that the IPI expects the other CPU to not have 
> >>> interrupts disabled or something?
> > 
> > I'm not too familiar with apics but AFAIK sending IPI isn't an
> > interlocked operation (not at least at the software level) so I doubt
> > it has much to do with the other cpu doing or not doing anything.  It
> > looks like the local apic is stuck hardware-wise.  The only thing the
> > commit changes is that cpu1 would be using vector 0xf1 instead of 0xf0
> > together with cpu0.
> > 
> > (reading the doc...) Okay, here's something interesting.  It's from
> > section 9.8.4 of intel doc 253668.pdf - Intel 64 and IA-32
> > Architectures Software Developer's Manual Volume 3A: System
> > Programming Guide, Part 1.
> > 
> >  For the P6 family and Pentium processors, the IRR and ISR registers
> >  can queue no more than two interrupts per priority level, and will
> >  reject other interrupts that are received within the same priority
> >  level.
> > 
> > And from AMD's 24593 - AMD64 Architecture Programmer's Manual Volume
> > 2: System Programming, section 16.6.3.
> > 
> >  No more than two interrupts can be pending for the same interrupt
> >  vector number. Subsequent interrupt requests to the same interrupt
> >  vector number will be rejected. See Figure 16-23 on page 445.
> > 
> 
> Oh... there are differences that I missed.
> 
>  All intels: If more than one interrupt is generated with the same
> 	     vector number, the local APIC can set the bit for the
> 	     vector both in the IRR and the ISR. This means that for
> 	     the Pentium 4 and Intel Xeon processors, the IRR and ISR
> 	     can queue two interrupts for each interrupt vector: one
> 	     in the IRR and one in the ISR. Any additional interrupts
> 	     issued FOR THE SAME INTERRUPT VECTOR are COLLAPSED INTO
> 	     THE SINGLE BIT in the IRR.
> 
>  Ppro: no more than two interrupts PER PRIORITY LEVEL, and will REJECT
>        OTHER interrupts
> 
>  AMD64: Subsequent interrupt requests to THE SAME INTERRUPT VECTOR
>         NUMBER will be REJECTED.
> 
> Eh... don't have earlier AMD doc and gotta go now.  Can somebody 
> please check?  But it looks like we can deadlock by simply sending 
> RESCHEDULE_VECTOR more than two times while holding rq lock on 
> AMD?
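
To make the three quoted wordings easier to compare, here is a toy model in
Python. It is only a sketch of the manual text as quoted above, not of real
APIC hardware or of any kernel code: the policy names, `deliver()` and
`send_burst()` are hypothetical, the "two slots" are not split into IRR and
ISR, and the target is assumed never to service anything during the burst.
What the sender does after a rejection is outside the model.

```python
from collections import Counter


def priority_class(vector):
    """Interrupt priority class: the upper four bits of the vector."""
    return vector >> 4


def deliver(policy, held, vector):
    """Offer one interrupt to the target and report what happens to it.

    policy: "collapse_per_vector"  (Pentium 4 / Xeon wording)
            "reject_per_priority"  (P6 / Pentium wording)
            "reject_per_vector"    (AMD64 wording)
    held:   Counter of occupied slots, keyed by vector or priority class.
    """
    key = priority_class(vector) if policy == "reject_per_priority" else vector
    if held[key] < 2:
        held[key] += 1
        return "queued"
    # Both slots are taken: either merged into the pending bit or refused.
    return "collapsed" if policy == "collapse_per_vector" else "rejected"


def send_burst(policy, vectors):
    """Send a burst of vectors to a target that never drains its slots."""
    held = Counter()
    return [deliver(policy, held, v) for v in vectors]


if __name__ == "__main__":
    for policy in ("collapse_per_vector", "reject_per_priority",
                   "reject_per_vector"):
        print(policy, send_burst(policy, [0xf1, 0xf1, 0xf1]))
```

In this model the third request for the same vector is "collapsed" under the
first wording and "rejected" under the other two. Three different vectors in
the same priority class (0xf0, 0xf1, 0xf2) are rejected only under the
per-priority wording.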

We poll ICR in the send-IPI logic before sending it out - so this 
shouldn't happen. The restrictions above should at most cause extra 
polling latency (i.e. it's a performance detail, not a lockup 
source). See all the *wait_icr_idle() methods in the IPI sending 
logic in arch/x86.
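
The poll-then-send pattern described above can be sketched as a small
user-space model in Python. This is an illustration only, not the kernel's
code: `FakeApic`, `busy_reads` and `max_polls` are hypothetical names, and the
only hardware detail assumed is that the delivery-status (busy) bit is bit 12
of the low ICR word. The vector 0xf1 is simply the one mentioned earlier in
the thread.

```python
ICR_BUSY = 1 << 12  # delivery-status bit in the low word of the xAPIC ICR


class FakeApic:
    """Hypothetical stand-in for a local APIC; not kernel code."""

    def __init__(self, busy_reads=0):
        self.icr_low = 0
        self.busy_reads = busy_reads  # reads that still report "send pending"
        self.sent = []                # vectors actually written to the ICR

    def read_icr(self):
        if self.busy_reads > 0:
            self.busy_reads -= 1
            return self.icr_low | ICR_BUSY
        return self.icr_low & ~ICR_BUSY

    def write_icr(self, vector):
        self.icr_low = vector
        self.sent.append(vector)


def wait_icr_idle(apic, max_polls=1000):
    """Poll until the busy bit clears; return polls used, or None on timeout."""
    for polls in range(1, max_polls + 1):
        if not apic.read_icr() & ICR_BUSY:
            return polls
    return None


def send_ipi(apic, vector, max_polls=1000):
    """Wait for the previous send to drain, then issue the next one."""
    polls = wait_icr_idle(apic, max_polls)
    if polls is None:
        return None  # gave up; nothing was written to the ICR
    apic.write_icr(vector)
    return polls


if __name__ == "__main__":
    apic = FakeApic(busy_reads=3)
    print("polls before send:", send_ipi(apic, 0xf1))
```

In this model a slow previous send only raises the poll count returned by
`send_ipi()`; the new vector is written only once the busy bit has cleared,
which is the "extra polling latency" case rather than a lost request.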

Neither TLB flushes nor reschedule IPIs are idempotent, so if this 
was broken and if we lost requested events on remote CPUs we'd 
notice it rather quickly via TLB flush related hangs or scheduling 
latencies or lost wakeups, on a rather large category of CPUs.

I think Linus's suggestion that it's the zero mask quirk on certain 
older CPUs that is causing problems on that system should be 
examined ... does .31-rc8 work fine?

	Ingo

Thread overview: 13+ messages
2009-08-19 15:49 [BUG] lockup with the latest kernel Steven Rostedt
2009-08-19 15:50 ` Steven Rostedt
2009-08-19 16:18 ` Andrew Morton
2009-08-27 22:41 ` Steven Rostedt
2009-08-27 22:45   ` Steven Rostedt
2009-08-28  2:46   ` Tejun Heo
2009-08-28  2:52     ` Tejun Heo
2009-08-28  6:36       ` Ingo Molnar [this message]
2009-08-28  6:59         ` Tejun Heo
2009-08-28  4:05     ` Linus Torvalds
2009-08-28 16:15       ` Steven Rostedt
2009-08-28 18:33         ` Steven Rostedt
2009-09-09  2:29           ` Steven Rostedt
