From: Jan Kiszka <jan.kiszka@domain.hid>
To: adeos-main@gna.org
Cc: Xenomai-core@domain.hid
Subject: [Xenomai-core] [BUG] IO-APIC stall due to broken fasteoi handling
Date: Thu, 04 Oct 2007 11:14:27 +0200
Message-ID: <4704AEF3.4030105@domain.hid>

Hi all,

after a really long search I'm now quite sure I have found the reason
for the lockups I'm seeing on 2.6.22-i386. I'm still struggling to
understand why this issue is not visible on 2.6.19 and .20 for me, but
maybe it is just far less likely there.

Here is a short write-up of the I-pipe trace I was able to capture, with
some hacking, from a locked-up box:

Scenario: I-pipe active, Xenomai not loaded or compiled out (but loading
Xenomai just increases the probability)

1. IRQ 20 arrives, Linux starts serving it, but no one talks to the
   IO-APIC so far because this is a fasteoi type IRQ.

2. Linux re-enables IRQs because IRQF_DISABLED is not set for IRQ 20.

3. IRQ 23 arrives and gets delivered as it is of higher priority in the
   APIC. From this point on, things start to fall apart.

4. I-pipe stops the delivery in __ipipe_synch_stage because the
   IPIPE_SYNC_FLAG is still set for the root domain. Linux switches back
   to the IRQ 20 handler so that the usual handling order gets inverted
   -- the first I-pipe bug.

5. The IRQ 20 handler completes and sends an EOI to the APIC. Linux
   intends it for IRQ 20, but the APIC applies it to IRQ 23!

6. IRQ 23 is re-enabled and arrives again before its previous event was
   handled. Thus two IRQ 23 events get merged into one, and the EOI is
   only executed once instead of twice. This causes all IRQs < 23 to be
   blocked from now on. :(
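
The EOI mix-up in steps 1 to 6 can be replayed with a toy model. This is
only a sketch under stated assumptions, not the actual APIC or I-pipe
code: `LocalApicModel` and `replay_trace` are hypothetical names, IRQ
numbers stand in for vector priorities, and the only hardware behaviour
modelled is that an EOI write carries no IRQ number, so the local APIC
retires whichever in-service entry has the highest priority.

```python
class LocalApicModel:
    """Toy model of the local APIC in-service state (hypothetical)."""

    def __init__(self):
        # IRQs currently in service; the last element has the highest priority.
        self.in_service = []

    def accept(self, irq):
        # An interrupt is delivered only if it outranks everything in service.
        if self.in_service and irq <= self.in_service[-1]:
            return False
        self.in_service.append(irq)
        return True

    def eoi(self):
        # EOI names no IRQ: it retires the highest-priority in-service entry.
        return self.in_service.pop() if self.in_service else None


def replay_trace():
    apic = LocalApicModel()
    log = []
    apic.accept(20)  # step 1: IRQ 20 arrives, no EOI yet (fasteoi)
    apic.accept(23)  # step 3: IRQ 23 preempts while IRQ 20 is in service
    # steps 4 and 5: IRQ 23 is deferred, the IRQ 20 handler finishes first
    log.append(("EOI from IRQ 20 handler retired", apic.eoi()))
    apic.accept(23)  # step 6: IRQ 23 fires again, events get merged
    log.append(("EOI from IRQ 23 handler retired", apic.eoi()))
    return log, list(apic.in_service)


if __name__ == "__main__":
    log, stuck = replay_trace()
    for entry in log:
        print(entry)
    print("still in service:", stuck)
```

Three interrupts are accepted but only two EOIs are sent, so one
in-service entry is never retired and the model keeps refusing every
interrupt at or below that entry's priority.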

Well, this trace also reveals a second bug that can cause nasty priority
inversion: a high-prio domain is executing when a fasteoi IRQ arrives for
a low-prio domain. This will now block all IRQs until the low-prio domain
has been able to run its IRQ handler completely. Thus we must _mask_
fasteoi IRQs for low-prio domains while high-prio ones are running!
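
One way to picture the masking rule proposed above is the following
sketch. It is an illustration only, not the I-pipe implementation: the
names `FasteoiLine`, `PipelineModel`, `on_fasteoi_irq` and `run_deferred`
are hypothetical, as is the convention that a larger number means a
higher domain priority. The idea shown is to mask the line and send the
EOI right away when the target domain cannot run yet, then handle and
unmask later.

```python
from dataclasses import dataclass, field


@dataclass
class FasteoiLine:
    irq: int
    masked: bool = False
    pending: bool = False


@dataclass
class PipelineModel:
    # Priority of the domain currently running (larger = higher; assumption).
    running_prio: int
    actions: list = field(default_factory=list)

    def on_fasteoi_irq(self, line, target_prio):
        if target_prio < self.running_prio:
            # Target domain must wait: mask the line so it cannot fire
            # again, and EOI at once so the APIC does not stay blocked.
            line.masked = True
            line.pending = True
            self.actions += ["mask", "eoi"]
        else:
            # Target domain may run now: the usual fasteoi flow.
            self.actions += ["handle", "eoi"]

    def run_deferred(self, line):
        # Called once the low-prio domain gets to run its handler.
        if line.pending:
            line.pending = False
            line.masked = False
            self.actions += ["handle", "unmask"]


if __name__ == "__main__":
    pipe = PipelineModel(running_prio=2)
    line = FasteoiLine(irq=20)
    pipe.on_fasteoi_irq(line, target_prio=1)
    pipe.run_deferred(line)
    print(pipe.actions)
```

The point of the ordering is that the EOI no longer waits for the
low-prio handler, while the mask keeps the still-asserted line from
firing again in the meantime.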

These bugs should affect at least x86_64 as well; I'm not sure what the
situation is on powerpc.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

