linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
@ 2008-09-15  8:04 Sebastien Dugue
  2008-09-15 12:17 ` Jan-Bernd Themann
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Sebastien Dugue @ 2008-09-15  8:04 UTC (permalink / raw)
  Cc: tklein, tinytim, Linux-rt, themann, netdev, linux-kernel,
	jean-pierre.dion, linux-ppc, raisch, gilles.carry


WARNING: HACK - HACK - HACK

  Under the RT kernel (with hardirq preemption) the eHEA driver hangs right
after booting. Fiddling with the hardirqs and softirqs priorities allows to
run a bit longer but as soon as the network gets under load, the hang
returns. After investigating, it appears that the driver is loosing interrupts.

  To make a long story short, looking at the code, it appears that the XICS
maps all its interrupts to level sensitive interrupts (I don't know if it's the
reality or if it's due to an incomplete implementation - no datasheets
available to check) and use the fasteoi processing flow.

  When entering the low level handler, level sensitive interrupts are masked,
then eio'd in interrupt context and then unmasked at the end of hardirq
processing.
That's fine as any interrupt comming in-between will still be processed since
the kernel replays those pending interrupts.

  However, it appears that the eHEA interrupts are behaving as edge sensitive
interrupts and are routed through the XICS which process those as level
sensitive using the fasteoi handler __OR__ the XICS loses interrupts when they
are masked.

  Therefore the masking done in the handler causes any interrupt happening while
in the handler to be lost.

  So this patch maps the interrupts being requested through
ibmebus_request_irq() as edge sensitive interrupts (this concerns both the eHEA
and the eHCA - only users of ibmebus_request_irq()) and changes the way edge
interrupts are processed by the fasteoi handler.

  It works for the eHEA, dunno for the eHCA.

  So, unless all the designers of the XICS & eHEA have been shot to keep it
a secret, could someone knowledgeable shed some light on this issue.

  Thanks,

  Sebastien.

Not-Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
---
 arch/powerpc/kernel/ibmebus.c |   11 ++++++++++-
 kernel/irq/chip.c             |    5 +++--
 kernel/irq/manage.c           |    9 ++++++---
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ibmebus.c b/arch/powerpc/kernel/ibmebus.c
index 9971159..5200323 100644
--- a/arch/powerpc/kernel/ibmebus.c
+++ b/arch/powerpc/kernel/ibmebus.c
@@ -41,6 +41,7 @@
 #include <linux/kobject.h>
 #include <linux/dma-mapping.h>
 #include <linux/interrupt.h>
+#include <linux/irq.h>
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <asm/ibmebus.h>
@@ -213,11 +214,19 @@ int ibmebus_request_irq(u32 ist, irq_handler_t handler,
 			void *dev_id)
 {
 	unsigned int irq = irq_create_mapping(NULL, ist);
+	struct irq_desc *desc;
+	int ret;
 
 	if (irq == NO_IRQ)
 		return -EINVAL;
 
-	return request_irq(irq, handler, irq_flags, devname, dev_id);
+	ret = request_irq(irq, handler, irq_flags, devname, dev_id);
+
+	desc = irq_desc + irq;
+	desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
+	desc->status |= IRQ_TYPE_EDGE_RISING;
+
+	return ret;
 }
 EXPORT_SYMBOL(ibmebus_request_irq);
 
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b7b397a..6d366ca 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 	action = desc->action;
 	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
 						 IRQ_DISABLED)))) {
-		desc->status |= IRQ_PENDING;
+		desc->status |= IRQ_PENDING | IRQ_MASKED;
 		if (desc->chip->mask)
 			desc->chip->mask(irq);
 		goto out;
@@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct irq_desc *desc)
 	desc->status |= IRQ_INPROGRESS;
 	/*
 	 * In the threaded case we fall back to a mask+eoi sequence:
+	 * excepted for edge interrupts which are not masked.
 	 */
 	if (redirect_hardirq(desc)) {
-		if (desc->chip->mask)
+		if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
 			desc->chip->mask(irq);
 		goto out;
 	}
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 3bffa20..3e39c71 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
 		thread_simple_irq(desc);
 	else if (desc->handle_irq == handle_level_irq)
 		thread_level_irq(desc);
-	else if (desc->handle_irq == handle_fasteoi_irq)
-		thread_fasteoi_irq(desc);
-	else if (desc->handle_irq == handle_edge_irq)
+	else if (desc->handle_irq == handle_fasteoi_irq) {
+		if (desc->status & IRQ_TYPE_EDGE_BOTH)
+			thread_edge_irq(desc);
+		else
+			thread_fasteoi_irq(desc);
+	} else if (desc->handle_irq == handle_edge_irq)
 		thread_edge_irq(desc);
 	else
 		thread_do_irq(desc);
-- 
1.6.0.1.308.gede4c

^ permalink raw reply related	[flat|nested] 30+ messages in thread
* Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption
@ 2008-09-25 23:40 Milton Miller
  2008-09-26  9:16 ` Sebastien Dugue
  0 siblings, 1 reply; 30+ messages in thread
From: Milton Miller @ 2008-09-25 23:40 UTC (permalink / raw)
  To: Sebastien Dugue, Milton Miller; +Cc: linuxppc-dev, linux-kernel

(I trimmed the cc list for the implementation discussion).
 
> On Wed, 24 Sep 2008 11:42:15 -0500 Milton Miller
> <miltonm@bga.com> wrote:
> 
> > On Sep 24, 2008, at 7:30 AM, Sebastien Dugue wrote:
> > >   Hi Milton,
> > > On Wed, 24 Sep 2008 04:58:22 -0500 (CDT) Milton Miller
> > > <miltonm@bga.com> wrote:
> > >> On Mon Sep 15 at 18:04:06 EST in 2008, Sebastien
> > Dugue wrote: >>> When entering the low level handler,
> > level sensitive interrupts are >>> masked, then eio'd in
> > interrupt context and then unmasked at the >>> end of
> > hardirq processing.  That's fine as any interrupt
> > comming >>> in-between will still be processed since the
> > kernel replays those >>> pending interrupts.
> > >>
> > >> Is this to generate some kind of software managed
> > nesting and priority >> of the hardware level
> > interrupts? >
> > >   No, not really. This is only to be sure to not miss
> > > interrupts coming from the same source that were
> > > received during threaded hardirq  processing.
> > > Some instrumentation showed that it never seems to
> > > happen in the eHEA interrupt case, so I think we can
> > > forget this aspect. 
> > I don't trust "the interrupt can never happen during hea
> > hardirq",  because I think there will be a race between
> > their rearming the next  interrupt and the unmask being
> > called.
> 
>   So do I, it was just to make sure I was not hit by
> another interrupt while handling the previous one and thus
> reduce the number of hypothesis.
> 
>   I sure do not say that it cannot happen, just that that
> path is not taken when I have the eHEA hang.
> 
> > I was trying to understand why the mask and early eoi,
> > but I guess its  to handle other more limited interrupt
> > controllers where the interrupts  stack in hardware
> > instead of software. 
> > >   Also, the problem only manifests with the eHEA RX
> > > interrupt. For  example,
> > > the IBM Power Raid (ipr) SCSI exhibits absolutely no
> > > problem under an  RT
> > > kernel. From this I conclude that:
> > >
> > >   IPR - PCI - XICS is OK
> > >   eHEA - IBMEBUS - XICS is broken with hardirq
> > preemption. >
> > >   I also checked that forcing the eHEA interrupt to
> > > take the non  threaded
> > > path does work.
> > 
> > For a long period of time, XICS dealt only with level
> > interrupts.    First Micro Channel, and later PCI buses.
> >  The IPI is made level by  software conventions. 
> > Recently, EHCA, EHEA, and MSI interrupts were  added
> > which by their nature are edge based.  The logic that
> > converts  those interrupts to the XICS layer is
> > responsible for the resend when  no cpu can accept them,
> > but not to retrigger after an EOI.
> 
>  OK
> 
> > 
> > >   Here is a side by side comparison of the fasteoi
> > > flow with and  without hardirq
> > > threading (sorry it's a bit wide)
> > (removed)
> > >   the non-threaded flow does (in interrupt context):
> > >
> > >     mask
> 
>   Whoops, my bad, in the non threaded case, there's no
> mask at all, only an unmask+eoi at the end, maybe that's
> an oversight!
 
No, not an oversight.  The point is, don't mask/unmask
between ack/eoi while handling the interrupt.  For many
irq controllers, the eoi must be done from the same cpu,
hence the mask and eoi before actually handling the
interrupt in the general case.   Its a feature of xics
that we don't have to play that game, but can do the
cpu and device eoi separately.
 
> > >     handle interrupt
> > >     unmask
> > >     eoi
> > >
> > >   the threaded flow does:
> > >
> > >     mask
> > >     eoi
> > >         handle interrupt
> > >         unmask
> > >
> > >   If I remove the mask() call, then the eHEA is no
> > > longer hanging. 
> > Hmm, I guess I'm confused.  You are saying the irq does
> > not appear if  it occurs while it is masked?
> 
>   Looks like it is, but I cannot say for sure, the only
> observable effect is that I do not get any more interrupts
> coming from the eHEA.
 
(removed features of xics)
 
 
>   That may be, but I'm only looking at the code (read no
> specifications at hand) and it looks like a black box to
> me.
 
"PowerPC External Interrupt Architecture" is defined in
appendix A of "Power.org=99 Standard for 
Power Architecture=99 Platform Requirements
(Workstation, Server)", available to Power.org members.
"The developer-level membership in Power.org is free."
(see www.power.org).
 
That said, it likely won't mention the eHEA in enough
detail to note that the interrupt gets cleared on
unmask.
 
On the other hand, I have actually seen the source
to implementations of the xics logic, so I have a
very good understanding of it (and know of a few
implementation "features", shall we say).
 
 
> > The path lengh for mask and unmask is always VERY slow
> > and single  threaded global lock and single context in
> > xics.  It is designed and  tuned to run at driver
> > startup and shutdown (and adapter reset and  reinitalize
> > during pci error processing), not during normal irq 
> > processing.
> 
>   Now, that is quite interesting then. Those mask() and
> unmask() should then be called shutdown() and startup()
> and not at each interrupt or am I misunderstanding you.
 
Basically, yes.  but linux likes to let drivers mask at
other times, and that is the facility we have.
 
> > The XICS hardware implicitly masks the specific source
> > as part of  interrupt ack (get_irq), and implicitly
> > undoes this mask at eoi.   In  addition, it helps to
> > manage the cpu priority by supplying the previous 
> priority as part of the get_irq process and providing for
> > the priority  to be restored (lowered only) as part of
> > the eoi.  The hardware does  support setting the cpu
> priority independently.
> 
>   This confirms, then, that the mask and unmask methods
> should be empty for the xics.
> 
> > 
> > We should only be using this implicit masking for xics,
> > and not the  explicit masking for any normal interrupt
> > processing.
> 
>   OK
> 
> >  I don't know if 
> > this means making the mask/unmask setting a bit in
> > software,
> 
>   Used by whom? 
 
The thought here was if we can't change the caller, then
maybe we could try to figure out what the caller was
trying to accomplish and defer what was requested based
on context.   Obviously, we are better off changing the
caller.
 
> 
> > and the 
> > enable/disable to actually call what we do now on
> > mask/unmask, or if it  means we need a new flow type on
> real time.
> 
>   Maybe a new flow type is not necessary considering what
> you said.
 
Maybe not, but I think it would be preferred ... we do have
the source to both sides.
 
> > While call to mask and unmask might work on level
> > interrupts, its  really slow and will limit performance
> > if done on every interrupt. 
> > >   the non-threaded flow does (in interrupt context):
> > >
> > >     mask
> 
>   Same Whoops, no mask is done in the non threaded case
I was just copying for context :=3d)
 
> > >     handle interrupt
> > >     unmask
> > >     eoi
> > >
> > >   the threaded flow does:
> > >
> > >     mask
> > >     eoi
> > >         handle interrupt
> > >         unmask
> > 
> > I think the flows we want on xics are:
> > 
> > (non-threaded)
> >     getirq (implicit source specific mask until eoi)
> >     handle interrupt
> >     eoi (implicit cpu priority restore)
> 
>   Yep
> 
> > 
> > (threaded)
> >     getirq (implicit source specific mask until eoi)
> >     explicit cpu priority restore
>         ^
>   How do you go about doing that? Still not clear to me.
 
xics_set_cpu_priority(0xff)
 
of course, there needs to be some kind of 
struct irq_chip method to call it.
 
> >     handle interrupt
> >     eoi (implicit cpu priority restore to same as
> > explicit level) 
> > Where the cpu priority restore allows receiving other
> > interrupts of the  same priority from the hardware.
> > 
> > So I guess the question is can the rt kernel interrupt
> > processing take  advantage of xics auto mask,
> 
>   It should, but even mainline could benefit from it I
> guess.
> 
 
I haven't looked, but are you implying mainline calls
mask and unmask for each interrupt, and not just when
the driver does unmask/mask/request/free?
 
Otherwise, I am not sure mainline is doing anything wrong.
 
> > or does someone need to write state 
> > tracking in the xics code to work around this, changing
> > mask under  interrupt to "defer eoi to unmask" (which I
> > can not see as clean, and  having shutdown problems).
> > 
>   Thanks a lot Milton for those explanations,
> 
>   Sebastien.
 
You're welcome.  I don't have time to work on the rt kernel,
but I'll be glad to help suggest an implementation that
works efficiently.
 
milton

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2008-09-26 12:01 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-15  8:04 [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Sebastien Dugue
2008-09-15 12:17 ` Jan-Bernd Themann
2008-09-15 12:35 ` Thomas Klein
2008-09-15 13:13   ` Sebastien Dugue
2008-09-16 11:59     ` Anton Vorontsov
2008-09-16 12:22       ` Sebastien Dugue
2008-09-18  7:53 ` Christoph Raisch
2008-09-18  9:27   ` Sebastien Dugue
2008-09-18 10:42     ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption, eHCA is close Christoph Raisch
2008-09-18 12:31       ` Sebastien Dugue
2008-09-23 15:43         ` Jan-Bernd Themann
2008-09-24  9:58 ` [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption Milton Miller
2008-09-24 10:17   ` Benjamin Herrenschmidt
2008-09-24 11:02     ` Milton Miller
2008-09-24 21:14       ` Benjamin Herrenschmidt
2008-09-25  7:31         ` Sebastien Dugue
2008-09-24 12:35     ` Sebastien Dugue
2008-09-24 21:15       ` Benjamin Herrenschmidt
2008-09-25  7:18         ` Sebastien Dugue
2008-09-25  7:22           ` Benjamin Herrenschmidt
2008-09-25  7:42             ` Sebastien Dugue
2008-09-25  8:36               ` Benjamin Herrenschmidt
2008-09-25  8:39                 ` Sebastien Dugue
2008-09-24 12:30   ` Sebastien Dugue
2008-09-24 16:42     ` Milton Miller
2008-09-24 21:16       ` Benjamin Herrenschmidt
2008-09-25  3:56         ` Milton Miller
2008-09-25  8:45       ` Sebastien Dugue
  -- strict thread matches above, loose matches on Subject: below --
2008-09-25 23:40 Milton Miller
2008-09-26  9:16 ` Sebastien Dugue

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).