All of lore.kernel.org
 help / color / mirror / Atom feed
* The i386 and x86_64 genirq patches are wrong!
@ 2006-06-13 16:34 Eric W. Biederman
  2006-06-14  7:03 ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2006-06-13 16:34 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton; +Cc: linux-kernel, Thomas Gleixner


A little while ago I worked up some patches to that made
x86 vectors per cpu.  Allowing x86 to handle more than
256 irqs, which significantly cleaned up the code.

The big stumbling block there was msi and it's totally
backwards way of allocating interrupts.  Getting msi
to work with irqs and not vectors is a lot of work,
and very hard to make a clean patchset out of.

Since there has been a lot of irq work, I decided to rebase
my changes on the -mm tree so that I could catch all of
the relevant patches and be in a better shape for merging
when I am done.

The work on msi in the -mm tree while still woefully
short of making it a general piece of code seemed
to be in the right direction.

When I got to io_apic.c I screamed, things had gotten worse.

I currently have a pending bug fix that puts move_irq in
the correct place for edge and level triggered interrupts.

For edge triggered interrupts you must move the irq before
you acknowledge the interrupt, or else it can reoccur.  

For level triggered interrupts you must acknowledge the irq
before you move it, or else the acknowledgement will never
find it's way back to the irq source.

I can no longer implement that bug fix in io_apic.c because
the two interrupt handling paths have been smushed together.

Previously in io_apic.c the hacks that the msi code imposed on
it were clear, and many of them were enclosed in #ifdef CONFIG_PCI_MSI.
Now that ifdefs are gone, and the logic in io_apic.h that
selected between the versions of the irq handlers to use (vector or irq)
has not been updated.

I don't know if the latter is actually a bug.  But it definitely makes
it harder to remove the msi hacks, and io_apic.h should definitely
have been updated.

The fact that you I now can't put move_irq where it needs to be is definitely
a bug.

So could we please get a version the genirq patches that don't simultaneously
convert the x86 code to the new infrastructure and try and clean it up
simultaneously.

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The i386 and x86_64 genirq patches are wrong!
  2006-06-13 16:34 The i386 and x86_64 genirq patches are wrong! Eric W. Biederman
@ 2006-06-14  7:03 ` Ingo Molnar
  2006-06-14  7:55   ` Eric W. Biederman
  2006-06-14 17:31   ` Eric W. Biederman
  0 siblings, 2 replies; 7+ messages in thread
From: Ingo Molnar @ 2006-06-14  7:03 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> A little while ago I worked up some patches to that made x86 vectors 
> per cpu.  Allowing x86 to handle more than 256 irqs, which 
> significantly cleaned up the code.
> 
> The big stumbling block there was msi and it's totally backwards way 
> of allocating interrupts.  Getting msi to work with irqs and not 
> vectors is a lot of work, and very hard to make a clean patchset out 
> of.

yeah. The whole MSI irq remapping business is a total mess anyway: all 
that trouble we do to compensate a +32 mapping offset of vectors vs. 
irqs? Why dont we simply fix up all IRQ entry stubs to have +32, and 
thus we'd standardize on vector metrics and be done with it. In 
/proc/interrupts we could subtract 32 perhaps. Then MSI would be always 
enabled and there would be no MSI #ifdefs anywhere. Am i missing 
something?

> I currently have a pending bug fix that puts move_irq in the correct 
> place for edge and level triggered interrupts.
> 
> For edge triggered interrupts you must move the irq before you 
> acknowledge the interrupt, or else it can reoccur.
>
> For level triggered interrupts you must acknowledge the irq before you 
> move it, or else the acknowledgement will never find it's way back to 
> the irq source.

could you please send that fix to me, against whatever base you have it 
tested on, and i'll merge it to genirq/irqchips [and fix up genirq if 
needed]. Please also include a description of the problem. How common is 
that edge retrigger problem, and how come this has never been seen in 
the past years since we had irqbalance?

> Previously in io_apic.c the hacks that the msi code imposed on it were 
> clear, and many of them were enclosed in #ifdef CONFIG_PCI_MSI. Now 
> that ifdefs are gone, and the logic in io_apic.h that selected between 
> the versions of the irq handlers to use (vector or irq) has not been 
> updated.
> 
> I don't know if the latter is actually a bug.  But it definitely makes 
> it harder to remove the msi hacks, and io_apic.h should definitely 
> have been updated.

here too it's hard for me to give an answer without seeing your specific 
changes (against whatever base is most convenient to you). MSI certainly 
works fine on current -mm. (at least on my box)

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The i386 and x86_64 genirq patches are wrong!
  2006-06-14  7:03 ` Ingo Molnar
@ 2006-06-14  7:55   ` Eric W. Biederman
  2006-06-14 17:31   ` Eric W. Biederman
  1 sibling, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2006-06-14  7:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner

Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> A little while ago I worked up some patches to that made x86 vectors 
>> per cpu.  Allowing x86 to handle more than 256 irqs, which 
>> significantly cleaned up the code.
>> 
>> The big stumbling block there was msi and it's totally backwards way 
>> of allocating interrupts.  Getting msi to work with irqs and not 
>> vectors is a lot of work, and very hard to make a clean patchset out 
>> of.
>
> yeah. The whole MSI irq remapping business is a total mess anyway: all 
> that trouble we do to compensate a +32 mapping offset of vectors vs. 
> irqs? Why dont we simply fix up all IRQ entry stubs to have +32, and 
> thus we'd standardize on vector metrics and be done with it. In 
> /proc/interrupts we could subtract 32 perhaps. Then MSI would be always 
> enabled and there would be no MSI #ifdefs anywhere. Am i missing 
> something?

The problem that got me started was that on big boxes
NR_IOAPICS*NR_IOAPiC_PINS on large boxes is frequently in the 1400+
territory.  we have 1400+ irq sources that we can address
individually. The code to map that into 256 interrupt vectors is a
real maintenance pain, and limits what we can do in the future.  

On x86 hardware interrupts are sent from interrupt sources to a cpu
vector pair (at least above 8 cpus).  If you take advantage of that
you can allow your self to have NR_CPUS*255 irqs all working
simultaneously without sharing.  I have tested a basic version of that
and it works.

In acpi the enumeration of the ioapic source pins is called the gsi
number, and it we want irq numbers to be meaningful our irqs in an
ioapic based system really need to be the gsi.  But that requires
we actually use the dynamic mapping of irq# to vector that we
currently have.   All of which is easy to do and simplifies a
lot of code unless you want to enable msi.

As for msi, I want to introduce two arch specific helper functions:

 extern int create_irq(struct irq_chip *chip, void *handler_data);
 extern void destroy_irq(unsigned int irq);

That understand the architectures rules for irq number assignment
and do whatever is appropriate.  The rest of the code can then work.

>> I currently have a pending bug fix that puts move_irq in the correct 
>> place for edge and level triggered interrupts.
>> 
>> For edge triggered interrupts you must move the irq before you 
>> acknowledge the interrupt, or else it can reoccur.
>>
>> For level triggered interrupts you must acknowledge the irq before you 
>> move it, or else the acknowledgement will never find it's way back to 
>> the irq source.
>
> could you please send that fix to me, against whatever base you have it 
> tested on, and i'll merge it to genirq/irqchips [and fix up genirq if 
> needed]. Please also include a description of the problem. How common is 
> that edge retrigger problem, and how come this has never been seen in 
> the past years since we had irqbalance?

Sure.  I'm in the process of breaking it out from my development work
where it was in a big blob with a lot of other changes.  I was going
to make it against -mm but it looks like I need to do that against something
else.

We haven't seen it because the only symptom currently is that an
interrupt is delivered on the wrong cpu.  Since don't change the vector
number we don't notice and life is fine.

As soon as you start caring which cpu an interrupt comes in on it
shows up fast.  Which is why I caught it.


I will send something better in a little while but here are the
relevant irq handling methods, snipped out of my working version
of io_apic.c:

> /*
>  * Once we have recorded IRQ_PENDING already, we can mask the
>  * interrupt for real. This prevents IRQ storms from unhandled
>  * devices.
>  */
> static void ack_edge_ioapic_irq(unsigned int irq)
> {
> 	irq_desc_t *desc = irq_descp(irq);
> 	int do_unmask_irq = 0;
> 	if (((desc->status & (IRQ_PENDING | IRQ_DISABLED))
> 		    == (IRQ_PENDING | IRQ_DISABLED)) || 
> 					    (desc->status & IRQ_MOVE_PENDING)) {
> 		do_unmask_irq = !(desc->status & (IRQ_DISABLED | IRQ_PENDING));
> 		mask_IO_APIC_irq(irq);
> 	}
> 	move_irq(irq);
> 	ack_APIC_irq();
> 	if (unlikely(do_unmask_irq))
> 		unmask_IO_APIC_irq(irq);
> }
...
> static void end_level_ioapic_irq (unsigned int irq)
> {
> 	int do_unmask_irq = 0;
> 	if (unlikely(irq_descp(irq)->status & IRQ_MOVE_PENDING)) {
> 		do_unmask_irq = 1;
> 		mask_IO_APIC_irq(irq);
> 	}
> 	ack_APIC_irq();
> 	move_irq(irq);
> 	if (unlikely(do_unmask_irq))
> 		unmask_IO_APIC_irq(irq);
> }

Two notes for reading the above code.  In that source tree I
have changed move_irq into a flag IRQ_MOVE_PENDING, and I have
reduce move_irq and move_irq_native to just move_irq.

That at least should clearly show the logic I am working with.

>> Previously in io_apic.c the hacks that the msi code imposed on it were 
>> clear, and many of them were enclosed in #ifdef CONFIG_PCI_MSI. Now 
>> that ifdefs are gone, and the logic in io_apic.h that selected between 
>> the versions of the irq handlers to use (vector or irq) has not been 
>> updated.
>> 
>> I don't know if the latter is actually a bug.  But it definitely makes 
>> it harder to remove the msi hacks, and io_apic.h should definitely 
>> have been updated.
>
> here too it's hard for me to give an answer without seeing your specific 
> changes (against whatever base is most convenient to you). MSI certainly 
> works fine on current -mm. (at least on my box)

So I'm not certain the msi/non-msi case is broken at the moment.  I haven't
looked at it close enough to convince myself either way.  What is clearly
wrong is that the comments now no longer reflect the code, and 
io_apic.h still had defines referring to the previous situation.

>From io_apic.c
> /*
>  * Level and edge triggered IO-APIC interrupts need different handling,
>  * so we use two separate IRQ descriptors. Edge triggered IRQs can be
>  * handled with the level-triggered descriptor, but that one has slightly
>  * more overhead. Level-triggered interrupts cannot be handled with the
>  * edge-triggered handler, without risking IRQ storms and other ugly
>  * races.
>  */
> 

>From io_apic.h
> #ifdef CONFIG_PCI_MSI
> static inline int use_pci_vector(void)	{return 1;}
> static inline void disable_edge_ioapic_vector(unsigned int vector) { }
> static inline void mask_and_ack_level_ioapic_vector(unsigned int vector) { }
> static inline void end_edge_ioapic_vector (unsigned int vector) { }
> #define startup_level_ioapic	startup_level_ioapic_vector
> #define shutdown_level_ioapic	mask_IO_APIC_vector
> #define enable_level_ioapic	unmask_IO_APIC_vector
> #define disable_level_ioapic	mask_IO_APIC_vector
> #define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_vector
> #define end_level_ioapic	end_level_ioapic_vector
> #define set_ioapic_affinity	set_ioapic_affinity_vector
> 
> #define startup_edge_ioapic 	startup_edge_ioapic_vector
> #define shutdown_edge_ioapic 	disable_edge_ioapic_vector
> #define enable_edge_ioapic 	unmask_IO_APIC_vector
> #define disable_edge_ioapic 	disable_edge_ioapic_vector
> #define ack_edge_ioapic 	ack_edge_ioapic_vector
> #define end_edge_ioapic 	end_edge_ioapic_vector
> #else
> static inline int use_pci_vector(void)	{return 0;}
> static inline void disable_edge_ioapic_irq(unsigned int irq) { }
> static inline void mask_and_ack_level_ioapic_irq(unsigned int irq) { }
> static inline void end_edge_ioapic_irq (unsigned int irq) { }
> #define startup_level_ioapic	startup_level_ioapic_irq
> #define shutdown_level_ioapic	mask_IO_APIC_irq
> #define enable_level_ioapic	unmask_IO_APIC_irq
> #define disable_level_ioapic	mask_IO_APIC_irq
> #define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_irq
> #define end_level_ioapic	end_level_ioapic_irq
> #define set_ioapic_affinity	set_ioapic_affinity_irq
> 
> #define startup_edge_ioapic 	startup_edge_ioapic_irq
> #define shutdown_edge_ioapic 	disable_edge_ioapic_irq
> #define enable_edge_ioapic 	unmask_IO_APIC_irq
> #define disable_edge_ioapic 	disable_edge_ioapic_irq
> #define ack_edge_ioapic 	ack_edge_ioapic_irq
> #define end_edge_ioapic 	end_edge_ioapic_irq
> #endif

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The i386 and x86_64 genirq patches are wrong!
  2006-06-14  7:03 ` Ingo Molnar
  2006-06-14  7:55   ` Eric W. Biederman
@ 2006-06-14 17:31   ` Eric W. Biederman
  2006-06-19 21:41     ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2006-06-14 17:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, linux-kernel, Thomas Gleixner

Ingo Molnar <mingo@elte.hu> writes:

> here too it's hard for me to give an answer without seeing your specific 
> changes (against whatever base is most convenient to you). MSI certainly 
> works fine on current -mm. (at least on my box)

Ok.  Looking closer.  I have found a clear functional bug.

When CONFIG_PCI_MSI is not set.
  move_irq expands to move_native_irq.

  ack_ioapic_vector
    move_native_irq
    ack_ioapic_irq
      move_irq
        move_native_irq

  ack_ioapic_quirk_vector
    move_native_irq
    ack_ioapic_quirk_irq
      move_irq
        move_native_irq

So we wind up calling move_native_irq twice when MSI is disabled where
before your conversion we only ever called it once.  Luckily in
the case where we have the double call vector_to_irq is a noop so
we only migration the same irq twice.

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The i386 and x86_64 genirq patches are wrong!
  2006-06-14 17:31   ` Eric W. Biederman
@ 2006-06-19 21:41     ` Andrew Morton
  2006-06-20  5:13       ` Eric W. Biederman
  2006-06-20 22:23       ` Eric W. Biederman
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2006-06-19 21:41 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: mingo, linux-kernel, tglx

ebiederm@xmission.com (Eric W. Biederman) wrote:
>
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > here too it's hard for me to give an answer without seeing your specific 
> > changes (against whatever base is most convenient to you). MSI certainly 
> > works fine on current -mm. (at least on my box)
> 
> Ok.  Looking closer.  I have found a clear functional bug.
> 
> When CONFIG_PCI_MSI is not set.
>   move_irq expands to move_native_irq.
> 
>   ack_ioapic_vector
>     move_native_irq
>     ack_ioapic_irq
>       move_irq
>         move_native_irq
> 
>   ack_ioapic_quirk_vector
>     move_native_irq
>     ack_ioapic_quirk_irq
>       move_irq
>         move_native_irq
> 
> So we wind up calling move_native_irq twice when MSI is disabled where
> before your conversion we only ever called it once.  Luckily in
> the case where we have the double call vector_to_irq is a noop so
> we only migration the same irq twice.
> 

OK, but this doesn't seem to answer Ingo's request "could you please send
that fix to me, against whatever base you have it tested on, and i'll merge
it to genirq/irqchips [and fix up genirq if needed].  Please also include a
description of the problem.  How common is that edge retrigger problem, and
how come this has never been seen in the past years since we had
irqbalance?"

The genirq patches are stuck in limboland until issues like this are
resolved.  I'm not planning on sending them to Linus for 2.6.18 so there's
no huge rush on it, but it would be nice to get all these loose ends tied
off reasonably promptly, please.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The i386 and x86_64 genirq patches are wrong!
  2006-06-19 21:41     ` Andrew Morton
@ 2006-06-20  5:13       ` Eric W. Biederman
  2006-06-20 22:23       ` Eric W. Biederman
  1 sibling, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2006-06-20  5:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel, tglx

Andrew Morton <akpm@osdl.org> writes:

> ebiederm@xmission.com (Eric W. Biederman) wrote:
>>
>> Ingo Molnar <mingo@elte.hu> writes:
>> 
>> > here too it's hard for me to give an answer without seeing your specific 
>> > changes (against whatever base is most convenient to you). MSI certainly 
>> > works fine on current -mm. (at least on my box)
>> 
>> Ok.  Looking closer.  I have found a clear functional bug.
>> 
>> When CONFIG_PCI_MSI is not set.
>>   move_irq expands to move_native_irq.
>> 
>>   ack_ioapic_vector
>>     move_native_irq
>>     ack_ioapic_irq
>>       move_irq
>>         move_native_irq
>> 
>>   ack_ioapic_quirk_vector
>>     move_native_irq
>>     ack_ioapic_quirk_irq
>>       move_irq
>>         move_native_irq
>> 
>> So we wind up calling move_native_irq twice when MSI is disabled where
>> before your conversion we only ever called it once.  Luckily in
>> the case where we have the double call vector_to_irq is a noop so
>> we only migration the same irq twice.
>> 
>
> OK, but this doesn't seem to answer Ingo's request "could you please send
> that fix to me, against whatever base you have it tested on, and i'll merge
> it to genirq/irqchips [and fix up genirq if needed].  Please also include a
> description of the problem.  How common is that edge retrigger problem, and
> how come this has never been seen in the past years since we had
> irqbalance?"
>
> The genirq patches are stuck in limboland until issues like this are
> resolved.  I'm not planning on sending them to Linus for 2.6.18 so there's
> no huge rush on it, but it would be nice to get all these loose ends tied
> off reasonably promptly, please.

Should be in the morning.  The patches are ready I just need to make certain
I copy all of the appropriate people, add signed-off-by lines, etc.
The reason for the delay was that I didn't have a convenient base.  I
had a development effort that as it's final product turned up a bug.

I think everything is in good shape but I don't trust myself until I slept
on it all.

My patches are incremental against 2.6.17-rc6-mm2 as that turned out
to be the easiest place to start.

Eric




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: The i386 and x86_64 genirq patches are wrong!
  2006-06-19 21:41     ` Andrew Morton
  2006-06-20  5:13       ` Eric W. Biederman
@ 2006-06-20 22:23       ` Eric W. Biederman
  1 sibling, 0 replies; 7+ messages in thread
From: Eric W. Biederman @ 2006-06-20 22:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, linux-kernel, tglx

Andrew Morton <akpm@osdl.org> writes:

> ebiederm@xmission.com (Eric W. Biederman) wrote:
>>
>> Ingo Molnar <mingo@elte.hu> writes:
>> 
>> > here too it's hard for me to give an answer without seeing your specific 
>> > changes (against whatever base is most convenient to you). MSI certainly 
>> > works fine on current -mm. (at least on my box)
>> 
>> Ok.  Looking closer.  I have found a clear functional bug.
>> 
>> When CONFIG_PCI_MSI is not set.
>>   move_irq expands to move_native_irq.
>> 
>>   ack_ioapic_vector
>>     move_native_irq
>>     ack_ioapic_irq
>>       move_irq
>>         move_native_irq
>> 
>>   ack_ioapic_quirk_vector
>>     move_native_irq
>>     ack_ioapic_quirk_irq
>>       move_irq
>>         move_native_irq
>> 
>> So we wind up calling move_native_irq twice when MSI is disabled where
>> before your conversion we only ever called it once.  Luckily in
>> the case where we have the double call vector_to_irq is a noop so
>> we only migration the same irq twice.
>> 
>
> OK, but this doesn't seem to answer Ingo's request "could you please send
> that fix to me, against whatever base you have it tested on, and i'll merge
> it to genirq/irqchips [and fix up genirq if needed].  Please also include a
> description of the problem.  How common is that edge retrigger problem, and
> how come this has never been seen in the past years since we had
> irqbalance?"
>
> The genirq patches are stuck in limboland until issues like this are
> resolved.  I'm not planning on sending them to Linus for 2.6.18 so there's
> no huge rush on it, but it would be nice to get all these loose ends tied
> off reasonably promptly, please.

So to wrap thread up cleanly.  Before I start another one.

The bug fix is only important for level triggered irqs if you change
their vector.  Since we don't change their vector right now we don't
see a problem.

The two outstanding issues I have with the genirq patches are:

- On x86_64 irq migration was removed.  The irq balancer there
  is in user space, but we still have it, so not being able
  to bind irqs to anything but cpu 0 is a regression.

- On i386 the CONFIG_PCI_MSI defined was mishandled and we attempt
  to migrate an irq twice.  As I mentioned above.

A new patchset follows shortly that addresses the root cause and
removes the difference in behaviour of io_apic.c present
CONFIG_PCI_MSI  is defined.  

Eric



   


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-06-20 22:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-13 16:34 The i386 and x86_64 genirq patches are wrong! Eric W. Biederman
2006-06-14  7:03 ` Ingo Molnar
2006-06-14  7:55   ` Eric W. Biederman
2006-06-14 17:31   ` Eric W. Biederman
2006-06-19 21:41     ` Andrew Morton
2006-06-20  5:13       ` Eric W. Biederman
2006-06-20 22:23       ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.