public inbox for linux-kernel@vger.kernel.org
* Question about interrupt routing and irq allocation
@ 2008-05-26 22:08 Jeremy Fitzhardinge
  2008-05-27  8:37 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-26 22:08 UTC
  To: Ingo Molnar, Thomas Gleixner; +Cc: Linux Kernel Mailing List, Andi Kleen

I'm working on a pv driver for hvm Xen guests.  That is, when booting 
Linux in a fully-virtualized Xen domain, it can still access the 
underlying Xen device model to get more efficient device access, 
bypassing all the hardware emulation.

Xen implements this by creating a "Xen platform device" on the emulated 
PCI bus, which is a bit like a PCI-Xenbus bridge:  the PCI device driver
which discovers this device can then use it to register a xenbus, which
in turn allows all the xenbus drivers to discover their devices.
This device has an interrupt which is asserted when any Xen event 
channel has a pending event.

Now one way to handle this interrupt is just make it a single irq which 
all xenbus drivers share.  They would then treat the event channel bit 
array like an internal device register to disambiguate who should get 
the interrupt.  That's what the current out of tree drivers do, and it 
works OK.  The main problem is that all the interrupts are mushed 
together, and can't be accounted for separately, given separate 
affinities, etc.  It also means that there's a gratuitous difference 
between the pv-on-hvm and pv-on-pv drivers, even though they're 
functionally identical.

The other approach would be to treat it as some kind of interrupt 
daisy-chain device.  The PCI-xenbus driver gets the interrupt, scans the 
event channels, maps those onto distinct irqs and then (re-)delivers 
them appropriately.  This means that the system would have a mixture of 
PIC, APIC and Xen interrupt sources.  The main problem I see with this 
is how to allocate irqs for the routing of event channels to irqs 
(which, as I understand it, is equivalent to mapping IOAPIC pins to 
local APIC irqs).
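
To make the daisy-chain idea concrete, here is a minimal user-space sketch of the demux step. It is only a model: every demo_* name, the 64-channel bitmap and the 256-entry irq space are assumptions made for illustration, not the Xen or kernel interfaces, and "delivery" is reduced to bumping a per-irq counter.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_NR_EVTCHNS 64   /* assumed number of event channels */
#define DEMO_NR_IRQS    256  /* assumed size of the irq number space */
#define DEMO_WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_WORDS   ((DEMO_NR_EVTCHNS + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

/* Hypothetical state: pending bitmap, port -> irq map, per-irq counters. */
static unsigned long demo_pending[DEMO_NR_WORDS];
static int demo_evtchn_to_irq[DEMO_NR_EVTCHNS];   /* -1 means unbound */
static unsigned int demo_irq_count[DEMO_NR_IRQS];

static void demo_init(void)
{
    unsigned int i;

    for (i = 0; i < DEMO_NR_WORDS; i++)
        demo_pending[i] = 0;
    for (i = 0; i < DEMO_NR_EVTCHNS; i++)
        demo_evtchn_to_irq[i] = -1;
    for (i = 0; i < DEMO_NR_IRQS; i++)
        demo_irq_count[i] = 0;
}

/* Route one event channel to one irq number; returns 0 or -1. */
static int demo_bind(unsigned int port, int irq)
{
    if (port >= DEMO_NR_EVTCHNS || irq < 0 || irq >= DEMO_NR_IRQS)
        return -1;
    demo_evtchn_to_irq[port] = irq;
    return 0;
}

/* Mark an event channel pending, as the hypervisor would. */
static void demo_raise(unsigned int port)
{
    if (port >= DEMO_NR_EVTCHNS)
        return;
    demo_pending[port / DEMO_WORD_BITS] |= 1UL << (port % DEMO_WORD_BITS);
}

/*
 * What the platform device's handler would do: scan the pending
 * bitmap, clear it, and re-deliver each bound channel as its own irq.
 * Unbound channels are simply dropped. Returns the number delivered.
 */
static unsigned int demo_demux(void)
{
    unsigned int word, delivered = 0;

    for (word = 0; word < DEMO_NR_WORDS; word++) {
        unsigned long w = demo_pending[word];
        unsigned int bit;

        demo_pending[word] = 0;
        for (bit = 0; w != 0; bit++, w >>= 1) {
            unsigned int port;
            int irq;

            if (!(w & 1UL))
                continue;
            port = (unsigned int)(word * DEMO_WORD_BITS + bit);
            if (port >= DEMO_NR_EVTCHNS)
                continue;
            irq = demo_evtchn_to_irq[port];
            if (irq < 0)
                continue;
            demo_irq_count[irq]++;
            delivered++;
        }
    }
    return delivered;
}
```

Because each channel lands on its own irq number, the per-irq counters stay separate, which is the accounting property the shared-irq approach lacks.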

Is there some way to allocate irqs reliably, in a way which won't 
conflict with APIC-based interrupt sources?  If I scan the irq_desc 
array looking for entries without any chip, can I claim them and use 
them for my Xen-irq-chip, or will that cause later conflicts?  Should I 
just raise NR_IRQS and start using irqs above 224?

This is not an area I've looked at before, so it's quite likely I'm 
getting details wrong.  Are there any other examples of devices like 
this, either in the x86 world, or in general?

Thanks,
    J

* Re: Question about interrupt routing and irq allocation
  2008-05-26 22:08 Question about interrupt routing and irq allocation Jeremy Fitzhardinge
@ 2008-05-27  8:37 ` Ingo Molnar
  2008-05-27  9:45   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-05-27  8:37 UTC
  To: Jeremy Fitzhardinge
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
	Avi Kivity, H. Peter Anvin, Eric W. Biederman


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> I'm working on a pv driver for hvm Xen guests.  That is, when booting 
> Linux in a fully-virtualized Xen domain, it can still access the 
> underlying Xen device model to get more efficient device access, 
> bypassing all the hardware emulation.
>
> Xen implements this by creating a "Xen platform device" on the 
> emulated PCI bus, which is a bit like a PCI-Xenbus bridge: the PCI
> device driver which discovers this device can then use it to register
> a xenbus, which in turn allows all the xenbus drivers to discover
> their devices.  This device has an interrupt which is asserted when
> any Xen event channel has a pending event.
>
> Now one way to handle this interrupt is just make it a single irq 
> which all xenbus drivers share.  They would then treat the event 
> channel bit array like an internal device register to disambiguate who 
> should get the interrupt.  That's what the current out of tree drivers 
> do, and it works OK.  The main problem is that all the interrupts are 
> mushed together, and can't be accounted for separately, given separate 
> affinities, etc.  It also means that there's a gratuitous difference 
> between the pv-on-hvm and pv-on-pv drivers, even though they're 
> functionally identical.
>
> The other approach would be to treat it as some kind of interrupt 
> daisy-chain device.  The PCI-xenbus driver gets the interrupt, scans 
> the event channels, maps those onto distinct irqs and then 
> (re-)delivers them appropriately.  This means that the system would 
> have a mixture of PIC, APIC and Xen interrupt sources.  The main 
> problem I see with this is how to allocate irqs for the routing of 
> event channels to irqs (which, as I understand it, is equivalent to 
> mapping IOAPIC pins to local APIC irqs).
>
> Is there some way to allocate irqs reliably, in a way which won't 
> conflict with APIC-based interrupt sources?  If I scan the irq_desc 
> array looking for entries without any chip, can I claim them and use 
> them for my Xen-irq-chip, or will that cause later conflicts?  Should 
> I just raise NR_IRQS and start using irqs above 224?
>
> This is not an area I've looked at before, so it's quite likely I'm 
> getting details wrong.  Are there any other examples of devices like 
> this, either in the x86 world, or in general?

hm, in theory the highest quality method would be to do this on the 
genirq level and register your own special "Xen irq-chip" methods. [see 
include/linux/irq.h's "struct irq_chip" and kernel/irq/*.c]

you can use set_irq_chip() to claim a specific irq and set up its 
handling at the highest level. That way you don't have to do anything in
the x86 hw vector space at all and you'd avoid all the overhead and 
complications of x86 irq vectors. You can control how these interrupts 
are named in /proc/interrupts, etc.

but this needs synchronization with all the other entities that claim 
specific irqs and expect to be able to get them. MSI already does that 
to a certain level, see arch_setup_msi_irq() / set_irq_msi(). But that 
wastes x86 vectors and we don't really want to waste them as you don't
actually want to use any separate per irq hw vectoring mechanism for 
these interrupts.

So the most intelligent method would be to reserve the Linux irq itself 
but not the vector, i.e. allocate from irq_cfg[] in 
arch/x86/kernel/io_apic_64.c so that the irq number does not get reused 
- setting irq_cfg[irq].vector to -1 will achieve that.
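
A toy model of that reservation, as a user-space sketch rather than the io_apic_64.c code: the rsv_* names, the table size, the first dynamic irq and the first vector are all made up, and the convention that a vector of 0 means "free" is an assumption of the sketch. The point is only that an entry marked -1 is skipped by later allocations without consuming a hardware vector.

```c
#include <assert.h>

#define RSV_NR_IRQS           32    /* assumed table size */
#define RSV_FIRST_DYNAMIC_IRQ 16    /* assumed: lower irqs are legacy */
#define RSV_FIRST_VECTOR      0x31  /* assumed first usable vector */
#define RSV_NO_VECTOR         (-1)  /* irq number taken, no vector behind it */

/* vector: 0 = free, > 0 = hardware vector, RSV_NO_VECTOR = reserved. */
struct rsv_irq_cfg {
    int vector;
};

static struct rsv_irq_cfg rsv_cfg[RSV_NR_IRQS];
static int rsv_next_vector = RSV_FIRST_VECTOR;

/* Find the lowest dynamic irq number that nobody has claimed. */
static int rsv_find_free_irq(void)
{
    int irq;

    for (irq = RSV_FIRST_DYNAMIC_IRQ; irq < RSV_NR_IRQS; irq++)
        if (rsv_cfg[irq].vector == 0)
            return irq;
    return -1;
}

/* Ordinary allocation: claims an irq number and a hardware vector. */
static int rsv_create_irq(void)
{
    int irq = rsv_find_free_irq();

    if (irq < 0)
        return -1;
    rsv_cfg[irq].vector = rsv_next_vector++;
    return irq;
}

/* Reservation for a software-demuxed source: irq number only. */
static int rsv_reserve_irq_no_vector(void)
{
    int irq = rsv_find_free_irq();

    if (irq < 0)
        return -1;
    rsv_cfg[irq].vector = RSV_NO_VECTOR;
    return irq;
}

/* Count entries that really hold a hardware vector. */
static int rsv_vectors_in_use(void)
{
    int irq, n = 0;

    for (irq = 0; irq < RSV_NR_IRQS; irq++)
        if (rsv_cfg[irq].vector > 0)
            n++;
    return n;
}
```

In this model a reserved irq sits between two ordinary ones without being handed out again, and the vector count only reflects the ordinary ones.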

	Ingo

* Re: Question about interrupt routing and irq allocation
  2008-05-27  8:37 ` Ingo Molnar
@ 2008-05-27  9:45   ` Jeremy Fitzhardinge
  2008-05-27 14:56     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-27  9:45 UTC
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
	Avi Kivity, H. Peter Anvin, Eric W. Biederman, Keir Fraser

Ingo Molnar wrote:
> hm, in theory the highest quality method would be to do this on the 
> genirq level and register your own special "Xen irq-chip" methods. [see 
> include/linux/irq.h's "struct irq_chip" and kernel/irq/*.c]
>   

I already have one of those for pv guests, and I think I can reuse it 
more or less unchanged.

> you can use set_irq_chip() to claim a specific irq and set up its 
> handling at the highest level. That way you don't have to do anything in
> the x86 hw vector space at all and you'd avoid all the overhead and 
> complications of x86 irq vectors. You can control how these interrupts 
> are named in /proc/interrupts, etc.
>   

Yeah, that was my plan.

> but this needs synchronization with all the other entities that claim 
> specific irqs and expect to be able to get them. MSI already does that 
> to a certain level, see arch_setup_msi_irq() / set_irq_msi(). But that 
> wastes x86 vectors and we don't really want to waste them as you don't
> actually want to use any separate per irq hw vectoring mechanism for 
> these interrupts.
>   

OK.  So if I just used create_irq() that would get me an irq I can use, 
but would also end up allocating a vector too.

> So the most intelligent method would be to reserve the Linux irq itself 
> but not the vector, i.e. allocate from irq_cfg[] in 
> arch/x86/kernel/io_apic_64.c so that the irq number does not get reused 
> - setting irq_cfg[irq].vector to -1 will achieve that.
>   

I'm initially targeting 32-bit, though obviously I'd like something that 
works for both 32 and 64 bit.  irq_cfg[] is missing in io_apic_32.c; 
would I achieve the same effect by setting irq_vector[irq] = 0xff or 
something?

Thanks,
    J

* Re: Question about interrupt routing and irq allocation
  2008-05-27  9:45   ` Jeremy Fitzhardinge
@ 2008-05-27 14:56     ` Ingo Molnar
  2008-05-27 16:24       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2008-05-27 14:56 UTC
  To: Jeremy Fitzhardinge
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
	Avi Kivity, H. Peter Anvin, Eric W. Biederman, Keir Fraser


* Jeremy Fitzhardinge <jeremy@goop.org> wrote:

>> So the most intelligent method would be to reserve the Linux irq 
>> itself but not the vector, i.e. allocate from irq_cfg[] in 
>> arch/x86/kernel/io_apic_64.c so that the irq number does not get 
>> reused - setting irq_cfg[irq].vector to -1 will achieve that.
>
> I'm initially targeting 32-bit, though obviously I'd like something 
> that works for both 32 and 64 bit.  irq_cfg[] is missing in 
> io_apic_32.c; would I achieve the same effect by setting 
> irq_vector[irq] = 0xff or something?

ok, here comes the next phase of a rather cunning plan: please unify 
these vector allocators first! ;-)

it's nontrivial but would result in rather nice code. I don't know
whether we want to extend per-CPU vectors to 32-bit as well ... but 
might be worth an attempt and we could give any exploratory patches a 
try in -tip. Eric, what do you think about the general approach?

this would also pave the way towards unified APIC code. Hm?

	Ingo

* Re: Question about interrupt routing and irq allocation
  2008-05-27 14:56     ` Ingo Molnar
@ 2008-05-27 16:24       ` Jeremy Fitzhardinge
  2008-05-28  9:35         ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-27 16:24 UTC
  To: Ingo Molnar
  Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
	Avi Kivity, H. Peter Anvin, Eric W. Biederman, Keir Fraser

Ingo Molnar wrote:
>> I'm initially targeting 32-bit, though obviously I'd like something 
>> that works for both 32 and 64 bit.  irq_cfg[] is missing in 
>> io_apic_32.c; would I achieve the same effect by setting 
>> irq_vector[irq] = 0xff or something?
>>     
>
> ok, here comes the next phase of a rather cunning plan: please unify 
> these vector allocators first! ;-)
>   

Somehow I knew you were going to say that...

> it's nontrivial but would result in rather nice code. I don't know
> whether we want to extend per-CPU vectors to 32-bit as well ... but 
> might be worth an attempt and we could give any exploratory patches a 
> try in -tip. Eric, what do you think about the general approach?
>
> this would also pave the way towards unified APIC code. Hm

All of that sounds very appealing, particularly as the work on xen-dom0 
continues.  But in the meantime I'm just using create_irq(), and I'll 
wear the wasted vector (after all, it will only happen when booting 
under Xen-hvm).

    J

* Re: Question about interrupt routing and irq allocation
  2008-05-27 16:24       ` Jeremy Fitzhardinge
@ 2008-05-28  9:35         ` Eric W. Biederman
  2008-05-28 10:40           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2008-05-28  9:35 UTC
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List,
	Andi Kleen, Avi Kivity, H. Peter Anvin, Keir Fraser

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Ingo Molnar wrote:
>>> I'm initially targeting 32-bit, though obviously I'd like something that
>>> works for both 32 and 64 bit.  irq_cfg[] is missing in io_apic_32.c; would I
>>> achieve the same effect by setting irq_vector[irq] = 0xff or something?
>>>
>>
>> ok, here comes the next phase of a rather cunning plan: please unify these
>> vector allocators first! ;-)
>>
>
> Somehow I knew you were going to say that...
>
>> it's nontrivial but would result in rather nice code. I don't know whether we
>> want to extend per-CPU vectors to 32-bit as well ... but might be worth an
>> attempt and we could give any exploratory patches a try in -tip. Eric, what do
>> you think about the general approach?
>>
>> this would also pave the way towards unified APIC code. Hm
>
> All of that sounds very appealing, particularly as the work on xen-dom0
> continues.  But in the meantime I'm just using create_irq(), and I'll wear the
> wasted vector (after all, it will only happen when booting under Xen-hvm).

- I think using create_irq is a good step.
- I think all vectors are wasted in the case of Xen.
- I think we want an individual irq for each xen irq source.
  Sparc already does a demux in similar circumstances with
  a queue of received MSI messages and a single cpu irq
  that these get demuxed from.
  If we don't have individual irqs per drivers it will be hard
  to share a source base with native drivers.
- I think it would be very nice if we could get irqs allocated
  in request_irq instead of create_irq (and equivalents).
  
- I think ultimately it makes sense to port the per vector
  code to 32bit linux.  On single cpu systems the cost should
  be just a hair more code, but no extra data structures.  We
  can easily restrict the irq allocation to allocating the same
  vector on all cpus for any old machines that prove flaky with
  irq migration.

  The code between the two architectures we kept fairly close
  in sync when I worked on it so a merge should not be a big deal.

  Trouble is I'm not finding a lot of time to work on any of this
  stuff lately :(

Eric

* Re: Question about interrupt routing and irq allocation
  2008-05-28  9:35         ` Eric W. Biederman
@ 2008-05-28 10:40           ` Jeremy Fitzhardinge
  2008-05-28 16:04             ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-05-28 10:40 UTC
  To: Eric W. Biederman
  Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List,
	Andi Kleen, Avi Kivity, H. Peter Anvin, Keir Fraser

Eric W. Biederman wrote:
> - I think using create_irq is a good step.
> - I think all vectors are wasted in the case of Xen.
>   

The case I'm discussing now is in hvm domains - ie, fully virtualized PC 
platform. I'm adding a driver to poke a hole through all the emulated 
hardware to get directly to the underlying Xen layer so that we can run 
paravirtual drivers to get better performance. Only the irqs associated 
with pv drivers will waste their vectors.

> - I think we want an individual irq for each xen irq source.
>   Sparc already does a demux in similar circumstances with
>   a queue of received MSI messages and a single cpu irq
>   that these get demuxed from.
>   If we don't have individual irqs per drivers it will be hard
>   to share a source base with native drivers.
>   

In this case the sharing is between fully paravirtualized paravirt_ops 
Xen and pv-on-hvm drivers. In general I want those drivers to look as 
normal as possible, so they should use irqs in a normal way.

> - I think it would be very nice if we could get irqs allocated
>   in request_irq instead of create_irq (and equivalents).
>   

Something along the lines of passing -1 as the irq, and it would return 
the allocated irq? It's not clear to me how all that would fit together.

> - I think ultimately it makes sense to port the per vector
>   code to 32bit linux.  On single cpu systems the cost should
>   be just a hair more code, but no extra data structures.  We
>   can easily restrict the irq allocation to allocating the same
>   vector on all cpus for any old machines that prove flaky with
>   irq migration.
>
>   The code between the two architectures we kept fairly close
>   in sync when I worked on it so a merge should not be a big deal.

Well, if I find myself at a loose end, I'll have a look at it.

* Re: Question about interrupt routing and irq allocation
  2008-05-28 10:40           ` Jeremy Fitzhardinge
@ 2008-05-28 16:04             ` Eric W. Biederman
  0 siblings, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2008-05-28 16:04 UTC
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List,
	Andi Kleen, Avi Kivity, H. Peter Anvin, Keir Fraser

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> - I think using create_irq is a good step.
>> - I think all vectors are wasted in the case of Xen.
>>
>
> The case I'm discussing now is in hvm domains - ie, fully virtualized PC
> platform. I'm adding a driver to poke a hole through all the emulated hardware
> to get directly to the underlying Xen layer so that we can run paravirtual
> drivers to get better performance. Only the irqs associated with pv drivers will
> waste their vectors.

I see. The fully virtualized machine case.  So we do have apics
visible to us.

>> - I think we want an individual irq for each xen irq source.
>>   Sparc already does a demux in similar circumstances with
>>   a queue of received MSI messages and a single cpu irq
>>   that these get demuxed from.
>>   If we don't have individual irqs per drivers it will be hard
>>   to share a source base with native drivers.
>>
>
> In this case the sharing is between fully paravirtualized paravirt_ops Xen and
> pv-on-hvm drivers. In general I want those drivers to look as normal as
> possible, so they should use irqs in a normal way.

Right.  We should be able to assume that the native irqs for
those devices are not shared, and we should be able to extend
that property (among others) to the virtualized irqs for the
devices.

Under other hypervisors (sparc, ppc) we can run unmodified PCI
drivers; just the OS platform code changes.  How close to that
can we come in the Xen case?

I think running unmodified drivers with the OS platform code doing
the adaption should be the goal, unless there is a real need for
the driver to know about Xen.  Is that compatible with what you
are trying to achieve?

>> - I think it would be very nice if we could get irqs allocated
>>   in request_irq instead of create_irq (and equivalents).
>>
>
> Something along the lines of passing -1 as the irq, and it would return the
> allocated irq? It's not clear to me how all that would fit together.

Groan.  I misspoke.  I meant:
- I think it would be very nice if we could get vectors allocated
  in request_irq instead of in create_irq (and equivalents).

Just delayed vector allocation.  I wasn't after something driver
visible.
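
Roughly what I have in mind, as a user-space toy rather than kernel code. All of the lazy_* names are invented for the sketch, and the needs_vector flag is an extra assumption added to show how a software-demuxed source could end up with no vector at all; the only idea taken from the discussion is that the vector is assigned when a handler is requested, not when the irq number is created.

```c
#include <assert.h>

#define LAZY_NR_IRQS      32    /* assumed table size */
#define LAZY_FIRST_VECTOR 0x31  /* assumed first usable vector */

struct lazy_irq {
    int in_use;        /* irq number handed out by lazy_create_irq() */
    int needs_vector;  /* 0 for a source that is demuxed in software */
    int vector;        /* stays 0 until a handler is requested */
};

static struct lazy_irq lazy_irqs[LAZY_NR_IRQS];
static int lazy_next_vector = LAZY_FIRST_VECTOR;

/* Hand out an irq number only; no vector is touched here. */
static int lazy_create_irq(int needs_vector)
{
    int irq;

    for (irq = 0; irq < LAZY_NR_IRQS; irq++) {
        if (lazy_irqs[irq].in_use)
            continue;
        lazy_irqs[irq].in_use = 1;
        lazy_irqs[irq].needs_vector = needs_vector;
        lazy_irqs[irq].vector = 0;
        return irq;
    }
    return -1;
}

/* The vector is allocated on first request, and only if one is needed. */
static int lazy_request_irq(int irq)
{
    if (irq < 0 || irq >= LAZY_NR_IRQS || !lazy_irqs[irq].in_use)
        return -1;
    if (lazy_irqs[irq].needs_vector && lazy_irqs[irq].vector == 0)
        lazy_irqs[irq].vector = lazy_next_vector++;
    return 0;
}

/* Count irqs that currently hold a hardware vector. */
static int lazy_vectors_in_use(void)
{
    int irq, n = 0;

    for (irq = 0; irq < LAZY_NR_IRQS; irq++)
        if (lazy_irqs[irq].in_use && lazy_irqs[irq].vector > 0)
            n++;
    return n;
}
```

Nothing in the driver-visible calling convention changes in this model; the allocation just moves later.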

>> - I think ultimately it makes sense to port the per vector
>>   code to 32bit linux.  On single cpu systems the cost should
>>   be just a hair more code, but no extra data structures.  We
>>   can easily restrict the irq allocation to allocating the same
>>   vector on all cpus for any old machines that prove flaky with
>>   irq migration.
>>
>>   The code between the two architectures we kept fairly close
>>   in sync when I worked on it so a merge should not be a big deal.
>
> Well, if I find myself at a loose end, I'll have a look at it.

Thanks.

Eric

