* Question about interrupt routing and irq allocation
From: Jeremy Fitzhardinge @ 2008-05-26 22:08 UTC
To: Ingo Molnar, Thomas Gleixner; +Cc: Linux Kernel Mailing List, Andi Kleen
I'm working on a pv driver for hvm Xen guests. That is, when booting
Linux in a fully-virtualized Xen domain, it can still access the
underlying Xen device model to get more efficient device access,
bypassing all the hardware emulation.
Xen implements this by creating a "Xen platform device" on the emulated
PCI bus, which is a bit like a PCI-Xenbus bridge: the PCI device driver
that discovers this device can use it to register a xenbus, which in
turn allows all the xenbus drivers to discover their devices.
This device has an interrupt which is asserted when any Xen event
channel has a pending event.
Now one way to handle this interrupt is to just make it a single irq
which all xenbus drivers share. They would then treat the event channel
bit array like an internal device register to disambiguate who should
get the interrupt. That's what the current out-of-tree drivers do, and it
works OK. The main problem is that all the interrupts are mushed
together, and can't be accounted for separately, given separate
affinities, etc. It also means that there's a gratuitous difference
between the pv-on-hvm and pv-on-pv drivers, even though they're
functionally identical.
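The shared-irq approach can be pictured with a small userspace C model. Everything in it is hypothetical and is not Xen or kernel code: `pending` stands in for the event-channel bit array, and `driver_handle_shared_irq()` plays the role of each driver's handler on the shared line, returning false/true where a real handler would return IRQ_NONE/IRQ_HANDLED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not Xen or kernel code. */
#define NR_EVTCHN 64

/* Stands in for the event-channel pending bit array. */
static uint64_t pending;

/* Counts invocations of the one shared irq: the only accounting available. */
unsigned shared_irq_count;

void evtchn_set_pending(unsigned port)
{
    assert(port < NR_EVTCHN);
    pending |= (uint64_t)1 << port;
}

/* One driver's handler on the shared line: it treats the bit array like a
 * device status register and only claims its own port. */
bool driver_handle_shared_irq(unsigned my_port)
{
    uint64_t mask;

    assert(my_port < NR_EVTCHN);
    mask = (uint64_t)1 << my_port;
    if (!(pending & mask))
        return false;          /* not ours (think IRQ_NONE) */
    pending &= ~mask;          /* consume our event */
    return true;               /* ours (think IRQ_HANDLED) */
}

/* The platform device's interrupt fires: every handler sharing the line is
 * called in turn, whether or not its port is pending. */
int shared_irq_fire(const unsigned *ports, int nr_drivers)
{
    int handled = 0;

    shared_irq_count++;
    for (int i = 0; i < nr_drivers; i++)
        if (driver_handle_shared_irq(ports[i]))
            handled++;
    return handled;
}
```

With drivers on ports {1, 5, 40} and ports 5 and 40 pending, one call to `shared_irq_fire()` handles both events but bumps `shared_irq_count` only once, which is the accounting problem described in the paragraph above.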
The other approach would be to treat it as some kind of interrupt
daisy-chain device. The PCI-xenbus driver gets the interrupt, scans the
event channels, maps those onto distinct irqs and then (re-)delivers
them appropriately. This means that the system would have a mixture of
PIC, APIC and Xen interrupt sources. The main problem I see with this
is how to allocate irqs for the routing of event channels to irqs
(which, as I understand it, is equivalent to mapping IOAPIC pins to
local APIC irqs).
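A similarly hypothetical model of the daisy-chain alternative: the platform device's handler scans the pending bits and re-delivers each bound event channel on its own irq number, so counts can be kept per irq. `evtchn_to_irq[]`, `bind_evtchn()` and `platform_irq_handler()` are illustrative names, not existing kernel or Xen interfaces, and the irq numbers passed to `bind_evtchn()` are simply assumed to be free; allocating them safely is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in Linux or Xen. */
#define NR_EVTCHN 64
#define MODEL_NR_IRQS 32

typedef void (*model_irq_handler)(int irq);

static uint64_t pending;                       /* event-channel pending bits */
static int evtchn_to_irq[NR_EVTCHN];           /* -1 = channel not bound */
static model_irq_handler handlers[MODEL_NR_IRQS];
static unsigned irq_count[MODEL_NR_IRQS];      /* accounting per irq */

void model_init(void)
{
    for (int port = 0; port < NR_EVTCHN; port++)
        evtchn_to_irq[port] = -1;
}

void evtchn_set_pending(unsigned port)
{
    assert(port < NR_EVTCHN);
    pending |= (uint64_t)1 << port;
}

/* Route one event channel to its own irq (assumed already free). */
void bind_evtchn(unsigned port, int irq, model_irq_handler fn)
{
    assert(port < NR_EVTCHN && irq >= 0 && irq < MODEL_NR_IRQS);
    evtchn_to_irq[port] = irq;
    handlers[irq] = fn;
}

/* Handler for the platform device's single PCI interrupt: demultiplex the
 * pending channels and re-deliver each one on its own irq. */
void platform_irq_handler(void)
{
    for (unsigned port = 0; port < NR_EVTCHN; port++) {
        uint64_t bit = (uint64_t)1 << port;
        int irq;

        if (!(pending & bit))
            continue;
        pending &= ~bit;               /* acknowledge the channel */
        irq = evtchn_to_irq[port];
        if (irq < 0)
            continue;                  /* unbound: drop it */
        irq_count[irq]++;
        if (handlers[irq])
            handlers[irq](irq);
    }
}

/* Example handler that records which irq it last saw. */
static int last_irq = -1;
void demo_handler(int irq) { last_irq = irq; }
```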
Is there some way to allocate irqs reliably, in a way which won't
conflict with APIC-based interrupt sources? If I scan the irq_desc
array looking for entries without any chip, can I claim them and use
them for my Xen-irq-chip, or will that cause later conflicts? Should I
just raise NR_IRQS and start using irqs above 224?
This is not an area I've looked at before, so it's quite likely I'm
getting details wrong. Are there any other examples of devices like
this, either in the x86 world, or in general?
Thanks,
J
* Re: Question about interrupt routing and irq allocation
From: Ingo Molnar @ 2008-05-27 8:37 UTC
To: Jeremy Fitzhardinge
Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
Avi Kivity, H. Peter Anvin, Eric W. Biederman
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> I'm working on a pv driver for hvm Xen guests. That is, when booting
> Linux in a fully-virtualized Xen domain, it can still access the
> underlying Xen device model to get more efficient device access,
> bypassing all the hardware emulation.
>
> Xen implements this by creating a "Xen platform device" on the
> emulated PCI bus, which is a bit like a PCI-Xenbus bridge: the PCI
> device driver that discovers this device can use it to register a
> xenbus, which in turn allows all the xenbus drivers to discover
> their devices. This device has an interrupt which is asserted when
> any Xen event channel has a pending event.
>
> Now one way to handle this interrupt is to just make it a single irq
> which all xenbus drivers share. They would then treat the event
> channel bit array like an internal device register to disambiguate who
> should get the interrupt. That's what the current out-of-tree drivers
> do, and it works OK. The main problem is that all the interrupts are
> mushed together, and can't be accounted for separately, given separate
> affinities, etc. It also means that there's a gratuitous difference
> between the pv-on-hvm and pv-on-pv drivers, even though they're
> functionally identical.
>
> The other approach would be to treat it as some kind of interrupt
> daisy-chain device. The PCI-xenbus driver gets the interrupt, scans
> the event channels, maps those onto distinct irqs and then
> (re-)delivers them appropriately. This means that the system would
> have a mixture of PIC, APIC and Xen interrupt sources. The main
> problem I see with this is how to allocate irqs for the routing of
> event channels to irqs (which, as I understand it, is equivalent to
> mapping IOAPIC pins to local APIC irqs).
>
> Is there some way to allocate irqs reliably, in a way which won't
> conflict with APIC-based interrupt sources? If I scan the irq_desc
> array looking for entries without any chip, can I claim them and use
> them for my Xen-irq-chip, or will that cause later conflicts? Should
> I just raise NR_IRQS and start using irqs above 224?
>
> This is not an area I've looked at before, so it's quite likely I'm
> getting details wrong. Are there any other examples of devices like
> this, either in the x86 world, or in general?
hm, in theory the highest quality method would be to do this on the
genirq level and register your own special "Xen irq-chip" methods. [see
include/linux/irq.h's "struct irq_chip" and kernel/irq/*.c]
you can use set_irq_chip() to claim a specific irq and set up its
handling at the highest level. That way you don't have to do anything in
the x86 hw vector space at all and you'd avoid all the overhead and
complications of x86 irq vectors. You can control how these interrupts
are named in /proc/interrupts, etc.
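To make the genirq suggestion concrete, here is a userspace model of a software interrupt controller. `struct model_irq_chip` is only loosely shaped like the kernel's `struct irq_chip` (a name plus mask/unmask callbacks) and is not the real definition; `model_set_irq_chip()` merely plays the role of `set_irq_chip()` by attaching a chip to one irq number and refusing numbers already claimed. The chip name `xen-evtchn-model` is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Loosely shaped like struct irq_chip; NOT the kernel's definition. */
struct model_irq_chip {
    const char *name;               /* shown in the listing below */
    void (*mask)(unsigned irq);
    void (*unmask)(unsigned irq);
};

struct model_irq_desc {
    const struct model_irq_chip *chip;  /* NULL = unclaimed */
    bool masked;
    unsigned count;
};

#define MODEL_NR_IRQS 16
static struct model_irq_desc descs[MODEL_NR_IRQS];

/* A real Xen chip would set or clear the event channel's mask bit;
 * this model only records the state. */
static void evtchn_mask(unsigned irq)   { descs[irq].masked = true; }
static void evtchn_unmask(unsigned irq) { descs[irq].masked = false; }

const struct model_irq_chip xen_model_chip = {
    .name   = "xen-evtchn-model",
    .mask   = evtchn_mask,
    .unmask = evtchn_unmask,
};

/* Plays the role of set_irq_chip(): attach a chip to one irq number,
 * refusing numbers that are out of range or already claimed. */
int model_set_irq_chip(unsigned irq, const struct model_irq_chip *chip)
{
    if (irq >= MODEL_NR_IRQS || descs[irq].chip)
        return -1;
    descs[irq].chip = chip;
    descs[irq].masked = true;       /* start masked until a handler is set up */
    return 0;
}

/* Format one /proc/interrupts-style line for irq. */
int model_show_irq(unsigned irq, char *buf, size_t len)
{
    const struct model_irq_desc *d;

    assert(irq < MODEL_NR_IRQS);
    d = &descs[irq];
    return snprintf(buf, len, "%3u: %10u  %s", irq, d->count,
                    d->chip ? d->chip->name : "none");
}
```

The point of the design is that nothing in this path touches an x86 vector: the chip's callbacks and name are all that the generic layer needs.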
but this needs synchronization with all the other entities that claim
specific irqs and expect to be able to get them. MSI already does that
to a certain level, see arch_setup_msi_irq() / set_irq_msi(). But that
wastes x86 vectors and we don't really want to waste them as you don't
actually want to use any separate per irq hw vectoring mechanism for
these interrupts.
So the most intelligent method would be to reserve the Linux irq itself
but not the vector, i.e. allocate from irq_cfg[] in
arch/x86/kernel/io_apic_64.c so that the irq number does not get reused
- setting irq_cfg[irq].vector to -1 will achieve that.
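The reservation trick can be modelled in a few lines. For this sketch only, a vector of 0 marks an unused slot, -1 marks an irq number reserved without an x86 vector, and anything positive is a real vector; starting the search at irq 16 and handing out vectors from 0x41 upwards are also model assumptions. The real `irq_cfg[]` in io_apic_64.c has more fields and its own conventions. `model_create_irq()` mimics `create_irq()` taking both a number and a vector, while `model_reserve_irq_no_vector()` is the hypothetical variant suggested above.

```c
#include <assert.h>

/* Model conventions for this sketch only. */
#define MODEL_NR_IRQS   32
#define FIRST_DYN_IRQ   16       /* leave 0-15 to legacy devices */
#define VECTOR_FREE     0        /* slot unused */
#define VECTOR_NONE     (-1)     /* irq number reserved, no x86 vector */

struct model_irq_cfg {
    int vector;
};

static struct model_irq_cfg irq_cfg_model[MODEL_NR_IRQS];
static int next_vector = 0x41;   /* arbitrary; a real allocator tracks free vectors */

static int find_free_irq(void)
{
    for (int irq = FIRST_DYN_IRQ; irq < MODEL_NR_IRQS; irq++)
        if (irq_cfg_model[irq].vector == VECTOR_FREE)
            return irq;
    return -1;
}

/* Like create_irq(): a new irq number that also consumes a vector. */
int model_create_irq(void)
{
    int irq = find_free_irq();

    if (irq >= 0)
        irq_cfg_model[irq].vector = next_vector++;
    return irq;
}

/* The suggested variant: mark the number as taken so nobody reuses it,
 * but spend no vector, since a software chip will deliver it. */
int model_reserve_irq_no_vector(void)
{
    int irq = find_free_irq();

    if (irq >= 0)
        irq_cfg_model[irq].vector = VECTOR_NONE;
    return irq;
}

int model_vectors_used(void)
{
    int used = 0;

    for (int irq = 0; irq < MODEL_NR_IRQS; irq++)
        if (irq_cfg_model[irq].vector > 0)
            used++;
    return used;
}
```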
Ingo
* Re: Question about interrupt routing and irq allocation
From: Jeremy Fitzhardinge @ 2008-05-27 9:45 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
Avi Kivity, H. Peter Anvin, Eric W. Biederman, Keir Fraser
Ingo Molnar wrote:
> hm, in theory the highest quality method would be to do this on the
> genirq level and register your own special "Xen irq-chip" methods. [see
> include/linux/irq.h's "struct irq_chip" and kernel/irq/*.c]
>
I already have one of those for pv guests, and I think I can reuse it
more or less unchanged.
> you can use set_irq_chip() to claim a specific irq and set up its
> handling at the highest level. That way you don't have to do anything in
> the x86 hw vector space at all and you'd avoid all the overhead and
> complications of x86 irq vectors. You can control how these interrupts
> are named in /proc/interrupts, etc.
>
Yeah, that was my plan.
> but this needs synchronization with all the other entities that claim
> specific irqs and expect to be able to get them. MSI already does that
> to a certain level, see arch_setup_msi_irq() / set_irq_msi(). But that
> wastes x86 vectors and we don't really want to waste them as you don't
> actually want to use any separate per irq hw vectoring mechanism for
> these interrupts.
>
OK. So if I just used create_irq() that would get me an irq I can use,
but would also end up allocating a vector too.
> So the most intelligent method would be to reserve the Linux irq itself
> but not the vector, i.e. allocate from irq_cfg[] in
> arch/x86/kernel/io_apic_64.c so that the irq number does not get reused
> - setting irq_cfg[irq].vector to -1 will achieve that.
>
I'm initially targeting 32-bit, though obviously I'd like something that
works for both 32 and 64 bit. irq_cfg[] is missing in io_apic_32.c;
would I achieve the same effect by setting irq_vector[irq] = 0xff or
something?
Thanks,
J
* Re: Question about interrupt routing and irq allocation
From: Ingo Molnar @ 2008-05-27 14:56 UTC
To: Jeremy Fitzhardinge
Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
Avi Kivity, H. Peter Anvin, Eric W. Biederman, Keir Fraser
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> So the most intelligent method would be to reserve the Linux irq
>> itself but not the vector, i.e. allocate from irq_cfg[] in
>> arch/x86/kernel/io_apic_64.c so that the irq number does not get
>> reused - setting irq_cfg[irq].vector to -1 will achieve that.
>
> I'm initially targeting 32-bit, though obviously I'd like something
> that works for both 32 and 64 bit. irq_cfg[] is missing in
> io_apic_32.c; would I achieve the same effect by setting
> irq_vector[irq] = 0xff or something?
ok, here comes the next phase of a rather cunning plan: please unify
these vector allocators first! ;-)
it's nontrivial but would result in rather nice code. I don't know
whether we want to extend per-CPU vectors to 32-bit as well ... but
might be worth an attempt and we could give any exploratory patches a
try in -tip. Eric, what do you think about the general approach?
this would also pave the way towards unified APIC code. Hm?
Ingo
* Re: Question about interrupt routing and irq allocation
From: Jeremy Fitzhardinge @ 2008-05-27 16:24 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, Linux Kernel Mailing List, Andi Kleen,
Avi Kivity, H. Peter Anvin, Eric W. Biederman, Keir Fraser
Ingo Molnar wrote:
>> I'm initially targeting 32-bit, though obviously I'd like something
>> that works for both 32 and 64 bit. irq_cfg[] is missing in
>> io_apic_32.c; would I achieve the same effect by setting
>> irq_vector[irq] = 0xff or something?
>>
>
> ok, here comes the next phase of a rather cunning plan: please unify
> these vector allocators first! ;-)
>
Somehow I knew you were going to say that...
> it's nontrivial but would result in rather nice code. I don't know
> whether we want to extend per-CPU vectors to 32-bit as well ... but
> might be worth an attempt and we could give any exploratory patches a
> try in -tip. Eric, what do you think about the general approach?
>
> this would also pave the way towards unified APIC code. Hm?
All of that sounds very appealing, particularly as the work on xen-dom0
continues. But in the meantime I'm just using create_irq(), and I'll
wear the wasted vector (after all, it will only happen when booting
under Xen-hvm).
J
* Re: Question about interrupt routing and irq allocation
From: Eric W. Biederman @ 2008-05-28 9:35 UTC
To: Jeremy Fitzhardinge
Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List,
Andi Kleen, Avi Kivity, H. Peter Anvin, Keir Fraser
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> Ingo Molnar wrote:
>>> I'm initially targeting 32-bit, though obviously I'd like something that
>>> works for both 32 and 64 bit. irq_cfg[] is missing in io_apic_32.c; would I
>>> achieve the same effect by setting irq_vector[irq] = 0xff or something?
>>>
>>
>> ok, here comes the next phase of a rather cunning plan: please unify these
>> vector allocators first! ;-)
>>
>
> Somehow I knew you were going to say that...
>
>> it's nontrivial but would result in rather nice code. I don't know whether we
>> want to extend per-CPU vectors to 32-bit as well ... but might be worth an
>> attempt and we could give any exploratory patches a try in -tip. Eric, what do
>> you think about the general approach?
>>
>> this would also pave the way towards unified APIC code. Hm?
>
> All of that sounds very appealing, particularly as the work on xen-dom0
> continues. But in the meantime I'm just using create_irq(), and I'll wear the
> wasted vector (after all, it will only happen when booting under Xen-hvm).
- I think using create_irq is a good step.
- I think all vectors are wasted in the case of Xen.
- I think we want an individual irq for each xen irq source.
Sparc already does a demux in similar circumstances with
a queue of received MSI messages and a single cpu irq
that these get demuxed from.
If we don't have individual irqs per driver it will be hard
to share a source base with native drivers.
- I think it would be very nice if we could get irqs allocated
in request_irq instead of create_irq (and equivalents).
- I think ultimately it makes sense to port the per-cpu vector
code to 32bit linux. On single cpu systems the cost should
be just a hair more code, but no extra data structures. We
can easily restrict the irq allocation to allocating the same
vector on all cpus for any old machines that prove flaky with
irq migration.
We kept the code for the two architectures fairly close in
sync when I worked on it, so a merge should not be a big deal.
Trouble is I'm not finding a lot of time to work on any of this
stuff lately :(
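Per-CPU vector allocation, including the fallback of using the same vector on all CPUs, can be pictured with a toy model: each CPU gets its own vector-to-irq table, so one vector number can mean different irqs on different CPUs. The CPU count, the vector range 0x31..0x3f and `model_assign_vector()` are illustrative assumptions, not the kernel's actual constants or functions.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS     2
#define MODEL_NR_VECTORS  256
#define FIRST_DEV_VECTOR  0x31   /* model range, not the kernel's constants */
#define LAST_DEV_VECTOR   0x3f

/* vector_irq[cpu][vector] = irq routed there, or -1 if free on that cpu. */
static int vector_irq[MODEL_NR_CPUS][MODEL_NR_VECTORS];

void model_vectors_init(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        for (int v = 0; v < MODEL_NR_VECTORS; v++)
            vector_irq[cpu][v] = -1;
}

static bool vector_free_everywhere(int v)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (vector_irq[cpu][v] != -1)
            return false;
    return true;
}

/* Give irq a vector on cpu. With same_on_all_cpus set (the conservative
 * mode for machines that misbehave under irq migration), only use a vector
 * that is free on every cpu, and claim it on all of them. */
int model_assign_vector(int irq, int cpu, bool same_on_all_cpus)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    for (int v = FIRST_DEV_VECTOR; v <= LAST_DEV_VECTOR; v++) {
        if (!same_on_all_cpus) {
            if (vector_irq[cpu][v] == -1) {
                vector_irq[cpu][v] = irq;
                return v;
            }
        } else if (vector_free_everywhere(v)) {
            for (int c = 0; c < MODEL_NR_CPUS; c++)
                vector_irq[c][v] = irq;
            return v;
        }
    }
    return -1;   /* vector space exhausted */
}
```

On a single-CPU machine the outer dimension collapses to one table, which matches the remark that the cost there is mostly a little more code.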
Eric
* Re: Question about interrupt routing and irq allocation
From: Jeremy Fitzhardinge @ 2008-05-28 10:40 UTC
To: Eric W. Biederman
Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List,
Andi Kleen, Avi Kivity, H. Peter Anvin, Keir Fraser
Eric W. Biederman wrote:
> - I think using create_irq is a good step.
> - I think all vectors are wasted in the case of Xen.
>
The case I'm discussing now is in hvm domains, i.e. a fully virtualized PC
platform. I'm adding a driver to poke a hole through all the emulated
hardware to get directly to the underlying Xen layer so that we can run
paravirtual drivers to get better performance. Only the irqs associated
with pv drivers will waste their vectors.
> - I think we want an individual irq for each xen irq source.
> Sparc already does a demux in similar circumstances with
> a queue of received MSI messages and a single cpu irq
> that these get demuxed from.
> If we don't have individual irqs per driver it will be hard
> to share a source base with native drivers.
>
In this case the sharing is between fully paravirtualized paravirt_ops
Xen and pv-on-hvm drivers. In general I want those drivers to look as
normal as possible, so they should use irqs in a normal way.
> - I think it would be very nice if we could get irqs allocated
> in request_irq instead of create_irq (and equivalents).
>
Something along the lines of passing -1 as the irq, and it would return
the allocated irq? It's not clear to me how all that would fit together.
> - I think ultimately it makes sense to port the per-cpu vector
> code to 32bit linux. On single cpu systems the cost should
> be just a hair more code, but no extra data structures. We
> can easily restrict the irq allocation to allocating the same
> vector on all cpus for any old machines that prove flaky with
> irq migration.
>
> We kept the code for the two architectures fairly close in
> sync when I worked on it, so a merge should not be a big deal.
Well, if I find myself at a loose end, I'll have a look at it.
* Re: Question about interrupt routing and irq allocation
From: Eric W. Biederman @ 2008-05-28 16:04 UTC
To: Jeremy Fitzhardinge
Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List,
Andi Kleen, Avi Kivity, H. Peter Anvin, Keir Fraser
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> Eric W. Biederman wrote:
>> - I think using create_irq is a good step.
>> - I think all vectors are wasted in the case of Xen.
>>
>
> The case I'm discussing now is in hvm domains, i.e. a fully virtualized PC
> platform. I'm adding a driver to poke a hole through all the emulated hardware
> to get directly to the underlying Xen layer so that we can run paravirtual
> drivers to get better performance. Only the irqs associated with pv drivers will
> waste their vectors.
I see. The fully virtualized machine case. So we do have apics
visible to us.
>> - I think we want an individual irq for each xen irq source.
>> Sparc already does a demux in similar circumstances with
>> a queue of received MSI messages and a single cpu irq
>> that these get demuxed from.
>> If we don't have individual irqs per driver it will be hard
>> to share a source base with native drivers.
>>
>
> In this case the sharing is between fully paravirtualized paravirt_ops Xen and
> pv-on-hvm drivers. In general I want those drivers to look as normal as
> possible, so they should use irqs in a normal way.
Right. We should be able to assume that the native irqs for
those devices are not shared, and we should be able to extend
that property (among others) to the virtualized irqs for the
devices.
Under the hypervisors on other architectures (sparc, ppc) we can run
unmodified PCI drivers; only the OS platform code changes. How close
to that can we come in the Xen case?
I think running unmodified drivers with the OS platform code doing
the adaption should be the goal, unless there is a real need for
the driver to know about Xen. Is that compatible with what you
are trying to achieve?
>> - I think it would be very nice if we could get irqs allocated
>> in request_irq instead of create_irq (and equivalents).
>>
>
> Something along the lines of passing -1 as the irq, and it would return the
> allocated irq? It's not clear to me how all that would fit together.
Groan. I misspoke. I meant:
- I think it would be very nice if we could get vectors allocated
in request_irq instead of in create_irq (and equivalents).
Just delayed vector allocation. I wasn't after something driver
visible.
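Delayed vector allocation could look roughly like this hypothetical model: `model_create_irq()` only reserves an irq number, a vector is taken the first time `model_request_irq()` is called for an irq that needs one, and `model_free_irq()` gives it back. The `needs_vector` flag is an invented way to show that a software-only source such as a Xen event channel could skip the vector entirely; none of these are kernel APIs, and the vector range is arbitrary.

```c
#include <assert.h>

#define MODEL_NR_IRQS   32
#define FIRST_DYN_IRQ   16
#define NO_VECTOR       0
#define FIRST_VECTOR    0x31     /* arbitrary model range */
#define LAST_VECTOR     0xee

struct model_irq {
    int in_use;     /* irq number reserved */
    int vector;     /* NO_VECTOR until someone actually requests it */
};

static struct model_irq irqs[MODEL_NR_IRQS];
static unsigned char vector_busy[LAST_VECTOR + 1];

static int alloc_vector(void)
{
    for (int v = FIRST_VECTOR; v <= LAST_VECTOR; v++)
        if (!vector_busy[v]) {
            vector_busy[v] = 1;
            return v;
        }
    return NO_VECTOR;
}

/* Reserve an irq number only; no vector is consumed yet. */
int model_create_irq(void)
{
    for (int irq = FIRST_DYN_IRQ; irq < MODEL_NR_IRQS; irq++)
        if (!irqs[irq].in_use) {
            irqs[irq].in_use = 1;
            irqs[irq].vector = NO_VECTOR;
            return irq;
        }
    return -1;
}

/* The vector is assigned here, when a driver requests the irq. A source
 * delivered purely in software could pass needs_vector = 0. */
int model_request_irq(int irq, int needs_vector)
{
    assert(irq >= 0 && irq < MODEL_NR_IRQS && irqs[irq].in_use);
    if (needs_vector && irqs[irq].vector == NO_VECTOR) {
        irqs[irq].vector = alloc_vector();
        if (irqs[irq].vector == NO_VECTOR)
            return -1;              /* out of vectors */
    }
    return 0;
}

/* Give the vector back but keep the irq number reserved. */
void model_free_irq(int irq)
{
    assert(irq >= 0 && irq < MODEL_NR_IRQS);
    if (irqs[irq].vector != NO_VECTOR) {
        vector_busy[irqs[irq].vector] = 0;
        irqs[irq].vector = NO_VECTOR;
    }
}
```

Because the change is confined to when the vector is taken, drivers would still call the same request/free entry points, consistent with the point that this need not be driver visible.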
>> - I think ultimately it makes sense to port the per-cpu vector
>> code to 32bit linux. On single cpu systems the cost should
>> be just a hair more code, but no extra data structures. We
>> can easily restrict the irq allocation to allocating the same
>> vector on all cpus for any old machines that prove flaky with
>> irq migration.
>>
>> We kept the code for the two architectures fairly close in
>> sync when I worked on it, so a merge should not be a big deal.
>
> Well, if I find myself at a loose end, I'll have a look at it.
Thanks.
Eric