* Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
@ 2008-09-23 20:02 Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2008-09-23 20:02 UTC (permalink / raw)
To: Ingo Molnar, Eric W. Biederman, Thomas Gleixner; +Cc: Linux Kernel Mailing List
Hi,

I'm reworking Xen's interrupt handling to isolate it a bit from the
workings of the apic-based code, as Eric suggested a while back.

As I've mentioned before, Xen represents interrupts as event channels.
There are two major classes of event channels: per-cpu and, erm, not
percpu. Per-cpu event channels are for things like timers and IPI
function calls which are inherently per-cpu; it's meaningless to
consider, for example, migrating them from cpu to cpu. I guess they're
analogous to the local apic vectors.

(Non-percpu event channels can be bound to a particular cpu, and rebound
at will; I'm not worried about them here.)

Previously I allocated an irq per percpu event channel per cpu. This
was pretty wasteful, since I need about 5-6 of them per cpu, so the
number of interrupts increases quite quickly as the number of cpus
does. There's no deep problem with that, but it gets fairly ugly in
/proc/interrupts, and there are some tricky corners to manage in
suspend/resume.

This time around I'm allocating a single irq for each percpu interrupt
source (so one for timers, one for IPI, etc), and mapping each cpu's
event channel to it. But I'm wondering what the correct behaviour of
irq_chip->mask/unmask should be in this case. Each event channel is
individually maskable, so when ->mask gets called, I can either mask all
the event channels associated with that irq, or just the one for this
cpu. The latter makes most sense to me, but I don't understand the irq
infrastructure well enough to know whether it will have bad effects
globally.

When I request the irq, I pass IRQF_PERCPU in the flags, but aside from
preventing migration, this only seems to have an effect on __do_IRQ(),
which looks like a legacy path anyway. It seems to me that by setting
it I'm giving the interrupt subsystem fair warning that ->mask() is
only going to disable the interrupt on this cpu.

Are there any other ill-effects of sharing an irq across all cpus like
this? I guess there's some risk of contention on the irq_desc lock.
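To make the two options concrete, here is a toy model in plain C - the
structure and function names are hypothetical stand-ins, not the actual
Xen or genirq code. One irq is backed by a per-cpu array of event
channel mask bits, and ->mask can either cut off delivery for the
calling cpu only, or for every cpu bound to the irq:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical model: one irq covers NR_CPUS event channels, one per
 * cpu, and each channel has its own mask bit. */
struct percpu_irq {
    bool masked[NR_CPUS];   /* per-cpu event channel mask bits */
};

/* Option A: ->mask touches only the calling cpu's event channel. */
void mask_this_cpu(struct percpu_irq *irq, int cpu)
{
    irq->masked[cpu] = true;
}

/* Option B: ->mask masks every event channel bound to this irq. */
void mask_all_cpus(struct percpu_irq *irq)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        irq->masked[cpu] = true;
}

/* Delivery check: an event is delivered only if its channel is
 * unmasked on the target cpu. */
bool can_deliver(const struct percpu_irq *irq, int cpu)
{
    return !irq->masked[cpu];
}
```

With option A, a cpu masking its own channel leaves the other cpus'
timers and IPIs flowing, which is presumably the behaviour wanted for
inherently per-cpu sources.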
Thanks,
J
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
From: Ingo Molnar @ 2008-09-24 8:45 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Eric W. Biederman, Thomas Gleixner, Linux Kernel Mailing List
* Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Hi,
>
> I'm reworking Xen's interrupt handling to isolate it a bit from the
> workings of the apic-based code, as Eric suggested a while back.
>
> As I've mentioned before, Xen represents interrupts as event channels.
> There are two major classes of event channels: per-cpu and, erm, not
> percpu. Per-cpu event channels are for things like timers and IPI
> function calls which are inherently per-cpu; it's meaningless to
> consider, for example, migrating them from cpu to cpu. I guess
> they're analogous to the local apic vectors.
>
> (Non-percpu event channels can be bound to a particular cpu, and
> rebound at will; I'm not worried about them here.)
>
> Previously I allocated an irq per percpu event channel per cpu. This
> was pretty wasteful, since I need about 5-6 of them per cpu, so the
> number of interrupts increases quite quickly as cpus does. There's no
> deep problem with that, but it gets fairly ugly in /proc/interrupts,
> and there's some tricky corners to manage in suspend/resume.
>
> This time around I'm allocating a single irq for each percpu interrupt
> source (so one for timers, one for IPI, etc), and mapping each per-cpu
> event channel to each. But I'm wondering what the correct behaviour
> of irq_chip->mask/unmask should be in this case. Each event channel
> is individually maskable, so when ->mask gets called, I can either
> mask all the event channels associated with that irq, or just the one
> for this cpu. The latter makes most sense for me, but I don't quite
> understand the irq infrastructure enough to know if it will have bad
> effects globally.
>
> When I request the irq, I pass IRQF_PERCPU in the flags, but aside
> from preventing migration, this only seems to have an effect on
> __do_IRQ(), which looks like a legacy path anyway. It seems to me
> that by setting it that I'm giving the interrupt subsystem fair
> warning that ->mask() is only going to disable the interrupt on this
> cpu.
>
> Are there any other ill-effects of sharing an irq across all cpus like
> this? I guess there's some risk of contention on the irq_desc lock.
You should be a pretty special case: both the producer and consumer of
those IRQs. So if you change the semantics of ->mask()/->unmask() you'll
only affect your own drivers: you might get irqs even after you
disable_irq_nosync(). [but the genirq layer won't pass them down]

The genirq layer should be robust enough all across - as stray IRQs are
commonplace on real hw anyway. Sometimes we have ->unmask() methods that
opportunistically do not unmask the hw itself (but hope for an irq to
not occur) - edge handlers for example. And you probably won't use
disable_irq_nosync() anyway, you just want genirq to prevent irq handler
self-reentry, right?

So i _think_ in theory with your scheme you should get enough
concurrency and no arbitrary limitations/serialization/etc. - but you
should check whether Miss Practice agrees with that ;)
Ingo
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
2008-09-24 8:45 ` Ingo Molnar
@ 2008-09-24 9:54 ` Eric W. Biederman
2008-09-24 10:18 ` Ingo Molnar
2008-09-24 18:33 ` Jeremy Fitzhardinge
0 siblings, 2 replies; 8+ messages in thread
From: Eric W. Biederman @ 2008-09-24 9:54 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jeremy Fitzhardinge, Thomas Gleixner, Linux Kernel Mailing List
Ingo Molnar <mingo@elte.hu> writes:
> * Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>
>> Hi,
>>
>> I'm reworking Xen's interrupt handling to isolate it a bit from the
>> workings of the apic-based code, as Eric suggested a while back.
>>
>> As I've mentioned before, Xen represents interrupts as event channels.
>> There are two major classes of event channels: per-cpu and, erm, not
>> percpu. Per-cpu event channels are for things like timers and IPI
>> function calls which are inherently per-cpu; it's meaningless to
>> consider, for example, migrating them from cpu to cpu. I guess
>> they're analogous to the local apic vectors.
>>
>> (Non-percpu event channels can be bound to a particular cpu, and
>> rebound at will; I'm not worried about them here.)
>>
>> Previously I allocated an irq per percpu event channel per cpu. This
>> was pretty wasteful, since I need about 5-6 of them per cpu, so the
>> number of interrupts increases quite quickly as cpus does. There's no
>> deep problem with that, but it gets fairly ugly in /proc/interrupts,
>> and there's some tricky corners to manage in suspend/resume.
Every high performance device wants one irq per cpu. So if it gets
ugly in /proc/interrupts we should look at fixing /proc/interrupts.

It looked like in Xen each of those interrupts was delivered to a
different event channel. Did I misread that code?

I really hate the notion of sharing a single irq_desc across multiple
cpus as a preferred mode of operation. As NUMA comes into play it
guarantees we will have cross cpu memory fetches on a fast path for
irq handling.

Other than the beautiful way we print things in /proc/interrupts,
IRQ_PER_CPU feels like a really bad idea. Especially in that it
enshrines the nasty per cpu irq counters that scale horribly.
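The counter problem Eric alludes to can be illustrated with a small
sketch - a hypothetical layout, not the real kstat code. Each cpu owns
a private row of per-irq counters, so the increment on the interrupt
fast path is cpu-local and cheap, but the table costs NR_CPUS * NR_IRQS
slots however sparse it is, and any global read (as /proc/interrupts
does) must walk a column across every cpu:

```c
#include <assert.h>

#define NR_CPUS 4
#define NR_IRQS 16

/* Hypothetical kstat-style layout: every cpu keeps a private slot for
 * every irq, even for irqs that never fire on that cpu. */
static unsigned long irq_count[NR_CPUS][NR_IRQS];

/* Fast path: a cpu-local write, no cacheline bouncing between cpus. */
void note_irq(int cpu, int irq)
{
    irq_count[cpu][irq]++;
}

/* Read side: summing one irq's total forces reads of every cpu's
 * cacheline - this is what a /proc/interrupts-style report pays. */
unsigned long total_for_irq(int irq)
{
    unsigned long sum = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += irq_count[cpu][irq];
    return sum;
}
```

The write side scales, but the storage and the read side grow with the
product of cpus and irqs, which is the scaling complaint being made.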
Eric
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
From: Ingo Molnar @ 2008-09-24 10:18 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Jeremy Fitzhardinge, Thomas Gleixner, Linux Kernel Mailing List
* Eric W. Biederman <ebiederm@xmission.com> wrote:
> Other than the beautiful way we print things in /proc/interrupts
> IRQ_PER_CPU feels like a really bad idea. Especially in that it
> enshrines the nasty per cpu irq counters that scale horribly.
ok - i thought the idea was to still have a per cpu irq desc.

so let's fix /proc/interrupts in that model. Only a single entry for
every class of interrupts or so?
Ingo
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
From: Jeremy Fitzhardinge @ 2008-09-24 18:33 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List
Eric W. Biederman wrote:
> I really hate the notion of sharing a single irq_desc across
> multiple cpus as a preferred mode of operation. As NUMA comes
> into play it guarantees we will have cross cpu memory fetches
> on a fast path for irq handling.
>
> Other than the beautiful way we print things in /proc/interrupts
> IRQ_PER_CPU feels like a really bad idea. Especially in that
> it enshrines the nasty per cpu irq counters that scale horribly.
>
I found handle_percpu_irq() which addresses my concerns. It doesn't
attempt to mask the interrupt, takes no locks, and doesn't set or test
IRQ_INPROGRESS in desc->status, so it will scale perfectly across
multiple cpus. It makes no changes to the desc structure, so there
isn't even any cacheline bouncing.
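The flow being described can be sketched as follows - a simplified
stand-in for handle_percpu_irq() with toy structures, not the kernel's
real irq_desc or handler chain: ack the chip if it has an ack hook, run
the action with no lock and no IRQ_INPROGRESS bookkeeping, then eoi if
present.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; the field names only
 * loosely mirror the real ones. */
struct irq_chip {
    void (*ack)(unsigned int irq);
    void (*eoi)(unsigned int irq);
};

struct irq_desc {
    struct irq_chip *chip;
    void (*action)(unsigned int irq);  /* stand-in for the handler chain */
};

/* Sketch of the per-cpu flow: no desc->lock is taken and no
 * IRQ_INPROGRESS flag is tested or set, so each cpu can run its own
 * instance of the handler concurrently and nothing in irq_desc is
 * written on the fast path. */
void handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
{
    if (desc->chip->ack)
        desc->chip->ack(irq);
    desc->action(irq);
    if (desc->chip->eoi)
        desc->chip->eoi(irq);
}

/* Tiny hooks for exercising the flow. */
static int acks, runs;
static void count_ack(unsigned int irq) { (void)irq; acks++; }
static void count_run(unsigned int irq) { (void)irq; runs++; }
```

Because nothing shared is written, concurrent invocations on different
cpus never contend - which is exactly the property wanted here.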
J
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
From: Eric W. Biederman @ 2008-09-24 19:34 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Ingo Molnar, Thomas Gleixner, Linux Kernel Mailing List
Jeremy Fitzhardinge <jeremy@goop.org> writes:
> I found handle_percpu_irq() which addresses my concerns. It doesn't
> attempt to mask the interrupt, takes no locks, and doesn't set or test
> IRQ_INPROGRESS in desc->status, so it will scale perfectly across
> multiple cpus. It makes no changes to the desc structure, so there
> isn't even any cacheline bouncing.
kstat_irqs is arguably part of the irq structure, and kstat_irqs is a
major pain in my book. And for a rare event you have a cacheline read.

I don't think we are quite there yet, but we really want to allocate
irq_desc on the right NUMA node in a multi socket system, to reduce
the cache miss times.

Is it a big deal? Probably not. But I think it would be a bad idea
to increasingly use infrastructure that will make it hard to optimize
the code.

Especially since the common case in high performance drivers is going
to be individually routable irq sources: one queue per cpu and one
irq per queue. Which sounds like the same case you have.
Eric
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
From: Ingo Molnar @ 2008-09-27 19:44 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Jeremy Fitzhardinge, Thomas Gleixner, Linux Kernel Mailing List,
Yinghai Lu
* Eric W. Biederman <ebiederm@xmission.com> wrote:
> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>
> > I found handle_percpu_irq() which addresses my concerns. It doesn't
> > attempt to mask the interrupt, takes no locks, and doesn't set or test
> > IRQ_INPROGRESS in desc->status, so it will scale perfectly across
> > multiple cpus. It makes no changes to the desc structure, so there
> > isn't even any cacheline bouncing.
>
> kstat_irqs. Is arguably part of the irq structure.
> And kstat_irqs is a major pain in my book.
>
> And for a rare event you have a cacheline read.
> I don't think we are quite there yet but we really want to allocate
> irq_desc on the right NUMA node in a multi socket system, to reduce
> the cache miss times.
note that we already do _almost_ that in tip/irq/sparseirq. dyn_array[]
will extend itself in a NUMA-aware fashion. (normal device irq_desc
entries will be allocated via kmalloc)

what would be needed is to deallocate/reallocate irq_desc when the IRQ
affinity is changed? (i.e. when a device is migrated to a specific NUMA
node)
> Is it a big deal? Probably not. But I think it would be a bad idea
> to increasingly use infrastructure that will make it hard to optimize
> the code.
>
> Especially since the common case in high performance drivers is going
> to be, individually routable irq sources. Having one queue per cpu
> and one irq per queue. Which sounds like the same case you have.
agreed - the kstat_irqs cacheline bounce would show up in Xen benchmarks
i'm sure.
Ingo
* Re: Should irq_chip->mask disable percpu interrupts to all cpus, or just to this cpu?
From: Jeremy Fitzhardinge @ 2008-09-28 4:58 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric W. Biederman, Thomas Gleixner, Linux Kernel Mailing List,
Yinghai Lu
Ingo Molnar wrote:
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>
>> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>>
>>
>>> I found handle_percpu_irq() which addresses my concerns. It doesn't
>>> attempt to mask the interrupt, takes no locks, and doesn't set or test
>>> IRQ_INPROGRESS in desc->status, so it will scale perfectly across
>>> multiple cpus. It makes no changes to the desc structure, so there
>>> isn't even any cacheline bouncing.
>>>
>> kstat_irqs. Is arguably part of the irq structure.
>> And kstat_irqs is a major pain in my book.
>>
>> And for a rare event you have a cacheline read.
>> I don't think we are quite there yet but we really want to allocate
>> irq_desc on the right NUMA node in a multi socket system, to reduce
>> the cache miss times.
>>
>
> note that we already do _almost_ that in tip/irq/sparseirq. dyn_array[]
> will extend itself in a NUMA-aware fashion. (normal device irq_desc
> entries will be allocated via kmalloc)
>
> what would be needed is to deallocate/reallocate irq_desc when the IRQ
> affinity is changed? (i.e. when a device is migrated to a specific NUMA
> node)
>
>
>> Is it a big deal? Probably not. But I think it would be a bad idea
>> to increasingly use infrastructure that will make it hard to optimize
>> the code.
>>
>> Especially since the common case in high performance drivers is going
>> to be, individually routable irq sources. Having one queue per cpu
>> and one irq per queue. Which sounds like the same case you have.
>>
>
> agreed - the kstat_irqs cacheline bounce would show up in Xen benchmarks
> i'm sure.
>
I've put that approach aside anyway, since I couldn't get it to work
after a day of fiddling and I didn't want to waste too much time on it.
I've just restricted myself to avoiding the normal interrupt delivery
path, and going direct from event channel to irq to desc->handler.
J