Subject: HPET Stack overflow
From: Andrew Cooper @ 2013-09-30 17:00 UTC
To: Xen-devel List, Jan Beulich
Hello,
Now that some of the more urgent regressions found by XenServer testing
have been fixed, I finally have time to get back to this one.
CA-113568 - Dumping interesting state:
local_irq_count was 8
num_hpets_used = 8
HPET[00] idx 0, cpu 3, flags 0x00000001, cpumask 00000008
MSI: HPET 25 vec=41 fixed edge assert phys cpu dest=00000002 mask=1/0/?
HPET[01] idx 1, cpu 4294967295, flags 0x00000000, cpumask 00000000
MSI: HPET 26 vec=c0 fixed edge assert phys cpu dest=00000002 mask=1/0/?
HPET[02] idx 2, cpu 4294967295, flags 0x00000000, cpumask 00000000
MSI: HPET 27 vec=c8 fixed edge assert phys cpu dest=00000002 mask=1/0/?
HPET[03] idx 3, cpu 4294967295, flags 0x00000000, cpumask 00000000
MSI: HPET 28 vec=d0 fixed edge assert phys cpu dest=00000006 mask=1/0/?
HPET[04] idx 4, cpu 4294967295, flags 0x00000000, cpumask 00000000
MSI: HPET 29 vec=21 fixed edge assert phys cpu dest=00000006 mask=1/0/?
HPET[05] idx 5, cpu 2, flags 0x00000001, cpumask 00000004
MSI: HPET 30 vec=29 fixed edge assert phys cpu dest=00000006 mask=1/0/?
HPET[06] idx 6, cpu 4294967295, flags 0x00000000, cpumask 00000000
MSI: HPET 31 vec=31 fixed edge assert phys cpu dest=00000002 mask=1/0/?
HPET[07] idx 7, cpu 1, flags 0x00000001, cpumask 00000002
MSI: HPET 32 vec=39 fixed edge assert phys cpu dest=00000006 mask=1/0/?
This debugging is taken after nmi_shootdown_cpus(), when all other pcpus
are stopped. (In due course I shall get around to submitting the
debugging infrastructure patches, as I suspect others upstream might
find them useful.) This is one of the more interesting traces; on other
occasions, every HPET channel has been seen with cpu set to -1.
local_irq_count proves that we have taken 8 nested interrupts, and the
full stack debugging (not included here for brevity) shows that they
were all HPET interrupts with different vectors.
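To make the failure mode concrete, here is a minimal standalone sketch
(plain C, not Xen code; MAX_NESTED_IRQS and all function names are made
up for illustration) of the kind of nesting guard that a counter like
local_irq_count implies:

#include <stdio.h>
#include <stdlib.h>

#define MAX_NESTED_IRQS 4

static unsigned int irq_nest_depth; /* per-cpu in a real hypervisor */

static void irq_entry(void)
{
    if ( ++irq_nest_depth > MAX_NESTED_IRQS )
    {
        fprintf(stderr, "IRQ nesting too deep (%u): stack overflow imminent\n",
                irq_nest_depth);
        abort();
    }
    /* ... dispatch to the per-vector handler here ... */
}

static void irq_exit(void)
{
    --irq_nest_depth;
}

int main(void)
{
    unsigned int i;

    irq_entry();                /* a normal interrupt ... */
    irq_exit();                 /* ... which unwinds properly */

    for ( i = 0; i < 8; i++ )   /* 8 HPET interrupts nesting without
                                 * unwinding, as in the trace above:
                                 * the guard fires at depth 5 */
        irq_entry();

    return 0;
}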
On this particular hardware, using "maxcpus=4" does work around the
issue, but is not a valid fix. Playing with the position of
ack_APIC_irq() does appear to affect the problem (as suspected), but I
am not convinced it is the correct fix either.
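To be concrete about what "playing with the position" means, here is an
illustrative-only sketch (plain C; the stubs stand in for the real Xen
primitives, and the handler names are made up) of the two placements in
question:

#include <stdio.h>

static void ack_APIC_irq(void)          { puts("EOI -> local APIC"); }
static void handle_hpet_broadcast(void) { puts("HPET broadcast handler"); }
static void local_irq_enable(void)      { puts("sti"); }

static void hpet_interrupt_early_ack(void)
{
    ack_APIC_irq();          /* ISR bit cleared up front... */
    local_irq_enable();      /* ...so another HPET interrupt can nest
                              * on this stack from here onwards. */
    handle_hpet_broadcast();
}

static void hpet_interrupt_late_ack(void)
{
    handle_hpet_broadcast(); /* vector stays in-service throughout,
                              * blocking re-entry at the same or a
                              * lower priority... */
    ack_APIC_irq();          /* ...until the very end. */
}

int main(void)
{
    hpet_interrupt_early_ack();
    hpet_interrupt_late_ack();
    return 0;
}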
Having looked at the implementation, which uses regular irqs and irq
migration, I have to admit to being surprised it even works in the
slightest. For safety reasons, the irq migration code requires an irq to
arrive at the old cpu, at which point state gets updated to point it at
the new cpu. It also means that the vector is essentially random
between 0x21 and 0xef, which plays havoc with the priorities of other
interrupts, including line level interrupts (where a high priority line
level interrupt with its ack sitting on the pending EOI stack can block
a lower priority HPET timer interrupt for an indefinite period of
time). From my reading of the code, a pcpu which calls
hpet_get_channel() and needs to share channels will cause an early HPET
interrupt to occur on the old pcpu, which appears to then go out of its
way to wake up the cpu which should have received the interrupt.
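For reference, the local APIC takes an interrupt's priority class from
the upper nibble of its vector (class = vector >> 4), so the dynamically
allocated vectors in the trace above end up scattered across the whole
priority range. A standalone illustration, using the vectors from the
dump:

#include <stdio.h>

static unsigned int apic_priority_class(unsigned int vector)
{
    return vector >> 4;
}

int main(void)
{
    /* The 8 HPET vectors from the trace above. */
    unsigned int hpet_vectors[] = { 0x41, 0xc0, 0xc8, 0xd0,
                                    0x21, 0x29, 0x31, 0x39 };
    unsigned int i;

    for ( i = 0; i < sizeof(hpet_vectors) / sizeof(hpet_vectors[0]); i++ )
        printf("vec=%02x priority class=%u\n",
               hpet_vectors[i], apic_priority_class(hpet_vectors[i]));

    /* e.g. a level triggered interrupt in class 0xd with its EOI still
     * pending can block an HPET interrupt in class 0x2 indefinitely. */
    return 0;
}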
It occurs to me that a substantially more sane method of doing this is
to install a high priority handler with the hpet handler directly wired
in. This way, hpet_get_channel() becomes vastly simpler and just
involves rewriting the MSI to change the destination. It also allows
for correct use of ack_APIC_irq() (i.e. preventing reentrancy), and frees
up some space in the dynamic range.
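Something along these lines (an assumption about the shape of the
design, not existing code; the vector, the names and the register
layout below are all illustrative, modeled loosely on the x86 MSI
address format):

#include <stdint.h>
#include <stdio.h>

#define HPET_VECTOR          0xf1u       /* hypothetical: above the dynamic range */
#define MSI_ADDR_BASE        0xfee00000u
#define MSI_ADDR_DEST_ID(id) (((id) & 0xffu) << 12)

struct hpet_event_channel { unsigned int idx; };

/* Stand-in for a write to the channel's Tn FSB route register. */
static void hpet_write_route(unsigned int idx, uint32_t addr, uint32_t data)
{
    printf("HPET T%u route: addr=%08x data=%08x\n", idx, addr, data);
}

static void hpet_set_destination(struct hpet_event_channel *ch,
                                 unsigned int apic_id)
{
    /* Only the destination changes; the data register keeps the fixed
     * HPET_VECTOR, so no vector allocation or irq migration is needed. */
    hpet_write_route(ch->idx,
                     MSI_ADDR_BASE | MSI_ADDR_DEST_ID(apic_id),
                     HPET_VECTOR);
}

int main(void)
{
    struct hpet_event_channel ch = { .idx = 0 };

    hpet_set_destination(&ch, 3); /* hand channel 0 to the cpu with APIC id 3 */
    return 0;
}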
Jan: as the author of the current code, is there anything I have overlooked?
~Andrew
Subject: Re: HPET Stack overflow
From: Jan Beulich @ 2013-10-01 10:01 UTC
To: Andrew Cooper; +Cc: xen-devel
>>> On 30.09.13 at 19:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On this particular hardware, using "maxcpus=4" does work around the
> issue, but is not a valid fix. Playing with the position of
> ack_APIC_irq() does appear to affect the problem (as suspected), but I
> am not convinced it is the correct fix either.
I'm more and more convinced that this was wrong from the
beginning, and hence should be fixed independent of any
other adjustments for the specific problem of yours.
> Having looked at the implementation, which uses regular irqs and irq
> migration, I have to admit to being surprised it even works in the
> slightest. For safety reasons, the irq migration code requires an irq to
> arrive at the old cpu, at which point state gets updated to point it at
> the new cpu. It also means that the vector is essentially random
> between 0x21 and 0xef, which plays havoc with the priorities of other
> interrupts, including line level interrupts (where a high priority line
> level interrupt with its ack sitting on the pending EOI stack can block
> a lower priority HPET timer interrupt for an indefinite period of
> time). From my reading of the code, a pcpu which calls
> hpet_get_channel() and needs to share channels will cause an early HPET
> interrupt to occur on the old pcpu, which appears to then go out of its
> way to wake up the cpu which should have received the interrupt.
>
> It occurs to me that a substantially more sane method of doing this is
> to install a high priority handler with the hpet handler directly wired
> in. This way, hpet_get_channel() becomes vastly simpler and just
> involves rewriting the MSI to change the destination. It also allows
> for correct use of ack_APIC_irq() (i.e. preventing reentrancy), and frees
> up some space in the dynamic range.
>
> Jan: as the author of the current code, is there anything I have overlooked?
That all sounds quite plausible. And no, I'm not the author of the
current code. I'm just the author of a lot of other adjustments to
the original implementation done by Intel folks, which - as we see
now - still weren't enough to get this into a sane state.
Jan