[RFC] Saving/Restoring the internal state of an interrupt

public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed

* [RFC] Saving/Restoring the internal state of an interrupt
@ 2014-06-16 18:47 Marc Zyngier
  2014-08-04 11:57 ` Christoffer Dall
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2014-06-16 18:47 UTC (permalink / raw)
  To: linux-arm-kernel

Thomas, all,

Since ARM has grown some virtualization extensions, there is a
particular issue I've carefully avoided to address, and which is now
biting me exactly where you think it does.

Let's expose the problem:
- We have a per-CPU timer (the architected timer)
- This timer is shared across virtual machines
- We context-switch the state of the timer on entry/exit of the VM

The state of the timer itself is basically a control word and a
comparator value. Not much. Ah yes, one tiny detail: the state of the
interrupt controller (the GIC) is also part of the state.

This interrupt state is not the pending state, but the internal state
(called the "active" state), indicating if the interrupt is in
progress or not. If the interrupt is active (and configured as a level
interrupt), then the interrupt can't fire again until the guest EOIs
it (there is some special hardware dedicated to this).

So far, I got away with ignoring that state altogether by playing some
tricks in the timer control register (basically masking the interrupt
there). This worked fine until Ian found some interesting guest (QNX)
that falls over if someone masks the interrupt in the timer behind its
back. Fair enough.

So I'm back to square one and I have to context-switch that single
bit.

So far, I can see about three options:

- I can add a pair of exported functions:
  void gicv2_irq_set_active_state(struct irq_data *d, bool act)
  bool gicv2_irq_is_active(struct irq_data *d);

  that directly play with the state. Oh, and for GICv3 as well. And
  whatever comes after. It means that KVM has to know exactly the type
  of interrupt controller we're using (well, it already does), and call
  the right stuff.

- I can add similar function to struct irq_chip:
  void (*irq_set_state)(struct irq_data *d, void *data);
  void (*irq_get_state)(struct irq_data *d, void *data);

  and build a generic API on top of that. That's tempting, but I'm
  really not keen on the "void *data" crap, as it means KVM has to
  know the type of the opaque data anyway. Or we define what is the
  state of an interrupt, and I'm afraid there is as many definitions
  as there are interrupt controllers.

- The third possibility is that I go and duplicate parts of the two GIC
  drivers into KVM just to be able to save/restore these bits. Please
  pass the bucket around.

So far, I've prototyped the first option, but I'm seriously
questioning my own sanity. Any idea/opinion?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC] Saving/Restoring the internal state of an interrupt
  2014-06-16 18:47 [RFC] Saving/Restoring the internal state of an interrupt Marc Zyngier
@ 2014-08-04 11:57 ` Christoffer Dall
  2014-08-04 12:31   ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread
From: Christoffer Dall @ 2014-08-04 11:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote:
> Thomas, all,
> 
> Since ARM has grown some virtualization extensions, there is a
> particular issue I've carefully avoided to address, and which is now
> biting me exactly where you think it does.
> 
> Let's expose the problem:
> - We have a per-CPU timer (the architected timer)
> - This timer is shared across virtual machines
> - We context-switch the state of the timer on entry/exit of the VM
> 
> The state of the timer itself is basically a control word and a
> comparator value. Not much. Ah yes, one tiny detail: the state of the
> interrupt controller (the GIC) is also part of the state.
> 
> This interrupt state is not the pending state, but the internal state
> (called the "active" state), indicating if the interrupt is in
> progress or not. If the interrupt is active (and configured as a level
> interrupt), then the interrupt can't fire again until the guest EOIs
> it (there is some special hardware dedicated to this).

Can you clarify this part.  I was under the impression from reading the
GICv2 specs at least that an active interrupt can become active and
pending (and the pending part of that would cause the interrupt to fire
again).  Specifically, I'm thinking about transition A2 in Figure 3-1 in
the GICv2 spec (ARM IHI 0048B.b).  Am I reading this incorrectly or is
there something special about the timer in this regard?

> 
> So far, I got away with ignoring that state altogether by playing some
> tricks in the timer control register (basically masking the interrupt
> there). This worked fine until Ian found some interesting guest (QNX)
> that falls over if someone masks the interrupt in the timer behind its
> back. Fair enough.
> 
> So I'm back to square one and I have to context-switch that single
> bit.
> 
> So far, I can see about three options:
> 
> - I can add a pair of exported functions:
>   void gicv2_irq_set_active_state(struct irq_data *d, bool act)
>   bool gicv2_irq_is_active(struct irq_data *d);
> 
>   that directly play with the state. Oh, and for GICv3 as well. And
>   whatever comes after. It means that KVM has to know exactly the type
>   of interrupt controller we're using (well, it already does), and call
>   the right stuff.
> 
> - I can add similar function to struct irq_chip:
>   void (*irq_set_state)(struct irq_data *d, void *data);
>   void (*irq_get_state)(struct irq_data *d, void *data);
> 
>   and build a generic API on top of that. That's tempting, but I'm
>   really not keen on the "void *data" crap, as it means KVM has to
>   know the type of the opaque data anyway. Or we define what is the
>   state of an interrupt, and I'm afraid there is as many definitions
>   as there are interrupt controllers.
> 
> - The third possibility is that I go and duplicate parts of the two GIC
>   drivers into KVM just to be able to save/restore these bits. Please
>   pass the bucket around.
> 
> So far, I've prototyped the first option, but I'm seriously
> questioning my own sanity. Any idea/opinion?
> 
KVM and the GIC driver are tightly bound anyhow, so is there anything
particular horrible in the first approach?  (Besides the fact that we're
exposing random bits of information to the entire kernel not intended to
be used by anyone else than KVM for now).

Second approach feels over-engineered to me, unless there are other
needs or real use cases in the near term.

The third option, just feels wrong, we'd have to do lookups in the DT
again to get the base address of the GICC etc. right?

-Christoffer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC] Saving/Restoring the internal state of an interrupt
  2014-08-04 11:57 ` Christoffer Dall
@ 2014-08-04 12:31   ` Marc Zyngier
  2014-08-04 13:26     ` Christoffer Dall
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2014-08-04 12:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote:
>> Thomas, all,
>> 
>> Since ARM has grown some virtualization extensions, there is a
>> particular issue I've carefully avoided to address, and which is now
>> biting me exactly where you think it does.
>> 
>> Let's expose the problem:
>> - We have a per-CPU timer (the architected timer)
>> - This timer is shared across virtual machines
>> - We context-switch the state of the timer on entry/exit of the VM
>> 
>> The state of the timer itself is basically a control word and a
>> comparator value. Not much. Ah yes, one tiny detail: the state of the
>> interrupt controller (the GIC) is also part of the state.
>> 
>> This interrupt state is not the pending state, but the internal state
>> (called the "active" state), indicating if the interrupt is in
>> progress or not. If the interrupt is active (and configured as a level
>> interrupt), then the interrupt can't fire again until the guest EOIs
>> it (there is some special hardware dedicated to this).
>
> Can you clarify this part.  I was under the impression from reading the
> GICv2 specs at least that an active interrupt can become active and
> pending (and the pending part of that would cause the interrupt to fire
> again).  Specifically, I'm thinking about transition A2 in Figure 3-1 in
> the GICv2 spec (ARM IHI 0048B.b).  Am I reading this incorrectly or is
> there something special about the timer in this regard?

Nothing special about the timer. Just that the timer being a level
interrupt, the pending state is a direct consequence of the level being
high (as mentionned in the note at the bottom of 1.4.2).

You go from Pending to Active-and-Pending, and then to Active (when the
timer is reprogrammed/disabled, though you may go back to A&P if the timer
fires again immediately). Eventually, after the EOI, you go back to
Inactive.

Here, the only bit we really care about is the Active bit (Pending
changes anyway, as we save/restore the timer context).

Or is there something that I don't understand in your question?

>> 
>> So far, I got away with ignoring that state altogether by playing some
>> tricks in the timer control register (basically masking the interrupt
>> there). This worked fine until Ian found some interesting guest (QNX)
>> that falls over if someone masks the interrupt in the timer behind its
>> back. Fair enough.
>> 
>> So I'm back to square one and I have to context-switch that single
>> bit.
>> 
>> So far, I can see about three options:
>> 
>> - I can add a pair of exported functions:
>>   void gicv2_irq_set_active_state(struct irq_data *d, bool act)
>>   bool gicv2_irq_is_active(struct irq_data *d);
>> 
>>   that directly play with the state. Oh, and for GICv3 as well. And
>>   whatever comes after. It means that KVM has to know exactly the type
>>   of interrupt controller we're using (well, it already does), and call
>>   the right stuff.
>> 
>> - I can add similar function to struct irq_chip:
>>   void (*irq_set_state)(struct irq_data *d, void *data);
>>   void (*irq_get_state)(struct irq_data *d, void *data);
>> 
>>   and build a generic API on top of that. That's tempting, but I'm
>>   really not keen on the "void *data" crap, as it means KVM has to
>>   know the type of the opaque data anyway. Or we define what is the
>>   state of an interrupt, and I'm afraid there is as many definitions
>>   as there are interrupt controllers.
>> 
>> - The third possibility is that I go and duplicate parts of the two GIC
>>   drivers into KVM just to be able to save/restore these bits. Please
>>   pass the bucket around.
>> 
>> So far, I've prototyped the first option, but I'm seriously
>> questioning my own sanity. Any idea/opinion?
>> 
> KVM and the GIC driver are tightly bound anyhow, so is there anything
> particular horrible in the first approach?  (Besides the fact that we're
> exposing random bits of information to the entire kernel not intended to
> be used by anyone else than KVM for now).
>
> Second approach feels over-engineered to me, unless there are other
> needs or real use cases in the near term.

Well, I've decided on option 2 so far, as I feel we'd better have
something that is more generic then just a bunch of driver-specific
hacks. Other architectures may have similar requirements (we just
haven't realized it yet).

> The third option, just feels wrong, we'd have to do lookups in the DT
> again to get the base address of the GICC etc. right?

Yup. I don't even want to think about it. ;-)

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC] Saving/Restoring the internal state of an interrupt
  2014-08-04 12:31   ` Marc Zyngier
@ 2014-08-04 13:26     ` Christoffer Dall
  2014-08-04 14:22       ` Marc Zyngier
  0 siblings, 1 reply; 6+ messages in thread
From: Christoffer Dall @ 2014-08-04 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 04, 2014 at 01:31:05PM +0100, Marc Zyngier wrote:
> On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote:
> >> Thomas, all,
> >> 
> >> Since ARM has grown some virtualization extensions, there is a
> >> particular issue I've carefully avoided to address, and which is now
> >> biting me exactly where you think it does.
> >> 
> >> Let's expose the problem:
> >> - We have a per-CPU timer (the architected timer)
> >> - This timer is shared across virtual machines
> >> - We context-switch the state of the timer on entry/exit of the VM
> >> 
> >> The state of the timer itself is basically a control word and a
> >> comparator value. Not much. Ah yes, one tiny detail: the state of the
> >> interrupt controller (the GIC) is also part of the state.
> >> 
> >> This interrupt state is not the pending state, but the internal state
> >> (called the "active" state), indicating if the interrupt is in
> >> progress or not. If the interrupt is active (and configured as a level
> >> interrupt), then the interrupt can't fire again until the guest EOIs
> >> it (there is some special hardware dedicated to this).
> >
> > Can you clarify this part.  I was under the impression from reading the
> > GICv2 specs at least that an active interrupt can become active and
> > pending (and the pending part of that would cause the interrupt to fire
> > again).  Specifically, I'm thinking about transition A2 in Figure 3-1 in
> > the GICv2 spec (ARM IHI 0048B.b).  Am I reading this incorrectly or is
> > there something special about the timer in this regard?
> 
> Nothing special about the timer. Just that the timer being a level
> interrupt, the pending state is a direct consequence of the level being
> high (as mentionned in the note at the bottom of 1.4.2).
> 
> You go from Pending to Active-and-Pending, and then to Active (when the
> timer is reprogrammed/disabled, though you may go back to A&P if the timer
> fires again immediately). Eventually, after the EOI, you go back to
> Inactive.
> 
> Here, the only bit we really care about is the Active bit (Pending
> changes anyway, as we save/restore the timer context).
> 
> Or is there something that I don't understand in your question?
> 

If not using priorities, what difference does it make if we save/restore
the active bit?  The whole point is that a timer that the guest hasn't
reprogrammed/disabled should not raise a hardware interrupt, right?

So what I inferred from your text is that we prevent that from happening
by restoring the active bit.  But that will still cause the hardware to
assert the signal (we will become A&P) and that would raise the
interrupt on the hardware, we would trap again, and so on.  Or?

-Christoffer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC] Saving/Restoring the internal state of an interrupt
  2014-08-04 13:26     ` Christoffer Dall
@ 2014-08-04 14:22       ` Marc Zyngier
  2014-08-04 15:13         ` Christoffer Dall
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Zyngier @ 2014-08-04 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 04 2014 at 02:26:48 PM, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Mon, Aug 04, 2014 at 01:31:05PM +0100, Marc Zyngier wrote:
>> On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote:
>> >> Thomas, all,
>> >> 
>> >> Since ARM has grown some virtualization extensions, there is a
>> >> particular issue I've carefully avoided to address, and which is now
>> >> biting me exactly where you think it does.
>> >> 
>> >> Let's expose the problem:
>> >> - We have a per-CPU timer (the architected timer)
>> >> - This timer is shared across virtual machines
>> >> - We context-switch the state of the timer on entry/exit of the VM
>> >> 
>> >> The state of the timer itself is basically a control word and a
>> >> comparator value. Not much. Ah yes, one tiny detail: the state of the
>> >> interrupt controller (the GIC) is also part of the state.
>> >> 
>> >> This interrupt state is not the pending state, but the internal state
>> >> (called the "active" state), indicating if the interrupt is in
>> >> progress or not. If the interrupt is active (and configured as a level
>> >> interrupt), then the interrupt can't fire again until the guest EOIs
>> >> it (there is some special hardware dedicated to this).
>> >
>> > Can you clarify this part.  I was under the impression from reading the
>> > GICv2 specs at least that an active interrupt can become active and
>> > pending (and the pending part of that would cause the interrupt to fire
>> > again).  Specifically, I'm thinking about transition A2 in Figure 3-1 in
>> > the GICv2 spec (ARM IHI 0048B.b).  Am I reading this incorrectly or is
>> > there something special about the timer in this regard?
>> 
>> Nothing special about the timer. Just that the timer being a level
>> interrupt, the pending state is a direct consequence of the level being
>> high (as mentionned in the note at the bottom of 1.4.2).
>> 
>> You go from Pending to Active-and-Pending, and then to Active (when the
>> timer is reprogrammed/disabled, though you may go back to A&P if the timer
>> fires again immediately). Eventually, after the EOI, you go back to
>> Inactive.
>> 
>> Here, the only bit we really care about is the Active bit (Pending
>> changes anyway, as we save/restore the timer context).
>> 
>> Or is there something that I don't understand in your question?
>> 
>
> If not using priorities, what difference does it make if we save/restore
> the active bit?  The whole point is that a timer that the guest hasn't
> reprogrammed/disabled should not raise a hardware interrupt, right?

It makes a massive difference: the interrupt is active on the host, and
it is what will prevent subsequent (and possibly spurious) timer
interrupts from being taken.

> So what I inferred from your text is that we prevent that from happening
> by restoring the active bit.  But that will still cause the hardware to
> assert the signal (we will become A&P) and that would raise the
> interrupt on the hardware, we would trap again, and so on.  Or?

If the interrupt is already active when you restore the timer state, no
other timer interrupt will be injected. That's the whole point of the
active bit.

It feels like we're talking past each other, or that we're not exatly
talking about the same thing. If that's of any help, can you have a look
at the irq_forwarding patches I've posted a while ago?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC] Saving/Restoring the internal state of an interrupt
  2014-08-04 14:22       ` Marc Zyngier
@ 2014-08-04 15:13         ` Christoffer Dall
  0 siblings, 0 replies; 6+ messages in thread
From: Christoffer Dall @ 2014-08-04 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 04, 2014 at 03:22:16PM +0100, Marc Zyngier wrote:
> On Mon, Aug 04 2014 at 02:26:48 PM, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> > On Mon, Aug 04, 2014 at 01:31:05PM +0100, Marc Zyngier wrote:
> >> On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall
> >> <christoffer.dall@linaro.org> wrote:
> >> > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote:
> >> >> Thomas, all,
> >> >> 
> >> >> Since ARM has grown some virtualization extensions, there is a
> >> >> particular issue I've carefully avoided to address, and which is now
> >> >> biting me exactly where you think it does.
> >> >> 
> >> >> Let's expose the problem:
> >> >> - We have a per-CPU timer (the architected timer)
> >> >> - This timer is shared across virtual machines
> >> >> - We context-switch the state of the timer on entry/exit of the VM
> >> >> 
> >> >> The state of the timer itself is basically a control word and a
> >> >> comparator value. Not much. Ah yes, one tiny detail: the state of the
> >> >> interrupt controller (the GIC) is also part of the state.
> >> >> 
> >> >> This interrupt state is not the pending state, but the internal state
> >> >> (called the "active" state), indicating if the interrupt is in
> >> >> progress or not. If the interrupt is active (and configured as a level
> >> >> interrupt), then the interrupt can't fire again until the guest EOIs
> >> >> it (there is some special hardware dedicated to this).
> >> >
> >> > Can you clarify this part.  I was under the impression from reading the
> >> > GICv2 specs at least that an active interrupt can become active and
> >> > pending (and the pending part of that would cause the interrupt to fire
> >> > again).  Specifically, I'm thinking about transition A2 in Figure 3-1 in
> >> > the GICv2 spec (ARM IHI 0048B.b).  Am I reading this incorrectly or is
> >> > there something special about the timer in this regard?
> >> 
> >> Nothing special about the timer. Just that the timer being a level
> >> interrupt, the pending state is a direct consequence of the level being
> >> high (as mentionned in the note at the bottom of 1.4.2).
> >> 
> >> You go from Pending to Active-and-Pending, and then to Active (when the
> >> timer is reprogrammed/disabled, though you may go back to A&P if the timer
> >> fires again immediately). Eventually, after the EOI, you go back to
> >> Inactive.
> >> 
> >> Here, the only bit we really care about is the Active bit (Pending
> >> changes anyway, as we save/restore the timer context).
> >> 
> >> Or is there something that I don't understand in your question?
> >> 
> >
> > If not using priorities, what difference does it make if we save/restore
> > the active bit?  The whole point is that a timer that the guest hasn't
> > reprogrammed/disabled should not raise a hardware interrupt, right?
> 
> It makes a massive difference: the interrupt is active on the host, and
> it is what will prevent subsequent (and possibly spurious) timer
> interrupts from being taken.
> 
> > So what I inferred from your text is that we prevent that from happening
> > by restoring the active bit.  But that will still cause the hardware to
> > assert the signal (we will become A&P) and that would raise the
> > interrupt on the hardware, we would trap again, and so on.  Or?
> 
> If the interrupt is already active when you restore the timer state, no
> other timer interrupt will be injected. That's the whole point of the
> active bit.
> 
> It feels like we're talking past each other, or that we're not exatly
> talking about the same thing. If that's of any help, can you have a look
> at the irq_forwarding patches I've posted a while ago?
> 
We're not talking past each other, I've simply 'misread' the spec.  I
find it extremely confusing, because Section 3.2.4 talks specifically
about the CPU interface state machine, and Figure 3-1 correspondingly
talks about an interrupt being pending even though it's already active
(A&P), but that sort of more a distributor or input-side state for the
CPU interface.

Anyway, the pseudo code clears it up, and clarifies that it is only
actually pending if it is not active.  This insight let's me make sense
out of your e-mail, and urges me to go check the QEMU emulation soon.

-Christoffer

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-08-04 15:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-16 18:47 [RFC] Saving/Restoring the internal state of an interrupt Marc Zyngier
2014-08-04 11:57 ` Christoffer Dall
2014-08-04 12:31   ` Marc Zyngier
2014-08-04 13:26     ` Christoffer Dall
2014-08-04 14:22       ` Marc Zyngier
2014-08-04 15:13         ` Christoffer Dall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox