* [RFC] Saving/Restoring the internal state of an interrupt @ 2014-06-16 18:47 Marc Zyngier 2014-08-04 11:57 ` Christoffer Dall 0 siblings, 1 reply; 6+ messages in thread From: Marc Zyngier @ 2014-06-16 18:47 UTC (permalink / raw) To: linux-arm-kernel Thomas, all, Since ARM has grown some virtualization extensions, there is a particular issue I've carefully avoided to address, and which is now biting me exactly where you think it does. Let's expose the problem: - We have a per-CPU timer (the architected timer) - This timer is shared across virtual machines - We context-switch the state of the timer on entry/exit of the VM The state of the timer itself is basically a control word and a comparator value. Not much. Ah yes, one tiny detail: the state of the interrupt controller (the GIC) is also part of the state. This interrupt state is not the pending state, but the internal state (called the "active" state), indicating if the interrupt is in progress or not. If the interrupt is active (and configured as a level interrupt), then the interrupt can't fire again until the guest EOIs it (there is some special hardware dedicated to this). So far, I got away with ignoring that state altogether by playing some tricks in the timer control register (basically masking the interrupt there). This worked fine until Ian found some interesting guest (QNX) that falls over if someone masks the interrupt in the timer behind its back. Fair enough. So I'm back to square one and I have to context-switch that single bit. So far, I can see about three options: - I can add a pair of exported functions: void gicv2_irq_set_active_state(struct irq_data *d, bool act) bool gicv2_irq_is_active(struct irq_data *d); that directly play with the state. Oh, and for GICv3 as well. And whatever comes after. It means that KVM has to know exactly the type of interrupt controller we're using (well, it already does), and call the right stuff. - I can add similar function to struct irq_chip: void (*irq_set_state)(struct irq_data *d, void *data); void (*irq_get_state)(struct irq_data *d, void *data); and build a generic API on top of that. That's tempting, but I'm really not keen on the "void *data" crap, as it means KVM has to know the type of the opaque data anyway. Or we define what is the state of an interrupt, and I'm afraid there is as many definitions as there are interrupt controllers. - The third possibility is that I go and duplicate parts of the two GIC drivers into KVM just to be able to save/restore these bits. Please pass the bucket around. So far, I've prototyped the first option, but I'm seriously questioning my own sanity. Any idea/opinion? Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC] Saving/Restoring the internal state of an interrupt 2014-06-16 18:47 [RFC] Saving/Restoring the internal state of an interrupt Marc Zyngier @ 2014-08-04 11:57 ` Christoffer Dall 2014-08-04 12:31 ` Marc Zyngier 0 siblings, 1 reply; 6+ messages in thread From: Christoffer Dall @ 2014-08-04 11:57 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote: > Thomas, all, > > Since ARM has grown some virtualization extensions, there is a > particular issue I've carefully avoided to address, and which is now > biting me exactly where you think it does. > > Let's expose the problem: > - We have a per-CPU timer (the architected timer) > - This timer is shared across virtual machines > - We context-switch the state of the timer on entry/exit of the VM > > The state of the timer itself is basically a control word and a > comparator value. Not much. Ah yes, one tiny detail: the state of the > interrupt controller (the GIC) is also part of the state. > > This interrupt state is not the pending state, but the internal state > (called the "active" state), indicating if the interrupt is in > progress or not. If the interrupt is active (and configured as a level > interrupt), then the interrupt can't fire again until the guest EOIs > it (there is some special hardware dedicated to this). Can you clarify this part. I was under the impression from reading the GICv2 specs at least that an active interrupt can become active and pending (and the pending part of that would cause the interrupt to fire again). Specifically, I'm thinking about transition A2 in Figure 3-1 in the GICv2 spec (ARM IHI 0048B.b). Am I reading this incorrectly or is there something special about the timer in this regard? > > So far, I got away with ignoring that state altogether by playing some > tricks in the timer control register (basically masking the interrupt > there). This worked fine until Ian found some interesting guest (QNX) > that falls over if someone masks the interrupt in the timer behind its > back. Fair enough. > > So I'm back to square one and I have to context-switch that single > bit. > > So far, I can see about three options: > > - I can add a pair of exported functions: > void gicv2_irq_set_active_state(struct irq_data *d, bool act) > bool gicv2_irq_is_active(struct irq_data *d); > > that directly play with the state. Oh, and for GICv3 as well. And > whatever comes after. It means that KVM has to know exactly the type > of interrupt controller we're using (well, it already does), and call > the right stuff. > > - I can add similar function to struct irq_chip: > void (*irq_set_state)(struct irq_data *d, void *data); > void (*irq_get_state)(struct irq_data *d, void *data); > > and build a generic API on top of that. That's tempting, but I'm > really not keen on the "void *data" crap, as it means KVM has to > know the type of the opaque data anyway. Or we define what is the > state of an interrupt, and I'm afraid there is as many definitions > as there are interrupt controllers. > > - The third possibility is that I go and duplicate parts of the two GIC > drivers into KVM just to be able to save/restore these bits. Please > pass the bucket around. > > So far, I've prototyped the first option, but I'm seriously > questioning my own sanity. Any idea/opinion? > KVM and the GIC driver are tightly bound anyhow, so is there anything particular horrible in the first approach? (Besides the fact that we're exposing random bits of information to the entire kernel not intended to be used by anyone else than KVM for now). Second approach feels over-engineered to me, unless there are other needs or real use cases in the near term. The third option, just feels wrong, we'd have to do lookups in the DT again to get the base address of the GICC etc. right? -Christoffer ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC] Saving/Restoring the internal state of an interrupt 2014-08-04 11:57 ` Christoffer Dall @ 2014-08-04 12:31 ` Marc Zyngier 2014-08-04 13:26 ` Christoffer Dall 0 siblings, 1 reply; 6+ messages in thread From: Marc Zyngier @ 2014-08-04 12:31 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall <christoffer.dall@linaro.org> wrote: > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote: >> Thomas, all, >> >> Since ARM has grown some virtualization extensions, there is a >> particular issue I've carefully avoided to address, and which is now >> biting me exactly where you think it does. >> >> Let's expose the problem: >> - We have a per-CPU timer (the architected timer) >> - This timer is shared across virtual machines >> - We context-switch the state of the timer on entry/exit of the VM >> >> The state of the timer itself is basically a control word and a >> comparator value. Not much. Ah yes, one tiny detail: the state of the >> interrupt controller (the GIC) is also part of the state. >> >> This interrupt state is not the pending state, but the internal state >> (called the "active" state), indicating if the interrupt is in >> progress or not. If the interrupt is active (and configured as a level >> interrupt), then the interrupt can't fire again until the guest EOIs >> it (there is some special hardware dedicated to this). > > Can you clarify this part. I was under the impression from reading the > GICv2 specs at least that an active interrupt can become active and > pending (and the pending part of that would cause the interrupt to fire > again). Specifically, I'm thinking about transition A2 in Figure 3-1 in > the GICv2 spec (ARM IHI 0048B.b). Am I reading this incorrectly or is > there something special about the timer in this regard? Nothing special about the timer. Just that the timer being a level interrupt, the pending state is a direct consequence of the level being high (as mentionned in the note at the bottom of 1.4.2). You go from Pending to Active-and-Pending, and then to Active (when the timer is reprogrammed/disabled, though you may go back to A&P if the timer fires again immediately). Eventually, after the EOI, you go back to Inactive. Here, the only bit we really care about is the Active bit (Pending changes anyway, as we save/restore the timer context). Or is there something that I don't understand in your question? >> >> So far, I got away with ignoring that state altogether by playing some >> tricks in the timer control register (basically masking the interrupt >> there). This worked fine until Ian found some interesting guest (QNX) >> that falls over if someone masks the interrupt in the timer behind its >> back. Fair enough. >> >> So I'm back to square one and I have to context-switch that single >> bit. >> >> So far, I can see about three options: >> >> - I can add a pair of exported functions: >> void gicv2_irq_set_active_state(struct irq_data *d, bool act) >> bool gicv2_irq_is_active(struct irq_data *d); >> >> that directly play with the state. Oh, and for GICv3 as well. And >> whatever comes after. It means that KVM has to know exactly the type >> of interrupt controller we're using (well, it already does), and call >> the right stuff. >> >> - I can add similar function to struct irq_chip: >> void (*irq_set_state)(struct irq_data *d, void *data); >> void (*irq_get_state)(struct irq_data *d, void *data); >> >> and build a generic API on top of that. That's tempting, but I'm >> really not keen on the "void *data" crap, as it means KVM has to >> know the type of the opaque data anyway. Or we define what is the >> state of an interrupt, and I'm afraid there is as many definitions >> as there are interrupt controllers. >> >> - The third possibility is that I go and duplicate parts of the two GIC >> drivers into KVM just to be able to save/restore these bits. Please >> pass the bucket around. >> >> So far, I've prototyped the first option, but I'm seriously >> questioning my own sanity. Any idea/opinion? >> > KVM and the GIC driver are tightly bound anyhow, so is there anything > particular horrible in the first approach? (Besides the fact that we're > exposing random bits of information to the entire kernel not intended to > be used by anyone else than KVM for now). > > Second approach feels over-engineered to me, unless there are other > needs or real use cases in the near term. Well, I've decided on option 2 so far, as I feel we'd better have something that is more generic then just a bunch of driver-specific hacks. Other architectures may have similar requirements (we just haven't realized it yet). > The third option, just feels wrong, we'd have to do lookups in the DT > again to get the base address of the GICC etc. right? Yup. I don't even want to think about it. ;-) M. -- Jazz is not dead. It just smells funny. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC] Saving/Restoring the internal state of an interrupt 2014-08-04 12:31 ` Marc Zyngier @ 2014-08-04 13:26 ` Christoffer Dall 2014-08-04 14:22 ` Marc Zyngier 0 siblings, 1 reply; 6+ messages in thread From: Christoffer Dall @ 2014-08-04 13:26 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 04, 2014 at 01:31:05PM +0100, Marc Zyngier wrote: > On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall <christoffer.dall@linaro.org> wrote: > > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote: > >> Thomas, all, > >> > >> Since ARM has grown some virtualization extensions, there is a > >> particular issue I've carefully avoided to address, and which is now > >> biting me exactly where you think it does. > >> > >> Let's expose the problem: > >> - We have a per-CPU timer (the architected timer) > >> - This timer is shared across virtual machines > >> - We context-switch the state of the timer on entry/exit of the VM > >> > >> The state of the timer itself is basically a control word and a > >> comparator value. Not much. Ah yes, one tiny detail: the state of the > >> interrupt controller (the GIC) is also part of the state. > >> > >> This interrupt state is not the pending state, but the internal state > >> (called the "active" state), indicating if the interrupt is in > >> progress or not. If the interrupt is active (and configured as a level > >> interrupt), then the interrupt can't fire again until the guest EOIs > >> it (there is some special hardware dedicated to this). > > > > Can you clarify this part. I was under the impression from reading the > > GICv2 specs at least that an active interrupt can become active and > > pending (and the pending part of that would cause the interrupt to fire > > again). Specifically, I'm thinking about transition A2 in Figure 3-1 in > > the GICv2 spec (ARM IHI 0048B.b). Am I reading this incorrectly or is > > there something special about the timer in this regard? > > Nothing special about the timer. Just that the timer being a level > interrupt, the pending state is a direct consequence of the level being > high (as mentionned in the note at the bottom of 1.4.2). > > You go from Pending to Active-and-Pending, and then to Active (when the > timer is reprogrammed/disabled, though you may go back to A&P if the timer > fires again immediately). Eventually, after the EOI, you go back to > Inactive. > > Here, the only bit we really care about is the Active bit (Pending > changes anyway, as we save/restore the timer context). > > Or is there something that I don't understand in your question? > If not using priorities, what difference does it make if we save/restore the active bit? The whole point is that a timer that the guest hasn't reprogrammed/disabled should not raise a hardware interrupt, right? So what I inferred from your text is that we prevent that from happening by restoring the active bit. But that will still cause the hardware to assert the signal (we will become A&P) and that would raise the interrupt on the hardware, we would trap again, and so on. Or? -Christoffer ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC] Saving/Restoring the internal state of an interrupt 2014-08-04 13:26 ` Christoffer Dall @ 2014-08-04 14:22 ` Marc Zyngier 2014-08-04 15:13 ` Christoffer Dall 0 siblings, 1 reply; 6+ messages in thread From: Marc Zyngier @ 2014-08-04 14:22 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 04 2014 at 02:26:48 PM, Christoffer Dall <christoffer.dall@linaro.org> wrote: > On Mon, Aug 04, 2014 at 01:31:05PM +0100, Marc Zyngier wrote: >> On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall >> <christoffer.dall@linaro.org> wrote: >> > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote: >> >> Thomas, all, >> >> >> >> Since ARM has grown some virtualization extensions, there is a >> >> particular issue I've carefully avoided to address, and which is now >> >> biting me exactly where you think it does. >> >> >> >> Let's expose the problem: >> >> - We have a per-CPU timer (the architected timer) >> >> - This timer is shared across virtual machines >> >> - We context-switch the state of the timer on entry/exit of the VM >> >> >> >> The state of the timer itself is basically a control word and a >> >> comparator value. Not much. Ah yes, one tiny detail: the state of the >> >> interrupt controller (the GIC) is also part of the state. >> >> >> >> This interrupt state is not the pending state, but the internal state >> >> (called the "active" state), indicating if the interrupt is in >> >> progress or not. If the interrupt is active (and configured as a level >> >> interrupt), then the interrupt can't fire again until the guest EOIs >> >> it (there is some special hardware dedicated to this). >> > >> > Can you clarify this part. I was under the impression from reading the >> > GICv2 specs at least that an active interrupt can become active and >> > pending (and the pending part of that would cause the interrupt to fire >> > again). Specifically, I'm thinking about transition A2 in Figure 3-1 in >> > the GICv2 spec (ARM IHI 0048B.b). Am I reading this incorrectly or is >> > there something special about the timer in this regard? >> >> Nothing special about the timer. Just that the timer being a level >> interrupt, the pending state is a direct consequence of the level being >> high (as mentionned in the note at the bottom of 1.4.2). >> >> You go from Pending to Active-and-Pending, and then to Active (when the >> timer is reprogrammed/disabled, though you may go back to A&P if the timer >> fires again immediately). Eventually, after the EOI, you go back to >> Inactive. >> >> Here, the only bit we really care about is the Active bit (Pending >> changes anyway, as we save/restore the timer context). >> >> Or is there something that I don't understand in your question? >> > > If not using priorities, what difference does it make if we save/restore > the active bit? The whole point is that a timer that the guest hasn't > reprogrammed/disabled should not raise a hardware interrupt, right? It makes a massive difference: the interrupt is active on the host, and it is what will prevent subsequent (and possibly spurious) timer interrupts from being taken. > So what I inferred from your text is that we prevent that from happening > by restoring the active bit. But that will still cause the hardware to > assert the signal (we will become A&P) and that would raise the > interrupt on the hardware, we would trap again, and so on. Or? If the interrupt is already active when you restore the timer state, no other timer interrupt will be injected. That's the whole point of the active bit. It feels like we're talking past each other, or that we're not exatly talking about the same thing. If that's of any help, can you have a look at the irq_forwarding patches I've posted a while ago? Thanks, M. -- Jazz is not dead. It just smells funny. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC] Saving/Restoring the internal state of an interrupt 2014-08-04 14:22 ` Marc Zyngier @ 2014-08-04 15:13 ` Christoffer Dall 0 siblings, 0 replies; 6+ messages in thread From: Christoffer Dall @ 2014-08-04 15:13 UTC (permalink / raw) To: linux-arm-kernel On Mon, Aug 04, 2014 at 03:22:16PM +0100, Marc Zyngier wrote: > On Mon, Aug 04 2014 at 02:26:48 PM, Christoffer Dall <christoffer.dall@linaro.org> wrote: > > On Mon, Aug 04, 2014 at 01:31:05PM +0100, Marc Zyngier wrote: > >> On Mon, Aug 04 2014 at 12:57:16 pm BST, Christoffer Dall > >> <christoffer.dall@linaro.org> wrote: > >> > On Mon, Jun 16, 2014 at 07:47:57PM +0100, Marc Zyngier wrote: > >> >> Thomas, all, > >> >> > >> >> Since ARM has grown some virtualization extensions, there is a > >> >> particular issue I've carefully avoided to address, and which is now > >> >> biting me exactly where you think it does. > >> >> > >> >> Let's expose the problem: > >> >> - We have a per-CPU timer (the architected timer) > >> >> - This timer is shared across virtual machines > >> >> - We context-switch the state of the timer on entry/exit of the VM > >> >> > >> >> The state of the timer itself is basically a control word and a > >> >> comparator value. Not much. Ah yes, one tiny detail: the state of the > >> >> interrupt controller (the GIC) is also part of the state. > >> >> > >> >> This interrupt state is not the pending state, but the internal state > >> >> (called the "active" state), indicating if the interrupt is in > >> >> progress or not. If the interrupt is active (and configured as a level > >> >> interrupt), then the interrupt can't fire again until the guest EOIs > >> >> it (there is some special hardware dedicated to this). > >> > > >> > Can you clarify this part. I was under the impression from reading the > >> > GICv2 specs at least that an active interrupt can become active and > >> > pending (and the pending part of that would cause the interrupt to fire > >> > again). Specifically, I'm thinking about transition A2 in Figure 3-1 in > >> > the GICv2 spec (ARM IHI 0048B.b). Am I reading this incorrectly or is > >> > there something special about the timer in this regard? > >> > >> Nothing special about the timer. Just that the timer being a level > >> interrupt, the pending state is a direct consequence of the level being > >> high (as mentionned in the note at the bottom of 1.4.2). > >> > >> You go from Pending to Active-and-Pending, and then to Active (when the > >> timer is reprogrammed/disabled, though you may go back to A&P if the timer > >> fires again immediately). Eventually, after the EOI, you go back to > >> Inactive. > >> > >> Here, the only bit we really care about is the Active bit (Pending > >> changes anyway, as we save/restore the timer context). > >> > >> Or is there something that I don't understand in your question? > >> > > > > If not using priorities, what difference does it make if we save/restore > > the active bit? The whole point is that a timer that the guest hasn't > > reprogrammed/disabled should not raise a hardware interrupt, right? > > It makes a massive difference: the interrupt is active on the host, and > it is what will prevent subsequent (and possibly spurious) timer > interrupts from being taken. > > > So what I inferred from your text is that we prevent that from happening > > by restoring the active bit. But that will still cause the hardware to > > assert the signal (we will become A&P) and that would raise the > > interrupt on the hardware, we would trap again, and so on. Or? > > If the interrupt is already active when you restore the timer state, no > other timer interrupt will be injected. That's the whole point of the > active bit. > > It feels like we're talking past each other, or that we're not exatly > talking about the same thing. If that's of any help, can you have a look > at the irq_forwarding patches I've posted a while ago? > We're not talking past each other, I've simply 'misread' the spec. I find it extremely confusing, because Section 3.2.4 talks specifically about the CPU interface state machine, and Figure 3-1 correspondingly talks about an interrupt being pending even though it's already active (A&P), but that sort of more a distributor or input-side state for the CPU interface. Anyway, the pseudo code clears it up, and clarifies that it is only actually pending if it is not active. This insight let's me make sense out of your e-mail, and urges me to go check the QEMU emulation soon. -Christoffer ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-08-04 15:13 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-16 18:47 [RFC] Saving/Restoring the internal state of an interrupt Marc Zyngier 2014-08-04 11:57 ` Christoffer Dall 2014-08-04 12:31 ` Marc Zyngier 2014-08-04 13:26 ` Christoffer Dall 2014-08-04 14:22 ` Marc Zyngier 2014-08-04 15:13 ` Christoffer Dall
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).