KVM handling external interrupts

public inbox for kvm@vger.kernel.org

* KVM handling external interrupts
@ 2012-06-07  0:12 sheng qiu
  2012-06-07  7:51 ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: sheng qiu @ 2012-06-07  0:12 UTC (permalink / raw)
  To: kvm

Hi all,

I have been working with KVM and have a few questions that I cannot
figure out.

1> As we know, an external interrupt normally causes a VM exit, and the
hypervisor injects a virtual interrupt if the interrupt is meant for the
guest. Which IRQ is injected (I mean the interrupt vector used to index
the guest IDT)? How does KVM know this, i.e. how does it associate a
host IRQ with a guest virtual IRQ?

2> For a device assigned to the guest, the hypervisor delivers the
device's IRQ to the guest. By tracing the code, I found that the host
interrupt vector differs from the guest's. How does KVM determine which
interrupt vector the guest should use?

3> If we disable exiting on external interrupts by setting the relevant
field in the VMCS, what happens when a physical interrupt arrives? Will
the CPU use the guest IDT to handle the interrupt? If so, can KVM
redirect the CPU to use another IDT for the guest (by modifying the
IDTR)?

4> Where is the guest IDT located? Is it configured by QEMU while
initializing the vcpu and its registers (including the IDTR)?

I really hope someone can answer my questions. I would appreciate it
very much.

Thanks

-- 
Sheng Qiu
Texas A & M University
Room 332B Wisenbaker
email: herbert1984106@gmail.com
College Station, TX 77843-3259

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07  0:12 KVM handling external interrupts sheng qiu
@ 2012-06-07  7:51 ` Abel Gordon
  2012-06-07  8:13   ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07  7:51 UTC (permalink / raw)
  To: sheng qiu
  Cc: kvm, kvm-owner, Nadav Har'El, Alex Landau, Nadav Amit,
	Dan Tsafrir, Muli Ben-Yehuda

kvm-owner@vger.kernel.org wrote on 07/06/2012 03:12:55:

> From: sheng qiu <herbert1984106@gmail.com>
> To: kvm <kvm@vger.kernel.org>,
> Date: 07/06/2012 03:13
> Subject: KVM handling external interrupts
> Sent by: kvm-owner@vger.kernel.org
>
> 1> as we know, normally the external interrupt will cause VMexit and
> the hypervisor will inject a virtual interrupt if it is for guest.
> Then which irq will be injected (i mean the interrupt vector for
> indexing the guest IDT)? How does the KVM get to know about this
> (associate a host IRQ with a guest virtual IRQ)?

For emulated/para-virtual devices:
There is no direct 1-to-1 relation between a "physical" IRQ that caused
an exit and a "virtual" IRQ injected into the guest. QEMU is responsible
for emulating virtual devices, and injects a virtual interrupt (via KVM)
whenever the emulated hardware (software) raises one. Physical
interrupts are always handled by the host Linux kernel, whether they
caused an exit (arrived in guest mode) or not (arrived in root mode).
So physical interrupts are always consumed by the host.
For example, an interrupt in the host might trigger some I/O callback or
release a thread blocked on synchronous I/O in QEMU, but QEMU does not
map physical interrupts to virtual interrupts. Then, depending on how
QEMU emulates the virtual device, it may decide to inject a virtual
interrupt after it has finished handling an I/O operation (e.g. when
the callback is called).
> 2> if for assigned device to the guest, the hypervisor will deliver
> that IRQ to the guest. by tracing the code, i found the host IRQ is
> different with the guest's (i mean the interrupt vector). how the KVM
> configure which interrupt vector the guest should use?

For device assignment:
In this case you do have a 1-to-1 mapping between physical and virtual
interrupts, but they do NOT necessarily use the same vector.
The Linux host kernel assigns a "physical vector". The guest OS assigns
a "virtual vector". KVM knows the "virtual vector" the guest is using
and injects the corresponding virtual interrupt each time a physical
interrupt arrives. In other words, KVM does a simple physical-to-virtual
conversion of the vector number.
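
To illustrate the vector conversion described above, here is a minimal
sketch in Python. It is a toy model and not KVM code: the class name
AssignedIrqRouter, its methods, and the vector numbers 0x51 and 0x3B are
all made up for illustration. The only x86 fact it relies on is that
vectors 0..31 are reserved for exceptions, so external interrupts use
32..255.

```python
class AssignedIrqRouter:
    """Toy model (hypothetical, not KVM code): map host vectors to guest vectors."""

    FIRST_EXTERNAL_VECTOR = 32   # vectors 0..31 are reserved for exceptions
    LAST_VECTOR = 255

    def __init__(self):
        self._host_to_guest = {}

    def assign(self, host_vector, guest_vector):
        """Record that an assigned device's host vector maps to a guest vector."""
        for vector in (host_vector, guest_vector):
            if not self.FIRST_EXTERNAL_VECTOR <= vector <= self.LAST_VECTOR:
                raise ValueError(f"vector {vector} is outside 32..255")
        self._host_to_guest[host_vector] = guest_vector

    def on_physical_interrupt(self, host_vector):
        """Return the guest vector to inject, or None if the host keeps the IRQ."""
        return self._host_to_guest.get(host_vector)


router = AssignedIrqRouter()
# The host picked 0x51 for the device; the guest programmed 0x3B for it.
router.assign(host_vector=0x51, guest_vector=0x3B)
```

A physical interrupt on a vector that has no entry is simply consumed
by the host, which matches the emulated-device case described earlier.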

>
> 3> if we configure not exit on external interrupt by setting some
> field in VMCS, what will happen during the physical interrupts? will
> the CPU use the guest IDT for response interrupt? If so, can KVM
> redirect the CPU to use another IDT for guest (assuming modifying the
> IDTR)?

Yes, that's exactly something we already did in a research project.
You can read our paper published in ASPLOS 2012, "ELI: Bare-metal
performance for I/O virtualization"
(http://dl.acm.org/citation.cfm?id=2151020&dl=ACM&coll=DL&CFID=86701665&CFTOKEN=26302003).

Note this is not so simple, there are many other issues you should
consider.

> 4> where is the guest IDT located? it this configured by the qemu
> while initializing the vcpu and registers (include the IDTR)?

The guest IDT is located in the guest address space, and the guest sets
up the IDTR (the register that points to the IDT). KVM is not involved.
In other words, KVM does not touch or modify the IDT contents or the
IDTR. KVM simply uses the hardware support in the processor to
virtualize the IDTR register (GUEST_IDTR). Of course, you can modify the
logic and let KVM change the IDTR/IDT, as we did.
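
As a rough illustration of how the IDTR locates a handler, the sketch
below computes the address of the gate descriptor for a given vector.
It assumes the architectural gate sizes (16 bytes in 64-bit mode, 8
bytes in 32-bit protected mode) and treats the IDTR limit as the offset
of the last valid byte. The function name and the example base address
are hypothetical, and real hardware raises #GP rather than a Python
exception when the gate lies beyond the limit.

```python
GATE_SIZE_LONG_MODE = 16        # bytes per gate descriptor in 64-bit mode
GATE_SIZE_PROTECTED_MODE = 8    # bytes per gate descriptor in 32-bit mode


def idt_gate_address(idtr_base, idtr_limit, vector,
                     gate_size=GATE_SIZE_LONG_MODE):
    """Return the linear address of the gate descriptor for `vector`.

    idtr_limit is the offset of the last valid byte of the table, so a
    full 256-entry long-mode IDT has a limit of 256 * 16 - 1 = 0xFFF.
    """
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    offset = vector * gate_size
    if offset + gate_size - 1 > idtr_limit:
        # Hardware would raise #GP here; the sketch raises IndexError.
        raise IndexError(f"vector {vector:#x} is beyond the IDT limit")
    return idtr_base + offset


# Hypothetical example: the IDT starts at linear address 0x1000.
gate = idt_gate_address(idtr_base=0x1000, idtr_limit=0xFFF, vector=0x3B)
```

Pointing the guest at a different table, as discussed in this thread,
amounts to changing idtr_base (and possibly idtr_limit) while the guest
keeps using the same vector numbers.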

Regards,
Abel Gordon
IBM Research - Haifa

* Re: KVM handling external interrupts
  2012-06-07  7:51 ` Abel Gordon
@ 2012-06-07  8:13   ` Jan Kiszka
  2012-06-07  9:02     ` Jan Kiszka
  2012-06-07  9:55     ` Abel Gordon
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07  8:13 UTC (permalink / raw)
  To: Abel Gordon
  Cc: sheng qiu, kvm, Nadav Har'El, Alex Landau, Nadav Amit,
	Dan Tsafrir, Muli Ben-Yehuda

On 2012-06-07 09:51, Abel Gordon wrote:
>> 3> if we configure not exit on external interrupt by setting some
>> field in VMCS, what will happen during the physical interrupts? will
>> the CPU use the guest IDT for response interrupt? If so, can KVM
>> redirect the CPU to use another IDT for guest (assuming modifying the
>> IDTR)?
> 
> Yes, that's exactly something we already did in a research project.
> You can read our paper published in ASPLOS 2012: ELI: Bare-metal
> performance for I/O virtualization
> (
> http://dl.acm.org/citation.cfm?id=2151020&dl=ACM&coll=DL&CFID=86701665&CFTOKEN=26302003

Interesting. Can you provide it publicly (or send a version privately)?

> )
> 
> Note this is not so simple, there are many other issues you should
> consider.

Is it just complicated, not upstreamable, or are the unsolved issues
like security holes or the need to paravirtualize the guest?

I'm still hoping that Intel/AMD will finally enable this in hardware, at
least for MSIs. Providing direct injection for legacy line-based
interrupts is likely not worth the silicon and bits (it would require
some hw-assisted IOAPIC instead of just a bit more APIC virtualization).

Jan


* Re: KVM handling external interrupts
  2012-06-07  8:13   ` Jan Kiszka
@ 2012-06-07  9:02     ` Jan Kiszka
  2012-06-07 10:47       ` Abel Gordon
  2012-06-07  9:55     ` Abel Gordon
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07  9:02 UTC (permalink / raw)
  To: Abel Gordon
  Cc: sheng qiu, kvm, Nadav Har'El, Alex Landau, Nadav Amit,
	Dan Tsafrir, Muli Ben-Yehuda

On 2012-06-07 10:13, Jan Kiszka wrote:
> On 2012-06-07 09:51, Abel Gordon wrote:
>>> 3> if we configure not exit on external interrupt by setting some
>>> field in VMCS, what will happen during the physical interrupts? will
>>> the CPU use the guest IDT for response interrupt? If so, can KVM
>>> redirect the CPU to use another IDT for guest (assuming modifying the
>>> IDTR)?
>>
>> Yes, that's exactly something we already did in a research project.
>> You can read our paper published in ASPLOS 2012: ELI: Bare-metal
>> performance for I/O virtualization
>> (
>> http://dl.acm.org/citation.cfm?id=2151020&dl=ACM&coll=DL&CFID=86701665&CFTOKEN=26302003
> 
> Interesting. Can you provide it publicly (or send a version privately)?

Sorry, should have googled first:

http://www.mulix.org/pubs/eli/eli.pdf :)

> 
>> )
>>
>> Note this is not so simple, there are many other issues you should
>> consider.
> 
> Is it just complicated, not upstreamable, or are the unsolved issues
> like security holes or the need to paravirtualize the guest?

My first feeling is that it's not easily upstreamable due to the need to
fiddle with the host's IDT, specifically on VCPU task migration. But I
need to read the requirements of this more carefully. Still interesting
work!

Jan

> 
> I'm still hoping that Intel/AMD will finally enable this in hardware, at
> least for MSIs. Providing direct injection for legacy line-base
> interrupts is likely not worth the silicon and bits (would require some
> hw-assisted IOAPIC instead of just a bit more APIC virtualization).
> 
> Jan



* Re: KVM handling external interrupts
  2012-06-07  8:13   ` Jan Kiszka
  2012-06-07  9:02     ` Jan Kiszka
@ 2012-06-07  9:55     ` Abel Gordon
  2012-06-07 10:23       ` Jan Kiszka
  2012-06-07 11:40       ` Jan Kiszka
  1 sibling, 2 replies; 31+ messages in thread
From: Abel Gordon @ 2012-06-07  9:55 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

> > Note this is not so simple, there are many other issues you should
> > consider.
>
> Is it just complicated, not upstreamable, or are the unsolved issues
> like security holes or the need to paravirtualize the guest?

Well, I'll let you read the paper first :) It will answer all these
questions.

In a nutshell:
Complicated: that always depends on who you ask and on what you consider
complicated. ELI changes some critical points in KVM.
Unsolved issues: there are some issues that are solved in theory but not
implemented.
Security holes: not if you are OK with the threat model we describe in
the paper.
Need to paravirtualize the guest: not if you have x2APIC.

* Re: KVM handling external interrupts
  2012-06-07  9:55     ` Abel Gordon
@ 2012-06-07 10:23       ` Jan Kiszka
  2012-06-07 10:34         ` Nadav Har'El
  2012-06-07 11:40       ` Jan Kiszka
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 10:23 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

On 2012-06-07 11:55, Abel Gordon wrote:
> 
>>> Note this is not so simple, there are many other issues you should
>>> consider.
>>
>> Is it just complicated, not upstreamable, or are the unsolved issues
>> like security holes or the need to paravirtualize the guest?
> 
> Well, I let you read the paper first :) It will answer all these questions.

I'm on it. Two general remarks so far:

 - At least the preemption timer is not common x86 architecture but can
   only be found in VT-x. You should mention that you are focusing on
   Intel.
 - You discuss interrupt delivery without stating that you have MSIs in
   mind. Some aspects may be helpful for legacy interrupts as well, but
   you obviously can't achieve exit-less operation there. Not an issue,
   should just be made clear.

> 
> In a nutshell,
> Complicated: that always depends who you ask and relative to what you
> consider something complicated. ELI changes some critical points in KVM.
> Unsolved issues: there are some issues solves in theory but not implemented
> Security holes: not if you are OK with the threat model we describe in the
> paper

The threat model looks sane, but I'm not comfortable with the "let's
poll the guest to see if it misbehaved" solution. It should work but is
a bit ugly.

> need paravirtualize the guest: no if you have x2APIC.

...and the guest makes use of it. This excludes older OSes. When did
Windows start to use it?

Jan


* Re: KVM handling external interrupts
  2012-06-07 10:23       ` Jan Kiszka
@ 2012-06-07 10:34         ` Nadav Har'El
  2012-06-07 10:48           ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Nadav Har'El @ 2012-06-07 10:34 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Abel Gordon, Alex Landau, Dan Tsafrir, sheng qiu, kvm,
	Muli Ben-Yehuda, Nadav Amit

>  - You discuss interrupt delivery without stating that you have MSIs in
>    mind. Some aspects may be helpful for legacy interrupts as well, but
>    you obviously can't achieve exit-less operation there. Not an issue,
>    should just be made clear.

Can you elaborate on why exit-less operation cannot be achieved without
MSI? Doesn't the VMCS flag to avoid exiting on external interrupts
apply to any interrupt? Or would something else not work?

In any case, you're right that our implementation and tests all used
MSI.

> > need paravirtualize the guest: no if you have x2APIC.
>
> ...and the guest makes use of it. This excludes older OSes. When did
> Windows start to use it?

If you can't use x2APIC and don't want to paravirtualize the guest, you
still get exit-less interrupt *delivery*, which, as we showed in the
benchmarks, gets you more than half of the performance improvement
(although with newer KVM's improved EOI emulation performance, the
over-half improvement should be somewhat less pronounced).

Nadav.

* Re: KVM handling external interrupts
  2012-06-07  9:02     ` Jan Kiszka
@ 2012-06-07 10:47       ` Abel Gordon
  2012-06-07 10:51         ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 10:47 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit



Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 12:02:31:

> >> Yes, that's exactly something we already did in a research project.
> >> You can read our paper published in ASPLOS 2012: ELI: Bare-metal
> >> performance for I/O virtualization
> >> (
> >> http://dl.acm.org/citation.cfm?id=2151020&dl=ACM&coll=DL&CFID=86701665&CFTOKEN=26302003
> >
> > Interesting. Can you provide it publicly (or send a version privately)?
>
> Sorry, should have googled first:
>
> http://www.mulix.org/pubs/eli/eli.pdf :)
np ;)

> >> Note this is not so simple, there are many other issues you should
> >> consider.
> >
> > Is it just complicated, not upstreamable, or are the unsolved issues
> > like security holes or the need to paravirtualize the guest?
>
> My first feeling is that it's not easily upstreamable due to the need to
> fiddle with the host's IDT, specifically on VCPU task migration. But I
> need to read the requirements of this more carefully. Still interesting
> work!

You don't need to fiddle with the host's IDT, you need to fiddle with
the shadow IDT and interrupt vector mapping/remapping.



* Re: KVM handling external interrupts
  2012-06-07 10:34         ` Nadav Har'El
@ 2012-06-07 10:48           ` Jan Kiszka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 10:48 UTC (permalink / raw)
  To: Nadav Har'El
  Cc: Abel Gordon, Alex Landau, Dan Tsafrir, sheng qiu, kvm,
	Muli Ben-Yehuda, Nadav Amit

On 2012-06-07 12:34, Nadav Har'El wrote:
>>  - You discuss interrupt delivery without stating that you have MSIs in
>>    mind. Some aspects may be helpful for legacy interrupts as well, but
>>    you obviously can't achieve exit-less operation there. Not an issue,
>>    should just be made clear.
> 
> Can you eleborate on why exit-less operation cannot be achieved without
> MSI? Doesn't the VMCS flag to avoid exiting on external interrupts
> apply to any interrupts? Or something else won't work?

The guest needs to interact with the IOAPIC. And this resource is shared
between host and guest. It can't be passed through.

> 
> In any case, you're right that our implementation and tests all used
> MSI.
> 
>>> need paravirtualize the guest: no if you have x2APIC.
>>
>> ...and the guest makes use of it. This excludes older OSes. When did
>> Windows start to use it?
> 
> Iff you can't use x2APIC, and don't want to paravirtualize

Often, it is more about "... _can't_ paravirtualize". :)

> the guest, you still get exit-less interrupt *delivery*, which as we
> showed in the benchmarks, gets you more than half of the performance
> improvement (although with newer KVM's improvement in EOI emulation
> performance, the over-half improvement should be somewhat less pronounced).

Yes, I understood this, and I think looking at direct delivery would be
a good first step to check if this could eventually become an upstream
feature. It should even be beneficial for legacy interrupts.

Jan


* Re: KVM handling external interrupts
  2012-06-07 10:47       ` Abel Gordon
@ 2012-06-07 10:51         ` Jan Kiszka
  2012-06-07 11:05           ` Abel Gordon
  2012-06-07 11:10           ` Jan Kiszka
  0 siblings, 2 replies; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 10:51 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

On 2012-06-07 12:47, Abel Gordon wrote:
> 
> 
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 12:02:31:
> 
>>>> Yes, that's exactly something we already did in a research project.
>>>> You can read our paper published in ASPLOS 2012: ELI: Bare-metal
>>>> performance for I/O virtualization
>>>> (
>>>> http://dl.acm.org/citation.cfm?id=2151020&dl=ACM&coll=DL&CFID=86701665&CFTOKEN=26302003
>>>
>>> Interesting. Can you provide it publicly (or send a version privately)?
>>
>> Sorry, should have googled first:
>>
>> http://www.mulix.org/pubs/eli/eli.pdf :)
> np ;)
> 
>>>> Note this is not so simple, there are many other issues you should
>>>> consider.
>>>
>>> Is it just complicated, not upstreamable, or are the unsolved issues
>>> like security holes or the need to paravirtualize the guest?
>>
>> My first feeling is that it's not easily upstreamable due to the need to
>> fiddle with the host's IDT, specifically on VCPU task migration. But I
>> need to read the requirements of this more carefully. Still interesting
>> work!
> 
> You don't need to fiddle with the host's IDT, you need to fiddle with
> the shadow IDT and interrupt vector mapping/remapping.

Yes, but you need to sync the host IDT into the shadow table. This may
require some hooks in generic code to avoid scanning the host table on
each guest entry.

BTW, the shadow IDT has to be put in the guest address space, right? So
we need to make it read-only for the guest?

Jan


* Re: KVM handling external interrupts
  2012-06-07 10:51         ` Jan Kiszka
@ 2012-06-07 11:05           ` Abel Gordon
  2012-06-07 11:13             ` Jan Kiszka
  2012-06-07 11:10           ` Jan Kiszka
  1 sibling, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 11:05 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 13:51:19:

> >> My first feeling is that it's not easily upstreamable due to the need to
> >> fiddle with the host's IDT, specifically on VCPU task migration. But I
> >> need to read the requirements of this more carefully. Still interesting
> >> work!
> >
> > You don't need to fiddle with the host's IDT, you need to fiddle with
> > the shadow IDT and interrupt vector mapping/remapping.
>
> Yes, but you need to sync the host IDT into the shadow table. This may
> require some hooks in generic code to avoid scanning the host table on
> each guest entry.

Well, the shadow IDT only needs to be synced with interrupts coming from
assigned devices. The rest of the entries don't matter; they just
generate an exception. Once they generate an exception, they are delivered
through the host IDT. So all you need to know to build the shadow IDT
are the vectors assigned to the guest.
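
The idea of a shadow IDT that only honours assigned-device vectors can
be sketched as follows. This is a simplified model, not the ELI
implementation: the Gate type and build_shadow_idt are hypothetical
names, a gate is reduced to a handler address plus a present flag, and
leaving the exception vectors 0..31 untouched is an assumption based on
the remark further down about "handlers for assigned vectors or
exceptions". Entries marked not-present stand in for "just generate an
exception", which is what hands the interrupt back to the host.

```python
from dataclasses import dataclass, replace

FIRST_EXTERNAL_VECTOR = 32   # vectors 0..31 are exceptions on x86


@dataclass(frozen=True)
class Gate:
    """Hypothetical, heavily simplified IDT gate."""
    handler: int     # address of the guest's handler
    present: bool    # a not-present gate faults instead of being delivered


def build_shadow_idt(guest_idt, assigned_vectors):
    """Copy the guest IDT, keeping only exception and assigned vectors present."""
    shadow = []
    for vector, gate in enumerate(guest_idt):
        keep = vector < FIRST_EXTERNAL_VECTOR or vector in assigned_vectors
        # Every other external vector is marked not-present, so delivery
        # through it faults and control returns to the host.
        shadow.append(replace(gate, present=gate.present and keep))
    return shadow


# Toy guest IDT with 256 present gates, and one assigned-device vector.
guest_idt = [Gate(handler=0x1000 + v, present=True) for v in range(256)]
shadow_idt = build_shadow_idt(guest_idt, assigned_vectors={0x3B})
```

Only the set of assigned vectors is needed as input, which is the point
made above; the guest IDT itself is copied, never modified.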

> BTW, the shadow IDT has to be put in the guest address space, right? So
> we need to make it read-only for the guest?

Yes, the shadow IDT is mapped into the guest address space and
write-protected in case a malicious guest tries to change it. In
addition, you also need to write-protect the "guest IDT" to catch any
changes the guest makes that need to be reflected in the shadow IDT
(e.g. handlers for assigned vectors or exceptions). However, this is a
rare case and does not occur during normal execution.

* Re: KVM handling external interrupts
  2012-06-07 10:51         ` Jan Kiszka
  2012-06-07 11:05           ` Abel Gordon
@ 2012-06-07 11:10           ` Jan Kiszka
  2012-06-07 11:49             ` Abel Gordon
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 11:10 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

On 2012-06-07 12:51, Jan Kiszka wrote:
> BTW, the shadow IDT has to be put in the guest address space, right? So
> we need to make it read-only for the guest?

Just found your solution: Append to a PCI bar. That's nasty. Better
reserve some memory via e820. There is a paravirtual channel from QEMU
to the BIOS to communicate such reservations.

BTW, the IDTR holds a linear address, not a virtual one. Unless I
misremember, there is no need to map the IDT via the page table. The
processor will not consult it for reading its entries.

Also, you do not discuss making the shadow table read-only in the guest
address space. This should help enforcing some security properties, no?

Jan

* Re: KVM handling external interrupts
  2012-06-07 11:05           ` Abel Gordon
@ 2012-06-07 11:13             ` Jan Kiszka
  2012-06-07 11:51               ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 11:13 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

On 2012-06-07 13:05, Abel Gordon wrote:
> 
> 
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 13:51:19:
> 
>>>> My first feeling is that it's not easily upstreamable due to the need to
>>>> fiddle with the host's IDT, specifically on VCPU task migration. But I
>>>> need to read the requirements of this more carefully. Still interesting
>>>> work!
>>>
>>> You don't need to fiddle with the host's IDT, you need to fiddle with
>>> the shadow IDT and interrupt vector mapping/remapping.
>>
>> Yes, but you need to sync the host IDT into the shadow table. This may
>> require some hooks in generic code to avoid scanning the host table on
>> each guest entry.
> 
> Well, the shadow IDT only needs to be synced with interrupts coming from
> assigned devices. The rest of the entries doesn't matter, they just
> generate an exception. Once they generate an exception, they are delivered
> through the host IDT. So, all you need to know are the vectors assigned
> to the guest to build the shadow IDT.

Not totally true. If the host decides to allocate some new vector that
collides with some guest usage, you need to rearrange the shadow IDT and
the physical IRQ routing. So you need to track what the host does.
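
A small sketch of the bookkeeping this implies, under loud assumptions:
the function name resolve_collision is hypothetical, vectors are plain
integers, and picking the lowest free vector is an arbitrary policy
chosen only to keep the example short. It shows the check that would be
needed whenever the host allocates a vector, and the relocation that
follows if that vector is already used for direct delivery to a guest.

```python
FIRST_EXTERNAL_VECTOR = 32
LAST_VECTOR = 255


def resolve_collision(new_host_vector, direct_vectors, host_vectors):
    """Return (old, new) if a direct-delivery vector must move, else None.

    direct_vectors: physical vectors currently delivered straight to a guest.
    host_vectors:   physical vectors already owned by the host.
    """
    if new_host_vector not in direct_vectors:
        return None   # no collision, nothing to rearrange
    in_use = set(direct_vectors) | set(host_vectors) | {new_host_vector}
    for candidate in range(FIRST_EXTERNAL_VECTOR, LAST_VECTOR + 1):
        if candidate not in in_use:
            # The shadow IDT entry and the physical IRQ routing would
            # both have to be switched from old to new.
            return (new_host_vector, candidate)
    raise RuntimeError("no free vector left")


move = resolve_collision(0x51, direct_vectors={0x51, 0x52},
                         host_vectors={0x20, 0x21})
```

Whatever the policy, the hook has to run on every host-side vector
allocation, which is the tracking requirement raised above.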

Jan


* Re: KVM handling external interrupts
  2012-06-07  9:55     ` Abel Gordon
  2012-06-07 10:23       ` Jan Kiszka
@ 2012-06-07 11:40       ` Jan Kiszka
  2012-06-07 12:17         ` Abel Gordon
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 11:40 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

On 2012-06-07 11:55, Abel Gordon wrote:
> Security holes: not if you are OK with the threat model we describe in the
> paper

Back to this: I don't get your threat model completely. How should the
guest be able to manipulate the shadow IDT if we a) mark it read-only in
the host's page table that maps the guest physical memory and b) prevent
via the IOMMU that any assigned devices can address this page via DMA?

But even if we consider the IDT unsafe, what does that IDT limiting buy
us? The guest can still mask interrupts above that limit via cli, no?
Also, unless I misunderstood your suggestions, I wouldn't try to run
normal interrupt handlers in NMI context. That's asking for lots of
trouble or lots of code changes.

So the only measures that save us from CPU hogging guests are the
preemption timer and kicking via NMI. Or what am I missing?

Jan

* Re: KVM handling external interrupts
  2012-06-07 11:10           ` Jan Kiszka
@ 2012-06-07 11:49             ` Abel Gordon
  2012-06-07 12:11               ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 11:49 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:10:32:

> > BTW, the shadow IDT has to be put in the guest address space, right? So
> > we need to make it read-only for the guest?
>
> Just found your solution: Append to a PCI bar. That's nasty. Better
> reserve some memory via e820. There is a paravirtual channel from QEMU
> to the BIOS to communicate such reservations.

We will take a look at e820 and consider your suggestion, thanks!
The PCI BAR worked well to obtain an "unused" and "mapped" memory area
for unmodified guests.

Nasty? Well, as usual, it depends on who you ask and the alternatives
you compare with.
For us, it was an elegant and easy way to achieve the goal.

> BTW, the IDTR holds a linear address, not a virtual one. Unless I
> misremember, there is no need to map the IDT via the page table. The
> processor will not consult it for reading its entries.

As I understand it, and as we noticed in our runs using ELI, the
processor uses the page tables to translate the IDTR (linear address)
into a physical address (guest-physical in this case).

(1) Logical addresses are converted to linear addresses using segments
(not relevant in our case)
(2) Linear addresses are converted to physical addresses using page tables
(this is our case)
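
The two translation steps can be sketched like this. It is a deliberate
simplification: the page table is a flat Python dict from page number
to frame number rather than a multi-level structure, 4 KiB pages are
assumed, and the function names and addresses are made up. In a guest,
the result of step (2) is a guest-physical address.

```python
PAGE_SHIFT = 12                       # 4 KiB pages (assumed)
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def logical_to_linear(segment_base, offset):
    """Step (1): segmentation adds the segment base to the offset."""
    return segment_base + offset


def linear_to_physical(page_table, linear):
    """Step (2): paging maps the page number and keeps the page offset."""
    page = linear >> PAGE_SHIFT
    if page not in page_table:
        raise LookupError(f"page fault at {linear:#x}")
    return (page_table[page] << PAGE_SHIFT) | (linear & PAGE_MASK)


# Flat segment (base 0), and one mapping: page 0x10 -> frame 0x80.
page_table = {0x10: 0x80}
linear = logical_to_linear(segment_base=0, offset=0x10ABC)
physical = linear_to_physical(page_table, linear)
```

With a flat segment the logical and linear addresses coincide, which is
why step (1) is "not relevant" here and only the page-table mapping of
the IDT page matters.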

Am I missing something? In your case, I assume [virtual = logical] and
[linear = linear], or are you using some different semantics?

> Also, you do not discuss making the shadow table read-only in the guest
> address space. This should help enforcing some security properties, no?

We discussed this briefly at the end of Section 4.2:

"...To detect runtime changes to the guest IDT, the
host also write-protects the shadow IDT page. Other security
and isolation considerations are discussed in Section 6"

* Re: KVM handling external interrupts
  2012-06-07 11:13             ` Jan Kiszka
@ 2012-06-07 11:51               ` Abel Gordon
  2012-06-07 11:54                 ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 11:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit



Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:13:45:

> > Well, the shadow IDT only needs to be synced with interrupts coming from
> > assigned devices. The rest of the entries doesn't matter, they just
> > generate an exception. Once they generate an exception, they are delivered
> > through the host IDT. So, all you need to know are the vectors assigned
> > to the guest to build the shadow IDT.
>
> Not totally true. If the host decides to allocate some new vector that
> collides with some guest usage, you need to rearrange the shadow IDT and
> the physical IRQ routing. So you need to track what the host does.

Well, it depends on whether you re-allocate the vector used by the guest
or the vector used by the host. Anyway, I think we understand each other :)


* Re: KVM handling external interrupts
  2012-06-07 11:51               ` Abel Gordon
@ 2012-06-07 11:54                 ` Jan Kiszka
  2012-06-07 12:02                   ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 11:54 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

On 2012-06-07 13:51, Abel Gordon wrote:
> 
> 
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:13:45:
> 
>>> Well, the shadow IDT only needs to be synced with interrupts coming from
>>> assigned devices. The rest of the entries doesn't matter, they just
>>> generate an exception. Once they generate an exception, they are delivered
>>> through the host IDT. So, all you need to know are the vectors assigned
>>> to the guest to build the shadow IDT.
>>
>> Not totally true. If the host decides to allocate some new vector that
>> collides with some guest usage, you need to rearrange the shadow IDT and
>> the physical IRQ routing. So you need to track what the host does.
> 
> Well, depends if you re-allocate the vector used by the guest or the vector
> used by the host. Anyway, I think we understand each other :)

KVM is just a subsystem of the Linux kernel, usually not involved in
LAPIC vector allocations. Your suggestion would turn this around a bit.
Not impossible, but expect some discussions. ;)

Jan


* Re: KVM handling external interrupts
  2012-06-07 11:54                 ` Jan Kiszka
@ 2012-06-07 12:02                   ` Abel Gordon
  0 siblings, 0 replies; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 12:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:54:35:

> > Well, depends if you re-allocate the vector used by the guest or the vector
> > used by the host. Anyway, I think we understand each other :)
>
> KVM is just a subsystem of the Linux kernel, usually not involved in
> LAPIC vector allocations. Your suggestion would turn this around a bit.

:) I think we will find long discussions around this type of statement since
KVM was created... how many changes made to the Linux kernel were driven
by KVM? (e.g. MMU notifiers for guest memory swapping)

I wrote this just to clarify my point of view. I don't want to start an
endless discussion around Linux and KVM synergy, so let's not do that :)

> Not impossible, but expect some discussions. ;)
Agree

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 11:49             ` Abel Gordon
@ 2012-06-07 12:11               ` Jan Kiszka
  2012-06-07 12:25                 ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 12:11 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 2120 bytes --]

On 2012-06-07 13:49, Abel Gordon wrote:
> 
> 
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:10:32:
> 
>>> BTW, the shadow IDT has to be put in the guest address space, right? So
>>> we need to make it read-only for the guest?
>>
>> Just found your solution: Append to a PCI bar. That's nasty. Better
>> reserve some memory via e820. There is a paravirtual channel from QEMU
>> to the BIOS to communicate such reservations.
> 
> We will take a look at e820 and consider your suggestion, thanks!
> The PCI BAR worked well to obtain an "unused" and "mapped" memory area
> for unmodified guests.
> 
> Nasty ? Well, as usual, depends who you ask and the alternatives you
> compare with.
> For us, it was an elegant and easy way to achieve the goal.

It's nasty as it requires more interaction between KVM and the userspace
hypervisor and relies on PCI, which has nothing to do with the x86
architecture. Consider you only want to forward non-PCI interrupts (e.g.
the LAPIC timer) and have no assigned device...

> 
>> BTW, the IDTR holds a linear address, not a virtual one. Unless I
>> misremember, there is no need to map the IDT via the page table. The
>> processor will not consult it for reading its entries.
> 
> As I understand and as we noticed in our runs using ELI, the processor
> uses the page tables to translate the IDTR (linear address) into physical
> address (guest physical in this case).
> 
> (1) Logical addresses are converted to linear addresses using segments (not
> relevant in our case)
> (2) Linear addresses are converted to physical addresses using page tables
> (this is our case)
> 
> Am I missing something ? In your case, I assume, [virtual = logical] and
> [linear = linear]
> or you are using some different semantics ?

No, you are right, the descriptor tables run through paging as well.
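
To make the two translation steps concrete, here is a minimal sketch in
Python. It is only a toy model: the flat one-level page table (a dict from
linear page number to physical page number), the 16-byte entry size of
64-bit mode and all function names are assumptions for illustration, not
KVM or hardware interfaces.

```python
PAGE_SIZE = 4096  # toy model: 4 KiB pages, single-level table


def logical_to_linear(segment_base, offset):
    """Step (1): segmentation turns a logical address into a linear one."""
    return segment_base + offset


def linear_to_physical(linear, page_table):
    """Step (2): paging turns a linear address into a physical one.

    page_table is a hypothetical dict {linear page number: physical page number}.
    """
    page, offset = divmod(linear, PAGE_SIZE)
    if page not in page_table:
        raise LookupError("linear page 0x%x is not mapped" % page)
    return page_table[page] * PAGE_SIZE + offset


def idt_entry_physical(idtr_base, vector, page_table, entry_size=16):
    """The IDTR base is a linear address, so fetching a gate goes through paging."""
    return linear_to_physical(idtr_base + vector * entry_size, page_table)
```

The point of the model is only that an unmapped IDT page makes the lookup
fail, which is why the shadow IDT has to live in memory that the guest has
mapped.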

But how do you ensure that the shadow IDT is mapped where you expect it?
How do you detect where it is mapped? That reminds me of our KVM VAPIC
and the hoops that code has to jump through to ensure this just for
32-bit XP guests...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 11:40       ` Jan Kiszka
@ 2012-06-07 12:17         ` Abel Gordon
  2012-06-07 12:19           ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 12:17 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit



Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:40:57:

> But even if we consider the IDT unsafe, what does that IDT limiting buy
> us?

The limit lets you force an exit (#GP exception) whether the shadow IDT
is OK or not. In this case, you simply shadow the GUEST_IDTR register
and not a memory area.
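
As a rough illustration of the limit check (a sketch, not KVM code): the
IDTR limit is the offset of the last valid byte, so a vector whose gate
descriptor does not fit entirely below it raises #GP. The helper names are
made up for this example; the entry sizes are the architectural 8 bytes
(32-bit protected mode) and 16 bytes (64-bit mode).

```python
IDT_ENTRY_SIZE_32 = 8   # bytes per gate descriptor, 32-bit protected mode
IDT_ENTRY_SIZE_64 = 16  # bytes per gate descriptor, 64-bit mode


def limit_for_vectors(num_vectors, entry_size=IDT_ENTRY_SIZE_64):
    """IDTR limit (last valid byte offset) that covers vectors 0..num_vectors-1."""
    if not 0 < num_vectors <= 256:
        raise ValueError("num_vectors must be in 1..256")
    return num_vectors * entry_size - 1


def vector_within_limit(vector, limit, entry_size=IDT_ENTRY_SIZE_64):
    """True if the whole descriptor for `vector` lies at or below the limit.

    A vector for which this is False would cause #GP on delivery.
    """
    return vector * entry_size + entry_size - 1 <= limit
```

For example, a limit of limit_for_vectors(32) keeps the 32 exception
vectors reachable while every external interrupt vector falls outside it.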

> The guest can still mask interrupts above that limit via cli, no?
> So the only measures that save us from CPU hogging guests are the
> preemption timer and kicking via NMI. Or what am I missing?

Nothing :) As we described in the paper, this is what we do to avoid
this situation.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 12:17         ` Abel Gordon
@ 2012-06-07 12:19           ` Jan Kiszka
  2012-06-07 12:32             ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 12:19 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

On 2012-06-07 14:17, Abel Gordon wrote:
> 
> 
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 14:40:57:
> 
>> But even if we consider the IDT unsafe, what does that IDT limiting buy
>> us?
> 
> The limit lets you force an exit (#GP exception) whether the shadow IDT
> is OK or not. In this case, you simply shadow the GUEST_IDTR register
> and not a memory area.
> 
>> The guest can still mask interrupts above that limit via cli, no?
>> So the only measures that save us from CPU hogging guests are the
>> preemption timer and kicking via NMI. Or what am I missing?
> 
> Nothing :) As we described in the paper, this is what we do to avoid
> this situation.

So the other measures are redundant, right? They only seem to complicate
the approach without any gain, that is my point.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 12:11               ` Jan Kiszka
@ 2012-06-07 12:25                 ` Abel Gordon
  2012-06-07 15:05                   ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 12:25 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 15:11:24:

> > Am I missing something ? In your case, I assume, [virtual = logical] and
> > [linear = linear]
> > or you are using some different semantics ?
> No, you are right, the descriptor tables run through paging as well.

Txs. Now that you understand your mistake, the discussion will be simpler.

 > But how do you ensure that the shadow IDT is mapped where you expect it?

First, I assume you will agree with us that using e820 as you
suggested doesn't help, because we need mapped memory.

How? As we described in the paper, we use the PCI BAR to obtain mapped
memory.
Where? It doesn't matter. We know the GPA of the BAR and just do a reverse
translation to obtain the GVA.
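
A minimal sketch of what such a reverse translation amounts to, using a
toy model only: the guest page table is flattened into a hypothetical dict
from virtual page number to guest-physical page number, and the function
name is invented for this example. The real lookup would walk the guest's
page tables.

```python
PAGE_SIZE = 4096  # toy model: 4 KiB pages


def reverse_translate(gpa, guest_page_table):
    """Return every GVA that currently maps to `gpa`.

    guest_page_table is a hypothetical dict
    {guest-virtual page number: guest-physical page number}.
    The result may be empty (not mapped) or contain several aliases.
    """
    target_page, offset = divmod(gpa, PAGE_SIZE)
    return sorted(
        vpage * PAGE_SIZE + offset
        for vpage, ppage in guest_page_table.items()
        if ppage == target_page
    )
```

Note that the answer depends on the guest's mappings at the moment of the
lookup: it can be empty, or list more than one GVA.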

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 12:19           ` Jan Kiszka
@ 2012-06-07 12:32             ` Abel Gordon
  2012-06-07 15:07               ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-07 12:32 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, kvm-owner,
	Muli Ben-Yehuda, Nadav Har'El, Nadav Amit



kvm-owner@vger.kernel.org wrote on 07/06/2012 15:19:14:

> >> The guest can still mask interrupts above that limit via cli, no?
> >> So the only measures that save us from CPU hogging guests are the
> >> preemption timer and kicking via NMI. Or what am I missing?
> >
> > Nothing :) As we described in the paper, this is what we do to avoid
> > this situation.
>
> So the other measures are redundant, right? They only seem to complicate
> the approach without any gain, that is my point.

We described in the paper all the mechanisms we thought could be used.
Which mechanisms are sufficient/preferable/simpler? I think we are back
to the KVM<->Linux dependencies and whether we are talking about
hypervisors in general or a specific implementation for KVM.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 12:25                 ` Abel Gordon
@ 2012-06-07 15:05                   ` Jan Kiszka
  2012-06-10  8:41                     ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 15:05 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

On 2012-06-07 14:25, Abel Gordon wrote:
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 15:11:24:
> 
>>> Am I missing something ? In your case, I assume, [virtual = logical] and
>>> [linear = linear]
>>> or you are using some different semantics ?
>> No, you are right, the descriptor tables run through paging as well.
> 
> Txs. Now that you understand your mistake, the discussion will be simpler.
> 
>  > But how do you ensure that the shadow IDT is mapped where you expect it?
> 
> First, I assume,  you will agree with us that using the e820 as you
> suggested doesn't help because we need mapped memory.
> 
> How ? As we described in the paper, we use the PCI BAR to obtain mapped
> memory.
> Where ? Doesn't matter. We know the GPA of the BAR and just do a reverse
> translation to obtain the GVA.

It remains a fragile approach:
 - host-side reverse translations may not return a stable result, thus
   may require to redo this step several times
 - the guest may decide to remove/disable the device you chose for
   appending the IDT
 - changing the real BAR size can confuse the guest, or it only maps
   what it requires of the real device

That's why I consider it nasty.

I'm wondering if redirecting (to different cores) or masking (at
device/IOAPIC/LAPIC level) of non-guest interrupts and only relying on
preemption timer/NMI isn't simpler. Then you wouldn't have to shadow the
IDT.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 12:32             ` Abel Gordon
@ 2012-06-07 15:07               ` Jan Kiszka
  2012-06-10 10:12                 ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-07 15:07 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, kvm-owner,
	Muli Ben-Yehuda, Nadav Har'El, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

On 2012-06-07 14:32, Abel Gordon wrote:
> kvm-owner@vger.kernel.org wrote on 07/06/2012 15:19:14:
> 
>>>> The guest can still mask interrupts above that limit via cli, no?
>>>> So the only measures that save us from CPU hogging guests are the
>>>> preemption timer and kicking via NMI. Or what am I missing?
>>>
>>> Nothing :) As we described in the paper, this is what we do to avoid
>>> this situation.
>>
>> So the other measures are redundant, right? They only seem to complicate
>> the approach without any gain, that is my point.
> 
> We described in the paper all the mechanisms we thought could be used.

Which of them did you implement and validate so far?

> Which mechanisms are sufficient/preferable/simpler ?  I think we are back
> to the KVM<->Linux dependencies and whether we are talking about
> hypervisors in general or a specific implementation for KVM.

I don't think this depends on KVM vs. whatever hypervisor, these are
pretty generic considerations.

If you need the preemption timer for breaking out of cli anyway, why
play tricks with off-limit vectors? NMIs can be useful to accelerate the
preemption when some other core wants to deliver an IPI (to kick the
target out of guest mode and to reenable interrupts, not to process them).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 15:05                   ` Jan Kiszka
@ 2012-06-10  8:41                     ` Abel Gordon
  2012-06-10 10:16                       ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-10  8:41 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit



Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 18:05:55:

> It remains a fragile approach:
>  - host-side reverse translations may not return a stable result, thus
>    may require to redo this step several times
>  - the guest may decide to remove/disable the device you chose for
>    appending the IDT
>  - changing the real BAR size can confuse the guest, or it only maps
>    what it requires of the real device
> That's why I consider it nasty.

Yep, these are corner cases we should deal with but they are not part
of the common case/critical path.

> I'm wondering if redirecting (to different cores) or masking (at
> device/IOAPIC/LAPIC level) of non-guest interrupts and only relying on
> preemption timer/NMI isn't simpler. Then you wouldn't have to shadow the
> IDT.

Yep, as we suggested in the paper, that could be also an alternative.
Is it really simpler ? Again, depends who you ask and what you need to
change.
All the alternatives have a set of pros and cons.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-07 15:07               ` Jan Kiszka
@ 2012-06-10 10:12                 ` Abel Gordon
  0 siblings, 0 replies; 31+ messages in thread
From: Abel Gordon @ 2012-06-10 10:12 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, kvm-owner,
	Muli Ben-Yehuda, Nadav Har'El, Nadav Amit

kvm-owner@vger.kernel.org wrote on 07/06/2012 18:07:03:

> > We described in the paper all the mechanisms we thought could be used.
> Which of them did you implement and validate so far?

shadow IDT + #NP exception, preemption timer, kicks via NMI, interrupt
affinity
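
For readers following the thread, the "shadow IDT + #NP" idea can be
sketched roughly as below. This is a toy model and not the ELI code: gates
are plain dicts with a "present" flag, the function name is invented, and
copying exception vectors 0..31 unchanged is a simplifying assumption of
this sketch.

```python
FIRST_EXTERNAL_VECTOR = 32  # vectors 0..31 are reserved for exceptions


def build_shadow_idt(guest_idt, assigned_vectors):
    """Build a shadow copy of the guest IDT.

    guest_idt: list of gate dicts, each with at least a "present" key.
    assigned_vectors: set of vectors used by devices assigned to the guest.

    External-interrupt gates that do not belong to an assigned device are
    marked not present, so delivering them raises #NP, which the host can
    intercept. Exception gates (0..31) are copied unchanged (assumption).
    """
    shadow = []
    for vector, gate in enumerate(guest_idt):
        entry = dict(gate)  # copy so the guest's own table is untouched
        if vector >= FIRST_EXTERNAL_VECTOR and vector not in assigned_vectors:
            entry["present"] = False
        shadow.append(entry)
    return shadow
```

Only the set of vectors assigned to the guest is needed to build the
table, which matches the argument made earlier in the thread.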

> > Which mechanisms are sufficient/preferable/simpler ?  I think we are back
> > to the KVM<->Linux dependencies and whether we are talking about
> > hypervisors in general or a specific implementation for KVM.
>
> I don't think this depends on KVM vs. whatever hypervisor, these are
> pretty generic considerations.

Well, as you mentioned in a previous email, it might be difficult to get
some changes upstreamed to the Linux kernel (due to the KVM/Linux asymmetric
model) while they may be easy to integrate with other hypervisors.

> If you need the preemption timer for breaking out of cli anyway, why
> play tricks with off-limit vectors? NMIs can be useful to accelerate the
> preemption when some other core wants to deliver an IPI (to kick the
> target out of guest mode and to reenable interrupts, not to process them).

Yep, you can use NMI if you need "immediate" kicking. The question is
when/why you need immediate kicking (e.g. TLB flush). In my opinion, if
you care about performance and you don't need immediate kicks, then you
can just deliver an interrupt and wait until the guest enables interrupts
or the timer elapses.
Remember that the guest disables interrupts to handle critical
operations, and a non-malicious guest is supposed to do so for a short
period of time. So, if you can avoid interrupting the guest while it has
interrupts disabled, it is better to do so. Take for example the case
where a guest holds a spin lock for part of the time it has interrupts
disabled. If you don't interrupt the guest in this period, you could
avoid lock-holder preemptions.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-10  8:41                     ` Abel Gordon
@ 2012-06-10 10:16                       ` Jan Kiszka
  2012-06-10 10:43                         ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-10 10:16 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, Muli Ben-Yehuda,
	Nadav Har'El, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

On 2012-06-10 10:41, Abel Gordon wrote:
> Jan Kiszka <jan.kiszka@web.de> wrote on 07/06/2012 18:05:55:
> 
>> It remains a fragile approach:
>>  - host-side reverse translations may not return a stable result, thus
>>    may require to redo this step several times
>>  - the guest may decide to remove/disable the device you chose for
>>    appending the IDT
>>  - changing the real BAR size can confuse the guest, or it only maps
>>    what it requires of the real device
>> That's why I consider it nasty.
> 
> Yep, these are corner cases we should deal with but they are not part
> of the common case/critical path.
> 
>> I'm wondering if redirecting (to different cores) or masking (at
>> device/IOAPIC/LAPIC level) of non-guest interrupts and only relying on
>> preemption timer/NMI isn't simpler. Then you wouldn't have to shadow the
>> IDT.
> 
> Yep, as we suggested in the paper, that could be also an alternative.
> Is it really simpler ? Again, depends who you ask and what you need to
> change.
> All the alternatives have a set of pros and cons.
> 

For sure. But avoiding the shadow IDT would likely mean avoiding
userspace changes for KVM. And that means simplification. And avoid PCI
dependencies.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-10 10:16                       ` Jan Kiszka
@ 2012-06-10 10:43                         ` Abel Gordon
  2012-06-10 12:16                           ` Jan Kiszka
  0 siblings, 1 reply; 31+ messages in thread
From: Abel Gordon @ 2012-06-10 10:43 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, kvm-owner,
	Muli Ben-Yehuda, Nadav Har'El, Nadav Amit



kvm-owner@vger.kernel.org wrote on 10/06/2012 13:16:01:

> > Yep, these are corner cases we should deal with but they are not part
> > of the common case/critical path.
> >
> >> I'm wondering if redirecting (to different cores) or masking (at
> >> device/IOAPIC/LAPIC level) of non-guest interrupts and only relying on
> >> preemption timer/NMI isn't simpler. Then you wouldn't have to shadow the
> >> IDT.
> >
> > Yep, as we suggested in the paper, that could be also an alternative.
> > Is it really simpler ? Again, depends who you ask and what you need to
> > change.
> > All the alternatives have a set of pros and cons.
> >
> For sure. But avoiding the shadow IDT would likely mean avoiding
> userspace changes for KVM. And that means simplification. And avoid PCI
> dependencies.

But you lose flexibility. Remember that if you don't shadow the IDT
you need at least one dedicated core that never uses ELI to handle
all the physical interrupts. With the shadow IDT, you could enable
ELI on all the cores.
In addition, if you don't use the shadow IDT, host interrupts will not
be balanced across all the ELI cores. Thus, if you run many VMs/VCPUs, you
might experience higher latency/bottlenecks or have scalability
problems unless you use a shadow IDT (depending on the workload,
of course).


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-10 10:43                         ` Abel Gordon
@ 2012-06-10 12:16                           ` Jan Kiszka
  2012-06-10 13:30                             ` Abel Gordon
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kiszka @ 2012-06-10 12:16 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, kvm-owner,
	Muli Ben-Yehuda, Nadav Har'El, Nadav Amit

[-- Attachment #1: Type: text/plain, Size: 2044 bytes --]

On 2012-06-10 12:43, Abel Gordon wrote:
> 
> 
> kvm-owner@vger.kernel.org wrote on 10/06/2012 13:16:01:
> 
>>> Yep, these are corner cases we should deal with but they are not part
>>> of the common case/critical path.
>>>
>>>> I'm wondering if redirecting (to different cores) or masking (at
>>>> device/IOAPIC/LAPIC level) of non-guest interrupts and only relying on
>>>> preemption timer/NMI isn't simpler. Then you wouldn't have to shadow the
>>>> IDT.
>>>
>>> Yep, as we suggested in the paper, that could be also an alternative.
>>> Is it really simpler ? Again, depends who you ask and what you need to
>>> change.
>>> All the alternatives have a set of pros and cons.
>>>
>> For sure. But avoiding the shadow IDT would likely mean avoiding
>> userspace changes for KVM. And that means simplification. And avoid PCI
>> dependencies.
> 
> But you lose flexibility. Remember that if you don't shadow the IDT
> you need at least one dedicated core that never uses ELI to handle
> all the physical interrupts. With the shadow IDT, you could enable
> ELI in all the cores.

You need to program the preemption timer anyway. Once you leave some
guest due to its expiry, you will re-enable the host IRQs and process them.

> In addition, if you don't use the shadow IDT, host interrupts will not
> be balanced across all the ELI cores. Thus, if you run many VMs/VCPU, you
> might experience higher latency/bottlenecks or have scalability
> problems unless you use a shadow IDT (depending on the workload,
> of course).

That might be an issue.

My feeling is software-based ELI could be a transitional feature (until
hardware supports it properly) and may focus more on static setups where
you have dedicated cores for guests and separated I/O processing.

In any case, I would suggest starting small and mostly self-contained,
i.e. with changes that stay within KVM as far as possible. If that is
accepted, you could suggest more sophisticated mechanisms on top,
addressing more use cases.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: KVM handling external interrupts
  2012-06-10 12:16                           ` Jan Kiszka
@ 2012-06-10 13:30                             ` Abel Gordon
  0 siblings, 0 replies; 31+ messages in thread
From: Abel Gordon @ 2012-06-10 13:30 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Alex Landau, Dan Tsafrir, sheng qiu, kvm, kvm-owner,
	Muli Ben-Yehuda, Nadav Har'El, Nadav Amit

Jan Kiszka <jan.kiszka@web.de> wrote on 10/06/2012 15:16:11:

> > But you lose flexibility. Remember that if you don't shadow the IDT
> > you need at least one dedicated core that never uses ELI to handle
> > all the physical interrupts. With the shadow IDT, you could enable
> > ELI in all the cores.
>
> You need to program the preemption timer anyway. Once you leave some
> guest due to its expiry, you will re-enable the host IRQs and process them.

That's exactly what we do. I never meant to say you don't need the
preemption timer. You always need it. However, we "reset" it on every
exit. So, if your exit rate is higher than the preemption timer rate
(e.g. due to local interrupts or privileged instructions), then the
preemption timer will never fire and you will not increase the number of
exits. On every exit, we (actually KVM) re-enable and process interrupts
in the host.
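
The effect of re-arming the timer on every exit can be shown with a small
simulation. This is only a sketch under simplifying assumptions: time is
in abstract units, exits are instantaneous, an exit that coincides with
the deadline counts as the other exit, and the function name is invented.

```python
def count_timer_exits(other_exit_times, timer_period, run_until):
    """Count exits caused by the preemption timer alone.

    The timer is armed at time 0 and re-armed (reset to a full period)
    after every exit, whether the timer or something else caused it.
    """
    timer_exits = 0
    deadline = timer_period
    for t in sorted(other_exit_times):
        if t > run_until:
            break
        # The timer fires for every deadline that passes before this exit.
        while deadline < t:
            timer_exits += 1
            deadline += timer_period
        # Any other exit resets the timer to a full period.
        deadline = t + timer_period
    # No more other exits: only the timer fires until the end of the run.
    while deadline <= run_until:
        timer_exits += 1
        deadline += timer_period
    return timer_exits
```

With exits every 5 units and a period of 10, the function reports no
timer-induced exits at all; with no other exits over 100 units it reports
10.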

> That might be an issue.
>
> My feeling is software-based ELI could be a transitional feature (until
> hardware supports it properly) and may focus more on static setups where
> you have dedicated cores for guests and separated I/O processing.

Yes, we hope the paper and the results will help x86 manufacturers
understand the potential/importance of ELI and convince them to
support this feature in the hardware (as we described in section 7,
Architectural Support).

> In any case, I would suggest to start small, mostly self-contained, ie.
> with changes that stay within KVM as far as possible. If that is
> accepted, you could suggest more sophisticated mechanisms on top,
> addressing more use cases.

Agree.

^ permalink raw reply	[flat|nested] 31+ messages in thread

Thread overview: 31+ messages
2012-06-07  0:12 KVM handling external interrupts sheng qiu
2012-06-07  7:51 ` Abel Gordon
2012-06-07  8:13   ` Jan Kiszka
2012-06-07  9:02     ` Jan Kiszka
2012-06-07 10:47       ` Abel Gordon
2012-06-07 10:51         ` Jan Kiszka
2012-06-07 11:05           ` Abel Gordon
2012-06-07 11:13             ` Jan Kiszka
2012-06-07 11:51               ` Abel Gordon
2012-06-07 11:54                 ` Jan Kiszka
2012-06-07 12:02                   ` Abel Gordon
2012-06-07 11:10           ` Jan Kiszka
2012-06-07 11:49             ` Abel Gordon
2012-06-07 12:11               ` Jan Kiszka
2012-06-07 12:25                 ` Abel Gordon
2012-06-07 15:05                   ` Jan Kiszka
2012-06-10  8:41                     ` Abel Gordon
2012-06-10 10:16                       ` Jan Kiszka
2012-06-10 10:43                         ` Abel Gordon
2012-06-10 12:16                           ` Jan Kiszka
2012-06-10 13:30                             ` Abel Gordon
2012-06-07  9:55     ` Abel Gordon
2012-06-07 10:23       ` Jan Kiszka
2012-06-07 10:34         ` Nadav Har'El
2012-06-07 10:48           ` Jan Kiszka
2012-06-07 11:40       ` Jan Kiszka
2012-06-07 12:17         ` Abel Gordon
2012-06-07 12:19           ` Jan Kiszka
2012-06-07 12:32             ` Abel Gordon
2012-06-07 15:07               ` Jan Kiszka
2012-06-10 10:12                 ` Abel Gordon
