* [MODERATED] Interrupts policy for SMT
@ 2018-07-05 23:07 Andi Kleen
2018-07-06 12:35 ` [MODERATED] " Paolo Bonzini
2018-07-06 18:56 ` Jon Masters
0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2018-07-05 23:07 UTC (permalink / raw)
To: speck
Hi,
When SMT is on for a guest and there is an interrupt on the sibling the guest
may be able to look at the data processed by the interrupt through L1TF.
This is mainly an issue when SMT is on, but the guest is confined
to a small number of exclusive cores that don't run any other processes,
e.g. through an exclusive cpuset.
There are three possible solutions for this:
1. Ask the user to move the affinity of interrupts away to other cores
2. Explicitly force the sibling to exit when an interrupt is processed
3. Consider it low risk and allow it
1. Ask the user to change affinity when running confined guests
This is somewhat complicated for the users. They would
need to have a script to move all the interrupts away from a given
set of cores used by VMs through /proc/irq/*/smp_affinity.
It would be possible to provide a tool that does this.
It also implies that at least one core needs to be left for interrupt
processing.
For MSI-X drivers that bind an interrupt to every core and don't
allow overriding it, this wouldn't work. However, those typically
only process interrupts on CPUs that actually initiated the IO,
and cores only running guests shouldn't.
One case where this might be violated is when a network driver uses a
hash function to select the queue, and thus the CPU. However we expect network
traffic to be encrypted anyways, so the risk of leaking something
here should be low.
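A minimal sketch of the mask computation such a tool would need, in Python. The function name `irq_affinity_mask` and the CPU numbers in the usage note are hypothetical; the only thing taken from the kernel interface is that `/proc/irq/<n>/smp_affinity` takes a hexadecimal CPU bitmask, written as comma-separated 32-bit groups on larger machines. Actually writing the mask needs root and some interrupts reject the write, so that part is left out here:

```python
def irq_affinity_mask(online_cpus, guest_cpus):
    """Return an smp_affinity-style hex mask of CPUs that may take interrupts.

    online_cpus: iterable of all online CPU numbers
    guest_cpus:  iterable of CPU numbers reserved for guests
    """
    online = sorted(set(online_cpus))
    allowed = set(online) - set(guest_cpus)
    if not allowed:
        # Option (1) implies at least one core is left for interrupts.
        raise ValueError("no CPU left for interrupt processing")

    mask = 0
    for cpu in allowed:
        mask |= 1 << cpu

    # Split into 32-bit groups, most significant group first.
    ngroups = online[-1] // 32 + 1
    groups = [(mask >> (32 * i)) & 0xFFFFFFFF for i in range(ngroups)]
    return ",".join(f"{g:08x}" for g in reversed(groups))
```

For example, on an 8-CPU box with CPUs 2, 3, 6 and 7 given to guests, `irq_affinity_mask(range(8), [2, 3, 6, 7])` yields `00000033`. The tool would then write that string to each interrupt's smp_affinity file and skip the ones that refuse it (the MSI-X case above).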
2. Explicitly force the sibling to exit for interrupts
The basic idea is to maintain the "I am in a guest" and "I am in
an interrupt" states in two cache lines.
The sibling always checks it before executing interrupts. If
it is set, it sends an IPI. KVM also needs to check and wait
for interrupts before entering the guest.
The IPI isn't actually fully executed because KVM can intercept
it during an exit and ack it directly.
I did a prototype patch for this and we saw ~2% performance loss
for doing a kernel build inside the guest. This was before
the optimization of not executing the IPI; with that I would
expect it to be faster.
I wouldn't expect any performance loss when the guest is not active,
as it just checks a cache line that is never written (but
this still needs to be verified).
So this has some overhead, but it would have the advantage
that the user doesn't need to change the affinity, and doesn't
have the potential issue with hashed MSI-X interrupts.
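To make the handshake in (2) more concrete, here is a toy single-threaded model in Python. It is only an illustration of the description above, not the prototype patch: the class and method names are made up, the IPI is modeled as an immediate forced exit of the sibling, and the memory ordering and waiting that real code would need are not modeled at all.

```python
class HyperThread:
    """Per-thread state; each flag would live in its own cache line."""

    def __init__(self):
        self.in_guest = False
        self.in_interrupt = False


class Core:
    """Two SMT siblings that share one L1D cache."""

    def __init__(self):
        self.threads = [HyperThread(), HyperThread()]
        self.ipis_sent = 0

    def irq_enter(self, cpu):
        me, sibling = self.threads[cpu], self.threads[cpu ^ 1]
        me.in_interrupt = True        # publish before looking at the sibling
        if sibling.in_guest:
            self.ipis_sent += 1       # the IPI forces the sibling out of the guest
            sibling.in_guest = False  # modeled as an immediate vmexit

    def irq_exit(self, cpu):
        self.threads[cpu].in_interrupt = False

    def try_vmentry(self, cpu):
        me, sibling = self.threads[cpu], self.threads[cpu ^ 1]
        if sibling.in_interrupt:
            return False              # KVM has to wait and retry later
        me.in_guest = True
        return True
```

In this model a thread that takes an interrupt while its sibling is in the guest sends exactly one IPI, and the sibling's `try_vmentry()` keeps returning False until `irq_exit()` has run on the interrupted thread.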
3. Consider it low risk and allow it
Sensitive data is any user data and any kernel secrets, such as encryption
keys. Kernel pointers are normally not sensitive, except for KASLR
(which is very hard to maintain with L1TF).
Interrupts for modern devices normally don't touch user data directly
because they operate with DMA only, so they only handle descriptors
and similar. So they should not leak any data.
Really old interrupt handlers of course may use PIO, but we don't
expect those to be used anymore.
Another case is soft interrupts. For example, they may run the
network stack, which can in some exceptional cases copy user data
(for example TCP on a retransmit with an MTU change if the device
doesn't support full scatter-gather). However, we expect network
traffic to be already encrypted, so this should be low risk.
Another case is timer handlers. It's hard to say for sure,
but my assumption is that they generally don't directly touch
user data either.
For both timers and the networking stack, the processing also happens
only on the cores that actually initiated a transaction, which
we don't expect cores that only run guests to do frequently.
Of course to really verify this for all interrupt handlers
would be a daunting task. However from a very preliminary
analysis it seems low risk.
In theory, hybrid solutions of (2) and (3) would also be possible.
For example, some interrupts could be whitelisted, and the synchronization
only be done if something not audited runs, or in the case of softirqs
it could be pushed to ksoftirqd.
I'm thinking for most cases recommending (3) may actually be a reasonable
approach, although it is definitely somewhat hand-wavy.
We could do (2) or better a (2)/(3) hybrid (however proper
whitelisting may be a lot of effort). (3) is somewhat
ugly and should probably only be a last resort.
Comments?
-Andi
* [MODERATED] Re: Interrupts policy for SMT
2018-07-05 23:07 [MODERATED] Interrupts policy for SMT Andi Kleen
@ 2018-07-06 12:35 ` Paolo Bonzini
2018-07-06 16:46 ` Andi Kleen
2018-07-06 18:56 ` Jon Masters
1 sibling, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2018-07-06 12:35 UTC (permalink / raw)
To: speck
On 06/07/2018 01:07, speck for Andi Kleen wrote:
> 2. Explicitly force the sibling to exit for interrupts
>
> The basic idea is to maintain the "I am in a guest" and "I am in
> an interrupt" states in two cache lines.
> The sibling always checks it before executing interrupts. If
> it is set, it sends an IPI. KVM also needs to check and wait
> for interrupts before entering the guest.
>
> The IPI isn't actually fully executed because KVM can intercept
> it during an exit and ack it directly.
>
> I did a prototype patch for this and we saw ~2% performance loss
> for doing a kernel build inside the guest. This was before
> the optimization of not executing the IPI; with that I would
> expect it to be faster.
I like the handwavy approach, but I'd like to see the code nevertheless.
Of course it is impossible to handle userspace, but the same handwavy
justifications used for (3) apply even more there, since userspace
should hardly run on a properly configured guest.
Since we are at it, we could add a sticky bit to the "I am in an interrupt"
flag (bit 0 = am I in an interrupt, bit 1 = have I been in an
interrupt since the flag was cleared). Then:
- the interrupt handler writes 3 on entry and 2 on exit;
- KVM writes 0 just before setting IF=1 and checks the sticky bit just
before vmentry to decide whether to do an L1D flush.
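As a rough illustration of the flag values, here is a small Python model of the two bits. The class and method names are invented for the sketch; it only mirrors the writes described above (3 on entry, 2 on exit, 0 from KVM) and ignores nested interrupts and any atomicity concerns.

```python
IN_IRQ = 0b01   # bit 0: an interrupt handler is running right now
STICKY = 0b10   # bit 1: an interrupt has run since the flag was cleared


class IrqFlag:
    """Per-CPU flag as described above (a model, not kernel code)."""

    def __init__(self):
        self.value = 0

    def irq_enter(self):
        self.value = IN_IRQ | STICKY   # handler writes 3 on entry

    def irq_exit(self):
        self.value = STICKY            # handler writes 2 on exit

    def kvm_clear(self):
        self.value = 0                 # KVM, just before setting IF=1

    def needs_l1d_flush(self):
        # KVM, just before vmentry: did any interrupt run since the clear?
        return bool(self.value & STICKY)
```

If no interrupt arrives between `kvm_clear()` and the vmentry check, `needs_l1d_flush()` is False and the flush can be skipped; a single `irq_enter()`/`irq_exit()` pair in between leaves the value at 2 and the check returns True.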
Paolo
* [MODERATED] Re: Interrupts policy for SMT
2018-07-06 12:35 ` [MODERATED] " Paolo Bonzini
@ 2018-07-06 16:46 ` Andi Kleen
2018-07-06 17:13 ` Paolo Bonzini
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2018-07-06 16:46 UTC (permalink / raw)
To: speck
> Of course it is impossible to handle userspace, but the same handwavy
> justifications used for (3) apply even more there, since userspace
> should hardly run on a properly configured guest.
What do you mean? user space on the host would be handled by
the scheduler in this case (e.g. exclusive cpuset)
This proposal was merely about interrupts.
> Since we are at it, we could add a sticky bit to the "I am in an interrupt"
> flag (bit 0 = am I in an interrupt, bit 1 = have I been in an
> interrupt since the flag was cleared). Then:
>
> - the interrupt handler writes 3 on entry and 2 on exit;
>
> - KVM writes 0 just before setting IF=1 and checks the sticky bit just
> before vmentry to decide whether to do an L1D flush.
Not sure what the point of this is, but if we wanted something
like that the easiest would be to maintain an interrupt count
per CPU.
-Andi
* [MODERATED] Re: Interrupts policy for SMT
2018-07-06 16:46 ` Andi Kleen
@ 2018-07-06 17:13 ` Paolo Bonzini
2018-07-06 18:07 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2018-07-06 17:13 UTC (permalink / raw)
To: speck
On 06/07/2018 18:46, speck for Andi Kleen wrote:
>> Of course it is impossible to handle userspace, but the same handwavy
>> justifications used for (3) apply even more there, since userspace
>> should hardly run on a properly configured guest.
> What do you mean? user space on the host would be handled by
> the scheduler in this case (e.g. exclusive cpuset)
I mean KVM userspace (whoever invokes the KVM_RUN ioctl).
> This proposal was merely about interrupts.
Understood.
>> Since we are at it, we could add a sticky bit to the "I am in an interrupt"
>> flag (bit 0 = am I in an interrupt, bit 1 = have I been in an
>> interrupt since the flag was cleared). Then:
>>
>> - the interrupt handler writes 3 on entry and 2 on exit;
>>
>> - KVM writes 0 just before setting IF=1 and checks the sticky bit just
>> before vmentry to decide whether to do an L1D flush.
>
> Not sure what the point of this is, but if we wanted something
> like that the easiest would be to maintain an interrupt count
> per CPU.
The point is to detect interrupts that happen during the processing of
vmexits. "External interrupt" vmexits are easy, but interrupts can
happen even during the processing of other exits.
An interrupt count has to be updated during interrupts, and also
retrieved and compared in KVM. Why not reuse the same flag you're adding?
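For comparison, the counter variant could look roughly like the Python sketch below. The names are hypothetical and the snapshot-and-compare placement is an assumption based on the "retrieved and compared" wording above: the interrupt path only increments, and KVM compares a snapshot taken at the start of exit handling against the value just before vmentry.

```python
class CpuIrqCount:
    """Per-CPU interrupt counter (a model, not kernel code)."""

    def __init__(self):
        self.count = 0

    def irq_enter(self):
        self.count += 1   # updated on every interrupt


def interrupted_during_exit(cpu, handle_exit):
    """Run the exit handler and report whether an interrupt hit meanwhile."""
    snapshot = cpu.count             # retrieved before interrupts are enabled
    handle_exit()                    # exit processing, interrupts may arrive
    return cpu.count != snapshot     # compared just before vmentry
```

Both variants answer the same question, whether an interrupt ran during vmexit processing. The counter needs a read on both sides in KVM, while the sticky bit needs an extra write in KVM but reuses the cache line that already has to be touched.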
Paolo
* [MODERATED] Re: Interrupts policy for SMT
2018-07-06 17:13 ` Paolo Bonzini
@ 2018-07-06 18:07 ` Andi Kleen
2018-07-06 18:52 ` Jon Masters
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2018-07-06 18:07 UTC (permalink / raw)
To: speck
On Fri, Jul 06, 2018 at 07:13:32PM +0200, speck for Paolo Bonzini wrote:
> On 06/07/2018 18:46, speck for Andi Kleen wrote:
> >> Of course it is impossible to handle userspace, but the same handwavy
> >> justifications used for (3) apply even more there, since userspace
> >> should hardly run on a properly configured guest.
> > What do you mean? user space on the host would be handled by
> > the scheduler in this case (e.g. exclusive cpuset)
>
> I mean KVM userspace (whoever invokes the KVM_RUN ioctl).
Ok I assume there's nothing sensitive in there.
But haven't checked any source.
> An interrupt count has to be updated during interrupts, and also
> retrieved and compared in KVM. Why not reuse the same flag you're adding?
My flag is not sticky, so it wouldn't be the same flag I guess.
-Andi
* [MODERATED] Re: Interrupts policy for SMT
2018-07-06 18:07 ` Andi Kleen
@ 2018-07-06 18:52 ` Jon Masters
0 siblings, 0 replies; 8+ messages in thread
From: Jon Masters @ 2018-07-06 18:52 UTC (permalink / raw)
To: speck
On 07/06/2018 02:07 PM, speck for Andi Kleen wrote:
> On Fri, Jul 06, 2018 at 07:13:32PM +0200, speck for Paolo Bonzini wrote:
>> On 06/07/2018 18:46, speck for Andi Kleen wrote:
>>>> Of course it is impossible to handle userspace, but the same handwavy
>>>> justifications used for (3) apply even more there, since userspace
>>>> should hardly run on a properly configured guest.
>>> What do you mean? user space on the host would be handled by
>>> the scheduler in this case (e.g. exclusive cpuset)
>>
>> I mean KVM userspace (whoever invokes the KVM_RUN ioctl).
>
> Ok I assume there's nothing sensitive in there.
Therein lies part of the problem :) We mentioned before that if there's
a full Linux stack there's a lot in the KVM userspace of interest :)
Jon.
--
Computer Architect | Sent from my Fedora powered laptop
* [MODERATED] Re: Interrupts policy for SMT
2018-07-05 23:07 [MODERATED] Interrupts policy for SMT Andi Kleen
2018-07-06 12:35 ` [MODERATED] " Paolo Bonzini
@ 2018-07-06 18:56 ` Jon Masters
2018-07-11 14:31 ` Jon Masters
1 sibling, 1 reply; 8+ messages in thread
From: Jon Masters @ 2018-07-06 18:56 UTC (permalink / raw)
To: speck
On 07/05/2018 07:07 PM, speck for Andi Kleen wrote:
> 2. Explicitly force the sibling to exit for interrupts
>
> The basic idea is to maintain the "I am in a guest" and "I am in
> an interrupt" states in two cache lines.
> The sibling always checks it before executing interrupts. If
> it is set, it sends an IPI. KVM also needs to check and wait
> for interrupts before entering the guest.
So guest A receives an interrupt, notices that an IPI needs to be sent
and interrupts sibling thread B. Then B waits until some later time
before re-entering its guest at the same time as A? How do you do the
lock-step entry? Do you send a second IPI or spin on some lock value
that's in this shared cache line you talk about?
Can you provide a worked example?
Jon.
--
Computer Architect | Sent from my Fedora powered laptop
* [MODERATED] Re: Interrupts policy for SMT
2018-07-06 18:56 ` Jon Masters
@ 2018-07-11 14:31 ` Jon Masters
0 siblings, 0 replies; 8+ messages in thread
From: Jon Masters @ 2018-07-11 14:31 UTC (permalink / raw)
To: speck
On 07/06/2018 02:56 PM, speck for Jon Masters wrote:
> On 07/05/2018 07:07 PM, speck for Andi Kleen wrote:
>
>> 2. Explicitly force the sibling to exit for interrupts
>>
>> The basic idea is to maintain the "I am in a guest" and "I am in
>> an interrupt" states in two cache lines.
>> The sibling always checks it before executing interrupts. If
>> it is set, it sends an IPI. KVM also needs to check and wait
>> for interrupts before entering the guest.
>
> So guest A receives an interrupt, notices that an IPI needs to be sent
> and interrupts sibling thread B. Then B waits until some later time
> before re-entering its guest at the same time as A? How do you do the
> lock-step entry? Do you send a second IPI or spin on some lock value
> that's in this shared cache line you talk about?
>
> Can you provide a worked example?
Andi?
--
Computer Architect | Sent from my Fedora powered laptop