[MODERATED] Interrupts policy for SMT

From: Andi Kleen <ak@linux.intel.com>
To: speck@linutronix.de
Subject: [MODERATED] Interrupts policy for SMT
Date: Thu, 5 Jul 2018 16:07:32 -0700
Message-ID: <20180705230732.GJ17013@tassilo.jf.intel.com>

Hi,

When SMT is on for a guest and there is an interrupt on the sibling thread,
the guest may be able to look at the data processed by the interrupt through L1TF.

This is mainly an issue when SMT is on, but the guest is confined
to a small number of exclusive cores that don't run any other processes,
e.g. through an exclusive cpuset.

There are three possible solutions for this:

1. Ask the user to move interrupt affinity away to other cores
2. Explicitly force the sibling to exit when an interrupt is processed
3. Consider it low risk and allow it

1. Ask the user to change affinity when running confined guests

This is somewhat complicated for users. They would
need a script that moves all interrupts away from the
set of cores used by VMs through /proc/irq/*/smp_affinity.
It would be possible to provide a tool that does this.
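
As a rough illustration of what such a tool could look like, here is a
minimal Python sketch. The function names (parse_mask, format_mask,
mask_without, move_irqs_away) are made up for this example. It assumes
the usual smp_affinity format of a hex CPU mask in comma-separated
32-bit groups, that the caller passes both SMT threads of every guest
core, and that it runs as root. Interrupts whose affinity the kernel
refuses to change are skipped, and an interrupt is left alone if the
change would leave it with no CPU at all.

```python
import glob
import os


def parse_mask(text):
    """Parse a hex CPU mask such as 'ff' or 'ff,ffffffff' into an int."""
    return int(text.strip().replace(",", ""), 16)


def format_mask(mask):
    """Format an int as comma-separated 32-bit hex groups, high group first."""
    groups = []
    while True:
        groups.append("%08x" % (mask & 0xFFFFFFFF))
        mask >>= 32
        if not mask:
            break
    return ",".join(reversed(groups))


def mask_without(mask, guest_cpus):
    """Clear the bits of all CPUs reserved for guests."""
    for cpu in guest_cpus:
        mask &= ~(1 << cpu)
    return mask


def move_irqs_away(guest_cpus, proc_irq="/proc/irq"):
    """Rewrite every IRQ's affinity so it avoids guest_cpus. Returns moved paths."""
    moved = []
    for path in glob.glob(os.path.join(proc_irq, "[0-9]*", "smp_affinity")):
        with open(path) as f:
            old = parse_mask(f.read())
        new = mask_without(old, guest_cpus)
        if new == 0 or new == old:
            # Nothing to do, or no CPU would be left for this interrupt.
            continue
        try:
            with open(path, "w") as f:
                f.write(format_mask(new))
            moved.append(path)
        except OSError:
            # The kernel rejected the change (e.g. a fixed per-CPU vector).
            pass
    return moved
```

For example, move_irqs_away([2, 3, 6, 7]) would try to keep interrupts
off CPUs 2, 3, 6 and 7. New interrupts allocated later would need the
same treatment.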

It also implies that at least one core needs to be left for interrupt
processing.

For MSI-X drivers that bind an interrupt to every core and don't
allow overriding it, this wouldn't work. However, those typically
only process interrupts on CPUs that actually initiated the IO,
and cores that only run guests shouldn't initiate any.

One case where this might be violated is when a network driver uses a
hash function to select the queue, and thus the CPU. However, we expect network
traffic to be encrypted anyway, so the risk of leaking something
here should be low.

2. Explicitly force the sibling to exit for interrupts

The basic idea is to maintain the "I am in a guest" and "I am in
an interrupt" states in two cache lines.
The sibling always checks the guest state before executing interrupts. If
it is set, it sends an IPI. KVM also needs to check the interrupt state
and wait for interrupts to finish before entering the guest.

The IPI isn't actually fully executed because KVM can intercept
it during an exit and ack it directly.
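
To make the handshake described above concrete, here is a small
single-threaded Python model of it. This is not the prototype patch,
only a sketch of the state machine: the class and method names are
invented, the two attributes stand in for the two cache lines, and
send_ipi is a callback supplied by the caller. It also assumes that the
interrupt side only proceeds once the guest has left, which the
description above does not spell out. A real implementation would need
per-CPU data and memory barriers between setting one's own flag and
reading the other side's flag.

```python
class SiblingPair:
    """Model of the two per-core flags shared by a pair of SMT threads."""

    def __init__(self, send_ipi):
        self.in_guest = False      # written by the thread running KVM
        self.in_interrupt = False  # written by the sibling thread
        self.send_ipi = send_ipi   # callback that forces a guest exit

    def irq_enter(self):
        """Sibling side: call before running an interrupt handler."""
        # Set our own flag first, then look at the other side's flag.
        self.in_interrupt = True
        if self.in_guest:
            self.send_ipi()

    def irq_exit(self):
        """Sibling side: call after the interrupt handler has finished."""
        self.in_interrupt = False

    def try_enter_guest(self):
        """KVM side: return True if it is safe to enter the guest now."""
        self.in_guest = True
        if self.in_interrupt:
            # An interrupt is running on the sibling: back off and retry.
            self.in_guest = False
            return False
        return True

    def guest_exit(self):
        """KVM side: call on every exit from the guest."""
        self.in_guest = False
```

In this model, when no guest is active irq_enter() only reads in_guest
and never calls send_ipi, which corresponds to the case where no
overhead is expected.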

I did a prototype patch for this and we saw ~2% performance loss
for a kernel build inside the guest. This was before
the optimization of not executing the IPI; with that I would
expect it to be faster.

I wouldn't expect any performance loss when the guest is not active,
as it just checks a cache line that is never written (but
this still needs to be verified).

So this has some overhead, but it would have the advantage
that the user doesn't need to change the affinity, and doesn't
have the potential issue with hashed MSI-X interrupts.

3. Consider it low risk and allow it

Sensitive data is any user data and any kernel secrets, such as encryption
keys. Ordinary kernel pointers are normally not sensitive, except for KASLR
(which is very hard to maintain with L1TF).

Interrupts for modern devices normally don't touch user data directly 
because they operate with DMA only, so they only handle descriptors
and similar. So they should not leak any data.

Really old interrupt handlers of course may use PIO, but we don't
expect those to be used anymore.

Another case is soft interrupts. For example, they may run the
network stack, which can in some exceptional cases copy user data
(for example TCP on a retransmit with an MTU change if the device
doesn't support full scatter-gather). However, we expect network
traffic to be already encrypted, so this should be low risk.

Another case is timer handlers. It's hard to say for sure,
but my assumption is that they generally don't directly touch
user data either. 

For both timers and the networking stack, the processing also happens
only on the cores that actually initiated a transaction, which
we don't expect cores that only run guests to do frequently.

Of course, really verifying this for all interrupt handlers
would be a daunting task. However, from a very preliminary
analysis it seems low risk.

In theory hybrid solutions of (2) and (3) would also be possible.
For example, some interrupts could be whitelisted, and the synchronization
only done if something not audited runs, or in the case of softirqs
it could be pushed to ksoftirqd.
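
The decision such a hybrid would make per interrupt can be sketched in
a few lines of Python. Everything here is hypothetical: the handler
names in the whitelist, the function name and the returned action
strings are placeholders chosen for the example, not existing kernel
interfaces.

```python
# Hypothetical set of handlers that have been audited and found not to
# touch sensitive data.
AUDITED_HANDLERS = frozenset({"example_nic_irq", "example_nvme_irq"})


def action_for(handler, is_softirq, sibling_in_guest):
    """Pick what to do before running a handler on this SMT thread."""
    if not sibling_in_guest:
        return "run"                 # nobody can observe us via L1TF
    if handler in AUDITED_HANDLERS:
        return "run"                 # option (3): audited, considered low risk
    if is_softirq:
        return "defer_to_ksoftirqd"  # push unaudited softirq work to ksoftirqd
    return "sync_then_run"           # option (2): force the sibling out first
```

The effort is in building and maintaining the audited set, not in the
check itself.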

I'm thinking that for most cases recommending (3) may actually be a reasonable
approach, although it is definitely somewhat hand-wavy.

We could do (2) or, better, a (2)/(3) hybrid (however, proper
whitelisting may be a lot of effort). (3) is somewhat
ugly and should probably only be a last resort.

Comments?

-Andi
