From: Andrew Jones <drjones@redhat.com>
To: kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
Cc: marc.zyngier@arm.com, cdall@linaro.org, pbonzini@redhat.com
Subject: [PATCH v3 02/10] KVM: Add documentation for VCPU requests
Date: Wed, 3 May 2017 18:06:27 +0200 [thread overview]
Message-ID: <20170503160635.21669-3-drjones@redhat.com> (raw)
In-Reply-To: <20170503160635.21669-1-drjones@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
Documentation/virtual/kvm/vcpu-requests.rst | 269 ++++++++++++++++++++++++++++
1 file changed, 269 insertions(+)
create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
new file mode 100644
index 000000000000..d74616d7999a
--- /dev/null
+++ b/Documentation/virtual/kvm/vcpu-requests.rst
@@ -0,0 +1,269 @@
+=================
+KVM VCPU Requests
+=================
+
+Overview
+========
+
+KVM supports an internal API enabling threads to request a VCPU thread to
+perform some activity. For example, a thread may request a VCPU to flush
+its TLB with a VCPU request. The API consists of the following functions::
+
+ /* Check if any requests are pending for VCPU @vcpu. */
+ bool kvm_request_pending(struct kvm_vcpu *vcpu);
+
+ /* Check if VCPU @vcpu has request @req pending. */
+ bool kvm_test_request(int req, struct kvm_vcpu *vcpu);
+
+ /* Clear request @req for VCPU @vcpu. */
+ void kvm_clear_request(int req, struct kvm_vcpu *vcpu);
+
+ /*
+ * Check if VCPU @vcpu has request @req pending. When the request is
+ * pending it will be cleared and a memory barrier, which pairs with
+ * another in kvm_make_request(), will be issued.
+ */
+ bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
+
+ /*
+ * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
+ * with another in kvm_check_request(), prior to setting the request.
+ */
+ void kvm_make_request(int req, struct kvm_vcpu *vcpu);
+
+ /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
+ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+
+Typically a requester wants the VCPU to perform the activity as soon
+as possible after making the request. This means most requests
+(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
+and kvm_make_all_cpus_request() has the kicking of all VCPUs built
+into it.
+
+VCPU Kicks
+----------
+
+The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
+order to perform some KVM maintenance. To do so, an IPI is sent, forcing
+a guest mode exit. However, a VCPU thread may not be in guest mode at the
+time of the kick. Therefore, depending on the mode and state of the VCPU
+thread, there are two other actions a kick may take. All three actions
+are listed below:
+
+1) Send an IPI. This forces a guest mode exit.
+2) Waking a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest
+ mode that wait on waitqueues. Waking them removes the threads from
+ the waitqueues, allowing the threads to run again. This behavior
+ may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
+3) Nothing. When the VCPU is not in guest mode and the VCPU thread is not
+ sleeping, then there is nothing to do.
+
+VCPU Mode
+---------
+
+VCPUs have a mode state, vcpu->mode, that is used to track whether the
+guest is running in guest mode or not, as well as some specific
+outside guest mode states. The architecture may use vcpu->mode to ensure
+VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"), as
+well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and even
+to ensure IPI acknowledgements are waited upon (see "Waiting for
+Acknowledgements"). The following modes are defined:
+
+OUTSIDE_GUEST_MODE
+
+ The VCPU thread is outside guest mode.
+
+IN_GUEST_MODE
+
+ The VCPU thread is in guest mode.
+
+EXITING_GUEST_MODE
+
+ The VCPU thread is transitioning from IN_GUEST_MODE to
+ OUTSIDE_GUEST_MODE.
+
+READING_SHADOW_PAGE_TABLES
+
+ The VCPU thread is outside guest mode and wants certain VCPU requests,
+ namely KVM_REQ_TLB_FLUSH, to be delayed until it's done reading the
+ page tables.
+
+VCPU Request Internals
+======================
+
+VCPU requests are simply bit indices of the vcpu->requests bitmap. This
+means general bitops, like those documented in [atomic-ops]_ could also be
+used, e.g. ::
+
+ clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
+
+However, VCPU request users should refrain from doing so, as it would
+break the abstraction. The first 8 bits are reserved for architecture
+independent requests, all additional bits are available for architecture
+dependent requests.
+
+Architecture Independent Requests
+---------------------------------
+
+KVM_REQ_TLB_FLUSH
+
+ KVM's common MMU notifier may need to flush all of a guest's TLB
+ entries, calling kvm_flush_remote_tlbs() to do so. Architectures that
+ choose to use the common kvm_flush_remote_tlbs() implementation will
+ need to handle this VCPU request.
+
+KVM_REQ_MMU_RELOAD
+
+ When shadow page tables are used and memory slots are removed it's
+ necessary to inform each VCPU to completely refresh the tables. This
+ request is used for that.
+
+KVM_REQ_PENDING_TIMER
+
+ This request may be made from a timer handler run on the host on behalf
+ of a VCPU. It informs the VCPU thread to inject a timer interrupt.
+
+KVM_REQ_UNHALT
+
+ This request may be made from the KVM common function kvm_vcpu_block(),
+ which is used to emulate an instruction that causes a CPU to halt until
+ one of an architectural specific set of events and/or interrupts is
+ received (determined by checking kvm_arch_vcpu_runnable()). When that
+ event or interrupt arrives kvm_vcpu_block() makes the request. This is
+ in contrast to when kvm_vcpu_block() returns due to any other reason,
+ such as a pending signal, which does not indicate the VCPU's halt
+ emulation should stop, and therefore does not make the request.
+
+KVM_REQUEST_MASK
+----------------
+
+VCPU requests should be masked by KVM_REQUEST_MASK before using them with
+bitops. This is because only the lower 8 bits are used to represent the
+request's number. The upper bits are reserved, and may be used as flags.
+
+VCPU Request Flags
+------------------
+
+KVM_REQUEST_NO_WAKEUP
+
+ This flag is applied to a request that does not need immediate
+ attention. When a request does not need immediate attention, and the
+ VCPU's thread is outside guest mode sleeping, then the thread is not
+ awaken by a kick.
+
+KVM_REQUEST_WAIT
+
+ When requests with this flag are made with kvm_make_all_cpus_request(),
+ then the caller will wait for each VCPU to acknowledge the IPI before
+ proceeding.
+
+VCPU Requests with Associated State
+===================================
+
+Requesters that want the receiving VCPU to handle new state need to ensure
+the newly written state is observable to the receiving VCPU thread's CPU
+by the time it observes the request. This means a write memory barrier
+must be inserted after writing the new state and before setting the VCPU
+request bit. Additionally, on the receiving VCPU thread's side, a
+corresponding read barrier must be inserted after reading the request bit
+and before proceeding to read the new state associated with it. See
+scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
+[memory-barriers]_.
+
+The pair of functions, kvm_check_request() and kvm_make_request(), provide
+the memory barriers, allowing this requirement to be handled internally by
+the API.
+
+Ensuring Requests Are Seen
+==========================
+
+When making requests to VCPUs, we want to avoid the receiving VCPU
+executing in guest mode for an arbitrary long time without handling the
+request. We can be sure this won't happen as long as we ensure the VCPU
+thread checks kvm_request_pending() before entering guest mode and that a
+kick will send an IPI when necessary. Extra care must be taken to cover
+the period after the VCPU thread's last kvm_request_pending() check and
+before it has entered guest mode, as kick IPIs will only trigger VCPU run
+loops for VCPU threads that are in guest mode or at least have already
+disabled interrupts in order to prepare to enter guest mode. This means
+that an optimized implementation (see "IPI Reduction") must be certain
+when it's safe to not send the IPI. One solution, which all architectures
+except s390 apply, is to set vcpu->mode to IN_GUEST_MODE prior to the last
+kvm_request_pending() check and to rely on memory barrier guarantees.
+
+With memory barriers we can exclude the possibility of a VCPU thread
+observing !kvm_request_pending() on its last check and then not receiving
+an IPI for the next request made of it, even if the request is made
+immediately after the check. This is done by way of the Dekker memory
+barrier pattern (scenario 10 of [lwn-mb]_). As the Dekker pattern
+requires two variables, this solution pairs vcpu->mode with
+vcpu->requests. Substituting them into the pattern gives::
+
+ CPU1 CPU2
+ ================= =================
+ local_irq_disable();
+ WRITE_ONCE(vcpu->mode, IN_GUEST_MODE); kvm_make_request(REQ, vcpu);
+ smp_mb(); smp_mb();
+ if (kvm_request_pending(vcpu)) { if (READ_ONCE(vcpu->mode) ==
+ IN_GUEST_MODE) {
+ ...abort guest entry... ...send IPI...
+ } }
+
+As stated above, the IPI is only useful for VCPU threads in guest mode or
+that have already disabled interrupts. This is why this specific case of
+the Dekker pattern has been extended to disable interrupts before setting
+vcpu->mode to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to
+pedantically implement the memory barrier pattern, guaranteeing the
+compiler doesn't interfere with vcpu->mode's carefully planned accesses.
+
+IPI Reduction
+-------------
+
+As only one IPI is needed to get a VCPU to check for any/all requests,
+then they may be coalesced. This is easily done by having the first IPI
+sending kick also change the VCPU mode to something !IN_GUEST_MODE. The
+transitional state, EXITING_GUEST_MODE, is used for this purpose.
+
+Waiting for Acknowledgements
+----------------------------
+
+Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
+be sent, and the acknowledgements to be waited upon, even when the target
+VCPU threads are in modes other than IN_GUEST_MODE. For example, one case
+is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
+is set after disabling interrupts. For these cases, the "should send an
+IPI" condition becomes READ_ONCE(vcpu->mode) != OUTSIDE_GUEST_MODE.
+
+Request-less VCPU Kicks
+-----------------------
+
+As the determination of whether or not to send an IPI depends on the
+two-variable Dekker memory barrier pattern, then it's clear that
+request-less VCPU kicks are almost never correct. Without the assurance
+that a non-IPI generating kick will still result in an action by the
+receiving VCPU, as the final kvm_request_pending() check does for
+request-accompanying kicks, then the kick may not do anything useful at
+all. If, for instance, a request-less kick was made to a VCPU that was
+just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
+the VCPU thread may continue its entry without actually having done
+whatever it was the kick was meant to initiate.
+
+Additional Considerations
+=========================
+
+Sleeping VCPUs
+--------------
+
+VCPU threads may need to consider requests before and/or after calling
+functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they
+do or not, and, if they do, which requests need consideration, is
+architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
+to check if it should awaken. One reason to do so is to provide
+architectures a function where requests may be checked if necessary.
+
+References
+==========
+
+.. [atomic-ops] Documentation/core-api/atomic_ops.rst
+.. [memory-barriers] Documentation/memory-barriers.txt
+.. [lwn-mb] https://lwn.net/Articles/573436/
--
2.9.3
next prev parent reply other threads:[~2017-05-03 16:06 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
2017-05-03 16:06 ` [PATCH v3 01/10] KVM: add kvm_request_pending Andrew Jones
2017-05-03 16:06 ` Andrew Jones [this message]
2017-05-04 11:27 ` [PATCH v3 02/10] KVM: Add documentation for VCPU requests Paolo Bonzini
2017-05-04 12:06 ` Andrew Jones
2017-05-04 12:51 ` Paolo Bonzini
2017-05-04 13:31 ` Andrew Jones
2017-05-03 16:06 ` [PATCH v3 03/10] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
2017-05-03 16:06 ` [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu Andrew Jones
2017-05-06 18:08 ` Christoffer Dall
2017-05-09 17:02 ` Andrew Jones
2017-05-10 9:59 ` Christoffer Dall
2017-05-15 11:14 ` Christoffer Dall
2017-05-16 2:17 ` Andrew Jones
2017-05-16 10:06 ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller Andrew Jones
2017-05-06 18:12 ` Christoffer Dall
2017-05-09 17:17 ` Andrew Jones
2017-05-10 9:55 ` Christoffer Dall
2017-05-10 10:07 ` Andrew Jones
2017-05-10 12:19 ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off Andrew Jones
2017-05-06 18:17 ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN Andrew Jones
2017-05-06 18:27 ` Christoffer Dall
2017-05-09 17:40 ` Andrew Jones
2017-05-09 20:13 ` Christoffer Dall
2017-05-10 6:58 ` Andrew Jones
2017-05-10 8:07 ` Christoffer Dall
2017-05-10 8:20 ` Andrew Jones
2017-05-10 9:06 ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request Andrew Jones
2017-05-04 11:38 ` Paolo Bonzini
2017-05-04 12:07 ` Andrew Jones
2017-05-03 16:06 ` [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection Andrew Jones
2017-05-04 11:47 ` Paolo Bonzini
2017-05-06 18:49 ` Christoffer Dall
2017-05-08 8:48 ` Paolo Bonzini
2017-05-08 8:56 ` Christoffer Dall
2017-05-06 18:51 ` Christoffer Dall
2017-05-09 17:53 ` Andrew Jones
2017-05-03 16:06 ` [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
2017-05-06 18:55 ` Christoffer Dall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170503160635.21669-3-drjones@redhat.com \
--to=drjones@redhat.com \
--cc=cdall@linaro.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=marc.zyngier@arm.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox