stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, "Rik van Riel" <riel@redhat.com>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>
Subject: [PATCH 4.14 01/27] x86,kvm: move qemu/guest FPU switching out to vcpu_run
Date: Tue, 15 Jan 2019 17:35:50 +0100	[thread overview]
Message-ID: <20190115154901.275881593@linuxfoundation.org> (raw)
In-Reply-To: <20190115154901.189747728@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Rik van Riel <riel@redhat.com>

commit f775b13eedee2f7f3c6fdd4e90fb79090ce5d339 upstream.

Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.

This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:

    [258270.527947]  __warn+0xcb/0xf0
    [258270.527948]  warn_slowpath_null+0x1d/0x20
    [258270.527951]  kernel_fpu_disable+0x3f/0x50
    [258270.527953]  __kernel_fpu_begin+0x49/0x100
    [258270.527955]  kernel_fpu_begin+0xe/0x10
    [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
    [258270.527961]  crypto_shash_update+0x3f/0x110
    [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
    [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
    [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
    [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
    [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
    [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
    [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
    [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
    [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
    [258270.528002]  shrink_slab.part.40+0x1f5/0x420
    [258270.528004]  shrink_node+0x22c/0x320
    [258270.528006]  do_try_to_free_pages+0xf5/0x330
    [258270.528008]  try_to_free_pages+0xe9/0x190
    [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
    [258270.528011]  __alloc_pages_nodemask+0x209/0x260
    [258270.528014]  alloc_pages_vma+0x1f1/0x250
    [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
    [258270.528021]  handle_mm_fault+0xfd3/0x1330
    [258270.528025]  __get_user_pages+0x113/0x640
    [258270.528027]  get_user_pages+0x4f/0x60
    [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
    [258270.528108]  try_async_pf+0x66/0x230 [kvm]
    [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
    [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
    [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
    [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
 which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/kvm_host.h |   13 +++++++++++++
 arch/x86/kvm/x86.c              |   34 +++++++++++++---------------------
 include/linux/kvm_host.h        |    2 +-
 3 files changed, 27 insertions(+), 22 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -539,7 +539,20 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu_memory_cache mmu_page_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
 
+	/*
+	 * QEMU userspace and the guest each have their own FPU state.
+	 * In vcpu_run, we switch between the user and guest FPU contexts.
+	 * While running a VCPU, the VCPU thread will have the guest FPU
+	 * context.
+	 *
+	 * Note that while the PKRU state lives inside the fpu registers,
+	 * it is switched out separately at VMENTER and VMEXIT time. The
+	 * "guest_fpu" state here contains the guest FPU context, with the
+	 * host PRKU bits.
+	 */
+	struct fpu user_fpu;
 	struct fpu guest_fpu;
+
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 	u32 guest_xstate_size;
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3020,7 +3020,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	pagefault_enable();
 	kvm_x86_ops->vcpu_put(vcpu);
-	kvm_put_guest_fpu(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
 	/*
 	 * If userspace has set any breakpoints or watchpoints, dr6 is restored
@@ -5377,13 +5376,10 @@ static void emulator_halt(struct x86_emu
 
 static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
 {
-	preempt_disable();
-	kvm_load_guest_fpu(emul_to_vcpu(ctxt));
 }
 
 static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)
 {
-	preempt_enable();
 }
 
 static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
@@ -7083,7 +7079,6 @@ static int vcpu_enter_guest(struct kvm_v
 	preempt_disable();
 
 	kvm_x86_ops->prepare_guest_switch(vcpu);
-	kvm_load_guest_fpu(vcpu);
 
 	/*
 	 * Disable IRQs before setting IN_GUEST_MODE.  Posted interrupt
@@ -7428,12 +7423,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
 		}
 	}
 
+	kvm_load_guest_fpu(vcpu);
+
 	if (unlikely(vcpu->arch.complete_userspace_io)) {
 		int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
 		vcpu->arch.complete_userspace_io = NULL;
 		r = cui(vcpu);
 		if (r <= 0)
-			goto out;
+			goto out_fpu;
 	} else
 		WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
 
@@ -7442,6 +7439,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
 	else
 		r = vcpu_run(vcpu);
 
+out_fpu:
+	kvm_put_guest_fpu(vcpu);
 out:
 	kvm_put_guest_fpu(vcpu);
 	post_kvm_run_save(vcpu);
@@ -7865,32 +7864,25 @@ static void fx_init(struct kvm_vcpu *vcp
 	vcpu->arch.cr0 |= X86_CR0_ET;
 }
 
+/* Swap (qemu) user FPU context for the guest FPU context. */
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->guest_fpu_loaded)
-		return;
-
-	/*
-	 * Restore all possible states in the guest,
-	 * and assume host would use all available bits.
-	 * Guest xcr0 would be loaded later.
-	 */
-	vcpu->guest_fpu_loaded = 1;
-	__kernel_fpu_begin();
+	preempt_disable();
+	copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
 	/* PKRU is separately restored in kvm_x86_ops->run.  */
 	__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
 				~XFEATURE_MASK_PKRU);
+	preempt_enable();
 	trace_kvm_fpu(1);
 }
 
+/* When vcpu_run ends, restore user space FPU context. */
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->guest_fpu_loaded)
-		return;
-
-	vcpu->guest_fpu_loaded = 0;
+	preempt_disable();
 	copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
-	__kernel_fpu_end();
+	copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+	preempt_enable();
 	++vcpu->stat.fpu_reload;
 	trace_kvm_fpu(0);
 }
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -232,7 +232,7 @@ struct kvm_vcpu {
 	struct mutex mutex;
 	struct kvm_run *run;
 
-	int guest_fpu_loaded, guest_xcr0_loaded;
+	int guest_xcr0_loaded;
 	struct swait_queue_head wq;
 	struct pid __rcu *pid;
 	int sigset_active;



  reply	other threads:[~2019-01-15 16:58 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-15 16:35 [PATCH 4.14 00/27] 4.14.94-stable review Greg Kroah-Hartman
2019-01-15 16:35 ` Greg Kroah-Hartman [this message]
2019-01-15 16:35 ` [PATCH 4.14 02/27] x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 03/27] ALSA: hda/realtek - Support Dell headset mode for New AIO platform Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 04/27] ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225 Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 05/27] ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225 Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 06/27] CIFS: Fix adjustment of credits for MTU requests Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 07/27] CIFS: Do not hide EINTR after sending network packets Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 08/27] cifs: Fix potential OOB access of lock element array Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 09/27] usb: cdc-acm: send ZLP for Telit 3G Intel based modems Greg Kroah-Hartman
2019-01-15 16:35 ` [PATCH 4.14 10/27] USB: storage: dont insert sane sense for SPC3+ when bad sense specified Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 11/27] USB: storage: add quirk for SMI SM3350 Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 12/27] USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 13/27] slab: alien caches must not be initialized if the allocation of the alien cache failed Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 14/27] mm: page_mapped: dont assume compound page is huge or THP Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 15/27] mm, memcg: fix reclaim deadlock with writeback Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 16/27] ACPI: power: Skip duplicate power resource references in _PRx Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 17/27] ACPI / PMIC: xpower: Fix TS-pin current-source handling Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 18/27] i2c: dev: prevent adapter retries and timeout being set as minus value Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 19/27] drm/fb-helper: Partially bring back workaround for bugs of SDL 1.2 Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 20/27] rbd: dont return 0 on unmap if RBD_DEV_FLAG_REMOVING is set Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 21/27] ext4: make sure enough credits are reserved for dioread_nolock writes Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 22/27] ext4: fix a potential fiemap/page fault deadlock w/ inline_data Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 23/27] ext4: avoid kernel warning when writing the superblock to a dead device Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 24/27] ext4: use ext4_write_inode() when fsyncing w/o a journal Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 25/27] ext4: track writeback errors using the generic tracking infrastructure Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 26/27] sunrpc: use-after-free in svc_process_common() Greg Kroah-Hartman
2019-01-15 16:36 ` [PATCH 4.14 27/27] KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less Greg Kroah-Hartman
2019-01-16  1:37 ` [PATCH 4.14 00/27] 4.14.94-stable review shuah
2019-01-16  9:25 ` Jon Hunter
2019-01-16 16:02   ` Greg Kroah-Hartman
2019-01-16 16:56     ` Jon Hunter
2019-01-16 17:11       ` Greg Kroah-Hartman
2019-01-16 17:38         ` Jon Hunter
2019-01-16 17:47           ` Greg Kroah-Hartman
2019-01-16 11:51 ` Naresh Kamboju
2019-01-16 20:37 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190115154901.275881593@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=borntraeger@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=riel@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).