All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>, kvm <kvm@vger.kernel.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Joao Martins <joao.m.martins@oracle.com>,
	"jmattson @ google . com" <jmattson@google.com>,
	"wanpengli @ tencent . com" <wanpengli@tencent.com>,
	"vkuznets @ redhat . com" <vkuznets@redhat.com>,
	"mtosatti @ redhat . com" <mtosatti@redhat.com>,
	"joro @ 8bytes . org" <joro@8bytes.org>,
	karahmed@amazon.com, butt3rflyh4ck <butterflyhuangxx@gmail.com>
Subject: Re: [PATCH v6 0/6] x86/xen: Add in-kernel Xen event channel delivery
Date: Thu, 23 Dec 2021 21:24:55 +0000	[thread overview]
Message-ID: <YcTpJ369cRBN4W93@google.com> (raw)
In-Reply-To: <e2ed79e6-612a-44a3-d77b-297135849656@redhat.com>

On Thu, Dec 23, 2021, Paolo Bonzini wrote:
> On 12/22/21 16:18, David Woodhouse wrote:
> > > > n a mapped
> > > > shared info page.
> > > > 
> > > > This can be used for routing MSI of passthrough devices to PIRQ event
> > > > channels in a Xen guest, and we can build on it for delivering IPIs and
> > > > timers directly from the kernel too.
> > > Queued patches 1-5, thanks.
> > > 
> > > Paolo
> > Any word on when these will make it out to kvm.git? Did you find
> > something else I need to fix?
> 
> I got a WARN when testing the queue branch, now I'm bisecting.

Was it perhaps this WARN?

  WARNING: CPU: 5 PID: 2180 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3163 mark_page_dirty_in_slot+0x6b/0x80 [kvm]
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 5 PID: 2180 Comm: hyperv_clock Not tainted 5.16.0-rc4+ #81
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:mark_page_dirty_in_slot+0x6b/0x80 [kvm]
   kvm_write_guest+0x117/0x120 [kvm]
   kvm_hv_invalidate_tsc_page+0x99/0xf0 [kvm]
   kvm_arch_vm_ioctl+0x20f/0xb60 [kvm]
   kvm_vm_ioctl+0x711/0xd50 [kvm]
   __x64_sys_ioctl+0x7f/0xb0
   do_syscall_64+0x38/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

If so, it was clearly introduced by commit 03c0304a86bc ("KVM: Warn if mark_page_dirty()
is called without an active vCPU").  But that change is 100% correct as it's trivial
to crash KVM without its protection.  E.g. running hyper_clock with these mods

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 2dd78c1db4b6..ae0d0490580a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -371,6 +371,8 @@ struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
        vm_create_irqchip(vm);
 #endif

+       vm_enable_dirty_ring(vm, 0x10000);
+
        for (i = 0; i < nr_vcpus; ++i) {
                uint32_t vcpuid = vcpuids ? vcpuids[i] : i;

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
index e0b2bb1339b1..1032695e3901 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
@@ -212,6 +212,8 @@ int main(void)
        vm = vm_create_default(VCPU_ID, 0, guest_main);
        run = vcpu_state(vm, VCPU_ID);

+       vm_mem_region_set_flags(vm, 0, KVM_MEM_LOG_DIRTY_PAGES);
+
        vcpu_set_hv_cpuid(vm, VCPU_ID);

        tsc_page_gva = vm_vaddr_alloc_page(vm);

Triggers a NULL pointer deref.

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 12959a067 P4D 12959a067 PUD 129594067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 5 PID: 1784 Comm: hyperv_clock Not tainted 5.16.0-rc4+ #82
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_dirty_ring_get+0xe/0x30 [kvm]
   mark_page_dirty_in_slot.part.0+0x30/0x60 [kvm]
   kvm_write_guest+0x117/0x130 [kvm]
   kvm_hv_invalidate_tsc_page+0x99/0xf0 [kvm]
   kvm_arch_vm_ioctl+0x20f/0xb60 [kvm]
   kvm_vm_ioctl+0x711/0xd50 [kvm]
   __x64_sys_ioctl+0x7f/0xb0
   do_syscall_64+0x38/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Commit e880c6ea55b9 ("KVM: x86: hyper-v: Prevent using not-yet-updated TSC page
by secondary CPUs") is squarely to blame as it was added after dirty ring, though
in Vitaly's defense, David put it best: "That's a fairly awful bear trap".

Assuming there's nothing special about vcpu0's clock, I belive the below fixes
all issues.  If not, then the correct fix is to have vcpu0 block all other vCPUs
until the update completes.

---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/kvm/hyperv.c           | 39 ++++++++-------------------------
 arch/x86/kvm/hyperv.h           |  2 +-
 arch/x86/kvm/x86.c              |  7 +++---
 4 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5d97f4adc1cb..f85df83e3083 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -971,10 +971,11 @@ enum hv_tsc_page_status {
 	HV_TSC_PAGE_HOST_CHANGED,
 	/* TSC page was properly set up and is currently active  */
 	HV_TSC_PAGE_SET,
-	/* TSC page is currently being updated and therefore is inactive */
-	HV_TSC_PAGE_UPDATING,
 	/* TSC page was set up with an inaccessible GPA */
 	HV_TSC_PAGE_BROKEN,
+
+	/* TSC page needs to be updated to handle a guest or host change */
+	HV_TSC_PAGE_UPDATE_REQUIRED = BIT(31),
 };

 /* Hyper-V emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 6e38a7d22e97..84b063bda4d8 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1122,14 +1122,14 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence));
 	BUILD_BUG_ON(offsetof(struct ms_hyperv_tsc_page, tsc_sequence) != 0);

-	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
-	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET)
-		return;
-
 	mutex_lock(&hv->hv_lock);
+
 	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
 		goto out_unlock;

+	if (!(hv->hv_tsc_page_status & HV_TSC_PAGE_UPDATE_REQUIRED))
+		goto out_unlock;
+
 	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 	/*
 	 * Because the TSC parameters only vary when there is a
@@ -1188,45 +1188,24 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
 	mutex_unlock(&hv->hv_lock);
 }

-void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
+void kvm_hv_request_tsc_page_update(struct kvm *kvm)
 {
 	struct kvm_hv *hv = to_kvm_hv(kvm);
-	u64 gfn;
-	int idx;
+
+	mutex_lock(&hv->hv_lock);

 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
 	    tsc_page_update_unsafe(hv))
-		return;
-
-	mutex_lock(&hv->hv_lock);
-
-	if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE))
 		goto out_unlock;

-	/* Preserve HV_TSC_PAGE_GUEST_CHANGED/HV_TSC_PAGE_HOST_CHANGED states */
-	if (hv->hv_tsc_page_status == HV_TSC_PAGE_SET)
-		hv->hv_tsc_page_status = HV_TSC_PAGE_UPDATING;
-
-	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
-
-	hv->tsc_ref.tsc_sequence = 0;
-
-	/*
-	 * Take the srcu lock as memslots will be accessed to check the gfn
-	 * cache generation against the memslots generation.
-	 */
-	idx = srcu_read_lock(&kvm->srcu);
-	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
-			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
-		hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
-	srcu_read_unlock(&kvm->srcu, idx);
+	if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)
+		hv->hv_tsc_page_status |= HV_TSC_PAGE_UPDATE_REQUIRED;

 out_unlock:
 	mutex_unlock(&hv->hv_lock);
 }

-
 static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
 {
 	if (!hv_vcpu->enforce_cpuid)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index ed1c4e546d04..3e79b4a9ed4e 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -133,7 +133,7 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu);

 void kvm_hv_setup_tsc_page(struct kvm *kvm,
 			   struct pvclock_vcpu_time_info *hv_clock);
-void kvm_hv_invalidate_tsc_page(struct kvm *kvm);
+void kvm_hv_request_tsc_page_update(struct kvm *kvm);

 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c194a8cbd25f..c1adc9efea28 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2848,7 +2848,7 @@ static void kvm_end_pvclock_update(struct kvm *kvm)

 static void kvm_update_masterclock(struct kvm *kvm)
 {
-	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_hv_request_tsc_page_update(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);
 	kvm_end_pvclock_update(kvm);
@@ -3060,8 +3060,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 				       offsetof(struct compat_vcpu_info, time));
 	if (vcpu->xen.vcpu_time_info_set)
 		kvm_setup_pvclock_page(v, &vcpu->xen.vcpu_time_info_cache, 0);
-	if (!v->vcpu_idx)
-		kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
+	kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock);
 	return 0;
 }

@@ -6039,7 +6038,7 @@ static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp)
 	if (data.flags & ~KVM_CLOCK_VALID_FLAGS)
 		return -EINVAL;

-	kvm_hv_invalidate_tsc_page(kvm);
+	kvm_hv_request_tsc_page_update(kvm);
 	kvm_start_pvclock_update(kvm);
 	pvclock_update_vm_gtod_copy(kvm);


base-commit: 9f73307be079749233b47e623a67201975c575bc
--


  reply	other threads:[~2021-12-23 21:25 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-10 16:36 [PATCH v6 0/6] x86/xen: Add in-kernel Xen event channel delivery David Woodhouse
2021-12-10 16:36 ` [PATCH v6 1/6] KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
2021-12-10 16:36 ` [PATCH v6 2/6] KVM: Reinstate gfn_to_pfn_cache with invalidation support David Woodhouse
2021-12-10 16:36 ` [PATCH v6 3/6] KVM: x86/xen: Maintain valid mapping of Xen shared_info page David Woodhouse
2021-12-10 16:36 ` [PATCH v6 4/6] KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery David Woodhouse
2021-12-10 16:36 ` [PATCH v6 5/6] KVM: x86: Fix wall clock writes in Xen shared_info not to mark page dirty David Woodhouse
2021-12-10 16:36 ` [PATCH v6 6/6] KVM: x86: First attempt at converting nested virtual APIC page to gpc David Woodhouse
2021-12-11  1:09 ` [PATCH v6 0/6] x86/xen: Add in-kernel Xen event channel delivery Paolo Bonzini
2021-12-22 15:18   ` David Woodhouse
2021-12-23  0:26     ` Paolo Bonzini
2021-12-23 21:24       ` Sean Christopherson [this message]
2021-12-23 22:21         ` Paolo Bonzini
2022-01-04  7:48         ` Wanpeng Li
2022-01-05 17:58         ` David Woodhouse
2022-01-08  0:26           ` Sean Christopherson
2022-01-12  3:10             ` Peter Xu
2022-01-19  8:14               ` David Woodhouse
2022-01-19 17:36                 ` Paolo Bonzini
2022-01-19 17:38                   ` David Woodhouse
2022-01-19 17:44                     ` Sean Christopherson
2022-01-21 18:15                       ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YcTpJ369cRBN4W93@google.com \
    --to=seanjc@google.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=butterflyhuangxx@gmail.com \
    --cc=dwmw2@infradead.org \
    --cc=jmattson@google.com \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=karahmed@amazon.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.