From: Gleb Natapov <gleb@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, mingo@elte.hu,
a.p.zijlstra@chello.nl, tglx@linutronix.de, hpa@zytor.com,
riel@redhat.com, cl@linux-foundation.org, mtosatti@redhat.com
Subject: Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
Date: Thu, 7 Oct 2010 19:47:16 +0200
Message-ID: <20101007174716.GD2397@redhat.com>
In-Reply-To: <4CAD97D0.70100@redhat.com>
On Thu, Oct 07, 2010 at 11:50:08AM +0200, Avi Kivity wrote:
> On 10/04/2010 05:56 PM, Gleb Natapov wrote:
> >If a guest accesses swapped out memory do not swap it in from vcpu thread
> >context. Schedule work to do swapping and put vcpu into halted state
> >instead.
> >
> >Interrupts will still be delivered to the guest, and if an interrupt
> >causes a reschedule the guest will continue to run another task.
> >
> >
> >+
> >+static bool can_do_async_pf(struct kvm_vcpu *vcpu)
> >+{
> >+ if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
> >+ kvm_event_needs_reinjection(vcpu)))
> >+ return false;
> >+
> >+ return kvm_x86_ops->interrupt_allowed(vcpu);
> >+}
>
> Strictly speaking, if the cpu can handle NMIs it can take an apf?
>
We can always do apf, but if the vcpu can't do anything with it, why bother?
For the NMI watchdog case, yes, it may be worth allowing apf when an NMI is
allowed.
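An NMI-aware variant would be roughly the following (untested sketch; it
assumes kvm_x86_ops->nmi_allowed() is the right predicate for "an NMI can
be taken right now"):

static bool can_do_async_pf(struct kvm_vcpu *vcpu)
{
	if (unlikely(!irqchip_in_kernel(vcpu->kvm) ||
		     kvm_event_needs_reinjection(vcpu)))
		return false;

	/*
	 * apf is useful if either an interrupt or an NMI (e.g. the NMI
	 * watchdog) can still be delivered while the vcpu is halted.
	 */
	return kvm_x86_ops->interrupt_allowed(vcpu) ||
	       kvm_x86_ops->nmi_allowed(vcpu);
}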
> >@@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > if (unlikely(r))
> > goto out;
> >
> >+ kvm_check_async_pf_completion(vcpu);
> >+ if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
> >+ /* Page is swapped out. Do synthetic halt */
> >+ r = 1;
> >+ goto out;
> >+ }
> >+
>
> Why do it here in the fast path? Can't you halt the cpu when
> starting the page fault?
The page fault may complete before guest re-entry. We do not want to halt
the vcpu in that case.
>
> I guess the apf threads can't touch mp_state, but they can have a
> KVM_REQ to trigger the check.
That would still require a KVM_REQ check on the fast path, so what is the
difference performance-wise?
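For comparison, the request-based version would look roughly like this
(sketch only; KVM_REQ_APF_HALT is a made-up request bit for illustration):

	/* where the async pf is detected, instead of writing mp_state: */
	kvm_make_request(KVM_REQ_APF_HALT, vcpu);

	/* in vcpu_enter_guest(), next to the other request checks: */
	if (kvm_check_request(KVM_REQ_APF_HALT, vcpu)) {
		/* Page is swapped out. Do synthetic halt. */
		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
		r = 1;
		goto out;
	}

Either way there is a test on the guest entry path.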
>
> > if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> > inject_pending_event(vcpu);
> >
> >@@ -5781,6 +5798,9 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
> >
> > kvm_make_request(KVM_REQ_EVENT, vcpu);
> >
> >+ kvm_clear_async_pf_completion_queue(vcpu);
> >+ memset(vcpu->arch.apf.gfns, 0xff, sizeof vcpu->arch.apf.gfns);
>
> An ordinary for loop is less tricky, even if it means one more line.
>
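The loop version of that reset would be (sketch; it relies on ~0 converting
to the all-ones gfn_t value, same as the 0xff memset above):

	int i;

	for (i = 0; i < ARRAY_SIZE(vcpu->arch.apf.gfns); i++)
		vcpu->arch.apf.gfns[i] = ~0;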
> >
> >@@ -6040,6 +6064,7 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
> > int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
> > {
> > return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
> >+ || !list_empty_careful(&vcpu->async_pf.done)
> > || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> > || vcpu->arch.nmi_pending ||
> > (kvm_arch_interrupt_allowed(vcpu)&&
>
> Unrelated, shouldn't kvm_arch_vcpu_runnable() look at
> vcpu->requests? Specifically KVM_REQ_EVENT?
I think KVM_REQ_EVENT is covered by checking the nmi and interrupt queues
here.
>
> >+static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> >+{
> >+ u32 key = kvm_async_pf_hash_fn(gfn);
> >+
> >+ while (vcpu->arch.apf.gfns[key] != -1)
> >+ key = kvm_async_pf_next_probe(key);
>
> Not sure what that -1 converts to on i386 where gfn_t is u64.
Will check.
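For the record, the bare -1 is converted to the all-ones u64 value by the
usual arithmetic conversions, so the compare is correct on i386 as well,
but an explicit sentinel would make that obvious. A sketch
(KVM_ASYNC_PF_NO_GFN is a made-up name):

#define KVM_ASYNC_PF_NO_GFN	(~(gfn_t)0)	/* empty hash slot marker */

	while (vcpu->arch.apf.gfns[key] != KVM_ASYNC_PF_NO_GFN)
		key = kvm_async_pf_next_probe(key);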
> >+
> >+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> >+ struct kvm_async_pf *work)
> >+{
> >+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> >+
> >+ if (work == kvm_double_apf)
> >+ trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu));
> >+ else {
> >+ trace_kvm_async_pf_not_present(work->gva);
> >+
> >+ kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
> >+ }
> >+}
>
> Just have vcpu as the argument for tracepoints to avoid
> unconditional kvm_rip_read (slow on Intel), and call kvm_rip_read()
> in tp_fast_assign(). Similarly you can pass work instead of
> work->gva, though that's not nearly as important.
>
Will do.
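For the double fault case that would be roughly (sketch; the rip read moves
into TP_fast_assign so it is only evaluated when the tracepoint is enabled):

TRACE_EVENT(
	kvm_async_pf_doublefault,
	TP_PROTO(struct kvm_vcpu *vcpu),
	TP_ARGS(vcpu),

	TP_STRUCT__entry(
		__field(u64, rip)
	),

	TP_fast_assign(
		/* only runs when the event actually fires */
		__entry->rip = kvm_rip_read(vcpu);
	),

	TP_printk("rip %#llx", __entry->rip)
);

and the call site becomes trace_kvm_async_pf_doublefault(vcpu).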
> >+
> >+TRACE_EVENT(
> >+ kvm_async_pf_not_present,
> >+ TP_PROTO(u64 gva),
> >+ TP_ARGS(gva),
>
> Do you actually have a gva with tdp? With nested virtualization,
> how do you interpret this gva?
With tdp it is a gpa, just like tdp_page_fault gets a gpa where the shadow
paging version gets a gva. For nested virtualization it is too complex to
interpret.
> >+
> >+TRACE_EVENT(
> >+ kvm_async_pf_completed,
> >+ TP_PROTO(unsigned long address, struct page *page, u64 gva),
> >+ TP_ARGS(address, page, gva),
>
> What does address mean? There's also gva?
>
It is the hva.
> >+
> >+ TP_STRUCT__entry(
> >+ __field(unsigned long, address)
> >+ __field(struct page*, page)
> >+ __field(u64, gva)
> >+ ),
> >+
> >+ TP_fast_assign(
> >+ __entry->address = address;
> >+ __entry->page = page;
> >+ __entry->gva = gva;
> >+ ),
>
> Recording a struct page * in a tracepoint? Userspace can read this
> entry, better to the page_to_pfn() here.
>
OK.
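I.e. something like (sketch):

	TP_STRUCT__entry(
		__field(unsigned long, address)
		__field(u64, pfn)
		__field(u64, gva)
	),

	TP_fast_assign(
		__entry->address = address;
		/* record the pfn, not a kernel pointer userspace can read */
		__entry->pfn = page_to_pfn(page);
		__entry->gva = gva;
	),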
>
> >+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> >+{
> >+ /* cancel outstanding work queue item */
> >+ while (!list_empty(&vcpu->async_pf.queue)) {
> >+ struct kvm_async_pf *work =
> >+ list_entry(vcpu->async_pf.queue.next,
> >+ typeof(*work), queue);
> >+ cancel_work_sync(&work->work);
> >+ list_del(&work->queue);
> >+ if (!work->page) /* work was canceled */
> >+ kmem_cache_free(async_pf_cache, work);
> >+ }
>
> Are you holding any lock here?
>
> If not, what protects vcpu->async_pf.queue?
Nothing. It is accessed only from the vcpu thread.
> If yes, cancel_work_sync() will need to aquire it too (in case work
> is running now and needs to take the lock, and cacncel_work_sync()
> needs to wait for it) -> deadlock.
>
Work never touches this list.
> >+
> >+ /* do alloc nowait since if we are going to sleep anyway we
> >+ may as well sleep faulting in page */
> /*
> * multi
> * line
> * comment
> */
>
> (but a good one, this is subtle)
>
> I missed where you halt the vcpu. Can you point me at the function?
>
> Note this is a synthetic halt and must not be visible to live
> migration, or we risk live migrating a halted state which doesn't
> really exist.
>
> Might be simplest to drain the apf queue on any of the save/restore ioctls.
>
So that "info cpu" will interfere with apf? Migration should work
in regular way. apf state should not be migrated since it has no meaning
on the destination. I'll make sure synthetic halt state will not
interfere with migration.
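One possible way to keep it invisible (sketch only; apf.halted is a
hypothetical per-vcpu flag this series does not have yet, tracked
separately from mp_state):

int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	/*
	 * Report a synthetic (apf) halt as RUNNABLE so userspace never
	 * saves or migrates a halt the guest did not ask for.
	 */
	if (vcpu->arch.apf.halted)
		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
	else
		mp_state->mp_state = vcpu->arch.mp_state;
	return 0;
}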
--
Gleb.