Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.


From: Gleb Natapov <gleb@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	a.p.zijlstra@chello.nl, tglx@linutronix.de, hpa@zytor.com,
	riel@redhat.com, cl@linux-foundation.org, mtosatti@redhat.com
Subject: Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.
Date: Sun, 10 Oct 2010 18:16:19 +0200
Message-ID: <20101010161619.GS2397@redhat.com>
In-Reply-To: <4CB1E1ED.6050405@redhat.com>

On Sun, Oct 10, 2010 at 05:55:25PM +0200, Avi Kivity wrote:
> >>
> >>  >>   >@@ -5112,6 +5122,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >>  >>   >    	if (unlikely(r))
> >>  >>   >    		goto out;
> >>  >>   >
> >>  >>   >+	kvm_check_async_pf_completion(vcpu);
> >>  >>   >+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED) {
> >>  >>   >+		/* Page is swapped out. Do synthetic halt */
> >>  >>   >+		r = 1;
> >>  >>   >+		goto out;
> >>  >>   >+	}
> >>  >>   >+
> >>  >>
> >>  >>   Why do it here in the fast path?  Can't you halt the cpu when
> >>  >>   starting the page fault?
> >>  >Page fault may complete before guest re-entry. We do not want to halt vcpu
> >>  >in this case.
> >>
> >>  So unhalt on completion.
> >>
> >I want to avoid touching vcpu state from work if possible. Work code does
> >not contain arch-dependent code right now, and mp_state is an x86 thing.
> >
> 
> Use a KVM_REQ.
> 
Completion happens asynchronously; the vcpu may not even be halted at
that point. Completion does, in fact, unhalt the vcpu: it puts the
completed work on the vcpu->async_pf.done list and wakes the vcpu thread
if it is sleeping. The next invocation of kvm_arch_vcpu_runnable() will
return true because vcpu->async_pf.done is not empty, and the vcpu will
be unhalted in the usual way by kvm_vcpu_block().
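A minimal userspace model of the completion path described above, assuming simplified stand-ins: struct vcpu_model, model_apf_complete(), model_vcpu_runnable() and model_vcpu_block_step() are hypothetical names, not KVM code. The worker only links the work item onto the done list and wakes a sleeping vcpu; the unhalt happens on the vcpu side once the runnable check sees a non-empty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the real KVM structures. */
enum model_mp_state { MODEL_RUNNABLE, MODEL_HALTED };

struct apf_work {
	unsigned long token;
	struct apf_work *next;
};

struct vcpu_model {
	enum model_mp_state mp_state;
	struct apf_work *done;	/* stands in for vcpu->async_pf.done */
	bool sleeping;		/* vcpu thread is waiting in the halt loop */
	int wakeups;		/* number of times the thread was woken */
};

/* Worker side: queue the completion and wake the vcpu; no arch state. */
static void model_apf_complete(struct vcpu_model *v, struct apf_work *w)
{
	assert(w != NULL);
	w->next = v->done;
	v->done = w;
	if (v->sleeping)
		v->wakeups++;	/* stands in for waking the vcpu thread */
}

/* Analogue of the runnable check: pending completions make it runnable. */
static bool model_vcpu_runnable(const struct vcpu_model *v)
{
	return v->mp_state == MODEL_RUNNABLE || v->done != NULL;
}

/* One pass of the blocking loop; returns true once the vcpu may run. */
static bool model_vcpu_block_step(struct vcpu_model *v)
{
	if (!model_vcpu_runnable(v)) {
		v->sleeping = true;
		return false;
	}
	v->sleeping = false;
	v->mp_state = MODEL_RUNNABLE;	/* unhalted "in the usual way" */
	return true;
}
```

Because the worker never writes mp_state, the completion code in this model stays arch-independent, which is the property argued for earlier in the thread.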

> 
> >>  >>
> >>  >>   I guess the apf threads can't touch mp_state, but they can have a
> >>  >>   KVM_REQ to trigger the check.
> >>  >This will require a KVM_REQ check on the fast path, so what's the
> >>  >difference performance-wise?
> >>
> >>  We already have a KVM_REQ check (if (vcpu->requests)) so it doesn't
> >>  cost anything extra.
> >if (vcpu->requests) does not clear the req bit, so what would have to be
> >added is: if (kvm_check_request(KVM_REQ_APF_HLT, vcpu)), which is even
> >more expensive than my check (but not expensive enough to worry about).
> 
> It's only expensive when it happens.  Most entries will have the bit clear.
kvm_check_async_pf_completion() (the function that detects whether the
vcpu should be halted) is called after vcpu->requests processing. This
is deliberate: it delays the completion check as long as possible, in
the hope that the completion arrives before the next vcpu entry and
sending the apf can be skipped. So I do it at the last possible moment
before event injection.
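The cost argument in this exchange can be sketched in userspace with C11 atomics, assuming a request word modeled on vcpu->requests: the cheap `if (requests)` test only reads the word, while consuming one specific request needs an atomic test-and-clear of its bit. model_make_request(), model_check_request(), model_enter_guest() and MODEL_REQ_APF_HLT are hypothetical stand-ins (KVM_REQ_APF_HLT was only proposed in this thread), and the bit number is arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit; KVM_REQ_APF_HLT was only proposed above. */
#define MODEL_REQ_APF_HLT 5

struct req_vcpu {
	atomic_ulong requests;	/* stands in for vcpu->requests */
};

/* Producer (e.g. a completion worker) raises a request. */
static void model_make_request(int req, struct req_vcpu *v)
{
	assert(req >= 0 && req < 32);	/* fits any unsigned long */
	atomic_fetch_or(&v->requests, 1UL << req);
}

/* Consumer: atomically test and clear one bit, like test_and_clear_bit(). */
static bool model_check_request(int req, struct req_vcpu *v)
{
	unsigned long mask = 1UL << req;

	return (atomic_fetch_and(&v->requests, ~mask) & mask) != 0;
}

/* Entry path: the plain read is the cheap common case. */
static int model_enter_guest(struct req_vcpu *v)
{
	if (atomic_load(&v->requests)) {	/* "if (vcpu->requests)" */
		if (model_check_request(MODEL_REQ_APF_HLT, v))
			return 1;	/* would do the synthetic halt */
	}
	return 0;			/* would enter the guest */
}
```

In the model, every entry pays only for the plain load; the atomic read-modify-write runs only when some request bit is actually set, which is the "only expensive when it happens" point.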

> >>
> >>  >>   >+
> >>  >>   >+TRACE_EVENT(
> >>  >>   >+	kvm_async_pf_not_present,
> >>  >>   >+	TP_PROTO(u64 gva),
> >>  >>   >+	TP_ARGS(gva),
> >>  >>
> >>  >>   Do you actually have a gva with tdp?  With nested virtualization,
> >>  >>   how do you interpret this gva?
> >>  >With tdp it is a gpa, just as tdp_page_fault gets a gpa whereas the
> >>  >shadow page version gets a gva. Nested virtualization is too complex
> >>  >to interpret.
> >>
> >>  It's not good to have a tracepoint that depends on cpu mode (without
> >>  recording that mode). I think we have the same issue in
> >>  trace_kvm_page_fault though.
> >We have mmu_is_nested(). I'll just disable apf while vcpu is in nested
> >mode for now.
> 
> What if we get the apf in non-nested mode and it completes in nested mode?
> 
I am not yet sure we have any problem with nested mode at all; I am
looking into it. If we do, we can skip the prefault while in nested mode.
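Avi's tracepoint objection is that a bare u64 is ambiguous: it is a gva under shadow paging and a gpa under tdp, and nested mode adds yet another meaning. A minimal userspace sketch of a record that carries its own interpretation, assuming a hypothetical enum apf_addr_kind and formatter; this is an illustration, not the TRACE_EVENT in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the traced address. */
enum apf_addr_kind { APF_ADDR_GVA, APF_ADDR_GPA, APF_ADDR_NESTED };

struct apf_trace_rec {
	uint64_t addr;
	enum apf_addr_kind kind;	/* recorded alongside the address */
};

static const char *apf_addr_kind_name(enum apf_addr_kind k)
{
	switch (k) {
	case APF_ADDR_GVA:	return "gva";
	case APF_ADDR_GPA:	return "gpa";
	case APF_ADDR_NESTED:	return "nested";
	}
	return "unknown";
}

/* Format like a trace line; returns snprintf()'s result. */
static int apf_trace_format(char *buf, size_t len,
			    const struct apf_trace_rec *r)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "not_present %s 0x%llx",
			apf_addr_kind_name(r->kind),
			(unsigned long long)r->addr);
}
```

With the kind stored next to the address, a trace reader no longer needs to know which MMU mode the vcpu was in when the event fired.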

> >>
> >>  >>   >+
> >>  >>   >+	/* do alloc nowait since if we are going to sleep anyway we
> >>  >>   >+	   may as well sleep faulting in page */
> >>  >>   /*
> >>  >>    * multi
> >>  >>    * line
> >>  >>    * comment
> >>  >>    */
> >>  >>
> >>  >>   (but a good one, this is subtle)
> >>  >>
> >>  >>   I missed where you halt the vcpu.  Can you point me at the function?
> >>  >>
> >>  >>   Note this is a synthetic halt and must not be visible to live
> >>  >>   migration, or we risk live migrating a halted state which doesn't
> >>  >>   really exist.
> >>  >>
> >>  >>   Might be simplest to drain the apf queue on any of the save/restore ioctls.
> >>  >>
> >>  >So that "info cpu" will interfere with apf? Migration should work
> >>  >in the regular way. apf state should not be migrated since it has no
> >>  >meaning on the destination. I'll make sure the synthetic halt state
> >>  >will not interfere with migration.
> >>
> >>  If you deliver an apf, the guest expects a completion.
> >>
> >There is a special completion that tells the guest to wake all sleeping
> >tasks on the vcpu. It is delivered after migration on the destination.
> >
> 
> Yes, I saw.
> 
> What if you can't deliver it?  is it possible that some other vcpu
How can this happen? If I can't deliver it, I can't deliver
non-broadcast apfs either.

> will start receiving apfs that alias the old ones?  Or is the
> broadcast global?
> 
Broadcast is not global, but tokens are unique per cpu, so another vcpu
will not receive apfs that alias the old ones (if I understand what you
mean correctly).
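A minimal sketch of why per-vcpu tokens cannot alias, assuming a token layout that packs a per-vcpu sequence number above the vcpu id; the 12-bit shift, the all-ones broadcast value and the fixed-size sleeper table are assumptions for illustration, and apf_make_token()/apf_wake() are hypothetical names. The broadcast token wakes every sleeper on the vcpu that receives it and nothing on other vcpus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APF_MAX_SLEEPERS 8
#define APF_BROADCAST_TOKEN UINT32_MAX	/* assumed "wake all" value */

/* Per-vcpu guest-side table of tasks waiting for a page. */
struct apf_sleepers {
	uint32_t token[APF_MAX_SLEEPERS];
	bool waiting[APF_MAX_SLEEPERS];
};

/* Host side: a sequence number above the vcpu id keeps tokens distinct. */
static uint32_t apf_make_token(uint32_t vcpu_id, uint32_t *seq)
{
	assert(vcpu_id < (1u << 12));
	return ((*seq)++ << 12) | vcpu_id;
}

/* Guest side: wake the matching sleeper, or all of them on broadcast. */
static int apf_wake(struct apf_sleepers *s, uint32_t token)
{
	int woken = 0;

	for (int i = 0; i < APF_MAX_SLEEPERS; i++) {
		if (!s->waiting[i])
			continue;
		if (token == APF_BROADCAST_TOKEN || s->token[i] == token) {
			s->waiting[i] = false;
			woken++;
		}
	}
	return woken;
}
```

Since the low bits always encode the issuing vcpu, a token handed out on one vcpu can never match a sleeper queued on another, and a broadcast only drains the table it is delivered to.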

--
			Gleb.


