Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Gleb Natapov <gleb@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.
Date: Mon, 2 Nov 2009 17:54:10 +0200	[thread overview]
Message-ID: <20091102155410.GE27911@redhat.com> (raw)
In-Reply-To: <4AEED2C6.6030508@redhat.com>

On Mon, Nov 02, 2009 at 02:38:30PM +0200, Avi Kivity wrote:
> On 11/01/2009 01:56 PM, Gleb Natapov wrote:
> >Asynchronous page fault notifies vcpu that page it is trying to access
> >is swapped out by a host. In response guest puts a task that caused the
> >fault to sleep until page is swapped in again. When missing page is
> >brought back into the memory guest is notified and task resumes execution.
> >
> >diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> >index 90708b7..61e2aa3 100644
> >--- a/arch/x86/include/asm/kvm_para.h
> >+++ b/arch/x86/include/asm/kvm_para.h
> >@@ -52,6 +52,9 @@ struct kvm_mmu_op_release_pt {
> >
> >  #define KVM_PV_SHM_FEATURES_ASYNC_PF		(1<<  0)
> >
> >+#define KVM_PV_REASON_PAGE_NP 1
> >+#define KVM_PV_REASON_PAGE_READY 2
> 
> _NOT_PRESENT would improve readability.
> 
> >+static void apf_task_wait(struct task_struct *tsk, u64 token)
> >  {
> >+	u64 key = hash_64(token, KVM_TASK_SLEEP_HASHBITS);
> >+	struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
> >+	struct kvm_task_sleep_node n, *e;
> >+	DEFINE_WAIT(wait);
> >+
> >+	spin_lock(&b->lock);
> >+	e = _find_apf_task(b, token);
> >+	if (e) {
> >+		/* dummy entry exist ->  wake up was delivered ahead of PF */
> >+		hlist_del(&e->link);
> >+		kfree(e);
> >+		spin_unlock(&b->lock);
> >+		return;
> >+	}
> >+
> >+	n.token = token;
> >+	init_waitqueue_head(&n.wq);
> >+	hlist_add_head(&n.link,&b->list);
> >+	spin_unlock(&b->lock);
> >+
> >+	for (;;) {
> >+		prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
> >+		if (hlist_unhashed(&n.link))
> >+			break;
> 
> Don't you need locking here?  At least for the implied memory barriers.
> 
May be memory barrier will be enough. Will look at it.

> >+		schedule();
> >+	}
> >+	finish_wait(&n.wq,&wait);
> >+
> >+	return;
> >+}
> >+
> >+static void apf_task_wake(u64 token)
> >+{
> >+	u64 key = hash_64(token, KVM_TASK_SLEEP_HASHBITS);
> >+	struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
> >+	struct kvm_task_sleep_node *n;
> >+
> >+	spin_lock(&b->lock);
> >+	n = _find_apf_task(b, token);
> >+	if (!n) {
> >+		/* PF was not yet handled. Add dummy entry for the token */
> >+		n = kmalloc(sizeof(*n), GFP_ATOMIC);
> >+		if (!n) {
> >+			printk(KERN_EMERG"async PF can't allocate memory\n");
> 
> Worrying.  We could have an emergency pool of one node per cpu, and
> disable apf if we use it until it's returned.  But that's a lot of
> complexity for an edge case, so a simpler solution would be welcome.
> 
Currently this code can't trigger since "wake up" is always sent on the
same vcpu as "not present", but I don't want this implementation detail
to be part of guest/host interface. Idea you've described can be easy
to implement. Will look into it.

> >+int kvm_handle_pf(struct pt_regs *regs, unsigned long error_code)
> >+{
> >+	u64 reason, token;
> >  	struct kvm_vcpu_pv_shm *pv_shm =
> >  		per_cpu(kvm_vcpu_pv_shm, smp_processor_id());
> >
> >  	if (!pv_shm)
> >-		return;
> >+		return 0;
> >+
> >+	reason = pv_shm->reason;
> >+	pv_shm->reason = 0;
> >+
> >+	token = pv_shm->param;
> >+
> >+	switch (reason) {
> >+	default:
> >+		return 0;
> >+	case KVM_PV_REASON_PAGE_NP:
> >+		/* real page is missing. */
> >+		apf_task_wait(current, token);
> >+		break;
> >+	case KVM_PV_REASON_PAGE_READY:
> >+		apf_task_wake(token);
> >+		break;
> >+	}
> 
> Ah, reason is not a bitmask but an enumerator.  __u32 is more
> friendly to i386 in that case.
> 
OK. Will need padding for 64 bit host case.

> Much of the code here is arch independent and would work well on
> non-x86 kvm ports.  But we can always lay the burden of moving it on
> the non-x86 maintainers.
> 
> -- 
> error compiling committee.c: too many arguments to function

--
			Gleb.

WARNING: multiple messages have this Message-ID (diff)

From: Gleb Natapov <gleb@redhat.com>
To: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 03/11] Handle asynchronous page fault in a PV guest.
Date: Mon, 2 Nov 2009 17:54:10 +0200	[thread overview]
Message-ID: <20091102155410.GE27911@redhat.com> (raw)
In-Reply-To: <4AEED2C6.6030508@redhat.com>

On Mon, Nov 02, 2009 at 02:38:30PM +0200, Avi Kivity wrote:
> On 11/01/2009 01:56 PM, Gleb Natapov wrote:
> >Asynchronous page fault notifies vcpu that page it is trying to access
> >is swapped out by a host. In response guest puts a task that caused the
> >fault to sleep until page is swapped in again. When missing page is
> >brought back into the memory guest is notified and task resumes execution.
> >
> >diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> >index 90708b7..61e2aa3 100644
> >--- a/arch/x86/include/asm/kvm_para.h
> >+++ b/arch/x86/include/asm/kvm_para.h
> >@@ -52,6 +52,9 @@ struct kvm_mmu_op_release_pt {
> >
> >  #define KVM_PV_SHM_FEATURES_ASYNC_PF		(1<<  0)
> >
> >+#define KVM_PV_REASON_PAGE_NP 1
> >+#define KVM_PV_REASON_PAGE_READY 2
> 
> _NOT_PRESENT would improve readability.
> 
> >+static void apf_task_wait(struct task_struct *tsk, u64 token)
> >  {
> >+	u64 key = hash_64(token, KVM_TASK_SLEEP_HASHBITS);
> >+	struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
> >+	struct kvm_task_sleep_node n, *e;
> >+	DEFINE_WAIT(wait);
> >+
> >+	spin_lock(&b->lock);
> >+	e = _find_apf_task(b, token);
> >+	if (e) {
> >+		/* dummy entry exist ->  wake up was delivered ahead of PF */
> >+		hlist_del(&e->link);
> >+		kfree(e);
> >+		spin_unlock(&b->lock);
> >+		return;
> >+	}
> >+
> >+	n.token = token;
> >+	init_waitqueue_head(&n.wq);
> >+	hlist_add_head(&n.link,&b->list);
> >+	spin_unlock(&b->lock);
> >+
> >+	for (;;) {
> >+		prepare_to_wait(&n.wq,&wait, TASK_UNINTERRUPTIBLE);
> >+		if (hlist_unhashed(&n.link))
> >+			break;
> 
> Don't you need locking here?  At least for the implied memory barriers.
> 
May be memory barrier will be enough. Will look at it.

> >+		schedule();
> >+	}
> >+	finish_wait(&n.wq,&wait);
> >+
> >+	return;
> >+}
> >+
> >+static void apf_task_wake(u64 token)
> >+{
> >+	u64 key = hash_64(token, KVM_TASK_SLEEP_HASHBITS);
> >+	struct kvm_task_sleep_head *b =&async_pf_sleepers[key];
> >+	struct kvm_task_sleep_node *n;
> >+
> >+	spin_lock(&b->lock);
> >+	n = _find_apf_task(b, token);
> >+	if (!n) {
> >+		/* PF was not yet handled. Add dummy entry for the token */
> >+		n = kmalloc(sizeof(*n), GFP_ATOMIC);
> >+		if (!n) {
> >+			printk(KERN_EMERG"async PF can't allocate memory\n");
> 
> Worrying.  We could have an emergency pool of one node per cpu, and
> disable apf if we use it until it's returned.  But that's a lot of
> complexity for an edge case, so a simpler solution would be welcome.
> 
Currently this code can't trigger since "wake up" is always sent on the
same vcpu as "not present", but I don't want this implementation detail
to be part of guest/host interface. Idea you've described can be easy
to implement. Will look into it.

> >+int kvm_handle_pf(struct pt_regs *regs, unsigned long error_code)
> >+{
> >+	u64 reason, token;
> >  	struct kvm_vcpu_pv_shm *pv_shm =
> >  		per_cpu(kvm_vcpu_pv_shm, smp_processor_id());
> >
> >  	if (!pv_shm)
> >-		return;
> >+		return 0;
> >+
> >+	reason = pv_shm->reason;
> >+	pv_shm->reason = 0;
> >+
> >+	token = pv_shm->param;
> >+
> >+	switch (reason) {
> >+	default:
> >+		return 0;
> >+	case KVM_PV_REASON_PAGE_NP:
> >+		/* real page is missing. */
> >+		apf_task_wait(current, token);
> >+		break;
> >+	case KVM_PV_REASON_PAGE_READY:
> >+		apf_task_wake(token);
> >+		break;
> >+	}
> 
> Ah, reason is not a bitmask but an enumerator.  __u32 is more
> friendly to i386 in that case.
> 
OK. Will need padding for 64 bit host case.

> Much of the code here is arch independent and would work well on
> non-x86 kvm ports.  But we can always lay the burden of moving it on
> the non-x86 maintainers.
> 
> -- 
> error compiling committee.c: too many arguments to function

--
			Gleb.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-11-02 15:54 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-01 11:56 [PATCH 00/11] KVM: Add asynchronous page fault for PV guest Gleb Natapov
2009-11-01 11:56 ` Gleb Natapov
2009-11-01 11:56 ` [PATCH 01/11] Add shared memory hypercall to PV Linux guest Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02  4:27   ` Rik van Riel
2009-11-02  4:27     ` Rik van Riel
2009-11-02  7:07     ` Gleb Natapov
2009-11-02  7:07       ` Gleb Natapov
2009-11-02 12:18   ` Avi Kivity
2009-11-02 12:18     ` Avi Kivity
2009-11-02 16:18     ` Gleb Natapov
2009-11-02 16:18       ` Gleb Natapov
2009-11-03  5:15       ` Avi Kivity
2009-11-03  5:15         ` Avi Kivity
2009-11-03  7:16         ` Gleb Natapov
2009-11-03  7:16           ` Gleb Natapov
2009-11-03  7:40           ` Avi Kivity
2009-11-03  7:40             ` Avi Kivity
2009-11-01 11:56 ` [PATCH 02/11] Add "handle page fault" PV helper Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02  9:22   ` Ingo Molnar
2009-11-02  9:22     ` Ingo Molnar
2009-11-02 16:04     ` Gleb Natapov
2009-11-02 16:04       ` Gleb Natapov
2009-11-02 16:12       ` Ingo Molnar
2009-11-02 16:12         ` Ingo Molnar
2009-11-02 16:22         ` Gleb Natapov
2009-11-02 16:22           ` Gleb Natapov
2009-11-02 16:29           ` Ingo Molnar
2009-11-02 16:29             ` Ingo Molnar
2009-11-02 16:31             ` Gleb Natapov
2009-11-02 16:31               ` Gleb Natapov
2009-11-02 17:42             ` Gleb Natapov
2009-11-02 17:42               ` Gleb Natapov
2009-11-08 11:36               ` Ingo Molnar
2009-11-08 11:36                 ` Ingo Molnar
2009-11-08 12:43                 ` Avi Kivity
2009-11-08 12:43                   ` Avi Kivity
2009-11-08 12:51                   ` Ingo Molnar
2009-11-08 12:51                     ` Ingo Molnar
2009-11-08 13:01                     ` Avi Kivity
2009-11-08 13:01                       ` Avi Kivity
2009-11-08 13:05                       ` Ingo Molnar
2009-11-08 13:05                         ` Ingo Molnar
2009-11-08 13:08                         ` Avi Kivity
2009-11-08 13:08                           ` Avi Kivity
2009-11-08 16:44                     ` H. Peter Anvin
2009-11-08 16:44                       ` H. Peter Anvin
2009-11-08 16:47                       ` Ingo Molnar
2009-11-08 16:47                         ` Ingo Molnar
2009-11-02 19:03     ` Rik van Riel
2009-11-02 19:03       ` Rik van Riel
2009-11-02 19:33       ` Avi Kivity
2009-11-02 19:33         ` Avi Kivity
2009-11-02 23:35         ` Rik van Riel
2009-11-02 23:35           ` Rik van Riel
2009-11-03  4:57           ` Avi Kivity
2009-11-03  4:57             ` Avi Kivity
2009-11-03  4:57             ` Avi Kivity
2009-11-05  6:44             ` Tian, Kevin
2009-11-05  6:44               ` Tian, Kevin
2009-11-05  8:22               ` Avi Kivity
2009-11-05  8:22                 ` Avi Kivity
2009-11-05  8:22                 ` Avi Kivity
2009-11-01 11:56 ` [PATCH 03/11] Handle asynchronous page fault in a PV guest Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02 12:38   ` Avi Kivity
2009-11-02 12:38     ` Avi Kivity
2009-11-02 15:54     ` Gleb Natapov [this message]
2009-11-02 15:54       ` Gleb Natapov
2009-11-03 14:14   ` Marcelo Tosatti
2009-11-03 14:14     ` Marcelo Tosatti
2009-11-03 14:25     ` Gleb Natapov
2009-11-03 14:25       ` Gleb Natapov
2009-11-03 14:32       ` Marcelo Tosatti
2009-11-03 14:32         ` Marcelo Tosatti
2009-11-03 14:38         ` Avi Kivity
2009-11-03 14:38           ` Avi Kivity
2009-11-01 11:56 ` [PATCH 04/11] Export __get_user_pages_fast Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02  9:23   ` Ingo Molnar
2009-11-02  9:23     ` Ingo Molnar
2009-11-01 11:56 ` [PATCH 05/11] Add get_user_pages() variant that fails if major fault is required Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02 19:05   ` Rik van Riel
2009-11-02 19:05     ` Rik van Riel
2009-11-01 11:56 ` [PATCH 06/11] Inject asynchronous page fault into a guest if page is swapped out Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02 12:56   ` Avi Kivity
2009-11-02 12:56     ` Avi Kivity
2009-11-02 15:41     ` Gleb Natapov
2009-11-02 15:41       ` Gleb Natapov
2009-11-01 11:56 ` [PATCH 07/11] Retry fault before vmentry Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02 13:03   ` Avi Kivity
2009-11-02 13:03     ` Avi Kivity
2009-11-01 11:56 ` [PATCH 08/11] Add "wait for page" hypercall Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02 13:05   ` Avi Kivity
2009-11-02 13:05     ` Avi Kivity
2009-11-02 15:13     ` Gleb Natapov
2009-11-02 15:13       ` Gleb Natapov
2009-11-02 15:19       ` Avi Kivity
2009-11-02 15:19         ` Avi Kivity
2009-11-01 11:56 ` [PATCH 09/11] Maintain preemptability count even for !CONFIG_PREEMPT kernels Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-02  9:24   ` Ingo Molnar
2009-11-02  9:24     ` Ingo Molnar
2009-11-01 11:56 ` [PATCH 10/11] Handle async PF in non preemptable context Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov
2009-11-01 11:56 ` [PATCH 11/11] Send async PF when guest is not in userspace too Gleb Natapov
2009-11-01 11:56   ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091102155410.GE27911@redhat.com \
    --to=gleb@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.