Re: irqdevice INTR example - Gregory Haskins

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: "Gregory Haskins" <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
To: "Avi Kivity" <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Subject: Re: irqdevice INTR example
Date: Thu, 12 Apr 2007 09:43:10 -0400	[thread overview]
Message-ID: <461DFF1C.BA47.005A.0@novell.com> (raw)
In-Reply-To: <461E2AD5.7070905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 4568 bytes --]

I have attached a new version of the patch which eliminates the condition variable (if only by name, anyway ;)

>>> On Thu, Apr 12, 2007 at  8:49 AM, in message <461E2AD5.7070905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote: 
>
> 
> Actually, I am in favor of having well- defined synchronization 
> primitives.  The only issues I see with condition variables are:
> 
> -  there are many variants of mutexes in the kernel (mutex, spinlock, 
> spinlock with irqs disabled, ...), so many condvar variants are needed.

Yeah, I tried to address that with lock_ops, but I agree.

> -  they're very close to waitqueues, so perhaps there's not much 
> justification

Agreed they are very close.  I won't go into my opinion of waitqueues vs cond-vars in this forum ;)

> 
> I'm no synchronization primitive expert, though.  My comment had nothing 
> to do with my opinion on condition variables or your implementation 
> thereof.  kvm is simply not the place to introduce them.

No problem.  I think your argument is a good one.

> 
> I referred to the comment.  Maybe just "the hardware"?

Done

> 
> Ah, ok --  I misunderstood the whole thing.  The way to avoid the race is 
> to disable interrupts before entering the guest.  This way the IPI is 
> delayed until you enter guest mode:
> 
>     irq_disable();
> 
>     spin_lock();
>     vcpu- >guest_mode = 1;
>     check_whether_an_irq_is_pending_and_if_so_inject_it();
>     spin_unlock();
> 
>     asm ("vm_entry_sequence");
> 
>     vcpu- >guest_mode = 0;
>    
>     irq_enable(); // currently in the vm entry sequence
> 
>     // recheck here?
> 
> If the interrupt happens before the spin_lock(), we'll get a non- ipi 
> wakeup and then see it in check_whether().  If it happens after it we'll 
> get an IPI which will be ignored until we're snugly in guest mode.

When I first read this I thought "whoa! you want to disable interrupts during the whole time we are in GUEST mode?"  But then it dawned on me what you are suggesting:  Interrupts would be re-enabled after the context switch because we re-load the processor state?  Cool!  The thing I can't wrap my head around is what happens when the guest has IF=0 and and external interrupt comes in?  Would we still exit?

But that aside, a problem that I see is that (IIUC) IPIs use NMIs not EXT-INTs.  Assuming that is right, I suppose we might be able to do a similar trick except we also disable NMIs first (is this possible/recommended/forbidden?).

> 
> In general I find it useful to pretend there are many userspaces being 
> written for kvm, otherwise we get locked into qemu's current mode of 
> operation.

Sounds reasonable.  I will start to do the same.

> 
>> This makes sense now that I think about it because something like hlt should 
> cause a suspension of CPU activity until further notice. 
> 
> An alternative is to handle hlt in the kernel in a place where we're 
> ready for the IPI wakeup.  The downside to that is that we have to be 
> prepared for external wakeup sources as well (signal, poll, aio... messy).

Hmmm...interesting.  I wonder if there are advantages that make this worth exploring.  Oh well, I will back burner these thoughts until the SMP/PV/APIC stuff is sorted out.

> 
> It's best not to use signals internally.  Qemu relies on them and we 
> have to support them, but in kernel we should use existing kernel 
> mechanisms.

But Avi, this was *your* idea to use signals ;)  But in all seriousness, I don't know if I have a choice.  I have two requirements which constrain me:

1) I need an IPI to cause a VMEXIT
2) I need to support 2.6.16, strongly preferable as a loadable module.

According to my research (which is undoubtedly not 100% definitive), 2.6.16 or even the newer kernels do not export most of the IPI facilities.  send_sig() happens to be exported and it happens to invoke a reschedule_IPI under the hood, so its convenient.  If there is another way to get access to the IPI facility without playing games with signals, I am all ears.   But until then I don't know what else to do.  If I had the luxury of modifying the kernel source, we could just export what we needed and be done with it.   

> 
> I was interested in how - >pending() and - >read_vector() and the raise 
> callback interact, but got distracted by the, err, uniqueness of the 
> signal thing.

You still haven't weighed in here ;)  Hopefully you have a clearer picture of what I was trying to do now at least.  Whether you agree or not is another matter.

Regards,
-Greg




[-- Attachment #2: preemptible-cpu-2.patch --]
[-- Type: text/plain, Size: 6675 bytes --]

KVM: Preemptible VCPU

From:  <>

This adds support for interrupting an executing CPU

Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---

 drivers/kvm/kvm.h      |   11 ++++++++++
 drivers/kvm/kvm_main.c |   54 ++++++++++++++++++++++++++++++++++++++++++++----
 drivers/kvm/svm.c      |   35 +++++++++++++++++++++++++++++++
 drivers/kvm/vmx.c      |   35 +++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 58966d9..70d1bb9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -271,6 +271,16 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
 
 #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
 
+/*
+ * structure for maintaining info for interrupting an executing VCPU
+ */
+struct kvm_vcpu_irq {
+	spinlock_t          lock;
+	wait_queue_head_t   wq;
+	struct task_struct *task;
+	int                 pending;
+};
+
 struct kvm_vcpu {
 	struct kvm *kvm;
 	union {
@@ -284,6 +294,7 @@ struct kvm_vcpu {
 	struct kvm_run *run;
 	int interrupt_window_open;
 	struct kvm_irqdevice irq_dev;
+	struct kvm_vcpu_irq irq;
 	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
 	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 7e00412..1cf4060 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -299,6 +299,11 @@ static struct kvm *kvm_create_vm(void)
 		struct kvm_vcpu *vcpu = &kvm->vcpus[i];
 
 		mutex_init(&vcpu->mutex);
+
+		memset(&vcpu->irq, 0, sizeof(vcpu->irq));
+		spin_lock_init(&vcpu->irq.lock);
+		init_waitqueue_head(&vcpu->irq.wq);
+
 		vcpu->cpu = -1;
 		vcpu->kvm = kvm;
 		vcpu->mmu.root_hpa = INVALID_PAGE;
@@ -2320,13 +2325,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
 	 * Our irq device is requesting to interrupt the vcpu.  If it is
 	 * currently running, we should inject a host IPI to force a VMEXIT 
 	 */
-	
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
+
 	/*
-	 * FIXME: Implement this or the CPU wont notice the interrupt until
-	 * the next natural VMEXIT.  Note that this is how the system
-	 * has always worked, so nothing is broken here.  This is a future
-	 * enhancement
+	 * HACK ALERT!
+	 *
+	 * We want to send a virtual interrupt signal to the task that owns
+	 * the guest.  However, the signal will only force a VMEXIT (via
+	 * a reschedule IPI) if the task is currently in GUEST mode.  There
+	 * is a race condition between the time that we mark the vcpu as
+	 * running and the time the system actually enter guest mode.  Since
+	 * there doesnt appear to be any way to help with this situation from
+	 * the hardware, we are forced to wait to make sure the guest 
+	 * actually gets interrupted in a reasonable amount of time.  If it
+	 * does not, we assume that the IPI failed because it was too early
+	 * and must try again until it does.
+	 *
+	 * This condvar/spinlock/timeout/retry eliminate the race in a safe
+	 * manner, at the expense of making the INTR delivery synchronous
 	 */
+	spin_lock(&vcpu->irq.lock);
+	
+	if (vcpu->irq.task) {
+		struct timespec tmo = {
+			.tv_sec  = 0,
+			.tv_nsec = 100000 /* 100us */
+		};
+
+		BUG_ON(vcpu->irq.task == current);
+			
+		while (vcpu->irq.task) {
+			DEFINE_WAIT(__wait);	
+
+			send_sig(SIGSTOP, vcpu->irq.task, 0);	
+
+			prepare_to_wait(&vcpu->irq.wq, &__wait, 
+					TASK_UNINTERRUPTIBLE);
+			spin_unlock(&vcpu->irq.lock);
+			schedule_timeout(timespec_to_jiffies(&tmo));
+			spin_lock(&vcpu->irq.lock);
+			finish_wait(&vcpu->irq.wq, &__wait);
+		}
+		
+		vcpu->irq.pending = 1;
+	}
+	
+	spin_unlock(&vcpu->irq.lock);
 }
 
 static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index e59a548..41765bd 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1463,9 +1463,25 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	int r;
 
 again:
+	spin_lock(&vcpu->irq.lock);
+
+	/*
+	 * Setting vcpu->task signals to outsiders that the VMCS is 
+	 * effectively in GUEST mode, and therefore must be signalled
+	 * to transition the task back to HOST mode if any new interrupts
+	 * arrive.
+	 */
+	vcpu->irq.task = current;
+
+	/*
+	 * We also must inject interrupts (if any) while the irq_lock
+	 * is held
+	 */
 	if (!vcpu->mmio_read_completed)
 		do_interrupt_requests(vcpu, kvm_run);
 
+	spin_unlock(&vcpu->irq.lock);
+
 	clgi();
 
 	pre_svm_run(vcpu);
@@ -1617,6 +1633,25 @@ again:
 	reload_tss(vcpu);
 
 	/*
+	 * Signal that we have transitioned back to host mode 
+	 */
+	spin_lock(&vcpu->irq.lock);
+
+	vcpu->irq.task = NULL;
+	wake_up(&vcpu->irq.wq);
+
+	/*
+	 * If irqpending is asserted someone undoubtedly has sent us a SIGSTOP
+	 * signal.  Counter it with a SIGCONT
+	 */
+	if(vcpu->irq.pending) {
+	    send_sig(SIGCONT, current, 0);
+	    vcpu->irq.pending = 0;
+	}
+
+	spin_unlock(&vcpu->irq.lock);
+
+	/*
 	 * Profile KVM exit RIPs:
 	 */
 	if (unlikely(prof_on == KVM_PROFILING))
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index a0fdf02..1d5ce85 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1748,9 +1748,25 @@ again:
 	vmcs_writel(HOST_GS_BASE, segment_base(gs_sel));
 #endif
 
+	spin_lock(&vcpu->irq.lock);
+
+	/*
+	 * Setting vcpu->task signals to outsiders that the VMCS is 
+	 * effectively in GUEST mode, and therefore must be signalled
+	 * to transition the task back to HOST mode if any new interrupts
+	 * arrive.
+	 */
+	vcpu->irq.task = current;
+
+	/*
+	 * We also must inject interrupts (if any) while the irq_lock
+	 * is held
+	 */
 	if (!vcpu->mmio_read_completed)
 		do_interrupt_requests(vcpu, kvm_run);
 
+	spin_unlock(&vcpu->irq.lock);
+
 	if (vcpu->guest_debug.enabled)
 		kvm_guest_debug_pre(vcpu);
 
@@ -1911,6 +1927,25 @@ again:
 
 	asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 
+	/*
+	 * Signal that we have transitioned back to host mode 
+	 */
+	spin_lock(&vcpu->irq.lock);
+
+	vcpu->irq.task = NULL;
+	wake_up(&vcpu->irq.wq);
+
+	/*
+	 * If irqpending is asserted someone undoubtedly has sent us a SIGSTOP
+	 * signal.  Counter it with a SIGCONT
+	 */
+	if(vcpu->irq.pending) {
+	    send_sig(SIGCONT, current, 0);
+	    vcpu->irq.pending = 0;
+	}
+
+	spin_unlock(&vcpu->irq.lock);
+
 	if (fail) {
 		kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		kvm_run->fail_entry.hardware_entry_failure_reason

[-- Attachment #3: Type: text/plain, Size: 345 bytes --]

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

[-- Attachment #4: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel

next prev parent reply	other threads:[~2007-04-12 13:43 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-12  4:02 irqdevice INTR example Gregory Haskins
     [not found] ` <461D7702.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-12  8:02   ` Avi Kivity
     [not found]     ` <461DE791.1040707-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-12  8:18       ` Christoph Hellwig
2007-04-12 11:55       ` Gregory Haskins
     [not found]         ` <461DE5C9.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-12 12:49           ` Avi Kivity
     [not found]             ` <461E2AD5.7070905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-12 13:43               ` Gregory Haskins [this message]
     [not found]                 ` <461DFF1C.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-12 14:14                   ` Avi Kivity
     [not found]                     ` <461E3EDB.3080002-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-12 16:01                       ` Gregory Haskins
2007-04-13 13:05                         ` Fwd: " Gregory Haskins
     [not found]                         ` <461E1F73.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-14 14:30                           ` Avi Kivity
     [not found]                             ` <4620E56A.7040207-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-15 22:32                               ` Gregory Haskins
2007-04-15 23:32                                 ` Gregory Haskins
     [not found]                                 ` <46226FBC.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-16  5:46                                   ` Avi Kivity

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:58966d9 dfblob:70d1bb9 dfblob:7e00412 dfblob:1cf4060
dfblob:e59a548 dfblob:41765bd dfblob:a0fdf02 dfblob:1d5ce85 )
 OR (
bs:"Re: irqdevice INTR example" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=461DFF1C.BA47.005A.0@novell.com \
    --to=ghaskins-et1tbqhtxzrqt0dzr+alfa@public.gmane.org \
    --cc=avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org \
    --cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox