Re: irqdevice INTR example - Gregory Haskins


From: "Gregory Haskins" <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
To: "Avi Kivity" <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Subject: Re: irqdevice INTR example
Date: Thu, 12 Apr 2007 09:43:10 -0400
Message-ID: <461DFF1C.BA47.005A.0@novell.com>
In-Reply-To: <461E2AD5.7070905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 4568 bytes --]

I have attached a new version of the patch which eliminates the condition variable (if only by name, anyway ;)

>>> On Thu, Apr 12, 2007 at  8:49 AM, in message <461E2AD5.7070905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>,
Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote: 
>
> 
> Actually, I am in favor of having well-defined synchronization 
> primitives.  The only issues I see with condition variables are:
> 
> -  there are many variants of mutexes in the kernel (mutex, spinlock, 
> spinlock with irqs disabled, ...), so many condvar variants are needed.

Yeah, I tried to address that with lock_ops, but I agree.

> -  they're very close to waitqueues, so perhaps there's not much 
> justification

Agreed, they are very close.  I won't go into my opinion of waitqueues vs. cond-vars in this forum ;)
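To illustrate the lock_ops idea, here is a userspace sketch of one condition variable that works with any lock type, given lock/unlock callbacks. The `struct lock_ops` shape, `gen_condvar`, and all function names are hypothetical (the real lock_ops from the earlier patch is not shown in this thread), and pthreads stands in for kernel primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lock_ops: lets one condvar implementation work with any lock. */
struct lock_ops {
	void (*lock)(void *lk);
	void (*unlock)(void *lk);
};

/* Generic condvar: internal mutex plus a sequence counter bumped on wakeup. */
struct gen_condvar {
	pthread_mutex_t m;
	pthread_cond_t  c;
	unsigned long   seq;
};

#define GEN_CONDVAR_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Caller holds lk.  Drop it, sleep until the next broadcast, retake it. */
static void gen_cond_wait(struct gen_condvar *cv,
			  const struct lock_ops *ops, void *lk)
{
	unsigned long seq;

	pthread_mutex_lock(&cv->m);
	seq = cv->seq;            /* snapshot before releasing the caller's lock */
	ops->unlock(lk);
	while (cv->seq == seq)    /* also filters out spurious wakeups */
		pthread_cond_wait(&cv->c, &cv->m);
	pthread_mutex_unlock(&cv->m);
	ops->lock(lk);
}

static void gen_cond_broadcast(struct gen_condvar *cv)
{
	pthread_mutex_lock(&cv->m);
	cv->seq++;
	pthread_cond_broadcast(&cv->c);
	pthread_mutex_unlock(&cv->m);
}

/* One lock_ops instance; other lock flavours would each add their own. */
static void mutex_lock_op(void *lk)   { pthread_mutex_lock(lk); }
static void mutex_unlock_op(void *lk) { pthread_mutex_unlock(lk); }
static const struct lock_ops mutex_lock_ops = { mutex_lock_op, mutex_unlock_op };
```

Because the waiter snapshots `seq` under the internal mutex before dropping the caller's lock, a broadcast issued after that point cannot be missed. That is roughly the guarantee a waitqueue already gives via prepare_to_wait(), which is why the two end up so close.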

> 
> I'm no synchronization primitive expert, though.  My comment had nothing 
> to do with my opinion on condition variables or your implementation 
> thereof.  kvm is simply not the place to introduce them.

No problem.  I think your argument is a good one.

> 
> I referred to the comment.  Maybe just "the hardware"?

Done

> 
> Ah, ok --  I misunderstood the whole thing.  The way to avoid the race is 
> to disable interrupts before entering the guest.  This way the IPI is 
> delayed until you enter guest mode:
> 
>     irq_disable();
> 
>     spin_lock();
>     vcpu->guest_mode = 1;
>     check_whether_an_irq_is_pending_and_if_so_inject_it();
>     spin_unlock();
> 
>     asm ("vm_entry_sequence");
> 
>     vcpu->guest_mode = 0;
>    
>     irq_enable(); // currently in the vm entry sequence
> 
>     // recheck here?
> 
> If the interrupt happens before the spin_lock(), we'll get a non-IPI 
> wakeup and then see it in check_whether().  If it happens after it we'll 
> get an IPI which will be ignored until we're snugly in guest mode.

When I first read this I thought "whoa! you want to disable interrupts during the whole time we are in GUEST mode?"  But then it dawned on me what you are suggesting: interrupts would be re-enabled after the context switch because we reload the processor state?  Cool!  The thing I can't wrap my head around is what happens when the guest has IF=0 and an external interrupt comes in.  Would we still exit?

But that aside, a problem I see is that (IIUC) IPIs use NMIs, not EXT-INTs.  Assuming that is right, I suppose we might be able to do a similar trick, except that we would also disable NMIs first (is this possible/recommended/forbidden?).
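The ordering argument in Avi's suggestion can be checked with a toy, single-threaded model that steps through both interleavings. Every name below is hypothetical, the spinlock is implicit because each step runs atomically, and a "latched" IPI stands for one that arrives while host interrupts are disabled and is delivered (forcing a VMEXIT) as soon as the guest runs. The model does not address the IF=0 or NMI questions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one vcpu; all names are hypothetical. */
struct vcpu_model {
	bool host_irqs_on;  /* host interrupt flag                       */
	bool guest_mode;    /* vcpu->guest_mode in Avi's sketch          */
	bool irq_pending;   /* interrupt raised by the irq device        */
	bool injected;      /* interrupt injected before entry           */
	bool ipi_latched;   /* IPI that arrived while host irqs were off */
};

/* Sender: mark the irq pending; kick only if the vcpu is in guest mode. */
static void raise_irq(struct vcpu_model *v)
{
	v->irq_pending = true;
	if (v->guest_mode)
		v->ipi_latched = true;  /* host irqs are off: the CPU holds it */
}

/* vcpu: everything up to the vm entry instruction. */
static void prepare_entry(struct vcpu_model *v)
{
	v->host_irqs_on = false;      /* irq_disable()                     */
	v->guest_mode = true;         /* spin_lock(); vcpu->guest_mode = 1 */
	if (v->irq_pending) {         /* check_whether_..._inject_it()     */
		v->injected = true;
		v->irq_pending = false;
	}                             /* spin_unlock()                     */
}

/* vm entry; returns true if a latched IPI forces an immediate VMEXIT. */
static bool enter_guest(struct vcpu_model *v)
{
	bool kicked = v->ipi_latched;

	v->ipi_latched = false;
	v->guest_mode = false;        /* vcpu->guest_mode = 0 */
	v->host_irqs_on = true;       /* irq_enable()         */
	return kicked;
}
```

Either way the interrupt is delivered no later than the next entry: raised before the check, it is injected directly; raised after, the held IPI kicks the guest out immediately and the next pass injects it.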

> 
> In general I find it useful to pretend there are many userspaces being 
> written for kvm, otherwise we get locked into qemu's current mode of 
> operation.

Sounds reasonable.  I will start to do the same.

> 
>> This makes sense now that I think about it because something like hlt should 
>> cause a suspension of CPU activity until further notice. 
> 
> An alternative is to handle hlt in the kernel in a place where we're 
> ready for the IPI wakeup.  The downside to that is that we have to be 
> prepared for external wakeup sources as well (signal, poll, aio... messy).

Hmmm...interesting.  I wonder if there are advantages that make this worth exploring.  Oh well, I will back burner these thoughts until the SMP/PV/APIC stuff is sorted out.
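The in-kernel hlt alternative amounts to sleeping until any wakeup source fires. A minimal userspace sketch, assuming a pthread condition variable in place of a kernel waitqueue and a single hypothetical `external_pending` flag standing in for all the external sources Avi lists (signal, poll, aio):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Why a halted vcpu woke up (hypothetical). */
enum wake_reason { WAKE_IRQ, WAKE_EXTERNAL };

struct halt_state {
	pthread_mutex_t lock;
	pthread_cond_t  wake;            /* plays the role of a waitqueue    */
	bool irq_pending;                /* raised by the irq device         */
	bool external_pending;           /* signal, poll, aio, ... folded in */
};

#define HALT_STATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false }

/* vcpu thread: emulate hlt by sleeping until some wakeup source fires. */
static enum wake_reason vcpu_halt(struct halt_state *s)
{
	enum wake_reason r;

	pthread_mutex_lock(&s->lock);
	while (!s->irq_pending && !s->external_pending)
		pthread_cond_wait(&s->wake, &s->lock);
	/* Prefer the interrupt; otherwise return so userspace handles the event. */
	r = s->irq_pending ? WAKE_IRQ : WAKE_EXTERNAL;
	pthread_mutex_unlock(&s->lock);
	return r;
}

/* Any wakeup source: set its flag and wake the halted vcpu. */
static void post_wakeup(struct halt_state *s, bool is_irq)
{
	pthread_mutex_lock(&s->lock);
	if (is_irq)
		s->irq_pending = true;
	else
		s->external_pending = true;
	pthread_cond_signal(&s->wake);
	pthread_mutex_unlock(&s->lock);
}
```

The "messy" part is that, in the kernel, every external source must reach this sleeper; here that is hidden behind one flag.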

> 
> It's best not to use signals internally.  Qemu relies on them and we 
> have to support them, but in kernel we should use existing kernel 
> mechanisms.

But Avi, this was *your* idea to use signals ;)  In all seriousness, though, I don't know if I have a choice.  I have two requirements which constrain me:

1) I need an IPI to cause a VMEXIT
2) I need to support 2.6.16, strongly preferably as a loadable module.

According to my research (which is undoubtedly not 100% definitive), neither 2.6.16 nor newer kernels export most of the IPI facilities.  send_sig() happens to be exported, and it happens to invoke a reschedule IPI under the hood, so it's convenient.  If there is another way to get access to the IPI facility without playing games with signals, I am all ears, but until then I don't know what else to do.  If I had the luxury of modifying the kernel source, we could just export what we needed and be done with it.
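The "kick, and retry if the kick came too early" approach can be reproduced in userspace, where `pthread_kill()` plays the role of `send_sig()` and a thread blocked in `read()` plays the vcpu in guest mode. This is only an analogy under those assumptions, and the helper names are hypothetical: a signal that lands before the thread blocks is effectively lost, much like an IPI sent before VM entry, so the sender repeats it every 100us until the target acknowledges.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: the signal's only job is to knock the thread out of read(). */
static void kick_handler(int sig) { (void)sig; }

/* Install the handler without SA_RESTART so read() fails with EINTR. */
static int install_kick_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = kick_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

struct kick_target {
	int fd;              /* read end of a pipe that never gets data */
	atomic_int exited;   /* set once the blocking call has returned */
	int err;             /* errno from the interrupted read()       */
};

/* Stand-in for "running in guest mode": blocks until kicked. */
static void *blocked_thread(void *arg)
{
	struct kick_target *t = arg;
	char c;

	t->err = (read(t->fd, &c, 1) < 0) ? errno : 0;
	atomic_store(&t->exited, 1);
	return NULL;
}

/* Re-send the kick every 100us until the target has left its blocking call. */
static int kick_until_exited(pthread_t tid, struct kick_target *t)
{
	const struct timespec tmo = { .tv_sec = 0, .tv_nsec = 100000 };
	int kicks = 0;

	while (!atomic_load(&t->exited)) {
		pthread_kill(tid, SIGUSR1);
		kicks++;
		nanosleep(&tmo, NULL);
	}
	return kicks;
}
```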

> 
> I was interested in how ->pending() and ->read_vector() and the raise 
> callback interact, but got distracted by the, err, uniqueness of the 
> signal thing.

You still haven't weighed in here ;)  Hopefully you have a clearer picture of what I was trying to do now at least.  Whether you agree or not is another matter.
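For readers following the ->pending()/->read_vector()/raise question, here is one way the three could fit together. This is a hypothetical sketch, not the kvm_irqdevice/kvm_irqsink API from the patch series: raise latches a vector and calls the sink so a running vcpu can be kicked; pending() answers whether anything is waiting; read_vector() returns and acknowledges the highest-numbered vector (that priority rule is an assumption; real PIC/APIC models have their own).

```c
#include <assert.h>
#include <limits.h>

#define NR_VECTORS    256
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      (NR_VECTORS / BITS_PER_WORD)

/* Hypothetical sink: what the device calls when it wants the vcpu's attention. */
struct irqsink {
	void (*set_intr)(struct irqsink *sink);
	void *private;
};

struct irqdev {
	unsigned long pending[NR_WORDS];
	struct irqsink sink;
};

/* raise: latch the vector, then notify the sink (e.g. to kick a running vcpu). */
static void irqdev_raise(struct irqdev *d, int vec)
{
	d->pending[vec / BITS_PER_WORD] |= 1UL << (vec % BITS_PER_WORD);
	if (d->sink.set_intr)
		d->sink.set_intr(&d->sink);
}

/* ->pending(): is any vector waiting? */
static int irqdev_pending(const struct irqdev *d)
{
	for (unsigned i = 0; i < NR_WORDS; i++)
		if (d->pending[i])
			return 1;
	return 0;
}

/* ->read_vector(): return and acknowledge the highest pending vector, or -1. */
static int irqdev_read_vector(struct irqdev *d)
{
	for (int vec = NR_VECTORS - 1; vec >= 0; vec--) {
		unsigned long mask = 1UL << (vec % BITS_PER_WORD);

		if (d->pending[vec / BITS_PER_WORD] & mask) {
			d->pending[vec / BITS_PER_WORD] &= ~mask;
			return vec;
		}
	}
	return -1;
}
```

In this arrangement the vcpu run loop only needs pending() and read_vector() at injection time, while the raise path is the only place that has to worry about a vcpu already in guest mode.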

Regards,
-Greg




[-- Attachment #2: preemptible-cpu-2.patch --]
[-- Type: text/plain, Size: 6675 bytes --]

KVM: Preemptible VCPU

From:  <>

This adds support for interrupting an executing VCPU.

Signed-off-by: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
---

 drivers/kvm/kvm.h      |   11 ++++++++++
 drivers/kvm/kvm_main.c |   54 ++++++++++++++++++++++++++++++++++++++++++++----
 drivers/kvm/svm.c      |   35 +++++++++++++++++++++++++++++++
 drivers/kvm/vmx.c      |   35 +++++++++++++++++++++++++++++++
 4 files changed, 130 insertions(+), 5 deletions(-)

diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
index 58966d9..70d1bb9 100644
--- a/drivers/kvm/kvm.h
+++ b/drivers/kvm/kvm.h
@@ -271,6 +271,16 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
 
 #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
 
+/*
+ * structure for maintaining info for interrupting an executing VCPU
+ */
+struct kvm_vcpu_irq {
+	spinlock_t          lock;
+	wait_queue_head_t   wq;
+	struct task_struct *task;
+	int                 pending;
+};
+
 struct kvm_vcpu {
 	struct kvm *kvm;
 	union {
@@ -284,6 +294,7 @@ struct kvm_vcpu {
 	struct kvm_run *run;
 	int interrupt_window_open;
 	struct kvm_irqdevice irq_dev;
+	struct kvm_vcpu_irq irq;
 	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
 	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
 
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 7e00412..1cf4060 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -299,6 +299,11 @@ static struct kvm *kvm_create_vm(void)
 		struct kvm_vcpu *vcpu = &kvm->vcpus[i];
 
 		mutex_init(&vcpu->mutex);
+
+		memset(&vcpu->irq, 0, sizeof(vcpu->irq));
+		spin_lock_init(&vcpu->irq.lock);
+		init_waitqueue_head(&vcpu->irq.wq);
+
 		vcpu->cpu = -1;
 		vcpu->kvm = kvm;
 		vcpu->mmu.root_hpa = INVALID_PAGE;
@@ -2320,13 +2325,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
 	 * Our irq device is requesting to interrupt the vcpu.  If it is
 	 * currently running, we should inject a host IPI to force a VMEXIT 
 	 */
-	
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
+
 	/*
-	 * FIXME: Implement this or the CPU wont notice the interrupt until
-	 * the next natural VMEXIT.  Note that this is how the system
-	 * has always worked, so nothing is broken here.  This is a future
-	 * enhancement
+	 * HACK ALERT!
+	 *
+	 * We want to send a signal (SIGSTOP) to the task that owns the
+	 * guest.  However, the signal will only force a VMEXIT (via
+	 * a reschedule IPI) if the task is currently in GUEST mode.  There
+	 * is a race condition between the time that we mark the vcpu as
+	 * running and the time the system actually enters guest mode.  Since
+	 * there doesn't appear to be any way to help with this situation from
+	 * the hardware, we are forced to wait to make sure the guest
+	 * actually gets interrupted in a reasonable amount of time.  If it
+	 * does not, we assume that the IPI failed because it was too early
+	 * and must try again until it succeeds.
+	 *
+	 * This condvar/spinlock/timeout/retry scheme eliminates the race
+	 * safely, at the expense of making INTR delivery synchronous.
 	 */
+	spin_lock(&vcpu->irq.lock);
+
+	if (vcpu->irq.task) {
+		struct timespec tmo = {
+			.tv_sec  = 0,
+			.tv_nsec = 100000 /* 100us */
+		};
+
+		BUG_ON(vcpu->irq.task == current);
+
+		while (vcpu->irq.task) {
+			DEFINE_WAIT(__wait);
+
+			send_sig(SIGSTOP, vcpu->irq.task, 0);
+			/* set before exit so the exit path cancels with SIGCONT */
+			vcpu->irq.pending = 1;
+
+			prepare_to_wait(&vcpu->irq.wq, &__wait,
+					TASK_UNINTERRUPTIBLE);
+			spin_unlock(&vcpu->irq.lock);
+			schedule_timeout(timespec_to_jiffies(&tmo));
+			spin_lock(&vcpu->irq.lock);
+			finish_wait(&vcpu->irq.wq, &__wait);
+		}
+	}
+
+	spin_unlock(&vcpu->irq.lock);
 }
 
 static void kvm_vcpu_irqsink_init(struct kvm_vcpu *vcpu)
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index e59a548..41765bd 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -1463,9 +1463,25 @@ static int svm_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	int r;
 
 again:
+	spin_lock(&vcpu->irq.lock);
+
+	/*
+	 * Setting vcpu->irq.task signals to outsiders that the vcpu is
+	 * effectively in GUEST mode, and therefore must be signalled
+	 * to transition the task back to HOST mode if any new interrupts
+	 * arrive.
+	 */
+	vcpu->irq.task = current;
+
+	/*
+	 * We must also inject interrupts (if any) while irq.lock
+	 * is held.
+	 */
 	if (!vcpu->mmio_read_completed)
 		do_interrupt_requests(vcpu, kvm_run);
 
+	spin_unlock(&vcpu->irq.lock);
+
 	clgi();
 
 	pre_svm_run(vcpu);
@@ -1617,6 +1633,25 @@ again:
 	reload_tss(vcpu);
 
 	/*
+	 * Signal that we have transitioned back to host mode 
+	 */
+	spin_lock(&vcpu->irq.lock);
+
+	vcpu->irq.task = NULL;
+	wake_up(&vcpu->irq.wq);
+
+	/*
+	 * If irq.pending is set, someone has sent us a SIGSTOP
+	 * signal.  Counter it with a SIGCONT.
+	 */
+	if (vcpu->irq.pending) {
+		send_sig(SIGCONT, current, 0);
+		vcpu->irq.pending = 0;
+	}
+
+	spin_unlock(&vcpu->irq.lock);
+
+	/*
 	 * Profile KVM exit RIPs:
 	 */
 	if (unlikely(prof_on == KVM_PROFILING))
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index a0fdf02..1d5ce85 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1748,9 +1748,25 @@ again:
 	vmcs_writel(HOST_GS_BASE, segment_base(gs_sel));
 #endif
 
+	spin_lock(&vcpu->irq.lock);
+
+	/*
+	 * Setting vcpu->irq.task signals to outsiders that the VMCS is
+	 * effectively in GUEST mode, and therefore must be signalled
+	 * to transition the task back to HOST mode if any new interrupts
+	 * arrive.
+	 */
+	vcpu->irq.task = current;
+
+	/*
+	 * We must also inject interrupts (if any) while irq.lock
+	 * is held.
+	 */
 	if (!vcpu->mmio_read_completed)
 		do_interrupt_requests(vcpu, kvm_run);
 
+	spin_unlock(&vcpu->irq.lock);
+
 	if (vcpu->guest_debug.enabled)
 		kvm_guest_debug_pre(vcpu);
 
@@ -1911,6 +1927,25 @@ again:
 
 	asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 
+	/*
+	 * Signal that we have transitioned back to host mode 
+	 */
+	spin_lock(&vcpu->irq.lock);
+
+	vcpu->irq.task = NULL;
+	wake_up(&vcpu->irq.wq);
+
+	/*
+	 * If irq.pending is set, someone has sent us a SIGSTOP
+	 * signal.  Counter it with a SIGCONT.
+	 */
+	if (vcpu->irq.pending) {
+		send_sig(SIGCONT, current, 0);
+		vcpu->irq.pending = 0;
+	}
+
+	spin_unlock(&vcpu->irq.lock);
+
 	if (fail) {
 		kvm_run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 		kvm_run->fail_entry.hardware_entry_failure_reason
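The handshake between kvm_vcpu_intr() and the svm/vmx run paths can be modelled in userspace. In this sketch (all names hypothetical), `running` stands for `vcpu->irq.task`, an atomic flag stands for the SIGSTOP/IPI kick, `pthread_cond_timedwait()` stands for the 100us `schedule_timeout()` on `irq.wq`, and clearing the flag at "entry" reproduces the lost early kick that makes the retry loop necessary. `pending` is set whenever a kick is sent and cleared by the exit path, mirroring the SIGSTOP/SIGCONT pairing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Userspace stand-in for struct kvm_vcpu_irq (names hypothetical). */
struct vcpu_irq_model {
	pthread_mutex_t lock;     /* irq.lock                           */
	pthread_cond_t  wq;       /* irq.wq                             */
	bool running;             /* irq.task != NULL                   */
	bool pending;             /* a kick was sent and must be undone */
	atomic_bool kick;         /* the SIGSTOP / reschedule IPI       */
};

/* One pass of the run loop, as in svm_vcpu_run()/vmx_vcpu_run(). */
static void vcpu_run_once(struct vcpu_irq_model *v)
{
	pthread_mutex_lock(&v->lock);
	v->running = true;                    /* irq.task = current */
	pthread_mutex_unlock(&v->lock);

	/* "VM entry": a kick that arrived before this point is lost. */
	atomic_store(&v->kick, false);
	while (!atomic_exchange(&v->kick, false))
		sched_yield();                /* "guest mode" until kicked */

	pthread_mutex_lock(&v->lock);
	v->running = false;                   /* irq.task = NULL */
	pthread_cond_broadcast(&v->wq);       /* wake_up(&irq.wq) */
	v->pending = false;                   /* the SIGCONT cancellation */
	pthread_mutex_unlock(&v->lock);
}

/* The interrupter, as in kvm_vcpu_intr(); returns the number of kicks sent. */
static int vcpu_intr(struct vcpu_irq_model *v)
{
	int kicks = 0;

	pthread_mutex_lock(&v->lock);
	while (v->running) {
		struct timespec deadline;

		atomic_store(&v->kick, true);
		v->pending = true;
		kicks++;
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_nsec += 100000;           /* 100us, as in the patch */
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		pthread_cond_timedwait(&v->wq, &v->lock, &deadline);
	}
	pthread_mutex_unlock(&v->lock);
	return kicks;
}
```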

[-- Attachment #4: Type: text/plain, Size: 186 bytes --]

_______________________________________________
kvm-devel mailing list
kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/kvm-devel
