From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [PATCH 0/5]: Fix kdump under KVM Date: Wed, 28 Oct 2009 12:31:35 +0200 Message-ID: <4AE81D87.1030709@redhat.com> References: <1256661667-9298-1-git-send-email-clalance@redhat.com> <4AE81617.6060406@redhat.com> <4AE81959.2000801@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm@vger.kernel.org To: Chris Lalancette Return-path: Received: from mx1.redhat.com ([209.132.183.28]:13465 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753430AbZJ1Kbc (ORCPT ); Wed, 28 Oct 2009 06:31:32 -0400 Received: from int-mx04.intmail.prod.int.phx2.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.17]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n9SAVbpW016402 for ; Wed, 28 Oct 2009 06:31:37 -0400 In-Reply-To: <4AE81959.2000801@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 10/28/2009 12:13 PM, Chris Lalancette wrote: > >> The kick from i8254 code is pretty bad, as you mention. I forget why it >> is needed at all - shouldn't kvm_set_irq() end up kicking the correct >> > As I understand it, that's not quite how it works. From what I can see, what > happens is that the i8254 is programmed as an hrtimer. When the hrtimer > expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code). > That code is running in interrupt context, however, so you can't directly call > "set_irq" at that point. Instead, we update the "pending" variable and defer > work until later on. That "later on" is when we are doing a vcpu_run, at which > point we check the "pending" variable, and if set, inject the interrupt. > > The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we > don't kick them, no vcpu will wake up, and hence none of them will ever run > "set_irq" to get it injected into the guest. > Oh yes, I remember now. > If you have other ideas on how we might accomplish this, I'd definitely be > interested in hearing them. > We could schedule work to run in thread context. Previous to the user return notifier work, this would cause a reloading of a bunch of msrs and would thus be quite expensive, but now switching to kernel threads and back is pretty cheap. So we could try schedule_work(). My only worry is that it uses wake_up() instead of wake_up_sync(), so it could introduce delay. I'd much prefer a variant the schedules immediately. An alternative is to make irq injection work from irq context. It's not difficult to do technically - convert mutexes to spin_lock_irqsave() - I'm just worried about long irqoff times with large guests (ioapic needs to iterate over all vcpus sometimes). This is useful for other cases - device assignment interrupt injection, irqfd for vhost/vbus. So maybe we should go this path, and worry about reducing irqoff times later when we bump KVM_MAX_VCPUS. -- error compiling committee.c: too many arguments to function