From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Lalancette
Subject: Re: [PATCH 0/5]: Fix kdump under KVM
Date: Thu, 29 Oct 2009 09:34:46 +0100
Message-ID: <4AE953A6.1000204@redhat.com>
References: <1256661667-9298-1-git-send-email-clalance@redhat.com>
 <4AE81617.6060406@redhat.com>
 <4AE81959.2000801@redhat.com>
 <4AE81D87.1030709@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Avi Kivity
Return-path: 
Received: from mx1.redhat.com ([209.132.183.28]:16258 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750921AbZJ2Ien
 (ORCPT ); Thu, 29 Oct 2009 04:34:43 -0400
Received: from int-mx01.intmail.prod.int.phx2.redhat.com
 (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com
 (8.13.8/8.13.8) with ESMTP id n9T8Ymet004506
 for ; Thu, 29 Oct 2009 04:34:48 -0400
In-Reply-To: <4AE81D87.1030709@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: 

Avi Kivity wrote:
> On 10/28/2009 12:13 PM, Chris Lalancette wrote:
>>> The kick from i8254 code is pretty bad, as you mention.  I forget why it
>>> is needed at all - shouldn't kvm_set_irq() end up kicking the correct
>>>
>> As I understand it, that's not quite how it works.  From what I can see,
>> what happens is that the i8254 is programmed as an hrtimer.  When the
>> hrtimer expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my
>> new code).  That code is running in interrupt context, however, so we can't
>> directly call "set_irq" at that point.  Instead, we update the "pending"
>> variable and defer the work until later.  That "later" is when we next do a
>> vcpu_run, at which point we check the "pending" variable and, if it is set,
>> inject the interrupt.
>>
>> The problem is that if the vcpu(s) are idle when the hrtimer expires and we
>> don't kick them, no vcpu will wake up, and hence none of them will ever run
>> "set_irq" to get it injected into the guest.
>>
>
> Oh yes, I remember now.
>
>> If you have other ideas on how we might accomplish this, I'd definitely be
>> interested in hearing them.
>>
>
> We could schedule work to run in thread context.  Prior to the user
> return notifier work, this would cause a reloading of a bunch of MSRs
> and would thus be quite expensive, but now switching to kernel threads
> and back is pretty cheap.
>
> So we could try schedule_work().  My only worry is that it uses
> wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd
> much prefer a variant that schedules immediately.
>
> An alternative is to make irq injection work from irq context.  It's not
> difficult to do technically - convert the mutexes to spin_lock_irqsave() -
> I'm just worried about long irqoff times with large guests (the ioapic
> needs to iterate over all vcpus sometimes).  This is useful for other
> cases too - device assignment interrupt injection, irqfd for vhost/vbus.
> So maybe we should go down this path, and worry about reducing irqoff
> times later, when we bump KVM_MAX_VCPUS.
>

I'm starting to take a look at this.  While this may be a generically
useful cleanup, it occurs to me that even if we call "set_irq" from the
hrtimer callback, we still have to do the "kick_vcpus" type of thing.
Otherwise, we still have the problem that if all of the vcpus are idle,
none of them will be doing vcpu_run anytime soon to actually inject the
interrupt into the guest.  Or did I misunderstand what you are proposing?

-- 
Chris Lalancette