From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Yang, Sheng"
Subject: Re: Remaining passthrough/VT-d tasks list
Date: Sun, 28 Sep 2008 13:54:55 +0800
Message-ID: <200809281354.56553.sheng.yang@intel.com>
References: <0122C7C995D32147B66BF4F440D3016301C49E61@pdsmsx415.ccr.corp.intel.com>
 <48DF1046.1050102@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: 7bit
Cc: "Tian, Kevin", "Han, Weidong", "kvm@vger.kernel.org", Amit Shah,
 "benami@il.ibm.com", "muli@il.ibm.com", "Kay, Allen M", "Zhang, Xiantao",
 Eddie Dong
To: Avi Kivity
Return-path:
Received: from mga11.intel.com ([192.55.52.93]:8897 "EHLO mga11.intel.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750925AbYI1FyH
 (ORCPT ); Sun, 28 Sep 2008 01:54:07 -0400
In-Reply-To: <48DF1046.1050102@redhat.com>
Content-Disposition: inline
Sender: kvm-owner@vger.kernel.org
List-ID:

On Sunday 28 September 2008 13:04:06 Avi Kivity wrote:
> Tian, Kevin wrote:
> >> No. Maybe the Neocleus polarity trick (which also reduces performance).
> >
> > To my knowledge, Neocleus polarity trick can't solve this isolation
> > issue, which just provides one efficient way to track
> > assertion/deassertion transition on the irq line. For example, reverse
> > polarity when receiving an instance, and then a new irq instance would
> > occur when all devices de-assert on shared irq line, and then recover
> > the polarity. In your concerned case where guest driver misbehaves, this
> > polarity trick can't work either as one device always asserts the line.

> You're right, I didn't think it through.
>
> If there was a standard way to mask pci irqs, it might have worked, but
> there isn't, unfortunately.
>

One proposal: if we suffer an IRQ storm on a level-triggered IRQ line, there
are two possibilities: a host issue or a guest issue.

If it's a host issue, the host should try to stop it. If it can't, the IRQ
line will be disabled, and the guest device won't be functional either.

If it's a guest issue, the guest should try to stop it and keep it from
causing trouble in the host. KVM should do its best to enforce this, up to
and including disabling the guest device. So the guest device won't be
functional in this case either.

Based on the above, we can assume that the IRQ storm is caused by the
assigned guest device, and try to stop the device from doing this. (Yeah,
either way, the guest device won't survive.)

I think we can bring in a little QoS concept here (stolen from Eddie :) ).
The assumption is that the normal rate at which a device delivers interrupts
is much lower than the rate seen on a continuously asserted level-triggered
line when the EOI is written immediately. So we can do something with the
gap.

Measure the calling rate of our irq handler. If it exceeds some reasonable
threshold, KVM would try to stop the guest device for a while (even though it
doesn't know whether the guest device caused this). First, try setting the
Interrupt Disable bit in the PCI Command register, wait for a period of time,
then check again. If the IRQ storm can't be stopped, KVM tries a more
aggressive way: do a Function Level Reset. That should be the end of the
device's life...

Oh, of course, if even FLR doesn't stop the IRQ storm, it's the host's
issue. Let's wait for the host to disable the IRQ line; of course, the guest
device can't be recovered then either.

It's just an initial proposal, but I think it may work. The problem is
whether the gap is easy to catch... But at least, I think a physically
continuous one should look very different from any working one...

-- 
regards
Yang, Sheng
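
The rate check and escalation described above can be sketched roughly as
follows. This is only an illustration in Python, not KVM code: the class
name, the threshold and window values, and the action labels are all
hypothetical, and nothing here touches real hardware. The "wait for a period
of time, then check again" step is modelled by clearing the window and
measuring afresh after each action.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    NONE = "nothing to do"
    DISABLE_INTX = "set the Interrupt Disable bit"
    FLR = "do a Function Level Reset"
    GIVE_UP = "leave it to the host to disable the IRQ line"


# Escalation order taken from the mail: disable bit first, then FLR, then host.
ESCALATION = [Action.DISABLE_INTX, Action.FLR, Action.GIVE_UP]


class StormDetector:
    """Hypothetical helper: counts handler calls in a sliding time window."""

    def __init__(self, threshold, window_us):
        self.threshold = threshold  # assumed: max interrupts tolerated per window
        self.window_us = window_us  # assumed: window length in microseconds
        self.stamps = deque()       # timestamps of recent interrupts
        self.level = 0              # number of escalation steps taken so far

    def on_irq(self, now_us):
        """Record one handler invocation and return the action to take."""
        self.stamps.append(now_us)
        # Forget interrupts that have fallen out of the window.
        while now_us - self.stamps[0] > self.window_us:
            self.stamps.popleft()
        if len(self.stamps) <= self.threshold:
            return Action.NONE
        # Rate is above the threshold: take the next step and measure again.
        self.stamps.clear()
        self.level = min(self.level + 1, len(ESCALATION))
        return ESCALATION[self.level - 1]
```

With threshold=100 and window_us=10000, a device interrupting once per
millisecond never trips the detector, while a line firing every microsecond
walks through all three steps within a few hundred calls. How large the gap
between those two cases is on real hardware is exactly the open question
above.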