From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:48072) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RqR0k-0005HE-QG for qemu-devel@nongnu.org; Thu, 26 Jan 2012 10:13:20 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RqR0e-0000IQ-P1 for qemu-devel@nongnu.org; Thu, 26 Jan 2012 10:13:14 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56099) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RqR0e-0000ID-Ge for qemu-devel@nongnu.org; Thu, 26 Jan 2012 10:13:08 -0500
Message-ID: <4F216D4E.7030405@redhat.com>
Date: Thu, 26 Jan 2012 17:12:14 +0200
From: Avi Kivity
MIME-Version: 1.0
References: <4F1F971B.4020309@endace.com> <20120126091436.GB13974@redhat.com> <4F215A9B.6090204@redhat.com> <20120126143626.GE17198@redhat.com>
In-Reply-To: <20120126143626.GE17198@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
To: "Michael S. Tsirkin"
Cc: Alexey Korolev , sfd@endace.com, Kevin O'Connor , "qemu-devel@nongnu.org"

On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
> > On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
> > > On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> > > > Hi,
> > > > In this post
> > > > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> > > > mentioned about the issues when 64Bit PCI BAR is present and 32bit
> > > > address range is selected for it.
> > > > The issue affects all recent qemu releases and all
> > > > old and recent guest Linux kernel versions.
> > > >
> > > > We've done some investigations. Let me explain what happens.
> > > > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
> > > > 0xF2000000]
> > > >
> > > > When Linux guest starts it does PCI bus enumeration.
> > > > The OS enumerates 64BIT bars using the following procedure.
> > > > 1. Write all FF's to lower half of 64bit BAR
> > > > 2. Write address back to lower half of 64bit BAR
> > > > 3. Write all FF's to higher half of 64bit BAR
> > > > 4. Write address back to higher half of 64bit BAR
> > > >
> > > > Linux code is here:
> > > > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
> > > >
> > > > What does it mean for qemu?
> > > >
> > > > At step 1. qemu pci_default_write_config() recevies all FFs for lower
> > > > part of the 64bit BAR. Then it applies the mask and converts the value
> > > > to "All FF's - size + 1" (FE000000 if size is 32MB).
> > > > Then pci_bar_address() checks if BAR address is valid. Since it is a
> > > > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> > > > updates topology and sends request to update mappings in KVM with new
> > > > range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> > > > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> > > > range, which is quite common.
> > >
> > > Do you know why does it panic? As far as I can see
> > > from code at
> > > http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
> > >
> > > 171 pci_read_config_dword(dev, pos, &l);
> > > 172 pci_write_config_dword(dev, pos, l | mask);
> > > 173 pci_read_config_dword(dev, pos, &sz);
> > > 174 pci_write_config_dword(dev, pos, l);
> > >
> > > BAR is restored: what triggers an access between lines 172 and 174?
> >
> > Random interrupt reading the time, likely.
>
> Weird, what the backtrace shows is init, unrelated
> to interrupts.
>

It's a bug then.  qemu doesn't undo the mapping correctly.

If you have clear instructions, I'll try to reproduce it.

-- 
error compiling committee.c: too many arguments to function
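
For reference, the arithmetic in the quoted step 1 can be checked with a rough sketch like the one below. It is not qemu code: bar_low_after_write() and bar_address() are hypothetical helpers written only for this illustration, the BAR flag bits 3:0 are ignored, and the BAR size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a qemu function: value held by the low BAR dword
 * after the guest writes `written` to it. Only the address bits at or above
 * the BAR size are writable; the low flag bits are ignored in this sketch. */
static uint32_t bar_low_after_write(uint32_t written, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* power of two assumed */
    uint32_t wmask = (uint32_t)~(size - 1);        /* 0xFE000000 for 32MB */
    return written & wmask;
}

/* Hypothetical helper: combine the two dwords into a 64-bit BAR address. */
static uint64_t bar_address(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

With size = 32MB (32u << 20) and the upper half still 0, writing 0xFFFFFFFF to the lower half leaves 0xFE000000 there, so bar_address() yields 0x00000000FE000000 and the transient range ends at 0xFE000000 + size - 1 = 0xFFFFFFFF, which matches the range described in the report.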