From: Alexey Korolev <alexey.korolev@endace.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: sfd@endace.com, Kevin O'Connor <kevin@koconnor.net>,
	Avi Kivity <avi@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
Date: Fri, 27 Jan 2012 17:40:09 +1300
Message-ID: <4F222AA9.1080107@endace.com>
In-Reply-To: <20120126143626.GE17198@redhat.com>

On 27/01/12 03:36, Michael S. Tsirkin wrote:
> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote:
>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote:
>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
>>>> Hi, 
>>>> In this post
>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I
>>>> mentioned the issues that arise when a 64-bit PCI BAR is present and a
>>>> 32-bit address range is selected for it.
>>>> The issue affects all recent qemu releases and all
>>>> old and recent guest Linux kernel versions.
>>>>
>>>> We've done some investigations. Let me explain what happens.
>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 -
>>>> 0xF2000000]
>>>>
>>>> When Linux guest starts it does PCI bus enumeration.
>>>> The OS enumerates 64BIT bars using the following procedure.
>>>> 1. Write all FF's to lower half of 64bit BAR
>>>> 2. Write address back to lower half of 64bit BAR
>>>> 3. Write all FF's to higher half of 64bit BAR
>>>> 4. Write address back to higher half of 64bit BAR
>>>>
>>>> Linux code is here: 
>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149
>>>>
>>>> What does it mean for qemu?
>>>>
>>>> At step 1. qemu pci_default_write_config() receives all FFs for lower
>>>> part of the 64bit BAR. Then it applies the mask and converts the value
>>>> to "All FF's - size + 1" (FE000000 if size is 32MB).
>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a
>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
>>>> updates topology and sends request to update mappings in KVM with new
>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
>>>> range, which is quite common.
>>> Do you know why it panics? As far as I can see
>>> from code at
>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>>>
>>>  171        pci_read_config_dword(dev, pos, &l);
>>>  172        pci_write_config_dword(dev, pos, l | mask);
>>>  173        pci_read_config_dword(dev, pos, &sz);
>>>  174        pci_write_config_dword(dev, pos, l);
>>>
>>> BAR is restored: what triggers an access between lines 172 and 174?
>> Random interrupt reading the time, likely.
> Weird, what the backtrace shows is init, unrelated
> to interrupts.
Yes, it fails during the ordered hpet_late_init() call, which is part of the
kernel's fs_initcall list, so no timer interrupts are involved here.
Basically, once the region is programmed (even temporarily), the area behind it is lost.
I mean that if we overlap the HPET region with our BAR, which is backed by host user
space memory, even temporarily, and commit a mapping request to KVM, the information
about the old mappings belonging to the HPET is lost.
This holds even if we did it only for a short period of time and later restored the
original address.
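
To make the overlap concrete, here is a minimal sketch of the arithmetic only.
It is not QEMU code. The helper names are hypothetical, and the HPET base
(0xFED00000, the conventional x86 location) and its 1KB register block size
are assumptions not stated in this thread. BAR type bits in the low nibble are
ignored for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed example values; only the 32MB BAR size comes from the thread. */
#define BAR_SIZE   0x02000000u  /* 32MB BAR from the example */
#define HPET_BASE  0xFED00000u  /* assumed: conventional x86 HPET MMIO base */
#define HPET_SIZE  0x400u       /* assumed: 1KB HPET register block */

/* Value held by the lower BAR dword after the guest writes all FFs.
 * The bits below the BAR size are hardwired to zero, so only the
 * address bits stick: "All FF's - size + 1". */
static uint32_t bar_after_sizing_write(uint32_t size)
{
    return 0xFFFFFFFFu & ~(size - 1u);
}

/* True if [a, a + alen) and [b, b + blen) intersect. */
static bool ranges_overlap(uint64_t a, uint64_t alen,
                           uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Hypothetical helper: would the transient BAR placement cover the HPET?
 * upper_dword is the current content of the upper half of the 64-bit BAR. */
static bool sizing_write_covers_hpet(uint32_t size, uint32_t upper_dword)
{
    uint64_t addr = ((uint64_t)upper_dword << 32)
                    | bar_after_sizing_write(size);
    return ranges_overlap(addr, size, HPET_BASE, HPET_SIZE);
}
```

With the upper half still zero, sizing_write_covers_hpet(BAR_SIZE, 0) is true:
the BAR transiently decodes at 0xFE000000 and spans up to the 4GB boundary,
which contains the assumed HPET range. With a non-zero upper half the
transient address lies above 4GB and the overlap does not occur.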

>>> Also, what you describe happens on a 32 bit BAR in the same way, no?
>> So it seems.  Btw, is this procedure correct for sizing a BAR which is
>> larger than 4GB?
> There's more code sizing 64 bit BARs, but generally
> software is allowed to write any junk into enabled BARs
> as long as there aren't any memory accesses.
