From: George Dunlap <george.dunlap@eu.citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Tim Deegan <tim@xen.org>, Yongjie Ren <yongjie.ren@intel.com>,
yanqiangjun@huawei.com, Keir Fraser <keir@xen.org>,
hanweidong@huawei.com, Xudong Hao <xudong.hao@intel.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
luonengjun@huawei.com, qemu-devel@nongnu.org,
wangzhenguo@huawei.com, xiaowei.yang@huawei.com,
arei.gonglei@huawei.com, Jan Beulich <JBeulich@suse.com>,
Paolo Bonzini <pbonzini@redhat.com>,
YongweiX Xu <yongweix.xu@intel.com>,
SongtaoX Liu <songtaox.liu@intel.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>
Subject: Re: [Qemu-devel] [Xen-devel] [BUG 1747]Guest could't find bootable device with memory more than 3600M
Date: Thu, 13 Jun 2013 16:40:51 +0100 [thread overview]
Message-ID: <51B9E803.5020806@eu.citrix.com> (raw)
In-Reply-To: <1371137767.6955.26.camel@zakaz.uk.xensource.com>
On 13/06/13 16:36, Ian Campbell wrote:
> On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:
>> On 13/06/13 16:16, Ian Campbell wrote:
>>> On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
>>>> On 13/06/13 14:44, Stefano Stabellini wrote:
>>>>> On Wed, 12 Jun 2013, George Dunlap wrote:
>>>>>> On 12/06/13 08:25, Jan Beulich wrote:
>>>>>>>>>> On 11.06.13 at 19:26, Stefano Stabellini
>>>>>>>>>> <stefano.stabellini@eu.citrix.com> wrote:
>>>>>>>> I went through the code that maps the PCI MMIO regions in hvmloader
>>>>>>>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it already
>>>>>>>> maps the PCI region to high memory if the PCI bar is 64-bit and the MMIO
>>>>>>>> region is larger than 512MB.
>>>>>>>>
>>>>>>>> Maybe we could just relax this condition and map the device memory to
>>>>>>>> high memory no matter the size of the MMIO region if the PCI bar is
>>>>>>>> 64-bit?
>>>>>>> I can only recommend not to: For one, guests not using PAE or
>>>>>>> PSE-36 can't map such space at all (and older OSes may not
>>>>>>> properly deal with 64-bit BARs at all). And then one would generally
>>>>>>> expect this allocation to be done top down (to minimize risk of
>>>>>>> running into RAM), and doing so is going to present further risks of
>>>>>>> incompatibilities with guest OSes (Linux for example learned only in
>>>>>>> 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
>>>>>>> 3.10-rc5 ioremap_pte_range(), while using "u64 pfn", passes the
>>>>>>> PFN to pfn_pte(), the respective parameter of which is
>>>>>>> "unsigned long", so the upper bits are silently truncated).
>>>>>>>
>>>>>>> I think this ought to be done in an iterative process - if all MMIO
>>>>>>> regions together don't fit below 4G, the biggest one should be
>>>>>>> moved up beyond 4G first, followed by the next biggest one
>>>>>>> etc.
>>>>>> First of all, the proposal to move the PCI BAR up to the 64-bit range is a
>>>>>> temporary work-around. It should only be done if a device doesn't fit in the
>>>>>> current MMIO range.
>>>>>>
>>>>>> We have four options here:
>>>>>> 1. Don't do anything
>>>>>> 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if they don't
>>>>>> fit
>>>>>> 3. Convince qemu to allow MMIO regions to mask memory (or what it thinks is
>>>>>> memory).
>>>>>> 4. Add a mechanism to tell qemu that memory is being relocated.
>>>>>>
>>>>>> Number 4 is definitely the right answer long-term, but we just don't have time
>>>>>> to do that before the 4.3 release. We're not sure yet if #3 is possible; even
>>>>>> if it is, it may have unpredictable knock-on effects.
>>>>>>
>>>>>> Doing #2, it is true that many guests will be unable to access the device
>>>>>> because of 32-bit limitations. However, in #1, *no* guests will be able to
>>>>>> access the device. At least in #2, *many* guests will be able to do so. In
>>>>>> any case, apparently #2 is what KVM does, so having the limitation on guests
>>>>>> is not without precedent. It's also likely to be a somewhat tested
>>>>>> configuration (unlike #3, for example).
>>>>> I would avoid #3, because I don't think it's a good idea to rely on
>>>>> that behaviour.
>>>>> I would also avoid #4, because having seen QEMU's code, it wouldn't be
>>>>> easy and certainly not doable in time for 4.3.
>>>>>
>>>>> So we are left to play with the PCI MMIO region size and location in
>>>>> hvmloader.
>>>>>
>>>>> I agree with Jan that we shouldn't unconditionally relocate all the
>>>>> devices to the region above 4G. I meant to say that we should relocate
>>>>> only the ones that don't fit. And we shouldn't try to dynamically
>>>>> increase the PCI hole below 4G because clearly that doesn't work.
>>>>> However we could still increase the size of the PCI hole below 4G by
>>>>> default, from starting at 0xf0000000 to starting at 0xe0000000.
>>>>> Why do we know that is safe? Because in the current configuration
>>>>> hvmloader *already* increases the PCI hole size by decreasing the start
>>>>> address every time a device doesn't fit.
>>>>> So it's already common for hvmloader to set pci_mem_start to
>>>>> 0xe0000000; you just need to assign a device with MMIO regions big
>>>>> enough to require it.
>>> Isn't this the exact case which is broken? And therefore not known safe
>>> at all?
>>>
>>>>> My proposed solution is:
>>>>>
>>>>> - set 0xe0000000 as the default PCI hole start for everybody, including
>>>>> qemu-xen-traditional
>>> What is the impact on existing qemu-trad guests?
>>>
>>> It does mean that guests which were installed with a bit less than 4GB
>>> RAM may now find a little bit of RAM moved above 4GB to make room for
>>> the bigger hole. If they can dynamically enable PAE that might be ok.
>>>
>>> Does this have any impact on Windows activation?
>>>
>>>>> - move above 4G everything that doesn't fit and support 64-bit bars
>>>>> - print an error if the device doesn't fit and doesn't support 64-bit
>>>>> bars
>>>> Also, as I understand it, at the moment:
>>>> 1. Some operating systems (32-bit XP) won't be able to use relocated devices
>>>> 2. Some devices (without 64-bit BARs) can't be relocated
>>>> 3. qemu-traditional is fine with a resized <4GiB MMIO hole.
>>>>
>>>> So if we have #1 or #2, at the moment an option for a work-around is to
>>>> use qemu-traditional.
>>>>
>>>> However, if we add your "print an error if the device doesn't fit", then
>>>> this option will go away -- this will be a regression in functionality
>>>> from 4.2.
>>> Only if printing an error also involves aborting. It could print an error
>>> (let's call it a warning) and continue, which would leave the workaround
>>> viable.
>> No, because if hvmloader doesn't increase the size of the MMIO hole,
>> then the device won't actually work. The guest will boot, but the OS
>> will not be able to use it.
> I meant continue as in increasing the hole too, although rereading the
> thread maybe that's not what everyone else was talking about ;-)
Well if you continue increasing the hole, then it works on
qemu-traditional but on qemu-xen you have weird crashes and guest hangs
at some point in the future when qemu tries to map a non-existent guest
memory address -- that's much worse than the device just not being
visible to the OS.
That's the point -- current behavior on qemu-xen causes weird hangs; but
the simple way of preventing those hangs (just not increasing the MMIO
hole size) removes functionality from both qemu-xen and
qemu-traditional, even though qemu-traditional doesn't have any problems
with the resized MMIO hole.
So there's no simple way to avoid random crashes while keeping the
work-around functional; that's why someone suggested adding a xenstore
key to tell hvmloader what to do.
At least, that's what I understood the situation to be -- someone
correct me if I'm wrong. :-)
-George