From: George Dunlap
Date: Thu, 13 Jun 2013 16:40:51 +0100
Subject: Re: [Qemu-devel] [Xen-devel] [BUG 1747] Guest couldn't find bootable device with memory more than 3600M
To: Ian Campbell
Cc: Tim Deegan, Yongjie Ren, yanqiangjun@huawei.com, Keir Fraser,
 hanweidong@huawei.com, Xudong Hao, Stefano Stabellini,
 luonengjun@huawei.com, qemu-devel@nongnu.org, wangzhenguo@huawei.com,
 xiaowei.yang@huawei.com, arei.gonglei@huawei.com, Jan Beulich,
 Paolo Bonzini, YongweiX Xu, SongtaoX Liu, xen-devel@lists.xensource.com
Message-ID: <51B9E803.5020806@eu.citrix.com>
In-Reply-To: <1371137767.6955.26.camel@zakaz.uk.xensource.com>

On 13/06/13 16:36, Ian Campbell wrote:
> On Thu, 2013-06-13 at 16:30 +0100, George Dunlap wrote:
>> On 13/06/13 16:16, Ian Campbell wrote:
>>> On Thu, 2013-06-13 at 14:54 +0100, George Dunlap wrote:
>>>> On 13/06/13 14:44, Stefano Stabellini wrote:
>>>>> On Wed, 12 Jun 2013, George Dunlap wrote:
>>>>>> On 12/06/13 08:25, Jan Beulich wrote:
>>>>>>>>>> On 11.06.13 at 19:26, Stefano Stabellini wrote:
>>>>>>>> I went through the code that maps the PCI MMIO regions in hvmloader
>>>>>>>> (tools/firmware/hvmloader/pci.c:pci_setup) and it looks like it
>>>>>>>> already maps the PCI region to high memory if the PCI BAR is 64-bit
>>>>>>>> and the MMIO region is larger than 512MB.
>>>>>>>>
>>>>>>>> Maybe we could just relax this condition and map the device memory
>>>>>>>> to high memory no matter the size of the MMIO region, if the PCI
>>>>>>>> BAR is 64-bit?
>>>>>>> I can only recommend not to: For one, guests not using PAE or
>>>>>>> PSE-36 can't map such space at all (and older OSes may not
>>>>>>> properly deal with 64-bit BARs at all). And then one would generally
>>>>>>> expect this allocation to be done top down (to minimize the risk of
>>>>>>> running into RAM), and doing so is going to present further risks of
>>>>>>> incompatibilities with guest OSes (Linux for example learned only in
>>>>>>> 2.6.36 that PFNs in ioremap() can exceed 32 bits, but even in
>>>>>>> 3.10-rc5 ioremap_pte_range(), while using "u64 pfn", passes the
>>>>>>> PFN to pfn_pte(), the respective parameter of which is
>>>>>>> "unsigned long").
>>>>>>>
>>>>>>> I think this ought to be done in an iterative process - if all MMIO
>>>>>>> regions together don't fit below 4G, the biggest one should be
>>>>>>> moved up beyond 4G first, followed by the next-to-biggest one,
>>>>>>> etc.
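To make the iterative process concrete, here is a minimal sketch of the
sort of loop being described -- the struct and the place_above_4g()
helper are invented for the example, not the actual hvmloader code:

    #include <stdint.h>

    struct bar {
        uint64_t size;      /* BAR size in bytes */
        int is_64bit;       /* set if the BAR decodes 64-bit addresses */
    };

    /* Hypothetical allocator that places one BAR above 4G. */
    static void place_above_4g(struct bar *b);

    /* Assumes bars[] is sorted by decreasing size.  Move the biggest
     * 64-bit-capable BARs above 4G one at a time, until whatever is
     * left fits in the 32-bit MMIO hole. */
    static void relocate_until_fit(struct bar *bars, int nr,
                                   uint64_t hole_size)
    {
        uint64_t total = 0;

        for (int i = 0; i < nr; i++)
            total += bars[i].size;

        for (int i = 0; i < nr && total > hole_size; i++) {
            if (!bars[i].is_64bit)
                continue;             /* 32-bit BARs must stay below 4G */
            place_above_4g(&bars[i]);
            total -= bars[i].size;
        }
    }

The point is that relocation is per-BAR and stops as soon as the
remainder fits, rather than moving everything above 4G unconditionally.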
>>>>>> First of all, the proposal to move the PCI BAR up to the 64-bit
>>>>>> range is a temporary work-around. It should only be done if a device
>>>>>> doesn't fit in the current MMIO range.
>>>>>>
>>>>>> We have four options here:
>>>>>> 1. Don't do anything
>>>>>> 2. Have hvmloader move PCI devices up to the 64-bit MMIO hole if
>>>>>> they don't fit
>>>>>> 3. Convince qemu to allow MMIO regions to mask memory (or what it
>>>>>> thinks is memory).
>>>>>> 4. Add a mechanism to tell qemu that memory is being relocated.
>>>>>>
>>>>>> Number 4 is definitely the right answer long-term, but we just don't
>>>>>> have time to do that before the 4.3 release. We're not sure yet if #3
>>>>>> is possible; even if it is, it may have unpredictable knock-on
>>>>>> effects.
>>>>>>
>>>>>> Doing #2, it is true that many guests will be unable to access the
>>>>>> device because of 32-bit limitations. However, in #1, *no* guests
>>>>>> will be able to access the device. At least in #2, *many* guests
>>>>>> will be able to do so. In any case, apparently #2 is what KVM does,
>>>>>> so having the limitation on guests is not without precedent. It's
>>>>>> also likely to be a somewhat tested configuration (unlike #3, for
>>>>>> example).
>>>>> I would avoid #3, because I don't think it is a good idea to rely on
>>>>> that behaviour.
>>>>> I would also avoid #4, because having seen QEMU's code, it wouldn't
>>>>> be easy and certainly not doable in time for 4.3.
>>>>>
>>>>> So we are left to play with the PCI MMIO region size and location in
>>>>> hvmloader.
>>>>>
>>>>> I agree with Jan that we shouldn't unconditionally relocate all the
>>>>> devices to the region above 4G. I meant to say that we should
>>>>> relocate only the ones that don't fit. And we shouldn't try to
>>>>> dynamically increase the PCI hole below 4G, because clearly that
>>>>> doesn't work. However, we could still increase the default size of
>>>>> the PCI hole below 4G, from starting at 0xf0000000 to starting at
>>>>> 0xe0000000.
>>>>> Why do we know that is safe? Because in the current configuration
>>>>> hvmloader *already* increases the PCI hole size by decreasing the
>>>>> start address every time a device doesn't fit.
>>>>> So it's already common for hvmloader to set pci_mem_start to
>>>>> 0xe0000000; you just need to assign a device that needs a big enough
>>>>> PCI hole.
>>> Isn't this the exact case which is broken? And therefore not known safe
>>> at all?
>>>
>>>>> My proposed solution is:
>>>>>
>>>>> - set 0xe0000000 as the default PCI hole start for everybody,
>>>>> including qemu-xen-traditional
>>> What is the impact on existing qemu-trad guests?
>>>
>>> It does mean that guests which were installed with a bit less than 4GB
>>> RAM may now find a little bit of RAM moves above 4GB to make room for
>>> the bigger hole. If they can dynamically enable PAE that might be OK.
>>>
>>> Does this have any impact on Windows activation?
>>>
>>>>> - move above 4G everything that doesn't fit and supports 64-bit BARs
>>>>> - print an error if the device doesn't fit and doesn't support
>>>>> 64-bit BARs
>>>> Also, as I understand it, at the moment:
>>>> 1. Some operating systems (32-bit XP) won't be able to use relocated
>>>> devices
>>>> 2. Some devices (without 64-bit BARs) can't be relocated
>>>> 3. qemu-traditional is fine with a resized <4GiB MMIO hole.
>>>>
>>>> So if we have #1 or #2, at the moment an option for a work-around is
>>>> to use qemu-traditional.
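As a side note on point #2: whether a device can be relocated at all
comes down to the type bits in its BAR. Roughly, with pci_readl()
standing in for whatever config-space accessor the firmware uses:

    /* Bit 0 distinguishes I/O (1) from memory (0) BARs; bits 2:1 of a
     * memory BAR encode its type, and 10b means the BAR is 64-bit and
     * so may be placed above 4G. */
    static int bar_is_64bit(uint32_t devfn, uint32_t reg)
    {
        uint32_t bar = pci_readl(devfn, reg);

        return !(bar & 1) && ((bar & 0x6) == 0x4);
    }

Anything that fails this test is stuck below 4G no matter what hvmloader
does.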
>>>>
>>>> However, if we add your "print an error if the device doesn't fit",
>>>> then this option will go away -- this will be a regression in
>>>> functionality from 4.2.
>>> Only if printing an error also involves aborting. It could print an
>>> error (let's call it a warning) and continue, which would leave the
>>> work-around viable.
>> No, because if hvmloader doesn't increase the size of the MMIO hole,
>> then the device won't actually work. The guest will boot, but the OS
>> will not be able to use it.
> I meant continue as in increasing the hole too, although rereading the
> thread maybe that's not what everyone else was talking about ;-)

Well, if you continue increasing the hole, then it works on
qemu-traditional, but on qemu-xen you get weird crashes and guest hangs
at some point in the future, when qemu tries to map a non-existent guest
memory address -- that's much worse than the device just not being
visible to the OS.

That's the point -- the current behavior on qemu-xen causes weird hangs;
but the simple way of preventing those hangs (just not increasing the
MMIO hole size) removes functionality from both qemu-xen and
qemu-traditional, even though qemu-traditional doesn't have any problems
with the resized MMIO hole.

So there's no simple way to avoid random crashes while keeping the
work-around functional; that's why someone suggested adding a xenstore
key to tell hvmloader what to do.

At least, that's what I understood the situation to be -- someone
correct me if I'm wrong. :-)

 -George
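P.S. On the hvmloader side, the xenstore-key idea would look something
like the sketch below. The key name and the xenstore_read() call are
illustrative only, not a settled interface:

    /* Sketch: let the toolstack tell hvmloader whether relocating
     * guest RAM to enlarge the MMIO hole is safe for this device
     * model (qemu-traditional: yes; qemu-xen: no). */
    const char *s = xenstore_read("hvmloader/allow-memory-relocate");
    int allow_relocate = (s == NULL) || strtol(s, NULL, 0);

    if (!allow_relocate && mmio_total > hole_size)
        printf("Warning: MMIO hole too small, but memory relocation "
               "is disabled for this device model\n");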