From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)
Date: Fri, 6 Sep 2013 09:20:50 -0400
Message-ID: <20130906132050.GG2590@phenom.dumpdata.com>
References: <52264826.3010402@bobich.net> <20130903210833.GB13777@phenom.dumpdata.com> <20130905020442.GA2459@phenom.dumpdata.com> <5228F3E2.8090905@bobich.net> <52290986.3090601@bobich.net> <184bac5f-7bbc-46c9-b943-40f15534a50c@email.android.com> <08e0b42b96dcd460512302a8df3da7f8@mail.shatteredsilicon.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <08e0b42b96dcd460512302a8df3da7f8@mail.shatteredsilicon.net>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Gordan Bobic
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Fri, Sep 06, 2013 at 01:23:19PM +0100, Gordan Bobic wrote:
> On Thu, 05 Sep 2013 19:01:03 -0400, Konrad Rzeszutek Wilk wrote:
> >Gordan Bobic wrote:
> >>On 09/05/2013 11:23 PM, Konrad Rzeszutek Wilk wrote:
> >>>Gordan Bobic wrote:
> >>>>Right, finally got around to trying this with the latest patch.
> >>>>
> >>>>With e820_host=0 things work as before:
> >>>>
> >>>>(XEN) HVM3: BIOS map:
> >>>>(XEN) HVM3: f0000-fffff: Main BIOS
> >>>>(XEN) HVM3: E820 table:
> >>>>(XEN) HVM3: [00]: 00000000:00000000 - 00000000:0009e000: RAM
> >>>>(XEN) HVM3: [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
> >>>>(XEN) HVM3: HOLE: 00000000:000a0000 - 00000000:000e0000
> >>>>(XEN) HVM3: [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
> >>>>(XEN) HVM3: [03]: 00000000:00100000 - 00000000:e0000000: RAM
> >>>>(XEN) HVM3: HOLE: 00000000:e0000000 - 00000000:fc000000
> >>>>(XEN) HVM3: [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
> >>>>(XEN) HVM3: [05]: 00000001:00000000 - 00000002:1f800000: RAM
> >>>>
> >>>>
> >>>>I seem to be getting two different E820 table dumps with e820_host=1:
> >>>>
> >>>>(XEN) HVM1: BIOS map:
> >>>>(XEN) HVM1: f0000-fffff: Main BIOS
> >>>>(XEN) HVM1: build_e820_table:91 got 8 op.nr_entries
> >>>>(XEN) HVM1: E820 table:
> >>>>(XEN) HVM1: [00]: 00000000:00000000 - 00000000:3f790000: RAM
> >>>>(XEN) HVM1: [01]: 00000000:3f790000 - 00000000:3f79e000: ACPI
> >>>>(XEN) HVM1: [02]: 00000000:3f79e000 - 00000000:3f7d0000: NVS
> >>>>(XEN) HVM1: [03]: 00000000:3f7d0000 - 00000000:3f7e0000: RESERVED
> >>>>(XEN) HVM1: HOLE: 00000000:3f7e0000 - 00000000:3f7e7000
> >>>>(XEN) HVM1: [04]: 00000000:3f7e7000 - 00000000:40000000: RESERVED
> >>>>(XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fee00000
> >>>>(XEN) HVM1: [05]: 00000000:fee00000 - 00000000:fee01000: RESERVED
> >>>>(XEN) HVM1: HOLE: 00000000:fee01000 - 00000000:ffc00000
> >>>>(XEN) HVM1: [06]: 00000000:ffc00000 - 00000001:00000000: RESERVED
> >>>>(XEN) HVM1: [07]: 00000001:00000000 - 00000001:68870000: RAM
> >>>>(XEN) HVM1: E820 table:
> >>>>(XEN) HVM1: [00]: 00000000:00000000 - 00000000:0009e000: RAM
> >>>>(XEN) HVM1: [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
> >>>>(XEN) HVM1: HOLE: 00000000:000a0000 - 00000000:000e0000
> >>>>(XEN) HVM1: [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
> >>>>(XEN) HVM1: [03]: 00000000:00100000 - 00000000:a7800000: RAM
> >>>>(XEN) HVM1: HOLE: 00000000:a7800000 - 00000000:fc000000
> >>>>(XEN) HVM1: [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
> >>>>(XEN) HVM1: Invoking ROMBIOS ...
> >>>>
> >>>>I cannot quite figure out what is going on here - these tables can't
> >>>>both be true.
> >>>>
> >>>
> >>>Right. The code just prints the E820 that was constructed b/c of the
> >>>e820_host=1 parameter as the first output. Then the second one is
> >>>what was constructed originally.
> >>>
> >>>The code that would tie in the E820 from the hypercall and then alter
> >>>how the hvmloader sets it up is not yet done.
> >>>
> >>>
> >>>>Looking at the IOMEM on the host, the IOMEM begins at 0xa8000000 and
> >>>>goes more or less contiguously up to 0xfec8b000.
> >>>>
> >>>>Looking at dmesg on domU, the e820 map more or less matches the second
> >>>>dump above.
> >>>
> >>>Right. That is correct since the patch I sent just outputs stuff.
> >>>No real changes to the E820 yet.
> >>
> >>I thought this did that in hvmloader/e820.c:
> >>hypercall_memory_op ( XENMEM_memory_map, &op);
> >>
> >>Gordan
> >
> >No. That just gets the E820 that is stashed in the hypervisor for
> >the guest. The PV guest would use it but hvmloader does not. This is
> >what would need to be implemented to allow hvmloader to construct the
> >E820 on its own.
>
> Right. So in hvmloader/e820.c we now have the host based map in
> struct e820entry map[E820MAX];
>
> The rest of the function then goes and constructs the standard HVM
> e820 map in the passed in
> struct e820entry *e820
>
> So all that needs to happen here is if e820_host is set, fill e820[]
> by copying map[] up to the hvm_info->low_mem_pgend
> (or hvm_info->high_mem_pgend if it is set). I am guessing that

Right. And then the overflow would be put past 4GB. Or fill in the
E820_RAM regions with it.
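
A rough sketch of that copy step, for illustration only and not taken from
hvmloader: it walks a sorted host map, fills the host's RAM ranges with
guest memory, copies non-RAM entries unchanged, and puts any overflow in
one RAM region past 4GB. The struct layout and the build_guest_e820() name
are assumptions made for this example; the real struct e820entry and the
hvm_info fields may differ, and overlaps/deduping are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define GB4           (1ULL << 32)

/* Hypothetical stand-in for hvmloader's struct e820entry. */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Build a guest map from a sorted host map, handing out ram_bytes of RAM.
 * Returns the number of entries written, or 0 if out[] is too small.
 */
size_t build_guest_e820(const struct e820entry *host, size_t nr_host,
                        uint64_t ram_bytes,
                        struct e820entry *out, size_t max_out)
{
    size_t n = 0;
    uint64_t left = ram_bytes;   /* guest RAM not yet placed */
    uint64_t top = GB4;          /* where overflow RAM would start */

    assert(host != NULL && out != NULL);

    for (size_t i = 0; i < nr_host; i++) {
        struct e820entry e = host[i];

        if (e.type == E820_RAM) {
            if (left == 0)
                continue;        /* no guest RAM left: range becomes a hole */
            if (e.size > left)
                e.size = left;   /* trim the last RAM range that is used */
            left -= e.size;
        }
        if (n == max_out)
            return 0;
        out[n++] = e;
        if (e.addr + e.size > top)
            top = e.addr + e.size;
    }

    if (left != 0) {             /* overflow goes past 4GB */
        if (n == max_out)
            return 0;
        out[n].addr = top;
        out[n].size = left;
        out[n].type = E820_RAM;
        n++;
    }
    return n;
}
```

For example, a host map with 1GB of low RAM plus one reserved page at
0xfee00000 and a 3GB guest would yield the 1GB low RAM entry, the reserved
page, and a 2GB RAM entry starting at 4GB.
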
> SeaBIOS and other existing stuff might break if the host map is
> just copied in verbatim, so presumably I need to add/dedupe the
> non-RAM parts of the maps.

Probably. Or tweak SeaBIOS to use your E820. Also you need to figure out
where hvmloader constructs the ACPI and SMBIOS tables and make sure they
are within the E820_RESERVED regions.

>
> Is that right? Nothing else needs to happen?

HA! You are going to hit some bugs probably :-)

>
> The following questions arise:
>
> 1) What to do in case of overlaps? On my specific hardware,
> the key difference in the end map will be that the hole at:
> (XEN) HVM1: HOLE: 00000000:40000000 - 00000000:fee00000
> will end up being created in domU.

The hole is also known as the PCI gap or MMIO region. With e820_host
in effect you should use the host's layout and its hole placement.
That will replicate it and make domU's E820 hole look like the host's.

>
> 2) Do only the holes need to be pulled from the host or
> the entire map? Would hvmloader/seabios/whatever know
> what to do if passed a map that is different from what
> they might expect (i.e. different from what the current
> hvmloader provides)? Or would this be likely to cause
> extensive further breakages?

I think there are some assumptions made about where the hole starts.
Those would have to be made more dynamic to deal with a different E820
layout.

>
> 3) At the moment I am leaning toward just pulling in the
> holes from the host e820, mirroring them in domU.
> 3.1) Marking them as "reserved" would likely fix the
> problem that was my primary motivation for doing this
> in the first place. Having said that - with all of

That unfortunately will make them neither gaps nor MMIO regions.
Meaning the kernel will scream: "You have a BAR in E820_reserved region!
That is bad!", and won't set up the card.

The hole needs to be replicated in the guest.

> the 1GB-3GB space marked as reserved, I'm not sure where
> the IOMEM would end up mapped in domU - things might just
> break. If marking the dom0 hole as a hole in domU without
> ensuring pBAR=vBAR, the PCI device in domU might get
> mapped where another device is in dom0, which might
> cause the same problem.

Right. hvmloader could (I hadn't checked the code) scan the E820,
determine that the PCI BARs are within the E820_RESERVED regions, and try
to move them to a hole. Since no hole would be found below 4GB it would
remap the PCI BAR above 4GB. That - depending on the device - could be
disastrous for the device. That is, if it is only capable of 32-bit
DMA it will never do anything.

>
> At the moment, I think the expedient thing to do is make
> domU map holes as per dom0 and ignore other non-RAM
> areas. This may (by luck) or may not fix my immediate problem
> (RAM in domU clobbering host's mapped IOMEM), but at
> least it would cover the pre-requisite hole mapping for
> the next step which is vBAR=pBAR.
>
> In light of this, however, depending on the answer to 2)
> above, it may not be practical for e820_host option to do

I think it will mean you need to look in the hvmloader directory a bit
more and find all of the assumptions it makes about memory locations.
One excellent tool is to do 'git log -p tools/firmware/hvmloader' as it
will tell you what changes have been done to address the memory layout
construction.

> what it actually means for HVMs, at least not to the same
> extent as happens for PV. It would only do a part of it
> (initial vHOLE=pHOLE, to later be extended to the more
> specific case of vBAR=pBAR).
>
> Does this sound reasonable?

Yes. I think the plan you outlined is sound. The difficulty is going to
be cramming the E820 constructed by e820_host into hvmloader and making
sure that all the other parts of it (SMBIOS, ACPI, BIOS) will be more
dynamic and use dynamic locations instead of hard-coded values.

Loads of printks can help with that :-)

The awesome thing is that it will make hvmloader a lot more flexible.
And one can extend e820_host to construct an E820 that is bizarre, for
testing even more absurd memory layouts (say, no RAM below 4GB).

Keep on digging! Thanks for the great analysis.

>
> Gordan
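
To make the hole discussion above concrete, here is a small sketch (an
illustration, not hvmloader code) that finds the largest gap below 4GB in
a sorted, non-overlapping E820 map and checks whether a BAR fits inside
it. The struct layout and the names largest_low_hole() and bar_in_hole()
are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for hvmloader's struct e820entry. */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

struct hole {
    uint64_t start;
    uint64_t end;    /* exclusive; start == end means no hole found */
};

/* Largest gap below 4GB between the entries of a sorted map. */
struct hole largest_low_hole(const struct e820entry *map, size_t nr)
{
    const uint64_t limit = 1ULL << 32;
    struct hole best = { 0, 0 };
    uint64_t cursor = 0;             /* end of the area covered so far */

    for (size_t i = 0; i < nr && cursor < limit; i++) {
        uint64_t start = map[i].addr < limit ? map[i].addr : limit;
        uint64_t end = map[i].addr + map[i].size;

        if (start > cursor && start - cursor > best.end - best.start) {
            best.start = cursor;
            best.end = start;
        }
        if (end > cursor)
            cursor = end;
    }
    /* Anything left between the last entry and 4GB is also a gap. */
    if (cursor < limit && limit - cursor > best.end - best.start) {
        best.start = cursor;
        best.end = limit;
    }
    return best;
}

/* Non-zero if [bar, bar + len) lies entirely inside the hole. */
int bar_in_hole(struct hole h, uint64_t bar, uint64_t len)
{
    return len != 0 && bar + len > bar &&
           bar >= h.start && bar + len <= h.end;
}
```

Fed the host map quoted earlier in the thread, this picks
40000000-fee00000, which contains the host IOMEM start at 0xa8000000. Fed
the e820_host=0 guest map, it picks e0000000-fc000000, and 0xa8000000
falls in guest RAM instead, which is the clobbering problem being
discussed.
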