From: Gordan Bobic
Subject: Re: HVM support for e820_host (Was: Bug: Limitation of <=2GB RAM in domU persists with 4.3.0)
Date: Tue, 10 Sep 2013 16:04:47 +0100
To: Konrad Rzeszutek Wilk
Cc: Stefano Stabellini, xen-devel@lists.xen.org
In-Reply-To: <20130910133559.GA5667@phenom.dumpdata.com>
List-Id: xen-devel@lists.xenproject.org

On Tue, 10 Sep 2013 09:35:59 -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 06, 2013 at 08:54:24PM +0100, Gordan Bobic wrote:
>> Here is a test patch I applied to:
>> /tools/firmware/hvmloader/e820.c
>>
>> ===
>> --- e820.c.orig 2013-09-06 11:15:20.023337321 +0100
>> +++ e820.c 2013-09-06 19:53:00.141876019 +0100
>> @@ -79,6 +79,7 @@
>>      unsigned int nr = 0;
>>      struct xen_memory_map op;
>>      struct e820entry map[E820MAX];
>> +    int e820_host = 0;
>>      int rc;
>>
>>      if ( !lowmem_reserved_base )
>> @@ -88,6 +89,7 @@
>>
>>      rc = hypercall_memory_op ( XENMEM_memory_map, &op);
>>      if ( rc != -ENOSYS) { /* It works!? */
>> +        e820_host = 1;
>>          printf("%s:%d got %d op.nr_entries \n", __func__, __LINE__, op.nr_entries);
>>          dump_e820_table(&map[0], op.nr_entries);
>>      }
>> @@ -133,7 +135,12 @@
>>      /* Low RAM goes here. Reserve space for special pages. */
>>      BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
>>      e820[nr].addr = 0x100000;
>> -    e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
>> +
>> +    if (e820_host)
>> +        e820[nr].size = 0x3f7e0000 - e820[nr].addr;
>> +    else
>> +        e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
>> +
>>      e820[nr].type = E820_RAM;
>>      nr++;
>>
>> ===
>>
>> I'm sure this doesn't need explicitly pointing out, but for the
>> record, it is a gross hack just to prove the concept.
>>
>> The map dump with this patch applied and memory set to 8192 is:
>>
>> ===
>> (XEN) HVM5: BIOS map:
>> (XEN) HVM5: f0000-fffff: Main BIOS
>> (XEN) HVM5: build_e820_table:93 got 8 op.nr_entries
>> (XEN) HVM5: E820 table:
>> (XEN) HVM5: [00]: 00000000:00000000 - 00000000:3f790000: RAM
>> (XEN) HVM5: [01]: 00000000:3f790000 - 00000000:3f79e000: ACPI
>> (XEN) HVM5: [02]: 00000000:3f79e000 - 00000000:3f7d0000: NVS
>> (XEN) HVM5: [03]: 00000000:3f7d0000 - 00000000:3f7e0000: RESERVED
>> (XEN) HVM5: HOLE: 00000000:3f7e0000 - 00000000:3f7e7000
>> (XEN) HVM5: [04]: 00000000:3f7e7000 - 00000000:40000000: RESERVED
>> (XEN) HVM5: HOLE: 00000000:40000000 - 00000000:fee00000
>> (XEN) HVM5: [05]: 00000000:fee00000 - 00000000:fee01000: RESERVED
>> (XEN) HVM5: HOLE: 00000000:fee01000 - 00000000:ffc00000
>> (XEN) HVM5: [06]: 00000000:ffc00000 - 00000001:00000000: RESERVED
>> (XEN) HVM5: [07]: 00000001:00000000 - 00000002:c0870000: RAM
>> (XEN) HVM5: E820 table:
>> (XEN) HVM5: [00]: 00000000:00000000 - 00000000:0009e000: RAM
>> (XEN) HVM5: [01]: 00000000:0009e000 - 00000000:000a0000: RESERVED
>> (XEN) HVM5: HOLE: 00000000:000a0000 - 00000000:000e0000
>> (XEN) HVM5: [02]: 00000000:000e0000 - 00000000:00100000: RESERVED
>> (XEN) HVM5: [03]: 00000000:00100000 - 00000000:3f7e0000: RAM
>> (XEN) HVM5: HOLE: 00000000:3f7e0000 - 00000000:fc000000
>> (XEN) HVM5: [04]: 00000000:fc000000 - 00000001:00000000: RESERVED
>> (XEN) HVM5: [05]: 00000001:00000000 - 00000002:1f800000: RAM
>> (XEN) HVM5: Invoking ROMBIOS ...
>> ===
>>
>> Good observations:
>> It works! No crashes, no screen corruption! As an added bonus, it
>> fixes the problem of rebooting domUs causing them to lose GPU access
>> and eventually crash the host even with memory allocation below the
>> first PCI MMIO block. I am suspecting that something in the
>> 0x3f7e0000-0x3f7e7000 hole that isn't showing up on lspci might be
>> responsible.
>>
>> I think that proves beyond any doubt what the problem was before.
>>
>> Interesting observations:
>> 1) GPU PCI MMIO is still mapped at E0000000, rather than at the
>> bottom of the memory hole. That implies that SeaBIOS (or whatever
>> does the mapping) makes assumptions about where the memory hole
>> begins. This will need to somehow be fixed / made dynamic. What
>> decides where to map PCI memory for each device?
>>
>> 2) The memory hole size difference counts toward the total guest
>> memory. I set
>> memory=8192
>> maxmem=8192
>> but Windows in domU only sees 5.48GB. What is particularly odd is
>> that the missing memory isn't 3GB, but 2.5GB - which implies
>> that, again, there are other things making assumptions about the
>> size and shape of the memory hole and moving the memory from the
>> hole elsewhere to make it usable. What does this?
>>
>> My todo list, in order of priority (unless somebody here has a
>> better idea) is:
>> 1) Tidy up the hole enlargement to make it dynamically based on the
>> host hole locations. In cases where the host hole overlaps something
>> other than guest RAM/HOLE (i.e. RESERVED), guest spec wins.
>
> guest spec is .. the default hvmloader behavior?

Yes, that's exactly what I meant. At least until I can figure out
what necessitates the default HVM behaviour.

>> 2) Fix whatever is causing the hole memory increase to reduce the
>> guest memory. The memory hole is a hole, not a shadow. I need some
>> pointers on where to look for whatever is responsible for this.
>
> That is where git log tools/hvmloader/firmware might shed some light.

I grepped for low_mem_pgend and high_mem_pgend, and the only place
I found them set is one spot in libxc. Is this what sets them? Is
this common to xm and xl?

>> 3) Fix what makes decisions on where to map devices' memory
>> apertures. Ideally, the fix should be to detect the host's pBAR and
>> make vBAR=pBAR. Again, I need some pointers on where to look for
>> whatever is responsible for doing this mapping.
>
> That should be all in tools/hvmloader/firmware I believe.
> 'pci_setup' function, where it says:
> /* Assign iomem and ioport resources in descending order of size. */

Thanks, will take a closer look there.

Gordan
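
For reference, here is a rough sketch of the arithmetic behind
interesting observation 2 and todo item 1, using the two tables from
the dump above. The struct, enum and function names are made up for
illustration and are not the hvmloader definitions (hvmloader uses
struct e820entry with addr/size/type). It only assumes that entries
are sorted by address, as they are in the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not hvmloader's own. */
enum region_type { REGION_RAM, REGION_RESERVED, REGION_ACPI, REGION_NVS };

struct region {
    uint64_t start;          /* inclusive */
    uint64_t end;            /* exclusive */
    enum region_type type;
};

/* Host map, copied from the first "E820 table" in the dump. */
static const struct region host_map[] = {
    { 0x000000000ULL, 0x03f790000ULL, REGION_RAM      },
    { 0x03f790000ULL, 0x03f79e000ULL, REGION_ACPI     },
    { 0x03f79e000ULL, 0x03f7d0000ULL, REGION_NVS      },
    { 0x03f7d0000ULL, 0x03f7e0000ULL, REGION_RESERVED },
    { 0x03f7e7000ULL, 0x040000000ULL, REGION_RESERVED },
    { 0x0fee00000ULL, 0x0fee01000ULL, REGION_RESERVED },
    { 0x0ffc00000ULL, 0x100000000ULL, REGION_RESERVED },
    { 0x100000000ULL, 0x2c0870000ULL, REGION_RAM      },
};

/* Guest map, copied from the second "E820 table" in the dump. */
static const struct region guest_map[] = {
    { 0x000000000ULL, 0x00009e000ULL, REGION_RAM      },
    { 0x00009e000ULL, 0x0000a0000ULL, REGION_RESERVED },
    { 0x0000e0000ULL, 0x000100000ULL, REGION_RESERVED },
    { 0x000100000ULL, 0x03f7e0000ULL, REGION_RAM      },
    { 0x0fc000000ULL, 0x100000000ULL, REGION_RESERVED },
    { 0x100000000ULL, 0x21f800000ULL, REGION_RAM      },
};

/* Sum the sizes of all RAM entries in a map. */
uint64_t total_ram_bytes(const struct region *map, size_t nr)
{
    uint64_t total = 0;
    for (size_t i = 0; i < nr; i++)
        if (map[i].type == REGION_RAM)
            total += map[i].end - map[i].start;
    return total;
}

/*
 * Return the start of the first gap between consecutive entries,
 * or 0 if the map has no gap. Assumes entries are sorted by start.
 */
uint64_t first_hole_start(const struct region *map, size_t nr)
{
    for (size_t i = 1; i < nr; i++)
        if (map[i].start > map[i - 1].end)
            return map[i - 1].end;
    return 0;
}
```

Calling total_ram_bytes() on guest_map gives 0x15ef7e000 bytes, about
5.48 GiB, which matches what Windows reports. Against the configured
8192 MiB that leaves roughly 2.5 GiB unaccounted for, so the RAM cut
off by the shortened low region is dropped from the map rather than
moved above 4GB. Calling first_hole_start() on host_map gives
0x3f7e0000, the value hard-coded in the test patch, so something
along these lines could replace the constant.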