From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37864) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xvtz0-0001TH-LG for qemu-devel@nongnu.org; Tue, 02 Dec 2014 15:23:44 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Xvtyu-0004kB-3B for qemu-devel@nongnu.org; Tue, 02 Dec 2014 15:23:38 -0500 Received: from fldsmtpe04.verizon.com ([140.108.26.143]:23090) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xvtyt-0004k7-Pc for qemu-devel@nongnu.org; Tue, 02 Dec 2014 15:23:32 -0500 From: Don Slutz Message-ID: <547E1FC1.70004@terremark.com> Date: Tue, 02 Dec 2014 15:23:29 -0500 MIME-Version: 1.0 References: <5474C96A.6090506@citrix.com> <54768F7F.2030602@terremark.com> <547C6E12.50502@terremark.com> <547CF6C6.4060106@terremark.com> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefano Stabellini , Don Slutz Cc: Andrew Cooper , xen-devel@lists.xensource.com, "Wei Liu (Intern)" , Ian Campbell , qemu-devel@nongnu.org On 12/02/14 07:26, Stefano Stabellini wrote: > On Mon, 1 Dec 2014, Don Slutz wrote: >> On 12/01/14 10:37, Stefano Stabellini wrote: >>> On Mon, 1 Dec 2014, Don Slutz wrote: >>>> On 11/27/14 05:48, Stefano Stabellini wrote: >> [...] >> >>>> Works fine in both claim modes and with PoD used (maxmem > memory). Do >>>> not know how to test with tmem. I do not see how it would be worse then >>>> current >>>> code that does not auto increase. I.E. even without a xen change, I think >>>> something >>>> like this could be done. >>> OK, good to know. I am OK with increasing maxmem only if it is strictly >>> necessary. >>> >>> >>>>>> My testing shows a free 32 pages that I am not sure where they come >>>>>> from. >>>>>> But >>>>>> the code about is passing my 8 nics of e1000. >>>>> I think that raising maxmem a bit higher than necessary is not too bad. >>>>> If we really care about it, we could lower the limit after QEMU's >>>>> initialization is completed. >>>> Ok. I did find the 32 it is VGA_HOLE_SIZE. So here is what I have which >>>> includes >>>> a lot of extra printf. >>> In QEMU I would prefer not to assume that libxl increased maxmem for the >>> vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole >>> than tie QEMU to a particular maxmem allocation scheme in libxl. >> Ok. The area we are talking about is 0x000a0000 to 0x000c0000. >> It is in libxc (xc_hvm_build_x86), not libxl. I have no issue with a name >> change to >> some thing like QEMU_EXTRA_FREE_PAGES. > Sorry, I was thinking about the relocated videoram region, the one > allocated by xen_add_to_physmap in QEMU. > > Regarding 0x000a0000-0x000c0000, according to this comment: > > /* > * Allocate memory for HVM guest, skipping VGA hole 0xA0000-0xC0000. > * > > xc_hvm_build_x86 shouldn't be allocating memory for it. > In fact if I remember correctly 0xa0000-0xc0000 is trapped and emulated. > Clearly I am not writing clear enough statements. xc_hvm_build_x86 is not allocating memory for it. libxl__build_pre() sets maxmem via xc_domain_setmaxmem(). Then xc_hvm_build_x86 allocates memory skipping the VGA hole 0xA0000-0xC0000. So difference between maxmem (max_pages, xlinfo->max_memkb) and tot_pages (xlinfo->current_memkb) is videoram + LIBXL_MAXMEM_CONSTANT + 32 (I.E. what VGA_HOLE_SIZE is definded as). >> My testing has shows that some of these 32 pages are used outside of QEMU. >> I am seeing just 23 free pages using a standalone program to display >> the same info after a CentOS 6.3 guest is done booting. >> >>> In libxl I would like to avoid increasing mamxem for anything QEMU will >>> allocate later, that includes rom and vga vram. I am not sure how to >>> make that work with older QEMU versions that don't call >>> xc_domain_setmaxmem by themselves yet though. Maybe we could check the >>> specific QEMU version in libxl and decide based on that. Or we could >>> export a feature flag in QEMU. >> Yes, it would be nice to adjust libxl to not increase maxmem. However since >> videoram is included in memory (and maxmem), making the change related >> to vram is a bigger issue. >> >> the rom change is much simpler. >> >> Currently I do not know of a way to do different things based on the QEMU >> version >> and/or features (this includes getting the QEMU version in libxl). >> >> I have been going with: >> 1) change QEMU 1st. >> 2) Wait for an upstream version of QEMU with this. >> 3) change xen to optionally use a feature in the latest QEMU. > Let's try to take this approach for the videoram too Ok. > >>>> --- a/xen-hvm.c >>>> +++ b/xen-hvm.c >>>> @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t >>>> *shared_page, int vcpu) >>>> #endif >>>> >>>> #define BUFFER_IO_MAX_DELAY 100 >>>> +#define VGA_HOLE_SIZE (0x20) >>>> >>>> typedef struct XenPhysmap { >>>> hwaddr start_addr; >>>> @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t >>>> size, >>>> MemoryRegion *mr) >>>> xen_pfn_t *pfn_list; >>>> int i; >>>> xc_dominfo_t info; >>>> + unsigned long max_pages, free_pages, real_free; >>>> + long need_pages; >>>> + uint64_t tot_pages, pod_cache_pages, pod_entries; >>>> + >>>> + trace_xen_ram_alloc(ram_addr, size, mr->name); >>>> >>>> if (runstate_check(RUN_STATE_INMIGRATE)) { >>>> /* RAM already populated in Xen */ >>>> @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t >>>> size, >>>> MemoryRegion *mr) >>>> return; >>>> } >>>> >>>> - fprintf(stderr, "%s: alloc "RAM_ADDR_FMT >>>> - " bytes (%ld Kib) of ram at "RAM_ADDR_FMT >>>> - " mr.name=%s\n", >>>> - __func__, size, (long)(size>>10), ram_addr, mr->name); >>>> - >>>> - trace_xen_ram_alloc(ram_addr, size); >>>> - >>>> nr_pfn = size >> TARGET_PAGE_BITS; >>>> pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn); >>>> >>>> @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t >>>> size, >>>> MemoryRegion *mr) >>>> pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i; >>>> } >>>> >>>> - if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) { >>>> + if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) || > I think it's best to call xc_domain_getinfolist. > Will switch. >>>> + (info.domid != xen_domid)) { >>>> hw_error("xc_domain_getinfo failed"); >>>> } >>>> - if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb + >>>> - (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) { >>>> + max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE; >>>> + free_pages = max_pages - info.nr_pages; >>>> + real_free = free_pages; >>>> + if (free_pages > VGA_HOLE_SIZE) { >>>> + free_pages -= VGA_HOLE_SIZE; >>>> + } else { >>>> + free_pages = 0; >>>> + } > I don't think we need to subtract VGA_HOLE_SIZE. If you do not use some slack (like 32 here), xen does report: (d5) [2014-11-29 17:07:21] Loaded SeaBIOS (d5) [2014-11-29 17:07:21] Creating MP tables ... (d5) [2014-11-29 17:07:21] Loading ACPI ... (XEN) [2014-11-29 17:07:21] page_alloc.c:1568:d5 Over-allocation for domain 5: 1057417 > 1057416 (XEN) [2014-11-29 17:07:21] memory.c:158:d5 Could not allocate order=0 extent: id=5 memflags=0 (0 of 1) (d5) [2014-11-29 17:07:21] vm86 TSS at 00098c00 (d5) [2014-11-29 17:07:21] BIOS map: However while QEMU startup ends with 32 "free" pages (maxmem - curmem) this quickly changes to 23. I.E. there are 7 more places that act like a call to xc_domain_populate_physmap_exact() but do not log errors if there is no free pages. And just to make sure I do not fully understand what is happening here, after the message above, the domain appears to work fine and ends up with 8 "free" pages (code I do not know about ends up releasing populated pages). So my current thinking is to have QEMU leave a few (9 to 32 (64?)) pages "free". > >>>> + need_pages = nr_pfn - free_pages; >>>> + fprintf(stderr, "%s: alloc "RAM_ADDR_FMT >>>> + " bytes (%ld KiB) of ram at "RAM_ADDR_FMT >>>> + " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n", >>>> + __func__, size, (long)(size>>10), ram_addr, >>>> + max_pages, free_pages, nr_pfn, need_pages, >>>> + mr->name); >>>> + if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages, >>>> + &pod_cache_pages, &pod_entries) >= 0) { >>>> + unsigned long populated = tot_pages - pod_cache_pages; >>>> + long delta_tot = tot_pages - info.nr_pages; >>>> + >>>> + fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld >>>> nop=%ld >>>> free=%ld\n", >>>> + __func__, populated, (long)tot_pages, delta_tot, >>>> + (long)pod_cache_pages, (long)pod_entries, >>>> + (long)info.nr_outstanding_pages, real_free); >>>> + } >>> What is the purpose of calling xc_domain_get_pod_target here? It doesn't >>> look like is affecting the parameters passed to xc_domain_setmaxmem. >> This was just to test and see if anything was needed for PoD mode. >> I did not see anything. >> >> Did you want me to make a patch to send to the list that has my proposed >> changes? > Yep, that's fine. > > >> I do think that what I have would be fine to do since most of the time the >> new code does nothing with the current xen (and older versions), until >> more then 4 nics are configured for xen. >> >> It would be good to have the change: >> >> [PATCH] libxl_set_memory_target: retain the same maxmem offset on top of the >> current target >> >> (When working) in xen before using a QEMU with this change. >> >> Not sure I would push for 2.2 however. > I wouldn't. > Will mark as for 2.3 -Don Slutz