From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39005) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XvR6K-0007o5-Bf for qemu-devel@nongnu.org; Mon, 01 Dec 2014 08:33:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XvR6E-0003Qy-W3 for qemu-devel@nongnu.org; Mon, 01 Dec 2014 08:33:16 -0500 Received: from omzsmtpe04.verizonbusiness.com ([199.249.25.207]:4025) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XvR6E-0003QV-Me for qemu-devel@nongnu.org; Mon, 01 Dec 2014 08:33:10 -0500 From: Don Slutz Message-ID: <547C6E12.50502@terremark.com> Date: Mon, 01 Dec 2014 08:33:06 -0500 MIME-Version: 1.0 References: <5474C96A.6090506@citrix.com> <54768F7F.2030602@terremark.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefano Stabellini Cc: xen-devel@lists.xensource.com, "Wei Liu (Intern)" , Ian Campbell , Andrew Cooper , qemu-devel@nongnu.org, Don Slutz On 11/27/14 05:48, Stefano Stabellini wrote: > On Wed, 26 Nov 2014, Don Slutz wrote: >> On 11/26/14 13:17, Stefano Stabellini wrote: >>> On Tue, 25 Nov 2014, Andrew Cooper wrote: >>>> On 25/11/14 17:45, Stefano Stabellini wrote: >>>>> Increase maxmem before calling xc_domain_populate_physmap_exact to avoid >>>>> the risk of running out of guest memory. This way we can also avoid >>>>> complex memory calculations in libxl at domain construction time. >>>>> >>>>> This patch fixes an abort() when assigning more than 4 NICs to a VM. >>>>> >>>>> Signed-off-by: Stefano Stabellini >>>>> >>>>> diff --git a/xen-hvm.c b/xen-hvm.c >>>>> index 5c69a8d..38e08c3 100644 >>>>> --- a/xen-hvm.c >>>>> +++ b/xen-hvm.c >>>>> @@ -218,6 +218,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t >>>>> size, MemoryRegion *mr) >>>>> unsigned long nr_pfn; >>>>> xen_pfn_t *pfn_list; >>>>> int i; >>>>> + xc_dominfo_t info; >>>>> if (runstate_check(RUN_STATE_INMIGRATE)) { >>>>> /* RAM already populated in Xen */ >>>>> @@ -240,6 +241,13 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t >>>>> size, MemoryRegion *mr) >>>>> pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i; >>>>> } >>>>> + if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) { >>>> xc_domain_getinfo()'s interface is mad, and provides no guarantee that >>>> it returns the information for the domain you requested. It also won't >>>> return -1 on error. The correct error handing is: >>>> >>>> (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) || (info.domid != >>>> xen_domid) >>> It might be wiser to switch to xc_domain_getinfolist >> Either needs the same tests, since both return an vector of info. > Right > > >>>> ~Andrew >>>> >>>>> + hw_error("xc_domain_getinfo failed"); >>>>> + } >>>>> + if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb + >>>>> + (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) { >> There are two big issues and 1 minor one with this. >> 1) You will allocate the videoram again. >> 2) You will never use the 1 MB already allocated for option ROMs. >> >> And the minor one is that you can increase maxmem more then is needed. > I don't understand: are you aware that setmaxmem doesn't allocate any > memory, just raises the maximum amount of memory allowed for the domain > to have? Yes. > But you are right that we would raise the limit more than it could be, > specifically the videoram would get accounted for twice and we wouldn't > need LIBXL_MAXMEM_CONSTANT. I guess we would have to write a patch for > that. > > > >> Here is a better if: >> >> - if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb + >> - (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) { >> + max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE; >> + free_pages = max_pages - info.nr_pages; >> + need_pages = nr_pfn - free_pages; >> + if ((free_pages < nr_pfn) && >> + (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb + >> + (need_pages * XC_PAGE_SIZE / 1024)) < 0)) { > That's an interesting idea, but I am not sure if it is safe in all > configurations. > > It could make QEMU work better with older libxl and avoid increasing > maxmem more than necessary. > On the other hand I guess it could break things when PoD is used, or in > general when the user purposely sets maxmem on the vm config file. > Works fine in both claim modes and with PoD used (maxmem > memory). Do not know how to test with tmem. I do not see how it would be worse then current code that does not auto increase. I.E. even without a xen change, I think something like this could be done. >> My testing shows a free 32 pages that I am not sure where they come from. But >> the code about is passing my 8 nics of e1000. > I think that raising maxmem a bit higher than necessary is not too bad. > If we really care about it, we could lower the limit after QEMU's > initialization is completed. Ok. I did find the 32 it is VGA_HOLE_SIZE. So here is what I have which includes a lot of extra printf. --- a/xen-hvm.c +++ b/xen-hvm.c @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu) #endif #define BUFFER_IO_MAX_DELAY 100 +#define VGA_HOLE_SIZE (0x20) typedef struct XenPhysmap { hwaddr start_addr; @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) xen_pfn_t *pfn_list; int i; xc_dominfo_t info; + unsigned long max_pages, free_pages, real_free; + long need_pages; + uint64_t tot_pages, pod_cache_pages, pod_entries; + + trace_xen_ram_alloc(ram_addr, size, mr->name); if (runstate_check(RUN_STATE_INMIGRATE)) { /* RAM already populated in Xen */ @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) return; } - fprintf(stderr, "%s: alloc "RAM_ADDR_FMT - " bytes (%ld Kib) of ram at "RAM_ADDR_FMT - " mr.name=%s\n", - __func__, size, (long)(size>>10), ram_addr, mr->name); - - trace_xen_ram_alloc(ram_addr, size); - nr_pfn = size >> TARGET_PAGE_BITS; pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn); @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i; } - if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) { + if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) || + (info.domid != xen_domid)) { hw_error("xc_domain_getinfo failed"); } - if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb + - (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) { + max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE; + free_pages = max_pages - info.nr_pages; + real_free = free_pages; + if (free_pages > VGA_HOLE_SIZE) { + free_pages -= VGA_HOLE_SIZE; + } else { + free_pages = 0; + } + need_pages = nr_pfn - free_pages; + fprintf(stderr, "%s: alloc "RAM_ADDR_FMT + " bytes (%ld KiB) of ram at "RAM_ADDR_FMT + " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n", + __func__, size, (long)(size>>10), ram_addr, + max_pages, free_pages, nr_pfn, need_pages, + mr->name); + if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages, + &pod_cache_pages, &pod_entries) >= 0) { + unsigned long populated = tot_pages - pod_cache_pages; + long delta_tot = tot_pages - info.nr_pages; + + fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld nop=%ld free=%ld\n", + __func__, populated, (long)tot_pages, delta_tot, + (long)pod_cache_pages, (long)pod_entries, + (long)info.nr_outstanding_pages, real_free); + } + if ((free_pages < nr_pfn) && + (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb + + (need_pages * XC_PAGE_SIZE / 1024)) < 0)) { hw_error("xc_domain_setmaxmem failed"); } if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) { Note: I already had a trace_xen_ram_alloc, here is the delta in trace-events (which has others not yet sent): diff --git a/trace-events b/trace-events index eaeecd5..a58fc11 100644 --- a/trace-events +++ b/trace-events @@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64"" pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u" # xen-all.c -xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx" +xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char* mr_name) "requested: %#lx size %#lx mr->name=%s" xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i" handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d" -Don Slutz > >> -Don Slutz >> >> >>>>> + hw_error("xc_domain_setmaxmem failed"); >>>>> + } >>>>> if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, >>>>> 0, pfn_list)) { >>>>> hw_error("xen: failed to populate ram at " RAM_ADDR_FMT, >>>>> ram_addr); >>>>> } >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xen.org >>>>> http://lists.xen.org/xen-devel