From: Don Slutz <dslutz@verizon.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Don Slutz <dslutz@verizon.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xensource.com,
	"Wei Liu (Intern)" <wei.liu2@citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap
Date: Tue, 02 Dec 2014 15:23:29 -0500	[thread overview]
Message-ID: <547E1FC1.70004@terremark.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1412021158210.14135@kaball.uk.xensource.com>

On 12/02/14 07:26, Stefano Stabellini wrote:
> On Mon, 1 Dec 2014, Don Slutz wrote:
>> On 12/01/14 10:37, Stefano Stabellini wrote:
>>> On Mon, 1 Dec 2014, Don Slutz wrote:
>>>> On 11/27/14 05:48, Stefano Stabellini wrote:
>> [...]
>>
>>>> Works fine in both claim modes and with PoD used (maxmem > memory).  I do
>>>> not know how to test with tmem.  I do not see how it would be worse than
>>>> the current code that does not auto-increase.  I.e. even without a xen
>>>> change, I think something like this could be done.
>>> OK, good to know. I am OK with increasing maxmem only if it is strictly
>>> necessary.
>>>
>>>
>>>>>> My testing shows 32 free pages that I am not sure where they come from.
>>>>>> But the code above is passing with my 8 e1000 NICs.
>>>>> I think that raising maxmem a bit higher than necessary is not too bad.
>>>>> If we really care about it, we could lower the limit after QEMU's
>>>>> initialization is completed.
>>>> Ok.  I did find where the 32 comes from: it is VGA_HOLE_SIZE.  So here is
>>>> what I have, which includes a lot of extra printfs.
>>> In QEMU I would prefer not to assume that libxl increased maxmem for the
>>> vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
>>> than tie QEMU to a particular maxmem allocation scheme in libxl.
>> Ok.  The area we are talking about is 0x000a0000 to 0x000c0000.
>> It is in libxc (xc_hvm_build_x86), not libxl.  I have no issue with renaming
>> it to something like QEMU_EXTRA_FREE_PAGES.
> Sorry, I was thinking about the relocated videoram region, the one
> allocated by xen_add_to_physmap in QEMU.
>
> Regarding 0x000a0000-0x000c0000, according to this comment:
>
>      /*
>       * Allocate memory for HVM guest, skipping VGA hole 0xA0000-0xC0000.
>       *
>
> xc_hvm_build_x86 shouldn't be allocating memory for it.
> In fact if I remember correctly 0xa0000-0xc0000 is trapped and emulated.
>

Clearly I am not stating this clearly enough.

xc_hvm_build_x86 is not allocating memory for it.  libxl__build_pre() sets
maxmem via xc_domain_setmaxmem().  Then xc_hvm_build_x86 allocates memory,
skipping the VGA hole 0xA0000-0xC0000.  So the difference between maxmem
(max_pages, xlinfo->max_memkb) and tot_pages (xlinfo->current_memkb) is
videoram + LIBXL_MAXMEM_CONSTANT + 32 (i.e. what VGA_HOLE_SIZE is defined as).

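As a sanity check, something like the small helper below shows the slack I am
describing right after the domain is built.  This is illustrative only (not
part of the proposed patch), and the libxl_dominfo field names are the ones I
believe are current, so please double-check before relying on it:

    #include <stdio.h>
    #include <inttypes.h>
    #include <libxl.h>

    /* Print the gap between maxmem and the currently populated memory for a
     * domain.  With the current libxl/libxc code I expect this to come out
     * to videoram + LIBXL_MAXMEM_CONSTANT + 32 pages (the VGA hole). */
    static void print_mem_slack(libxl_ctx *ctx, uint32_t domid)
    {
        libxl_dominfo di;

        libxl_dominfo_init(&di);
        if (libxl_domain_info(ctx, &di, domid) == 0) {
            fprintf(stderr, "dom%u slack = %"PRIu64" KiB\n", domid,
                    (uint64_t)(di.max_memkb - di.current_memkb));
        }
        libxl_dominfo_dispose(&di);
    }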

>> My testing has shown that some of these 32 pages are used outside of QEMU.
>> I am seeing just 23 free pages using a standalone program to display
>> the same info after a CentOS 6.3 guest is done booting.
>>
>>> In libxl I would like to avoid increasing maxmem for anything QEMU will
>>> allocate later, that includes rom and vga vram. I am not sure how to
>>> make that work with older QEMU versions that don't call
>>> xc_domain_setmaxmem by themselves yet though. Maybe we could check the
>>> specific QEMU version in libxl and decide based on that. Or we could
>>> export a feature flag in QEMU.
>> Yes, it would be nice to adjust libxl to not increase maxmem.  However,
>> since videoram is included in memory (and maxmem), making the change for
>> vram is a bigger issue.
>>
>> The rom change is much simpler.
>>
>> Currently I do not know of a way to do different things based on the QEMU
>> version and/or features (this includes getting the QEMU version in libxl).
>>
>> I have been going with:
>> 1) Change QEMU first.
>> 2) Wait for an upstream version of QEMU with this.
>> 3) Change xen to optionally use the feature in the latest QEMU.
> Let's try to take this approach for the videoram too

Ok.

>
>>>> --- a/xen-hvm.c
>>>> +++ b/xen-hvm.c
>>>> @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>>>    #endif
>>>>
>>>>    #define BUFFER_IO_MAX_DELAY  100
>>>> +#define VGA_HOLE_SIZE (0x20)
>>>>
>>>>    typedef struct XenPhysmap {
>>>>        hwaddr start_addr;
>>>> @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>>>>        xen_pfn_t *pfn_list;
>>>>        int i;
>>>>        xc_dominfo_t info;
>>>> +    unsigned long max_pages, free_pages, real_free;
>>>> +    long need_pages;
>>>> +    uint64_t tot_pages, pod_cache_pages, pod_entries;
>>>> +
>>>> +    trace_xen_ram_alloc(ram_addr, size, mr->name);
>>>>
>>>>        if (runstate_check(RUN_STATE_INMIGRATE)) {
>>>>            /* RAM already populated in Xen */
>>>> @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>>>>            return;
>>>>        }
>>>>
>>>> -    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
>>>> -            " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
>>>> -            " mr.name=%s\n",
>>>> -            __func__, size, (long)(size>>10), ram_addr, mr->name);
>>>> -
>>>> -    trace_xen_ram_alloc(ram_addr, size);
>>>> -
>>>>        nr_pfn = size >> TARGET_PAGE_BITS;
>>>>        pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
>>>>
>>>> @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>>>>            pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>>>>        }
>>>>
>>>> -    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
>>>> +    if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
> I think it's best to call xc_domain_getinfolist.
>

Will switch.

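For reference, the switch would look roughly like the sketch below.  The
xc_domaininfo_t field names (domain, max_pages, tot_pages) are from memory,
so treat this as an outline rather than the final patch:

    /* xc_domain_getinfolist() returns the number of domains filled in, so
     * check that we got exactly one entry and that it is really ours. */
    xc_domaininfo_t info;

    if ((xc_domain_getinfolist(xen_xc, xen_domid, 1, &info) != 1) ||
        (info.domain != xen_domid)) {
        hw_error("xc_domain_getinfolist failed");
    }
    max_pages = info.max_pages;
    free_pages = max_pages - info.tot_pages;
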
>>>> +       (info.domid != xen_domid)) {
>>>>            hw_error("xc_domain_getinfo failed");
>>>>        }
>>>> -    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>>>> -                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>>>> +    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
>>>> +    free_pages = max_pages - info.nr_pages;
>>>> +    real_free = free_pages;
>>>> +    if (free_pages > VGA_HOLE_SIZE) {
>>>> +        free_pages -= VGA_HOLE_SIZE;
>>>> +    } else {
>>>> +        free_pages = 0;
>>>> +    }
> I don't think we need to subtract VGA_HOLE_SIZE.

If you do not use some slack (like 32 here), xen does report:


(d5) [2014-11-29 17:07:21] Loaded SeaBIOS
(d5) [2014-11-29 17:07:21] Creating MP tables ...
(d5) [2014-11-29 17:07:21] Loading ACPI ...
(XEN) [2014-11-29 17:07:21] page_alloc.c:1568:d5 Over-allocation for domain 5: 1057417 > 1057416
(XEN) [2014-11-29 17:07:21] memory.c:158:d5 Could not allocate order=0 extent: id=5 memflags=0 (0 of 1)
(d5) [2014-11-29 17:07:21] vm86 TSS at 00098c00
(d5) [2014-11-29 17:07:21] BIOS map:


However, while QEMU startup ends with 32 "free" pages (maxmem - curmem),
this quickly changes to 23.  I.e. there are 7 more places that act like a
call to xc_domain_populate_physmap_exact() but do not log errors if there
are no free pages.  And, just to confirm that I do not fully understand what
is happening here, after the message above the domain appears to work fine
and ends up with 8 "free" pages (code I do not know about ends up releasing
populated pages).

So my current thinking is to have QEMU leave a few (9 to 32 (64?)) pages
"free".

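Concretely, the idea in xen_ram_alloc would be roughly the sketch below.  It
is not the final patch: OVERHEAD_PAGES is a placeholder name and the exact
cushion size is still open.  It reuses the max_pages and need_pages values
computed earlier in the hunk quoted above:

    /* Only raise maxmem when the pages we are about to populate do not fit,
     * and keep a small cushion of unpopulated pages for allocations that
     * happen outside QEMU (the ones that otherwise hit the over-allocation
     * check shown above).  The setmaxmem argument is in KiB. */
    #define OVERHEAD_PAGES 32

    if (need_pages > 0) {
        if (xc_domain_setmaxmem(xen_xc, xen_domid,
                                (max_pages + need_pages + OVERHEAD_PAGES) *
                                (XC_PAGE_SIZE / 1024)) < 0) {
            hw_error("xc_domain_setmaxmem failed");
        }
    }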

>
>>>> +    need_pages = nr_pfn - free_pages;
>>>> +    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
>>>> +            " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
>>>> +            " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
>>>> +            __func__, size, (long)(size>>10), ram_addr,
>>>> +           max_pages, free_pages, nr_pfn, need_pages,
>>>> +            mr->name);
>>>> +    if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
>>>> +                                 &pod_cache_pages, &pod_entries) >= 0) {
>>>> +        unsigned long populated = tot_pages - pod_cache_pages;
>>>> +        long delta_tot = tot_pages - info.nr_pages;
>>>> +
>>>> +        fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld nop=%ld free=%ld\n",
>>>> +                __func__, populated, (long)tot_pages, delta_tot,
>>>> +                (long)pod_cache_pages, (long)pod_entries,
>>>> +               (long)info.nr_outstanding_pages, real_free);
>>>> +    }
>>> What is the purpose of calling xc_domain_get_pod_target here? It doesn't
>>> look like is affecting the parameters passed to xc_domain_setmaxmem.
>> This was just to test and see if anything was needed for PoD mode.
>> I did not see anything.
>>
>> Did you want me to make a patch to send to the list that has my proposed
>> changes?
> Yep, that's fine.
>
>
>> I do think that what I have would be fine to do, since most of the time the
>> new code does nothing with the current xen (and older versions), until
>> more than 4 NICs are configured for xen.
>>
>> It would be good to have the change:
>>
>> [PATCH] libxl_set_memory_target: retain the same maxmem offset on top of the
>> current target
>>
>> (When working) in xen before using a QEMU with this change.
>>
>> Not sure I would push for 2.2 however.
> I wouldn't.
>

Will mark it for 2.3.

    -Don Slutz
