From: Don Slutz <dslutz@verizon.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
Don Slutz <dslutz@verizon.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
xen-devel@lists.xensource.com,
"Wei Liu (Intern)" <wei.liu2@citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap
Date: Tue, 02 Dec 2014 15:23:29 -0500 [thread overview]
Message-ID: <547E1FC1.70004@terremark.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1412021158210.14135@kaball.uk.xensource.com>
On 12/02/14 07:26, Stefano Stabellini wrote:
> On Mon, 1 Dec 2014, Don Slutz wrote:
>> On 12/01/14 10:37, Stefano Stabellini wrote:
>>> On Mon, 1 Dec 2014, Don Slutz wrote:
>>>> On 11/27/14 05:48, Stefano Stabellini wrote:
>> [...]
>>
>>>> Works fine in both claim modes and with PoD used (maxmem > memory). Do
>>>> not know how to test with tmem. I do not see how it would be worse then
>>>> current
>>>> code that does not auto increase. I.E. even without a xen change, I think
>>>> something
>>>> like this could be done.
>>> OK, good to know. I am OK with increasing maxmem only if it is strictly
>>> necessary.
>>>
>>>
>>>>>> My testing shows a free 32 pages that I am not sure where they come
>>>>>> from.
>>>>>> But
>>>>>> the code about is passing my 8 nics of e1000.
>>>>> I think that raising maxmem a bit higher than necessary is not too bad.
>>>>> If we really care about it, we could lower the limit after QEMU's
>>>>> initialization is completed.
>>>> Ok. I did find the 32 it is VGA_HOLE_SIZE. So here is what I have which
>>>> includes
>>>> a lot of extra printf.
>>> In QEMU I would prefer not to assume that libxl increased maxmem for the
>>> vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
>>> than tie QEMU to a particular maxmem allocation scheme in libxl.
>> Ok. The area we are talking about is 0x000a0000 to 0x000c0000.
>> It is in libxc (xc_hvm_build_x86), not libxl. I have no issue with a name
>> change to
>> some thing like QEMU_EXTRA_FREE_PAGES.
> Sorry, I was thinking about the relocated videoram region, the one
> allocated by xen_add_to_physmap in QEMU.
>
> Regarding 0x000a0000-0x000c0000, according to this comment:
>
> /*
> * Allocate memory for HVM guest, skipping VGA hole 0xA0000-0xC0000.
> *
>
> xc_hvm_build_x86 shouldn't be allocating memory for it.
> In fact if I remember correctly 0xa0000-0xc0000 is trapped and emulated.
>
Clearly I am not writing clear enough statements.
xc_hvm_build_x86 is not allocating memory for it. libxl__build_pre() sets
maxmem via xc_domain_setmaxmem(). Then xc_hvm_build_x86 allocates
memory skipping the VGA hole 0xA0000-0xC0000. So difference between
maxmem (max_pages, xlinfo->max_memkb) and tot_pages
(xlinfo->current_memkb) is videoram + LIBXL_MAXMEM_CONSTANT + 32
(I.E. what VGA_HOLE_SIZE is definded as).
>> My testing has shows that some of these 32 pages are used outside of QEMU.
>> I am seeing just 23 free pages using a standalone program to display
>> the same info after a CentOS 6.3 guest is done booting.
>>
>>> In libxl I would like to avoid increasing mamxem for anything QEMU will
>>> allocate later, that includes rom and vga vram. I am not sure how to
>>> make that work with older QEMU versions that don't call
>>> xc_domain_setmaxmem by themselves yet though. Maybe we could check the
>>> specific QEMU version in libxl and decide based on that. Or we could
>>> export a feature flag in QEMU.
>> Yes, it would be nice to adjust libxl to not increase maxmem. However since
>> videoram is included in memory (and maxmem), making the change related
>> to vram is a bigger issue.
>>
>> the rom change is much simpler.
>>
>> Currently I do not know of a way to do different things based on the QEMU
>> version
>> and/or features (this includes getting the QEMU version in libxl).
>>
>> I have been going with:
>> 1) change QEMU 1st.
>> 2) Wait for an upstream version of QEMU with this.
>> 3) change xen to optionally use a feature in the latest QEMU.
> Let's try to take this approach for the videoram too
Ok.
>
>>>> --- a/xen-hvm.c
>>>> +++ b/xen-hvm.c
>>>> @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t
>>>> *shared_page, int vcpu)
>>>> #endif
>>>>
>>>> #define BUFFER_IO_MAX_DELAY 100
>>>> +#define VGA_HOLE_SIZE (0x20)
>>>>
>>>> typedef struct XenPhysmap {
>>>> hwaddr start_addr;
>>>> @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>> size,
>>>> MemoryRegion *mr)
>>>> xen_pfn_t *pfn_list;
>>>> int i;
>>>> xc_dominfo_t info;
>>>> + unsigned long max_pages, free_pages, real_free;
>>>> + long need_pages;
>>>> + uint64_t tot_pages, pod_cache_pages, pod_entries;
>>>> +
>>>> + trace_xen_ram_alloc(ram_addr, size, mr->name);
>>>>
>>>> if (runstate_check(RUN_STATE_INMIGRATE)) {
>>>> /* RAM already populated in Xen */
>>>> @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>> size,
>>>> MemoryRegion *mr)
>>>> return;
>>>> }
>>>>
>>>> - fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
>>>> - " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
>>>> - " mr.name=%s\n",
>>>> - __func__, size, (long)(size>>10), ram_addr, mr->name);
>>>> -
>>>> - trace_xen_ram_alloc(ram_addr, size);
>>>> -
>>>> nr_pfn = size >> TARGET_PAGE_BITS;
>>>> pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
>>>>
>>>> @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>> size,
>>>> MemoryRegion *mr)
>>>> pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>>>> }
>>>>
>>>> - if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
>>>> + if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
> I think it's best to call xc_domain_getinfolist.
>
Will switch.
>>>> + (info.domid != xen_domid)) {
>>>> hw_error("xc_domain_getinfo failed");
>>>> }
>>>> - if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>>>> - (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>>>> + max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
>>>> + free_pages = max_pages - info.nr_pages;
>>>> + real_free = free_pages;
>>>> + if (free_pages > VGA_HOLE_SIZE) {
>>>> + free_pages -= VGA_HOLE_SIZE;
>>>> + } else {
>>>> + free_pages = 0;
>>>> + }
> I don't think we need to subtract VGA_HOLE_SIZE.
If you do not use some slack (like 32 here), xen does report:
(d5) [2014-11-29 17:07:21] Loaded SeaBIOS
(d5) [2014-11-29 17:07:21] Creating MP tables ...
(d5) [2014-11-29 17:07:21] Loading ACPI ...
(XEN) [2014-11-29 17:07:21] page_alloc.c:1568:d5 Over-allocation for
domain 5: 1057417 > 1057416
(XEN) [2014-11-29 17:07:21] memory.c:158:d5 Could not allocate order=0
extent: id=5 memflags=0 (0 of 1)
(d5) [2014-11-29 17:07:21] vm86 TSS at 00098c00
(d5) [2014-11-29 17:07:21] BIOS map:
However while QEMU startup ends with 32 "free" pages (maxmem - curmem)
this quickly changes to 23. I.E. there are 7 more places that act like
a call
to xc_domain_populate_physmap_exact() but do not log errors if there is
no free pages. And just to make sure I do not fully understand what is
happening here, after the message above, the domain appears to work
fine and ends up with 8 "free" pages (code I do not know about ends up
releasing populated pages).
So my current thinking is to have QEMU leave a few (9 to 32 (64?)) pages
"free".
>
>>>> + need_pages = nr_pfn - free_pages;
>>>> + fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
>>>> + " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
>>>> + " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
>>>> + __func__, size, (long)(size>>10), ram_addr,
>>>> + max_pages, free_pages, nr_pfn, need_pages,
>>>> + mr->name);
>>>> + if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
>>>> + &pod_cache_pages, &pod_entries) >= 0) {
>>>> + unsigned long populated = tot_pages - pod_cache_pages;
>>>> + long delta_tot = tot_pages - info.nr_pages;
>>>> +
>>>> + fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld
>>>> nop=%ld
>>>> free=%ld\n",
>>>> + __func__, populated, (long)tot_pages, delta_tot,
>>>> + (long)pod_cache_pages, (long)pod_entries,
>>>> + (long)info.nr_outstanding_pages, real_free);
>>>> + }
>>> What is the purpose of calling xc_domain_get_pod_target here? It doesn't
>>> look like is affecting the parameters passed to xc_domain_setmaxmem.
>> This was just to test and see if anything was needed for PoD mode.
>> I did not see anything.
>>
>> Did you want me to make a patch to send to the list that has my proposed
>> changes?
> Yep, that's fine.
>
>
>> I do think that what I have would be fine to do since most of the time the
>> new code does nothing with the current xen (and older versions), until
>> more then 4 nics are configured for xen.
>>
>> It would be good to have the change:
>>
>> [PATCH] libxl_set_memory_target: retain the same maxmem offset on top of the
>> current target
>>
>> (When working) in xen before using a QEMU with this change.
>>
>> Not sure I would push for 2.2 however.
> I wouldn't.
>
Will mark as for 2.3
-Don Slutz
next prev parent reply other threads:[~2014-12-02 20:23 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-25 17:45 [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap Stefano Stabellini
2014-11-25 18:24 ` [Qemu-devel] [Xen-devel] " Andrew Cooper
2014-11-26 18:17 ` Stefano Stabellini
2014-11-27 2:42 ` Don Slutz
2014-11-27 10:48 ` Stefano Stabellini
2014-12-01 13:33 ` Don Slutz
2014-12-01 15:37 ` Stefano Stabellini
2014-12-01 23:16 ` Don Slutz
2014-12-02 12:26 ` Stefano Stabellini
2014-12-02 20:23 ` Don Slutz [this message]
2014-12-03 10:54 ` Wei Liu
2014-12-03 12:20 ` Stefano Stabellini
2014-12-03 13:39 ` Don Slutz
2014-12-03 14:50 ` Stefano Stabellini
2014-12-04 16:26 ` Don Slutz
2014-12-04 16:38 ` Wei Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=547E1FC1.70004@terremark.com \
--to=dslutz@verizon.com \
--cc=Ian.Campbell@citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=qemu-devel@nongnu.org \
--cc=stefano.stabellini@eu.citrix.com \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).