From: Don Slutz <dslutz@verizon.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
Don Slutz <dslutz@verizon.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
xen-devel@lists.xensource.com,
"Wei Liu (Intern)" <wei.liu2@citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap
Date: Mon, 01 Dec 2014 18:16:22 -0500 [thread overview]
Message-ID: <547CF6C6.4060106@terremark.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1412011528540.14135@kaball.uk.xensource.com>
On 12/01/14 10:37, Stefano Stabellini wrote:
> On Mon, 1 Dec 2014, Don Slutz wrote:
>> On 11/27/14 05:48, Stefano Stabellini wrote:
[...]
>>
>> Works fine in both claim modes and with PoD used (maxmem > memory). I do
>> not know how to test with tmem. I do not see how it would be worse than the
>> current code, which does not auto-increase. I.e. even without a Xen change,
>> I think something like this could be done.
> OK, good to know. I am OK with increasing maxmem only if it is strictly
> necessary.
>
>
>>>> My testing shows 32 free pages that I am not sure where they come from.
>>>> But the code above is passing my 8 e1000 NICs.
>>> I think that raising maxmem a bit higher than necessary is not too bad.
>>> If we really care about it, we could lower the limit after QEMU's
>>> initialization is completed.
>> OK. I did find the 32: it is VGA_HOLE_SIZE. So here is what I have, which
>> includes a lot of extra printfs.
> In QEMU I would prefer not to assume that libxl increased maxmem for the
> vga hole. I would rather call xc_domain_setmaxmem twice for the vga hole
> than tie QEMU to a particular maxmem allocation scheme in libxl.
OK. The area we are talking about is 0x000a0000 to 0x000c0000. It is in
libxc (xc_hvm_build_x86), not libxl. I have no issue with a name change to
something like QEMU_EXTRA_FREE_PAGES.
My testing has shown that some of these 32 pages are used outside of QEMU:
I am seeing just 23 free pages, using a standalone program to display the
same info, after a CentOS 6.3 guest is done booting.
> In libxl I would like to avoid increasing maxmem for anything QEMU will
> allocate later, that includes rom and vga vram. I am not sure how to
> make that work with older QEMU versions that don't call
> xc_domain_setmaxmem by themselves yet though. Maybe we could check the
> specific QEMU version in libxl and decide based on that. Or we could
> export a feature flag in QEMU.
Yes, it would be nice to adjust libxl to not increase maxmem. However, since
videoram is included in memory (and maxmem), making the change related to
vram is a bigger issue; the rom change is much simpler.
Currently I do not know of a way to do different things based on the QEMU
version and/or features (this includes getting the QEMU version in libxl).
I have been going with:
1) Change QEMU first.
2) Wait for an upstream version of QEMU with this.
3) Change Xen to optionally use the feature in the latest QEMU.
>
>
>> --- a/xen-hvm.c
>> +++ b/xen-hvm.c
>> @@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>> #endif
>>
>> #define BUFFER_IO_MAX_DELAY 100
>> +#define VGA_HOLE_SIZE (0x20)
>>
>> typedef struct XenPhysmap {
>> hwaddr start_addr;
>> @@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>> xen_pfn_t *pfn_list;
>> int i;
>> xc_dominfo_t info;
>> + unsigned long max_pages, free_pages, real_free;
>> + long need_pages;
>> + uint64_t tot_pages, pod_cache_pages, pod_entries;
>> +
>> + trace_xen_ram_alloc(ram_addr, size, mr->name);
>>
>> if (runstate_check(RUN_STATE_INMIGRATE)) {
>> /* RAM already populated in Xen */
>> @@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>> return;
>> }
>>
>> - fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
>> - " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
>> - " mr.name=%s\n",
>> - __func__, size, (long)(size>>10), ram_addr, mr->name);
>> -
>> - trace_xen_ram_alloc(ram_addr, size);
>> -
>> nr_pfn = size >> TARGET_PAGE_BITS;
>> pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
>>
>> @@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
>> pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>> }
>>
>> - if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
>> + if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
>> + (info.domid != xen_domid)) {
>> hw_error("xc_domain_getinfo failed");
>> }
>> - if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>> - (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>> + max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
>> + free_pages = max_pages - info.nr_pages;
>> + real_free = free_pages;
>> + if (free_pages > VGA_HOLE_SIZE) {
>> + free_pages -= VGA_HOLE_SIZE;
>> + } else {
>> + free_pages = 0;
>> + }
>> + need_pages = nr_pfn - free_pages;
>> + fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
>> + " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
>> + " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
>> + __func__, size, (long)(size>>10), ram_addr,
>> + max_pages, free_pages, nr_pfn, need_pages,
>> + mr->name);
>> + if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
>> + &pod_cache_pages, &pod_entries) >= 0) {
>> + unsigned long populated = tot_pages - pod_cache_pages;
>> + long delta_tot = tot_pages - info.nr_pages;
>> +
>> + fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld nop=%ld free=%ld\n",
>> + __func__, populated, (long)tot_pages, delta_tot,
>> + (long)pod_cache_pages, (long)pod_entries,
>> + (long)info.nr_outstanding_pages, real_free);
>> + }
> What is the purpose of calling xc_domain_get_pod_target here? It doesn't
> look like it is affecting the parameters passed to xc_domain_setmaxmem.
This was just to test and see if anything was needed for PoD mode.
I did not see anything.
Did you want me to make a patch to send to the list that has my proposed
changes?
I do think that what I have would be fine to do, since most of the time the
new code does nothing with the current Xen (and older versions) until more
than 4 NICs are configured.
It would be good to have the change
"[PATCH] libxl_set_memory_target: retain the same maxmem offset on top of
the current target"
(when working) in Xen before using a QEMU with this change.
I am not sure I would push for 2.2, however.
-Don Slutz
>
>> + if ((free_pages < nr_pfn) &&
>> + (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>> + (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
>> hw_error("xc_domain_setmaxmem failed");
>> }
>> if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) {
>>
>>
>> Note: I already had a trace_xen_ram_alloc; here is the delta in trace-events
>> (which has others not yet sent):
>>
>>
>>
>> diff --git a/trace-events b/trace-events
>> index eaeecd5..a58fc11 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) "%s page: %"PRIx64""
>> pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of %s pages: %u"
>>
>> # xen-all.c
>> -xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: %#lx, size %#lx"
>> +xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char* mr_name) "requested: %#lx size %#lx mr->name=%s"
>> xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
>> handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
>> handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" count=%d size=%d"
>>
>>
>> -Don Slutz
>>
>>>> -Don Slutz
>>>>
>>>>
>>>>>>> + hw_error("xc_domain_setmaxmem failed");
>>>>>>> + }
>>>>>>> if (xc_domain_populate_physmap_exact(xen_xc, xen_domid,
>>>>>>> nr_pfn, 0,
>>>>>>> 0, pfn_list)) {
>>>>>>> hw_error("xen: failed to populate ram at " RAM_ADDR_FMT,
>>>>>>> ram_addr);
>>>>>>> }
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Xen-devel mailing list
>>>>>>> Xen-devel@lists.xen.org
>>>>>>> http://lists.xen.org/xen-devel
Thread overview: 16+ messages
2014-11-25 17:45 [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap Stefano Stabellini
2014-11-25 18:24 ` [Qemu-devel] [Xen-devel] " Andrew Cooper
2014-11-26 18:17 ` Stefano Stabellini
2014-11-27 2:42 ` Don Slutz
2014-11-27 10:48 ` Stefano Stabellini
2014-12-01 13:33 ` Don Slutz
2014-12-01 15:37 ` Stefano Stabellini
2014-12-01 23:16 ` Don Slutz [this message]
2014-12-02 12:26 ` Stefano Stabellini
2014-12-02 20:23 ` Don Slutz
2014-12-03 10:54 ` Wei Liu
2014-12-03 12:20 ` Stefano Stabellini
2014-12-03 13:39 ` Don Slutz
2014-12-03 14:50 ` Stefano Stabellini
2014-12-04 16:26 ` Don Slutz
2014-12-04 16:38 ` Wei Liu