From: Don Slutz <dslutz@verizon.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: xen-devel@lists.xensource.com,
"Wei Liu (Intern)" <wei.liu2@citrix.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
qemu-devel@nongnu.org, Don Slutz <dslutz@verizon.com>
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap
Date: Mon, 01 Dec 2014 08:33:06 -0500 [thread overview]
Message-ID: <547C6E12.50502@terremark.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1411271035580.14135@kaball.uk.xensource.com>
On 11/27/14 05:48, Stefano Stabellini wrote:
> On Wed, 26 Nov 2014, Don Slutz wrote:
>> On 11/26/14 13:17, Stefano Stabellini wrote:
>>> On Tue, 25 Nov 2014, Andrew Cooper wrote:
>>>> On 25/11/14 17:45, Stefano Stabellini wrote:
>>>>> Increase maxmem before calling xc_domain_populate_physmap_exact to avoid
>>>>> the risk of running out of guest memory. This way we can also avoid
>>>>> complex memory calculations in libxl at domain construction time.
>>>>>
>>>>> This patch fixes an abort() when assigning more than 4 NICs to a VM.
>>>>>
>>>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>>
>>>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>>>> index 5c69a8d..38e08c3 100644
>>>>> --- a/xen-hvm.c
>>>>> +++ b/xen-hvm.c
>>>>> @@ -218,6 +218,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>>> size, MemoryRegion *mr)
>>>>> unsigned long nr_pfn;
>>>>> xen_pfn_t *pfn_list;
>>>>> int i;
>>>>> + xc_dominfo_t info;
>>>>> if (runstate_check(RUN_STATE_INMIGRATE)) {
>>>>> /* RAM already populated in Xen */
>>>>> @@ -240,6 +241,13 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>>> size, MemoryRegion *mr)
>>>>> pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>>>>> }
>>>>> + if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
>>>> xc_domain_getinfo()'s interface is mad, and provides no guarantee that
>>>> it returns the information for the domain you requested. It also won't
>>>> return -1 on error. The correct error handing is:
>>>>
>>>> (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) || (info.domid !=
>>>> xen_domid)
>>> It might be wiser to switch to xc_domain_getinfolist
>> Either needs the same tests, since both return an vector of info.
> Right
>
>
>>>> ~Andrew
>>>>
>>>>> + hw_error("xc_domain_getinfo failed");
>>>>> + }
>>>>> + if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>>>>> + (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>> There are two big issues and 1 minor one with this.
>> 1) You will allocate the videoram again.
>> 2) You will never use the 1 MB already allocated for option ROMs.
>>
>> And the minor one is that you can increase maxmem more then is needed.
> I don't understand: are you aware that setmaxmem doesn't allocate any
> memory, just raises the maximum amount of memory allowed for the domain
> to have?
Yes.
> But you are right that we would raise the limit more than it could be,
> specifically the videoram would get accounted for twice and we wouldn't
> need LIBXL_MAXMEM_CONSTANT. I guess we would have to write a patch for
> that.
>
>
>
>> Here is a better if:
>>
>> - if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>> - (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>> + max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
>> + free_pages = max_pages - info.nr_pages;
>> + need_pages = nr_pfn - free_pages;
>> + if ((free_pages < nr_pfn) &&
>> + (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>> + (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
> That's an interesting idea, but I am not sure if it is safe in all
> configurations.
>
> It could make QEMU work better with older libxl and avoid increasing
> maxmem more than necessary.
> On the other hand I guess it could break things when PoD is used, or in
> general when the user purposely sets maxmem on the vm config file.
>
Works fine in both claim modes and with PoD used (maxmem > memory). Do
not know how to test with tmem. I do not see how it would be worse then
current
code that does not auto increase. I.E. even without a xen change, I
think something
like this could be done.
>> My testing shows a free 32 pages that I am not sure where they come from. But
>> the code about is passing my 8 nics of e1000.
> I think that raising maxmem a bit higher than necessary is not too bad.
> If we really care about it, we could lower the limit after QEMU's
> initialization is completed.
Ok. I did find the 32 it is VGA_HOLE_SIZE. So here is what I have
which includes
a lot of extra printf.
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t
*shared_page, int vcpu)
#endif
#define BUFFER_IO_MAX_DELAY 100
+#define VGA_HOLE_SIZE (0x20)
typedef struct XenPhysmap {
hwaddr start_addr;
@@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
size, MemoryRegion *mr)
xen_pfn_t *pfn_list;
int i;
xc_dominfo_t info;
+ unsigned long max_pages, free_pages, real_free;
+ long need_pages;
+ uint64_t tot_pages, pod_cache_pages, pod_entries;
+
+ trace_xen_ram_alloc(ram_addr, size, mr->name);
if (runstate_check(RUN_STATE_INMIGRATE)) {
/* RAM already populated in Xen */
@@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
size, MemoryRegion *mr)
return;
}
- fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
- " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
- " mr.name=%s\n",
- __func__, size, (long)(size>>10), ram_addr, mr->name);
-
- trace_xen_ram_alloc(ram_addr, size);
-
nr_pfn = size >> TARGET_PAGE_BITS;
pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
@@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
size, MemoryRegion *mr)
pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
}
- if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
+ if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
+ (info.domid != xen_domid)) {
hw_error("xc_domain_getinfo failed");
}
- if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
- (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
+ max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
+ free_pages = max_pages - info.nr_pages;
+ real_free = free_pages;
+ if (free_pages > VGA_HOLE_SIZE) {
+ free_pages -= VGA_HOLE_SIZE;
+ } else {
+ free_pages = 0;
+ }
+ need_pages = nr_pfn - free_pages;
+ fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
+ " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
+ " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
+ __func__, size, (long)(size>>10), ram_addr,
+ max_pages, free_pages, nr_pfn, need_pages,
+ mr->name);
+ if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
+ &pod_cache_pages, &pod_entries) >= 0) {
+ unsigned long populated = tot_pages - pod_cache_pages;
+ long delta_tot = tot_pages - info.nr_pages;
+
+ fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld
nop=%ld free=%ld\n",
+ __func__, populated, (long)tot_pages, delta_tot,
+ (long)pod_cache_pages, (long)pod_entries,
+ (long)info.nr_outstanding_pages, real_free);
+ }
+ if ((free_pages < nr_pfn) &&
+ (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
+ (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
hw_error("xc_domain_setmaxmem failed");
}
if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0,
0, pfn_list)) {
Note: I already had a trace_xen_ram_alloc, here is the delta in
trace-events (which
has others not yet sent):
diff --git a/trace-events b/trace-events
index eaeecd5..a58fc11 100644
--- a/trace-events
+++ b/trace-events
@@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn)
"%s page: %"PRIx64""
pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of
%s pages: %u"
# xen-all.c
-xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested:
%#lx, size %#lx"
+xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char*
mr_name) "requested: %#lx size %#lx mr->name=%s"
xen_client_set_memory(uint64_t start_addr, unsigned long size, bool
log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df,
uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count,
uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64"
data=%#"PRIx64" count=%d size=%d"
handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t
size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64"
count=%d size=%d"
-Don Slutz
>
>> -Don Slutz
>>
>>
>>>>> + hw_error("xc_domain_setmaxmem failed");
>>>>> + }
>>>>> if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0,
>>>>> 0, pfn_list)) {
>>>>> hw_error("xen: failed to populate ram at " RAM_ADDR_FMT,
>>>>> ram_addr);
>>>>> }
>>>>>
>>>>> _______________________________________________
>>>>> Xen-devel mailing list
>>>>> Xen-devel@lists.xen.org
>>>>> http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2014-12-01 13:33 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-25 17:45 [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap Stefano Stabellini
2014-11-25 18:24 ` [Qemu-devel] [Xen-devel] " Andrew Cooper
2014-11-26 18:17 ` Stefano Stabellini
2014-11-27 2:42 ` Don Slutz
2014-11-27 10:48 ` Stefano Stabellini
2014-12-01 13:33 ` Don Slutz [this message]
2014-12-01 15:37 ` Stefano Stabellini
2014-12-01 23:16 ` Don Slutz
2014-12-02 12:26 ` Stefano Stabellini
2014-12-02 20:23 ` Don Slutz
2014-12-03 10:54 ` Wei Liu
2014-12-03 12:20 ` Stefano Stabellini
2014-12-03 13:39 ` Don Slutz
2014-12-03 14:50 ` Stefano Stabellini
2014-12-04 16:26 ` Don Slutz
2014-12-04 16:38 ` Wei Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=547C6E12.50502@terremark.com \
--to=dslutz@verizon.com \
--cc=Ian.Campbell@citrix.com \
--cc=andrew.cooper3@citrix.com \
--cc=qemu-devel@nongnu.org \
--cc=stefano.stabellini@eu.citrix.com \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).