qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Don Slutz <dslutz@verizon.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: xen-devel@lists.xensource.com,
	"Wei Liu (Intern)" <wei.liu2@citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	qemu-devel@nongnu.org, Don Slutz <dslutz@verizon.com>
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap
Date: Mon, 01 Dec 2014 08:33:06 -0500	[thread overview]
Message-ID: <547C6E12.50502@terremark.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1411271035580.14135@kaball.uk.xensource.com>

On 11/27/14 05:48, Stefano Stabellini wrote:
> On Wed, 26 Nov 2014, Don Slutz wrote:
>> On 11/26/14 13:17, Stefano Stabellini wrote:
>>> On Tue, 25 Nov 2014, Andrew Cooper wrote:
>>>> On 25/11/14 17:45, Stefano Stabellini wrote:
>>>>> Increase maxmem before calling xc_domain_populate_physmap_exact to avoid
>>>>> the risk of running out of guest memory. This way we can also avoid
>>>>> complex memory calculations in libxl at domain construction time.
>>>>>
>>>>> This patch fixes an abort() when assigning more than 4 NICs to a VM.
>>>>>
>>>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>>>
>>>>> diff --git a/xen-hvm.c b/xen-hvm.c
>>>>> index 5c69a8d..38e08c3 100644
>>>>> --- a/xen-hvm.c
>>>>> +++ b/xen-hvm.c
>>>>> @@ -218,6 +218,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>>> size, MemoryRegion *mr)
>>>>>        unsigned long nr_pfn;
>>>>>        xen_pfn_t *pfn_list;
>>>>>        int i;
>>>>> +    xc_dominfo_t info;
>>>>>          if (runstate_check(RUN_STATE_INMIGRATE)) {
>>>>>            /* RAM already populated in Xen */
>>>>> @@ -240,6 +241,13 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>>>>> size, MemoryRegion *mr)
>>>>>            pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>>>>>        }
>>>>>    +    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
>>>> xc_domain_getinfo()'s interface is mad, and provides no guarantee that
>>>> it returns the information for the domain you requested.  It also won't
>>>> return -1 on error.  The correct error handing is:
>>>>
>>>> (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) || (info.domid !=
>>>> xen_domid)
>>> It might be wiser to switch to xc_domain_getinfolist
>> Either needs the same tests, since both return an vector of info.
> Right
>
>
>>>> ~Andrew
>>>>
>>>>> +        hw_error("xc_domain_getinfo failed");
>>>>> +    }
>>>>> +    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>>>>> +                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>> There are two big issues and 1 minor one with this.
>> 1) You will allocate the videoram again.
>> 2) You will never use the 1 MB already allocated for option ROMs.
>>
>> And the minor one is that you can increase maxmem more then is needed.
> I don't understand: are you aware that setmaxmem doesn't allocate any
> memory, just raises the maximum amount of memory allowed for the domain
> to have?

Yes.

> But you are right that we would raise the limit more than it could be,
> specifically the videoram would get accounted for twice and we wouldn't
> need LIBXL_MAXMEM_CONSTANT. I guess we would have to write a patch for
> that.
>
>
>
>> Here is a better if:
>>
>> -    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>> -                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
>> +    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
>> +    free_pages = max_pages - info.nr_pages;
>> +    need_pages = nr_pfn - free_pages;
>> +    if ((free_pages < nr_pfn) &&
>> +       (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
>> +                            (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
> That's an interesting idea, but I am not sure if it is safe in all
> configurations.
>
> It could make QEMU work better with older libxl and avoid increasing
> maxmem more than necessary.
> On the other hand I guess it could break things when PoD is used, or in
> general when the user purposely sets maxmem on the vm config file.
>

Works fine in both claim modes and with PoD used (maxmem > memory).  Do
not know how to test with tmem.  I do not see how it would be worse then 
current
code that does not auto increase.  I.E. even without a xen change, I 
think something
like this could be done.


>> My testing shows a free 32 pages that I am not sure where they come from.  But
>> the code about is passing my 8 nics of e1000.
> I think that raising maxmem a bit higher than necessary is not too bad.
> If we really care about it, we could lower the limit after QEMU's
> initialization is completed.

Ok.  I did find the 32 it is VGA_HOLE_SIZE.  So here is what I have 
which includes
a lot of extra printf.


--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -67,6 +67,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t 
*shared_page, int vcpu)
  #endif

  #define BUFFER_IO_MAX_DELAY  100
+#define VGA_HOLE_SIZE (0x20)

  typedef struct XenPhysmap {
      hwaddr start_addr;
@@ -219,6 +220,11 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t 
size, MemoryRegion *mr)
      xen_pfn_t *pfn_list;
      int i;
      xc_dominfo_t info;
+    unsigned long max_pages, free_pages, real_free;
+    long need_pages;
+    uint64_t tot_pages, pod_cache_pages, pod_entries;
+
+    trace_xen_ram_alloc(ram_addr, size, mr->name);

      if (runstate_check(RUN_STATE_INMIGRATE)) {
          /* RAM already populated in Xen */
@@ -232,13 +238,6 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t 
size, MemoryRegion *mr)
          return;
      }

-    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
-            " bytes (%ld Kib) of ram at "RAM_ADDR_FMT
-            " mr.name=%s\n",
-            __func__, size, (long)(size>>10), ram_addr, mr->name);
-
-    trace_xen_ram_alloc(ram_addr, size);
-
      nr_pfn = size >> TARGET_PAGE_BITS;
      pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);

@@ -246,11 +245,38 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t 
size, MemoryRegion *mr)
          pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
      }

-    if (xc_domain_getinfo(xen_xc, xen_domid, 1, &info) < 0) {
+    if ((xc_domain_getinfo(xen_xc, xen_domid, 1, &info) != 1) ||
+       (info.domid != xen_domid)) {
          hw_error("xc_domain_getinfo failed");
      }
-    if (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
-                            (nr_pfn * XC_PAGE_SIZE / 1024)) < 0) {
+    max_pages = info.max_memkb * 1024 / XC_PAGE_SIZE;
+    free_pages = max_pages - info.nr_pages;
+    real_free = free_pages;
+    if (free_pages > VGA_HOLE_SIZE) {
+        free_pages -= VGA_HOLE_SIZE;
+    } else {
+        free_pages = 0;
+    }
+    need_pages = nr_pfn - free_pages;
+    fprintf(stderr, "%s: alloc "RAM_ADDR_FMT
+            " bytes (%ld KiB) of ram at "RAM_ADDR_FMT
+            " mp=%ld fp=%ld nr=%ld need=%ld mr.name=%s\n",
+            __func__, size, (long)(size>>10), ram_addr,
+           max_pages, free_pages, nr_pfn, need_pages,
+            mr->name);
+    if (xc_domain_get_pod_target(xen_xc, xen_domid, &tot_pages,
+                                 &pod_cache_pages, &pod_entries) >= 0) {
+        unsigned long populated = tot_pages - pod_cache_pages;
+        long delta_tot = tot_pages - info.nr_pages;
+
+        fprintf(stderr, "%s: PoD pop=%ld tot=%ld(%ld) cnt=%ld ent=%ld 
nop=%ld free=%ld\n",
+                __func__, populated, (long)tot_pages, delta_tot,
+                (long)pod_cache_pages, (long)pod_entries,
+               (long)info.nr_outstanding_pages, real_free);
+    }
+    if ((free_pages < nr_pfn) &&
+       (xc_domain_setmaxmem(xen_xc, xen_domid, info.max_memkb +
+                            (need_pages * XC_PAGE_SIZE / 1024)) < 0)) {
          hw_error("xc_domain_setmaxmem failed");
      }
      if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 
0, pfn_list)) {


Note: I already had a trace_xen_ram_alloc, here is the delta in 
trace-events (which
has others not yet sent):



diff --git a/trace-events b/trace-events
index eaeecd5..a58fc11 100644
--- a/trace-events
+++ b/trace-events
@@ -908,7 +908,7 @@ pvscsi_tx_rings_ppn(const char* label, uint64_t ppn) 
"%s page: %"PRIx64""
  pvscsi_tx_rings_num_pages(const char* label, uint32_t num) "Number of 
%s pages: %u"

  # xen-all.c
-xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: 
%#lx, size %#lx"
+xen_ram_alloc(unsigned long ram_addr, unsigned long size, const char* 
mr_name) "requested: %#lx size %#lx mr->name=%s"
  xen_client_set_memory(uint64_t start_addr, unsigned long size, bool 
log_dirty) "%#"PRIx64" size %#lx, log_dirty %i"
  handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, 
uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, 
uint32_t size) "I/O=%p type=%d dir=%d df=%d ptr=%d port=%#"PRIx64" 
data=%#"PRIx64" count=%d size=%d"
  handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t 
size) "I/O=%p read type=%d df=%d ptr=%d port=%#"PRIx64" data=%#"PRIx64" 
count=%d size=%d"


    -Don Slutz

>
>>     -Don Slutz
>>
>>
>>>>> +        hw_error("xc_domain_setmaxmem failed");
>>>>> +    }
>>>>>        if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0,
>>>>> 0, pfn_list)) {
>>>>>            hw_error("xen: failed to populate ram at " RAM_ADDR_FMT,
>>>>> ram_addr);
>>>>>        }
>>>>>
>>>>> _______________________________________________
>>>>> Xen-devel mailing list
>>>>> Xen-devel@lists.xen.org
>>>>> http://lists.xen.org/xen-devel

  reply	other threads:[~2014-12-01 13:33 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-25 17:45 [Qemu-devel] [PATCH] increase maxmem before calling xc_domain_populate_physmap Stefano Stabellini
2014-11-25 18:24 ` [Qemu-devel] [Xen-devel] " Andrew Cooper
2014-11-26 18:17   ` Stefano Stabellini
2014-11-27  2:42     ` Don Slutz
2014-11-27 10:48       ` Stefano Stabellini
2014-12-01 13:33         ` Don Slutz [this message]
2014-12-01 15:37           ` Stefano Stabellini
2014-12-01 23:16             ` Don Slutz
2014-12-02 12:26               ` Stefano Stabellini
2014-12-02 20:23                 ` Don Slutz
2014-12-03 10:54                   ` Wei Liu
2014-12-03 12:20                     ` Stefano Stabellini
2014-12-03 13:39                       ` Don Slutz
2014-12-03 14:50                         ` Stefano Stabellini
2014-12-04 16:26                           ` Don Slutz
2014-12-04 16:38                             ` Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=547C6E12.50502@terremark.com \
    --to=dslutz@verizon.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).