xen-devel.lists.xenproject.org archive mirror
From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: "Keir (Xen.org)" <keir@xen.org>, Jan Beulich <JBeulich@novell.com>
Cc: Olaf Hering <olaf@aepfle.de>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Konrad Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	George Shuklin <george.shuklin@gmail.com>,
	"Tim (Xen.org)" <tim@xen.org>,
	xen-devel@lists.xen.org, Dario Faggioli <raistlin@linux.it>,
	Kurt Hackel <kurt.hackel@oracle.com>,
	Zhigang Wang <zhigang.x.wang@oracle.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: Proposed new "memory capacity claim" hypercall/feature
Date: Mon, 29 Oct 2012 10:06:15 -0700 (PDT)
Message-ID: <60d00f38-98a3-4ec2-acbd-b49dafaada56@default>

Keir, Jan (et al) --

In a recent long thread [1], there was a great deal of discussion
about the possible need for a "memory reservation" hypercall.
While there was some confusion due to the two worldviews of static
vs dynamic management of physical memory capacity, one worldview
definitely has a requirement for this new capability.  It is still
uncertain whether the other worldview will benefit as well, though
I believe it eventually will, especially when page sharing is
fully deployed.

Note that to avoid confusion with existing usages of various
terms (such as "reservation"), I am now using the distinct
word "claim" as in a "land claim" or "mining claim":
http://dictionary.cambridge.org/dictionary/british/stake-a-claim 
When a toolstack creates a domain, it can first "stake a claim"
to the amount of memory capacity necessary to ensure the domain
launch will succeed.

In order to explore feasibility, I wanted to propose a possible
hypervisor design and would very much appreciate feedback!

The objective of the design is to ensure that a multi-threaded
toolstack can atomically claim a specific amount of RAM capacity for a
domain, especially in the presence of independent dynamic memory demand
(such as tmem and selfballooning) which the toolstack is not able to track.
"Claim X 50G" means that, on completion of the call, either (A) 50G of
capacity has been claimed for use by domain X and the call returns
success or (B) the call returns failure.  Note that in the above,
"claim" explicitly does NOT mean that specific physical RAM pages have
been assigned, only that the 50G of RAM capacity is not available either
to a subsequent "claim" or for most[2] independent dynamic memory demands.

I think the underlying hypervisor issue is that the current process
of "reserving" memory capacity (which currently does assign specific
physical RAM pages) is, by necessity when used for large quantities of RAM,
batched and slow and, consequently, can NOT be atomic.  One way to think
of the newly proposed "claim" is as "lazy reserving":  The capacity is
set aside even though specific physical RAM pages have not been assigned.
Put another way, claiming is really just an accounting illusion, similar
to how an accountant must "accrue" future liabilities.

Hypervisor design/implementation overview:

A domain currently does RAM accounting with two primary counters,
"tot_pages" and "max_pages".  (For now, let's ignore shr_pages,
paged_pages, and xenheap_pages, and I hope Olaf/Andre/others can
provide further expertise and input.)

Tot_pages is a field of struct domain in the hypervisor that tracks
the number of physical RAM pageframes "owned" by the domain.  The
hypervisor enforces that tot_pages is never allowed to exceed another
struct domain field called max_pages.

I would like to introduce a new counter that records how much
capacity is claimed for a domain, whether or not that capacity is
yet backed by physical RAM pageframes.  To do so, I'd like to split
the concept of tot_pages into two variables, tot_phys_pages and
tot_claimed_pages, and require the hypervisor to also enforce:

d.tot_phys_pages <= d.tot_claimed_pages[3] <= d.max_pages
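
As a rough illustration only (none of this is existing Xen code), the
per-domain invariant above could be checked as sketched below.  The
struct dom_acct type, the CLAIM_UNSET sentinel and the function names are
hypothetical; the sketch also assumes the invariant applies only while
a claim is outstanding, since an "unset" claim holds an illegal value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "no claim outstanding". */
#define CLAIM_UNSET (~0UL)

/* Simplified stand-in for the accounting fields of a domain;
 * not the real hypervisor structure. */
struct dom_acct {
    unsigned long tot_phys_pages;    /* pageframes actually owned */
    unsigned long tot_claimed_pages; /* claimed capacity or CLAIM_UNSET */
    unsigned long max_pages;         /* hard upper bound */
};

/* Returns true when the per-domain accounting is consistent. */
static bool dom_acct_ok(const struct dom_acct *d)
{
    if (d->tot_phys_pages > d->max_pages)
        return false;
    if (d->tot_claimed_pages == CLAIM_UNSET)
        return true; /* no claim: only max_pages applies */
    return d->tot_phys_pages <= d->tot_claimed_pages &&
           d->tot_claimed_pages <= d->max_pages;
}

/* Debug-style check, in the spirit of an ASSERT() in the hypervisor. */
static void dom_acct_assert(const struct dom_acct *d)
{
    assert(dom_acct_ok(d));
}
```

A caller would run dom_acct_assert() after every update to any of the
three counters.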

I'd also split the hypervisor global "total_avail_pages" into
"total_free_pages" and "total_unclaimed_pages".  (I definitely
need to study the two-dimensional array "avail" more...)
The hypervisor must now do additional accounting to keep track
of the sum of claims across all domains and also enforce the
global:

total_unclaimed_pages <= total_free_pages
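
One possible reading of the global accounting, offered as an assumption
rather than as the settled design: total_unclaimed_pages equals
total_free_pages minus the sum, over all domains, of pages claimed but
not yet allocated.  The sketch below recomputes it from scratch; the
type and function names are hypothetical, and a real implementation
would presumably maintain the counter incrementally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no claim outstanding". */
#define CLAIM_UNSET (~0UL)

/* Simplified per-domain counters; not the real hypervisor structure. */
struct dom_acct {
    unsigned long tot_phys_pages;
    unsigned long tot_claimed_pages;
};

/* Pages a domain has claimed but has not yet been given. */
static unsigned long outstanding_claim(const struct dom_acct *d)
{
    if (d->tot_claimed_pages == CLAIM_UNSET ||
        d->tot_claimed_pages <= d->tot_phys_pages)
        return 0;
    return d->tot_claimed_pages - d->tot_phys_pages;
}

/* Free pages that no outstanding claim has spoken for. */
static unsigned long unclaimed_pages(unsigned long total_free_pages,
                                     const struct dom_acct *doms, size_t n)
{
    unsigned long claimed = 0;

    for (size_t i = 0; i < n; i++)
        claimed += outstanding_claim(&doms[i]);

    /* Claims may never exceed what is actually free. */
    assert(claimed <= total_free_pages);
    return total_free_pages - claimed;
}
```

Under this reading the global invariant holds by construction, because
the result can never exceed total_free_pages.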

I think the memory_op hypercall can be extended to add two
additional subops, XENMEM_claim and XENMEM_release.  (Note: To
support tmem, there will need to be two variations of XENMEM_claim,
"hard claim" and "soft claim" [3].)  The XENMEM_claim subop atomically
evaluates total_unclaimed_pages against the new claim, claims
the pages for the domain if possible and returns success or failure.
The XENMEM_release subop "unsets" the domain's tot_claimed_pages (to an
"illegal" value such as zero or MINUS_ONE).

The hypervisor must also enforce some semantics:  If an allocation
occurs such that a domain's tot_phys_pages would equal or exceed
d.tot_claimed_pages, then d.tot_claimed_pages becomes "unset".
This enforces the temporary nature of a claim:  Once a domain
fully "occupies" its claim, the claim silently expires.
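
The expiry rule can be sketched as an allocation path, under the
assumption that pages covered by the domain's own claim were already
deducted from total_unclaimed_pages when the claim was staked.  All
names here (struct dom_acct, CLAIM_UNSET, alloc_pages_for) are
hypothetical, and the locking a real hypervisor would need is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sentinel meaning "no claim outstanding". */
#define CLAIM_UNSET (~0UL)

/* Simplified per-domain counters; not the real hypervisor structure. */
struct dom_acct {
    unsigned long tot_phys_pages;
    unsigned long tot_claimed_pages;
    unsigned long max_pages;
};

static unsigned long total_free_pages;
static unsigned long total_unclaimed_pages;

/* Give n pages to d, assuming tot_phys_pages <= max_pages on entry. */
static int alloc_pages_for(struct dom_acct *d, unsigned long n)
{
    unsigned long outstanding = 0, covered, extra;

    if (n > d->max_pages - d->tot_phys_pages)
        return -ENOMEM;

    if (d->tot_claimed_pages != CLAIM_UNSET)
        outstanding = d->tot_claimed_pages - d->tot_phys_pages;

    covered = n < outstanding ? n : outstanding; /* paid for by the claim */
    extra = n - covered;                         /* must come from unclaimed */

    if (n > total_free_pages || extra > total_unclaimed_pages)
        return -ENOMEM;

    total_free_pages -= n;
    total_unclaimed_pages -= extra;
    d->tot_phys_pages += n;

    /* Once the domain fully occupies its claim, the claim expires. */
    if (d->tot_claimed_pages != CLAIM_UNSET &&
        d->tot_phys_pages >= d->tot_claimed_pages)
        d->tot_claimed_pages = CLAIM_UNSET;

    assert(total_unclaimed_pages <= total_free_pages);
    return 0;
}
```

With 100 free pages, 40 of them unclaimed, and a domain holding a
60-page claim, an unrelated 50-page allocation fails, while the
claiming domain can allocate its 60 pages in batches and loses the
claim on the batch that reaches 60.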

In the case of a dying domain, a XENMEM_release operation
is implied and must be executed by the hypervisor.
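
A minimal sketch of the claim and release operations, including the
implied release for a dying domain, might look like the following.
The function names, error codes and CLAIM_UNSET sentinel are
assumptions made for illustration, the "hard"/"soft" tmem variants are
left out, and the lock that would make the claim atomic in a real
hypervisor is only noted in a comment.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sentinel meaning "no claim outstanding". */
#define CLAIM_UNSET (~0UL)

/* Simplified per-domain counters; not the real hypervisor structure. */
struct dom_acct {
    unsigned long tot_phys_pages;
    unsigned long tot_claimed_pages;
    unsigned long max_pages;
};

static unsigned long total_free_pages;
static unsigned long total_unclaimed_pages;

/* Return any not-yet-allocated claimed capacity to the unclaimed pool. */
static void release_claim(struct dom_acct *d)
{
    if (d->tot_claimed_pages != CLAIM_UNSET)
        total_unclaimed_pages += d->tot_claimed_pages - d->tot_phys_pages;
    d->tot_claimed_pages = CLAIM_UNSET;
    assert(total_unclaimed_pages <= total_free_pages);
}

/* Claim nr_pages of total capacity for d: all or nothing.
 * A real implementation would hold a global lock across this function. */
static int stake_claim(struct dom_acct *d, unsigned long nr_pages)
{
    unsigned long need;

    if (d->tot_claimed_pages != CLAIM_UNSET)
        return -EBUSY;   /* one outstanding claim per domain */
    if (nr_pages > d->max_pages || nr_pages <= d->tot_phys_pages)
        return -EINVAL;

    need = nr_pages - d->tot_phys_pages;
    if (need > total_unclaimed_pages)
        return -ENOMEM;  /* failure leaves all counters untouched */

    total_unclaimed_pages -= need;
    d->tot_claimed_pages = nr_pages;
    return 0;
}

/* A dying domain implicitly releases its claim. */
static void domain_dying(struct dom_acct *d)
{
    release_claim(d);
}
```

A toolstack would call the claim operation once before building a
domain and abandon the launch immediately on failure, rather than
discovering the shortage partway through a slow, batched allocation.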

Ideally, the quantity of unclaimed memory for each domain and
for the system should be query-able.  This may require additional
memory_op hypercalls.

I'd very much appreciate feedback on this proposed design!

Thanks,
Dan

[1] http://lists.xen.org/archives/html/xen-devel/2012-09/msg02229.html
    and continued in October (the archives don't thread across months)
    http://lists.xen.org/archives/html/xen-devel/2012-10/msg00080.html 
[2] Pages used to store tmem "ephemeral" data may be an exception
    because those pages are "free-on-demand".
[3] I'd be happy to explain the minor additional work necessary to
    support tmem but have mostly left it out of the proposal for clarity.
