From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Tim Deegan <tim@xen.org>
Cc: dan.magenheimer@oracle.com, xen-devel@lists.xensource.com,
keir@xen.org, keir.xen@gmail.com
Subject: Re: [PATCH 1/6] mmu: Introduce XENMEM_claim_pages (subop of memory ops).
Date: Fri, 8 Mar 2013 15:07:13 -0500
Message-ID: <20130308200713.GA21733@phenom.dumpdata.com>
In-Reply-To: <20130307155635.GC30110@ocelot.phlegethon.org>
On Thu, Mar 07, 2013 at 03:56:35PM +0000, Tim Deegan wrote:
> At 10:36 -0500 on 06 Mar (1362566187), Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 06, 2013 at 09:07:46AM +0000, Tim Deegan wrote:
> > > Oh I see. That's pretty strange semantics for a 'claim', though.
> > > Wouldn't it make sense for the toolstack just to query free and claimed
> > > memory on the first pass and fail if there's not enough space?
> >
> > So do something like this:
> >
> > if ( dom->claim_enabled ) {
> >     unsigned long outstanding = xc_domain_get_outstanding_pages(dom->xch);
> >     xc_physinfo_t xcphysinfo = { 0 };
> >     int flag = XENMEMF_claim_normal;
> >
> >     rc = xc_physinfo(dom->xch, &xcphysinfo);
> >
> >     if (xcphysinfo.total_pages + outstanding > dom->total_pages)
> >         flag = XENMEMF_claim_tmem;
> >
> >     rc = xc_domain_claim_pages(dom->xch, dom->guest_domid, dom->total_pages,
> >                                flag);
> > }
> >
> > (Ignoring the checks for 'rc' and bailing out as necessary.)
>
> No, I meant to get rid of the XENMEMF_claim_* flags altogether (so all
> claims are 'tmem' claims) and, at a suitable level in the toolstack, do
> something like this:
>
> LOCK
>     free := [XEN_SYSCTL_physinfo]->free
>     claimed := [XENMEM_get_outstanding_claims]
>     IF need_free_memory AND free - claimed < needed THEN
>         DESPAIR (claim would need tmem freeable pages)
>     ELSE
>         IF [XENMEM_claim](needed) THEN
>             REJOICE
>         ELSE
>             DESPAIR (could not claim enough memory)
>         ENDIF
>     ENDIF
> UNLOCK
>
> (using whatever wrappers around the hypercalls are appropriate)
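For illustration, that flow can be modelled as a self-contained C sketch. The hypercall results are stubbed out as plain globals here (hypothetical values); real code would fetch them via xc_physinfo() and the XENMEM_get_outstanding_pages / XENMEM_claim wrappers, under a toolstack-level lock:

```c
#include <assert.h>

/* Stand-ins for the hypercall results (made-up numbers for illustration). */
static long free_pages = 1000;  /* [XEN_SYSCTL_physinfo]->free              */
static long freeable   = 300;   /* tmem freeable pages                      */
static long claimed    = 200;   /* [XENMEM_get_outstanding_claims]          */

/* Stub for [XENMEM_claim](needed): the hypervisor accepts a claim that
 * fits in free + freeable memory, minus already-outstanding claims. */
static int xenmem_claim(long needed)
{
    return (needed <= free_pages + freeable - claimed) ? 0 : -1;
}

/* Returns 0 (REJOICE) or -1 (DESPAIR), mirroring the pseudocode above. */
static int try_claim(long needed, int need_free_memory)
{
    if (need_free_memory && free_pages - claimed < needed)
        return -1;              /* claim would need tmem freeable pages */
    return xenmem_claim(needed);
}
```

With the stubbed numbers, a claim of 900 fails when need_free_memory is set (only 800 pages are free and unclaimed) but succeeds otherwise, since freeable memory covers it.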
Right, so if we eliminate the flags and the 'need_free_memory' check, this
becomes a lot easier.

Here is the patch (there are of course more toolstack changes; those I can
post later).
diff --git a/xen/common/domain.c b/xen/common/domain.c
index b360de1..64ee29d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -507,6 +507,7 @@ int domain_kill(struct domain *d)
         evtchn_destroy(d);
         gnttab_release_mappings(d);
         tmem_destroy(d->tmem);
+        domain_set_outstanding_pages(d, 0);
         d->tmem = NULL;
         /* fallthrough */
     case DOMDYING_dying:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b7f6619..c98e99c 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -154,6 +154,7 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
     info->tot_pages = d->tot_pages;
     info->max_pages = d->max_pages;
+    info->outstanding_pages = d->outstanding_pages;
     info->shr_pages = atomic_read(&d->shr_pages);
     info->paged_pages = atomic_read(&d->paged_pages);
     info->shared_info_frame = mfn_to_gmfn(d, virt_to_mfn(d->shared_info));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 08550ef..3cfc1e3 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -712,6 +712,39 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }

+    case XENMEM_claim_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        if ( copy_from_guest(&reservation, arg, 1) )
+            return -EFAULT;
+
+        if ( !guest_handle_is_null(reservation.extent_start) )
+            return -EINVAL;
+
+        if ( reservation.extent_order != 0 )
+            return -EINVAL;
+
+        if ( reservation.mem_flags != 0 )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_id(reservation.domid);
+        if ( d == NULL )
+            return -EINVAL;
+
+        rc = domain_set_outstanding_pages(d, reservation.nr_extents);
+
+        rcu_unlock_domain(d);
+
+        break;
+
+    case XENMEM_get_outstanding_pages:
+        if ( !IS_PRIV(current->domain) )
+            return -EPERM;
+
+        rc = get_outstanding_claims();
+        break;
+
     default:
         rc = arch_memory_op(op, arg);
         break;
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e9fb15..aefef29 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -252,11 +252,114 @@ static long midsize_alloc_zone_pages;
 #define MIDSIZE_ALLOC_FRAC 128

 static DEFINE_SPINLOCK(heap_lock);
+static long outstanding_claims; /* total outstanding claims by all domains */

 unsigned long domain_adjust_tot_pages(struct domain *d, long pages)
 {
+    long dom_before, dom_after, dom_claimed, sys_before, sys_after;
+
     ASSERT(spin_is_locked(&d->page_alloc_lock));
-    return d->tot_pages += pages;
+    d->tot_pages += pages;
+
+    /*
+     * can test d->outstanding_pages race-free because it can only change
+     * if d->page_alloc_lock and heap_lock are both held, see also
+     * domain_set_outstanding_pages below
+     */
+    if ( !d->outstanding_pages )
+        goto out;
+
+    spin_lock(&heap_lock);
+    /* adjust domain outstanding pages; may not go negative */
+    dom_before = d->outstanding_pages;
+    dom_after = dom_before - pages;
+    BUG_ON(dom_before < 0);
+    dom_claimed = dom_after < 0 ? 0 : dom_after;
+    d->outstanding_pages = dom_claimed;
+    /* flag accounting bug if system outstanding_claims would go negative */
+    sys_before = outstanding_claims;
+    sys_after = sys_before - (dom_before - dom_claimed);
+    BUG_ON(sys_after < 0);
+    outstanding_claims = sys_after;
+    spin_unlock(&heap_lock);
+
+out:
+    return d->tot_pages;
+}
+
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages)
+{
+    int ret = -ENOMEM;
+    unsigned long claim, avail_pages;
+
+    /*
+     * Take the domain's page_alloc_lock, else all d->tot_pages adjustments
+     * would always have to take the global heap_lock rather than only in
+     * the much rarer case that d->outstanding_pages is non-zero.
+     */
+    spin_lock(&d->page_alloc_lock);
+    spin_lock(&heap_lock);
+
+    /* pages == 0 means "unset" the claim. */
+    if ( pages == 0 )
+    {
+        outstanding_claims -= d->outstanding_pages;
+        d->outstanding_pages = 0;
+        ret = 0;
+        goto out;
+    }
+
+    /* only one active claim per domain please */
+    if ( d->outstanding_pages )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* disallow a claim that does not exceed tot_pages or exceeds max_pages */
+    if ( (pages <= d->tot_pages) || (pages > d->max_pages) )
+    {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    /* how much memory is available? */
+    avail_pages = total_avail_pages;
+
+    /*
+     * Note: The usage of claim means that an allocation for a guest *might*
+     * have to come from freeable memory. Using free memory is always better,
+     * if it is available, than using freeable memory.
+     *
+     * But that is OK as once the claim has been made, it still can take
+     * minutes before the claim is fully satisfied. Tmem can make use of the
+     * unclaimed pages during this time (to store ephemeral/freeable pages
+     * only, not persistent pages).
+     */
+    avail_pages += tmem_freeable_pages();
+    avail_pages -= outstanding_claims;
+
+    /*
+     * Note: if the domain has already allocated memory before making a
+     * claim, then the claim must take tot_pages into account.
+     */
+    claim = pages - d->tot_pages;
+    if ( claim > avail_pages )
+        goto out;
+
+    /* yay, claim fits in available memory, stake the claim, success! */
+    d->outstanding_pages = claim;
+    outstanding_claims += d->outstanding_pages;
+    ret = 0;
+
+out:
+    spin_unlock(&heap_lock);
+    spin_unlock(&d->page_alloc_lock);
+    return ret;
+}
+
+long get_outstanding_claims(void)
+{
+    return outstanding_claims;
 }

 static unsigned long init_node_heap(int node, unsigned long mfn,
@@ -397,7 +500,7 @@ static void __init setup_low_mem_virq(void)
 static void check_low_mem_virq(void)
 {
     unsigned long avail_pages = total_avail_pages +
-        (opt_tmem ? tmem_freeable_pages() : 0);
+        (opt_tmem ? tmem_freeable_pages() : 0) - outstanding_claims;

     if ( unlikely(avail_pages <= low_mem_virq_th) )
     {
@@ -466,6 +569,15 @@ static struct page_info *alloc_heap_pages(
     spin_lock(&heap_lock);

     /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+         (d == NULL || d->outstanding_pages < request) )
+        goto not_found;
+
+    /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
      * only mid-size allocations to avoid worst of fragmentation issues.
      * Others try tmem pools then fail. This is a workaround until all
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index deb19db..113b8dc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -36,7 +36,7 @@
 #include "grant_table.h"
 #include "hvm/save.h"

-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000008
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000009

 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -95,6 +95,7 @@ struct xen_domctl_getdomaininfo {
     uint32_t flags;              /* XEN_DOMINF_* */
     uint64_aligned_t tot_pages;
     uint64_aligned_t max_pages;
+    uint64_aligned_t outstanding_pages;
     uint64_aligned_t shr_pages;
     uint64_aligned_t paged_pages;
     uint64_aligned_t shared_info_frame; /* GMFN of shared_info struct */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 1c5ca19..51d5254 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -68,6 +68,8 @@ struct xen_memory_reservation {
      *   IN:  GPFN bases of extents to populate with memory
      *   OUT: GMFN bases of extents that were allocated
      *   (NB. This command also updates the mach_to_phys translation table)
+     * XENMEM_claim_pages:
+     *   IN: must be zero
      */
     XEN_GUEST_HANDLE(xen_pfn_t) extent_start;

@@ -430,10 +432,39 @@ typedef struct xen_mem_sharing_op xen_mem_sharing_op_t;
 DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_op_t);

 /*
- * Reserve ops for future/out-of-tree "claim" patches (Oracle)
+ * Attempt to stake a claim for a domain on a quantity of pages
+ * of system RAM, but _not_ assign specific pageframes. Only
+ * arithmetic is performed so the hypercall is very fast and need
+ * not be preemptible, thus sidestepping time-of-check-time-of-use
+ * races for memory allocation. Returns 0 if the hypervisor page
+ * allocator has atomically and successfully claimed the requested
+ * number of pages, else non-zero.
+ *
+ * Any domain may have only one active claim. When sufficient memory
+ * has been allocated to resolve the claim, the claim silently expires.
+ * Claiming zero pages effectively resets any outstanding claim and
+ * is always successful.
+ *
+ * Note that a valid claim may be staked even after memory has been
+ * allocated for a domain. In this case, the claim is not incremental,
+ * i.e. if the domain's tot_pages is 3, and a claim is staked for 10,
+ * only 7 additional pages are claimed.
+ *
+ * Caller must be privileged or the hypercall fails.
+ */
 #define XENMEM_claim_pages 24
-#define XENMEM_get_unclaimed_pages 25
+
+/*
+ * XENMEM_claim_pages flags - there are no flags at this time.
+ * The zero value is appropriate.
+ */
+
+/*
+ * Get the number of pages currently claimed (but not yet "possessed")
+ * across all domains. The caller must be privileged but otherwise
+ * the call never fails.
+ */
+#define XENMEM_get_outstanding_pages 25

 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 2f701f5..28512fb 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -49,7 +49,10 @@ void free_xenheap_pages(void *v, unsigned int order);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))

+/* Claim handling */
 unsigned long domain_adjust_tot_pages(struct domain *d, long pages);
+int domain_set_outstanding_pages(struct domain *d, unsigned long pages);
+long get_outstanding_claims(void);

 /* Domain suballocator. These functions are *not* interrupt-safe.*/
 void init_domheap_pages(paddr_t ps, paddr_t pe);

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index e108436..569e76e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -242,6 +242,7 @@ struct domain
     struct page_list_head page_list;   /* linked list */
     struct page_list_head xenpage_list; /* linked list (size xenheap_pages) */
     unsigned int tot_pages; /* number of pages currently possesed */
+    unsigned int outstanding_pages; /* pages claimed but not possessed */
     unsigned int max_pages; /* maximum value for tot_pages */
     atomic_t shr_pages; /* number of shared pages */
     atomic_t paged_pages; /* number of paged-out pages */
--
1.8.0.2
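To make the accounting concrete, here is a stand-alone toy model of the arithmetic in domain_set_outstanding_pages() and domain_adjust_tot_pages(). Locking, tmem freeable pages, and the RCU/hypercall plumbing are omitted; assert() stands in for BUG_ON(), and -1 stands in for -EINVAL/-ENOMEM. This is only a sketch of the bookkeeping, not the real hypervisor code:

```c
#include <assert.h>

static long outstanding_claims;      /* system-wide; under heap_lock in Xen */
static long total_avail_pages = 100; /* arbitrary value for the model      */

struct domain {
    long tot_pages;                  /* pages currently possessed          */
    long max_pages;                  /* cap on tot_pages                   */
    long outstanding_pages;          /* pages claimed but not possessed    */
};

/* Model of domain_set_outstanding_pages(): stake (pages > 0) or cancel
 * (pages == 0) a claim; returns 0 on success, -1 on failure. */
static int set_outstanding_pages(struct domain *d, long pages)
{
    long claim;

    if (pages == 0) {                /* "unset" any existing claim */
        outstanding_claims -= d->outstanding_pages;
        d->outstanding_pages = 0;
        return 0;
    }
    if (d->outstanding_pages)        /* one active claim per domain */
        return -1;
    if (pages <= d->tot_pages || pages > d->max_pages)
        return -1;

    claim = pages - d->tot_pages;    /* the claim is not incremental */
    if (claim > total_avail_pages - outstanding_claims)
        return -1;                   /* does not fit in available memory */

    d->outstanding_pages = claim;
    outstanding_claims += claim;
    return 0;
}

/* Model of domain_adjust_tot_pages(): allocating pages shrinks the claim,
 * which silently expires once fully satisfied. */
static long adjust_tot_pages(struct domain *d, long pages)
{
    long before = d->outstanding_pages;
    long after = before - pages;
    long claimed = after < 0 ? 0 : after;

    d->tot_pages += pages;
    d->outstanding_pages = claimed;
    outstanding_claims -= before - claimed;
    assert(outstanding_claims >= 0); /* BUG_ON() in the real code */
    return d->tot_pages;
}
```

This also illustrates the comment in memory.h: with tot_pages at 3, staking a claim for 10 records only 7 additional outstanding pages, and allocating those 7 makes the claim expire on its own.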