xen-devel.lists.xenproject.org archive mirror
From: Keir Fraser <keir.xen@gmail.com>
To: Dan Magenheimer <dan.magenheimer@oracle.com>,
	Jan Beulich <JBeulich@novell.com>
Cc: Olaf Hering <olaf@aepfle.de>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	George Shuklin <george.shuklin@gmail.com>,
	"Tim (Xen.org)" <tim@xen.org>,
	xen-devel@lists.xen.org, Dario Faggioli <raistlin@linux.it>,
	Kurt Hackel <kurt.hackel@oracle.com>,
	Zhigang Wang <zhigang.x.wang@oracle.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: Re: Proposed new "memory capacity claim" hypercall/feature
Date: Tue, 30 Oct 2012 00:17:12 +0100
Message-ID: <CCB4CD08.42BC0%keir.xen@gmail.com>
In-Reply-To: <806ef32a-2ded-42b5-a2ff-5f84ac1c1d47@default>

On 30/10/2012 00:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> From: Keir Fraser [mailto:keir@xen.org]
>> Subject: Re: Proposed new "memory capacity claim" hypercall/feature
>> 
>> On 29/10/2012 21:08, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> 
>> Well it does depend how scalable domain creation actually is as an
>> operation. If it is spending most of its time allocating memory then it is
>> quite likely that parallel creations will spend a lot of time competing for
>> the heap spinlock, and actually there will be little/no speedup compared
>> with serialising the creations. Further, if domain creation can take
>> minutes, it may be that we simply need to go optimise that -- we already
>> found one stupid thing in the heap allocator recently that was burning
>> loads of time during large-memory domain creations, and fixed it for a
>> massive speedup in that particular case.
> 
> I suppose ultimately it is a scalability question.  But Oracle's
> measure of success here is based on how long a human or a tool
> has to wait for confirmation to ensure that a domain will
> successfully launch.  If two domains are launched in parallel
> AND an indication is given that both will succeed, spinning on
> the heaplock a bit just makes for a longer "boot" time, which is
> just a cost of virtualization.  If they are launched in parallel
> and, minutes later (or maybe even 20 seconds later), one or
> both say "oops, I was wrong, there wasn't enough memory, so
> try again", that's not OK for data center operations, especially if
> there really was enough RAM for one, but not for both. Remember,
> in the Oracle environment, we are talking about an administrator/automation
> overseeing possibly hundreds of physical servers, not just a single
> user/server.
> 
> Does that make more sense?

Yes, that makes sense.

> The "claim" approach immediately guarantees success or failure.
> Unless there are enough "stupid things/optimisations" found that
> you would be comfortable putting memory allocation for a domain
> creation in a hypervisor spinlock, there will be a race unless
> an atomic mechanism exists such as "claiming" where
> only simple arithmetic must be done within a hypervisor lock.
> 
> Do you disagree?
> 
>>> and (2) tmem and/or other dynamic
>>> memory mechanisms may be asynchronously absorbing small-but-significant
>>> portions of RAM for other purposes during an attempted domain launch.
>> 
>> This is an argument against allocate-rather-than-reserve? I don't think that
>> makes sense -- so is this instead an argument against
>> reservation-as-a-toolstack-only-mechanism? I'm not actually convinced yet we
>> need reservations *at all*, before we get down to where it should be
>> implemented.
> 
> I'm not sure if we are defining terms the same, so that's hard
> to answer.  If you define "allocation" as "a physical RAM page frame
> number is selected (and possibly the physical page is zeroed)",
> then I'm not sure how your definition of "reservation" differs
> (because that's how increase/decrease_reservation are implemented
> in the hypervisor, right?).
> 
> Or did you mean "allocate-rather-than-claim" (where "allocate" means
> selecting a specific physical page frame and "claim" means doing
> accounting only)?  If so, see the atomicity argument above.
> 
> I'm not just arguing against reservation-as-a-toolstack-mechanism,
> I'm stating I believe unequivocally that reservation-as-a-toolstack-
> only-mechanism and tmem are incompatible.  (Well, not _totally_
> incompatible... the existing workaround, tmem freeze/thaw, works
> but is also single-threaded and has fairly severe unnecessary
> performance repercussions.  So I'd like to solve both problems
> at the same time.)

Okay, so why is tmem incompatible with implementing claims in the toolstack?

 -- Keir

> Dan
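
For readers following the atomicity argument in the quoted text, here is a
minimal sketch of what "only simple arithmetic must be done within a
hypervisor lock" could look like. It is a toy model and not Xen code: the
class HostMemory, its fields, and the methods claim() and allocate_claimed()
are all hypothetical names invented for illustration, and a Python
threading.Lock stands in for the hypervisor's heap lock.

```python
import threading


class HostMemory:
    """Toy model of host free-page accounting (hypothetical, not Xen code)."""

    def __init__(self, total_pages):
        self._lock = threading.Lock()   # stands in for the heap lock
        self.free_pages = total_pages   # pages not yet handed to any domain
        self.claimed_pages = 0          # pages promised but not yet allocated

    def claim(self, pages):
        """Reserve capacity by accounting only: constant work under the lock."""
        with self._lock:
            if self.free_pages - self.claimed_pages < pages:
                return False            # fail immediately, before any allocation
            self.claimed_pages += pages
            return True

    def allocate_claimed(self, pages):
        """Turn part of an earlier claim into real pages.

        The slow per-page work (choosing and zeroing frames) is elided;
        only the bookkeeping is modelled here.
        """
        with self._lock:
            if pages > self.claimed_pages:
                raise ValueError("allocating more than was claimed")
            self.claimed_pages -= pages
            self.free_pages -= pages


# Two launches that each need 60 pages on a 100-page host: the first
# claim succeeds and the second is refused at once, rather than failing
# partway through allocation.
host = HostMemory(100)
first = host.claim(60)
second = host.claim(60)
```

In this model the answer to "will this domain fit?" is known as soon as
claim() returns, and the lengthy allocation can then proceed outside the
critical section without racing against another launch.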
