From: Dan Magenheimer
Subject: Re: domain creation vs querying free memory (xend and xl)
Date: Thu, 4 Oct 2012 10:18:29 -0700 (PDT)
To: Andres Lagar-Cavilla
Cc: Olaf Hering, Keir Fraser, Konrad Wilk, George Dunlap, Kurt Hackel, Tim Deegan, xen-devel@lists.xen.org, George Shuklin, Dario Faggioli, Ian Jackson

> From: Andres Lagar-Cavilla [mailto:andreslc@gridcentric.ca]
> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and xl)
>
> On Oct 4, 2012, at 12:59 PM, Dan Magenheimer wrote:
>
> >> From: Andres Lagar-Cavilla [mailto:andreslc@gridcentric.ca]
> >> Subject: Re: [Xen-devel] domain creation vs querying free memory (xend and xl)
> >>
> >> On Oct 4, 2012, at 6:06 AM, Tim Deegan wrote:
> >>
> >>> At 14:56 -0700 on 02 Oct (1349189817), Dan Magenheimer wrote:
> >>>> Tmem argues that doing "memory capacity transfers" at a page granularity
> >>>> can only be done efficiently in the hypervisor.  This is true for
> >>>> page-sharing when it breaks a "share" also... it can't go ask the
> >>>> toolstack to approve allocation of a new page every time a write to a
> >>>> shared page occurs.
> >>>>
> >>>> Does that make sense?
> >>>
> >>> Yes.  The page-sharing version can be handled by having a pool of
> >>> dedicated memory for breaking shares, and the toolstack asynchronously
> >>> replenish that, rather than allowing CoW to use up all memory in the
> >>> system.
> >>
> >> That is doable.  One benefit is that it would minimize the chance of a VM
> >> hitting a CoW ENOMEM.  I don't see how it would altogether avoid it.
> >
> > Agreed, so it doesn't really solve the problem.  (See longer reply
> > to Tim.)
> >
> >> If the objective is trying to put a cap on the unpredictable growth of
> >> memory allocations via CoW unsharing, two observations: (1) it will never
> >> grow past the nominal VM footprint; (2) one can put a cap today by tweaking
> >> d->max_pages -- CoW will fail, the faulting vcpu will sleep, and things
> >> can be kicked back into action at a later point.
> >
> > But IIRC isn't it (2) that has given VMware memory overcommit a bad name?
> > Any significant memory pressure due to overcommit leads to double-swapping,
> > which leads to horrible performance?
>
> The little that I've been able to read from their published results is that
> a "lot" of CPU is consumed scanning memory and fingerprinting, which leads
> to a massive assault on micro-architectural caches.
>
> I don't know if that equates to a "bad name", but I don't think that is a
> productive discussion either.

Sorry, I wasn't intending that to be snarky, but on re-read I guess it did
sound snarky.  What I meant is: is this just a manual version of what VMware
does automatically?  Or is there something I am misunderstanding?  (I think
you answered that below.)

> (2) doesn't mean swapping.  Note that d->max_pages can be set artificially
> low by an admin, raised again, etc.  It's just a mechanism to keep a VM at
> bay while corrective measures of any kind are taken.  It's really up to a
> higher-level controller whether you accept allocations and later reach a
> point of thrashing.
>
> I understand this is partly where your discussion is headed, but certainly
> fixing the primary issue of nominal vanilla allocations preempting each
> other looks fairly critical to begin with.

OK.  I _think_ the design I proposed also helps in systems that are using
page-sharing/host-swapping... I assume share-breaking just calls the normal
hypervisor allocator interface to allocate a new page (if available)?

If you could review and comment on the design from a
page-sharing/host-swapping perspective, I would appreciate it.

Thanks,
Dan
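
P.S.  In case it helps pin down what I mean by "the normal hypervisor
allocator interface", here is a rough sketch of the allocation step I'm
picturing when a write breaks a share.  The function name is made up and
this is only my guess at the path, not the actual mem_sharing code -- the
point is just that unsharing ends up in the same allocator and the same
max_pages accounting as an ordinary allocation, so it competes for the
same free memory:

    /* Hypothetical sketch, not the real unshare path. */
    #include <xen/sched.h>   /* struct domain, tot_pages, max_pages */
    #include <xen/mm.h>      /* alloc_domheap_page() */
    #include <xen/errno.h>

    static int unshare_and_alloc(struct domain *d)
    {
        struct page_info *pg;

        /* Same capacity check a vanilla allocation would hit; if the
         * domain is at its cap, the faulting vcpu has to wait. */
        if ( d->tot_pages >= d->max_pages )
            return -ENOMEM;

        pg = alloc_domheap_page(d, 0);
        if ( pg == NULL )
            return -ENOMEM;   /* host genuinely out of free pages */

        /* ... copy the shared page's contents into pg, update the p2m,
         * and drop the reference on the shared copy ... */
        return 0;
    }

If the real path differs materially from this, that's exactly the kind of
comment I'm hoping for.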