From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH 7/7] xl: Fix 'free_memory' to include outstanding_claims value. Date: Tue, 16 Apr 2013 14:35:29 -0400 Message-ID: <20130416183529.GD9417@phenom.dumpdata.com> References: <1365800181-9877-1-git-send-email-konrad.wilk@oracle.com> <1365800181-9877-8-git-send-email-konrad.wilk@oracle.com> <20843.59271.672380.158979@mariner.uk.xensource.com> <20844.12237.939778.932762@mariner.uk.xensource.com> <516C9901.30706@oracle.com> <20845.28476.179853.234902@mariner.uk.xensource.com> <20130416164520.GA13390@ocelot.phlegethon.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <20130416164520.GA13390@ocelot.phlegethon.org> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Tim Deegan Cc: George Dunlap , Dan Magenheimer , "xen-devel@lists.xensource.com" , Ian Jackson , Ian Campbell List-Id: xen-devel@lists.xenproject.org On Tue, Apr 16, 2013 at 05:45:20PM +0100, Tim Deegan wrote: > At 16:33 +0100 on 16 Apr (1366129996), Ian Jackson wrote: > > konrad wilk writes ("Re: [PATCH 7/7] xl: Fix 'free_memory' to include outstanding_claims value."): > > > On 4/15/2013 12:50 PM, Ian Jackson wrote: > > > > I have checked this and the race is in the hypercall API. The > > > > hypercall API has already been checked in. So, under the > > > > circumstances, for this patch: > > ... > > > Right. Let me enumerate some ideas for fixing this on a different thread > > > (if we have the same race in mind that you spotted). > > > > The race I'm thinking of is this: > > > > When we display the effective amount of free memory (in "xl info" > > etc.), we calculate it as > > physinfo->free_pages - xc_domain_get_outstanding_pages() > > > > But these two values have been retrieved at different times. So while > > a domain is being built and memory moves from "free but claimed" to > > "in use", the free memory visible via the libxl API will sometimes be > > "wrong". > > > > This may seem like a minor point, but it will show up as occasional > > glitches in automatic monitoring and graphing systems; it might even > > trigger spurious nagios alerts in higher layers etc. If that was all > > there was to it I wouldn't regard it as a release-critical bug - a > > race like that would be annoying but could be fixed later. > > > > However, the race is baked into the hypercall API/ABI which we want to > > keep relatively stable at least in stable releases. > > > > I think the right answer is probably simply to move the total claim > > value into the physinfo struct, and do away with the separate > > XENMEM_get_outstanding_pages memory op hypercall. Do you agree ? > > FWIW, I think this is a good idea. You might have to be a little > careful on the hypervisor side to make sure the two values are actually > a matched pair (say by taking the page allocator lock). Going to prep a patch for that. I might not have it ready this week as Linus merge window is immienient and I need to queue up patches (And test). And then right after that I am out for a week. But after that - I will have the patch ready. > > Tim.