From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Linux 4.1 reports wrong number of pages to toolstack
Date: Fri, 4 Sep 2015 21:32:29 +0100
Message-ID: <55E9FFDD.10301@citrix.com>
References: <20150904004039.GA23402@zion.uk.xensource.com>
 <55E91225.4090500@suse.com>
 <55E97259020000780009F87F@prv-mh.provo.novell.com>
 <55E965F8.7060200@citrix.com>
 <20150904113503.GP18474@zion.uk.xensource.com>
 <55E9E55F.6000108@citrix.com>
 <20150904194656.GA5612@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1ZXxf4-0005AY-SC for xen-devel@lists.xenproject.org;
 Fri, 04 Sep 2015 20:32:38 +0000
In-Reply-To: <20150904194656.GA5612@zion.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: Juergen Gross, Ian Campbell, Ian Jackson, David Vrabel, Jan Beulich,
 xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On 04/09/15 20:46, Wei Liu wrote:
>
>>> When I looked at write_batch() I found some snippets that I thought to
>>> be wrong. But I didn't what to make the judgement when I didn't have a
>>> clear head.
>> write_batch() is a complicated function but it can't usefully be split any
>> further. I would be happy to explain bits or expand the existing comments,
>> but it is also possible that it is buggy.
>>
> I think write_batch is correct. I overlooked one function call. I'm not
> overly happy with the handling of balloon pages and the use of deferred
> array in non-live transfer, but those things are not buggy in itself.

Handling of ballooned pages is broken at several layers. This was
covered in my talk at Seattle. Fixing it is non-trivial.

The use of the deferred array is necessary for live migrates, and used
in non-live migrates to avoid diverging the algorithm. Nothing in the
non-live side queries the deferred array (which itself is a contributory
factor to the ballooning issue, as there is no interlock to prevent
something else issuing population/depopulation hypercalls on behalf of
the paused domain).

~Andrew