From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Vrabel
Subject: Re: Balloon driver bug in increase_reservation
Date: Mon, 2 Sep 2013 16:07:11 +0100
Message-ID: <5224A99F.1070300@citrix.com>
References: <20130902144333.GA14104@zion.uk.xensource.com> <1378133323.7651.39.camel@kazak.uk.xensource.com> <20130902150432.GB14104@zion.uk.xensource.com>
In-Reply-To: <20130902150432.GB14104@zion.uk.xensource.com>
To: Wei Liu
Cc: xen-devel@lists.xen.org, Ian Campbell, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On 02/09/13 16:04, Wei Liu wrote:
> On Mon, Sep 02, 2013 at 03:48:43PM +0100, Ian Campbell wrote:
>> On Mon, 2013-09-02 at 15:43 +0100, Wei Liu wrote:
>>> Hi, Stefano
>>>
>>> I found another bug in the balloon scratch page code. As I didn't follow
>>> the discussion on the scratch page, I cannot propose a proper fix at the
>>> moment.
>>>
>>> The problem is that in balloon.c:increase_reservation, when a ballooned
>>> page is reused, it can have a valid P2M entry pointing to the scratch
>>> page, hitting the BUG_ON
>>>
>>>     BUG_ON(!xen_feature(XENFEAT_auto_translated_physmap) &&
>>>            phys_to_machine_mapping_valid(pfn));
>>>
>>> As the balloon worker might be run by a CPU other than the one that
>>> returns the page, checking pfn_to_mfn(pfn) == local_cpu_scratch_page_mfn
>>> wouldn't work. Checking whether pfn_to_mfn(pfn) belongs to the set of
>>> all scratch page mfns is not desirable.
>>
>> This makes me think that whoever suggested that pfn_to_mfn for a
>> ballooned page ought to return INVALID_MFN was right.
>>
>
> If there are many balloon pages the check can be expensive. If we make
> the scratch page globally shared by all CPUs the check can be less
> expensive? I don't understand why we need to have one page per CPU at
> first glance.

There needs to be a per-CPU mapping of the scratch pages because, as part
of doing the unmap_and_replace with the scratch page, its mapping is
cleared. It is later restored by a separate hypercall.

David
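
To make the failure mode discussed above concrete, here is a minimal user-space sketch in C. It is a toy model, not kernel code: the table sizes, the mfn values, and the helpers model_init, balloon_out, would_hit_bug_on, is_local_scratch and is_any_scratch are all hypothetical names invented for illustration. It only assumes what the thread states, namely that a ballooned-out pfn can keep a valid P2M entry pointing at the scratch page of the CPU that ballooned it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS     4        /* hypothetical CPU count for this model */
#define NR_PFNS     16       /* hypothetical size of the toy P2M table */
#define INVALID_MFN (~0UL)   /* stand-in for an invalid P2M entry */

static unsigned long scratch_mfn[NR_CPUS]; /* one scratch page per CPU */
static unsigned long p2m[NR_PFNS];         /* toy pfn -> mfn table */

/* Give every CPU a distinct scratch mfn and populate every pfn. */
static void model_init(void)
{
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        scratch_mfn[cpu] = 1000 + cpu;
    for (size_t pfn = 0; pfn < NR_PFNS; pfn++)
        p2m[pfn] = 100 + pfn;
}

/* Balloon out pfn on cpu: the P2M entry is pointed at that CPU's
 * scratch page instead of being invalidated (assumed behaviour). */
static void balloon_out(unsigned long pfn, unsigned int cpu)
{
    assert(pfn < NR_PFNS && cpu < NR_CPUS);
    p2m[pfn] = scratch_mfn[cpu];
}

/* Models the condition tested by the BUG_ON for a guest that is
 * not auto-translated: the P2M entry is still valid. */
static bool would_hit_bug_on(unsigned long pfn)
{
    return p2m[pfn] != INVALID_MFN;
}

/* Cheap check against the scratch page of a single CPU. */
static bool is_local_scratch(unsigned long pfn, unsigned int cpu)
{
    return p2m[pfn] == scratch_mfn[cpu];
}

/* Check against every CPU's scratch page: O(NR_CPUS) per pfn. */
static bool is_any_scratch(unsigned long pfn)
{
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
        if (p2m[pfn] == scratch_mfn[cpu])
            return true;
    return false;
}
```

In this model, a page ballooned out on CPU 1 and later reused by a worker on CPU 0 still has a valid entry, fails the local-CPU comparison, and is only recognised by the scan over all scratch mfns. That scan is the per-page cost raised in the thread.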