From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrew Cooper Subject: Re: [PATCH] hypercall/mem: Introduce XENMEM_machphys_compat_mfn_list Date: Thu, 17 Apr 2014 12:58:03 +0100 Message-ID: <534FC1CB.5060006@citrix.com> References: <1397733014-26715-1-git-send-email-andrew.cooper3@citrix.com> <534FDA3C0200007800009FA4@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <534FDA3C0200007800009FA4@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: Keir Fraser , Tim Deegan , Ian Jackson , Ian Campbell , Xen-devel List-Id: xen-devel@lists.xenproject.org On 17/04/14 12:42, Jan Beulich wrote: >>>> On 17.04.14 at 13:10, wrote: >> To correctly migrate a PV guest, the toolstack must remove Xen mappings from >> the guest pagetables. For 32bit PV guests, the pagetables cannot be walked >> from the top so upon encountering an L2 table, the toolstack must decide >> whether it contains Xen mappings or not, to avoid corrupting L2s without Xen >> mappings. >> >> The migration code performs this search efficiently by knowing that the Xen >> mappings will start at a known L2e and point to a known mfn, which will be >> the >> fist mfn in the m2p table. >> >> Unfortunately there are two m2p tables in use; a regular and a compatibility >> one. The toolstack looks for the first mfn of its own m2p table in the >> guest >> pagetables. This only works if the toolstack is the same bitness as the >> 32bit >> domain being migrated, and leaves a problem for 64bit toolstacks which will >> never be able to find its regular m2p in a compat guest. >> >> It appears that this bug for 64bit toolstacks was discovered, but hacked >> around in an unsafe manor. The code currently shoots any invalid L2es and >> doesn't report a failure for L2 tables in a 32 bit guest, even after the >> guest >> is paused. This means that non Xen entries which should fail the migration >> don't, and the guest will resume on the far side with unexpectedly fewer >> present pagetable entries. > So since that hack isn't being removed here, do you have plans to do > so? I am rewriting migration basically from scratch, which will involve removing/replacing almost all of xc_domain_{save,restore}.c Specifically fixing this bug here, just to delete the fix in the near future seems pointless. I suppose the one piece of information I missed was that I am happy to keep this as part of my migration series, but would like it to be reviewed slightly independently. > Knowing where that is would help judging on the need for this > patch (and I don't want to waste time hunting for it knowing that you > already spotted it)... The bug is in canonicalize_pagetable() in xc_domain_save.c The search for affected L2 tables at line 445 will never succeed for a 64 bit guest, and the decision at line 500 will fail to identify any errors which should be fatal. > >> + case XENMEM_machphys_compat_mfn_list: >> + { >> + unsigned long limit, last_mfn; > You re-use other function scope variables - why not also last_mfn? > Which then raises the question whether limit shouldn't be simply > added there too. > > Jan > I didn't spot last_mfn in the other scope variables. I shall use the outer scope ones ~Andrew