From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tomasz Wroblewski Subject: Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU Date: Wed, 4 Dec 2013 18:16:11 +0100 Message-ID: <529F635B.9060309@citrix.com> References: <20131203190741.GB31373@phenom.dumpdata.com> <529F02D5.8090206@citrix.com> <529F12AD0200007800109E63@nat28.tlf.novell.com> <1386153568.15530.24.camel@kazak.uk.xensource.com> <529F15360200007800109EA3@nat28.tlf.novell.com> <1386153913.15530.27.camel@kazak.uk.xensource.com> <529F17DD0200007800109EDA@nat28.tlf.novell.com> <1386155067.17466.16.camel@kazak.uk.xensource.com> <529F109F.8080508@citrix.com> <529F21D10200007800109F6B@nat28.tlf.novell.com> <20131204164021.GE391@pegasus.dumpdata.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20131204164021.GE391@pegasus.dumpdata.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Konrad Rzeszutek Wilk Cc: Razvan Cojocaru , Ian Campbell , Jan Beulich , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org On 12/04/2013 05:40 PM, Konrad Rzeszutek Wilk wrote: > On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote: >>>>> On 04.12.13 at 12:23, Tomasz Wroblewski wrote: >>> On 12/04/2013 12:04 PM, Ian Campbell wrote: >>>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote: >>>>>>>> On 04.12.13 at 11:45, Ian Campbell wrote: >>>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I >>>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M. >>>>> >>>>> But that's the source of my not understanding: xen_make_pte() >>>>> derives addr from the passed in pte, and that pte can - for a >>>>> foreign domain's page - hardly hold a PFN. Otherwise how would >>>>> the translation to MFN be supposed to happen? Yet, if it's a >>>>> machine address that's coming in, it can't point into the low 1Mb. >>>> >>>> Isn't it a foreign gpfn at this point, which for an HVM guest is >>>> actually a PFN not an MFN? >>>> >>>> You are making me think I might be talking out my a**e though, because >>>> what is a foreign mapping even doing in xen_make_pte -- those need to be >>>> instantiated in a special way. >>>> >>> I believe the callpath for this is >>> >>> xen_remap_domain_range() (mmu.c) >>> | >>> v >>> remap_area_pfn_pte() (mmu.c) >>> | >>> v >>> pfn_pte() (somewhere, one of the pgtable.h hdrs) >>> | >>> v >>> __pte() (paravirt.h) >>> | >>> v >>> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte >>> >>> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN, >>> but it seems the case. >> >> But that's a fundamental thing to explain. As Ian says - foreign PFNs >> shouldn't make it here, or else how do you know how to translate >> them to MFNs (as you can't consult the local P2M table to do so)? > > This is all done via the toolstack which does the /dev/xen ioctl to map > some of its user-space memory in the guest memory. It ends up getting > the MFNs via some hypercall (forgotten which) and inputs those in the > IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with > _PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore > the P2M and use that specific MFN value. > > It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage > out - but this is the last bastion where it is used. > > The check that the xen_make_pte for the VM_IO for 1:1 pages is not > really needed anymore - as we have the 1:1 pages in the P2M (except for > the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach > there - but that is different bug). > > So the check there could actually be lessen - and we can piggyback on > the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the > xen_pte_val - which we would only be set by xen_make_pte iff P2M says > the page is 1:1. > > > Not compile tested: > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c > index ce563be..98efb65 100644 > --- a/arch/x86/xen/mmu.c > +++ b/arch/x86/xen/mmu.c > @@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val) > if (mfn & IDENTITY_FRAME_BIT) { > mfn &= ~IDENTITY_FRAME_BIT; > flags |= _PAGE_IOMAP; > - } > + } else > + flags &= _PAGE_IOMAP; > } > val = ((pteval_t)mfn << PAGE_SHIFT) | flags; > } > @@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte) > pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT; > } > #endif > - if (xen_initial_domain() && (pteval & _PAGE_IOMAP)) > + if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */ > return pteval; > > return pte_mfn_to_pfn(pteval); > @@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte) > #endif > /* > * Unprivileged domains are allowed to do IOMAPpings for > - * PCI passthrough, but not map ISA space. The ISA > - * mappings are just dummy local mappings to keep other > - * parts of the kernel happy. > + * PCI passthrough. _PAGE_SPECIAL is done when user-space uses > + * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP > + * is supplied to use by xen_set_fixmap. > */ > - if (unlikely(pte & _PAGE_IOMAP) && > - (xen_initial_domain() || addr >= ISA_END_ADDRESS)) { > + if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP)) > pte = iomap_pte(pte); I think this wont work because _PAGE_SPECIAL is not set at this point yet (inside xen_make_pte). It is only set after xen_make_pte. This is why my patch contained this extra, rather nasty, hunk, which made _PAGE_SPECIAL set a bit earlier: +static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t pgprot) +{ + return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) | + massage_pgprot(pgprot) | _PAGE_SPECIAL); +} + + static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr, void *data) { struct remap_data *rmd = data; - pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot)); + pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot); rmd->mmu_update->ptr = virt_to_machine(ptep).maddr; rmd->mmu_update->val = pte_val_ma(pte); I've basically made a new function foreign_special_pfn_pte which is unrolled pte_mkspecial with a small difference that it sets _PAGE_SPECIAL bit before calling __pte, not after (because __pte calls into xen_make_pte). Maybe cleanest way of fixing this would be just to have separate path for this which doesn't use xen_make_pte at all?