From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>,
Razvan Cojocaru <rzvncj@gmail.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
Date: Wed, 4 Dec 2013 11:40:22 -0500 [thread overview]
Message-ID: <20131204164021.GE391@pegasus.dumpdata.com> (raw)
In-Reply-To: <529F21D10200007800109F6B@nat28.tlf.novell.com>
On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/04/2013 12:04 PM, Ian Campbell wrote:
> >> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> >>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
> >>>
> >>> But that's the source of my not understanding: xen_make_pte()
> >>> derives addr from the passed in pte, and that pte can - for a
> >>> foreign domain's page - hardly hold a PFN. Otherwise how would
> >>> the translation to MFN be supposed to happen? Yet, if it's a
> >>> machine address that's coming in, it can't point into the low 1Mb.
> >>
> >> Isn't it a foreign gpfn at this point, which for an HVM guest is
> >> actually a PFN not an MFN?
> >>
> >> You are making me think I might be talking out my a**e though, because
> >> what is a foreign mapping even doing in xen_make_pte -- those need to be
> >> instantiated in a special way.
> >>
> > I believe the callpath for this is
> >
> > xen_remap_domain_range() (mmu.c)
> > |
> > v
> > remap_area_pfn_pte() (mmu.c)
> > |
> > v
> > pfn_pte() (somewhere, one of the pgtable.h hdrs)
> > |
> > v
> > __pte() (paravirt.h)
> > |
> > v
> > xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
> >
> > Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
> > but it seems the case.
>
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?
This is all done via the toolstack which does the /dev/xen ioctl to map
some of its user-space memory in the guest memory. It ends up getting
the MFNs via some hypercall (forgotten which) and inputs those in the
IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
_PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
the P2M and use that specific MFN value.
It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
out - but this is the last bastion where it is used.
The check that the xen_make_pte for the VM_IO for 1:1 pages is not
really needed anymore - as we have the 1:1 pages in the P2M (except for
the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
there - but that is different bug).
So the check there could actually be lessen - and we can piggyback on
the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
xen_pte_val - which we would only be set by xen_make_pte iff P2M says
the page is 1:1.
Not compile tested:
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..98efb65 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
if (mfn & IDENTITY_FRAME_BIT) {
mfn &= ~IDENTITY_FRAME_BIT;
flags |= _PAGE_IOMAP;
- }
+ } else
+ flags &= _PAGE_IOMAP;
}
val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
}
@@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
}
#endif
- if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
+ if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
return pteval;
return pte_mfn_to_pfn(pteval);
@@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
#endif
/*
* Unprivileged domains are allowed to do IOMAPpings for
- * PCI passthrough, but not map ISA space. The ISA
- * mappings are just dummy local mappings to keep other
- * parts of the kernel happy.
+ * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
+ * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
+ * is supplied to use by xen_set_fixmap.
*/
- if (unlikely(pte & _PAGE_IOMAP) &&
- (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+ if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
pte = iomap_pte(pte);
- } else {
- pte &= ~_PAGE_IOMAP;
+ else
pte = pte_pfn_to_mfn(pte);
- }
return native_make_pte(pte);
}
> Jan
>
>
next prev parent reply other threads:[~2013-12-04 16:40 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-03 15:06 Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU Razvan Cojocaru
2013-12-03 15:51 ` Ian Campbell
2013-12-03 15:59 ` Razvan Cojocaru
2013-12-03 16:09 ` Ian Campbell
2013-12-03 17:36 ` Tomasz Wroblewski
2013-12-03 18:59 ` Razvan Cojocaru
2013-12-03 19:07 ` Konrad Rzeszutek Wilk
2013-12-04 10:24 ` Tomasz Wroblewski
2013-12-04 10:31 ` Jan Beulich
2013-12-04 10:39 ` Ian Campbell
2013-12-04 10:42 ` Jan Beulich
2013-12-04 10:45 ` Ian Campbell
2013-12-04 10:54 ` Jan Beulich
2013-12-04 11:04 ` Ian Campbell
2013-12-04 11:23 ` Tomasz Wroblewski
2013-12-04 11:36 ` Jan Beulich
2013-12-04 12:01 ` Tomasz Wroblewski
2013-12-04 12:14 ` Jan Beulich
2013-12-04 12:23 ` Ian Campbell
2013-12-04 12:39 ` Jan Beulich
2013-12-04 16:40 ` Konrad Rzeszutek Wilk [this message]
2013-12-04 17:16 ` Tomasz Wroblewski
2014-07-08 14:54 ` Mihai Donțu
2013-12-04 11:42 ` Mihai Donțu
2013-12-04 14:19 ` Tomasz Wroblewski
2013-12-04 16:15 ` Mihai Donțu
-- strict thread matches above, loose matches on Subject: below --
2013-12-03 16:18 Razvan Cojocaru
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131204164021.GE391@pegasus.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=Ian.Campbell@citrix.com \
--cc=JBeulich@suse.com \
--cc=rzvncj@gmail.com \
--cc=tomasz.wroblewski@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.