xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>,
	Razvan Cojocaru <rzvncj@gmail.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
Date: Wed, 4 Dec 2013 11:40:22 -0500	[thread overview]
Message-ID: <20131204164021.GE391@pegasus.dumpdata.com> (raw)
In-Reply-To: <529F21D10200007800109F6B@nat28.tlf.novell.com>

On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/04/2013 12:04 PM, Ian Campbell wrote:
> >> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> >>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
> >>>
> >>> But that's the source of my not understanding: xen_make_pte()
> >>> derives addr from the passed in pte, and that pte can - for a
> >>> foreign domain's page - hardly hold a PFN. Otherwise how would
> >>> the translation to MFN be supposed to happen? Yet, if it's a
> >>> machine address that's coming in, it can't point into the low 1Mb.
> >>
> >> Isn't it a foreign gpfn at this point, which for an HVM guest is
> >> actually a PFN not an MFN?
> >>
> >> You are making me think I might be talking out my a**e though, because
> >> what is a foreign mapping even doing in xen_make_pte -- those need to be
> >> instantiated in a special way.
> >>
> > I believe the callpath for this is
> > 
> > xen_remap_domain_range() (mmu.c)
> >   |
> >   v
> > remap_area_pfn_pte() (mmu.c)
> >   |
> >   v
> > pfn_pte() (somewhere, one of the pgtable.h hdrs)
> >   |
> >   v
> > __pte() (paravirt.h)
> >   |
> >   v
> > xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
> > 
> > Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN, 
> > but it seems the case.
> 
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?

This is all done via the toolstack which does the /dev/xen ioctl to map
some of its user-space memory in the guest memory. It ends up getting
the MFNs via some hypercall (forgotten which) and inputs those in the
IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
_PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
the P2M and use that specific MFN value.

It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
out - but this is the last bastion where it is used.

The check that the xen_make_pte for the VM_IO for 1:1 pages is not
really needed anymore - as we have the 1:1 pages in the P2M (except for
the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
there - but that is different bug).

So the check there could actually be lessen - and we can piggyback on
the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
xen_pte_val - which we would only be set by xen_make_pte iff P2M says
the page is 1:1.


Not compile tested:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..98efb65 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 			if (mfn & IDENTITY_FRAME_BIT) {
 				mfn &= ~IDENTITY_FRAME_BIT;
 				flags |= _PAGE_IOMAP;
-			}
+			} else
+				flags &= _PAGE_IOMAP;
 		}
 		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
 	}
@@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
 		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
 	}
 #endif
-	if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
+	if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
 		return pteval;
 
 	return pte_mfn_to_pfn(pteval);
@@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
 #endif
 	/*
 	 * Unprivileged domains are allowed to do IOMAPpings for
-	 * PCI passthrough, but not map ISA space.  The ISA
-	 * mappings are just dummy local mappings to keep other
-	 * parts of the kernel happy.
+	 * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
+	 * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
+	 * is supplied to use by xen_set_fixmap.
 	 */
-	if (unlikely(pte & _PAGE_IOMAP) &&
-	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+	if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
 		pte = iomap_pte(pte);
-	} else {
-		pte &= ~_PAGE_IOMAP;
+	else
 		pte = pte_pfn_to_mfn(pte);
-	}
 
 	return native_make_pte(pte);
 }


> Jan
> 
> 

  parent reply	other threads:[~2013-12-04 16:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-03 15:06 Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU Razvan Cojocaru
2013-12-03 15:51 ` Ian Campbell
2013-12-03 15:59   ` Razvan Cojocaru
2013-12-03 16:09     ` Ian Campbell
2013-12-03 17:36       ` Tomasz Wroblewski
2013-12-03 18:59         ` Razvan Cojocaru
2013-12-03 19:07         ` Konrad Rzeszutek Wilk
2013-12-04 10:24           ` Tomasz Wroblewski
2013-12-04 10:31             ` Jan Beulich
2013-12-04 10:39               ` Ian Campbell
2013-12-04 10:42                 ` Jan Beulich
2013-12-04 10:45                   ` Ian Campbell
2013-12-04 10:54                     ` Jan Beulich
2013-12-04 11:04                       ` Ian Campbell
2013-12-04 11:23                         ` Tomasz Wroblewski
2013-12-04 11:36                           ` Jan Beulich
2013-12-04 12:01                             ` Tomasz Wroblewski
2013-12-04 12:14                               ` Jan Beulich
2013-12-04 12:23                                 ` Ian Campbell
2013-12-04 12:39                                   ` Jan Beulich
2013-12-04 16:40                             ` Konrad Rzeszutek Wilk [this message]
2013-12-04 17:16                               ` Tomasz Wroblewski
2014-07-08 14:54                                 ` Mihai Donțu
2013-12-04 11:42             ` Mihai Donțu
2013-12-04 14:19               ` Tomasz Wroblewski
2013-12-04 16:15                 ` Mihai Donțu
  -- strict thread matches above, loose matches on Subject: below --
2013-12-03 16:18 Razvan Cojocaru

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131204164021.GE391@pegasus.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=rzvncj@gmail.com \
    --cc=tomasz.wroblewski@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).