From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Razvan Cojocaru <rzvncj@gmail.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Jan Beulich <JBeulich@suse.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
Date: Wed, 4 Dec 2013 18:16:11 +0100 [thread overview]
Message-ID: <529F635B.9060309@citrix.com> (raw)
In-Reply-To: <20131204164021.GE391@pegasus.dumpdata.com>
On 12/04/2013 05:40 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
>>>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>>> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>>>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>>>>
>>>>> But that's the source of my not understanding: xen_make_pte()
>>>>> derives addr from the passed in pte, and that pte can - for a
>>>>> foreign domain's page - hardly hold a PFN. Otherwise how would
>>>>> the translation to MFN be supposed to happen? Yet, if it's a
>>>>> machine address that's coming in, it can't point into the low 1Mb.
>>>>
>>>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>>>> actually a PFN not an MFN?
>>>>
>>>> You are making me think I might be talking out my a**e though, because
>>>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>>>> instantiated in a special way.
>>>>
>>> I believe the callpath for this is
>>>
>>> xen_remap_domain_range() (mmu.c)
>>> |
>>> v
>>> remap_area_pfn_pte() (mmu.c)
>>> |
>>> v
>>> pfn_pte() (somewhere, one of the pgtable.h hdrs)
>>> |
>>> v
>>> __pte() (paravirt.h)
>>> |
>>> v
>>> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
>>>
>>> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
>>> but it seems the case.
>>
>> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> shouldn't make it here, or else how do you know how to translate
>> them to MFNs (as you can't consult the local P2M table to do so)?
>
> This is all done via the toolstack which does the /dev/xen ioctl to map
> some of its user-space memory in the guest memory. It ends up getting
> the MFNs via some hypercall (forgotten which) and inputs those in the
> IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
> _PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
> the P2M and use that specific MFN value.
>
> It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
> out - but this is the last bastion where it is used.
>
> The check that the xen_make_pte for the VM_IO for 1:1 pages is not
> really needed anymore - as we have the 1:1 pages in the P2M (except for
> the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
> there - but that is different bug).
>
> So the check there could actually be lessen - and we can piggyback on
> the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
> xen_pte_val - which we would only be set by xen_make_pte iff P2M says
> the page is 1:1.
>
>
> Not compile tested:
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..98efb65 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
> if (mfn & IDENTITY_FRAME_BIT) {
> mfn &= ~IDENTITY_FRAME_BIT;
> flags |= _PAGE_IOMAP;
> - }
> + } else
> + flags &= _PAGE_IOMAP;
> }
> val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
> }
> @@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
> pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
> }
> #endif
> - if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
> + if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
> return pteval;
>
> return pte_mfn_to_pfn(pteval);
> @@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
> #endif
> /*
> * Unprivileged domains are allowed to do IOMAPpings for
> - * PCI passthrough, but not map ISA space. The ISA
> - * mappings are just dummy local mappings to keep other
> - * parts of the kernel happy.
> + * PCI passthrough. _PAGE_SPECIAL is done when user-space uses
> + * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
> + * is supplied to use by xen_set_fixmap.
> */
> - if (unlikely(pte & _PAGE_IOMAP) &&
> - (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> + if (unlikely(pte & _PAGE_SPECIAL | _PAGE_IOMAP))
> pte = iomap_pte(pte);
I think this wont work because _PAGE_SPECIAL is not set at this point yet (inside xen_make_pte). It is only set after xen_make_pte. This is
why my patch contained this extra, rather nasty, hunk, which made _PAGE_SPECIAL set a bit earlier:
+static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t pgprot)
+{
+ return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
+ massage_pgprot(pgprot) | _PAGE_SPECIAL);
+}
+
+
static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
unsigned long addr, void *data)
{
struct remap_data *rmd = data;
- pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+ pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot);
rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
rmd->mmu_update->val = pte_val_ma(pte);
I've basically made a new function foreign_special_pfn_pte which is unrolled pte_mkspecial with a small difference that it sets
_PAGE_SPECIAL bit before calling __pte, not after (because __pte calls into xen_make_pte). Maybe cleanest way of fixing this would be just
to have separate path for this which doesn't use xen_make_pte at all?
next prev parent reply other threads:[~2013-12-04 17:16 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-03 15:06 Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU Razvan Cojocaru
2013-12-03 15:51 ` Ian Campbell
2013-12-03 15:59 ` Razvan Cojocaru
2013-12-03 16:09 ` Ian Campbell
2013-12-03 17:36 ` Tomasz Wroblewski
2013-12-03 18:59 ` Razvan Cojocaru
2013-12-03 19:07 ` Konrad Rzeszutek Wilk
2013-12-04 10:24 ` Tomasz Wroblewski
2013-12-04 10:31 ` Jan Beulich
2013-12-04 10:39 ` Ian Campbell
2013-12-04 10:42 ` Jan Beulich
2013-12-04 10:45 ` Ian Campbell
2013-12-04 10:54 ` Jan Beulich
2013-12-04 11:04 ` Ian Campbell
2013-12-04 11:23 ` Tomasz Wroblewski
2013-12-04 11:36 ` Jan Beulich
2013-12-04 12:01 ` Tomasz Wroblewski
2013-12-04 12:14 ` Jan Beulich
2013-12-04 12:23 ` Ian Campbell
2013-12-04 12:39 ` Jan Beulich
2013-12-04 16:40 ` Konrad Rzeszutek Wilk
2013-12-04 17:16 ` Tomasz Wroblewski [this message]
2014-07-08 14:54 ` Mihai Donțu
2013-12-04 11:42 ` Mihai Donțu
2013-12-04 14:19 ` Tomasz Wroblewski
2013-12-04 16:15 ` Mihai Donțu
-- strict thread matches above, loose matches on Subject: below --
2013-12-03 16:18 Razvan Cojocaru
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=529F635B.9060309@citrix.com \
--to=tomasz.wroblewski@citrix.com \
--cc=Ian.Campbell@citrix.com \
--cc=JBeulich@suse.com \
--cc=konrad.wilk@oracle.com \
--cc=rzvncj@gmail.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.