xen-devel.lists.xenproject.org archive mirror
* Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
@ 2013-12-03 15:06 Razvan Cojocaru
  2013-12-03 15:51 ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Razvan Cojocaru @ 2013-12-03 15:06 UTC (permalink / raw)
  To: xen-devel@lists.xen.org

Hello,

here's the setup: a Windows HVM domU and a Linux PV domU. The Linux
domU wants to map pages from the Windows domU. No XSM involved.

The Linux domU is perfectly able to map (using xc_map_foreign_range())
pages from the Windows domU, except for pages below 1M. For pages
below 1M, it returns "invalid argument". The same code, trying to map
the exact same pages, does succeed, however, if the application trying
to map those pages runs from dom0.

Why is this happening, and can anything be done about it so that the
Linux domU becomes able to map those pages from the HVM Windows domU?


Thanks,
Razvan
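
For reference, "pages below 1M" can be pinned down with a little arithmetic. The sketch below assumes 4 KiB pages (a 12-bit page shift), which makes the range in question guest PFNs 0x00 through 0xff. The helper names are illustrative and not part of libxc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                 /* assumption: 4 KiB pages */
#define ONE_MIB          UINT64_C(0x100000) /* the 1M boundary from the report */

/* Convert a guest physical address to its page frame number. */
static uint64_t addr_to_pfn(uint64_t guest_phys_addr)
{
    return guest_phys_addr >> MODEL_PAGE_SHIFT;
}

/* True when the frame starts below the 1 MiB boundary. */
static bool pfn_below_1m(uint64_t pfn)
{
    return (pfn << MODEL_PAGE_SHIFT) < ONE_MIB;
}

/* Boundary cases: 0xff is the last frame below 1 MiB, 0x100 the first above. */
static void check_boundary(void)
{
    assert(addr_to_pfn(UINT64_C(0xb8000)) == 0xb8);
    assert(pfn_below_1m(0xff));
    assert(!pfn_below_1m(0x100));
}
```

With these assumptions, the failing requests are exactly those whose frame number is 0xff or lower.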


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 15:06 Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU Razvan Cojocaru
@ 2013-12-03 15:51 ` Ian Campbell
  2013-12-03 15:59   ` Razvan Cojocaru
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-12-03 15:51 UTC (permalink / raw)
  To: Razvan Cojocaru; +Cc: xen-devel@lists.xen.org

On Tue, 2013-12-03 at 17:06 +0200, Razvan Cojocaru wrote:
> Hello,
> 
> here's the setup: a Windows HVM domU and a Linux PV domU. The Linux
> domU wants to map pages from the Windows domU. No XSM involved.
> 
> The Linux domU is perfectly able to map (using xc_map_foreign_range())
> pages from the Windows domU, except for pages below 1M.

With no XSM how does it have the privilege to do this?

>  For pages
> below 1M, it returns "invalid argument". The same code, trying to map
> the exact same pages, does succeed, however, if the application trying
> to map those pages runs from dom0.

For dom0 it works because by default dom0 has the foreign mapping
privilege.

> Why is this happening, and can anything be done about it so that the
> Linux domU becomes able to map those pages from the HVM Windows domU?
> 
> 
> Thanks,
> Razvan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 15:51 ` Ian Campbell
@ 2013-12-03 15:59   ` Razvan Cojocaru
  2013-12-03 16:09     ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Razvan Cojocaru @ 2013-12-03 15:59 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel@lists.xen.org

>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>> pages from the Windows domU, except for pages below 1M.
>
> With no XSM how does it have the privilege to do this?

What I meant to say is that the domU is being allowed to do this sort
of thing, i.e. the problem is definitely not caused by XSM.

>>  For pages
>> below 1M, it returns "invalid argument". The same code, trying to map
>> the exact same pages, does succeed, however, if the application trying
>> to map those pages runs from dom0.
>
> For dom0 it works because by default dom0 has the foreign mapping
> privilege.

OK, and can the foreign mapping privilege be extended to the domU so
that it can go about mapping pages under 1M? Is there some way this
can be achieved with xl, or even by hacking the HV source code
somehow?


Thanks,
Razvan


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 15:59   ` Razvan Cojocaru
@ 2013-12-03 16:09     ` Ian Campbell
  2013-12-03 17:36       ` Tomasz Wroblewski
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-12-03 16:09 UTC (permalink / raw)
  To: Razvan Cojocaru; +Cc: xen-devel@lists.xen.org

On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
> >> The Linux domU is perfectly able to map (using xc_map_foreign_range())
> >> pages from the Windows domU, except for pages below 1M.
> >
> > With no XSM how does it have the privilege to do this?
> 
> What I meant to say is that the domU is being allowed to do this sort
> of thing, i.e. the problem is definitely not caused by XSM.

OK, so XSM is involved but you are 101% certain that it is not
preventing the mappings?

> 
> >>  For pages
> >> below 1M, it returns "invalid argument". The same code, trying to map
> >> the exact same pages, does succeed, however, if the application trying
> >> to map those pages runs from dom0.
> >
> > For dom0 it works because by default dom0 has the foreign mapping
> > privilege.
> 
> OK, and can the foreign mapping privilege be extended to the domU so
> that it can go about mapping pages under 1M?

AFAIK the foreign mapping privilege should already allow this. You have
just uncovered a bug somewhere. I'm afraid I don't know where, it might
be in your code, in libxc, in the privcmd ioctl driver or in the
hypervisor.

You probably need to instrument things up and down the call stack to
find out where these attempts are getting rejected.

>  Is there some way this
> can be achieved with xl, or even by hacking the HV source code
> somehow?

You need to diagnose and fix the bug I think.

Ian.
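
Before instrumenting the lower layers, it may help to record exactly which PFNs fail and with which errno. The following is a minimal sketch of such a probe loop. The `map_fn` type, `probe_range()` and `stub_map()` are hypothetical names made up for this example; the stub only imitates the reported behaviour so the sketch runs without Xen, and a real probe would wrap the actual mapping call and unmap each page afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapper signature: returns NULL and sets errno on failure. */
typedef void *(*map_fn)(uint64_t pfn);

struct probe_result {
    uint64_t first_ok;   /* lowest PFN that mapped, or `end` if none did */
    int      last_errno; /* errno of the most recent failure, 0 if none */
};

/* Try each PFN in [start, end) in order and stop at the first success. */
static struct probe_result probe_range(map_fn map, uint64_t start, uint64_t end)
{
    struct probe_result r = { end, 0 };

    for (uint64_t pfn = start; pfn < end; pfn++) {
        errno = 0;
        if (map(pfn) != NULL) {
            r.first_ok = pfn;
            break;
        }
        r.last_errno = errno;
    }
    return r;
}

/* Stand-in for the real call: rejects frames below 1 MiB with EINVAL. */
static void *stub_map(uint64_t pfn)
{
    static char page;

    if (pfn < 0x100) {
        errno = EINVAL;
        return NULL;
    }
    return &page;
}

/* The stub's boundary should be reported as PFN 0x100 with EINVAL before it. */
static void check_probe(void)
{
    struct probe_result r = probe_range(stub_map, 0xf0, 0x110);

    assert(r.first_ok == 0x100);
    assert(r.last_errno == EINVAL);
}
```

A sharp boundary at a fixed frame number, as opposed to scattered failures, points towards an address-range check somewhere in the stack rather than a permission problem.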


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
@ 2013-12-03 16:18 Razvan Cojocaru
  0 siblings, 0 replies; 27+ messages in thread
From: Razvan Cojocaru @ 2013-12-03 16:18 UTC (permalink / raw)
  Cc: xen-devel@lists.xen.org

> OK, so XSM is involved but you are 101% certain that it is not
> preventing the mappings?

Yes, I really am :)

> AFAIK the foreign mapping privilege should already allow this. You have
> just uncovered a bug somewhere. I'm afraid I don't know where, it might
> be in your code, in libxc, in the privcmd ioctl driver or in the
> hypervisor.
>
> You probably need to instrument things up down the call stack to find
> out where these attempts are getting rejected.

Right. Thanks!


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 16:09     ` Ian Campbell
@ 2013-12-03 17:36       ` Tomasz Wroblewski
  2013-12-03 18:59         ` Razvan Cojocaru
  2013-12-03 19:07         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 27+ messages in thread
From: Tomasz Wroblewski @ 2013-12-03 17:36 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Razvan Cojocaru, xen-devel@lists.xen.org

On 12/03/2013 05:09 PM, Ian Campbell wrote:
> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>>>> pages from the Windows domU, except for pages below 1M.
>>>
>>> With no XSM how does it have the privilege to do this?
>>
>> What I meant to say is that the domU is being allowed to do this sort
>> of thing, i.e. the problem is definitely not caused by XSM.
>
> OK, so XSM is involved but you are 101% certain that it is not
> preventing the mappings?
>
We've run into this issue in XenClient recently too, when we finally
upgraded the stubdomain's kernel to a pvops version. It seems the pvops
kernel contains a safeguard that only allows <1M mappings if it's dom0
(xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:

static pte_t xen_make_pte(pteval_t pte)
{
         phys_addr_t addr = (pte & PTE_PFN_MASK);

...
         /*
          * Unprivileged domains are allowed to do IOMAPpings for
          * PCI passthrough, but not map ISA space.  The ISA
          * mappings are just dummy local mappings to keep other
          * parts of the kernel happy.
          */
         if (unlikely(pte & _PAGE_IOMAP) &&
             (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
                 pte = iomap_pte(pte);
         } else {
                 pte &= ~_PAGE_IOMAP;
                 pte = pte_pfn_to_mfn(pte);
         }

         return native_make_pte(pte);
}

We patched this out (in a fugly and probably not very correct way) for
our stubdomain kernel, since we needed our stubdomain qemu VMs to be
able to map the Windows guest's <1M range (qemu needs to be able to
read and write data there in order to chat with SeaBIOS etc.). Maybe
Konrad (CC'ed) knows why the check is there in the guest kernel, and a
good way to solve this.

I think the goal of the check was only to stop <1M mappings of the
domain's own memory, so that the pvops kernel boot doesn't mess with
it, but by ricochet it also prevents mapping of a foreign domain's <1M
ranges...
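
The effect of the quoted condition can be modelled outside the kernel. This is a sketch only: the `MODEL_*` constants are assumptions chosen for illustration (only the 1 MiB value of the ISA boundary is taken as given), and `takes_iomap_path()` is a made-up name that reproduces just the `if` condition, not the rest of xen_make_pte().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the bit position and mask are assumptions. */
#define MODEL_PAGE_IOMAP      (UINT64_C(1) << 10)
#define MODEL_PFN_MASK        UINT64_C(0x000ffffffffff000)
#define MODEL_ISA_END_ADDRESS UINT64_C(0x100000) /* 1 MiB */

/* True when the PTE would take the iomap_pte() branch of the quoted check. */
static bool takes_iomap_path(uint64_t pte, bool is_initial_domain)
{
    uint64_t addr = pte & MODEL_PFN_MASK;

    return (pte & MODEL_PAGE_IOMAP) &&
           (is_initial_domain || addr >= MODEL_ISA_END_ADDRESS);
}

/* Walks through the cases discussed in the thread. */
static void check_model(void)
{
    uint64_t low  = UINT64_C(0x000a0000) | MODEL_PAGE_IOMAP; /* below 1 MiB */
    uint64_t high = UINT64_C(0x00200000) | MODEL_PAGE_IOMAP; /* above 1 MiB */

    assert(takes_iomap_path(low, true));   /* dom0: frame number kept as-is */
    assert(!takes_iomap_path(low, false)); /* domU: falls to the else branch */
    assert(takes_iomap_path(high, false)); /* domU: frames >= 1 MiB pass */
}
```

In the domU case with a low address, the else branch clears the IOMAP flag and runs the frame number through pte_pfn_to_mfn(), i.e. it is treated as a local PFN rather than as a foreign frame.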


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 17:36       ` Tomasz Wroblewski
@ 2013-12-03 18:59         ` Razvan Cojocaru
  2013-12-03 19:07         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 27+ messages in thread
From: Razvan Cojocaru @ 2013-12-03 18:59 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel@lists.xen.org

> We've ran into this issue in xenclient recently too, when we finally
> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
> contains safeguard to only allow <1M mappings if it's dom0
> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:

Thanks Tomasz! That's a great lead.


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 17:36       ` Tomasz Wroblewski
  2013-12-03 18:59         ` Razvan Cojocaru
@ 2013-12-03 19:07         ` Konrad Rzeszutek Wilk
  2013-12-04 10:24           ` Tomasz Wroblewski
  1 sibling, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-03 19:07 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Razvan Cojocaru, Ian Campbell, xen-devel@lists.xen.org

On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
> On 12/03/2013 05:09 PM, Ian Campbell wrote:
> >On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
> >>>>The Linux domU is perfectly able to map (using xc_map_foreign_range())
> >>>>pages from the Windows domU, except for pages below 1M.
> >>>
> >>>With no XSM how does it have the privilege to do this?
> >>
> >>What I meant to say is that the domU is being allowed to do this sort
> >>of thing, i.e. the problem is definitely not caused by XSM.
> >
> >OK, so XSM is involved but you are 101% certain that it is not
> >preventing the mappings?
> >
> We've ran into this issue in xenclient recently too, when we finally
> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
> contains safeguard to only allow <1M mappings if it's dom0
> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:
> 
> static pte_t xen_make_pte(pteval_t pte)
> {
>         phys_addr_t addr = (pte & PTE_PFN_MASK);
> 
> ...
>         /*
>          * Unprivileged domains are allowed to do IOMAPpings for
>          * PCI passthrough, but not map ISA space.  The ISA
>          * mappings are just dummy local mappings to keep other
>          * parts of the kernel happy.
>          */
>         if (unlikely(pte & _PAGE_IOMAP) &&
>             (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>                 pte = iomap_pte(pte);
>         } else {
>                 pte &= ~_PAGE_IOMAP;
>                 pte = pte_pfn_to_mfn(pte);
>         }
> 
>         return native_make_pte(pte);
> }
> 
> We patched this out (in a fugly and probably not very correct way),
> for our stubdomain kernel, since we needed our stubdomain qemu vms
> to be able to map windows guest <1M range (since qemu needs to be
> able to write data and read data there in order to chat with seabios
> etc). Maybe Konrad (CC'ed) knows why the check is there in guest
> kernel, and a good way to solve this.

For PV domU guests the ISA region is usually RAM, so during early
bootup you don't want a PV guest to scan MFNs it does not have access
to. Granted, it does not have access to them, but it would have the
MFNs coded in, and any access to that area will result in Xen
"fixing" up the PTEs (I can't recall exactly how).

If you boot a PV Guest and remove the:
             (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {

do you see anything in the Xen console?

> 
> I think the goal of check was to only stop <1M mapping of its own
> memory in order to stop pvops kernel boot messing it, but by
> ricochet it also prevents mapping of foreign domain <1M ranges...

Duh! That was certainly unintentional.

> 
> 
> 


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-03 19:07         ` Konrad Rzeszutek Wilk
@ 2013-12-04 10:24           ` Tomasz Wroblewski
  2013-12-04 10:31             ` Jan Beulich
  2013-12-04 11:42             ` Mihai Donțu
  0 siblings, 2 replies; 27+ messages in thread
From: Tomasz Wroblewski @ 2013-12-04 10:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Razvan Cojocaru, Ian Campbell, xen-devel@lists.xen.org

[-- Attachment #1: Type: text/plain, Size: 3264 bytes --]

On 12/03/2013 08:07 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
>> On 12/03/2013 05:09 PM, Ian Campbell wrote:
>>> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
>>>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>>>>>> pages from the Windows domU, except for pages below 1M.
>>>>>
>>>>> With no XSM how does it have the privilege to do this?
>>>>
>>>> What I meant to say is that the domU is being allowed to do this sort
>>>> of thing, i.e. the problem is definitely not caused by XSM.
>>>
>>> OK, so XSM is involved but you are 101% certain that it is not
>>> preventing the mappings?
>>>
>> We've ran into this issue in xenclient recently too, when we finally
>> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
>> contains safeguard to only allow <1M mappings if it's dom0
>> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:
>>
>> static pte_t xen_make_pte(pteval_t pte)
>> {
>>          phys_addr_t addr = (pte & PTE_PFN_MASK);
>>
>> ...
>>          /*
>>           * Unprivileged domains are allowed to do IOMAPpings for
>>           * PCI passthrough, but not map ISA space.  The ISA
>>           * mappings are just dummy local mappings to keep other
>>           * parts of the kernel happy.
>>           */
>>          if (unlikely(pte & _PAGE_IOMAP) &&
>>              (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>>                  pte = iomap_pte(pte);
>>          } else {
>>                  pte &= ~_PAGE_IOMAP;
>>                  pte = pte_pfn_to_mfn(pte);
>>          }
>>
>>          return native_make_pte(pte);
>> }
>>
>> We patched this out (in a fugly and probably not very correct way),
>> for our stubdomain kernel, since we needed our stubdomain qemu vms
>> to be able to map windows guest <1M range (since qemu needs to be
>> able to write data and read data there in order to chat with seabios
>> etc). Maybe Konrad (CC'ed) knows why the check is there in guest
>> kernel, and a good way to solve this.
>
> For PV domU guests the ISA are usually RAM - so you don't want during
> early bootup of a PV guest for it to scan MFNs it does not have access
> to. Granted it does not have access to them but it would have the
> MFNs coded in and any access to that area will result in .. Xen
> "fixing" up the PTEs (I can't recall exaclty how).
>
> If you boot a PV Guest and remove the:
>               (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>
> do you see anything that in the Xen console?
>
I recall I wasn't seeing anything; the PV domU was just hanging super
early in boot. The way we worked around it is via the attached patch
(applied to the PV domU's kernel, in our case the stubdom hosting the
qemu process). It keeps the <1M safeguard for local mappings but allows
foreign mappings (detected via the _PAGE_SPECIAL flag).

Razvan, you can try the attached patch as well, applied to your PV domU
kernel, to see if it helps you.
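
The rule described above (keep the <1M safeguard for local mappings, let foreign mappings flagged with _PAGE_SPECIAL through) depends on where the extra term sits relative to the parentheses, because in C `a && b || c` parses as `(a && b) || c`. Below is a minimal model of both groupings. The `MODEL_*` bit positions and the function names are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; bit positions and mask are assumptions. */
#define MODEL_PAGE_SPECIAL    (UINT64_C(1) << 9)
#define MODEL_PAGE_IOMAP      (UINT64_C(1) << 10)
#define MODEL_PFN_MASK        UINT64_C(0x000ffffffffff000)
#define MODEL_ISA_END_ADDRESS UINT64_C(0x100000)

/* SPECIAL only counts together with IOMAP: IOMAP && (dom0 || high || SPECIAL). */
static bool allow_inner(uint64_t pte, bool dom0)
{
    uint64_t addr = pte & MODEL_PFN_MASK;

    return (pte & MODEL_PAGE_IOMAP) &&
           (dom0 || addr >= MODEL_ISA_END_ADDRESS || (pte & MODEL_PAGE_SPECIAL));
}

/* SPECIAL counts on its own: (IOMAP && (dom0 || high)) || SPECIAL. */
static bool allow_outer(uint64_t pte, bool dom0)
{
    uint64_t addr = pte & MODEL_PFN_MASK;

    return ((pte & MODEL_PAGE_IOMAP) &&
            (dom0 || addr >= MODEL_ISA_END_ADDRESS)) ||
           (pte & MODEL_PAGE_SPECIAL);
}

/* Compare the two groupings for a domU and a frame below 1 MiB. */
static void check_groupings(void)
{
    uint64_t base         = UINT64_C(0x000a0000);
    uint64_t foreign_low  = base | MODEL_PAGE_IOMAP | MODEL_PAGE_SPECIAL;
    uint64_t local_low    = base | MODEL_PAGE_IOMAP;
    uint64_t special_only = base | MODEL_PAGE_SPECIAL;

    /* Both groupings let the flagged foreign mapping through. */
    assert(allow_inner(foreign_low, false));
    assert(allow_outer(foreign_low, false));

    /* Both keep the safeguard for an unflagged local mapping. */
    assert(!allow_inner(local_low, false));
    assert(!allow_outer(local_low, false));

    /* They differ for a SPECIAL PTE that does not carry IOMAP. */
    assert(!allow_inner(special_only, false));
    assert(allow_outer(special_only, false));
}
```

Only the inner grouping matches the description "simultaneous _PAGE_SPECIAL and _PAGE_IOMAP bits"; with the outer grouping, any PTE carrying the special bit skips the PFN-to-MFN translation.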




>>
>> I think the goal of check was to only stop <1M mapping of its own
>> memory in order to stop pvops kernel boot messing it, but by
>> ricochet it also prevents mapping of foreign domain <1M ranges...
>
> Duh! That was certainly unintentional.
>
>>
>>
>>


[-- Attachment #2: stubdom-allow-foreign-lowmem-map --]
[-- Type: text/plain, Size: 1317 bytes --]

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index cab96b6..dafd70d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -502,8 +502,11 @@ static pte_t xen_make_pte(pteval_t pte)
 	 * mappings are just dummy local mappings to keep other
 	 * parts of the kernel happy.
 	 */
+
+        /* stubdom: we allow the mapping of lowmem of another domain, marked via
+         * simultaneous _PAGE_SPECIAL and _PAGE_IOMAP bits */
 	if (unlikely(pte & _PAGE_IOMAP) &&
-	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+	    (xen_initial_domain() || addr >= ISA_END_ADDRESS || (pte & _PAGE_SPECIAL))) {
 		pte = iomap_pte(pte);
 	} else {
 		pte &= ~_PAGE_IOMAP;
@@ -2483,11 +2486,18 @@ struct remap_data {
 	struct mmu_update *mmu_update;
 };
 
+static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t pgprot)
+{
+	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
+		     massage_pgprot(pgprot) | _PAGE_SPECIAL);
+}
+
+
 static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
 				 unsigned long addr, void *data)
 {
 	struct remap_data *rmd = data;
-	pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+	pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot);
 
 	rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
 	rmd->mmu_update->val = pte_val_ma(pte);



* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:24           ` Tomasz Wroblewski
@ 2013-12-04 10:31             ` Jan Beulich
  2013-12-04 10:39               ` Ian Campbell
  2013-12-04 11:42             ` Mihai Donțu
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-12-04 10:31 UTC (permalink / raw)
  To: Tomasz Wroblewski, Konrad Rzeszutek Wilk
  Cc: Razvan Cojocaru, Ian Campbell, xen-devel@lists.xen.org

>>> On 04.12.13 at 11:24, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> On 12/03/2013 08:07 PM, Konrad Rzeszutek Wilk wrote:
>> On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
>>> On 12/03/2013 05:09 PM, Ian Campbell wrote:
>>>> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
>>>>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>>>>>>> pages from the Windows domU, except for pages below 1M.
>>>>>>
>>>>>> With no XSM how does it have the privilege to do this?
>>>>>
>>>>> What I meant to say is that the domU is being allowed to do this sort
>>>>> of thing, i.e. the problem is definitely not caused by XSM.
>>>>
>>>> OK, so XSM is involved but you are 101% certain that it is not
>>>> preventing the mappings?
>>>>
>>> We've ran into this issue in xenclient recently too, when we finally
>>> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
>>> contains safeguard to only allow <1M mappings if it's dom0
>>> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:
>>>
>>> static pte_t xen_make_pte(pteval_t pte)
>>> {
>>>          phys_addr_t addr = (pte & PTE_PFN_MASK);
>>>
>>> ...
>>>          /*
>>>           * Unprivileged domains are allowed to do IOMAPpings for
>>>           * PCI passthrough, but not map ISA space.  The ISA
>>>           * mappings are just dummy local mappings to keep other
>>>           * parts of the kernel happy.
>>>           */
>>>          if (unlikely(pte & _PAGE_IOMAP) &&
>>>              (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>>>                  pte = iomap_pte(pte);
>>>          } else {
>>>                  pte &= ~_PAGE_IOMAP;
>>>                  pte = pte_pfn_to_mfn(pte);
>>>          }
>>>
>>>          return native_make_pte(pte);
>>> }
>>>
>>> We patched this out (in a fugly and probably not very correct way),
>>> for our stubdomain kernel, since we needed our stubdomain qemu vms
>>> to be able to map windows guest <1M range (since qemu needs to be
>>> able to write data and read data there in order to chat with seabios
>>> etc). Maybe Konrad (CC'ed) knows why the check is there in guest
>>> kernel, and a good way to solve this.
>>
>> For PV domU guests the ISA are usually RAM - so you don't want during
>> early bootup of a PV guest for it to scan MFNs it does not have access
>> to. Granted it does not have access to them but it would have the
>> MFNs coded in and any access to that area will result in .. Xen
>> "fixing" up the PTEs (I can't recall exaclty how).
>>
>> If you boot a PV Guest and remove the:
>>               (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>>
>> do you see anything that in the Xen console?
>>
> I recall I wasn't seeing anything, the pv domU was just hanging super early 
> in the boot then. The way we worked around it is via attached 
> patch (applied to PV domU's kernel, in our case stubdom hosting qemu 
> process). It keeps the <1M safeguard for local mapping but allows 
> foreign mappings (detected via _PAGE_SPECIAL flag).

I've been following this thread, with each new response making it
less clear what is being talked about here: The original request
was to map the MFN backing a guest's PFN below 1M. That says
nothing about the value of the MFN (and iirc Xen doesn't allocate
MFNs from the first 1M to any guest on x86). Yet the safeguard
ought to be dealing with a specific MFN range only.

Can someone explain what I'm missing here?

Jan
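
The PFN/MFN distinction raised here can be illustrated with a toy physical-to-machine table. Everything in this sketch is hypothetical: the table contents, the `toy_*` names and the 4 KiB page size are assumptions, and it says nothing about which kind of frame number actually reaches the check in the failing case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                 /* assumption: 4 KiB pages */
#define ONE_MIB          UINT64_C(0x100000)

/* Hypothetical p2m for a tiny guest: index is the guest PFN, value the MFN. */
static const uint64_t toy_p2m[] = { 0x54320, 0x54321, 0x7a000, 0x7a001 };
#define TOY_P2M_ENTRIES (sizeof(toy_p2m) / sizeof(toy_p2m[0]))

/* Look up the machine frame backing a guest frame in the toy table. */
static uint64_t toy_pfn_to_mfn(uint64_t pfn)
{
    assert(pfn < TOY_P2M_ENTRIES);
    return toy_p2m[pfn];
}

/* True when the frame's address lies below 1 MiB. */
static bool frame_addr_below_1m(uint64_t frame)
{
    return (frame << MODEL_PAGE_SHIFT) < ONE_MIB;
}

/* A guest PFN below 1 MiB can be backed by an MFN far above it. */
static void check_translation(void)
{
    uint64_t pfn = 1;
    uint64_t mfn = toy_pfn_to_mfn(pfn);

    assert(frame_addr_below_1m(pfn));
    assert(mfn == 0x54321);
    assert(!frame_addr_below_1m(mfn));
}
```

In this model a range check gives different answers depending on whether it is applied to the guest PFN or to the backing MFN.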


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:31             ` Jan Beulich
@ 2013-12-04 10:39               ` Ian Campbell
  2013-12-04 10:42                 ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-12-04 10:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tomasz Wroblewski, xen-devel@lists.xen.org, Razvan Cojocaru

On Wed, 2013-12-04 at 10:31 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 11:24, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/03/2013 08:07 PM, Konrad Rzeszutek Wilk wrote:
> >> On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
> >>> On 12/03/2013 05:09 PM, Ian Campbell wrote:
> >>>> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
> >>>>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
> >>>>>>> pages from the Windows domU, except for pages below 1M.
> >>>>>>
> >>>>>> With no XSM how does it have the privilege to do this?
> >>>>>
> >>>>> What I meant to say is that the domU is being allowed to do this sort
> >>>>> of thing, i.e. the problem is definitely not caused by XSM.
> >>>>
> >>>> OK, so XSM is involved but you are 101% certain that it is not
> >>>> preventing the mappings?
> >>>>
> >>> We've ran into this issue in xenclient recently too, when we finally
> >>> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
> >>> contains safeguard to only allow <1M mappings if it's dom0
> >>> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:
> >>>
> >>> static pte_t xen_make_pte(pteval_t pte)
> >>> {
> >>>          phys_addr_t addr = (pte & PTE_PFN_MASK);
> >>>
> >>> ...
> >>>          /*
> >>>           * Unprivileged domains are allowed to do IOMAPpings for
> >>>           * PCI passthrough, but not map ISA space.  The ISA
> >>>           * mappings are just dummy local mappings to keep other
> >>>           * parts of the kernel happy.
> >>>           */
> >>>          if (unlikely(pte & _PAGE_IOMAP) &&
> >>>              (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >>>                  pte = iomap_pte(pte);
> >>>          } else {
> >>>                  pte &= ~_PAGE_IOMAP;
> >>>                  pte = pte_pfn_to_mfn(pte);
> >>>          }
> >>>
> >>>          return native_make_pte(pte);
> >>> }
> >>>
> >>> We patched this out (in a fugly and probably not very correct way),
> >>> for our stubdomain kernel, since we needed our stubdomain qemu vms
> >>> to be able to map windows guest <1M range (since qemu needs to be
> >>> able to write data and read data there in order to chat with seabios
> >>> etc). Maybe Konrad (CC'ed) knows why the check is there in guest
> >>> kernel, and a good way to solve this.
> >>
> >> For PV domU guests the ISA are usually RAM - so you don't want during
> >> early bootup of a PV guest for it to scan MFNs it does not have access
> >> to. Granted it does not have access to them but it would have the
> >> MFNs coded in and any access to that area will result in .. Xen
> >> "fixing" up the PTEs (I can't recall exaclty how).
> >>
> >> If you boot a PV Guest and remove the:
> >>               (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >>
> >> do you see anything that in the Xen console?
> >>
> > I recall I wasn't seeing anything, the pv domU was just hanging super early 
> > in the boot then. The way we worked around it is via attached 
> > patch (applied to PV domU's kernel, in our case stubdom hosting qemu 
> > process). It keeps the <1M safeguard for local mapping but allows 
> > foreign mappings (detected via _PAGE_SPECIAL flag).
> 
> I've been following this thread, with each new response making it
> less clear what is being talked about here: The original request
> was to map the MFN backing a guest's PFN below 1M. That says
> nothing about the value of the MFN (and iirc Xen doesn't allocate
> MFNs from the first 1M to any guest on x86). Yet the safe guard
> ought to be dealing with a specific MFN range only.
> 
> Can someone explain what I'm missing here?

I believe the intention is to catch domain 0's 1:1 mapping of the first
1M of host RAM.

Ian.


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:39               ` Ian Campbell
@ 2013-12-04 10:42                 ` Jan Beulich
  2013-12-04 10:45                   ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-12-04 10:42 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Tomasz Wroblewski, Razvan Cojocaru, xen-devel@lists.xen.org

>>> On 04.12.13 at 11:39, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2013-12-04 at 10:31 +0000, Jan Beulich wrote:
>> >>> On 04.12.13 at 11:24, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>> > On 12/03/2013 08:07 PM, Konrad Rzeszutek Wilk wrote:
>> >> On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
>> >>> On 12/03/2013 05:09 PM, Ian Campbell wrote:
>> >>>> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
>> >>>>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
>> >>>>>>> pages from the Windows domU, except for pages below 1M.
>> >>>>>>
>> >>>>>> With no XSM how does it have the privilege to do this?
>> >>>>>
>> >>>>> What I meant to say is that the domU is being allowed to do this sort
>> >>>>> of thing, i.e. the problem is definitely not caused by XSM.
>> >>>>
>> >>>> OK, so XSM is involved but you are 101% certain that it is not
>> >>>> preventing the mappings?
>> >>>>
>> >>> We've ran into this issue in xenclient recently too, when we finally
>> >>> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
>> >>> contains safeguard to only allow <1M mappings if it's dom0
>> >>> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:
>> >>>
>> >>> static pte_t xen_make_pte(pteval_t pte)
>> >>> {
>> >>>          phys_addr_t addr = (pte & PTE_PFN_MASK);
>> >>>
>> >>> ...
>> >>>          /*
>> >>>           * Unprivileged domains are allowed to do IOMAPpings for
>> >>>           * PCI passthrough, but not map ISA space.  The ISA
>> >>>           * mappings are just dummy local mappings to keep other
>> >>>           * parts of the kernel happy.
>> >>>           */
>> >>>          if (unlikely(pte & _PAGE_IOMAP) &&
>> >>>              (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>> >>>                  pte = iomap_pte(pte);
>> >>>          } else {
>> >>>                  pte &= ~_PAGE_IOMAP;
>> >>>                  pte = pte_pfn_to_mfn(pte);
>> >>>          }
>> >>>
>> >>>          return native_make_pte(pte);
>> >>> }
>> >>>
>> >>> We patched this out (in a fugly and probably not very correct way),
>> >>> for our stubdomain kernel, since we needed our stubdomain qemu vms
>> >>> to be able to map windows guest <1M range (since qemu needs to be
>> >>> able to write data and read data there in order to chat with seabios
>> >>> etc). Maybe Konrad (CC'ed) knows why the check is there in guest
>> >>> kernel, and a good way to solve this.
>> >>
>> >> For PV domU guests the ISA are usually RAM - so you don't want during
>> >> early bootup of a PV guest for it to scan MFNs it does not have access
>> >> to. Granted it does not have access to them but it would have the
>> >> MFNs coded in and any access to that area will result in .. Xen
>> >> "fixing" up the PTEs (I can't recall exaclty how).
>> >>
>> >> If you boot a PV Guest and remove the:
>> >>               (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
>> >>
>> >> do you see anything that in the Xen console?
>> >>
>> > I recall I wasn't seeing anything, the pv domU was just hanging super early
>> > in the boot then. The way we worked around it is via attached 
>> > patch (applied to PV domU's kernel, in our case stubdom hosting qemu 
>> > process). It keeps the <1M safeguard for local mapping but allows 
>> > foreign mappings (detected via _PAGE_SPECIAL flag).
>> 
>> I've been following this thread, with each new response making it
>> less clear what is being talked about here: The original request
>> was to map the MFN backing a guest's PFN below 1M. That says
>> nothing about the value of the MFN (and iirc Xen doesn't allocate
>> MFNs from the first 1M to any guest on x86). Yet the safe guard
>> ought to be dealing with a specific MFN range only.
>> 
>> Can someone explain what I'm missing here?
> 
> I believe the intention is to catch domain 0's 1:1 mapping of the first
> 1M of host RAM.

But iirc Razvan started out with wanting to map PFNs inside a
Windows guest.

Jan


* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:42                 ` Jan Beulich
@ 2013-12-04 10:45                   ` Ian Campbell
  2013-12-04 10:54                     ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-12-04 10:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tomasz Wroblewski, Razvan Cojocaru, xen-devel@lists.xen.org

On Wed, 2013-12-04 at 10:42 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 11:39, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Wed, 2013-12-04 at 10:31 +0000, Jan Beulich wrote:
> >> >>> On 04.12.13 at 11:24, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> >> > On 12/03/2013 08:07 PM, Konrad Rzeszutek Wilk wrote:
> >> >> On Tue, Dec 03, 2013 at 06:36:48PM +0100, Tomasz Wroblewski wrote:
> >> >>> On 12/03/2013 05:09 PM, Ian Campbell wrote:
> >> >>>> On Tue, 2013-12-03 at 17:59 +0200, Razvan Cojocaru wrote:
> >> >>>>>>> The Linux domU is perfectly able to map (using xc_map_foreign_range())
> >> >>>>>>> pages from the Windows domU, except for pages below 1M.
> >> >>>>>>
> >> >>>>>> With no XSM how does it have the privilege to do this?
> >> >>>>>
> >> >>>>> What I meant to say is that the domU is being allowed to do this sort
> >> >>>>> of thing, i.e. the problem is definitely not caused by XSM.
> >> >>>>
> >> >>>> OK, so XSM is involved but you are 101% certain that it is not
> >> >>>> preventing the mappings?
> >> >>>>
> >> >>> We've ran into this issue in xenclient recently too, when we finally
> >> >>> upgraded stubdomain's kernel to pvops version. It seems pvops kernel
> >> >>> contains safeguard to only allow <1M mappings if it's dom0
> >> >>> (xen_initial_domain()). This check is placed in arch/x86/xen/mmu.c:
> >> >>>
> >> >>> static pte_t xen_make_pte(pteval_t pte)
> >> >>> {
> >> >>>          phys_addr_t addr = (pte & PTE_PFN_MASK);
> >> >>>
> >> >>> ...
> >> >>>          /*
> >> >>>           * Unprivileged domains are allowed to do IOMAPpings for
> >> >>>           * PCI passthrough, but not map ISA space.  The ISA
> >> >>>           * mappings are just dummy local mappings to keep other
> >> >>>           * parts of the kernel happy.
> >> >>>           */
> >> >>>          if (unlikely(pte & _PAGE_IOMAP) &&
> >> >>>              (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >> >>>                  pte = iomap_pte(pte);
> >> >>>          } else {
> >> >>>                  pte &= ~_PAGE_IOMAP;
> >> >>>                  pte = pte_pfn_to_mfn(pte);
> >> >>>          }
> >> >>>
> >> >>>          return native_make_pte(pte);
> >> >>> }
> >> >>>
> >> >>> We patched this out (in a fugly and probably not very correct way),
> >> >>> for our stubdomain kernel, since we needed our stubdomain qemu vms
> >> >>> to be able to map windows guest <1M range (since qemu needs to be
> >> >>> able to write data and read data there in order to chat with seabios
> >> >>> etc). Maybe Konrad (CC'ed) knows why the check is there in guest
> >> >>> kernel, and a good way to solve this.
> >> >>
> >> >> For PV domU guests the ISA are usually RAM - so you don't want during
> >> >> early bootup of a PV guest for it to scan MFNs it does not have access
> >> >> to. Granted it does not have access to them but it would have the
> >> >> MFNs coded in and any access to that area will result in .. Xen
> >> >> "fixing" up the PTEs (I can't recall exaclty how).
> >> >>
> >> >> If you boot a PV Guest and remove the:
> >> >>               (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >> >>
> >> >> do you see anything that in the Xen console?
> >> >>
> >> > I recall I wasn't seeing anything, the pv domU was just hanging super early
> >> > in the boot then. The way we worked around it is via attached
> >> > patch (applied to PV domU's kernel, in our case stubdom hosting qemu 
> >> > process). It keeps the <1M safeguard for local mapping but allows 
> >> > foreign mappings (detected via _PAGE_SPECIAL flag).
> >> 
> >> I've been following this thread, with each new response making it
> >> less clear what is being talked about here: The original request
> >> was to map the MFN backing a guest's PFN below 1M. That says
> >> nothing about the value of the MFN (and iirc Xen doesn't allocate
> >> MFNs from the first 1M to any guest on x86). Yet the safe guard
> >> ought to be dealing with a specific MFN range only.
> >> 
> >> Can someone explain what I'm missing here?
> > 
> > I believe the intention is to catch domain 0's 1:1 mapping of the first
> > 1M of host RAM.
> 
> But iirc Razvan started out with wanting to map PFNs inside a
> Windows guest.

Correct. The check for mapping domain 0's 1:1 map is overly broad I
think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
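
For reference, the condition being discussed can be looked at in isolation.
What follows is only a minimal user-space sketch, not the kernel code:
MODEL_PAGE_IOMAP, MODEL_FLAGS_MASK and takes_iomap_path() are made-up names,
and the flag bit position is an assumption chosen for the model. Only the
shape of the test and the 1M value of ISA_END_ADDRESS follow the
xen_make_pte() excerpt quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT  12
#define MODEL_FLAGS_MASK  ((UINT64_C(1) << MODEL_PAGE_SHIFT) - 1)
#define MODEL_PAGE_IOMAP  (UINT64_C(1) << 10)   /* assumed bit, model only */
#define ISA_END_ADDRESS   UINT64_C(0x100000)    /* 1M */

/*
 * Mirrors the shape of the test in xen_make_pte(): returns true when the
 * frame in the PTE would be used as-is (the iomap_pte() branch), false
 * when it would be treated as a local PFN and run through the local P2M.
 */
static bool takes_iomap_path(uint64_t pte, bool initial_domain)
{
    uint64_t addr = pte & ~MODEL_FLAGS_MASK;

    return (pte & MODEL_PAGE_IOMAP) &&
           (initial_domain || addr >= ISA_END_ADDRESS);
}
```

With these definitions, a PTE carrying the IOMAP flag and address 0xA0000
takes the pass-through branch only when initial_domain is true. In a domU it
falls through to the local P2M translation, which matches the behaviour
reported in this thread.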

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:45                   ` Ian Campbell
@ 2013-12-04 10:54                     ` Jan Beulich
  2013-12-04 11:04                       ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-12-04 10:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Tomasz Wroblewski, Razvan Cojocaru, xen-devel@lists.xen.org

>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.

But that's the source of my not understanding: xen_make_pte()
derives addr from the passed in pte, and that pte can - for a
foreign domain's page - hardly hold a PFN. Otherwise how would
the translation to MFN be supposed to happen? Yet, if it's a
machine address that's coming in, it can't point into the low 1Mb.

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:54                     ` Jan Beulich
@ 2013-12-04 11:04                       ` Ian Campbell
  2013-12-04 11:23                         ` Tomasz Wroblewski
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-12-04 11:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tomasz Wroblewski, Razvan Cojocaru, xen-devel@lists.xen.org

On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > Correct. The check for mapping domain 0's 1:1 map is overly broad I
> > think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
> 
> But that's the source of my not understanding: xen_make_pte()
> derives addr from the passed in pte, and that pte can - for a
> foreign domain's page - hardly hold a PFN. Otherwise how would
> the translation to MFN be supposed to happen? Yet, if it's a
> machine address that's coming in, it can't point into the low 1Mb.

Isn't it a foreign gpfn at this point, which for an HVM guest is
actually a PFN not an MFN?

You are making me think I might be talking out my a**e though, because
what is a foreign mapping even doing in xen_make_pte -- those need to be
instantiated in a special way.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 11:04                       ` Ian Campbell
@ 2013-12-04 11:23                         ` Tomasz Wroblewski
  2013-12-04 11:36                           ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Tomasz Wroblewski @ 2013-12-04 11:23 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Razvan Cojocaru, Jan Beulich, xen-devel@lists.xen.org

On 12/04/2013 12:04 PM, Ian Campbell wrote:
> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>
>> But that's the source of my not understanding: xen_make_pte()
>> derives addr from the passed in pte, and that pte can - for a
>> foreign domain's page - hardly hold a PFN. Otherwise how would
>> the translation to MFN be supposed to happen? Yet, if it's a
>> machine address that's coming in, it can't point into the low 1Mb.
>
> Isn't it a foreign gpfn at this point, which for an HVM guest is
> actually a PFN not an MFN?
>
> You are making me think I might be talking out my a**e though, because
> what is a foreign mapping even doing in xen_make_pte -- those need to be
> instantiated in a special way.
>
I believe the callpath for this is

xen_remap_domain_range() (mmu.c)
  |
  v
remap_area_pfn_pte() (mmu.c)
  |
  v
pfn_pte() (somewhere, one of the pgtable.h hdrs)
  |
  v
__pte() (paravirt.h)
  |
  v
xen_make_pte (mmu.c) via pv_mmu_ops.make_pte

Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
but it seems to be the case.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 11:23                         ` Tomasz Wroblewski
@ 2013-12-04 11:36                           ` Jan Beulich
  2013-12-04 12:01                             ` Tomasz Wroblewski
  2013-12-04 16:40                             ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 27+ messages in thread
From: Jan Beulich @ 2013-12-04 11:36 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Razvan Cojocaru, Ian Campbell, xen-devel@lists.xen.org

>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>>
>>> But that's the source of my not understanding: xen_make_pte()
>>> derives addr from the passed in pte, and that pte can - for a
>>> foreign domain's page - hardly hold a PFN. Otherwise how would
>>> the translation to MFN be supposed to happen? Yet, if it's a
>>> machine address that's coming in, it can't point into the low 1Mb.
>>
>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>> actually a PFN not an MFN?
>>
>> You are making me think I might be talking out my a**e though, because
>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>> instantiated in a special way.
>>
> I believe the callpath for this is
> 
> xen_remap_domain_range() (mmu.c)
>   |
>   v
> remap_area_pfn_pte() (mmu.c)
>   |
>   v
> pfn_pte() (somewhere, one of the pgtable.h hdrs)
>   |
>   v
> __pte() (paravirt.h)
>   |
>   v
> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
> 
> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN, 
> but it seems the case.

But that's a fundamental thing to explain. As Ian says - foreign PFNs
shouldn't make it here, or else how do you know how to translate
them to MFNs (as you can't consult the local P2M table to do so)?

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 10:24           ` Tomasz Wroblewski
  2013-12-04 10:31             ` Jan Beulich
@ 2013-12-04 11:42             ` Mihai Donțu
  2013-12-04 14:19               ` Tomasz Wroblewski
  1 sibling, 1 reply; 27+ messages in thread
From: Mihai Donțu @ 2013-12-04 11:42 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel, Razvan Cojocaru, Ian Campbell

On Wed, 4 Dec 2013 11:24:21 +0100 Tomasz Wroblewski wrote:
> >> We've ran into this issue in xenclient recently too, when we
> >> finally upgraded stubdomain's kernel to pvops version. It seems
> >> pvops kernel contains safeguard to only allow <1M mappings if it's
> >> dom0 (xen_initial_domain()). This check is placed in
> >> arch/x86/xen/mmu.c:
> >>
> >> static pte_t xen_make_pte(pteval_t pte)
> >> {
> >>          phys_addr_t addr = (pte & PTE_PFN_MASK);
> >>
> >> ...
> >>          /*
> >>           * Unprivileged domains are allowed to do IOMAPpings for
> >>           * PCI passthrough, but not map ISA space.  The ISA
> >>           * mappings are just dummy local mappings to keep other
> >>           * parts of the kernel happy.
> >>           */
> >>          if (unlikely(pte & _PAGE_IOMAP) &&
> >>              (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >>                  pte = iomap_pte(pte);
> >>          } else {
> >>                  pte &= ~_PAGE_IOMAP;
> >>                  pte = pte_pfn_to_mfn(pte);
> >>          }
> >>
> >>          return native_make_pte(pte);
> >> }
> >>
> >> We patched this out (in a fugly and probably not very correct way),
> >> for our stubdomain kernel, since we needed our stubdomain qemu vms
> >> to be able to map windows guest <1M range (since qemu needs to be
> >> able to write data and read data there in order to chat with
> >> seabios etc). Maybe Konrad (CC'ed) knows why the check is there in
> >> guest kernel, and a good way to solve this.
> >
> > For PV domU guests the ISA are usually RAM - so you don't want
> > during early bootup of a PV guest for it to scan MFNs it does not
> > have access to. Granted it does not have access to them but it
> > would have the MFNs coded in and any access to that area will
> > result in .. Xen "fixing" up the PTEs (I can't recall exaclty how).
> >
> > If you boot a PV Guest and remove the:
> >               (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> >
> > do you see anything that in the Xen console?
> >
> I recall I wasn't seeing anything, the pv domU was just hanging super
> early in the boot then. The way we worked around it is via attached
> patch (applied to PV domU's kernel, in our case stubdom hosting qemu
> process). It keeps the <1M safeguard for local mapping but allows
> foreign mappings (detected via _PAGE_SPECIAL flag).
> 
> Razvan, you can try attached patch as well applied to your pv domU
> kernel to see if it helps you.
> 

Razvan and I are working together to find a solution to this. I took
your patch for a spin and while that code path is taken when invoking
xc_map_foreign_range(), the call still fails with EINVAL. I haven't yet
determined if the call stops in the domU kernel or it reaches xen and
gets terminated there. I've tried this on Ubuntu's 3.8 kernel on top of
XenServer's xen-4.3.1.

Thanks,

-- 
Mihai Donțu

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 11:36                           ` Jan Beulich
@ 2013-12-04 12:01                             ` Tomasz Wroblewski
  2013-12-04 12:14                               ` Jan Beulich
  2013-12-04 16:40                             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 27+ messages in thread
From: Tomasz Wroblewski @ 2013-12-04 12:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Razvan Cojocaru, Ian Campbell, xen-devel@lists.xen.org

On 12/04/2013 12:36 PM, Jan Beulich wrote:
>>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>>>
>>>> But that's the source of my not understanding: xen_make_pte()
>>>> derives addr from the passed in pte, and that pte can - for a
>>>> foreign domain's page - hardly hold a PFN. Otherwise how would
>>>> the translation to MFN be supposed to happen? Yet, if it's a
>>>> machine address that's coming in, it can't point into the low 1Mb.
>>>
>>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>>> actually a PFN not an MFN?
>>>
>>> You are making me think I might be talking out my a**e though, because
>>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>>> instantiated in a special way.
>>>
>> I believe the callpath for this is
>>
>> xen_remap_domain_range() (mmu.c)
>>    |
>>    v
>> remap_area_pfn_pte() (mmu.c)
>>    |
>>    v
>> pfn_pte() (somewhere, one of the pgtable.h hdrs)
>>    |
>>    v
>> __pte() (paravirt.h)
>>    |
>>    v
>> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
>>
>> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
>> but it seems the case.
>
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?
>
I was under the impression that the translation is done inside Xen, in
HYPERVISOR_mmu_update, which gets called from xen_remap_domain_mfn_range
shortly after setting up the ptes via xen_make_pte:

int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
			       unsigned long addr,
			       xen_pfn_t mfn, int nr,
			       pgprot_t prot, unsigned domid,
			       struct page **pages)
...
		err = apply_to_page_range(vma->vm_mm, addr, range,
					  remap_area_mfn_pte_fn, &rmd);
^^^ this calls xen_make_pte via the callpath I quoted in previous post
		if (err)
			goto out;

		err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
^^^ this goes into xen and does p2m translation and mmu setup etc

		if (err < 0)
			goto out;
...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 12:01                             ` Tomasz Wroblewski
@ 2013-12-04 12:14                               ` Jan Beulich
  2013-12-04 12:23                                 ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2013-12-04 12:14 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: Razvan Cojocaru, Ian Campbell, xen-devel@lists.xen.org

>>> On 04.12.13 at 13:01, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> On 12/04/2013 12:36 PM, Jan Beulich wrote:
>> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> shouldn't make it here, or else how do you know how to translate
>> them to MFNs (as you can't consult the local P2M table to do so)?
>>
> I was under the impression that the translation is done inside in xen inside 
> HYPERVISOR_mmu_update,

That hypercall does translation only for auto-translated guests,
which a normal PV one clearly isn't.

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 12:14                               ` Jan Beulich
@ 2013-12-04 12:23                                 ` Ian Campbell
  2013-12-04 12:39                                   ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2013-12-04 12:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tomasz Wroblewski, Razvan Cojocaru, xen-devel@lists.xen.org

On Wed, 2013-12-04 at 12:14 +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 13:01, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/04/2013 12:36 PM, Jan Beulich wrote:
> >> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> >> shouldn't make it here, or else how do you know how to translate
> >> them to MFNs (as you can't consult the local P2M table to do so)?
> >>
> > I was under the impression that the translation is done inside in xen inside 
> > HYPERVISOR_mmu_update,
> 
> That hypercall does translation only for auto-translated guests,
> which a normal PV one clearly isn't.

When mapping a foreign owned page it is the remote owner's mode which
matters though, isn't it?

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 12:23                                 ` Ian Campbell
@ 2013-12-04 12:39                                   ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2013-12-04 12:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Tomasz Wroblewski, Razvan Cojocaru, xen-devel@lists.xen.org

>>> On 04.12.13 at 13:23, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Wed, 2013-12-04 at 12:14 +0000, Jan Beulich wrote:
>> >>> On 04.12.13 at 13:01, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>> > On 12/04/2013 12:36 PM, Jan Beulich wrote:
>> >> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> >> shouldn't make it here, or else how do you know how to translate
>> >> them to MFNs (as you can't consult the local P2M table to do so)?
>> >>
>> > I was under the impression that the translation is done inside in xen inside
>> > HYPERVISOR_mmu_update,
>> 
>> That hypercall does translation only for auto-translated guests,
>> which a normal PV one clearly isn't.
> 
> When mapping a foreign owned page it is the remote owners mode which
> matters though, isn't it?

Oh, right. Which - for the code at hand - makes it even more
difficult to do the right thing (refuse PV DomU mappings of MFNs
below 1Mb, but allow translated DomU mappings of PFNs in that
range). I.e. we're back to why execution goes that route in the
first place for foreign mappings and doesn't - like on XenoLinux -
bypass the normal PTE construction code.

Jan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 11:42             ` Mihai Donțu
@ 2013-12-04 14:19               ` Tomasz Wroblewski
  2013-12-04 16:15                 ` Mihai Donțu
  0 siblings, 1 reply; 27+ messages in thread
From: Tomasz Wroblewski @ 2013-12-04 14:19 UTC (permalink / raw)
  To: Mihai Donțu; +Cc: xen-devel, Razvan Cojocaru, Ian Campbell


> Razvan and I are working together to find a solution to this. I took
> your patch for a spin and while that code path is taken when invoking
> xc_map_foreign_range(), the call still fails with EINVAL. I haven't yet
> determined if the call stops in the domU kernel or it reaches xen and
> gets terminated there. I've tried this on Ubuntu's 3.8. on top of
> XenServer's xen-4.3.1.
>
Not sure why the patch doesn't work for you (you applied it to the domU
kernel which tries to map, right?), but before we applied this, the EINVAL
was coming from the hypervisor's HYPERVISOR_mmu_update in
xen_remap_domain_mfn_range(), since the PTE constructed by xen_make_pte was
invalid for the other domain.
> Thanks,
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 14:19               ` Tomasz Wroblewski
@ 2013-12-04 16:15                 ` Mihai Donțu
  0 siblings, 0 replies; 27+ messages in thread
From: Mihai Donțu @ 2013-12-04 16:15 UTC (permalink / raw)
  To: Tomasz Wroblewski; +Cc: xen-devel, Razvan Cojocaru, Ian Campbell

On Wed, 4 Dec 2013 15:19:54 +0100 Tomasz Wroblewski wrote:
> > Razvan and I are working together to find a solution to this. I took
> > your patch for a spin and while that code path is taken when
> > invoking xc_map_foreign_range(), the call still fails with EINVAL.
> > I haven't yet determined if the call stops in the domU kernel or it
> > reaches xen and gets terminated there. I've tried this on Ubuntu's
> > 3.8. on top of XenServer's xen-4.3.1.
> >
> 
> Not sure why the patch doesn't work for you (you applied it to domU
> kernel which ties to map, right?), but before we applied this, the
> EINVAL was coming from hypervisor's HYPERVISOR_mmu_update in
> xen_remap_domain_mfn_range(), since the PTE constructed by
> xen_make_pte was invalid for the other domain.
> > Thanks,
> >
> 

I'm sorry, I take back what I said before. The patch works OK, I just
interpreted the results wrong (some pages really _are_ inaccessible,
even from dom0). Thank you for all your help. :-)

-- 
Mihai Donțu


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 11:36                           ` Jan Beulich
  2013-12-04 12:01                             ` Tomasz Wroblewski
@ 2013-12-04 16:40                             ` Konrad Rzeszutek Wilk
  2013-12-04 17:16                               ` Tomasz Wroblewski
  1 sibling, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-04 16:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tomasz Wroblewski, Razvan Cojocaru, Ian Campbell,
	xen-devel@lists.xen.org

On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
> >>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
> > On 12/04/2013 12:04 PM, Ian Campbell wrote:
> >> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
> >>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
> >>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
> >>>
> >>> But that's the source of my not understanding: xen_make_pte()
> >>> derives addr from the passed in pte, and that pte can - for a
> >>> foreign domain's page - hardly hold a PFN. Otherwise how would
> >>> the translation to MFN be supposed to happen? Yet, if it's a
> >>> machine address that's coming in, it can't point into the low 1Mb.
> >>
> >> Isn't it a foreign gpfn at this point, which for an HVM guest is
> >> actually a PFN not an MFN?
> >>
> >> You are making me think I might be talking out my a**e though, because
> >> what is a foreign mapping even doing in xen_make_pte -- those need to be
> >> instantiated in a special way.
> >>
> > I believe the callpath for this is
> > 
> > xen_remap_domain_range() (mmu.c)
> >   |
> >   v
> > remap_area_pfn_pte() (mmu.c)
> >   |
> >   v
> > pfn_pte() (somewhere, one of the pgtable.h hdrs)
> >   |
> >   v
> > __pte() (paravirt.h)
> >   |
> >   v
> > xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
> > 
> > Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN, 
> > but it seems the case.
> 
> But that's a fundamental thing to explain. As Ian says - foreign PFNs
> shouldn't make it here, or else how do you know how to translate
> them to MFNs (as you can't consult the local P2M table to do so)?

This is all done via the toolstack which does the /dev/xen ioctl to map
some of its user-space memory in the guest memory. It ends up getting
the MFNs via some hypercall (forgotten which) and inputs those in the
IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
_PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
the P2M and use that specific MFN value.

It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
altogether - but this is the last bastion where it is used.

The check in xen_make_pte for VM_IO on 1:1 pages is not really needed
anymore - as we have the 1:1 pages in the P2M (except for the InfiniBand
MMIO regions which are at 60TB and the P2M doesn't reach there - but
that is a different bug).

So the check there could actually be lessened - and we can piggyback on
_PAGE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in xen_pte_val -
that flag would only be set by xen_make_pte iff the P2M says the page
is 1:1.
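
To make the "IOMAP iff the P2M says 1:1" idea concrete, here is a small
user-space sketch. It is not the kernel code: the MODEL_* names, the bit
positions and model_pfn_to_mfn() are assumptions made up for illustration,
and the P2M lookup itself is replaced by passing the entry in directly.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT    12
#define MODEL_PAGE_IOMAP    (UINT64_C(1) << 10)  /* assumed bit, model only */
#define MODEL_IDENTITY_BIT  (UINT64_C(1) << 62)  /* assumed bit, model only */

/*
 * Build a PTE value from a P2M entry and a set of flags.  An identity
 * (1:1) entry has its marker bit stripped and gets the IOMAP flag; any
 * other entry has the IOMAP flag cleared, so the flag ends up set only
 * when the P2M marks the frame as identity.
 */
static uint64_t model_pfn_to_mfn(uint64_t p2m_entry, uint64_t flags)
{
    uint64_t mfn = p2m_entry;

    if (mfn & MODEL_IDENTITY_BIT) {
        mfn &= ~MODEL_IDENTITY_BIT;
        flags |= MODEL_PAGE_IOMAP;
    } else {
        flags &= ~MODEL_PAGE_IOMAP;
    }
    return (mfn << MODEL_PAGE_SHIFT) | flags;
}

/* Reading a PTE back: an IOMAP-tagged value would be returned untranslated. */
static int model_pte_is_identity(uint64_t pteval)
{
    return (pteval & MODEL_PAGE_IOMAP) != 0;
}
```

The point of the else branch is that a caller-supplied IOMAP flag does not
survive for frames the P2M does not consider 1:1, so the read-back side can
rely on the flag alone.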


Not compile tested:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index ce563be..98efb65 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
 			if (mfn & IDENTITY_FRAME_BIT) {
 				mfn &= ~IDENTITY_FRAME_BIT;
 				flags |= _PAGE_IOMAP;
-			}
+			} else
+				flags &= ~_PAGE_IOMAP;
 		}
 		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
 	}
@@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
 		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
 	}
 #endif
-	if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
+	if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
 		return pteval;
 
 	return pte_mfn_to_pfn(pteval);
@@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
 #endif
 	/*
 	 * Unprivileged domains are allowed to do IOMAPpings for
-	 * PCI passthrough, but not map ISA space.  The ISA
-	 * mappings are just dummy local mappings to keep other
-	 * parts of the kernel happy.
+	 * PCI passthrough. _PAGE_SPECIAL is set when user-space uses
+	 * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
+	 * is supplied to us by xen_set_fixmap.
 	 */
-	if (unlikely(pte & _PAGE_IOMAP) &&
-	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
+	if (unlikely(pte & (_PAGE_SPECIAL | _PAGE_IOMAP)))
 		pte = iomap_pte(pte);
-	} else {
-		pte &= ~_PAGE_IOMAP;
+	else
 		pte = pte_pfn_to_mfn(pte);
-	}
 
 	return native_make_pte(pte);
 }
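
A note on the two flag expressions in this patch, since it was not compile
tested: in C, & binds tighter than |, so a test for "either flag set" needs
the mask in its own parentheses, and clearing a flag needs the inverted mask.
The sketch below is standalone user-space code with placeholder bit values
(MODEL_PAGE_SPECIAL and MODEL_PAGE_IOMAP are assumptions, not the kernel
definitions).

```c
#include <stdint.h>

#define MODEL_PAGE_SPECIAL (UINT64_C(1) << 9)   /* placeholder bit */
#define MODEL_PAGE_IOMAP   (UINT64_C(1) << 10)  /* placeholder bit */

/*
 * How the compiler parses `pte & SPECIAL | IOMAP`: the OR with a
 * non-zero constant makes the result non-zero for every pte.
 */
static int check_without_parens(uint64_t pte)
{
    return ((pte & MODEL_PAGE_SPECIAL) | MODEL_PAGE_IOMAP) != 0;
}

/* Non-zero only when at least one of the two flags is set in pte. */
static int check_with_parens(uint64_t pte)
{
    return (pte & (MODEL_PAGE_SPECIAL | MODEL_PAGE_IOMAP)) != 0;
}

/* `flags &= IOMAP` keeps only the IOMAP bit and drops everything else. */
static uint64_t keep_only_iomap(uint64_t flags)
{
    return flags & MODEL_PAGE_IOMAP;
}

/* `flags &= ~IOMAP` clears the IOMAP bit and keeps the other flags. */
static uint64_t clear_iomap(uint64_t flags)
{
    return flags & ~MODEL_PAGE_IOMAP;
}
```

For a PTE with neither flag set, the first check still reports a match while
the second does not, which is the difference that matters for the
xen_make_pte() test.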


> Jan
> 
> 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 16:40                             ` Konrad Rzeszutek Wilk
@ 2013-12-04 17:16                               ` Tomasz Wroblewski
  2014-07-08 14:54                                 ` Mihai Donțu
  0 siblings, 1 reply; 27+ messages in thread
From: Tomasz Wroblewski @ 2013-12-04 17:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Razvan Cojocaru, Ian Campbell, Jan Beulich,
	xen-devel@lists.xen.org

On 12/04/2013 05:40 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Dec 04, 2013 at 11:36:33AM +0000, Jan Beulich wrote:
>>>>> On 04.12.13 at 12:23, Tomasz Wroblewski <tomasz.wroblewski@citrix.com> wrote:
>>> On 12/04/2013 12:04 PM, Ian Campbell wrote:
>>>> On Wed, 2013-12-04 at 10:54 +0000, Jan Beulich wrote:
>>>>>>>> On 04.12.13 at 11:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>>> Correct. The check for mapping domain 0's 1:1 map is overly broad I
>>>>>> think, and erroneously prevents a domU from mapping a foreign PFN < 1M.
>>>>>
>>>>> But that's the source of my not understanding: xen_make_pte()
>>>>> derives addr from the passed in pte, and that pte can - for a
>>>>> foreign domain's page - hardly hold a PFN. Otherwise how would
>>>>> the translation to MFN be supposed to happen? Yet, if it's a
>>>>> machine address that's coming in, it can't point into the low 1Mb.
>>>>
>>>> Isn't it a foreign gpfn at this point, which for an HVM guest is
>>>> actually a PFN not an MFN?
>>>>
>>>> You are making me think I might be talking out my a**e though, because
>>>> what is a foreign mapping even doing in xen_make_pte -- those need to be
>>>> instantiated in a special way.
>>>>
>>> I believe the callpath for this is
>>>
>>> xen_remap_domain_range() (mmu.c)
>>>    |
>>>    v
>>> remap_area_pfn_pte() (mmu.c)
>>>    |
>>>    v
>>> pfn_pte() (somewhere, one of the pgtable.h hdrs)
>>>    |
>>>    v
>>> __pte() (paravirt.h)
>>>    |
>>>    v
>>> xen_make_pte (mmu.c) via pv_mmu_ops.make_pte
>>>
>>> Sorry, can't offer much insight as to why addr in pte holds the hvm's PFN,
>>> but it seems the case.
>>
>> But that's a fundamental thing to explain. As Ian says - foreign PFNs
>> shouldn't make it here, or else how do you know how to translate
>> them to MFNs (as you can't consult the local P2M table to do so)?
>
> This is all done via the toolstack which does the /dev/xen ioctl to map
> some of its user-space memory in the guest memory. It ends up getting
> the MFNs via some hypercall (forgotten which) and inputs those in the
> IOCTL_PRIVCMD_MMAP ioctl. That function ends up calling remap with
> _PAGE_IOMAP (well actually VM_IO) so that the xen_make_pte will ignore
> the P2M and use that specific MFN value.
>
> It is kind of nasty. I was hoping we could remove the _PAGE_IOMAP usage
> out - but this is the last bastion where it is used.
>
> The check that the xen_make_pte for the VM_IO for 1:1 pages is not
> really needed anymore - as we have the 1:1 pages in the P2M (except for
> the InfiniBand MMIO regions which are at 60TB and the P2M doesn't reach
> there - but that is different bug).
>
> So the check there could actually be lessen - and we can piggyback on
> the _PTE_SPECIAL. Hm, and only keep the _PAGE_IOMAP check in the
> xen_pte_val - which we would only be set by xen_make_pte iff P2M says
> the page is 1:1.
>
>
> Not compile tested:
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index ce563be..98efb65 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
>   			if (mfn & IDENTITY_FRAME_BIT) {
>   				mfn &= ~IDENTITY_FRAME_BIT;
>   				flags |= _PAGE_IOMAP;
> -			}
> +			} else
> +				flags &= ~_PAGE_IOMAP;
>   		}
>   		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
>   	}
> @@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
>   		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
>   	}
>   #endif
> -	if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
> +	if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
>   		return pteval;
>
>   	return pte_mfn_to_pfn(pteval);
> @@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
>   #endif
>   	/*
>   	 * Unprivileged domains are allowed to do IOMAPpings for
> -	 * PCI passthrough, but not map ISA space.  The ISA
> -	 * mappings are just dummy local mappings to keep other
> -	 * parts of the kernel happy.
> +	 * PCI passthrough. _PAGE_SPECIAL is set when user-space uses
> +	 * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
> +	 * is supplied to us by xen_set_fixmap.
>   	 */
> -	if (unlikely(pte & _PAGE_IOMAP) &&
> -	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> +	if (unlikely(pte & (_PAGE_SPECIAL | _PAGE_IOMAP)))
>   		pte = iomap_pte(pte);

I think this won't work because _PAGE_SPECIAL is not set yet at this
point (inside xen_make_pte). It is only set after xen_make_pte returns.
This is why my patch contained this extra, rather nasty, hunk, which
sets _PAGE_SPECIAL a bit earlier:

+static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t pgprot)
+{
+	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
+		     massage_pgprot(pgprot) | _PAGE_SPECIAL);
+}
+
+
  static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
  				 unsigned long addr, void *data)
  {
  	struct remap_data *rmd = data;
-	pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
+	pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot);

  	rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
  	rmd->mmu_update->val = pte_val_ma(pte);


I've basically added a new function, foreign_special_pfn_pte, which is
an unrolled pte_mkspecial with one small difference: it sets the
_PAGE_SPECIAL bit before calling __pte, not after (because __pte calls
into xen_make_pte). Maybe the cleanest way of fixing this would be to
have a separate path for this which doesn't use xen_make_pte at all?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU
  2013-12-04 17:16                               ` Tomasz Wroblewski
@ 2014-07-08 14:54                                 ` Mihai Donțu
  0 siblings, 0 replies; 27+ messages in thread
From: Mihai Donțu @ 2014-07-08 14:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Tomasz Wroblewski, Razvan Cojocaru, Ian Campbell, Jan Beulich

On Wed, 4 Dec 2013 18:16:11 +0100 Tomasz Wroblewski wrote:
> > [...]
> > 
> > Not compile tested:
> >
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index ce563be..98efb65 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -409,7 +409,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
> >   			if (mfn & IDENTITY_FRAME_BIT) {
> >   				mfn &= ~IDENTITY_FRAME_BIT;
> >   				flags |= _PAGE_IOMAP;
> > -			}
> > +			} else
> > +				flags &= ~_PAGE_IOMAP;
> >   		}
> >   		val = ((pteval_t)mfn << PAGE_SHIFT) | flags;
> >   	}
> > @@ -441,7 +442,7 @@ static pteval_t xen_pte_val(pte_t pte)
> >   		pteval = (pteval & ~_PAGE_PAT) | _PAGE_PWT;
> >   	}
> >   #endif
> > -	if (xen_initial_domain() && (pteval & _PAGE_IOMAP))
> > +	if (pteval & _PAGE_IOMAP) /* Set by xen_make_pte for 1:1 PFNs. */
> >   		return pteval;
> >
> >   	return pte_mfn_to_pfn(pteval);
> > @@ -498,17 +499,14 @@ static pte_t xen_make_pte(pteval_t pte)
> >   #endif
> >   	/*
> >   	 * Unprivileged domains are allowed to do IOMAPpings for
> > -	 * PCI passthrough, but not map ISA space.  The ISA
> > -	 * mappings are just dummy local mappings to keep other
> > -	 * parts of the kernel happy.
> > +	 * PCI passthrough. _PAGE_SPECIAL is set when user-space uses
> > +	 * IOCTL_PRIVCMD_MMAP and gives us the MFNs. The _PAGE_IOMAP
> > +	 * is supplied to us by xen_set_fixmap.
> >   	 */
> > -	if (unlikely(pte & _PAGE_IOMAP) &&
> > -	    (xen_initial_domain() || addr >= ISA_END_ADDRESS)) {
> > +	if (unlikely(pte & (_PAGE_SPECIAL | _PAGE_IOMAP)))
> >   		pte = iomap_pte(pte);
> 
> I think this won't work because _PAGE_SPECIAL is not set at this point
> yet (inside xen_make_pte). It is only set after xen_make_pte. This is
> why my patch contained this extra, rather nasty, hunk, which made
> _PAGE_SPECIAL set a bit earlier:
> 
> +static inline pte_t foreign_special_pfn_pte(unsigned long page_nr, pgprot_t pgprot)
> +{
> +	return __pte(((phys_addr_t)page_nr << PAGE_SHIFT) |
> +		     massage_pgprot(pgprot) | _PAGE_SPECIAL);
> +}
> +
> +
>   static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
>   				 unsigned long addr, void *data)
>   {
>   	struct remap_data *rmd = data;
> -	pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
> +	pte_t pte = foreign_special_pfn_pte(rmd->mfn++, rmd->prot);
> 
>   	rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
>   	rmd->mmu_update->val = pte_val_ma(pte);
> 
> 
> I've basically made a new function foreign_special_pfn_pte which is
> unrolled pte_mkspecial with a small difference that it sets
> _PAGE_SPECIAL bit before calling __pte, not after (because __pte
> calls into xen_make_pte). Maybe cleanest way of fixing this would be
> just to have separate path for this which doesn't use xen_make_pte at
> all?

This thread has stalled for some time now. Can this last patch from
Tomasz be considered for inclusion and maybe -stable, even if it's not
the _cleanest_ fix?

Thanks,

-- 
Mihai Donțu


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-07-08 14:54 UTC | newest]

Thread overview: 27+ messages
2013-12-03 15:06 Why does xc_map_foreign_range() refuse to map pfns below 1M from a domU Razvan Cojocaru
2013-12-03 15:51 ` Ian Campbell
2013-12-03 15:59   ` Razvan Cojocaru
2013-12-03 16:09     ` Ian Campbell
2013-12-03 17:36       ` Tomasz Wroblewski
2013-12-03 18:59         ` Razvan Cojocaru
2013-12-03 19:07         ` Konrad Rzeszutek Wilk
2013-12-04 10:24           ` Tomasz Wroblewski
2013-12-04 10:31             ` Jan Beulich
2013-12-04 10:39               ` Ian Campbell
2013-12-04 10:42                 ` Jan Beulich
2013-12-04 10:45                   ` Ian Campbell
2013-12-04 10:54                     ` Jan Beulich
2013-12-04 11:04                       ` Ian Campbell
2013-12-04 11:23                         ` Tomasz Wroblewski
2013-12-04 11:36                           ` Jan Beulich
2013-12-04 12:01                             ` Tomasz Wroblewski
2013-12-04 12:14                               ` Jan Beulich
2013-12-04 12:23                                 ` Ian Campbell
2013-12-04 12:39                                   ` Jan Beulich
2013-12-04 16:40                             ` Konrad Rzeszutek Wilk
2013-12-04 17:16                               ` Tomasz Wroblewski
2014-07-08 14:54                                 ` Mihai Donțu
2013-12-04 11:42             ` Mihai Donțu
2013-12-04 14:19               ` Tomasz Wroblewski
2013-12-04 16:15                 ` Mihai Donțu
  -- strict thread matches above, loose matches on Subject: below --
2013-12-03 16:18 Razvan Cojocaru
