xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm: xen: foreign mapping PTEs are special.
@ 2013-12-04 15:52 Ian Campbell
  2013-12-06 17:48 ` Stefano Stabellini
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2013-12-04 15:52 UTC (permalink / raw)
  To: xen-devel; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

These mappings are in fact special and require special handling in privcmd,
which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun.

x86 already gets this correct.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: xen-devel@lists.xenproject.org
---
 arch/arm/xen/enlighten.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 83e4f95..932cc50 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -96,7 +96,7 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
 	struct remap_data *info = data;
 	struct page *page = info->pages[info->index++];
 	unsigned long pfn = page_to_pfn(page);
-	pte_t pte = pfn_pte(pfn, info->prot);
+	pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));
 
 	if (map_foreign_page(pfn, info->fgmfn, info->domid))
 		return -EFAULT;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm: xen: foreign mapping PTEs are special.
  2013-12-04 15:52 [PATCH] arm: xen: foreign mapping PTEs are special Ian Campbell
@ 2013-12-06 17:48 ` Stefano Stabellini
  2013-12-06 17:54   ` Ian Campbell
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Stabellini @ 2013-12-06 17:48 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Stefano Stabellini, xen-devel

On Wed, 4 Dec 2013, Ian Campbell wrote:
> These mappings are in fact special and require special handling in privcmd,
> which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun.
> 
> x86 already gets this correct.

Yes, but x86 does that for PV guests, not for autotranslate guests (for
which the function return -EINVAL).

Given that in the ARM case we are changing the p2m underneath, why do we
also need to mark them special?



> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Cc: xen-devel@lists.xenproject.org
> ---
>  arch/arm/xen/enlighten.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 83e4f95..932cc50 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -96,7 +96,7 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>  	struct remap_data *info = data;
>  	struct page *page = info->pages[info->index++];
>  	unsigned long pfn = page_to_pfn(page);
> -	pte_t pte = pfn_pte(pfn, info->prot);
> +	pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot));
>  
>  	if (map_foreign_page(pfn, info->fgmfn, info->domid))
>  		return -EFAULT;
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm: xen: foreign mapping PTEs are special.
  2013-12-06 17:48 ` Stefano Stabellini
@ 2013-12-06 17:54   ` Ian Campbell
  2013-12-08 17:32     ` Stefano Stabellini
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2013-12-06 17:54 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, xen-devel

On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> On Wed, 4 Dec 2013, Ian Campbell wrote:
> > These mappings are in fact special and require special handling in privcmd,
> > which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun.
> > 
> > x86 already gets this correct.
> 
> Yes, but x86 does that for PV guests, not for autotranslate guests (for
> which the function return -EINVAL).
> 
> Given that in the ARM case we are changing the p2m underneath, why do we
> also need to mark them special?

It's not about the p2m, it's about the handling in privcmd wrt setup and
teardown of the stage one mapping which goes along with the p2m
manipulations.

Without this the normal rmap counting kicks in and complains about the
mapcount being -1.

It's possible that we could handle this by implementing vma fault
handling for the privcmd driver, but that would mean a lot more book
keeping (e.g. to know if the ioctl has been made for a particular
address or not) and would mean we diverged more from the pv paths.

Ian.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm: xen: foreign mapping PTEs are special.
  2013-12-06 17:54   ` Ian Campbell
@ 2013-12-08 17:32     ` Stefano Stabellini
  2013-12-09 10:19       ` Ian Campbell
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Stabellini @ 2013-12-08 17:32 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, xen-devel, Stefano Stabellini

On Fri, 6 Dec 2013, Ian Campbell wrote:
> On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > These mappings are in fact special and require special handling in privcmd,
> > > which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun.
> > > 
> > > x86 already gets this correct.
> > 
> > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > which the function return -EINVAL).
> > 
> > Given that in the ARM case we are changing the p2m underneath, why do we
> > also need to mark them special?
> 
> It's not about the p2m, it's about the handling in privcmd wrt setup and
> teardown of the stage one mapping which goes along with the p2m
> manipulations.
> 
> Without this the normal rmap counting kicks in and complains about the
> mapcount being -1.

In the case of PV guests we need to mark the pte special because there
is no struct page associated to it: the userspace virtual address is
mapped to a foreign mfn. No dom0 pages are involved.

In this case we do have a corresponding physical address and a
corresponding struct page in dom0.  In fact the dom0 page is a
xenballooned_pages allocated by alloc_empty_pages.

What exactly is the kernel complaining about? Didn't we allocate the
pages correctly? Didn't we call get_page appropriately?
There must be a way to make the mapcount happy without marking the pages
as special.


Also consider that on 3.13-rc2 pte_mkspecial on arm is implemented as:

static inline pte_t pte_mkspecial(pte_t pte) { return pte; }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm: xen: foreign mapping PTEs are special.
  2013-12-08 17:32     ` Stefano Stabellini
@ 2013-12-09 10:19       ` Ian Campbell
  2013-12-09 12:16         ` Stefano Stabellini
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2013-12-09 10:19 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

On Sun, 2013-12-08 at 17:32 +0000, Stefano Stabellini wrote:
> On Fri, 6 Dec 2013, Ian Campbell wrote:
> > On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > > These mappings are in fact special and require special handling in privcmd,
> > > > which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun.
> > > > 
> > > > x86 already gets this correct.
> > > 
> > > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > > which the function return -EINVAL).
> > > 
> > > Given that in the ARM case we are changing the p2m underneath, why do we
> > > also need to mark them special?
> > 
> > It's not about the p2m, it's about the handling in privcmd wrt setup and
> > teardown of the stage one mapping which goes along with the p2m
> > manipulations.
> > 
> > Without this the normal rmap counting kicks in and complains about the
> > mapcount being -1.
> 
> In the case of PV guests we need to mark the pte special because there
> is no struct page associated to it: the userspace virtual address is
> mapped to a foreign mfn. No dom0 pages are involved.
> 
> In this case we do have a corresponding physical address and a
> corresponding struct page in dom0.  In fact the dom0 page is a
> xenballooned_pages allocated by alloc_empty_pages.

special is not 100% the same as "no struct page". It means "not normal"
where normal also has other requirements, such as being normal user
process RAM established in the normal way (e.g. get_user_pages or
similar), which privcmd does not do.

> What exactly is the kernel complaining about?

There are a few different symptoms, but frequently it is hitting a bad
pte path on teardown. This is caused by vm_normal_page returning true
when it should be false (where true and false are actually non-NULL and
NULL).

e.g.:

BUG: Bad page map in process xl  pte:e0004077b33f53 pmd:4079575003
page:ffffffbce1a2f328 count:1 mapcount:-1 mapping:          (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
addr:0000007fb5259000 vm_flags:040644fa anon_vma:          (null) mapping:ffffffc03a6fda58 index:0
vma->vm_ops->fault: privcmd_fault+0x0/0x38
vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c
CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102
Call trace:  
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc000125690>] print_bad_pte+0x12c/0x1bc
[<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700
[<ffffffc0001273b4>] unmap_vmas+0x68/0xb4
[<ffffffc00012c050>] unmap_region+0xcc/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

Or:

BUG: Bad page state in process xl  pfn:4077b4d
page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping:          (null) index:0x0
page flags: 0x4000000000000014(referenced|dirty)
Modules linked in:
CPU: 0 PID: 2657 Comm: xl Tainted: G    B        3.12.0+ #102
Call trace:
[<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
[<ffffffc000088238>] show_stack+0x14/0x1c
[<ffffffc0004b67e0>] dump_stack+0x70/0x90
[<ffffffc00010f798>] bad_page+0xc4/0x110
[<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8
[<ffffffc000110e94>] free_hot_cold_page+0x28/0x178
[<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60
[<ffffffc000114cf0>] release_pages+0x190/0x1dc
[<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4
[<ffffffc00012df20>] do_munmap+0x218/0x314
[<ffffffc00012e060>] vm_munmap+0x44/0x64
[<ffffffc00012ed78>] SyS_munmap+0x24/0x34

> Didn't we allocate the pages correctly?

It depends what you mean by correctly, we've done everything right for a
special page, except telling the kernel (via the pte) that the page is
special.

> Didn't we call get_page appropriately?

The count which is wrong is not page->count it is page->_mapcount, which
is used by the rmap stuff AFAIK.

> There must be a way to make the mapcount happy without marking the pages
> as special.

I'm sure there is, but it would be semantically incorrect, these pages
are not normal process memory, even though they do happen to have a
struct page. I really don't think making these pages appear normal is
what we want. We don't want them being considered page cache, available
to be swapped or any of that "normal" stuff.

> Also consider that on 3.13-rc2 pte_mkspecial on arm is implemented as:
> 
> static inline pte_t pte_mkspecial(pte_t pte) { return pte; }

This is because 32-bit arm does not have a spare PTE bit to use as
PTE_SPECIAL, end therefore does not define __HAVE_ARCH_PTE_SPECIAL. It
takes the slow/fallback path in vm_normal, which works fine.

arm64 does have such a spare bit, so vm_normal takes the fast path but
reaches the wrong conclusion because we didn't set the special bit like
we should.

Ian.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] arm: xen: foreign mapping PTEs are special.
  2013-12-09 10:19       ` Ian Campbell
@ 2013-12-09 12:16         ` Stefano Stabellini
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Stabellini @ 2013-12-09 12:16 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel, Stefano Stabellini

On Mon, 9 Dec 2013, Ian Campbell wrote:
> On Sun, 2013-12-08 at 17:32 +0000, Stefano Stabellini wrote:
> > On Fri, 6 Dec 2013, Ian Campbell wrote:
> > > On Fri, 2013-12-06 at 17:48 +0000, Stefano Stabellini wrote:
> > > > On Wed, 4 Dec 2013, Ian Campbell wrote:
> > > > > These mappings are in fact special and require special handling in privcmd,
> > > > > which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun.
> > > > > 
> > > > > x86 already gets this correct.
> > > > 
> > > > Yes, but x86 does that for PV guests, not for autotranslate guests (for
> > > > which the function return -EINVAL).
> > > > 
> > > > Given that in the ARM case we are changing the p2m underneath, why do we
> > > > also need to mark them special?
> > > 
> > > It's not about the p2m, it's about the handling in privcmd wrt setup and
> > > teardown of the stage one mapping which goes along with the p2m
> > > manipulations.
> > > 
> > > Without this the normal rmap counting kicks in and complains about the
> > > mapcount being -1.
> > 
> > In the case of PV guests we need to mark the pte special because there
> > is no struct page associated to it: the userspace virtual address is
> > mapped to a foreign mfn. No dom0 pages are involved.
> > 
> > In this case we do have a corresponding physical address and a
> > corresponding struct page in dom0.  In fact the dom0 page is a
> > xenballooned_pages allocated by alloc_empty_pages.
> 
> special is not 100% the same as "no struct page". It means "not normal"
> where normal also has other requirements, such as being normal user
> process RAM established in the normal way (e.g. get_user_pages or
> similar), which privcmd does not do.
> 
> > What exactly is the kernel complaining about?
> 
> There are a few different symptoms, but frequently it is hitting a bad
> pte path on teardown. This is caused by vm_normal_page returning true
> when it should be false (where true and false are actually non-NULL and
> NULL).
> 
> e.g.:
> 
> BUG: Bad page map in process xl  pte:e0004077b33f53 pmd:4079575003
> page:ffffffbce1a2f328 count:1 mapcount:-1 mapping:          (null) index:0x0
> page flags: 0x4000000000000014(referenced|dirty)
> addr:0000007fb5259000 vm_flags:040644fa anon_vma:          (null) mapping:ffffffc03a6fda58 index:0
> vma->vm_ops->fault: privcmd_fault+0x0/0x38
> vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c
> CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102
> Call trace:  
> [<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
> [<ffffffc000088238>] show_stack+0x14/0x1c
> [<ffffffc0004b67e0>] dump_stack+0x70/0x90
> [<ffffffc000125690>] print_bad_pte+0x12c/0x1bc
> [<ffffffc0001268f4>] unmap_single_vma+0x4cc/0x700
> [<ffffffc0001273b4>] unmap_vmas+0x68/0xb4
> [<ffffffc00012c050>] unmap_region+0xcc/0x1d4
> [<ffffffc00012df20>] do_munmap+0x218/0x314
> [<ffffffc00012e060>] vm_munmap+0x44/0x64
> [<ffffffc00012ed78>] SyS_munmap+0x24/0x34
> 
> Or:
> 
> BUG: Bad page state in process xl  pfn:4077b4d
> page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping:          (null) index:0x0
> page flags: 0x4000000000000014(referenced|dirty)
> Modules linked in:
> CPU: 0 PID: 2657 Comm: xl Tainted: G    B        3.12.0+ #102
> Call trace:
> [<ffffffc0000880f8>] dump_backtrace+0x0/0x12c
> [<ffffffc000088238>] show_stack+0x14/0x1c
> [<ffffffc0004b67e0>] dump_stack+0x70/0x90
> [<ffffffc00010f798>] bad_page+0xc4/0x110
> [<ffffffc00010f8b4>] free_pages_prepare+0xd0/0xd8
> [<ffffffc000110e94>] free_hot_cold_page+0x28/0x178
> [<ffffffc000111460>] free_hot_cold_page_list+0x38/0x60
> [<ffffffc000114cf0>] release_pages+0x190/0x1dc
> [<ffffffc00012c0e0>] unmap_region+0x15c/0x1d4
> [<ffffffc00012df20>] do_munmap+0x218/0x314
> [<ffffffc00012e060>] vm_munmap+0x44/0x64
> [<ffffffc00012ed78>] SyS_munmap+0x24/0x34

Could you please resend the patch adding the backtrace to the commit
message?
Given that it skips a few steps, maybe it's best to write it down
manually:

unmap_single_vma -> unmap_page_range -> zap_pud_range -> zap_pmd_range
-> zap_pte_range -> print_bad_pte

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-12-09 12:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-04 15:52 [PATCH] arm: xen: foreign mapping PTEs are special Ian Campbell
2013-12-06 17:48 ` Stefano Stabellini
2013-12-06 17:54   ` Ian Campbell
2013-12-08 17:32     ` Stefano Stabellini
2013-12-09 10:19       ` Ian Campbell
2013-12-09 12:16         ` Stefano Stabellini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).