xen-devel.lists.xenproject.org archive mirror
* [PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic
@ 2012-08-14 12:24 Attilio Rao
  2012-08-14 12:24 ` [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS Attilio Rao
  2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
  0 siblings, 2 replies; 20+ messages in thread
From: Attilio Rao @ 2012-08-14 12:24 UTC (permalink / raw)
  To: xen-devel, Konrad Rzeszutek Wilk, Stefano Stabellini; +Cc: Attilio Rao

While working on documentation for the pagetable_reserve PVOPS, I realized
that it assumes start == pgt_buf_start. I think this is not semantically right
(even if, with the current code, this should not be a problem in practice),
and what we really want is to extend the logic to also perform the RO -> RW
conversion for the range [pgt_buf_start, start).
This series implements the missing conversion, adds some smaller cleanups,
and finally provides documentation for the PVOPS.
Please look at 2/2 for more details on how the comment is structured.
If we get this right, we will have a reference to be used later on for other
PVOPS.

The difference from v1 is that the sh_start local variable in
xen_mapping_pagetable_reserve() is renamed to begin. Also, the commit messages
have been tweaked.

Attilio Rao (2):
  XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  Xen: Document the semantic of the pagetable_reserve PVOPS

 arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
 arch/x86/mm/init.c              |    4 ++++
 arch/x86/xen/mmu.c              |   22 ++++++++++++++++++++--
 3 files changed, 41 insertions(+), 4 deletions(-)

-- 
1.7.2.5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-14 12:24 [PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic Attilio Rao
@ 2012-08-14 12:24 ` Attilio Rao
  2012-08-15 17:25   ` Stefano Stabellini
  2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
  1 sibling, 1 reply; 20+ messages in thread
From: Attilio Rao @ 2012-08-14 12:24 UTC (permalink / raw)
  To: xen-devel, Konrad Rzeszutek Wilk, Stefano Stabellini; +Cc: Attilio Rao

- Allow xen_mapping_pagetable_reserve() to handle a start different from
  pgt_buf_start, but still bigger than it.
- Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
  to verify that start and end are contained in the range
  [pgt_buf_start, pgt_buf_top].
- In xen_mapping_pagetable_reserve(), change printk into pr_debug.
- In xen_mapping_pagetable_reserve(), print the diagnostic only when there is
  an actual need to do so (in other words, only when some pages are actually
  going to switch from RO to RW).

Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
---
 arch/x86/mm/init.c |    4 ++++
 arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..c5849b6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 
 void __init native_pagetable_reserve(u64 start, u64 end)
 {
+	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n",
+			start, end, PFN_PHYS(pgt_buf_start),
+			PFN_PHYS(pgt_buf_top));
 	memblock_reserve(start, end - start);
 }
 
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b65a761..66d73a2 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
 
 static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
 {
+	u64 begin;
+
+	begin = PFN_PHYS(pgt_buf_start);
+
+	if (start < begin || end > PFN_PHYS(pgt_buf_top))
+		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n",
+			start, end, begin, PFN_PHYS(pgt_buf_top));
+
+	/* set RW the initial range */
+	if (start != begin)
+		pr_debug("xen: setting RW the range %llx - %llx\n",
+			begin, start);
+	while (begin < start) {
+		make_lowmem_page_readwrite(__va(begin));
+		begin += PAGE_SIZE;
+	}
+
 	/* reserve the range used */
 	native_pagetable_reserve(start, end);
 
 	/* set as RW the rest */
-	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
-			PFN_PHYS(pgt_buf_top));
+	if (end != PFN_PHYS(pgt_buf_top))
+		pr_debug("xen: setting RW the range %llx - %llx\n",
+			end, PFN_PHYS(pgt_buf_top));
 	while (end < PFN_PHYS(pgt_buf_top)) {
 		make_lowmem_page_readwrite(__va(end));
 		end += PAGE_SIZE;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-14 12:24 [PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic Attilio Rao
  2012-08-14 12:24 ` [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS Attilio Rao
@ 2012-08-14 12:24 ` Attilio Rao
  2012-08-14 13:57   ` David Vrabel
  2012-08-15 17:46   ` Stefano Stabellini
  1 sibling, 2 replies; 20+ messages in thread
From: Attilio Rao @ 2012-08-14 12:24 UTC (permalink / raw)
  To: xen-devel, Konrad Rzeszutek Wilk, Stefano Stabellini; +Cc: Attilio Rao

The information added to the hook comment covers:
- Native behaviour
- Xen specific behaviour
- Logic behind the Xen specific behaviour
- PVOPS semantic

Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
---
 arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..b22093c 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -72,8 +72,23 @@ struct x86_init_oem {
  * struct x86_init_mapping - platform specific initial kernel pagetable setup
  * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
  *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
+ * It reserves a range of pages, to be used as pagetable pages.
+ * The start and end parameters are expected to be contained in the
+ * [pgt_buf_start, pgt_buf_top] range.
+ * The native implementation reserves the pages via the memblock_reserve()
+ * interface.
+ * The Xen implementation, besides reserving the range via memblock_reserve(),
+ * also sets RW the remaining pages contained in the ranges
+ * [pgt_buf_start, start) and [end, pgt_buf_top).
+ * This is needed because the range [pgt_buf_start, pgt_buf_top] was
+ * previously mapped read-only by xen_set_pte_init: when running
+ * on Xen all the pagetable pages need to be mapped read-only in order to
+ * avoid protection faults from the hypervisor. However, once the correct
+ * amount of pages is reserved for the pagetables, all the others contained
+ * in the range must be set to RW so that they can be correctly recycled by
+ * the VM subsystem.
+ * This operation is meant to be performed only during init_memory_mapping(),
+ * just after space for the kernel direct mapping tables is found.
  */
 struct x86_init_mapping {
 	void (*pagetable_reserve)(u64 start, u64 end);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
@ 2012-08-14 13:57   ` David Vrabel
  2012-08-14 14:12     ` Attilio Rao
  2012-08-15 17:46   ` Stefano Stabellini
  1 sibling, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-08-14 13:57 UTC (permalink / raw)
  To: Attilio Rao; +Cc: Konrad Rzeszutek Wilk, Stefano Stabellini, xen-devel

On 14/08/12 13:24, Attilio Rao wrote:
> The informations added on the hook are:
> - Native behaviour
> - Xen specific behaviour
> - Logic behind the Xen specific behaviour

These are implementation details and should be documented with the
implementations (if necessary).

> - PVOPS semantic

This is the interesting stuff.

This particular pvop seems a little odd really.  It might make more
sense if it took a third parameter for pgt_buf_top.

"@pagetable_reserve is used to reserve a range of PFNs used for the
kernel direct mapping page tables and cleans-up any PFNs that ended up
not being used for the tables.

It shall reserve the range (start, end] with memblock_reserve(). It
shall prepare PFNs in the range (end, pgt_buf_top] for general (non page
table) use.

It shall only be called in init_memory_mapping() after the direct
mapping tables have been constructed."

Having said that, I couldn't immediately see where pages in (end,
pgt_buf_top] were getting set RO.  Can you point me to where it's done?

David


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-14 13:57   ` David Vrabel
@ 2012-08-14 14:12     ` Attilio Rao
  2012-08-15 11:19       ` David Vrabel
  0 siblings, 1 reply; 20+ messages in thread
From: Attilio Rao @ 2012-08-14 14:12 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini,
	xen-devel@lists.xen.org

On 14/08/12 14:57, David Vrabel wrote:
> On 14/08/12 13:24, Attilio Rao wrote:
>    
>> The informations added on the hook are:
>> - Native behaviour
>> - Xen specific behaviour
>> - Logic behind the Xen specific behaviour
>>      
> These are implementation details and should be documented with the
> implementations (if necessary).
>    

In this specific case, the implementation details are very valuable for 
understanding the semantics of the operations, which is why I added them 
there. I think, at least for this case, this is the best trade-off.

>> - PVOPS semantic
>>      
> This is the interesting stuff.
>
> This particular pvop seems a little odd really.  It might make more
> sense if it took a third parameter for pgt_buf_top.
>    

The thing is that this work (documenting the PVOPS) should help in 
understanding the logic behind some PVOPS and possibly in improving or 
reworking them. For this stage, I agreed with Konrad to keep the changes as 
small as possible. Once the documentation about the semantics is in place, we 
can think about ways to improve things more effectively (for example, in 
some cases we may want to rewrite the PVOP completely).

> "@pagetable_reserve is used to reserve a range of PFNs used for the
> kernel direct mapping page tables and cleans-up any PFNs that ended up
> not being used for the tables.
>
> It shall reserve the range (start, end] with memblock_reserve(). It
> shall prepare PFNs in the range (end, pgt_buf_top] for general (non page
> table) use.
>
> It shall only be called in init_memory_mapping() after the direct
> mapping tables have been constructed."
>
> Having said that, I couldn't immediately see where pages in (end,
> pgt_buf_top] were getting set RO.  Can you point me to where it's done?
>    

As mentioned in the comment, please look at xen_set_pte_init().

Attilio

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-14 14:12     ` Attilio Rao
@ 2012-08-15 11:19       ` David Vrabel
  2012-08-15 13:55         ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-08-15 11:19 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini,
	xen-devel@lists.xen.org

On 14/08/12 15:12, Attilio Rao wrote:
> On 14/08/12 14:57, David Vrabel wrote:
>> On 14/08/12 13:24, Attilio Rao wrote:
>> 
>>> The informations added on the hook are: - Native behaviour - Xen 
>>> specific behaviour - Logic behind the Xen specific behaviour
>>> 
>> These are implementation details and should be documented with the
>>  implementations (if necessary).
>> 
> 
> In this specific case, implementation details are very valuable to 
> understand the semantic of operations, this is why I added them 
> there. I think, at least for this case, this is the best trade-off.

Documenting the implementation details will be useful for reviewing or
refactoring the pv-ops, but I don't think it is useful in the longer term
for it to be included in the API documentation upstream.

>>> - PVOPS semantic
>>> 
>> This is the interesting stuff.
>> 
>> This particular pvop seems a little odd really.  It might make more
>> sense if it took a third parameter for pgt_buf_top.
> 
> The thing is that this work (documenting PVOPS) should help in 
> understanding the logic behind some PVOPS and possibly
> improve/rework them. For this stage, I agreed with Konrad to keep the
> changes as small as possible. Once the documentation about the
> semantic is in place we can think about ways to improve things more
> effectively (for example, in some cases we may want to rewrite the
> PVOP completely).

After looking at it some more, I think this pv-op is unnecessary. How
about the following patch to just remove it completely?

I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
is sound.

>> "@pagetable_reserve is used to reserve a range of PFNs used for the
>> kernel direct mapping page tables and cleans-up any PFNs that ended
>> up not being used for the tables.
>> 
>> It shall reserve the range (start, end] with memblock_reserve(). It
>> shall prepare PFNs in the range (end, pgt_buf_top] for general (non
>> page table) use.
>> 
>> It shall only be called in init_memory_mapping() after the direct 
>> mapping tables have been constructed."
>> 
>> Having said that, I couldn't immediately see where pages in (end, 
> >> pgt_buf_top] were getting set RO.  Can you point me to where it's 
>> done?
>> 
> 
> As mentioned in the comment, please look at xen_set_pte_init().

xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
is already present and read-only.

David

8<----------------------
x86: remove x86_init.mapping.pagetable_reserve paravirt op

The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
guests to set the writable flag for the mapping of (pgt_buf_end,
pgt_buf_top].  This is not necessary: these pages are never set
read-only, as they have never contained page tables.

When running as a Xen guest, the initial page tables are provided by
Xen (these are reserved with memblock_reserve() in
xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
guests) or in the kernel's .data section (for 64-bit guests, see
head_64.S).

Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
does not overlap with them and the mappings for these PFNs will be
read-write.

Since Xen doesn't need to change the mapping, its implementation
becomes the same as the native one and we can simply remove this pv-op
completely.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/pgtable_types.h |    1 -
 arch/x86/include/asm/x86_init.h      |   12 ------------
 arch/x86/kernel/x86_init.c           |    4 ----
 arch/x86/mm/init.c                   |   22 +++-------------------
 arch/x86/xen/mmu.c                   |   19 ++-----------------
 5 files changed, 5 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 013286a..0a11293 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -301,7 +301,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 /* Install a pte for a particular vaddr in kernel space. */
 void set_pte_vaddr(unsigned long vaddr, pte_t pte);
 
-extern void native_pagetable_reserve(u64 start, u64 end);
 #ifdef CONFIG_X86_32
 extern void native_pagetable_setup_start(pgd_t *base);
 extern void native_pagetable_setup_done(pgd_t *base);
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 38155f6..b527dd4 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -69,17 +69,6 @@ struct x86_init_oem {
 };
 
 /**
- * struct x86_init_mapping - platform specific initial kernel pagetable setup
- * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
- *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
- */
-struct x86_init_mapping {
-	void (*pagetable_reserve)(u64 start, u64 end);
-};
-
-/**
  * struct x86_init_paging - platform specific paging functions
  * @pagetable_setup_start:	platform specific pre paging_init() call
  * @pagetable_setup_done:	platform specific post paging_init() call
@@ -135,7 +124,6 @@ struct x86_init_ops {
 	struct x86_init_mpparse		mpparse;
 	struct x86_init_irqs		irqs;
 	struct x86_init_oem		oem;
-	struct x86_init_mapping		mapping;
 	struct x86_init_paging		paging;
 	struct x86_init_timers		timers;
 	struct x86_init_iommu		iommu;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 9f3167e..040c05f 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -63,10 +63,6 @@ struct x86_init_ops x86_init __initdata = {
 		.banner			= default_banner,
 	},
 
-	.mapping = {
-		.pagetable_reserve		= native_pagetable_reserve,
-	},
-
 	.paging = {
 		.pagetable_setup_start	= native_pagetable_setup_start,
 		.pagetable_setup_done	= native_pagetable_setup_done,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e0e6990..c449873 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -90,11 +90,6 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }
 
-void __init native_pagetable_reserve(u64 start, u64 end)
-{
-	memblock_reserve(start, end - start);
-}
-
 #ifdef CONFIG_X86_32
 #define NR_RANGE_MR 3
 #else /* CONFIG_X86_64 */
@@ -283,22 +278,11 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 
 	/*
 	 * Reserve the kernel pagetable pages we used (pgt_buf_start -
-	 * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
-	 * so that they can be reused for other purposes.
-	 *
-	 * On native it just means calling memblock_reserve, on Xen it also
-	 * means marking RW the pagetable pages that we allocated before
-	 * but that haven't been used.
-	 *
-	 * In fact on xen we mark RO the whole range pgt_buf_start -
-	 * pgt_buf_top, because we have to make sure that when
-	 * init_memory_mapping reaches the pagetable pages area, it maps
-	 * RO all the pagetable pages, including the ones that are beyond
-	 * pgt_buf_end at that time.
+	 * pgt_buf_end).
 	 */
 	if (!after_bootmem && pgt_buf_end > pgt_buf_start)
-		x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
-				PFN_PHYS(pgt_buf_end));
+		memblock_reserve(PFN_PHYS(pgt_buf_start),
+				 PFN_PHYS(pgt_buf_end) - PFN_PHYS(pgt_buf_start));
 
 	if (!after_bootmem)
 		early_memtest(start, end);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index b65a761..e55dfc0 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1178,20 +1178,6 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
 {
 }
 
-static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
-{
-	/* reserve the range used */
-	native_pagetable_reserve(start, end);
-
-	/* set as RW the rest */
-	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
-			PFN_PHYS(pgt_buf_top));
-	while (end < PFN_PHYS(pgt_buf_top)) {
-		make_lowmem_page_readwrite(__va(end));
-		end += PAGE_SIZE;
-	}
-}
-
 static void xen_post_allocator_init(void);
 
 static void __init xen_pagetable_setup_done(pgd_t *base)
@@ -2067,7 +2053,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {
 
 void __init xen_init_mmu_ops(void)
 {
-	x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
 	x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
 	x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
 	pv_mmu_ops = xen_mmu_ops;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-15 11:19       ` David Vrabel
@ 2012-08-15 13:55         ` Stefano Stabellini
  2012-08-15 19:15           ` David Vrabel
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-15 13:55 UTC (permalink / raw)
  To: David Vrabel
  Cc: Attilio Rao, Konrad Rzeszutek Wilk, Stefano Stabellini,
	xen-devel@lists.xen.org

On Wed, 15 Aug 2012, David Vrabel wrote:
> On 14/08/12 15:12, Attilio Rao wrote:
> > On 14/08/12 14:57, David Vrabel wrote:
> >> On 14/08/12 13:24, Attilio Rao wrote:
> >> 
> >>> The informations added on the hook are: - Native behaviour - Xen 
> >>> specific behaviour - Logic behind the Xen specific behaviour
> >>> 
> >> These are implementation details and should be documented with the
> >>  implementations (if necessary).
> >> 
> > 
> > In this specific case, implementation details are very valuable to 
> > understand the semantic of operations, this is why I added them 
> > there. I think, at least for this case, this is the best trade-off.
> 
> Documenting the implementation details will be useful for reviewing or
> refactoring the pv-ops but I don't think it useful in the longer term
> for it to be included in the API documentation upstream.
> 
> >>> - PVOPS semantic
> >>> 
> >> This is the interesting stuff.
> >> 
> >> This particular pvop seems a little odd really.  It might make more
> >> sense if it took a third parameter for pgt_buf_top.
> > 
> > The thing is that this work (documenting PVOPS) should help in 
> > understanding the logic behind some PVOPS and possibly
> > improve/rework them. For this stage, I agreed with Konrad to keep the
> > changes as small as possible. Once the documentation about the
> > semantic is in place we can think about ways to improve things more
> > effectively (for example, in some cases we may want to rewrite the
> > PVOP completely).
> 
> After looking at it some more, I think this pv-ops is unnecessary. How
> about the following patch to just remove it completely?
> 
> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
> is sound.

Do you have more than 4G assigned to dom0 on those boxes?
It certainly fixed a serious crash at the time it was introduced; see
http://marc.info/?l=linux-kernel&m=129901609503574 and
http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
changed in kernel_physical_mapping_init, I think we still need it.
Depending on the e820 of your test box, the kernel could crash (or not),
possibly in different places.


> >> "@pagetable_reserve is used to reserve a range of PFNs used for the
> >> kernel direct mapping page tables and cleans-up any PFNs that ended
> >> up not being used for the tables.
> >> 
> >> It shall reserve the range (start, end] with memblock_reserve(). It
> >> shall prepare PFNs in the range (end, pgt_buf_top] for general (non
> >> page table) use.
> >> 
> >> It shall only be called in init_memory_mapping() after the direct 
> >> mapping tables have been constructed."
> >> 
> >> Having said that, I couldn't immediately see where pages in (end, 
> >> pgt_buf_top] were getting set RO.  Can you point me to where it's 
> >> done?
> >> 
> > 
> > As mentioned in the comment, please look at xen_set_pte_init().
> 
> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
> is already present and read-only.

Look at mask_rw_pte() and read the threads linked above; unfortunately it
is not that simple.


> David
> 
> 8<----------------------
> x86: remove x86_init.mapping.pagetable_reserve paravirt op
> 
> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
> guests to set the writable flag for the mapping of (pgt_buf_end,
> pgt_buf_top].  This is not necessary as these pages are never set as
> read-only as they have never contained page tables.

Is this actually true? It is possible when pagetable pages are
allocated by alloc_low_page.


> When running as a Xen guest, the initial page tables are provided by
> Xen (these are reserved with memblock_reserve() in
> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
> guests) or in the kernel's .data section (for 64-bit guests, see
> head_64.S).
> 
> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
> does not overlap with them and the mappings for these PFNs will be
> read-write.

We are talking about pagetable pages built by
kernel_physical_mapping_init.


> Since Xen doesn't need to change the mapping its implementation
> becomes the same as a native and we can simply remove this pv-op
> completely.

I don't think so.




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 17:25   ` Stefano Stabellini
@ 2012-08-15 17:15     ` Attilio Rao
  2012-08-15 17:46       ` Stefano Stabellini
  2012-08-15 17:33     ` Stefano Stabellini
  1 sibling, 1 reply; 20+ messages in thread
From: Attilio Rao @ 2012-08-15 17:15 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

On 15/08/12 18:25, Stefano Stabellini wrote:
> On Tue, 14 Aug 2012, Attilio Rao wrote:
>    
>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>>    pgt_buf_start, but still bigger than it.
>> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
>>    for verifying start and end are contained in the range
>>    [pgt_buf_start, pgt_buf_top].
>> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
>> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
>>    an actual need to do that (or, in other words, if there are actually some
>>    pages going to switch from RO to RW).
>>
>> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
>> ---
>>   arch/x86/mm/init.c |    4 ++++
>>   arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
>>   2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>> index e0e6990..c5849b6 100644
>> --- a/arch/x86/mm/init.c
>> +++ b/arch/x86/mm/init.c
>> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>>
>>   void __init native_pagetable_reserve(u64 start, u64 end)
>>   {
>> +	if (start<  PFN_PHYS(pgt_buf_start) || end>  PFN_PHYS(pgt_buf_top))
>> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>>      
> code style (you can check whether your patch breaks the code style with
> scripts/checkpatch.pl)
>    

I actually did before submitting; it reported 0 errors/warnings.
Do you have a handy link to where I can find a style guide for the Linux 
kernel? I tried to follow what other parts of the code do.

Attilio

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-14 12:24 ` [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS Attilio Rao
@ 2012-08-15 17:25   ` Stefano Stabellini
  2012-08-15 17:15     ` Attilio Rao
  2012-08-15 17:33     ` Stefano Stabellini
  0 siblings, 2 replies; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-15 17:25 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini,
	xen-devel@lists.xen.org

On Tue, 14 Aug 2012, Attilio Rao wrote:
> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>   pgt_buf_start, but still bigger than it.
> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
>   for verifying start and end are contained in the range
>   [pgt_buf_start, pgt_buf_top].
> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
>   an actual need to do that (or, in other words, if there are actually some
>   pages going to switch from RO to RW).
> 
> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
> ---
>  arch/x86/mm/init.c |    4 ++++
>  arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index e0e6990..c5849b6 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>  
>  void __init native_pagetable_reserve(u64 start, u64 end)
>  {
> +	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"

code style (you can check whether your patch breaks the code style with
scripts/checkpatch.pl)

> +			start, end, PFN_PHYS(pgt_buf_start),
> +			PFN_PHYS(pgt_buf_top));
>  	memblock_reserve(start, end - start);
>  }
>  
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index b65a761..66d73a2 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
>  
>  static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
>  {
> +	u64 begin;
> +
> +	begin = PFN_PHYS(pgt_buf_start);
> +
> +	if (start < begin || end > PFN_PHYS(pgt_buf_top))
> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"

code style

> +			start, end, begin, PFN_PHYS(pgt_buf_top));
> +
> +	/* set RW the initial range */
> +	if  (start != begin)
> +		pr_debug("xen: setting RW the range %llx - %llx\n",
> +			begin, start);
> +	while (begin < start) {
> +		make_lowmem_page_readwrite(__va(begin));
> +		begin += PAGE_SIZE;
> +	}
> +
>  	/* reserve the range used */
>  	native_pagetable_reserve(start, end);
>  
>  	/* set as RW the rest */
> -	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
> -			PFN_PHYS(pgt_buf_top));
> +	if (end != PFN_PHYS(pgt_buf_top))
> +		pr_debug("xen: setting RW the range %llx - %llx\n",
> +			end, PFN_PHYS(pgt_buf_top));
>  	while (end < PFN_PHYS(pgt_buf_top)) {
>  		make_lowmem_page_readwrite(__va(end));
>  		end += PAGE_SIZE;
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 17:25   ` Stefano Stabellini
  2012-08-15 17:15     ` Attilio Rao
@ 2012-08-15 17:33     ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-15 17:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Attilio Rao, Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

You can add my Acked-by once you make these changes.

On Wed, 15 Aug 2012, Stefano Stabellini wrote:
> On Tue, 14 Aug 2012, Attilio Rao wrote:
> > - Allow xen_mapping_pagetable_reserve() to handle a start different from
> >   pgt_buf_start, but still bigger than it.
> > - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> >   for verifying start and end are contained in the range
> >   [pgt_buf_start, pgt_buf_top].
> > - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> > - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> >   an actual need to do that (or, in other words, if there are actually some
> >   pages going to switch from RO to RW).
> > 
> > Signed-off-by: Attilio Rao <attilio.rao@citrix.com>
> > ---
> >  arch/x86/mm/init.c |    4 ++++
> >  arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
> >  2 files changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > index e0e6990..c5849b6 100644
> > --- a/arch/x86/mm/init.c
> > +++ b/arch/x86/mm/init.c
> > @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
> >  
> >  void __init native_pagetable_reserve(u64 start, u64 end)
> >  {
> > +	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
> > +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> 
> code style (you can check whether your patch breaks the code style with
> scripts/checkpatch.pl)
> 
> > +			start, end, PFN_PHYS(pgt_buf_start),
> > +			PFN_PHYS(pgt_buf_top));
> >  	memblock_reserve(start, end - start);
> >  }
> >  
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index b65a761..66d73a2 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -1180,12 +1180,30 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
> >  
> >  static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
> >  {
> > +	u64 begin;
> > +
> > +	begin = PFN_PHYS(pgt_buf_start);
> > +
> > +	if (start < begin || end > PFN_PHYS(pgt_buf_top))
> > +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> 
> code style
> 
> > +			start, end, begin, PFN_PHYS(pgt_buf_top));
> > +
> > +	/* set RW the initial range */
> > +	if  (start != begin)
> > +		pr_debug("xen: setting RW the range %llx - %llx\n",
> > +			begin, start);
> > +	while (begin < start) {
> > +		make_lowmem_page_readwrite(__va(begin));
> > +		begin += PAGE_SIZE;
> > +	}
> > +
> >  	/* reserve the range used */
> >  	native_pagetable_reserve(start, end);
> >  
> >  	/* set as RW the rest */
> > -	printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
> > -			PFN_PHYS(pgt_buf_top));
> > +	if (end != PFN_PHYS(pgt_buf_top))
> > +		pr_debug("xen: setting RW the range %llx - %llx\n",
> > +			end, PFN_PHYS(pgt_buf_top));
> >  	while (end < PFN_PHYS(pgt_buf_top)) {
> >  		make_lowmem_page_readwrite(__va(end));
> >  		end += PAGE_SIZE;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 17:15     ` Attilio Rao
@ 2012-08-15 17:46       ` Stefano Stabellini
  2012-08-15 18:43         ` Ian Campbell
  2012-08-15 18:47         ` Attilio Rao
  0 siblings, 2 replies; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-15 17:46 UTC (permalink / raw)
  To: Attilio Rao
  Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, 15 Aug 2012, Attilio Rao wrote:
> On 15/08/12 18:25, Stefano Stabellini wrote:
> > On Tue, 14 Aug 2012, Attilio Rao wrote:
> >    
> >> - Allow xen_mapping_pagetable_reserve() to handle a start different from
> >>    pgt_buf_start, but still bigger than it.
> >> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> >>    for verifying start and end are contained in the range
> >>    [pgt_buf_start, pgt_buf_top].
> >> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> >> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> >>    an actual need to do that (or, in other words, if there are actually some
> >>    pages going to switch from RO to RW).
> >>
> >> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
> >> ---
> >>   arch/x86/mm/init.c |    4 ++++
> >>   arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
> >>   2 files changed, 24 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> >> index e0e6990..c5849b6 100644
> >> --- a/arch/x86/mm/init.c
> >> +++ b/arch/x86/mm/init.c
> >> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
> >>
> >>   void __init native_pagetable_reserve(u64 start, u64 end)
> >>   {
> >> +	if (start<  PFN_PHYS(pgt_buf_start) || end>  PFN_PHYS(pgt_buf_top))
> >> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> >>      
> > code style (you can check whether your patch breaks the code style with
> > scripts/checkpatch.pl)
> >    
> 
> I actually did before submitting; it reported 0 errors/warnings.

strange, that really looks like a line over 80 chars


> Do you have a handy link to where I can find a style guide for the Linux 
> kernel? I tried to follow what other parts of the code do.

Documentation/CodingStyle

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
  2012-08-14 13:57   ` David Vrabel
@ 2012-08-15 17:46   ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-15 17:46 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Konrad Rzeszutek Wilk, Stefano Stabellini,
	xen-devel@lists.xen.org

On Tue, 14 Aug 2012, Attilio Rao wrote:
> The information added on the hook is:
> - Native behaviour
> - Xen specific behaviour
> - Logic behind the Xen specific behaviour
> - PVOPS semantic
> 
> Signed-off-by: Attilio Rao <attilio.rao@citrix.com>

Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  arch/x86/include/asm/x86_init.h |   19 +++++++++++++++++--
>  1 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index 38155f6..b22093c 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -72,8 +72,23 @@ struct x86_init_oem {
>   * struct x86_init_mapping - platform specific initial kernel pagetable setup
>   * @pagetable_reserve:	reserve a range of addresses for kernel pagetable usage
>   *
> - * For more details on the purpose of this hook, look in
> - * init_memory_mapping and the commit that added it.
> + * It does reserve a range of pages, to be used as pagetable pages.
> + * The start and end parameters are expected to be contained in the
> + * [pgt_buf_start, pgt_buf_top] range.
> + * The native implementation reserves the pages via the memblock_reserve()
> + * interface.
> + * The Xen implementation, besides reserving the range via memblock_reserve(),
> + * also sets RW the remaining pages contained in the ranges
> + * [pgt_buf_start, start) and [end, pgt_buf_top).
> + * This is needed because the range [pgt_buf_start, pgt_buf_top] was
> + * previously mapped read-only by xen_set_pte_init: when running
> + * on Xen all the pagetable pages need to be mapped read-only in order to
> + * avoid protection faults from the hypervisor. However, once the correct
> + * amount of pages is reserved for the pagetables, all the others contained
> + * in the range must be set to RW so that they can be correctly recycled by
> + * the VM subsystem.
> + * This operation is meant to be performed only during init_memory_mapping(),
> + * just after space for the kernel direct mapping tables is found.
>   */
>  struct x86_init_mapping {
>  	void (*pagetable_reserve)(u64 start, u64 end);
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 17:46       ` Stefano Stabellini
@ 2012-08-15 18:43         ` Ian Campbell
  2012-08-15 18:50           ` Attilio Rao
  2012-08-15 18:47         ` Attilio Rao
  1 sibling, 1 reply; 20+ messages in thread
From: Ian Campbell @ 2012-08-15 18:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Attilio Rao, Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

On Wed, 2012-08-15 at 18:46 +0100, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, Attilio Rao wrote:
> > On 15/08/12 18:25, Stefano Stabellini wrote:
> > > On Tue, 14 Aug 2012, Attilio Rao wrote:
> > >    
> > >> - Allow xen_mapping_pagetable_reserve() to handle a start different from
> > >>    pgt_buf_start, but still bigger than it.
> > >> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> > >>    for verifying start and end are contained in the range
> > >>    [pgt_buf_start, pgt_buf_top].
> > >> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> > >> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> > >>    an actual need to do that (or, in other words, if there are actually some
> > >>    pages going to switch from RO to RW).
> > >>
> > >> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
> > >> ---
> > >>   arch/x86/mm/init.c |    4 ++++
> > >>   arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
> > >>   2 files changed, 24 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > >> index e0e6990..c5849b6 100644
> > >> --- a/arch/x86/mm/init.c
> > >> +++ b/arch/x86/mm/init.c
> > >> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
> > >>
> > >>   void __init native_pagetable_reserve(u64 start, u64 end)
> > >>   {
> > >> +	if (start<  PFN_PHYS(pgt_buf_start) || end>  PFN_PHYS(pgt_buf_top))
> > >> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> > >>      
> > > code style (you can check whether your patch breaks the code style with
> > > scripts/checkpatch.pl)
> > >    
> > 
> > I actually did before submitting; it reported 0 errors/warnings.
> 
> strange, that really looks like a line over 80 chars

Also there should be one space either side of the "<" and ">" in the
conditional.

> 
> 
> > Do you have a handy link to where I can find a style guide for the Linux 
> > kernel? I tried to follow what other parts of the code do.
> 
> Documentation/CodingStyle
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 17:46       ` Stefano Stabellini
  2012-08-15 18:43         ` Ian Campbell
@ 2012-08-15 18:47         ` Attilio Rao
  2012-08-16  8:12           ` Ian Campbell
  2012-08-16  9:53           ` Stefano Stabellini
  1 sibling, 2 replies; 20+ messages in thread
From: Attilio Rao @ 2012-08-15 18:47 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

On 15/08/12 18:46, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, Attilio Rao wrote:
>    
>> On 15/08/12 18:25, Stefano Stabellini wrote:
>>      
>>> On Tue, 14 Aug 2012, Attilio Rao wrote:
>>>
>>>        
>>>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>>>>     pgt_buf_start, but still bigger than it.
>>>> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
>>>>     for verifying start and end are contained in the range
>>>>     [pgt_buf_start, pgt_buf_top].
>>>> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
>>>> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
>>>>     an actual need to do that (or, in other words, if there are actually some
>>>>     pages going to switch from RO to RW).
>>>>
>>>> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
>>>> ---
>>>>    arch/x86/mm/init.c |    4 ++++
>>>>    arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
>>>>    2 files changed, 24 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>>>> index e0e6990..c5849b6 100644
>>>> --- a/arch/x86/mm/init.c
>>>> +++ b/arch/x86/mm/init.c
>>>> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>>>>
>>>>    void __init native_pagetable_reserve(u64 start, u64 end)
>>>>    {
>>>> +	if (start<   PFN_PHYS(pgt_buf_start) || end>   PFN_PHYS(pgt_buf_top))
>>>> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>>>>
>>>>          
>>> code style (you can check whether your patch breaks the code style with
>>> scripts/checkpatch.pl)
>>>
>>>        
>> I actually did before submitting; it reported 0 errors/warnings.
>>      
> strange, that really looks like a line over 80 chars
>
>    

Actually, the coding style explicitly says not to break strings, so that 
you retain the ability to grep for them. FreeBSD does the same, and I 
think this is why checkpatch doesn't whine. I don't think there is a bug 
here.

Can I submit the patch as it is, then?

Attilio

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 18:43         ` Ian Campbell
@ 2012-08-15 18:50           ` Attilio Rao
  0 siblings, 0 replies; 20+ messages in thread
From: Attilio Rao @ 2012-08-15 18:50 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On 15/08/12 19:43, Ian Campbell wrote:
> On Wed, 2012-08-15 at 18:46 +0100, Stefano Stabellini wrote:
>    
>> On Wed, 15 Aug 2012, Attilio Rao wrote:
>>      
>>> On 15/08/12 18:25, Stefano Stabellini wrote:
>>>        
>>>> On Tue, 14 Aug 2012, Attilio Rao wrote:
>>>>
>>>>          
>>>>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
>>>>>     pgt_buf_start, but still bigger than it.
>>>>> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
>>>>>     for verifying start and end are contained in the range
>>>>>     [pgt_buf_start, pgt_buf_top].
>>>>> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
>>>>> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
>>>>>     an actual need to do that (or, in other words, if there are actually some
>>>>>     pages going to switch from RO to RW).
>>>>>
>>>>> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
>>>>> ---
>>>>>    arch/x86/mm/init.c |    4 ++++
>>>>>    arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
>>>>>    2 files changed, 24 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
>>>>> index e0e6990..c5849b6 100644
>>>>> --- a/arch/x86/mm/init.c
>>>>> +++ b/arch/x86/mm/init.c
>>>>> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
>>>>>
>>>>>    void __init native_pagetable_reserve(u64 start, u64 end)
>>>>>    {
>>>>> +	if (start<   PFN_PHYS(pgt_buf_start) || end>   PFN_PHYS(pgt_buf_top))
>>>>> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
>>>>>
>>>>>            
>>>> code style (you can check whether your patch breaks the code style with
>>>> scripts/checkpatch.pl)
>>>>
>>>>          
>>> I actually did before submitting; it reported 0 errors/warnings.
>>>        
>> strange, that really looks like a line over 80 chars
>>      
> Also there should be one space either side of the "<" and">" in the
> conditional.
>
>    

I have no idea why they are rendered like that, but in the original 
patch the spacing is fine.

Attilio

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-15 13:55         ` Stefano Stabellini
@ 2012-08-15 19:15           ` David Vrabel
  2012-08-16 11:07             ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: David Vrabel @ 2012-08-15 19:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Attilio Rao, Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

On 15/08/12 14:55, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 14/08/12 15:12, Attilio Rao wrote:
>>> On 14/08/12 14:57, David Vrabel wrote:
>>>> On 14/08/12 13:24, Attilio Rao wrote:
>> After looking at it some more, I think this pv-ops is unnecessary. How
>> about the following patch to just remove it completely?
>>
>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>> is sound.
> 
> Do you have more than 4G to dom0 on those boxes?

I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.

> It certainly fixed a serious crash at the time it was introduced, see
> http://marc.info/?l=linux-kernel&m=129901609503574 and
> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
> changed in kernel_physical_mapping_init, I think we still need it.
> Depending on the e820 of your test box, the kernel could crash (or not),
> possibly in different places.
>
>>>> Having said that, I couldn't immediately see where pages in (end, 
>>>> pgt_buf_top] were getting set RO.  Can you point me to where it's 
>>>> done?
>>>>
>>>
>>> As mentioned in the comment, please look at xen_set_pte_init().
>>
>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>> is already present and read-only.
> 
> look at mask_rw_pte and read the threads linked above, unfortunately it
> is not that simple.

Yes, I was remembering what 32-bit did here.

The 64-bit version is a bit confused and it often ends up /not/ clearing
RW for the direct mapping of the pages in the pgt_buf because any
existing RW mappings will be used as-is.  See phys_pte_init() which
checks for an existing mapping and only sets the PTE if it is not
already set.

pgd_populate(), pud_populate(), and pmd_populate() take care of clearing
RW for the newly allocated page table pages, so mask_rw_pte() only needs
to consider clearing RW for mappings of the existing page table PFNs
which all lie outside the range (pt_buf_start, pt_buf_top].

This patch makes it more sensible, and makes removal of the pv-op safe
if pgt_buf lies outside the initial mapping.

index 04c1f61..2fd5e86 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1400,14 +1400,13 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
 	unsigned long pfn = pte_pfn(pte);

 	/*
-	 * If the new pfn is within the range of the newly allocated
-	 * kernel pagetable, and it isn't being mapped into an
-	 * early_ioremap fixmap slot as a freshly allocated page, make sure
-	 * it is RO.
+	 * If this is a PTE of an early_ioremap fixmap slot but
+	 * outside the range (pgt_buf_start, pgt_buf_top], then this
+	 * PTE is mapping a PFN in the current page table.  Make
+	 * sure it is RO.
 	 */
-	if (((!is_early_ioremap_ptep(ptep) &&
-			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
-			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
+	if (is_early_ioremap_ptep(ptep)
+	    && (pfn < pgt_buf_start || pfn >= pgt_buf_top))
 		pte = pte_wrprotect(pte);

 	return pte;


>> 8<----------------------
>> x86: remove x86_init.mapping.pagetable_reserve paravirt op
>>
>> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
>> guests to set the writable flag for the mapping of (pgt_buf_end,
>> pgt_buf_top].  This is not necessary as these pages are never set as
>> read-only as they have never contained page tables.
> 
> Is this actually true? It is possible when pagetable pages are
> allocated by alloc_low_page.

These newly allocated page table pages will be set read-only when they
are linked into the page tables with pgd_populate(), pud_populate() and
friends.

>> When running as a Xen guest, the initial page tables are provided by
>> Xen (these are reserved with memblock_reserve() in
>> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
>> guests) or in the kernel's .data section (for 64-bit guests, see
>> head_64.S).
>>
>> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
>> does not overlap with them and the mappings for these PFNs will be
>> read-write.
> 
> We are talking about pagetable pages built by
> kernel_physical_mapping_init.
> 
> 
>> Since Xen doesn't need to change the mapping its implementation
>> becomes the same as a native and we can simply remove this pv-op
>> completely.
> 
> I don't think so.

David

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 18:47         ` Attilio Rao
@ 2012-08-16  8:12           ` Ian Campbell
  2012-08-16  9:53           ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Ian Campbell @ 2012-08-16  8:12 UTC (permalink / raw)
  To: Attilio Rao
  Cc: Konrad Rzeszutek Wilk, xen-devel@lists.xen.org,
	Stefano Stabellini

On Wed, 2012-08-15 at 19:47 +0100, Attilio Rao wrote:
> On 15/08/12 18:46, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, Attilio Rao wrote:
> >    
> >> On 15/08/12 18:25, Stefano Stabellini wrote:
> >>      
> >>> On Tue, 14 Aug 2012, Attilio Rao wrote:
> >>>
> >>>        
> >>>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
> >>>>     pgt_buf_start, but still bigger than it.
> >>>> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> >>>>     for verifying start and end are contained in the range
> >>>>     [pgt_buf_start, pgt_buf_top].
> >>>> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> >>>> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> >>>>     an actual need to do that (or, in other words, if there are actually some
> >>>>     pages going to switch from RO to RW).
> >>>>
> >>>> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
> >>>> ---
> >>>>    arch/x86/mm/init.c |    4 ++++
> >>>>    arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
> >>>>    2 files changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> >>>> index e0e6990..c5849b6 100644
> >>>> --- a/arch/x86/mm/init.c
> >>>> +++ b/arch/x86/mm/init.c
> >>>> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
> >>>>
> >>>>    void __init native_pagetable_reserve(u64 start, u64 end)
> >>>>    {
> >>>> +	if (start<   PFN_PHYS(pgt_buf_start) || end>   PFN_PHYS(pgt_buf_top))
> >>>> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> >>>>
> >>>>          
> >>> code style (you can check whether your patch breaks the code style with
> >>> scripts/checkpatch.pl)
> >>>
> >>>        
> >> I actually did before submitting; it reported 0 errors/warnings.
> >>      
> > strange, that really looks like a line over 80 chars
> >
> >    
> 
> Actually, the coding style explicitly says not to break strings, so that 
> you retain the ability to grep for them. FreeBSD does the same, and I 
> think this is why checkpatch doesn't whine. I don't think there is a bug 
> here.

Right, CodingStyle changed a little while ago from a strict 80 column
limit to just strongly preferred 80 columns with an explicit exception
for user visible strings.

> 
> Can I submit the patch as it is, then?
> 
> Attilio
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS
  2012-08-15 18:47         ` Attilio Rao
  2012-08-16  8:12           ` Ian Campbell
@ 2012-08-16  9:53           ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-16  9:53 UTC (permalink / raw)
  To: Attilio Rao
  Cc: xen-devel@lists.xen.org, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, 15 Aug 2012, Attilio Rao wrote:
> On 15/08/12 18:46, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, Attilio Rao wrote:
> >    
> >> On 15/08/12 18:25, Stefano Stabellini wrote:
> >>      
> >>> On Tue, 14 Aug 2012, Attilio Rao wrote:
> >>>
> >>>        
> >>>> - Allow xen_mapping_pagetable_reserve() to handle a start different from
> >>>>     pgt_buf_start, but still bigger than it.
> >>>> - Add checks to xen_mapping_pagetable_reserve() and native_pagetable_reserve()
> >>>>     for verifying start and end are contained in the range
> >>>>     [pgt_buf_start, pgt_buf_top].
> >>>> - In xen_mapping_pagetable_reserve(), change printk into pr_debug.
> >>>> - In xen_mapping_pagetable_reserve(), print out diagnostic only if there is
> >>>>     an actual need to do that (or, in other words, if there are actually some
> >>>>     pages going to switch from RO to RW).
> >>>>
> >>>> Signed-off-by: Attilio Rao<attilio.rao@citrix.com>
> >>>> ---
> >>>>    arch/x86/mm/init.c |    4 ++++
> >>>>    arch/x86/xen/mmu.c |   22 ++++++++++++++++++++--
> >>>>    2 files changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> >>>> index e0e6990..c5849b6 100644
> >>>> --- a/arch/x86/mm/init.c
> >>>> +++ b/arch/x86/mm/init.c
> >>>> @@ -92,6 +92,10 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
> >>>>
> >>>>    void __init native_pagetable_reserve(u64 start, u64 end)
> >>>>    {
> >>>> +	if (start < PFN_PHYS(pgt_buf_start) || end > PFN_PHYS(pgt_buf_top))
> >>>> +		panic("Invalid address range: [%llu - %llu] should be a subset of [%llu - %llu]\n"
> >>>>
> >>>>          
> >>> code style (you can check whether your patch breaks the code style with
> >>> scripts/checkpatch.pl)
> >>>
> >>>        
> >> I actually did before to submit, it reported 0 errors/warning.
> >>      
> > strange, that really looks like a line over 80 chars
> >
> >    
> 
> Actually, the coding style explicitly says not to break strings, because
> we want to retain the ability to grep for them. In FreeBSD this is the
> same, and I think this is why checkpatch doesn't whine. I don't think
> there is a bug here.
> 
> Can I submit the patch as it is, then?

it is ok for me
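
[Editorial note: the validation quoted from patch 1/2 boils down to an
interval-subset check. A standalone sketch in plain C, with illustrative
names; the real code compares physical addresses via PFN_PHYS() and
panics on failure, neither of which is modeled here:]

```c
#include <stdint.h>

/* Sketch of the check added to native_pagetable_reserve() and
 * xen_mapping_pagetable_reserve(): the reserved range [start, end]
 * must be a subset of the pagetable buffer [buf_start, buf_top].
 * Illustrative only, not kernel code. */
static int range_is_subset(uint64_t start, uint64_t end,
                           uint64_t buf_start, uint64_t buf_top)
{
    return start >= buf_start && end <= buf_top;
}
```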

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-15 19:15           ` David Vrabel
@ 2012-08-16 11:07             ` Stefano Stabellini
  2012-08-16 14:33               ` David Vrabel
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2012-08-16 11:07 UTC (permalink / raw)
  To: David Vrabel
  Cc: Attilio Rao, xen-devel@lists.xen.org, Konrad Rzeszutek Wilk,
	Stefano Stabellini

On Wed, 15 Aug 2012, David Vrabel wrote:
> On 15/08/12 14:55, Stefano Stabellini wrote:
> > On Wed, 15 Aug 2012, David Vrabel wrote:
> >> On 14/08/12 15:12, Attilio Rao wrote:
> >>> On 14/08/12 14:57, David Vrabel wrote:
> >>>> On 14/08/12 13:24, Attilio Rao wrote:
> >> After looking at it some more, I think this pv-ops is unnecessary. How
> >> about the following patch to just remove it completely?
> >>
> >> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
> >> is sound.
> > 
> > Do you have more than 4G to dom0 on those boxes?
> 
> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
> 
> > It certainly fixed a serious crash at the time it was introduced, see
> > http://marc.info/?l=linux-kernel&m=129901609503574 and
> > http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
> > changed in kernel_physical_mapping_init, I think we still need it.
> > Depending on the e820 of your test box, the kernel could crash (or not),
> > possibly in different places.
> >
> >>>> Having said that, I couldn't immediately see where pages in (end,
> >>>> pgt_buf_top] were getting set RO.  Can you point me to where it's
> >>>> done?
> >>>>
> >>>
> >>> As mentioned in the comment, please look at xen_set_pte_init().
> >>
> >> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
> >> is already present and read-only.
> > 
> > look at mask_rw_pte and read the threads linked above, unfortunately it
> > is not that simple.
> 
> Yes, I was remembering what 32-bit did here.
> 
> The 64-bit version is a bit confused and it often ends up /not/ clearing
> RW for the direct mapping of the pages in the pgt_buf because any
> existing RW mappings will be used as-is.  See phys_pte_init() which
> checks for an existing mapping and only sets the PTE if it is not
> already set.

not all the pagetable pages might be already mapped, even if they are
already hooked into the pagetable


> pgd_populate(), pud_populate(), and pmd_populate() take care of clearing
> RW for the newly allocated page table pages, so mask_rw_pte() only needs
> to consider clearing RW for mappings of the existing page table PFNs,
> which all lie outside the range (pgt_buf_start, pgt_buf_top].
> 
> This patch makes it more sensible, and makes removal of the pv-op safe
> if pgt_buf lies outside the initial mapping.
> 
> index 04c1f61..2fd5e86 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1400,14 +1400,13 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>  	unsigned long pfn = pte_pfn(pte);
> 
>  	/*
> -	 * If the new pfn is within the range of the newly allocated
> -	 * kernel pagetable, and it isn't being mapped into an
> -	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> -	 * it is RO.
> +	 * If this is a PTE of an early_ioremap fixmap slot but
> +	 * outside the range (pgt_buf_start, pgt_buf_top], then this
> +	 * PTE is mapping a PFN in the current page table.  Make
> +	 * sure it is RO.
>  	 */
> -	if (((!is_early_ioremap_ptep(ptep) &&
> -			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> -			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> +	if (is_early_ioremap_ptep(ptep)
> +	    && (pfn < pgt_buf_start || pfn >= pgt_buf_top))
>  		pte = pte_wrprotect(pte);
> 
>  	return pte;
> 

That's a mistake, you just inverted the condition!
What if map_low_page is used to map a page already hooked into the
pagetable? It should be RO while with your change it would be RW.
Also you don't handle the case when map_low_page is used to map the very
last page, allocated and mapped by alloc_low_page but not yet hooked
into the pagetable: that page needs to be RW.

I believe you need more RAM than 6G to see all these issues.


> >> 8<----------------------
> >> x86: remove x86_init.mapping.pagetable_reserve paravirt op
> >>
> >> The x86_init.mapping.pagetable_reserve paravirt op is used for Xen
> >> guests to set the writable flag for the mapping of (pgt_buf_end,
> >> pgt_buf_top].  This is not necessary as these pages are never set as
> >> read-only as they have never contained page tables.
> > 
> > Is this actually true? It is possible when pagetable pages are
> > allocated by alloc_low_page.
> 
> These newly allocated page table pages will be set read-only when they
> are linked into the page tables with pgd_populate(), pud_populate() and
> friends.
> 
> >> When running as a Xen guest, the initial page tables are provided by
> >> Xen (these are reserved with memblock_reserve() in
> >> xen_setup_kernel_pagetable()) and constructed in brk space (for 32-bit
> >> guests) or in the kernel's .data section (for 64-bit guests, see
> >> head_64.S).
> >>
> >> Since these are all marked as reserved, (pgt_buf_start, pgt_buf_top]
> >> does not overlap with them and the mappings for these PFNs will be
> >> read-write.
> > 
> > We are talking about pagetable pages built by
> > kernel_physical_mapping_init.
> > 
> > 
> >> Since Xen doesn't need to change the mapping its implementation
> >> becomes the same as a native and we can simply remove this pv-op
> >> completely.
> > 
> > I don't think so.
> 
> David
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
  2012-08-16 11:07             ` Stefano Stabellini
@ 2012-08-16 14:33               ` David Vrabel
  0 siblings, 0 replies; 20+ messages in thread
From: David Vrabel @ 2012-08-16 14:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Attilio Rao, Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

On 16/08/12 12:07, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 15/08/12 14:55, Stefano Stabellini wrote:
>>> On Wed, 15 Aug 2012, David Vrabel wrote:
>>>> On 14/08/12 15:12, Attilio Rao wrote:
>>>>> On 14/08/12 14:57, David Vrabel wrote:
>>>>>> On 14/08/12 13:24, Attilio Rao wrote:
>>>> After looking at it some more, I think this pv-ops is unnecessary. How
>>>> about the following patch to just remove it completely?
>>>>
>>>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>>>> is sound.
>>>
>>> Do you have more than 4G to dom0 on those boxes?
>>
>> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
>>
>>> It certainly fixed a serious crash at the time it was introduced, see
>>> http://marc.info/?l=linux-kernel&m=129901609503574 and
>>> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
>>> changed in kernel_physical_mapping_init, I think we still need it.
>>> Depending on the e820 of your test box, the kernel could crash (or not),
>>> possibly in different places.

FYI, it looks like pgt_buf is now located low down, which is why these
changes worked for me.  Possibly this changed as part of a memblock
refactor.

> >>>>>> Having said that, I couldn't immediately see where pages in (end,
> >>>>>> pgt_buf_top] were getting set RO.  Can you point me to where it's
> >>>>>> done?
>>>>>>
>>>>>
>>>>> As mentioned in the comment, please look at xen_set_pte_init().
>>>>
>>>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>>>> is already present and read-only.
>>>
>>> look at mask_rw_pte and read the threads linked above, unfortunately it
>>> is not that simple.
>>
>> Yes, I was remembering what 32-bit did here.
>>
>> The 64-bit version is a bit confused and it often ends up /not/ clearing
>> RW for the direct mapping of the pages in the pgt_buf because any
>> existing RW mappings will be used as-is.  See phys_pte_init() which
>> checks for an existing mapping and only sets the PTE if it is not
>> already set.
> 
> not all the pagetable pages might be already mapped, even if they are
> already hooked into the pagetable

Yes, I think this is easy to handle though.

    /*
     * Make sure that active page table pages are not mapped RW.
     */
    if (is_early_ioremap_ptep(ptep)) {
        /*
         * If we're updating an early ioremap PTE, then this PFN may
         * already be in the linear mapping.  If it is, use the
         * existing RW bit.
         */
        unsigned int level;
        pte_t *linear_pte;

        linear_pte = lookup_address(__va(PFN_PHYS(pfn)), &level);
        if (linear_pte && !pte_write(*linear_pte))
            pte = pte_wrprotect(pte);
    } else if (pfn >= pgt_buf_start && pfn < pgt_buf_end) {
        /*
         * The PFN may not be mapped but may be hooked into the page
         * tables.  Make sure this new mapping is read-only.
         */
        pte = pte_wrprotect(pte);
    }

However, the real subtlety is page tables that are mapped at the same
time as they themselves are hooked in.

As an example, let's say pgt_buf_start (s) is on a 1 GiB boundary and
pgt_buf_top (t) is below the next 2 MiB boundary.  We're mapping memory
with 4 KiB pages from s to s + 4 MiB.  This requires a new PUD and two
new PMDs.

To map this region:

1. Allocate a new PUD. (@ e = pgt_buf_end)

2. Allocate a new PMD. (@ e + 1)

3. Fill in this PMD's PTEs.  This covers pgt_buf so sets (s, e + 1) as RO.

4. Call pmd_populate() to hook-in this PMD.  This does not set e+1 as RO
(but this is ok as it already is).

5. Allocate a new PMD  (@ e + 2)

6. Fill in this PMD's PTEs.

7. Call pmd_populate() to hook in this PMD.  This does not set e+2 as RO
as the region isn't mapped yet.

8. Call pud_populate() to hook in this PUD.  This sets e as RO but e + 2
is still RW.

9. Boom!
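
[Editorial note: the nine steps above can be condensed into a toy model:
frames are small integers, populate() write-protects a hooked-in table
only if its frame is already visible in the live mapping, and the whole
subtree only becomes visible when the PUD is hooked in at step 8. All
names and sizes are illustrative (ten frames stand in for a 2 MiB
region); this is a sketch of the ordering problem, not kernel code:]

```c
#include <string.h>

#define NFRAMES 20
enum prot { UNMAPPED, RO, RW };

static enum prot live[NFRAMES];     /* mappings the MMU actually sees */
static enum prot pending[NFRAMES];  /* PTEs in not-yet-hooked tables  */
static int pgt_buf_end;             /* advances as tables are allocated */

static int alloc_table(void) { return pgt_buf_end++; }

/* Model of pud/pmd_populate(): write-protect the hooked-in table only
 * if its frame is already present in the live mapping. */
static void populate(int pfn)
{
    if (live[pfn] != UNMAPPED)
        live[pfn] = RO;
}

/* Returns 1 if the ordering bug reproduces: pmd2 ends up hooked into
 * the active tables while its own mapping is still RW. */
static int simulate(void)
{
    int pfn, pud, pmd1, pmd2;

    memset(live, 0, sizeof(live));
    memset(pending, 0, sizeof(pending));
    pgt_buf_end = 0;

    pud  = alloc_table();            /* step 1: new PUD (frame e)     */
    pmd1 = alloc_table();            /* step 2: first PMD (frame e+1) */

    /* step 3: fill pmd1's PTEs for the first "2 MiB" (frames 0..9);
     * only frames below pgt_buf_end (== 2) are write-protected */
    for (pfn = 0; pfn < 10; pfn++)
        pending[pfn] = pfn < pgt_buf_end ? RO : RW;

    populate(pmd1);                  /* step 4: pmd1 not live yet, no-op */

    pmd2 = alloc_table();            /* step 5: second PMD (frame e+2) */
    for (pfn = 10; pfn < 20; pfn++)  /* step 6: second "2 MiB", all RW */
        pending[pfn] = RW;
    populate(pmd2);                  /* step 7: e+2 still not live, no-op */

    /* step 8: pud_populate() makes the whole subtree visible at once */
    memcpy(live, pending, sizeof(live));
    populate(pud);                   /* e is set RO ...                  */

    /* step 9: "Boom!" -- e+2 is an active page table but mapped RW */
    return live[pud] == RO && live[pmd1] == RO && live[pmd2] == RW;
}
```

The model only captures the visibility ordering; it deliberately ignores
page sizes, the fixmap path, and mask_rw_pte()'s other cases.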

It may be possible to fix up the permissions as the pages are hooked in,
i.e., if a page's entries cover (pgt_buf_start, pgt_buf_end], walk the
entries and any child tables and fix up the permissions of the leaf
entries.

This would walk the tables a few times unless we were careful to only
walk them when hooking a page into an active table.

It was fun trying to understand this, but I think I'll give up now...

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-08-16 14:33 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-14 12:24 [PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic Attilio Rao
2012-08-14 12:24 ` [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS Attilio Rao
2012-08-15 17:25   ` Stefano Stabellini
2012-08-15 17:15     ` Attilio Rao
2012-08-15 17:46       ` Stefano Stabellini
2012-08-15 18:43         ` Ian Campbell
2012-08-15 18:50           ` Attilio Rao
2012-08-15 18:47         ` Attilio Rao
2012-08-16  8:12           ` Ian Campbell
2012-08-16  9:53           ` Stefano Stabellini
2012-08-15 17:33     ` Stefano Stabellini
2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
2012-08-14 13:57   ` David Vrabel
2012-08-14 14:12     ` Attilio Rao
2012-08-15 11:19       ` David Vrabel
2012-08-15 13:55         ` Stefano Stabellini
2012-08-15 19:15           ` David Vrabel
2012-08-16 11:07             ` Stefano Stabellini
2012-08-16 14:33               ` David Vrabel
2012-08-15 17:46   ` Stefano Stabellini
