xen-devel.lists.xenproject.org archive mirror
* Re: [PATCHv4 28/33] x86/mm: add support of additional page table level during early boot
       [not found] ` <20170306135357.3124-29-kirill.shutemov@linux.intel.com>
@ 2017-03-06 20:05   ` Boris Ostrovsky
       [not found]   ` <7e78a76a-f5e8-bb60-e5be-a91a84faa1f9@oracle.com>
  1 sibling, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2017-03-06 20:05 UTC (permalink / raw)
  To: Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, Arnd Bergmann, H. Peter Anvin
  Cc: linux-arch, Juergen Gross, Andi Kleen, linux-mm, linux-kernel,
	Andy Lutomirski, Dave Hansen, xen-devel


> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 9991224f6238..c9e41f1599dd 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -14,15 +14,17 @@
>  #include <linux/bitops.h>
>  #include <linux/threads.h>
>  
> +extern p4d_t level4_kernel_pgt[512];
> +extern p4d_t level4_ident_pgt[512];
>  extern pud_t level3_kernel_pgt[512];
>  extern pud_t level3_ident_pgt[512];
>  extern pmd_t level2_kernel_pgt[512];
>  extern pmd_t level2_fixmap_pgt[512];
>  extern pmd_t level2_ident_pgt[512];
>  extern pte_t level1_fixmap_pgt[512];
> -extern pgd_t init_level4_pgt[];
> +extern pgd_t init_top_pgt[];
>  
> -#define swapper_pg_dir init_level4_pgt
> +#define swapper_pg_dir init_top_pgt
>  
>  extern void paging_init(void);
>  


This means you also need


diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
index 5e24671..e1a5fbe 100644
--- a/arch/x86/xen/xen-pvh.S
+++ b/arch/x86/xen/xen-pvh.S
@@ -87,7 +87,7 @@ ENTRY(pvh_start_xen)
        wrmsr
 
        /* Enable pre-constructed page tables. */
-       mov $_pa(init_level4_pgt), %eax
+       mov $_pa(init_top_pgt), %eax
        mov %eax, %cr3
        mov $(X86_CR0_PG | X86_CR0_PE), %eax
        mov %eax, %cr0


-boris



* Re: [PATCHv4 28/33] x86/mm: add support of additional page table level during early boot
       [not found]   ` <7e78a76a-f5e8-bb60-e5be-a91a84faa1f9@oracle.com>
@ 2017-03-06 20:23     ` Kirill A. Shutemov
  0 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2017-03-06 20:23 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-arch, Juergen Gross, Andi Kleen, Arnd Bergmann, linux-mm,
	x86, linux-kernel, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, xen-devel, Linus Torvalds,
	Thomas Gleixner, Kirill A. Shutemov

On Mon, Mar 06, 2017 at 03:05:49PM -0500, Boris Ostrovsky wrote:
> 
> > diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> > index 9991224f6238..c9e41f1599dd 100644
> > --- a/arch/x86/include/asm/pgtable_64.h
> > +++ b/arch/x86/include/asm/pgtable_64.h
> > @@ -14,15 +14,17 @@
> >  #include <linux/bitops.h>
> >  #include <linux/threads.h>
> >  
> > +extern p4d_t level4_kernel_pgt[512];
> > +extern p4d_t level4_ident_pgt[512];
> >  extern pud_t level3_kernel_pgt[512];
> >  extern pud_t level3_ident_pgt[512];
> >  extern pmd_t level2_kernel_pgt[512];
> >  extern pmd_t level2_fixmap_pgt[512];
> >  extern pmd_t level2_ident_pgt[512];
> >  extern pte_t level1_fixmap_pgt[512];
> > -extern pgd_t init_level4_pgt[];
> > +extern pgd_t init_top_pgt[];
> >  
> > -#define swapper_pg_dir init_level4_pgt
> > +#define swapper_pg_dir init_top_pgt
> >  
> >  extern void paging_init(void);
> >  
> 
> 
> This means you also need
> 
> 
> diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/xen/xen-pvh.S
> index 5e24671..e1a5fbe 100644
> --- a/arch/x86/xen/xen-pvh.S
> +++ b/arch/x86/xen/xen-pvh.S
> @@ -87,7 +87,7 @@ ENTRY(pvh_start_xen)
>         wrmsr
>  
>         /* Enable pre-constructed page tables. */
> -       mov $_pa(init_level4_pgt), %eax
> +       mov $_pa(init_top_pgt), %eax
>         mov %eax, %cr3
>         mov $(X86_CR0_PG | X86_CR0_PE), %eax
>         mov %eax, %cr0
> 
> 

Ah. Thanks. I've missed that.

The fix is folded.

-- 
 Kirill A. Shutemov


* Re: [PATCHv4 18/33] x86/xen: convert __xen_pgd_walk() and xen_cleanmfnmap() to support p4d
       [not found] ` <20170306135357.3124-19-kirill.shutemov@linux.intel.com>
@ 2017-03-06 20:48   ` Boris Ostrovsky
       [not found]   ` <ab2868ea-1dd1-d51b-4c5a-921ef5c9a427@oracle.com>
  1 sibling, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2017-03-06 20:48 UTC (permalink / raw)
  To: Kirill A. Shutemov, Linus Torvalds, Andrew Morton, x86,
	Thomas Gleixner, Ingo Molnar, Arnd Bergmann, H. Peter Anvin
  Cc: linux-arch, Juergen Gross, Andi Kleen, linux-mm, linux-kernel,
	Andy Lutomirski, Dave Hansen, Xiong Zhang, xen-devel


> +static int xen_p4d_walk(struct mm_struct *mm, p4d_t *p4d,
> +		int (*func)(struct mm_struct *mm, struct page *, enum pt_level),
> +		bool last, unsigned long limit)
> +{
> +	int i, nr, flush = 0;
> +
> +	nr = last ? p4d_index(limit) + 1 : PTRS_PER_P4D;
> +	for (i = 0; i < nr; i++) {
> +		pud_t *pud;
> +
> +		if (p4d_none(p4d[i]))
> +			continue;
> +
> +		pud = pud_offset(&p4d[i], 0);
> +		if (PTRS_PER_PUD > 1)
> +			flush |= (*func)(mm, virt_to_page(pud), PT_PUD);
> +		xen_pud_walk(mm, pud, func, last && i == nr - 1, limit);
> +	}
> +	return flush;
> +}

..

> +		p4d = p4d_offset(&pgd[i], 0);
> +		if (PTRS_PER_P4D > 1)
> +			flush |= (*func)(mm, virt_to_page(p4d), PT_P4D);
> +		xen_p4d_walk(mm, p4d, func, i == nr - 1, limit);


We are losing flush status at all levels so we need something like

flush |= xen_XXX_walk(...)
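
e.g. at the p4d call site quoted above (and likewise at the pud and pmd
call sites; just spelling out the pattern):

flush |= xen_p4d_walk(mm, p4d, func, i == nr - 1, limit);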



>  	}
>  
> -out:
>  	/* Do the top level last, so that the callbacks can use it as
>  	   a cue to do final things like tlb flushes. */
>  	flush |= (*func)(mm, virt_to_page(pgd), PT_PGD);
> @@ -1150,57 +1161,97 @@ static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl, bool unpin)
>  	xen_free_ro_pages(pa, PAGE_SIZE);
>  }
>  
> +static void __init xen_cleanmfnmap_pmd(pmd_t *pmd, bool unpin)
> +{
> +	unsigned long pa;
> +	pte_t *pte_tbl;
> +	int i;
> +
> +	if (pmd_large(*pmd)) {
> +		pa = pmd_val(*pmd) & PHYSICAL_PAGE_MASK;
> +		xen_free_ro_pages(pa, PMD_SIZE);
> +		return;
> +	}
> +
> +	pte_tbl = pte_offset_kernel(pmd, 0);
> +	for (i = 0; i < PTRS_PER_PTE; i++) {
> +		if (pte_none(pte_tbl[i]))
> +			continue;
> +		pa = pte_pfn(pte_tbl[i]) << PAGE_SHIFT;
> +		xen_free_ro_pages(pa, PAGE_SIZE);
> +	}
> +	set_pmd(pmd, __pmd(0));
> +	xen_cleanmfnmap_free_pgtbl(pte_tbl, unpin);
> +}
> +
> +static void __init xen_cleanmfnmap_pud(pud_t *pud, bool unpin)
> +{
> +	unsigned long pa;
> +	pmd_t *pmd_tbl;
> +	int i;
> +
> +	if (pud_large(*pud)) {
> +		pa = pud_val(*pud) & PHYSICAL_PAGE_MASK;
> +		xen_free_ro_pages(pa, PUD_SIZE);
> +		return;
> +	}
> +
> +	pmd_tbl = pmd_offset(pud, 0);
> +	for (i = 0; i < PTRS_PER_PMD; i++) {
> +		if (pmd_none(pmd_tbl[i]))
> +			continue;
> +		xen_cleanmfnmap_pmd(pmd_tbl + i, unpin);
> +	}
> +	set_pud(pud, __pud(0));
> +	xen_cleanmfnmap_free_pgtbl(pmd_tbl, unpin);
> +}
> +
> +static void __init xen_cleanmfnmap_p4d(p4d_t *p4d, bool unpin)
> +{
> +	unsigned long pa;
> +	pud_t *pud_tbl;
> +	int i;
> +
> +	if (p4d_large(*p4d)) {
> +		pa = p4d_val(*p4d) & PHYSICAL_PAGE_MASK;
> +		xen_free_ro_pages(pa, P4D_SIZE);
> +		return;
> +	}
> +
> +	pud_tbl = pud_offset(p4d, 0);
> +	for (i = 0; i < PTRS_PER_PUD; i++) {
> +		if (pud_none(pud_tbl[i]))
> +			continue;
> +		xen_cleanmfnmap_pud(pud_tbl + i, unpin);
> +	}
> +	set_p4d(p4d, __p4d(0));
> +	xen_cleanmfnmap_free_pgtbl(pud_tbl, unpin);
> +}
> +
>  /*
>   * Since it is well isolated we can (and since it is perhaps large we should)
>   * also free the page tables mapping the initial P->M table.
>   */
>  static void __init xen_cleanmfnmap(unsigned long vaddr)
>  {
> -	unsigned long va = vaddr & PMD_MASK;
> -	unsigned long pa;
> -	pgd_t *pgd = pgd_offset_k(va);
> -	pud_t *pud_page = pud_offset(pgd, 0);
> -	pud_t *pud;
> -	pmd_t *pmd;
> -	pte_t *pte;
> +	pgd_t *pgd;
> +	p4d_t *p4d;
>  	unsigned int i;
>  	bool unpin;
>  
>  	unpin = (vaddr == 2 * PGDIR_SIZE);
> -	set_pgd(pgd, __pgd(0));
> -	do {
> -		pud = pud_page + pud_index(va);
> -		if (pud_none(*pud)) {
> -			va += PUD_SIZE;
> -		} else if (pud_large(*pud)) {
> -			pa = pud_val(*pud) & PHYSICAL_PAGE_MASK;
> -			xen_free_ro_pages(pa, PUD_SIZE);
> -			va += PUD_SIZE;
> -		} else {
> -			pmd = pmd_offset(pud, va);
> -			if (pmd_large(*pmd)) {
> -				pa = pmd_val(*pmd) & PHYSICAL_PAGE_MASK;
> -				xen_free_ro_pages(pa, PMD_SIZE);
> -			} else if (!pmd_none(*pmd)) {
> -				pte = pte_offset_kernel(pmd, va);
> -				set_pmd(pmd, __pmd(0));
> -				for (i = 0; i < PTRS_PER_PTE; ++i) {
> -					if (pte_none(pte[i]))
> -						break;
> -					pa = pte_pfn(pte[i]) << PAGE_SHIFT;
> -					xen_free_ro_pages(pa, PAGE_SIZE);
> -				}
> -				xen_cleanmfnmap_free_pgtbl(pte, unpin);
> -			}
> -			va += PMD_SIZE;
> -			if (pmd_index(va))
> -				continue;
> -			set_pud(pud, __pud(0));
> -			xen_cleanmfnmap_free_pgtbl(pmd, unpin);
> -		}
> -
> -	} while (pud_index(va) || pmd_index(va));
> -	xen_cleanmfnmap_free_pgtbl(pud_page, unpin);
> +	vaddr &= PMD_MASK;
> +	pgd = pgd_offset_k(vaddr);
> +	p4d = p4d_offset(pgd, 0);
> +	for (i = 0; i < PTRS_PER_P4D; i++) {
> +		if (p4d_none(p4d[i]))
> +			continue;
> +		xen_cleanmfnmap_p4d(p4d + i, unpin);
> +	}

Don't we need to pass vaddr down to all routines so that they select
appropriate tables? You seem to always be choosing the first one.

-boris

> +	if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
> +		set_pgd(pgd, __pgd(0));
> +		xen_cleanmfnmap_free_pgtbl(p4d, unpin);
> +	}
>  }
>  
>  static void __init xen_pagetable_p2m_free(void)
> diff --git a/arch/x86/xen/mmu.h b/arch/x86/xen/mmu.h
> index 73809bb951b4..3fe2b3292915 100644
> --- a/arch/x86/xen/mmu.h
> +++ b/arch/x86/xen/mmu.h
> @@ -5,6 +5,7 @@
>  
>  enum pt_level {
>  	PT_PGD,
> +	PT_P4D,
>  	PT_PUD,
>  	PT_PMD,
>  	PT_PTE




* Re: [PATCHv4 18/33] x86/xen: convert __xen_pgd_walk() and xen_cleanmfnmap() to support p4d
       [not found]   ` <ab2868ea-1dd1-d51b-4c5a-921ef5c9a427@oracle.com>
@ 2017-03-07 13:00     ` Kirill A. Shutemov
       [not found]     ` <20170307130009.GA2154@node>
  1 sibling, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2017-03-07 13:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: linux-arch, Juergen Gross, Andi Kleen, Arnd Bergmann, linux-mm,
	x86, linux-kernel, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	H. Peter Anvin, Xiong Zhang, Andrew Morton, xen-devel,
	Linus Torvalds, Thomas Gleixner, Kirill A. Shutemov

On Mon, Mar 06, 2017 at 03:48:24PM -0500, Boris Ostrovsky wrote:
> 
> > +static int xen_p4d_walk(struct mm_struct *mm, p4d_t *p4d,
> > +		int (*func)(struct mm_struct *mm, struct page *, enum pt_level),
> > +		bool last, unsigned long limit)
> > +{
> > +	int i, nr, flush = 0;
> > +
> > +	nr = last ? p4d_index(limit) + 1 : PTRS_PER_P4D;
> > +	for (i = 0; i < nr; i++) {
> > +		pud_t *pud;
> > +
> > +		if (p4d_none(p4d[i]))
> > +			continue;
> > +
> > +		pud = pud_offset(&p4d[i], 0);
> > +		if (PTRS_PER_PUD > 1)
> > +			flush |= (*func)(mm, virt_to_page(pud), PT_PUD);
> > +		xen_pud_walk(mm, pud, func, last && i == nr - 1, limit);
> > +	}
> > +	return flush;
> > +}
> 
> ..
> 
> > +		p4d = p4d_offset(&pgd[i], 0);
> > +		if (PTRS_PER_P4D > 1)
> > +			flush |= (*func)(mm, virt_to_page(p4d), PT_P4D);
> > +		xen_p4d_walk(mm, p4d, func, i == nr - 1, limit);
> 
> 
> We are losing flush status at all levels so we need something like
> 
> flush |= xen_XXX_walk(...)

+ Xiong.

Thanks for noticing this. The fixup is below.

Please test; I don't have a setup for this.

> 
> 
> 
> >  	}
> >  
> > -out:
> >  	/* Do the top level last, so that the callbacks can use it as
> >  	   a cue to do final things like tlb flushes. */
> >  	flush |= (*func)(mm, virt_to_page(pgd), PT_PGD);
> > @@ -1150,57 +1161,97 @@ static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl, bool unpin)
> >  	xen_free_ro_pages(pa, PAGE_SIZE);
> >  }
> >  
> > +static void __init xen_cleanmfnmap_pmd(pmd_t *pmd, bool unpin)
> > +{
> > +	unsigned long pa;
> > +	pte_t *pte_tbl;
> > +	int i;
> > +
> > +	if (pmd_large(*pmd)) {
> > +		pa = pmd_val(*pmd) & PHYSICAL_PAGE_MASK;
> > +		xen_free_ro_pages(pa, PMD_SIZE);
> > +		return;
> > +	}
> > +
> > +	pte_tbl = pte_offset_kernel(pmd, 0);
> > +	for (i = 0; i < PTRS_PER_PTE; i++) {
> > +		if (pte_none(pte_tbl[i]))
> > +			continue;
> > +		pa = pte_pfn(pte_tbl[i]) << PAGE_SHIFT;
> > +		xen_free_ro_pages(pa, PAGE_SIZE);
> > +	}
> > +	set_pmd(pmd, __pmd(0));
> > +	xen_cleanmfnmap_free_pgtbl(pte_tbl, unpin);
> > +}
> > +
> > +static void __init xen_cleanmfnmap_pud(pud_t *pud, bool unpin)
> > +{
> > +	unsigned long pa;
> > +	pmd_t *pmd_tbl;
> > +	int i;
> > +
> > +	if (pud_large(*pud)) {
> > +		pa = pud_val(*pud) & PHYSICAL_PAGE_MASK;
> > +		xen_free_ro_pages(pa, PUD_SIZE);
> > +		return;
> > +	}
> > +
> > +	pmd_tbl = pmd_offset(pud, 0);
> > +	for (i = 0; i < PTRS_PER_PMD; i++) {
> > +		if (pmd_none(pmd_tbl[i]))
> > +			continue;
> > +		xen_cleanmfnmap_pmd(pmd_tbl + i, unpin);
> > +	}
> > +	set_pud(pud, __pud(0));
> > +	xen_cleanmfnmap_free_pgtbl(pmd_tbl, unpin);
> > +}
> > +
> > +static void __init xen_cleanmfnmap_p4d(p4d_t *p4d, bool unpin)
> > +{
> > +	unsigned long pa;
> > +	pud_t *pud_tbl;
> > +	int i;
> > +
> > +	if (p4d_large(*p4d)) {
> > +		pa = p4d_val(*p4d) & PHYSICAL_PAGE_MASK;
> > +		xen_free_ro_pages(pa, P4D_SIZE);
> > +		return;
> > +	}
> > +
> > +	pud_tbl = pud_offset(p4d, 0);
> > +	for (i = 0; i < PTRS_PER_PUD; i++) {
> > +		if (pud_none(pud_tbl[i]))
> > +			continue;
> > +		xen_cleanmfnmap_pud(pud_tbl + i, unpin);
> > +	}
> > +	set_p4d(p4d, __p4d(0));
> > +	xen_cleanmfnmap_free_pgtbl(pud_tbl, unpin);
> > +}
> > +
> >  /*
> >   * Since it is well isolated we can (and since it is perhaps large we should)
> >   * also free the page tables mapping the initial P->M table.
> >   */
> >  static void __init xen_cleanmfnmap(unsigned long vaddr)
> >  {
> > -	unsigned long va = vaddr & PMD_MASK;
> > -	unsigned long pa;
> > -	pgd_t *pgd = pgd_offset_k(va);
> > -	pud_t *pud_page = pud_offset(pgd, 0);
> > -	pud_t *pud;
> > -	pmd_t *pmd;
> > -	pte_t *pte;
> > +	pgd_t *pgd;
> > +	p4d_t *p4d;
> >  	unsigned int i;
> >  	bool unpin;
> >  
> >  	unpin = (vaddr == 2 * PGDIR_SIZE);
> > -	set_pgd(pgd, __pgd(0));
> > -	do {
> > -		pud = pud_page + pud_index(va);
> > -		if (pud_none(*pud)) {
> > -			va += PUD_SIZE;
> > -		} else if (pud_large(*pud)) {
> > -			pa = pud_val(*pud) & PHYSICAL_PAGE_MASK;
> > -			xen_free_ro_pages(pa, PUD_SIZE);
> > -			va += PUD_SIZE;
> > -		} else {
> > -			pmd = pmd_offset(pud, va);
> > -			if (pmd_large(*pmd)) {
> > -				pa = pmd_val(*pmd) & PHYSICAL_PAGE_MASK;
> > -				xen_free_ro_pages(pa, PMD_SIZE);
> > -			} else if (!pmd_none(*pmd)) {
> > -				pte = pte_offset_kernel(pmd, va);
> > -				set_pmd(pmd, __pmd(0));
> > -				for (i = 0; i < PTRS_PER_PTE; ++i) {
> > -					if (pte_none(pte[i]))
> > -						break;
> > -					pa = pte_pfn(pte[i]) << PAGE_SHIFT;
> > -					xen_free_ro_pages(pa, PAGE_SIZE);
> > -				}
> > -				xen_cleanmfnmap_free_pgtbl(pte, unpin);
> > -			}
> > -			va += PMD_SIZE;
> > -			if (pmd_index(va))
> > -				continue;
> > -			set_pud(pud, __pud(0));
> > -			xen_cleanmfnmap_free_pgtbl(pmd, unpin);
> > -		}
> > -
> > -	} while (pud_index(va) || pmd_index(va));
> > -	xen_cleanmfnmap_free_pgtbl(pud_page, unpin);
> > +	vaddr &= PMD_MASK;
> > +	pgd = pgd_offset_k(vaddr);
> > +	p4d = p4d_offset(pgd, 0);
> > +	for (i = 0; i < PTRS_PER_P4D; i++) {
> > +		if (p4d_none(p4d[i]))
> > +			continue;
> > +		xen_cleanmfnmap_p4d(p4d + i, unpin);
> > +	}
> 
> Don't we need to pass vaddr down to all routines so that they select
> appropriate tables? You seem to always be choosing the first one.

IIUC, we clear the whole page table subtree covered by one pgd entry.
So, no, there's no need to pass vaddr down. Just a pointer to the page table
entry is enough.

But I know virtually nothing about Xen. Please re-check my reasoning.

I would also appreciate help with getting the x86 Xen code to work with 5-level
paging enabled. For now I make CONFIG_XEN dependent on !CONFIG_X86_5LEVEL.
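
I.e. roughly this in arch/x86/xen/Kconfig (just a sketch, with the existing
options elided):

config XEN
	bool "Xen guest support"
	# ... existing dependencies/selects unchanged ...
	# temporary, until Xen PV can cope with 5-level page tables
	depends on !X86_5LEVEL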

Fixup:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index a4079cfab007..d66b7e79781a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -629,7 +629,8 @@ static int xen_pud_walk(struct mm_struct *mm, pud_t *pud,
 		pmd = pmd_offset(&pud[i], 0);
 		if (PTRS_PER_PMD > 1)
 			flush |= (*func)(mm, virt_to_page(pmd), PT_PMD);
-		xen_pmd_walk(mm, pmd, func, last && i == nr - 1, limit);
+		flush |= xen_pmd_walk(mm, pmd, func,
+				last && i == nr - 1, limit);
 	}
 	return flush;
 }
@@ -650,7 +651,8 @@ static int xen_p4d_walk(struct mm_struct *mm, p4d_t *p4d,
 		pud = pud_offset(&p4d[i], 0);
 		if (PTRS_PER_PUD > 1)
 			flush |= (*func)(mm, virt_to_page(pud), PT_PUD);
-		xen_pud_walk(mm, pud, func, last && i == nr - 1, limit);
+		flush |= xen_pud_walk(mm, pud, func,
+				last && i == nr - 1, limit);
 	}
 	return flush;
 }
@@ -706,7 +708,7 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd,
 		p4d = p4d_offset(&pgd[i], 0);
 		if (PTRS_PER_P4D > 1)
 			flush |= (*func)(mm, virt_to_page(p4d), PT_P4D);
-		xen_p4d_walk(mm, p4d, func, i == nr - 1, limit);
+		flush |= xen_p4d_walk(mm, p4d, func, i == nr - 1, limit);
 	}
 
 	/* Do the top level last, so that the callbacks can use it as
-- 
 Kirill A. Shutemov


* Re: [PATCHv4 18/33] x86/xen: convert __xen_pgd_walk() and xen_cleanmfnmap() to support p4d
       [not found]     ` <20170307130009.GA2154@node>
@ 2017-03-07 18:18       ` Boris Ostrovsky
       [not found]       ` <8bd7d5b7-7a22-a0a2-8eff-e909a1c6783e@oracle.com>
  1 sibling, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2017-03-07 18:18 UTC (permalink / raw)
  To: Kirill A. Shutemov, Zhang, Xiong Y
  Cc: linux-arch, Juergen Gross, Andi Kleen, Arnd Bergmann, linux-mm,
	x86, linux-kernel, Andy Lutomirski, Dave Hansen, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, xen-devel, Linus Torvalds,
	Thomas Gleixner, Kirill A. Shutemov


>> Don't we need to pass vaddr down to all routines so that they select
>> appropriate tables? You seem to always be choosing the first one.
> IIUC, we clear the whole page table subtree covered by one pgd entry.
> So, no, there's no need to pass vaddr down. Just a pointer to the page table
> entry is enough.
>
> But I know virtually nothing about Xen. Please re-check my reasoning.

Yes, we effectively remove the whole page table for vaddr so I guess
it's OK.

>
> I would also appreciate help with getting the x86 Xen code to work with 5-level
> paging enabled. For now I make CONFIG_XEN dependent on !CONFIG_X86_5LEVEL.

Hmmm... that's a problem since this requires changes in the hypervisor
and even if/when these changes are made, older versions of the hypervisor
still will not be able to run those guests.

This affects only PV guests and there is a series under review that
provides clean code separation with CONFIG_XEN_PV but because, for
example, dom0 (the Xen control domain) is PV, this will significantly limit
availability of dom0-capable kernels (because I assume distros will want
to have CONFIG_X86_5LEVEL).


>
> Fixup:

Yes, that works. (But then it worked even without this change because
problems caused by missing the flush would be intermittent. And a joy to
debug).

-boris



* Re: [PATCHv4 18/33] x86/xen: convert __xen_pgd_walk() and xen_cleanmfnmap() to support p4d
       [not found]       ` <8bd7d5b7-7a22-a0a2-8eff-e909a1c6783e@oracle.com>
@ 2017-03-07 18:26         ` Andrew Cooper
       [not found]         ` <47f06f4a-ef29-5a77-c48b-43a91a8c9579@citrix.com>
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Cooper @ 2017-03-07 18:26 UTC (permalink / raw)
  To: Boris Ostrovsky, Kirill A. Shutemov, Zhang, Xiong Y
  Cc: linux-arch, Juergen Gross, Andi Kleen, Arnd Bergmann, Dave Hansen,
	x86, linux-kernel, Andy Lutomirski, linux-mm, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, xen-devel, Linus Torvalds,
	Thomas Gleixner, Kirill A. Shutemov

On 07/03/17 18:18, Boris Ostrovsky wrote:
>>> Don't we need to pass vaddr down to all routines so that they select
>>> appropriate tables? You seem to always be choosing the first one.
>> IIUC, we clear the whole page table subtree covered by one pgd entry.
>> So, no, there's no need to pass vaddr down. Just a pointer to the page table
>> entry is enough.
>>
>> But I know virtually nothing about Xen. Please re-check my reasoning.
> Yes, we effectively remove the whole page table for vaddr so I guess
> it's OK.
>
>> I would also appreciate help with getting the x86 Xen code to work with 5-level
>> paging enabled. For now I make CONFIG_XEN dependent on !CONFIG_X86_5LEVEL.
> Hmmm... that's a problem since this requires changes in the hypervisor
> and even if/when these changes are made, older versions of the hypervisor
> still will not be able to run those guests.
>
> This affects only PV guests and there is a series under review that
> provides clean code separation with CONFIG_XEN_PV but because, for
> example, dom0 (the Xen control domain) is PV, this will significantly limit
> availability of dom0-capable kernels (because I assume distros will want
> to have CONFIG_X86_5LEVEL).

Wasn't the plan to be able to automatically detect 4 vs 5 level support,
and cope either way, so distros didn't have to ship two different builds
of Linux?

If so, all we need to do is get things to compile sensibly, and have the PV
entry code in Linux configure the rest of the kernel appropriately.

(If not, please ignore me.)

~Andrew


* Re: [PATCHv4 18/33] x86/xen: convert __xen_pgd_walk() and xen_cleanmfnmap() to support p4d
       [not found]         ` <47f06f4a-ef29-5a77-c48b-43a91a8c9579@citrix.com>
@ 2017-03-07 18:45           ` Boris Ostrovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2017-03-07 18:45 UTC (permalink / raw)
  To: Andrew Cooper, Kirill A. Shutemov, Zhang, Xiong Y
  Cc: linux-arch, Juergen Gross, Andi Kleen, Arnd Bergmann, Dave Hansen,
	x86, linux-kernel, Andy Lutomirski, linux-mm, Ingo Molnar,
	H. Peter Anvin, Andrew Morton, xen-devel, Linus Torvalds,
	Thomas Gleixner, Kirill A. Shutemov

On 03/07/2017 01:26 PM, Andrew Cooper wrote:
> On 07/03/17 18:18, Boris Ostrovsky wrote:
>>>> Don't we need to pass vaddr down to all routines so that they select
>>>> appropriate tables? You seem to always be choosing the first one.
>>> IIUC, we clear the whole page table subtree covered by one pgd entry.
>>> So, no, there's no need to pass vaddr down. Just a pointer to the page table
>>> entry is enough.
>>>
>>> But I know virtually nothing about Xen. Please re-check my reasoning.
>> Yes, we effectively remove the whole page table for vaddr so I guess
>> it's OK.
>>
>>> I would also appreciate help with getting the x86 Xen code to work with 5-level
>>> paging enabled. For now I make CONFIG_XEN dependent on !CONFIG_X86_5LEVEL.
>> Hmmm... that's a problem since this requires changes in the hypervisor
>> and even if/when these changes are made, older versions of the hypervisor
>> still will not be able to run those guests.
>>
>> This affects only PV guests and there is a series under review that
>> provides clean code separation with CONFIG_XEN_PV but because, for
>> example, dom0 (the Xen control domain) is PV, this will significantly limit
>> availability of dom0-capable kernels (because I assume distros will want
>> to have CONFIG_X86_5LEVEL).
> Wasn't the plan to be able to automatically detect 4 vs 5 level support,
> and cope either way, so distros didn't have to ship two different builds
> of Linux?
>
> If so, all we need to do is get things to compile sensibly, and have the PV
> entry code in Linux configure the rest of the kernel appropriately.

I am not aware of any plans but this would obviously be the preferred route.

-boris


