Re: [PATCH v5 1/7] powerpc/mm: update ptep_set_access_flag to not do full mm tlb flush

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@ozlabs.org>
Cc: benh@kernel.crashing.org, mpe@ellerman.id.au,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 1/7] powerpc/mm: update ptep_set_access_flag to not do full mm tlb flush
Date: Fri, 25 Nov 2016 09:49:57 +0530	[thread overview]
Message-ID: <87fumgjimq.fsf@linux.vnet.ibm.com> (raw)
In-Reply-To: <20161125024843.GA24925@fergus.ozlabs.ibm.com>

Paul Mackerras <paulus@ozlabs.org> writes:

> On Wed, Nov 23, 2016 at 04:39:57PM +0530, Aneesh Kumar K.V wrote:
>> When we are updating pte, we just need to flush the tlb mapping for
>> that pte. Right now we do a full mm flush because we don't track page
>> size. Update the interface to track the page size and use that to
>> do the right tlb flush.
> [...]
>
>> +int radix_get_mmu_psize(unsigned long page_size)
>> +{
>> +	int psize;
>> +
>> +	if (page_size == (1UL << mmu_psize_defs[mmu_virtual_psize].shift))
>> +		psize = mmu_virtual_psize;
>> +	else if (page_size == (1UL << mmu_psize_defs[MMU_PAGE_2M].shift))
>> +		psize = MMU_PAGE_2M;
>> +	else if (page_size == (1UL << mmu_psize_defs[MMU_PAGE_1G].shift))
>> +		psize = MMU_PAGE_1G;
>
> Do we actually have support for 1G pages yet?  I couldn't see where
> they get instantiated.

We use that for kernel linear mapping.

>
>> +	else
>> +		return -1;
>> +	return psize;
>> +}
>> +
>> +
>>  static int __init radix_dt_scan_page_sizes(unsigned long node,
>>  					   const char *uname, int depth,
>>  					   void *data)
>> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
>> index 911fdfb63ec1..503ae9bd3efe 100644
>> --- a/arch/powerpc/mm/pgtable.c
>> +++ b/arch/powerpc/mm/pgtable.c
>> @@ -219,12 +219,18 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>>  			  pte_t *ptep, pte_t entry, int dirty)
>>  {
>>  	int changed;
>> +	unsigned long page_size;
>> +
>>  	entry = set_access_flags_filter(entry, vma, dirty);
>>  	changed = !pte_same(*(ptep), entry);
>>  	if (changed) {
>> -		if (!is_vm_hugetlb_page(vma))
>> +		if (!is_vm_hugetlb_page(vma)) {
>> +			page_size = PAGE_SIZE;
>>  			assert_pte_locked(vma->vm_mm, address);
>> -		__ptep_set_access_flags(vma->vm_mm, ptep, entry);
>> +		} else
>> +			page_size = huge_page_size(hstate_vma(vma));
>
> I don't understand how this can work with THP.  You're determining the
> page size using only the VMA, but with a THP VMA surely we get
> different page sizes at different addresses?

That applies only for hugetlb pages. Ie, for hugetlb vma we use
vm_area_struct to determine the hugepage size.

For THP hugepage configuration we end up calling 

int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
			  pmd_t *pmdp, pmd_t entry, int dirty)

and for that we do
	__ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp), pmd_pte(entry),
				address, HPAGE_PMD_SIZE);


>
> More generally, I'm OK with adding the address parameter to
> __ptep_set_access_flags, but I think Ben's suggestion of encoding the
> page size in the PTE value is a good one.  I think it is as simple as
> the patch below (assuming we only support 2MB large pages for now).
> That would simplify things a bit and also it would mean that we are
> sure we know the page size correctly even with THP.
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 9fd77f8..e4f3581 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -32,7 +32,8 @@
>  #define _PAGE_SOFT_DIRTY	0x00000
>  #endif
>  #define _PAGE_SPECIAL		_RPAGE_SW2 /* software: special page */
> -
> +#define _PAGE_GIGANTIC		_RPAGE_SW0	/* software: 1GB page */
> +#define _PAGE_LARGE		_RPAGE_SW1	/* software: 2MB page */

I already use _RPAGE_SW1 for _PAGE_DEVMAP (pmem/nvdimm, patch set to the
list but not merged yet). We are really low on software free bits w.r.t
pte and I was avoiding using that.

I was thinking this series is a simpler cleanup of different page table
update interface to take page size as argument.


>
>  #define _PAGE_PTE		(1ul << 62)	/* distinguishes PTEs from pointers */
>  #define _PAGE_PRESENT		(1ul << 63)	/* pte contains a translation */
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index f4f437c..7ff0289 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -86,7 +86,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
>  {
>  	unsigned long pmdv;
>
> -	pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
> +	pmdv = ((pfn << PAGE_SHIFT) & PTE_RPN_MASK) | _PAGE_LARGE;
>  	return pmd_set_protbits(__pmd(pmdv), pgprot);
>  }
>

I will look at this and see if can make the patch simpler. But do we
really want to use the pte bit for this ? Aren't we low on free pte bits
?

-aneesh

next prev parent reply	other threads:[~2016-11-25  4:20 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-23 11:09 [PATCH v5 1/7] powerpc/mm: update ptep_set_access_flag to not do full mm tlb flush Aneesh Kumar K.V
2016-11-23 11:09 ` [PATCH v5 2/7] powerpc/mm: Rename hugetlb-radix.h to hugetlb.h Aneesh Kumar K.V
2016-11-23 11:09 ` [PATCH v5 3/7] powerpc/mm/hugetlb: Handle hugepage size supported by hash config Aneesh Kumar K.V
2016-11-23 14:08   ` Balbir Singh
2016-11-23 14:30     ` Aneesh Kumar K.V
2016-11-23 11:10 ` [PATCH v5 4/7] powerpc/mm/hugetlb: Make copy of huge_ptep_get_and_clear to different platform headers Aneesh Kumar K.V
2016-11-23 11:10 ` [PATCH v5 5/7] powerpc/mm/hugetlb: Switch hugetlb update to use huge_pte_update Aneesh Kumar K.V
2016-11-23 11:10 ` [PATCH v5 6/7] powerpc/mm: update pte_update to not do full mm tlb flush Aneesh Kumar K.V
2016-11-23 11:10 ` [PATCH v5 7/7] powerpc/mm: Batch tlb flush when invalidating pte entries Aneesh Kumar K.V
2016-11-23 11:23 ` [PATCH v5 1/7] powerpc/mm: update ptep_set_access_flag to not do full mm tlb flush Balbir Singh
2016-11-23 11:53   ` Aneesh Kumar K.V
2016-11-23 14:05     ` Balbir Singh
2016-11-23 14:36       ` Aneesh Kumar K.V
2016-11-23 15:21         ` Balbir Singh
2016-11-25  2:48 ` Paul Mackerras
2016-11-25  4:19   ` Aneesh Kumar K.V [this message]
2016-11-25  7:05     ` Aneesh Kumar K.V
2016-11-25  8:22       ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fumgjimq.fsf@linux.vnet.ibm.com \
    --to=aneesh.kumar@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.