linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <dwg@au1.ibm.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: paulus@samba.org, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org
Subject: Re: [PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage
Date: Wed, 10 Apr 2013 17:04:03 +1000	[thread overview]
Message-ID: <20130410070403.GH8165@truffula.fritz.box> (raw)
In-Reply-To: <8738uyq4om.fsf@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 4253 bytes --]

On Wed, Apr 10, 2013 at 11:59:29AM +0530, Aneesh Kumar K.V wrote:
> David Gibson <dwg@au1.ibm.com> writes:
> > On Thu, Apr 04, 2013 at 11:27:44AM +0530, Aneesh Kumar K.V wrote:
[snip]
> >> @@ -97,13 +100,45 @@ void __destroy_context(int context_id)
> >>  }
> >>  EXPORT_SYMBOL_GPL(__destroy_context);
> >>  
> >> +#ifdef CONFIG_PPC_64K_PAGES
> >> +static void destroy_pagetable_page(struct mm_struct *mm)
> >> +{
> >> +	int count;
> >> +	struct page *page;
> >> +
> >> +	page = mm->context.pgtable_page;
> >> +	if (!page)
> >> +		return;
> >> +
> >> +	/* drop all the pending references */
> >> +	count = atomic_read(&page->_mapcount) + 1;
> >> +	/* We allow PTE_FRAG_NR(16) fragments from a PTE page */
> >> +	count = atomic_sub_return(16 - count, &page->_count);
> >
> > You should really move PTE_FRAG_NR to a header so you can actually use
> > it here rather than hard coding 16.
> >
> > It took me a fair while to convince myself that there is no race here
> > with something altering mapcount and count between the atomic_read()
> > and the atomic_sub_return().  It could do with a comment to explain
> > why that is safe.
> >
> > Re-using the mapcount field for your index also seems odd, and it took
> > me a while to convince myself that that's safe too.  Wouldn't it be
> > simpler to store a pointer to the next sub-page in the mm_context
> > instead? You can get from that to the struct page easily enough with a
> > shift and pfn_to_page().
> 
> I found using _mapcount simpler in this case. I was looking at it not
> as an index, but rather how may fragments are mapped/used already.

Except that it's actually (#fragments - 1).  Using subpage pointer
makes the fragments calculation (very slightly) harder, but the
calculation of the table address easier.  More importantly it avoids
adding effectively an extra variable - which is then shoehorned into a
structure not really designed to hold it.

> Using
> subpage pointer in mm->context.xyz means, we have to calculate the
> number of fragments used/mapped via the pointer. We need the fragment
> count so that we can drop page reference count correctly here.
> 
> 
> >
> >> +	if (!count) {
> >> +		pgtable_page_dtor(page);
> >> +		reset_page_mapcount(page);
> >> +		free_hot_cold_page(page, 0);
> >
> > It would be nice to use put_page() somehow instead of duplicating its
> > logic, though I realise the sparc code you've based this on does the
> > same thing.
> 
> That is not exactly put_page. We can avoid lots of check in this
> specific case.

[snip]
> >> +static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
> >> +{
> >> +	pte_t *ret = NULL;
> >> +	struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
> >> +				       __GFP_REPEAT | __GFP_ZERO);
> >> +	if (!page)
> >> +		return NULL;
> >> +
> >> +	spin_lock(&mm->page_table_lock);
> >> +	/*
> >> +	 * If we find pgtable_page set, we return
> >> +	 * the allocated page with single fragement
> >> +	 * count.
> >> +	 */
> >> +	if (likely(!mm->context.pgtable_page)) {
> >> +		atomic_set(&page->_count, PTE_FRAG_NR);
> >> +		atomic_set(&page->_mapcount, 0);
> >> +		mm->context.pgtable_page = page;
> >> +	}
> >
> > .. and in the unlikely case where there *is* a pgtable_page already
> > set, what then?  Seems like you should BUG_ON, or at least return NULL
> > - as it is you will return the first sub-page of that page again,
> > which is very likely in use.
> 
> 
> As explained in the comment above, we return with the allocated page
> with fragment count set to 1. So we end up having only one fragment. The
> other option I had was to to free the allocated page and do a
> get_from_cache under the page_table_lock. But since we already allocated
> the page, why not use that ?. It also keep the code similar to
> sparc.

My point is that I can't see any circumstance under which we should
ever hit this case.  Which means if we do something is badly messed up
and we should BUG() (or at least WARN()).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

  reply	other threads:[~2013-04-10  7:04 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-04  5:57 [PATCH -V5 00/25] THP support for PPC64 Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 01/25] powerpc: Use signed formatting when printing error Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 02/25] powerpc: Save DAR and DSISR in pt_regs on MCE Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 03/25] powerpc: Don't hard code the size of pte page Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 04/25] powerpc: Reduce the PTE_INDEX_SIZE Aneesh Kumar K.V
2013-04-11  7:10   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 05/25] powerpc: Move the pte free routines from common header Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage Aneesh Kumar K.V
2013-04-10  4:46   ` David Gibson
2013-04-10  6:29     ` Aneesh Kumar K.V
2013-04-10  7:04       ` David Gibson [this message]
2013-04-10  7:53         ` Aneesh Kumar K.V
2013-04-10 17:47           ` Aneesh Kumar K.V
2013-04-11  1:20             ` David Gibson
2013-04-11  1:12           ` David Gibson
2013-04-10  7:14   ` Michael Ellerman
2013-04-10  7:54     ` Aneesh Kumar K.V
2013-04-10  8:52       ` Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 07/25] powerpc: Use encode avpn where we need only avpn values Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 08/25] powerpc: Decode the pte-lp-encoding bits correctly Aneesh Kumar K.V
2013-04-10  7:19   ` David Gibson
2013-04-10  8:11     ` Aneesh Kumar K.V
2013-04-10 17:49       ` Aneesh Kumar K.V
2013-04-11  1:28       ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 09/25] powerpc: Fix hpte_decode to use the correct decoding for page sizes Aneesh Kumar K.V
2013-04-11  3:20   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 10/25] powerpc: print both base and actual page size on hash failure Aneesh Kumar K.V
2013-04-11  3:21   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 11/25] powerpc: Print page size info during boot Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 12/25] powerpc: Return all the valid pte ecndoing in KVM_PPC_GET_SMMU_INFO ioctl Aneesh Kumar K.V
2013-04-11  3:24   ` David Gibson
2013-04-11  5:11     ` Aneesh Kumar K.V
2013-04-11  5:57       ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 13/25] powerpc: Update tlbie/tlbiel as per ISA doc Aneesh Kumar K.V
2013-04-11  3:30   ` David Gibson
2013-04-11  5:20     ` Aneesh Kumar K.V
2013-04-11  6:16       ` David Gibson
2013-04-11  6:36         ` Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 14/25] mm/THP: HPAGE_SHIFT is not a #define on some arch Aneesh Kumar K.V
2013-04-11  3:36   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 15/25] mm/THP: Add pmd args to pgtable deposit and withdraw APIs Aneesh Kumar K.V
2013-04-11  3:40   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 16/25] mm/THP: withdraw the pgtable after pmdp related operations Aneesh Kumar K.V
2013-04-04  5:57 ` [PATCH -V5 17/25] powerpc/THP: Implement transparent hugepages for ppc64 Aneesh Kumar K.V
2013-04-11  5:38   ` David Gibson
2013-04-11  7:40     ` Aneesh Kumar K.V
2013-04-12  0:51       ` David Gibson
2013-04-12  5:06         ` Aneesh Kumar K.V
2013-04-12  5:39           ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 18/25] powerpc/THP: Double the PMD table size for THP Aneesh Kumar K.V
2013-04-11  6:18   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 19/25] powerpc/THP: Differentiate THP PMD entries from HUGETLB PMD entries Aneesh Kumar K.V
2013-04-10  7:21   ` Michael Ellerman
2013-04-10 18:26     ` Aneesh Kumar K.V
2013-04-12  1:28   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 20/25] powerpc/THP: Add code to handle HPTE faults for large pages Aneesh Kumar K.V
2013-04-12  4:01   ` David Gibson
2013-04-04  5:57 ` [PATCH -V5 21/25] powerpc: Handle hugepage in perf callchain Aneesh Kumar K.V
2013-04-12  1:34   ` David Gibson
2013-04-12  5:05     ` Aneesh Kumar K.V
2013-04-04  5:58 ` [PATCH -V5 22/25] powerpc/THP: get_user_pages_fast changes Aneesh Kumar K.V
2013-04-12  1:41   ` David Gibson
2013-04-04  5:58 ` [PATCH -V5 23/25] powerpc/THP: Enable THP on PPC64 Aneesh Kumar K.V
2013-04-04  5:58 ` [PATCH -V5 24/25] powerpc: Optimize hugepage invalidate Aneesh Kumar K.V
2013-04-12  4:21   ` David Gibson
2013-04-14 10:02     ` Aneesh Kumar K.V
2013-04-15  1:18       ` David Gibson
2013-04-04  5:58 ` [PATCH -V5 25/25] powerpc: Handle hugepages in kvm Aneesh Kumar K.V
2013-04-04  6:00 ` [PATCH -V5 00/25] THP support for PPC64 Simon Jeons
2013-04-04  6:10   ` Aneesh Kumar K.V
2013-04-04  6:14 ` Simon Jeons
2013-04-04  8:38   ` Aneesh Kumar K.V
2013-04-19  1:55 ` Simon Jeons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130410070403.GH8165@truffula.fritz.box \
    --to=dwg@au1.ibm.com \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).