* [Patch 1/3] Reclaim pmd and pte entries to quicklists.
@ 2005-02-26 14:25 Robin Holt
2005-03-01 19:23 ` David Mosberger
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Robin Holt @ 2005-02-26 14:25 UTC (permalink / raw)
To: linux-ia64
Tony,
This patch converts the pgd, pmd, and pte levels to use quicklists
by combining the alloc and free functions into a common set of routines.
This greatly simplifies the reading of this header file.
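[Editor's note: in userspace terms, the combined quicklist is just a per-CPU LIFO freelist of zeroed pages. A minimal sketch of the idea, with illustrative names only, not the kernel code:]

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_PAGE_SIZE 4096

static unsigned long *quicklist;      /* head of the freelist */
static unsigned long quicklist_size;  /* pages currently cached */

static void *quicklist_alloc(void)
{
	unsigned long *ret = quicklist;

	if (ret) {
		/* the first word of a free page links to the next free page */
		quicklist = (unsigned long *)*ret;
		ret[0] = 0;            /* re-zero the link word */
		--quicklist_size;
		return ret;
	}
	/* slow path: fresh zeroed page from the allocator */
	return calloc(1, FAKE_PAGE_SIZE);
}

static void quicklist_free(void *page)
{
	*(unsigned long *)page = (unsigned long)quicklist;
	quicklist = page;
	++quicklist_size;
}
```

A freed page comes back zeroed on reuse because its table entries were already cleared before the free; only the link word needs re-zeroing.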
I ran a full lmbench benchmark before and after this change and did not
see a significant change in performance on most tests. There is, however,
a marked improvement in the lat_proc fork+exit and fork+execve runs.
Signed-off-by: Robin Holt <holt@sgi.com>
Before:
Process fork+exit: 255.0909 microseconds
Process fork+exit: 254.5238 microseconds
Process fork+exit: 254.5714 microseconds
Process fork+exit: 260.8636 microseconds
Process fork+exit: 248.9091 microseconds
Process fork+exit: 254.0952 microseconds
Process fork+exit: 252.1364 microseconds
Process fork+exit: 251.3478 microseconds
Process fork+exit: 250.5217 microseconds
Process fork+exit: 249.5238 microseconds
Process fork+execve: 258.0500 microseconds
Process fork+execve: 258.5500 microseconds
Process fork+execve: 259.2500 microseconds
Process fork+execve: 255.5000 microseconds
Process fork+execve: 254.6500 microseconds
Process fork+execve: 262.0455 microseconds
Process fork+execve: 257.2500 microseconds
Process fork+execve: 256.0909 microseconds
Process fork+execve: 258.4500 microseconds
Process fork+execve: 258.4000 microseconds
After:
Process fork+exit: 184.2333 microseconds
Process fork+exit: 184.7241 microseconds
Process fork+exit: 184.0333 microseconds
Process fork+exit: 185.6667 microseconds
Process fork+exit: 185.4000 microseconds
Process fork+exit: 184.6000 microseconds
Process fork+exit: 184.1333 microseconds
Process fork+exit: 184.3667 microseconds
Process fork+exit: 184.7667 microseconds
Process fork+exit: 183.7097 microseconds
Process fork+execve: 188.5172 microseconds
Process fork+execve: 190.0000 microseconds
Process fork+execve: 189.7931 microseconds
Process fork+execve: 190.2414 microseconds
Process fork+execve: 190.5517 microseconds
Process fork+execve: 190.5172 microseconds
Process fork+execve: 191.0000 microseconds
Process fork+execve: 189.9310 microseconds
Process fork+execve: 191.2069 microseconds
Process fork+execve: 190.8276 microseconds
arch/ia64/mm/contig.c | 2
arch/ia64/mm/discontig.c | 2
arch/ia64/mm/init.c | 9 +---
include/asm-ia64/pgalloc.h | 95 +++++++++++++------------------------------
include/asm-ia64/processor.h | 5 --
5 files changed, 36 insertions(+), 77 deletions(-)
Index: linux-2.6/arch/ia64/mm/discontig.c
===================================================================
--- linux-2.6.orig/arch/ia64/mm/discontig.c 2005-02-25 17:14:45.712380229 -0600
+++ linux-2.6/arch/ia64/mm/discontig.c 2005-02-26 07:25:48.590951585 -0600
@@ -582,7 +582,7 @@
printk("%d reserved pages\n", total_reserved);
printk("%d pages shared\n", total_shared);
printk("%d pages swap cached\n", total_cached);
- printk("Total of %ld pages in page table cache\n", pgtable_cache_size);
+ printk("Total of %ld pages in page table cache\n", local_cpu_data->pgtable_quicklist_size);
printk("%d free buffer pages\n", nr_free_buffer_pages());
}
Index: linux-2.6/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.orig/arch/ia64/mm/init.c 2005-02-25 17:14:45.713356780 -0600
+++ linux-2.6/arch/ia64/mm/init.c 2005-02-26 07:36:18.933000631 -0600
@@ -64,13 +64,10 @@
high = pgt_cache_water[1];
preempt_disable();
- if (pgtable_cache_size > (u64) high) {
+ if (local_cpu_data->pgtable_quicklist_size > (u64) high) {
do {
- if (pgd_quicklist)
- free_page((unsigned long)pgd_alloc_one_fast(NULL));
- if (pmd_quicklist)
- free_page((unsigned long)pmd_alloc_one_fast(NULL, 0));
- } while (pgtable_cache_size > (u64) low);
+ free_page((unsigned long)pgtable_quicklist_alloc());
+ } while (local_cpu_data->pgtable_quicklist_size > (u64) low);
}
preempt_enable();
}
Index: linux-2.6/include/asm-ia64/pgalloc.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/pgalloc.h 2005-02-25 17:14:45.714333331 -0600
+++ linux-2.6/include/asm-ia64/pgalloc.h 2005-02-26 07:31:43.557396415 -0600
@@ -23,57 +23,49 @@
#include <asm/mmu_context.h>
#include <asm/processor.h>
-/*
- * Very stupidly, we used to get new pgd's and pmd's, init their contents
- * to point to the NULL versions of the next level page table, later on
- * completely re-init them the same way, then free them up. This wasted
- * a lot of work and caused unnecessary memory traffic. How broken...
- * We fix this by caching them.
- */
-#define pgd_quicklist (local_cpu_data->pgd_quick)
-#define pmd_quicklist (local_cpu_data->pmd_quick)
-#define pgtable_cache_size (local_cpu_data->pgtable_cache_sz)
-static inline pgd_t*
-pgd_alloc_one_fast (struct mm_struct *mm)
+
+static inline void*
+pgtable_quicklist_alloc(void)
{
unsigned long *ret = NULL;
preempt_disable();
- ret = pgd_quicklist;
+ ret = local_cpu_data->pgtable_quicklist;
if (likely(ret != NULL)) {
- pgd_quicklist = (unsigned long *)(*ret);
+ local_cpu_data->pgtable_quicklist = (unsigned long *)(*ret);
ret[0] = 0;
- --pgtable_cache_size;
- } else
- ret = NULL;
+ --local_cpu_data->pgtable_quicklist_size;
+ } else {
+ ret = (unsigned long *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
+ }
preempt_enable();
- return (pgd_t *) ret;
+ return ret;
+}
+
+static inline void
+pgtable_quicklist_free (void *pgtable_entry)
+{
+ preempt_disable();
+ *(unsigned long *)pgtable_entry = (unsigned long) local_cpu_data->pgtable_quicklist;
+ local_cpu_data->pgtable_quicklist = (unsigned long *) pgtable_entry;
+ ++local_cpu_data->pgtable_quicklist_size;
+ preempt_enable();
}
static inline pgd_t*
pgd_alloc (struct mm_struct *mm)
{
- /* the VM system never calls pgd_alloc_one_fast(), so we do it here. */
- pgd_t *pgd = pgd_alloc_one_fast(mm);
-
- if (unlikely(pgd == NULL)) {
- pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
- }
- return pgd;
+ return pgtable_quicklist_alloc();
}
static inline void
pgd_free (pgd_t *pgd)
{
- preempt_disable();
- *(unsigned long *)pgd = (unsigned long) pgd_quicklist;
- pgd_quicklist = (unsigned long *) pgd;
- ++pgtable_cache_size;
- preempt_enable();
+ pgtable_quicklist_free(pgd);
}
static inline void
@@ -83,40 +75,15 @@
}
static inline pmd_t*
-pmd_alloc_one_fast (struct mm_struct *mm, unsigned long addr)
-{
- unsigned long *ret = NULL;
-
- preempt_disable();
-
- ret = (unsigned long *)pmd_quicklist;
- if (likely(ret != NULL)) {
- pmd_quicklist = (unsigned long *)(*ret);
- ret[0] = 0;
- --pgtable_cache_size;
- }
-
- preempt_enable();
-
- return (pmd_t *)ret;
-}
-
-static inline pmd_t*
pmd_alloc_one (struct mm_struct *mm, unsigned long addr)
{
- pmd_t *pmd = (pmd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-
- return pmd;
+ return pgtable_quicklist_alloc();
}
static inline void
pmd_free (pmd_t *pmd)
{
- preempt_disable();
- *(unsigned long *)pmd = (unsigned long) pmd_quicklist;
- pmd_quicklist = (unsigned long *) pmd;
- ++pgtable_cache_size;
- preempt_enable();
+ pgtable_quicklist_free(pmd);
}
#define __pmd_free_tlb(tlb, pmd) pmd_free(pmd)
@@ -136,32 +103,28 @@
static inline struct page *
pte_alloc_one (struct mm_struct *mm, unsigned long addr)
{
- struct page *pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
-
- return pte;
+ return virt_to_page(pgtable_quicklist_alloc());
}
static inline pte_t *
pte_alloc_one_kernel (struct mm_struct *mm, unsigned long addr)
{
- pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-
- return pte;
+ return pgtable_quicklist_alloc();
}
static inline void
pte_free (struct page *pte)
{
- __free_page(pte);
+ pgtable_quicklist_free(page_address(pte));
}
static inline void
pte_free_kernel (pte_t *pte)
{
- free_page((unsigned long) pte);
+ pgtable_quicklist_free(pte);
}
-#define __pte_free_tlb(tlb, pte) tlb_remove_page((tlb), (pte))
+#define __pte_free_tlb(tlb, pte) pte_free(pte)
extern void check_pgt_cache (void);
Index: linux-2.6/include/asm-ia64/processor.h
===================================================================
--- linux-2.6.orig/include/asm-ia64/processor.h 2005-02-25 17:14:45.714333331 -0600
+++ linux-2.6/include/asm-ia64/processor.h 2005-02-26 07:25:48.785285191 -0600
@@ -145,9 +145,8 @@
__u64 nsec_per_cyc; /* (1000000000<<IA64_NSEC_PER_CYC_SHIFT)/itc_freq */
__u64 unimpl_va_mask; /* mask of unimplemented virtual address bits (from PAL) */
__u64 unimpl_pa_mask; /* mask of unimplemented physical address bits (from PAL) */
- __u64 *pgd_quick;
- __u64 *pmd_quick;
- __u64 pgtable_cache_sz;
+ __u64 *pgtable_quicklist;
+ __u64 pgtable_quicklist_size;
__u64 itc_freq; /* frequency of ITC counter */
__u64 proc_freq; /* frequency of processor */
__u64 cyc_per_usec; /* itc_freq/1000000 */
Index: linux-2.6/arch/ia64/mm/contig.c
===================================================================
--- linux-2.6.orig/arch/ia64/mm/contig.c 2005-02-25 17:14:45.713356780 -0600
+++ linux-2.6/arch/ia64/mm/contig.c 2005-02-26 07:25:48.817511367 -0600
@@ -61,7 +61,7 @@
printk("%d reserved pages\n", reserved);
printk("%d pages shared\n", shared);
printk("%d pages swap cached\n", cached);
- printk("%ld pages in page table cache\n", pgtable_cache_size);
+ printk("%ld pages in page table cache\n", local_cpu_data->pgtable_quicklist_size);
}
/* physical address where the bootmem map is located */
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
@ 2005-03-01 19:23 ` David Mosberger
2005-03-01 21:10 ` Robin Holt
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2005-03-01 19:23 UTC (permalink / raw)
To: linux-ia64
>>>>> On Sat, 26 Feb 2005 08:25:41 -0600, Robin Holt <holt@sgi.com> said:
Robin> Tony, This patch introduces using the quicklists for pgd,
Robin> pmd, and pte levels by combining the alloc and free functions
Robin> into a common set of routines. This greatly simplifies the
Robin> reading of this header file.
Agreed, but now you implicitly assume that the directories at all
levels are the same size. I think it would be better to pass the
desired size to the alloc routine such that you can at least fail
noisily if somebody ever tried to use different-size directories.
Robin> Before:
Robin> Process fork+exit: 255.0909 microseconds
Robin> After:
Robin> Process fork+exit: 184.2333 microseconds
Nice.
Robin> + ret = local_cpu_data->pgtable_quicklist;
Is there any reason pgtable_quicklist isn't a normal per-CPU variable?
We didn't use to have that facility, that's why we used to keep it in
local_cpu_data, but nowadays, there should be no need for that ugly
hack.
Since this is performance-critical, it's probably worthwhile to use
__ia64_per_cpu_var() to access the quicklist. That way, the address
of pgtable_quicklist can be obtained with a single "addl" instruction.
--david
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
2005-03-01 19:23 ` David Mosberger
@ 2005-03-01 21:10 ` Robin Holt
2005-03-02 17:36 ` David Mosberger
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Robin Holt @ 2005-03-01 21:10 UTC (permalink / raw)
To: linux-ia64
On Tue, Mar 01, 2005 at 11:23:21AM -0800, David Mosberger wrote:
> >>>>> On Sat, 26 Feb 2005 08:25:41 -0600, Robin Holt <holt@sgi.com> said:
>
> Robin> Tony, This patch introduces using the quicklists for pgd,
> Robin> pmd, and pte levels by combining the alloc and free functions
> Robin> into a common set of routines. This greatly simplifies the
> Robin> reading of this header file.
>
> Agreed, but now you implicitly assume that the directories at all
> levels are the same size. I think it would be better to pass the
> desired size to the alloc routine such that you can at least fail
> noisily if somebody ever tried to use different-size directories.
Can I skip this for now? The old code assumed a page size allocation
as does the new. When I go to implement the 4-level page tables,
we can come back and address this concern then.
>
> Robin> Before:
> Robin> Process fork+exit: 255.0909 microseconds
>
> Robin> After:
> Robin> Process fork+exit: 184.2333 microseconds
>
> Nice.
>
> Robin> + ret = local_cpu_data->pgtable_quicklist;
>
> Is there any reason pgtable_quicklist isn't a normal per-CPU variable?
> We didn't use to have that facility, that's why we used to keep it in
> local_cpu_data, but nowadays, there should be no need for that ugly
> hack.
>
> Since this is performance-critical, it's probably worthwhile to use
> __ia64_per_cpu_var() to access the quicklist. That way, the address
> of pgtable_quicklist can be obtained with a single "addl" instruction.
I made this change. One thing I noticed is that in contig.c and
discontig.c we print out the number of pages in the quicklists. Should I be
totalling that for all cpus and printing the summation? Is there a
better way than for_each_online_cpu() (or whatever it is called)?
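[Editor's note: the totalling Robin asks about amounts to summing one counter per CPU. A hypothetical userspace model, with the per-CPU data reduced to a plain array standing in for for_each_online_cpu():]

```c
#include <assert.h>

#define NR_FAKE_CPUS 4

/* stand-in for each CPU's pgtable_quicklist_size field */
static unsigned long pgtable_quicklist_size[NR_FAKE_CPUS] = { 3, 0, 7, 2 };

static unsigned long pgtable_quicklist_total_size(void)
{
	unsigned long total = 0;
	int cpu;

	/* in the kernel this loop would be for_each_online_cpu(cpu) */
	for (cpu = 0; cpu < NR_FAKE_CPUS; cpu++)
		total += pgtable_quicklist_size[cpu];
	return total;
}
```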
Thanks,
Robin
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
2005-03-01 19:23 ` David Mosberger
2005-03-01 21:10 ` Robin Holt
@ 2005-03-02 17:36 ` David Mosberger
2005-03-02 18:33 ` Robin Holt
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2005-03-02 17:36 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 1 Mar 2005 15:10:54 -0600, Robin Holt <holt@sgi.com> said:
>> Agreed, but now you implicitly assume that the directories at
>> all levels are the same size. I think it would be better to pass
>> the desired size to the alloc routine such that you can at least
>> fail noisily if somebody ever tried to use different-size
>> directories.
Robin> Can I skip this for now? The old code assumed a page size
Robin> allocation as does the new. When I go to implement the
Robin> 4-level page tables, we can come back and address this
Robin> concern then.
I wasn't suggesting to support multiple sizes, just to add a
BUG_ON(size != PAGE_SIZE) or something like that.
>> Since this is performance-critical, it's probably worthwhile to
>> use __ia64_per_cpu_var() to access the quicklist. That way, the
>> address of pgtable_quicklist can be obtained with a single "addl"
>> instruction.
Robin> I made this change. One thing I noticed is in contig.c and
Robin> discontig.c, we print out the number of pages in the
Robin> quicklists. Should I be totalling that for all cpus and
Robin> printing the summation?
I think so.
Robin> Is there a better way than for_each_online_cpu() (or whatever
Robin> it is called)?
Not that I know of. It's hardly a performance-critical operation, so
I don't see any issues.
--david
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
` (2 preceding siblings ...)
2005-03-02 17:36 ` David Mosberger
@ 2005-03-02 18:33 ` Robin Holt
2005-03-02 18:43 ` David Mosberger
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Robin Holt @ 2005-03-02 18:33 UTC (permalink / raw)
To: linux-ia64
> I wasn't suggesting to support multiple sizes, just to add a
> BUG_ON(size != PAGE_SIZE) or something like that.
What is the motivation for this? I have not changed anything in the
pgtable allocation/free path that changes the allocation size in any way.
I don't see what issue this patch raises. The sizes are constants
defined in header files. Can we just put in a #error instead?
I am just confused as to what we are trying to accomplish.
Thanks,
Robin
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
` (3 preceding siblings ...)
2005-03-02 18:33 ` Robin Holt
@ 2005-03-02 18:43 ` David Mosberger
2005-03-02 20:44 ` Robin Holt
2005-03-02 21:21 ` David Mosberger
6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2005-03-02 18:43 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 2 Mar 2005 12:33:25 -0600, Robin Holt <holt@sgi.com> said:
>> I wasn't suggesting to support multiple sizes, just to add a
>> BUG_ON(size != PAGE_SIZE) or something like that.
Robin> What is the motivation for this?
To make the kernel more resilient against inconsistencies. Sure, it
won't be perfect, but while mucking with it anyway, why not improve
it? Me thinks you'd want something along the lines of:
pgd_alloc:
...
pgd = alloc_one_fast(mm, PTRS_PER_PGD * sizeof(pgd_t));
...
Isn't this clearer?
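[Editor's note: a userspace sketch of the size-carrying interface David describes, where the allocator fails noisily if a caller ever asks for a directory that is not exactly one page. All names are illustrative:]

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_PAGE_SIZE 4096

static void *pgtable_quicklist_alloc_sized(size_t size)
{
	/* the userspace analogue of BUG_ON(size != PAGE_SIZE) */
	assert(size == FAKE_PAGE_SIZE);
	return calloc(1, FAKE_PAGE_SIZE);
}
```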
--david
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
` (4 preceding siblings ...)
2005-03-02 18:43 ` David Mosberger
@ 2005-03-02 20:44 ` Robin Holt
2005-03-02 21:21 ` David Mosberger
6 siblings, 0 replies; 8+ messages in thread
From: Robin Holt @ 2005-03-02 20:44 UTC (permalink / raw)
To: linux-ia64
On Wed, Mar 02, 2005 at 10:43:41AM -0800, David Mosberger wrote:
> >>>>> On Wed, 2 Mar 2005 12:33:25 -0600, Robin Holt <holt@sgi.com> said:
>
> >> I wasn't suggesting to support multiple sizes, just to add a
> >> BUG_ON(size != PAGE_SIZE) or something like that.
>
> Robin> What is the motivation for this?
>
> To make the kernel more resilient against inconsistencies. Sure, it
> won't be perfect, but while mucking with it anyway, why not improve
> it? Me thinks you'd want something along the lines of:
>
> pgd_alloc:
>
> ...
> pgd = alloc_one_fast(mm, PTRS_PER_PGD * sizeof(pgd_t));
Since the size of page tables does not change on a running system,
would you accept three BUG_ONs in mm/init.c that check pgd, pmd, and
pte allocation sizes? That will be clearer and definitely not have any
chance of impacting runtime.
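[Editor's note: since the directory sizes are compile-time constants, the three checks Robin proposes reduce to one-time assertions at init. An illustrative sketch; the constants below are stand-ins, not the real ia64 values:]

```c
#include <assert.h>

#define FAKE_PAGE_SIZE 4096
#define PTRS_PER_PGD (FAKE_PAGE_SIZE / sizeof(unsigned long))
#define PTRS_PER_PMD (FAKE_PAGE_SIZE / sizeof(unsigned long))
#define PTRS_PER_PTE (FAKE_PAGE_SIZE / sizeof(unsigned long))

static void check_pgtable_sizes(void)
{
	/* in the kernel these would be BUG_ON(...) in mm/init.c */
	assert(PTRS_PER_PGD * sizeof(unsigned long) == FAKE_PAGE_SIZE);
	assert(PTRS_PER_PMD * sizeof(unsigned long) == FAKE_PAGE_SIZE);
	assert(PTRS_PER_PTE * sizeof(unsigned long) == FAKE_PAGE_SIZE);
}
```

Run once at boot, this catches a size mismatch before any quicklist page is ever shared between levels.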
Thanks,
Robin
* Re: [Patch 1/3] Reclaim pmd and pte entries to quicklists.
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
` (5 preceding siblings ...)
2005-03-02 20:44 ` Robin Holt
@ 2005-03-02 21:21 ` David Mosberger
6 siblings, 0 replies; 8+ messages in thread
From: David Mosberger @ 2005-03-02 21:21 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 2 Mar 2005 14:44:37 -0600, Robin Holt <holt@sgi.com> said:
>> pgd_alloc:
>> ... pgd = alloc_one_fast(mm, PTRS_PER_PGD * sizeof(pgd_t));
Robin> Since the size of page tables does not change on a running
Robin> system, would you accept three BUG_ONs in mm/init.c that
Robin> check pgd, pmd, and pte allocation sizes? That will be
Robin> clearer and definitely not have any chance of impacting
Robin> runtime.
I suppose that's OK. I still think the above interface is clearer,
but understand that the impact would be bigger on the existing patch.
--david
Thread overview: 8+ messages
2005-02-26 14:25 [Patch 1/3] Reclaim pmd and pte entries to quicklists Robin Holt
2005-03-01 19:23 ` David Mosberger
2005-03-01 21:10 ` Robin Holt
2005-03-02 17:36 ` David Mosberger
2005-03-02 18:33 ` Robin Holt
2005-03-02 18:43 ` David Mosberger
2005-03-02 20:44 ` Robin Holt
2005-03-02 21:21 ` David Mosberger