public inbox for linux-kernel@vger.kernel.org
* [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: Fenghua Yu @ 2014-12-17 21:47 UTC
  To: Thomas Gleixner, H. Peter Anvin, Ingo Molnar, Glenn Williamson
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

On 32-bit x86 machines and kernels that use PAE paging, Linux currently
wastes about 4K of memory per process, because we have to reserve an entire
page to hold a single 256-byte PGD structure.  It would be a very good thing
if we could eliminate that waste.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/mm/pgtable.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 6fb6927..695db92 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -1,5 +1,6 @@
 #include <linux/mm.h>
 #include <linux/gfp.h>
+#include <linux/slab.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
 #include <asm/tlb.h>
@@ -271,12 +272,46 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
 	}
 }
 
+/*
+ * Xen paravirt assumes the pgd table fits in one page. The 64-bit pgd
+ * also needs to be one page.
+ *
+ * But PAE without Xen only needs to allocate 32 bytes for the pgd.
+ *
+ * So if the kernel is built for PAE without Xen, we allocate 32 bytes
+ * for the pgd entries to save memory.
+ *
+ * In other cases, one page is allocated for the pgd. In theory, a
+ * PAE kernel that is not running in Xen could also allocate 32 bytes
+ * for the pgd, but that would make allocation and freeing more complex
+ * without being useful in practice. To simplify the code and testing,
+ * we just allocate one page whenever CONFIG_XEN is enabled, regardless
+ * of whether the kernel is actually running in Xen.
+ */
+static inline pgd_t *_pgd_alloc(void)
+{
+#if defined(CONFIG_X86_PAE) && !defined(CONFIG_XEN)
+	return kmalloc(sizeof(pgdval_t) * PTRS_PER_PGD, PGALLOC_GFP);
+#else
+	return (pgd_t *)__get_free_page(PGALLOC_GFP);
+#endif
+}
+
+static inline void _pgd_free(pgd_t *pgd)
+{
+#if defined(CONFIG_X86_PAE) && !defined(CONFIG_XEN)
+	kfree(pgd);
+#else
+	free_page((unsigned long)pgd);
+#endif
+}
+
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
 	pmd_t *pmds[PREALLOCATED_PMDS];
 
-	pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);
+	pgd = _pgd_alloc();
 
 	if (pgd == NULL)
 		goto out;
@@ -306,7 +341,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 out_free_pmds:
 	free_pmds(pmds);
 out_free_pgd:
-	free_page((unsigned long)pgd);
+	_pgd_free(pgd);
 out:
 	return NULL;
 }
@@ -316,7 +351,8 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	pgd_mop_up_pmds(mm, pgd);
 	pgd_dtor(pgd);
 	paravirt_pgd_free(mm, pgd);
-	free_page((unsigned long)pgd);
+	_pgd_free(pgd);
+
 }
 
 /*
-- 
1.8.1.2



* Re: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: Dave Hansen @ 2014-12-18  0:25 UTC
  To: Fenghua Yu, Thomas Gleixner, H. Peter Anvin, Ingo Molnar,
	Glenn Williamson
  Cc: linux-kernel, x86, Christoph Lameter

On 12/17/2014 01:47 PM, Fenghua Yu wrote:
> +static inline pgd_t *_pgd_alloc(void)
> +{
> +#if defined(CONFIG_X86_PAE) && !defined(CONFIG_XEN)
> +	return kmalloc(sizeof(pgdval_t) * PTRS_PER_PGD, PGALLOC_GFP);
> +#else
> +	return (pgd_t *)__get_free_page(PGALLOC_GFP);
> +#endif
> +}

I'm looking at:

	"Figure 4-7. Formats of CR3 and Paging-Structure Entries with
	 PAE Paging"

in the SDM.  It makes it pretty clear that the lower 5 bits of cr3 are
ignored in PAE mode.  That means we have to be 32-byte (or greater)
aligned, right?  Does kmalloc() guarantee that?

IOW, do *ALL* of the sl*b allocators in all of their forms with all of
their debugging options guarantee 32-byte alignment when allocating
256-byte objects?

I know we at least try to align to a cacheline, which would be good
enough, but I'm fuzzy on what we *guarantee*.
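Dave's concern can be stated as a simple check. Under PAE paging the CPU ignores CR3 bits 4:0 (per the SDM figure he cites), so the page-directory-pointer table is effectively located at `cr3 & ~0x1F`. The user-space sketch below models only that masking; `pae_pdpt_base()` and `pae_pdpt_addr_ok()` are hypothetical helpers, not kernel functions. An address that is only 8-byte aligned gets silently rounded down to a different address:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model, not kernel code: with PAE paging, CR3 bits 4:0
 * are ignored, so the CPU locates the PDPT at (cr3 & ~0x1F).
 */
#define PAE_CR3_IGNORED_MASK 0x1Fu

/* Physical address the CPU would actually use as the PDPT base. */
static uint32_t pae_pdpt_base(uint32_t cr3)
{
	return cr3 & ~PAE_CR3_IGNORED_MASK;
}

/*
 * A pgd/PDPT address is only safe to load into CR3 if the masking
 * leaves it unchanged, i.e. if it is 32-byte aligned.
 */
static bool pae_pdpt_addr_ok(uint32_t phys)
{
	return pae_pdpt_base(phys) == phys;
}
```

For example, `pae_pdpt_addr_ok(0x00123440)` is true, while an object at `0x00123448` would be loaded as `0x00123440`, pointing the CPU at the wrong table.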


* Re: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: Christoph Lameter @ 2014-12-18 14:52 UTC
  To: Dave Hansen
  Cc: Fenghua Yu, Thomas Gleixner, H. Peter Anvin, Ingo Molnar,
	Glenn Williamson, linux-kernel, x86

On Wed, 17 Dec 2014, Dave Hansen wrote:

> IOW, do *ALL* of the sl*b allocators in all of their forms with all of
> their debugging options guarantee 32-byte alignment when allocating
> 256-byte objects?

No. For that the arch has to set a macro called ARCH_KMALLOC_MINALIGN or
ARCH_DMA_MINALIGN. Default alignment is to a word boundary.

> I know we at least try to align to a cacheline, which would be good
> enough, but I'm fuzzy on what we *guarantee*.

Sorry, we only do this if it is requested for a slab cache.



* Re: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: Dave Hansen @ 2014-12-18 15:36 UTC
  To: Christoph Lameter
  Cc: Fenghua Yu, Thomas Gleixner, H. Peter Anvin, Ingo Molnar,
	Glenn Williamson, linux-kernel, x86

On 12/18/2014 06:52 AM, Christoph Lameter wrote:
> On Wed, 17 Dec 2014, Dave Hansen wrote:
>> IOW, do *ALL* of the sl*b allocators in all of their forms with all of
>> their debugging options guarantee 32-byte alignment when allocating
>> 256-byte objects?
> No. For that the arch has to set a macro called ARCH_KMALLOC_MINALIGN or
> ARCH_DMA_MINALIGN. Default alignment is to a word boundary.

OK, sounds like this is going to need its own slab.  It will make the
patch a bit bigger but shouldn't be all that much more complicated.
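A dedicated slab amounts to a cache whose objects all share one size and one alignment. In the kernel such a cache would be created with kmem_cache_create(), which takes an explicit alignment argument; the user-space sketch below only models the idea, using hypothetical names (`struct pgd_cache`, `pgd_cache_alloc()`, and so on) and aligned_alloc() in place of the page allocator. Carving a page-aligned 4K page into 32-byte objects makes every object 32-byte aligned and packs 128 PGDs into one page:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical values mirroring PAE: 4 entries of 8 bytes each. */
#define PGD_OBJ_SIZE   32u
#define PGD_OBJ_ALIGN  32u
#define CACHE_PAGE     4096u

/* Free objects are chained through their own first word. */
struct free_obj {
	struct free_obj *next;
};

struct pgd_cache {
	void *page;             /* backing page, page-aligned */
	struct free_obj *free;  /* singly linked free list */
};

/* Carve one page into equally sized, equally aligned objects. */
static int pgd_cache_init(struct pgd_cache *c)
{
	/*
	 * The page is 4096-byte aligned and PGD_OBJ_SIZE is a multiple of
	 * PGD_OBJ_ALIGN, so every object offset keeps 32-byte alignment.
	 */
	c->page = aligned_alloc(CACHE_PAGE, CACHE_PAGE);
	if (!c->page)
		return -1;
	c->free = NULL;
	for (size_t off = 0; off + PGD_OBJ_SIZE <= CACHE_PAGE; off += PGD_OBJ_SIZE) {
		struct free_obj *o = (struct free_obj *)((char *)c->page + off);
		o->next = c->free;
		c->free = o;
	}
	return 0;
}

/* Pop an object off the free list; NULL when the page is exhausted. */
static void *pgd_cache_alloc(struct pgd_cache *c)
{
	struct free_obj *o = c->free;

	if (o)
		c->free = o->next;
	return o;
}

/* Push an object back onto the free list. */
static void pgd_cache_free(struct pgd_cache *c, void *p)
{
	struct free_obj *o = p;

	o->next = c->free;
	c->free = o;
}

static void pgd_cache_destroy(struct pgd_cache *c)
{
	free(c->page);
	c->page = NULL;
	c->free = NULL;
}
```

Because the free list is LIFO, an object released with `pgd_cache_free()` is the next one handed out by `pgd_cache_alloc()`. A real slab cache adds per-CPU lists, multiple pages and debugging, but the alignment guarantee comes from the same size/alignment arithmetic.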



* Re: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: H. Peter Anvin @ 2014-12-18 17:25 UTC
  To: Fenghua Yu, Thomas Gleixner, H. Peter Anvin, Ingo Molnar,
	Glenn Williamson
  Cc: linux-kernel, x86

On 12/17/2014 01:47 PM, Fenghua Yu wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> X86 32-bit machine and kernel use PAE paging, which currently wastes about
> 4K of memory per process on Linux where we have to reserve an entire page to
> support a single 256-byte PGD structure.  It would be a very good thing if
> we could eliminate that wastage.
> 

4*8 = 32 bytes, where did 256 bytes come from?

	-hpa
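hpa's arithmetic can be checked directly: the PAE top-level table (the PDPT) has 4 entries of 8 bytes each. The sketch below mirrors the patch's `sizeof(pgdval_t) * PTRS_PER_PGD` expression using locally defined stand-ins; `PAE_PTRS_PER_PGD`, `pae_pgdval_t` and `X86_PAGE_SIZE` are hypothetical names, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Values for x86-32 with PAE, hard-coded here for illustration. */
#define PAE_PTRS_PER_PGD 4u     /* the PDPT has 4 entries */
typedef uint64_t pae_pgdval_t;  /* each entry is 64 bits */
#define X86_PAGE_SIZE    4096u

/* Bytes actually needed for the PAE top-level table. */
static size_t pae_pgd_bytes(void)
{
	return sizeof(pae_pgdval_t) * PAE_PTRS_PER_PGD;
}

/* Bytes left unused when the table is given a whole page. */
static size_t pae_pgd_page_waste(void)
{
	return X86_PAGE_SIZE - pae_pgd_bytes();
}
```

With these values `pae_pgd_bytes()` is 32 and `pae_pgd_page_waste()` is 4064, so the per-process saving is a little under 4K either way; only the 256-byte figure in the description is off.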




* RE: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: Yu, Fenghua @ 2014-12-18 17:51 UTC
  To: H. Peter Anvin, Thomas Gleixner, H. Peter Anvin, Ingo Molnar,
	Williamson, Glenn P
  Cc: linux-kernel, x86

> Sent: Thursday, December 18, 2014 9:26 AM
> To: Yu, Fenghua; Thomas Gleixner; H. Peter Anvin; Ingo Molnar; Williamson,
> Glenn P
> Cc: linux-kernel; x86
> Subject: Re: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
> 
> On 12/17/2014 01:47 PM, Fenghua Yu wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> >
> > X86 32-bit machine and kernel use PAE paging, which currently wastes
> > about 4K of memory per process on Linux where we have to reserve an
> > entire page to support a single 256-byte PGD structure.  It would be a
> > very good thing if we could eliminate that wastage.
> >
> 
> 4*8 = 32 bytes, where did 256 bytes come from?

You are right. It should be 32 bytes. I will change the wording in a future patch. The actual calculation in the code is right, though.

Thanks.

-Fenghua


* Re: [PATCH v2] X86-32: Allocate 256 bytes for pgd in PAE paging
From: H. Peter Anvin @ 2014-12-18 18:00 UTC
  To: Yu, Fenghua, Thomas Gleixner, H. Peter Anvin, Ingo Molnar,
	Williamson, Glenn P
  Cc: linux-kernel, x86

>> 4*8 = 32 bytes, where did 256 bytes come from?
> 
> You are right. It should be 32 bytes. I will change the wording in a future patch. The actual calculation in the code is right, though.

I don't know if it makes sense to round up to a cache line.  My
suspicion is that it won't matter, as these fields will be read-mostly;
on the other hand, wasting 32 bytes isn't exactly a problem, either.

	-hpa
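To put numbers on this trade-off, assume a 64-byte cache line (typical of x86 parts, but an assumption here) and a dedicated cache that packs pgds into pages. Padding each 32-byte pgd out to a full line costs 32 extra bytes per process and halves how many pgds share a page. `ALIGN_UP`, `PGD_BYTES`, `CACHELINE` and `PAGE_BYTES` are illustrative names, not kernel symbols.

```c
#include <stddef.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

#define PGD_BYTES   32u   /* 4 PAE entries x 8 bytes */
#define CACHELINE   64u   /* assumed x86 cache-line size */
#define PAGE_BYTES  4096u

/* Per-object footprint when each pgd is padded to 'align' bytes. */
static size_t pgd_footprint(size_t align)
{
	return ALIGN_UP((size_t)PGD_BYTES, align);
}

/* How many pgds fit in one page at that footprint. */
static size_t pgds_per_page(size_t align)
{
	return PAGE_BYTES / pgd_footprint(align);
}
```

Here `pgd_footprint(32)` is 32 and `pgd_footprint(CACHELINE)` is 64, giving 128 versus 64 pgds per page. Both are still far smaller than the one-page-per-pgd layout the patch replaces.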


