* [RFC] i386: Remove page sized slabs for pgds and pmds
@ 2007-03-28 20:12 Christoph Lameter
2007-03-28 20:45 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 20:12 UTC (permalink / raw)
To: linux-kernel; +Cc: William Lee Irwin III, akpm
The benefit of preconstructed pgds and pmds in the i386 arch code seem to
be debatable. The performance measurements indicate that there may be a slight
benefit but it seems to almost vanish in the noise ratio.
Method used (i386 1G memory):
1. Boot kernel
2. make clean
3. time make all
Results:
2.6.21-rc5:
real 8m45.505s
user 8m0.910s
sys 0m34.550s
real 8m45.780s
user 8m1.380s
sys 0m33.890s
real 8m47.247s
user 8m1.420s
sys 0m33.980s
real 8m49.382s
user 8m2.460s
sys 0m32.950s
2.6.21-rc5 with patch below:
real 8m47.352s
user 8m3.190s
sys 0m33.070s
real 8m46.747s
user 8m2.680s
sys 0m33.750s
real 8m48.987s
user 8m1.850s
sys 0m34.690s
real 8m49.341s
user 8m2.560s
sys 0m34.220s
i386 only provides support for caching constructed pgd and pmds. These are
comparatively rare to ptes so it may be no surprise that the current
approach has only minimal effect. An implementation that would also cache
ptes in their zeroed state may led to a benefit but maybe that could be
done after this patch was merged? However, such an implementation requires
modifications to the tlb shootdown logic and funky things with highmem(!).
Removal of the slabs avoids concurrent use of page structs by
the slab allocators and i386 arch code.
This means:
1. We can modify SLAB to allow debugging of page order slab caches.
2. SLUB avoids the special casing for page size slabs and also allow
debugging of all slab caches.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
Index: linux-2.6.21-rc5/arch/i386/mm/init.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/mm/init.c 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/arch/i386/mm/init.c 2007-03-28 18:23:51.000000000 +0000
@@ -695,31 +695,6 @@
EXPORT_SYMBOL_GPL(remove_memory);
#endif
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
- if (PTRS_PER_PMD > 1) {
- pmd_cache = kmem_cache_create("pmd",
- PTRS_PER_PMD*sizeof(pmd_t),
- PTRS_PER_PMD*sizeof(pmd_t),
- 0,
- pmd_ctor,
- NULL);
- if (!pmd_cache)
- panic("pgtable_cache_init(): cannot create pmd cache");
- }
- pgd_cache = kmem_cache_create("pgd",
- PTRS_PER_PGD*sizeof(pgd_t),
- PTRS_PER_PGD*sizeof(pgd_t),
- 0,
- pgd_ctor,
- PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
- if (!pgd_cache)
- panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
/*
* This function cannot be __init, since exceptions don't work in that
* section. Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc5/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/mm/pageattr.c 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/arch/i386/mm/pageattr.c 2007-03-28 18:23:51.000000000 +0000
@@ -87,24 +87,23 @@
static void set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
{
- struct page *page;
- unsigned long flags;
+ struct mm_struct *mm;
set_pte_atomic(kpte, pte); /* change init_mm */
if (PTRS_PER_PMD > 1)
return;
- spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page = (struct page *)page->index) {
- pgd_t *pgd;
+ spin_lock(&mmlist_lock);
+ list_for_each_entry(mm, &init_mm.mmlist, mmlist) {
+ pgd_t *pgd = mm->pgd;
pud_t *pud;
pmd_t *pmd;
- pgd = (pgd_t *)page_address(page) + pgd_index(address);
+
pud = pud_offset(pgd, address);
pmd = pmd_offset(pud, address);
set_pte_atomic((pte_t *)pmd, pte);
}
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&mmlist_lock);
}
/*
Index: linux-2.6.21-rc5/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/mm/pgtable.c 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/arch/i386/mm/pgtable.c 2007-03-28 18:23:51.000000000 +0000
@@ -181,109 +181,39 @@
#endif
}
-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
-{
- return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-}
-
-struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
-{
- struct page *pte;
-
-#ifdef CONFIG_HIGHPTE
- pte = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO, 0);
-#else
- pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
-#endif
- return pte;
-}
-
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
- memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
-/*
- * List of all pgd's needed for non-PAE so it can invalidate entries
- * in both cached and uncached pgd's; not needed for PAE since the
- * kernel pmd is shared. If PAE were not to share the pmd a similar
- * tactic would be needed. This is essentially codepath-based locking
- * against pageattr.c; it is the unique case in which a valid change
- * of kernel pagetables can't be lazily synchronized by vmalloc faults.
- * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
- * -- wli
- */
-DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
- struct page *page = virt_to_page(pgd);
- page->index = (unsigned long)pgd_list;
- if (pgd_list)
- set_page_private(pgd_list, (unsigned long)&page->index);
- pgd_list = page;
- set_page_private(page, (unsigned long)&pgd_list);
-}
-
-static inline void pgd_list_del(pgd_t *pgd)
-{
- struct page *next, **pprev, *page = virt_to_page(pgd);
- next = (struct page *)page->index;
- pprev = (struct page **)page_private(page);
- *pprev = next;
- if (next)
- set_page_private(next, (unsigned long)pprev);
-}
-
-void pgd_ctor(void *pgd, struct kmem_cache *cache, unsigned long unused)
-{
- unsigned long flags;
-
- if (PTRS_PER_PMD == 1) {
- memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
- spin_lock_irqsave(&pgd_lock, flags);
- }
-
- clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
- swapper_pg_dir + USER_PTRS_PER_PGD,
- KERNEL_PGD_PTRS);
-
- if (PTRS_PER_PMD > 1)
- return;
-
- /* must happen under lock */
- paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
- __pa(swapper_pg_dir) >> PAGE_SHIFT,
- USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
-
- pgd_list_add(pgd);
- spin_unlock_irqrestore(&pgd_lock, flags);
-}
-
-/* never called when PTRS_PER_PMD > 1 */
-void pgd_dtor(void *pgd, struct kmem_cache *cache, unsigned long unused)
-{
- unsigned long flags; /* can be called from interrupt context */
-
- paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
- spin_lock_irqsave(&pgd_lock, flags);
- pgd_list_del(pgd);
- spin_unlock_irqrestore(&pgd_lock, flags);
-}
+#ifdef CONFIG_HIGHMEM64G
+#define __pgd_alloc() kmem_cache_alloc(pgd_cache, GFP_KERNEL|__GFP_REPEAT)
+#define __pgd_free(pgd) kmem_cache_free(pgd_cache, pgd)
+
+static struct kmem_cache *pgd_cache;
+
+void __init pgtable_cache_init(void)
+{
+ pgd_cache = kmem_cache_create("pgd",
+ PTRS_PER_PGD*sizeof(pgd_t),
+ PTRS_PER_PGD*sizeof(pgd_t),
+ SLAB_PANIC,
+ NULL,
+ NULL);
+}
+#else /* !CONFIG_HIGHMEM64G */
+#define __pgd_alloc() ((pgd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT))
+#define __pgd_free(pgd) free_page((unsigned long)(pgd))
+#endif /* !CONFIG_HIGHMEM64G */
pgd_t *pgd_alloc(struct mm_struct *mm)
{
int i;
- pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
+ pgd_t *pgd = __pgd_alloc();
- if (PTRS_PER_PMD == 1 || !pgd)
+ if (!pgd)
+ return NULL;
+ memcpy(&pgd[USER_PTRS_PER_PGD], &swapper_pg_dir[USER_PTRS_PER_PGD],
+ KERNEL_PGD_PTRS*sizeof(pgd_t));
+ if (PTRS_PER_PMD == 1)
return pgd;
-
for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
- pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+ pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
if (!pmd)
goto out_oom;
paravirt_alloc_pd(__pa(pmd) >> PAGE_SHIFT);
@@ -296,9 +226,9 @@
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ free_page((unsigned long)pmd);
}
- kmem_cache_free(pgd_cache, pgd);
+ __pgd_free(pgd);
return NULL;
}
@@ -312,8 +242,8 @@
pgd_t pgdent = pgd[i];
void* pmd = (void *)__va(pgd_val(pgdent)-1);
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
- kmem_cache_free(pmd_cache, pmd);
+ free_page((unsigned long)pmd);
}
/* in the non-PAE case, free_pgtables() clears user pgd entries */
- kmem_cache_free(pgd_cache, pgd);
+ __pgd_free(pgd);
}
Index: linux-2.6.21-rc5/include/asm-i386/pgalloc.h
===================================================================
--- linux-2.6.21-rc5.orig/include/asm-i386/pgalloc.h 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/include/asm-i386/pgalloc.h 2007-03-28 18:23:51.000000000 +0000
@@ -36,8 +36,22 @@
extern pgd_t *pgd_alloc(struct mm_struct *);
extern void pgd_free(pgd_t *pgd);
-extern pte_t *pte_alloc_one_kernel(struct mm_struct *, unsigned long);
-extern struct page *pte_alloc_one(struct mm_struct *, unsigned long);
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long uvaddr)
+{
+ return (pte_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+}
+
+#ifdef CONFIG_HIGHPTE
+static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long uvaddr)
+{
+ return alloc_page(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO);
+}
+#else /* !CONFIG_HIGHPTE */
+static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long uvaddr)
+{
+ return alloc_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+}
+#endif /* !CONFIG_HIGHPTE */
static inline void pte_free_kernel(pte_t *pte)
{
Index: linux-2.6.21-rc5/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.21-rc5.orig/include/asm-i386/pgtable.h 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/include/asm-i386/pgtable.h 2007-03-28 18:23:51.000000000 +0000
@@ -35,15 +35,6 @@
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
extern unsigned long empty_zero_page[1024];
extern pgd_t swapper_pg_dir[1024];
-extern struct kmem_cache *pgd_cache;
-extern struct kmem_cache *pmd_cache;
-extern spinlock_t pgd_lock;
-extern struct page *pgd_list;
-
-void pmd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_ctor(void *, struct kmem_cache *, unsigned long);
-void pgd_dtor(void *, struct kmem_cache *, unsigned long);
-void pgtable_cache_init(void);
void paging_init(void);
/*
Index: linux-2.6.21-rc5/include/asm-i386/pgtable-2level.h
===================================================================
--- linux-2.6.21-rc5.orig/include/asm-i386/pgtable-2level.h 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/include/asm-i386/pgtable-2level.h 2007-03-28 18:23:51.000000000 +0000
@@ -67,5 +67,6 @@
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
void vmalloc_sync_all(void);
+#define pgtable_cache_init() do { } while (0)
#endif /* _I386_PGTABLE_2LEVEL_H */
Index: linux-2.6.21-rc5/include/asm-i386/pgtable-3level.h
===================================================================
--- linux-2.6.21-rc5.orig/include/asm-i386/pgtable-3level.h 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/include/asm-i386/pgtable-3level.h 2007-03-28 18:23:51.000000000 +0000
@@ -188,5 +188,6 @@
#define __pmd_free_tlb(tlb, x) do { } while (0)
#define vmalloc_sync_all() ((void)0)
+void pgtable_cache_init(void);
#endif /* _I386_PGTABLE_3LEVEL_H */
Index: linux-2.6.21-rc5/arch/i386/mm/fault.c
===================================================================
--- linux-2.6.21-rc5.orig/arch/i386/mm/fault.c 2007-03-25 22:56:23.000000000 +0000
+++ linux-2.6.21-rc5/arch/i386/mm/fault.c 2007-03-28 18:23:51.000000000 +0000
@@ -604,19 +604,19 @@
BUILD_BUG_ON(TASK_SIZE & ~PGDIR_MASK);
for (address = start; address >= TASK_SIZE; address += PGDIR_SIZE) {
if (!test_bit(pgd_index(address), insync)) {
- unsigned long flags;
- struct page *page;
+ struct mm_struct *mm;
+ int broken = 0;
- spin_lock_irqsave(&pgd_lock, flags);
- for (page = pgd_list; page; page =
- (struct page *)page->index)
- if (!vmalloc_sync_one(page_address(page),
- address)) {
- BUG_ON(page != pgd_list);
- break;
- }
- spin_unlock_irqrestore(&pgd_lock, flags);
- if (!page)
+ spin_lock(&mmlist_lock);
+ list_for_each_entry(mm, &init_mm.mmlist, mmlist) {
+ if (vmalloc_sync_one(mm->pgd, address))
+ continue;
+ BUG_ON(mm->mmlist.prev != &init_mm.mmlist);
+ broken = 1;
+ break;
+ }
+ spin_unlock(&mmlist_lock);
+ if (!broken)
set_bit(pgd_index(address), insync);
}
if (address == start && test_bit(pgd_index(address), insync))
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 20:12 [RFC] i386: Remove page sized slabs for pgds and pmds Christoph Lameter
@ 2007-03-28 20:45 ` Andrew Morton
2007-03-28 21:05 ` Christoph Lameter
2007-03-28 20:50 ` William Lee Irwin III
2007-03-28 22:26 ` Chris Wright
2 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2007-03-28 20:45 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, William Lee Irwin III
On Wed, 28 Mar 2007 13:12:56 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> The benefit of preconstructed pgds and pmds in the i386 arch code seem to
> be debatable. The performance measurements indicate that there may be a slight
> benefit but it seems to almost vanish in the noise ratio.
yeah, I did a couple hours testing with rc5, rc5+quicklists,
rc5+quicklists+qlhack, where "qlhack" is my patch which removes the
quicklists and mysteriously causes i386 to crash.
I was testing 8-way x86_64 with patch-scripts (wildly fork-intensive),
kernel build and lmbench.
Basically I was unable to observe any statistically meaningful-looking
differences between any of them.
I would like to test qlhack on ia64, because ia64 puts pte-pages into the
quicklists as well, but my ia64 machine is presently indisposed.
My (much) preferred way to handle all this is to remove the quicklists from
all architectures: just use alloc_page() and free_page() for pte pages, pmd
pages, etc. I remain unaware of any testing whcih indicates that this
would be a bad change.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 20:45 ` Andrew Morton
@ 2007-03-28 21:05 ` Christoph Lameter
2007-03-28 21:16 ` William Lee Irwin III
2007-03-29 1:17 ` Benjamin LaHaise
0 siblings, 2 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 21:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, William Lee Irwin III
On Wed, 28 Mar 2007, Andrew Morton wrote:
> I would like to test qlhack on ia64, because ia64 puts pte-pages into the
> quicklists as well, but my ia64 machine is presently indisposed.
Its probably easier to set the number of cached pages to one.
> My (much) preferred way to handle all this is to remove the quicklists from
> all architectures: just use alloc_page() and free_page() for pte pages, pmd
> pages, etc. I remain unaware of any testing whcih indicates that this
> would be a bad change.
Tried this also on x86_64 with an enhanced quicklist patch that also deals
with ptes (at the price of not guaranteeing the free after the tlb flush):
quicklist:
real 8m56.787s
user 7m46.380s
sys 0m55.910s
real 8m58.326s
user 7m46.030s
sys 0m56.250s
real 8m56.900s
user 7m46.620s
sys 0m55.570s
no quicklist:
real 9m7.798s
user 7m48.340s
sys 0m58.360s
real 8m58.994s
user 7m47.980s
sys 0m54.870s
real 8m58.533s
user 7m48.030s
sys 0m54.150s
Seems that there is a slight benefit but its also barely above noise
level.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:05 ` Christoph Lameter
@ 2007-03-28 21:16 ` William Lee Irwin III
2007-03-28 21:20 ` Christoph Lameter
2007-03-29 1:17 ` Benjamin LaHaise
1 sibling, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-28 21:16 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel
[... kernel compile garbage snipped ...]
On Wed, Mar 28, 2007 at 02:05:54PM -0700, Christoph Lameter wrote:
> Seems that there is a slight benefit but its also barely above noise
> level.
I already went over the methodological issues with kernel compiles.
You may be able to prove this, but not this way.
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:16 ` William Lee Irwin III
@ 2007-03-28 21:20 ` Christoph Lameter
2007-03-28 22:37 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 21:20 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
> On Wed, Mar 28, 2007 at 02:05:54PM -0700, Christoph Lameter wrote:
> > Seems that there is a slight benefit but its also barely above noise
> > level.
>
> I already went over the methodological issues with kernel compiles.
> You may be able to prove this, but not this way.
But this way is an established kernel way of doing things. Seems that my
AIM9 stuff was not convincing and I am not sure what other tests would be
acceptable. Could you post some of data regarding the improvements
possible through your patches?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:20 ` Christoph Lameter
@ 2007-03-28 22:37 ` William Lee Irwin III
2007-03-28 23:44 ` Christoph Lameter
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-28 22:37 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
>> I already went over the methodological issues with kernel compiles.
>> You may be able to prove this, but not this way.
On Wed, Mar 28, 2007 at 02:20:20PM -0700, Christoph Lameter wrote:
> But this way is an established kernel way of doing things. Seems that my
> AIM9 stuff was not convincing and I am not sure what other tests would be
> acceptable. Could you post some of data regarding the improvements
> possible through your patches?
What I did, I did a number of years ago. Even if I could find the
results (and I don't even recall order-of-magnitude estimates) they
would be effectively irrelevant to modern kernels. The disaster in
all this was that the PTE caching never got merged. It's not much of
an observation to note that the primarily bottleneck is still there
when the patch to resolve it never got merged.
As far as kernel compiles being relevant to anything besides
potentially optimizing a particular major benchmark using gcc as one
of its components... yeah, right. It's too macro to be a microbenchmark
of anything and too micro to be pertinent to any meaningful
macrobenchmark such as those from major benchmark publishers (who can't
be named for trademark/etc. reasons). Hasn't it been at least 5 years
since people figured out kernel compiles were complete bulls**t as
benchmarks along with dbench for other reasons and several others? If
not, I don't know why I bother with this kernel at all.
Even so, I already did this and am done with it. It's not like I'm
not carrying around numerous patches I know will never be merged all
the time anyway. If you want to back it all out so badly, just do it
and stop bothering me about it, and I'll merely continue maintaining my
local patches without ever posting them as I have been for years. I'm
not at all happy with the NIH situation, either, not that I'm at such a
loss for ideas to need to contest every petty NIH that flies past.
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 22:37 ` William Lee Irwin III
@ 2007-03-28 23:44 ` Christoph Lameter
2007-03-29 0:53 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 23:44 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
> As far as kernel compiles being relevant to anything besides
> potentially optimizing a particular major benchmark using gcc as one
> of its components... yeah, right. It's too macro to be a microbenchmark
> of anything and too micro to be pertinent to any meaningful
> macrobenchmark such as those from major benchmark publishers (who can't
> be named for trademark/etc. reasons). Hasn't it been at least 5 years
> since people figured out kernel compiles were complete bulls**t as
> benchmarks along with dbench for other reasons and several others? If
> not, I don't know why I bother with this kernel at all.
All benchmarks have their specific drawbacks. I personally like to do code
review and see what cachelines are touched but that is basically imagining
what a cpu does. Ones thinking may be led astray.
> Even so, I already did this and am done with it. It's not like I'm
> not carrying around numerous patches I know will never be merged all
> the time anyway. If you want to back it all out so badly, just do it
> and stop bothering me about it, and I'll merely continue maintaining my
> local patches without ever posting them as I have been for years. I'm
> not at all happy with the NIH situation, either, not that I'm at such a
> loss for ideas to need to contest every petty NIH that flies past.
What is NIH? My main concern is to get the use of page struct fields of
the slab removed. We have to do special things to page sized allocs
because of these page struct uses. F.e. the private field is used by
compound pages and if any of the slabs allocate a higher order page that
field will be in use.
We certainly see even from the rudimentary tests that I have done that
the limited pgd, pmd caching has some effect. Could we please see your local
patches? And I guess that you must have some sort of benchmark that you
run to test these?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 23:44 ` Christoph Lameter
@ 2007-03-29 0:53 ` William Lee Irwin III
2007-03-29 1:09 ` Christoph Lameter
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-29 0:53 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
>> As far as kernel compiles being relevant to anything besides
>> potentially optimizing a particular major benchmark using gcc as one
>> of its components... yeah, right. It's too macro to be a microbenchmark
>> of anything and too micro to be pertinent to any meaningful
>> macrobenchmark such as those from major benchmark publishers (who can't
>> be named for trademark/etc. reasons). Hasn't it been at least 5 years
>> since people figured out kernel compiles were complete bulls**t as
>> benchmarks along with dbench for other reasons and several others? If
>> not, I don't know why I bother with this kernel at all.
On Wed, Mar 28, 2007 at 04:44:01PM -0700, Christoph Lameter wrote:
> All benchmarks have their specific drawbacks. I personally like to do
> code review and see what cachelines are touched but that is basically
> imagining what a cpu does. Ones thinking may be led astray.
Well, the kernel compiles are just terrible at everything they could
plausibly be used to measure. I could, in principle, develop a benchmark
that simulates a forking server that does many things similar to what a
kernel compile is meant to measure without a number of its stupidities,
but I've got enough to do already.
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
>> Even so, I already did this and am done with it. It's not like I'm
>> not carrying around numerous patches I know will never be merged all
>> the time anyway. If you want to back it all out so badly, just do it
>> and stop bothering me about it, and I'll merely continue maintaining my
>> local patches without ever posting them as I have been for years. I'm
>> not at all happy with the NIH situation, either, not that I'm at such a
>> loss for ideas to need to contest every petty NIH that flies past.
On Wed, Mar 28, 2007 at 04:44:01PM -0700, Christoph Lameter wrote:
> What is NIH? My main concern is to get the use of page struct fields of
> the slab removed. We have to do special things to page sized allocs
> because of these page struct uses. F.e. the private field is used by
> compound pages and if any of the slabs allocate a higher order page that
> field will be in use.
NIH == "Not Invented Here." Basically a sort of idea theft, often used
to grab credit for patches. You're not the one involved there. That was
a digression. One could say, though, that a solution to the slab issues
is to NIH slab allocators e.g. via quicklist.h/quicklist.c without the
negative connotation.
On Wed, Mar 28, 2007 at 04:44:01PM -0700, Christoph Lameter wrote:
> We certainly see even from the rudimentary tests that I have done
> that the limited pgd, pmd caching has some effect. Could we please
> see your local patches? And I guess that you must have some sort of
> benchmark that you run to test these?
Short answer: No.
Long answer: Most of the local patches are not likely to be of interest
to the world at large. The ones I probably don't mind mentioning so
much are things like ports of ipt_TARPIT.c to -CURRENT, support for
mmap() of /proc/profile, things to make the notsc boot parameter
actually do what you'd expect it to do instead of the kernel ignoring
the option when you actually need it and mucking with the TSC behind
your back, and so on. There are also things I'd rather keep under wraps
so they don't mysteriously appear on lkml a few years later posted by
someone else without any attribution to me (i.e. the NIH's that bother
me). I've not got any of them ported to current mainline anyway, and
some data loss from fried disks seems to have eaten most/all of the
post-2.6.0 revisions of these patches anyway, though I've got compiled
kernels with them on various kernel versions between then and 2.6.10
(not to say that's any impediment to my hammering out fresh ports).
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-29 0:53 ` William Lee Irwin III
@ 2007-03-29 1:09 ` Christoph Lameter
0 siblings, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-03-29 1:09 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Andrew Morton, linux-kernel
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
> NIH == "Not Invented Here." Basically a sort of idea theft, often used
> to grab credit for patches. You're not the one involved there. That was
> a digression. One could say, though, that a solution to the slab issues
> is to NIH slab allocators e.g. via quicklist.h/quicklist.c without the
> negative connotation.
Oh. The quicklist were actually taken from existing IA64 code. Not my idea
either. I am not wedded to any solution and I was certainly not intending
to abscond with your idea. Would have been difficult given that there was
a signoff line with your name on it.
> On Wed, Mar 28, 2007 at 04:44:01PM -0700, Christoph Lameter wrote:
> > We certainly see even from the rudimentary tests that I have done
> > that the limited pgd, pmd caching has some effect. Could we please
> > see your local patches? And I guess that you must have some sort of
> > benchmark that you run to test these?
>
> Short answer: No.
>
> Long answer: Most of the local patches are not likely to be of interest
> to the world at large. The ones I probably don't mind mentioning so
> much are things like ports of ipt_TARPIT.c to -CURRENT, support for
> mmap() of /proc/profile, things to make the notsc boot parameter
> actually do what you'd expect it to do instead of the kernel ignoring
> the option when you actually need it and mucking with the TSC behind
> your back, and so on. There are also things I'd rather keep under wraps
> so they don't mysteriously appear on lkml a few years later posted by
> someone else without any attribution to me (i.e. the NIH's that bother
> me). I've not got any of them ported to current mainline anyway, and
> some data loss from fried disks seems to have eaten most/all of the
> post-2.6.0 revisions of these patches anyway, though I've got compiled
> kernels with them on various kernel versions between then and 2.6.10
> (not to say that's any impediment to my hammering out fresh ports).
Ummm..... So nothing concrete on the performance issues that we
are considering here? We are talking about something that was lost
a couple of years ago? There are certainly other people who will have the
same ideas given enough time. Software and patches age like groceries.
Hiding them will just make them wither away.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:05 ` Christoph Lameter
2007-03-28 21:16 ` William Lee Irwin III
@ 2007-03-29 1:17 ` Benjamin LaHaise
1 sibling, 0 replies; 23+ messages in thread
From: Benjamin LaHaise @ 2007-03-29 1:17 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, William Lee Irwin III
On Wed, Mar 28, 2007 at 02:05:54PM -0700, Christoph Lameter wrote:
> Tried this also on x86_64 with an enhanced quicklist patch that also deals
> with ptes (at the price of not guaranteeing the free after the tlb flush):
...
> Seems that there is a slight benefit but its also barely above noise
> level.
You're not running a test that will show any benefit in this area. Run
some heavy shell scripts or lmbench's fork() and exec() latency tests
to get real numbers.
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 20:12 [RFC] i386: Remove page sized slabs for pgds and pmds Christoph Lameter
2007-03-28 20:45 ` Andrew Morton
@ 2007-03-28 20:50 ` William Lee Irwin III
2007-03-28 21:00 ` Christoph Lameter
2007-03-28 22:26 ` Chris Wright
2 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-28 20:50 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, akpm
On Wed, Mar 28, 2007 at 01:12:56PM -0700, Christoph Lameter wrote:
> The benefit of preconstructed pgds and pmds in the i386 arch code seem to
> be debatable. The performance measurements indicate that there may be a slight
> benefit but it seems to almost vanish in the noise ratio.
> Method used (i386 1G memory):
> 1. Boot kernel
> 2. make clean
> 3. time make all
Getting the relevant results without tremendous amounts of noise from
other kernel activity needs something like lmbench's fault and fork()
microbenchmarks. Also, /proc/profile and/or oprofile results would be
useful here to get useful notions of what's happening performancewise,
in particular oprofile with L2 cache miss performance counters.
The magnitude of the degradation in the specific operations can't
really be established without these sorts of affairs. We need to know
how large a fraction of the workload is attributable to the operations
in question. Profiling would already give something of a breakdown but
it's difficult to get the numbers to add up properly or to separate
fork() from faults, so the microbenchmarks focused on the particular
operations are needed. The profiles are still needed to understand how
the time shuffled around and to either confirm or invalidate notions
regarding cache behavior.
In any event, I already know the answers because I already did all this
a number of years ago: it's all dominated by PTE caching, which was
never merged, but I carried out-of-tree for several years, so of course
it's in the noise. PMD and PTE caching are only pertinent to fork()
anyway, so the vast majority of your workload is unaffected, and it's
even more "in the noise" due to that. I can't be arsed to care about it
anymore since i386 became a target-system-only architecture for me and
am sick of the flak coming at me for my pagetable caching code anyway.
Hence this patch.
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 20:50 ` William Lee Irwin III
@ 2007-03-28 21:00 ` Christoph Lameter
2007-03-28 21:24 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 21:00 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, akpm
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
> Getting the relevant results without tremendous amounts of noise from
> other kernel activity needs something like lmbench's fault and fork()
> microbenchmarks. Also, /proc/profile and/or oprofile results would be
> useful here to get useful notions of what's happening performancewise,
> in particular oprofile with L2 cache miss performance counters.
The machine had a minimal debian root and it was run on a serial console.
> it's in the noise. PMD and PTE caching are only pertinent to fork()
> anyway, so the vast majority of your workload is unaffected, and it's
I'd be interested to see some numbers here. I still believe the situation
to be better on IA64 and other platforms due to the larger page sizes
containing many more cachelines which should make the effect bigger.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:00 ` Christoph Lameter
@ 2007-03-28 21:24 ` William Lee Irwin III
2007-03-28 21:38 ` Christoph Lameter
0 siblings, 1 reply; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-28 21:24 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, akpm
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
>> Getting the relevant results without tremendous amounts of noise from
>> other kernel activity needs something like lmbench's fault and fork()
>> microbenchmarks. Also, /proc/profile and/or oprofile results would be
>> useful here to get useful notions of what's happening performancewise,
>> in particular oprofile with L2 cache miss performance counters.
On Wed, Mar 28, 2007 at 02:00:44PM -0700, Christoph Lameter wrote:
> The machine had a minimal debian root and it was run on a serial console.
The other activity I referred to was the massive amount of crap like
mremap(), gettimeofday(), open()/close()/etc., spinning in userspace,
etc. that kernel compiles do. An otherwise quiescent system is a
distinct prerequisite for validity. Most of what you've "proven" with
the kernel compile is that pagetable construction is probably not a
significant portion of kernel compiles and not anything about relative
speed apart from eager construction being unquantifiably slower.
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
>> it's in the noise. PMD and PTE caching are only pertinent to fork()
>> anyway, so the vast majority of your workload is unaffected, and it's
On Wed, Mar 28, 2007 at 02:00:44PM -0700, Christoph Lameter wrote:
> I'd be interested to see some numbers here. I still believe the situation
> to be better on IA64 and other platforms due to the larger page sizes
> containing many more cachelines which should make the effect bigger.
Bad quoting; this is reminiscent of PR hackery where subsequences of
words from the middle of sentences are cut out with audio editors in
order to form an embarrassing quote. The full paragraph:
On Wed, Mar 28, 2007 at 01:50:55PM -0700, William Lee Irwin III wrote:
> In any event, I already know the answers because I already did all this
> a number of years ago: it's all dominated by PTE caching, which was
> never merged, but I carried out-of-tree for several years, so of course
> it's in the noise. PMD and PTE caching are only pertinent to fork()
> anyway, so the vast majority of your workload is unaffected, and it's
> even more "in the noise" due to that. I can't be arsed to care about it
> anymore since i386 became a target-system-only architecture for me and
> am sick of the flak coming at me for my pagetable caching code anyway.
> Hence this patch.
I'd also appreciate if it were mentioned who actually wrote this, given
that the patch posted was what I sent you verbatim (and actually
requested you not post for good reasons, some centering around pageattr.c).
The salient point from the above, which was conveniently omitted, was
"it's all dominated by PTE caching." However, even this needs some
qualification having to do with the VSZ of the parent process in fork().
For very small processes, such as cannot be constructed with glibc due
to its massive bloat and cannot be constructed with the standard process
address space layouts due to stacks being too distant from text and
heap, PTE's actually don't substantially outnumber PMD's et al, so
there is more to this story.
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:24 ` William Lee Irwin III
@ 2007-03-28 21:38 ` Christoph Lameter
2007-03-28 22:01 ` William Lee Irwin III
0 siblings, 1 reply; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 21:38 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, akpm
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
> I'd also appreciate if it were mentioned who actually wrote this, given
> that the patch posted was what I sent you verbatim (and actually
> requested you not post for good reasons, some centering around pageattr.c).
Ughhh. Sorry I had this in there in early editions but removed the
paragraph later.
> The salient point from the above, which was conveniently omitted, was
> "it's all dominated by PTE caching." However, even this needs some
> qualification having to do with the VSZ of the parent process in fork().
> For very small processes, such as cannot be constructed with glibc due
> to its massive bloat and cannot be constructed with the standard process
> address space layouts due to stacks being too distant from text and
> heap, PTE's actually don't substantially outnumber PMD's et al, so
> there is more to this story.
No that was described in the patch. Quote:
"i386 only provides support for caching constructed pgd and pmds. These
are comparatively rare to ptes so it is no surprise that the current
approach has only minimal effect. ...."
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 21:38 ` Christoph Lameter
@ 2007-03-28 22:01 ` William Lee Irwin III
2007-03-28 22:22 ` Christoph Lameter
2007-03-29 0:28 ` Alan Cox
0 siblings, 2 replies; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-28 22:01 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, akpm
On Wed, Mar 28, 2007 at 02:38:55PM -0700, Christoph Lameter wrote:
> No that was described in the patch. Quote:
> "i386 only provides support for caching constructed pgd and pmds. These
> are comparatively rare to ptes so it is no surprise that the current
> approach has only minimal effect. ...."
And where was the mention of this being a patch I sent you in a private
reply verbatim, and furthermore asked you not to post?
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 22:01 ` William Lee Irwin III
@ 2007-03-28 22:22 ` Christoph Lameter
2007-03-29 0:28 ` Alan Cox
1 sibling, 0 replies; 23+ messages in thread
From: Christoph Lameter @ 2007-03-28 22:22 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, akpm
On Wed, 28 Mar 2007, William Lee Irwin III wrote:
> On Wed, Mar 28, 2007 at 02:38:55PM -0700, Christoph Lameter wrote:
> > No that was described in the patch. Quote:
> > "i386 only provides support for caching constructed pgd and pmds. These
> > are comparatively rare to ptes so it is no surprise that the current
> > approach has only minimal effect. ...."
>
> And where was the mention of this being a patch I sent you in a private
> reply verbatim, and furthermore asked you not to post?
Yes it was private and you told me to be careful about "waving this patch
around". No mention of not posting it.
And I repeat that I am sorry to have removed the paragraph that mentioned
you being the author during rewrites of the text. Signoff line is there.
This is an RFC so we can still add this if we want to apply it at all.
I think we need to discuss this openly. It seems that I am getting into
unknown minefields of an ancient discussion between you and Andrew.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 22:01 ` William Lee Irwin III
2007-03-28 22:22 ` Christoph Lameter
@ 2007-03-29 0:28 ` Alan Cox
2007-03-29 0:59 ` William Lee Irwin III
1 sibling, 1 reply; 23+ messages in thread
From: Alan Cox @ 2007-03-29 0:28 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel, akpm
On Wed, 28 Mar 2007 15:01:31 -0700
William Lee Irwin III <wli@holomorphy.com> wrote:
> On Wed, Mar 28, 2007 at 02:38:55PM -0700, Christoph Lameter wrote:
> > No that was described in the patch. Quote:
> > "i386 only provides support for caching constructed pgd and pmds. These
> > are comparatively rare to ptes so it is no surprise that the current
> > approach has only minimal effect. ...."
Whatever it was originally for and public or not, the above isn't true
for some non Intel products...
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-29 0:28 ` Alan Cox
@ 2007-03-29 0:59 ` William Lee Irwin III
0 siblings, 0 replies; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-29 0:59 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel, akpm
On Wed, Mar 28, 2007 at 02:38:55PM -0700, Christoph Lameter wrote:
>>> No that was described in the patch. Quote:
>>> "i386 only provides support for caching constructed pgd and pmds. These
>>> are comparatively rare to ptes so it is no surprise that the current
>>> approach has only minimal effect. ...."
On Thu, Mar 29, 2007 at 01:28:59AM +0100, Alan Cox wrote:
> Whatever it was originally for and public or not, the above isn't true
> for some non Intel products...
Sorry if the descriptions here are misleading. This is basically an
attempt to have the kernel keep preconstructed pagetables around so
that the bitblitting hits need not be repetitively taken during fork()
and faults, where counterarguments revolve around whether this is
actually a hit at all and whether it's significant. It's not related to
the transparent quasi-ASID/ASN affairs AMD has based on %cr3 contents.
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 20:12 [RFC] i386: Remove page sized slabs for pgds and pmds Christoph Lameter
2007-03-28 20:45 ` Andrew Morton
2007-03-28 20:50 ` William Lee Irwin III
@ 2007-03-28 22:26 ` Chris Wright
2007-03-28 22:33 ` William Lee Irwin III
2 siblings, 1 reply; 23+ messages in thread
From: Chris Wright @ 2007-03-28 22:26 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-kernel, William Lee Irwin III, akpm, jeremy, zach
* Christoph Lameter (clameter@sgi.com) wrote:
> +#ifdef CONFIG_HIGHMEM64G
> +#define __pgd_alloc() kmem_cache_alloc(pgd_cache, GFP_KERNEL|__GFP_REPEAT)
> +#define __pgd_free(pgd) kmem_cache_free(pgd_cache, pgd)
I must've glazed over something, I thought this was removal of slabs?
BTW, this will interact shared_kernel_pmd patch that Jeremy's posted a
few times (I know at least wli has looked over that one). We need to
make sure that PAE under at least Xen hypervisor has a page-sized pgd,
although the mmlist chaining looks nice to me.
> +static struct kmem_cache *pgd_cache;
> +
> +void __init pgtable_cache_init(void)
> +{
> + pgd_cache = kmem_cache_create("pgd",
> + PTRS_PER_PGD*sizeof(pgd_t),
> + PTRS_PER_PGD*sizeof(pgd_t),
> + SLAB_PANIC,
> + NULL,
> + NULL);
> +}
> +#else /* !CONFIG_HIGHMEM64G */
> +#define __pgd_alloc() ((pgd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT))
> +#define __pgd_free(pgd) free_page((unsigned long)(pgd))
> +#endif /* !CONFIG_HIGHMEM64G */
>
> pgd_t *pgd_alloc(struct mm_struct *mm)
> {
> int i;
> - pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
> + pgd_t *pgd = __pgd_alloc();
>
> - if (PTRS_PER_PMD == 1 || !pgd)
> + if (!pgd)
> + return NULL;
> + memcpy(&pgd[USER_PTRS_PER_PGD], &swapper_pg_dir[USER_PTRS_PER_PGD],
> + KERNEL_PGD_PTRS*sizeof(pgd_t));
clone_pgd_range() for consistency? and it seems we lost a paravirt_alloc_pd_clone()
in there somewhere.
> + if (PTRS_PER_PMD == 1)
> return pgd;
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 22:26 ` Chris Wright
@ 2007-03-28 22:33 ` William Lee Irwin III
2007-03-28 22:45 ` Chris Wright
2007-03-28 23:42 ` Zachary Amsden
0 siblings, 2 replies; 23+ messages in thread
From: William Lee Irwin III @ 2007-03-28 22:33 UTC (permalink / raw)
To: Chris Wright; +Cc: Christoph Lameter, linux-kernel, akpm, jeremy, zach
* Christoph Lameter (clameter@sgi.com) wrote:
>> +#ifdef CONFIG_HIGHMEM64G
>> +#define __pgd_alloc() kmem_cache_alloc(pgd_cache, GFP_KERNEL|__GFP_REPEAT)
>> +#define __pgd_free(pgd) kmem_cache_free(pgd_cache, pgd)
On Wed, Mar 28, 2007 at 03:26:56PM -0700, Chris Wright wrote:
> I must've glazed over something, I thought this was removal of slabs?
The pgd slab is not fully removable in the PAE case because a dedicated
slab is the only way to enforce alignment for allocations as small as
PAE PGD's.
On Wed, Mar 28, 2007 at 03:26:56PM -0700, Chris Wright wrote:
> BTW, this will interact shared_kernel_pmd patch that Jeremy's posted a
> few times (I know at least wli has looked over that one). We need to
> make sure that PAE under at least Xen hypervisor has a page-sized pgd,
> although the mmlist chaining looks nice to me.
That, not to mention the total lack of verification of the pageattr.c
code, are among the reasons I didn't want it posted.
* Christoph Lameter (clameter@sgi.com) wrote:
>> + memcpy(&pgd[USER_PTRS_PER_PGD], &swapper_pg_dir[USER_PTRS_PER_PGD],
>> + KERNEL_PGD_PTRS*sizeof(pgd_t));
On Wed, Mar 28, 2007 at 03:26:56PM -0700, Chris Wright wrote:
> clone_pgd_range() for consistency? and it seems we lost a
> paravirt_alloc_pd_clone() in there somewhere.
Yes, another reason why it shouldn't have been posted as-is. It was not
intended to for anything more than comparative benchmarking on systems
without graphics running on the bare metal as opposed to Xen/etc. guests.
-- wli
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 22:33 ` William Lee Irwin III
@ 2007-03-28 22:45 ` Chris Wright
2007-03-28 23:42 ` Zachary Amsden
1 sibling, 0 replies; 23+ messages in thread
From: Chris Wright @ 2007-03-28 22:45 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Chris Wright, Christoph Lameter, linux-kernel, akpm, jeremy, zach
* William Lee Irwin III (wli@holomorphy.com) wrote:
> * Christoph Lameter (clameter@sgi.com) wrote:
> >> +#ifdef CONFIG_HIGHMEM64G
> >> +#define __pgd_alloc() kmem_cache_alloc(pgd_cache, GFP_KERNEL|__GFP_REPEAT)
> >> +#define __pgd_free(pgd) kmem_cache_free(pgd_cache, pgd)
>
> On Wed, Mar 28, 2007 at 03:26:56PM -0700, Chris Wright wrote:
> > I must've glazed over something, I thought this was removal of slabs?
>
> The pgd slab is not fully removable in the PAE case because a dedicated
> slab is the only way to enforce alignment for allocations as small as
> PAE PGD's.
Heh, yeah "page sized" is the part i glazed over, my fault.
> On Wed, Mar 28, 2007 at 03:26:56PM -0700, Chris Wright wrote:
> > BTW, this will interact shared_kernel_pmd patch that Jeremy's posted a
> > few times (I know at least wli has looked over that one). We need to
> > make sure that PAE under at least Xen hypervisor has a page-sized pgd,
> > although the mmlist chaining looks nice to me.
>
> That, not to mention the total lack of verification of the pageattr.c
> code, are among the reasons I didn't want it posted.
>
>
> * Christoph Lameter (clameter@sgi.com) wrote:
> >> + memcpy(&pgd[USER_PTRS_PER_PGD], &swapper_pg_dir[USER_PTRS_PER_PGD],
> >> + KERNEL_PGD_PTRS*sizeof(pgd_t));
>
> On Wed, Mar 28, 2007 at 03:26:56PM -0700, Chris Wright wrote:
> > clone_pgd_range() for consistency? and it seems we lost a
> > paravirt_alloc_pd_clone() in there somewhere.
>
> Yes, another reason why it shouldn't have been posted as-is. It was not
> intended to for anything more than comparative benchmarking on systems
> without graphics running on the bare metal as opposed to Xen/etc. guests.
OK, all good here. Just wanted to make sure things didn't collide
too badly.
thanks,
-chris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 22:33 ` William Lee Irwin III
2007-03-28 22:45 ` Chris Wright
@ 2007-03-28 23:42 ` Zachary Amsden
2007-03-28 22:47 ` Chris Wright
1 sibling, 1 reply; 23+ messages in thread
From: Zachary Amsden @ 2007-03-28 23:42 UTC (permalink / raw)
To: William Lee Irwin III
Cc: Chris Wright, Christoph Lameter, linux-kernel, akpm, jeremy
William Lee Irwin III wrote:
>> clone_pgd_range() for consistency? and it seems we lost a
>> paravirt_alloc_pd_clone() in there somewhere.
>>
>
> Yes, another reason why it shouldn't have been posted as-is. It was not
> intended to for anything more than comparative benchmarking on systems
> without graphics running on the bare metal as opposed to Xen/etc. guests.
>
So clone_pgd_range is mostly useless now. Originally, I intended it to
take the part of paravirt_alloc_pd_clone. We should probably merge the
two into just one function, unless someone thinks clone_pgd_range is
actually useful for something.
Zach
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC] i386: Remove page sized slabs for pgds and pmds
2007-03-28 23:42 ` Zachary Amsden
@ 2007-03-28 22:47 ` Chris Wright
0 siblings, 0 replies; 23+ messages in thread
From: Chris Wright @ 2007-03-28 22:47 UTC (permalink / raw)
To: Zachary Amsden
Cc: William Lee Irwin III, Chris Wright, Christoph Lameter,
linux-kernel, akpm, jeremy
* Zachary Amsden (zach@vmware.com) wrote:
> William Lee Irwin III wrote:
> >>clone_pgd_range() for consistency? and it seems we lost a
> >>paravirt_alloc_pd_clone() in there somewhere.
> >>
> >
> >Yes, another reason why it shouldn't have been posted as-is. It was not
> >intended to for anything more than comparative benchmarking on systems
> >without graphics running on the bare metal as opposed to Xen/etc. guests.
> >
>
> So clone_pgd_range is mostly useless now. Originally, I intended it to
> take the part of paravirt_alloc_pd_clone. We should probably merge the
> two into just one function, unless someone thinks clone_pgd_range is
> actually useful for something.
No, I was going to suggest just that. It was orginially introduced as
the place holder for that IIRC.
thanks,
-chris
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2007-03-29 1:18 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-28 20:12 [RFC] i386: Remove page sized slabs for pgds and pmds Christoph Lameter
2007-03-28 20:45 ` Andrew Morton
2007-03-28 21:05 ` Christoph Lameter
2007-03-28 21:16 ` William Lee Irwin III
2007-03-28 21:20 ` Christoph Lameter
2007-03-28 22:37 ` William Lee Irwin III
2007-03-28 23:44 ` Christoph Lameter
2007-03-29 0:53 ` William Lee Irwin III
2007-03-29 1:09 ` Christoph Lameter
2007-03-29 1:17 ` Benjamin LaHaise
2007-03-28 20:50 ` William Lee Irwin III
2007-03-28 21:00 ` Christoph Lameter
2007-03-28 21:24 ` William Lee Irwin III
2007-03-28 21:38 ` Christoph Lameter
2007-03-28 22:01 ` William Lee Irwin III
2007-03-28 22:22 ` Christoph Lameter
2007-03-29 0:28 ` Alan Cox
2007-03-29 0:59 ` William Lee Irwin III
2007-03-28 22:26 ` Chris Wright
2007-03-28 22:33 ` William Lee Irwin III
2007-03-28 22:45 ` Chris Wright
2007-03-28 23:42 ` Zachary Amsden
2007-03-28 22:47 ` Chris Wright
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.