From: Manfred Spraul <manfred@colorfullife.com>
To: Ravikiran G Thirumalai <kiran@in.ibm.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org,
Robert Love <rml@tech9.net>,
dipankar@in.ibm.com
Subject: Re: [patch] Make slab allocator work with SLAB_MUST_HWCACHE_ALIGN
Date: Thu, 11 Sep 2003 23:49:50 +0200
Message-ID: <3F60EDFE.5090502@colorfullife.com>
In-Reply-To: <3F60A08A.7040504@colorfullife.com>
[-- Attachment #1: Type: text/plain, Size: 541 bytes --]
Attached is a forward port of my arbitrary-align patch from 2.4.something.
Only partially tested. And a small correction: rmap needs alignment to
its own object size; it crashes immediately if slab does not provide
this. It's probably a good idea to use the new API to decouple the
pte_chain size from the L1_CACHE_SIZE: I'd bet that 32 bytes for a pIII
is too small.
I'm still thinking about how to implement kmem_cache_alloc_forcpu() without
having to change too much in slab - it's not as simple as I assumed
initially.
--
Manfred
[-- Attachment #2: patch-slab-alignobj --]
[-- Type: text/plain, Size: 9835 bytes --]
// $Header$
// Kernel Version:
// VERSION = 2
// PATCHLEVEL = 6
// SUBLEVEL = 0
// EXTRAVERSION = -test5-mm1
--- 2.6/mm/slab.c 2003-09-11 22:30:47.000000000 +0200
+++ build-2.6/mm/slab.c 2003-09-11 23:42:16.000000000 +0200
@@ -268,6 +268,7 @@
unsigned int colour_off; /* colour offset */
unsigned int colour_next; /* cache colouring */
kmem_cache_t *slabp_cache;
+ unsigned int slab_size;
unsigned int dflags; /* dynamic flags */
/* constructor func */
@@ -488,7 +489,7 @@
.objsize = sizeof(kmem_cache_t),
.flags = SLAB_NO_REAP,
.spinlock = SPIN_LOCK_UNLOCKED,
- .colour_off = L1_CACHE_BYTES,
+ .colour_off = SMP_CACHE_BYTES,
.name = "kmem_cache",
};
@@ -523,7 +524,7 @@
static void enable_cpucache (kmem_cache_t *cachep);
/* Cal the num objs, wastage, and bytes left over for a given slab size. */
-static void cache_estimate (unsigned long gfporder, size_t size,
+static void cache_estimate (unsigned long gfporder, size_t size, size_t align,
int flags, size_t *left_over, unsigned int *num)
{
int i;
@@ -536,7 +537,7 @@
extra = sizeof(kmem_bufctl_t);
}
i = 0;
- while (i*size + L1_CACHE_ALIGN(base+i*extra) <= wastage)
+ while (i*size + ALIGN(base+i*extra, align) <= wastage)
i++;
if (i > 0)
i--;
@@ -546,7 +547,7 @@
*num = i;
wastage -= i*size;
- wastage -= L1_CACHE_ALIGN(base+i*extra);
+ wastage -= ALIGN(base+i*extra, align);
*left_over = wastage;
}
@@ -690,14 +691,15 @@
list_add(&cache_cache.next, &cache_chain);
cache_cache.array[smp_processor_id()] = &initarray_cache.cache;
- cache_estimate(0, cache_cache.objsize, 0,
- &left_over, &cache_cache.num);
+ cache_estimate(0, cache_cache.objsize, SMP_CACHE_BYTES, 0,
+ &left_over, &cache_cache.num);
if (!cache_cache.num)
BUG();
cache_cache.colour = left_over/cache_cache.colour_off;
cache_cache.colour_next = 0;
-
+ cache_cache.slab_size = ALIGN(cache_cache.num*sizeof(kmem_bufctl_t) + sizeof(struct slab),
+ SMP_CACHE_BYTES);
/* 2+3) create the kmalloc caches */
sizes = malloc_sizes;
@@ -993,7 +995,7 @@
* kmem_cache_create - Create a cache.
* @name: A string which is used in /proc/slabinfo to identify this cache.
* @size: The size of objects to be created in this cache.
- * @offset: The offset to use within the page.
+ * @align: The required alignment for the objects.
* @flags: SLAB flags
* @ctor: A constructor for the objects.
* @dtor: A destructor for the objects.
@@ -1018,17 +1020,15 @@
* %SLAB_NO_REAP - Don't automatically reap this cache when we're under
* memory pressure.
*
- * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
- * cacheline. This can be beneficial if you're counting cycles as closely
- * as davem.
+ * %SLAB_HWCACHE_ALIGN - This flag has no effect and will be removed soon.
*/
kmem_cache_t *
-kmem_cache_create (const char *name, size_t size, size_t offset,
+kmem_cache_create (const char *name, size_t size, size_t align,
unsigned long flags, void (*ctor)(void*, kmem_cache_t *, unsigned long),
void (*dtor)(void*, kmem_cache_t *, unsigned long))
{
const char *func_nm = KERN_ERR "kmem_create: ";
- size_t left_over, align, slab_size;
+ size_t left_over, slab_size;
kmem_cache_t *cachep = NULL;
/*
@@ -1039,7 +1039,7 @@
(size < BYTES_PER_WORD) ||
(size > (1<<MAX_OBJ_ORDER)*PAGE_SIZE) ||
(dtor && !ctor) ||
- (offset < 0 || offset > size))
+ (align < 0))
BUG();
#if DEBUG
@@ -1051,22 +1051,16 @@
#if FORCED_DEBUG
/*
- * Enable redzoning and last user accounting, except
- * - for caches with forced alignment: redzoning would violate the
- * alignment
- * - for caches with large objects, if the increased size would
- * increase the object size above the next power of two: caches
- * with object sizes just above a power of two have a significant
- * amount of internal fragmentation
+ * Enable redzoning and last user accounting, except for caches with
+ * large objects, if the increased size would increase the object size
+ * above the next power of two: caches with object sizes just above a
+ * power of two have a significant amount of internal fragmentation.
*/
- if ((size < 4096 || fls(size-1) == fls(size-1+3*BYTES_PER_WORD))
- && !(flags & SLAB_MUST_HWCACHE_ALIGN)) {
+ if ((size < 4096 || fls(size-1) == fls(size-1+3*BYTES_PER_WORD)))
flags |= SLAB_RED_ZONE|SLAB_STORE_USER;
- }
flags |= SLAB_POISON;
#endif
#endif
-
/*
* Always checks flags, a caller might be expecting debug
* support which isn't available.
@@ -1074,15 +1068,23 @@
if (flags & ~CREATE_MASK)
BUG();
+ if (align) {
+ /* minimum supported alignment: */
+ if (align < BYTES_PER_WORD)
+ align = BYTES_PER_WORD;
+
+ /* combinations of forced alignment and advanced debugging is
+ * not yet implemented.
+ */
+ flags &= ~(SLAB_RED_ZONE|SLAB_STORE_USER);
+ }
+
/* Get cache's description obj. */
cachep = (kmem_cache_t *) kmem_cache_alloc(&cache_cache, SLAB_KERNEL);
if (!cachep)
goto opps;
memset(cachep, 0, sizeof(kmem_cache_t));
-#if DEBUG
- cachep->reallen = size;
-#endif
/* Check that size is in terms of words. This is needed to avoid
* unaligned accesses for some archs when redzoning is used, and makes
* sure any on-slab bufctl's are also correctly aligned.
@@ -1094,20 +1096,25 @@
}
#if DEBUG
+ cachep->reallen = size;
+
if (flags & SLAB_RED_ZONE) {
- /*
- * There is no point trying to honour cache alignment
- * when redzoning.
- */
- flags &= ~SLAB_HWCACHE_ALIGN;
+ /* redzoning only works with word aligned caches */
+ align = BYTES_PER_WORD;
+
/* add space for red zone words */
cachep->dbghead += BYTES_PER_WORD;
size += 2*BYTES_PER_WORD;
}
if (flags & SLAB_STORE_USER) {
- flags &= ~SLAB_HWCACHE_ALIGN;
- size += BYTES_PER_WORD; /* add space */
+ /* user store requires word alignment and
+ * one word storage behind the end of the real
+ * object.
+ */
+ align = BYTES_PER_WORD;
+ size += BYTES_PER_WORD;
}
+
#if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
if (size > 128 && cachep->reallen > L1_CACHE_BYTES && size < PAGE_SIZE) {
cachep->dbghead += PAGE_SIZE - size;
@@ -1115,9 +1122,6 @@
}
#endif
#endif
- align = BYTES_PER_WORD;
- if (flags & SLAB_HWCACHE_ALIGN)
- align = L1_CACHE_BYTES;
/* Determine if the slab management is 'on' or 'off' slab. */
if (size >= (PAGE_SIZE>>3))
@@ -1127,13 +1131,16 @@
*/
flags |= CFLGS_OFF_SLAB;
- if (flags & SLAB_HWCACHE_ALIGN) {
- /* Need to adjust size so that objs are cache aligned. */
- /* Small obj size, can get at least two per cache line. */
+ if (!align) {
+ /* Default alignment: compile time specified l1 cache size.
+ * But if an object is really small, then squeeze multiple
+ * into one cacheline.
+ */
+ align = L1_CACHE_BYTES;
while (size <= align/2)
align /= 2;
- size = (size+align-1)&(~(align-1));
}
+ size = ALIGN(size, align);
/* Cal size (in pages) of slabs, and the num of objs per slab.
* This could be made much more intelligent. For now, try to avoid
@@ -1143,7 +1150,7 @@
do {
unsigned int break_flag = 0;
cal_wastage:
- cache_estimate(cachep->gfporder, size, flags,
+ cache_estimate(cachep->gfporder, size, align, flags,
&left_over, &cachep->num);
if (break_flag)
break;
@@ -1177,8 +1184,8 @@
cachep = NULL;
goto opps;
}
- slab_size = L1_CACHE_ALIGN(cachep->num*sizeof(kmem_bufctl_t) +
- sizeof(struct slab));
+ slab_size = ALIGN(cachep->num*sizeof(kmem_bufctl_t) + sizeof(struct slab),
+ align);
/*
* If the slab has been placed off-slab, and we have enough space then
@@ -1189,14 +1196,17 @@
left_over -= slab_size;
}
+ if (flags & CFLGS_OFF_SLAB) {
+ /* really off slab. No need for manual alignment */
+ slab_size = cachep->num*sizeof(kmem_bufctl_t)+sizeof(struct slab);
+ }
+
+ cachep->colour_off = L1_CACHE_BYTES;
/* Offset must be a multiple of the alignment. */
- offset += (align-1);
- offset &= ~(align-1);
- if (!offset)
- offset = L1_CACHE_BYTES;
- cachep->colour_off = offset;
- cachep->colour = left_over/offset;
-
+ if (cachep->colour_off < align)
+ cachep->colour_off = align;
+ cachep->colour = left_over/cachep->colour_off;
+ cachep->slab_size = slab_size;
cachep->flags = flags;
cachep->gfpflags = 0;
if (flags & SLAB_CACHE_DMA)
@@ -1470,8 +1480,7 @@
return NULL;
} else {
slabp = objp+colour_off;
- colour_off += L1_CACHE_ALIGN(cachep->num *
- sizeof(kmem_bufctl_t) + sizeof(struct slab));
+ colour_off += cachep->slab_size;
}
slabp->inuse = 0;
slabp->colouroff = colour_off;
--- 2.6/arch/i386/mm/init.c 2003-09-11 22:30:47.000000000 +0200
+++ build-2.6/arch/i386/mm/init.c 2003-09-11 22:38:57.000000000 +0200
@@ -509,8 +509,8 @@
if (PTRS_PER_PMD > 1) {
pmd_cache = kmem_cache_create("pmd",
PTRS_PER_PMD*sizeof(pmd_t),
+ PTRS_PER_PMD*sizeof(pmd_t),
0,
- SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
pmd_ctor,
NULL);
if (!pmd_cache)
@@ -519,8 +519,8 @@
if (TASK_SIZE > PAGE_OFFSET) {
kpmd_cache = kmem_cache_create("kpmd",
PTRS_PER_PMD*sizeof(pmd_t),
+ PTRS_PER_PMD*sizeof(pmd_t),
0,
- SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
kpmd_ctor,
NULL);
if (!kpmd_cache)
@@ -541,8 +541,8 @@
pgd_cache = kmem_cache_create("pgd",
PTRS_PER_PGD*sizeof(pgd_t),
+ PTRS_PER_PGD*sizeof(pgd_t),
0,
- SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
ctor,
dtor);
if (!pgd_cache)
--- 2.6/mm/rmap.c 2003-09-11 19:33:57.000000000 +0200
+++ build-2.6/mm/rmap.c 2003-09-11 23:41:55.000000000 +0200
@@ -521,8 +521,8 @@
{
pte_chain_cache = kmem_cache_create( "pte_chain",
sizeof(struct pte_chain),
+ sizeof(struct pte_chain),
0,
- SLAB_MUST_HWCACHE_ALIGN,
pte_chain_ctor,
NULL);
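For readers who want to try the alignment rules outside the kernel, here is a minimal userspace sketch of how the patched kmem_cache_create() appears to pick the object alignment, the padded object size, and the colour offset. It is not the kernel code: the names demo_layout, demo_pick_layout, ALIGN_UP, DEMO_L1_CACHE_BYTES and DEMO_BYTES_PER_WORD are hypothetical, the 32-byte cache line is an assumed value, the debug paths (redzoning, user store) are left out, and the power-of-two assert is an addition that the patch itself does not perform.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Assumed values for illustration; the kernel takes these from the arch. */
#define DEMO_L1_CACHE_BYTES ((size_t)32)
#define DEMO_BYTES_PER_WORD (sizeof(void *))

/* Hypothetical result type, not a kernel structure. */
struct demo_layout {
	size_t align;      /* alignment of every object */
	size_t size;       /* object size after padding */
	size_t colour_off; /* step between slab colours */
};

/*
 * Mirrors the non-debug path of the patch:
 *  - align != 0: caller-forced alignment, at least one word
 *  - align == 0: default to the L1 line size, halved while at least
 *    two objects still fit into one chunk of that size
 */
static struct demo_layout demo_pick_layout(size_t size, size_t align)
{
	struct demo_layout l;

	if (align && align < DEMO_BYTES_PER_WORD)
		align = DEMO_BYTES_PER_WORD;

	/* Object sizes are kept in whole words. */
	size = ALIGN_UP(size, DEMO_BYTES_PER_WORD);

	if (!align) {
		align = DEMO_L1_CACHE_BYTES;
		while (size <= align / 2)
			align /= 2;
	}

	/* Demo-only sanity check: the mask trick needs a power of two. */
	assert((align & (align - 1)) == 0);

	l.align = align;
	l.size = ALIGN_UP(size, align);

	/* The colour offset must be a multiple of the alignment. */
	l.colour_off = DEMO_L1_CACHE_BYTES;
	if (l.colour_off < align)
		l.colour_off = align;
	return l;
}
```

Under these assumptions, demo_pick_layout(4096, 4096) models a call like the "pgd" or "pmd" one above, where the object must be aligned to its own size, while demo_pick_layout(12, 0) shows the default path squeezing two small objects into one 32-byte line.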
2003-09-10 8:16 How reliable is SLAB_HWCACHE_ALIGN? Ravikiran G Thirumalai
2003-09-10 15:41 ` Robert Love
2003-09-11 5:54 ` Ravikiran G Thirumalai
2003-09-11 11:08 ` [patch] Make slab allocator work with SLAB_MUST_HWCACHE_ALIGN Ravikiran G Thirumalai
2003-09-11 16:19 ` Manfred Spraul
2003-09-11 21:49 ` Manfred Spraul [this message]
2003-09-12 8:59 ` Ravikiran G Thirumalai
2003-09-12 9:10 ` Arjan van de Ven
2003-09-13 20:06 ` Manfred Spraul
2003-09-13 20:58 ` Dipankar Sarma
2003-09-14 8:09 ` Ravikiran G Thirumalai
2003-09-14 13:00 ` Dipankar Sarma
2003-09-15 5:13 ` Ravikiran G Thirumalai
2003-09-13 22:18 ` Arnd Bergmann