* Purpose of the mm/slab.c changes @ 2001-09-09 11:05 Manfred Spraul 2001-09-09 14:26 ` Andrea Arcangeli 0 siblings, 1 reply; 28+ messages in thread From: Manfred Spraul @ 2001-09-09 11:05 UTC (permalink / raw) To: linux-kernel, Alan Cox, torvalds

What's the purpose of the mm/slab.c changes? Linus, Alan, could you please drop them. Switching from 1 list to 3 lists is just a slowdown.

                 2.4.9  2.4.9-ac9
km(1)            0x87   0x89
kf(1)            0xa9   0xa8
kf(cachep, 1)    0xad   0xad
km(4096)         0x7B   0x87  (+24% without overhead!)
kf(4096)         0xcB   0xd1
kf(cachep,4096)  0xcc   0xd3

(cpu ticks on a Duron 700, UP kernel, 100 calls in a tight loop, loop overhead (0x49) not removed)

Benchmarking with an SMP kernel is meaningless, since the actual list changes are outside of the hot path - I did it, and the differences are negligible (+-1 cpu tick). And this is a benchmark with 100 calls in a tight loop - Andrea's patch adds several additional branches; in the real world there would be an even larger slowdown.

If there are real performance problems with the slab, then the not-yet optimized parts should get changed:

* kmalloc(): the loop to find the correct slab is too slow. (the numbers in the table are without the loop, i.e. kmem_cache_alloc(kmem_find_general_cachep(4096, SLAB_KERNEL), SLAB_KERNEL);)
* finetune the size of the per-cpu caches
* finetune the amount of pages freed during kmem_cache_reap()
* replace the division in kmem_cache_free_one() with a multiplication. (UP only, on SMP it is outside of the hot path)
* enable OPTIMIZE - could help on platforms without a fast virt_addr->struct page lookup.

I have a patch with some optimizations, but I thought that would be 2.5 stuff.

-- Manfred

^ permalink raw reply [flat|nested] 28+ messages in thread
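Manfred's first bullet refers to 2.4's linear scan over the general-cache size table inside kmalloc(). A user-space sketch of the difference between that scan and a constant-time log2 lookup - the function names and the size table here are illustrative, not the kernel's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Power-of-two general cache sizes, modelled on 2.4's cache_sizes[]. */
static const size_t cs_size[] = {
	32, 64, 128, 256, 512, 1024, 2048, 4096,
	8192, 16384, 32768, 65536, 131072
};
#define NR_SIZES (sizeof(cs_size) / sizeof(cs_size[0]))

/* What 2.4's kmalloc() effectively does: scan until a cache fits. */
static int find_cache_linear(size_t size)
{
	for (size_t i = 0; i < NR_SIZES; i++)
		if (size <= cs_size[i])
			return (int)i;
	return -1;
}

/* Constant-time alternative: index directly from ceil(log2(size)). */
static int find_cache_log2(size_t size)
{
	if (size > cs_size[NR_SIZES - 1])
		return -1;
	if (size <= 32)
		return 0;
	/* 64 - clz(size - 1) == ceil(log2(size)); caches start at 2^5. */
	return (64 - __builtin_clzll((unsigned long long)(size - 1))) - 5;
}
```

Both functions return the same index for every size up to 131072; only the number of branches on the hot path differs.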
* Re: Purpose of the mm/slab.c changes 2001-09-09 11:05 Purpose of the mm/slab.c changes Manfred Spraul @ 2001-09-09 14:26 ` Andrea Arcangeli 2001-09-09 15:18 ` Manfred Spraul 0 siblings, 1 reply; 28+ messages in thread From: Andrea Arcangeli @ 2001-09-09 14:26 UTC (permalink / raw) To: Manfred Spraul; +Cc: linux-kernel, Alan Cox, torvalds On Sun, Sep 09, 2001 at 01:05:34PM +0200, Manfred Spraul wrote: > What's the purpose of the mm/slab.c changes? it provides lifo allocations from both partial and unused slabs. You should benchmark it with a real load, not with dummy allocations/freeing. Andrea ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 14:26 ` Andrea Arcangeli @ 2001-09-09 15:18 ` Manfred Spraul 2001-09-09 15:33 ` Andrea Arcangeli ` (4 more replies) 0 siblings, 5 replies; 28+ messages in thread From: Manfred Spraul @ 2001-09-09 15:18 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Alan Cox, torvalds

> > it provides lifo allocations from both partial and unused slabs.
>

lifo/fifo for unused slabs is obviously superfluous - free is free, it doesn't matter which free page is used first/last. The partial slabs are already lifo: if a slab goes from full to partial, it's added to the head of the partial list.

> You
> should benchmark it with a real load, not with dummy
> allocations/freeing.
>

Did you run any benchmarks? If yes, could you post them? What I did was check the number of branches in the hot path (increased), the code length (increased) and the execution time with dummy loads (slower).

-- Manfred

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 15:18 ` Manfred Spraul @ 2001-09-09 15:33 ` Andrea Arcangeli 2001-09-11 18:41 ` Manfred Spraul 2001-09-09 16:12 ` Alan Cox ` (3 subsequent siblings) 4 siblings, 1 reply; 28+ messages in thread From: Andrea Arcangeli @ 2001-09-09 15:33 UTC (permalink / raw) To: Manfred Spraul; +Cc: linux-kernel, Alan Cox, torvalds On Sun, Sep 09, 2001 at 05:18:00PM +0200, Manfred Spraul wrote:

> > > it provides lifo allocations from both partial and unused slabs.
> >
>
> lifo/fifo for unused slabs is obviously superfluous - free is free, it
> doesn't matter which free page is used first/last.

then why don't you also make fifo the buddy allocator and the partial list as well, if free is free and fifo/lifo doesn't matter?

> Did you run any benchmarks? If yes, could you post them?

I didn't run any specific benchmark for such change, but please let me know if you can find that any real benchmark is hurt by it. I think the cleanup and the potential for lifo in the free slabs is much more sensible than the other factors you mentioned; of course there's less probability of having to fall into the free slabs rather than into the partial ones during allocations, but that doesn't mean it cannot happen very often - I will gladly suggest removing it if you prove me wrong. All I'm saying here is that dummy allocations with no access to the ram returned are not interesting numbers.

Andrea

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 15:33 ` Andrea Arcangeli @ 2001-09-11 18:41 ` Manfred Spraul 2001-09-11 19:36 ` Andrea Arcangeli 0 siblings, 1 reply; 28+ messages in thread From: Manfred Spraul @ 2001-09-11 18:41 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Alan Cox, torvalds

> I think the cleanup

I'm sure you've read this comment in page_alloc.c:

 * Buddy system. Hairy. You really aren't expected to understand this
 *
 * Hint: -mask = 1+~mask

and the slab allocator must sustain more than 10 times the allocations/sec: from lse netbench on sourceforge, 4-cpu, ext2, one minute:

4 million kmallocs,
5 million kmem_cache_alloc,
721 000 rmqueue.

slab.c doesn't need to be simple, it must be fast.

> and the potential for lifo in the free slabs is much more
> sensible than the other factors you mentioned, of course there's less
> probability of having to fall into the free slabs rather than in the
> partial ones during allocations, but that doesn't mean it cannot
> happen very often, but I will gladly suggest removing it if you prove
> me wrong.

Ok, so you agree that your changes are only beneficial in one case:

kmem_cache_free(), uniprocessor or SMP not-per-cpu cached, which
* frees one object
* and after that free operation no slabs with allocated objects are left - only full and free slabs.

Your code ensures that the next object returned will be the previously freed object; my code doesn't guarantee that. If I can modify my slab allocator to guarantee it, you'd drop your patch?

-- Manfred

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-11 18:41 ` Manfred Spraul @ 2001-09-11 19:36 ` Andrea Arcangeli 2001-09-11 20:43 ` Manfred Spraul 0 siblings, 1 reply; 28+ messages in thread From: Andrea Arcangeli @ 2001-09-11 19:36 UTC (permalink / raw) To: Manfred Spraul; +Cc: linux-kernel, Alan Cox, torvalds On Tue, Sep 11, 2001 at 08:41:32PM +0200, Manfred Spraul wrote:

> > I think the cleanup
> I'm sure you've read this comment in page_alloc.c:
> * Buddy system. Hairy. You really aren't expected to understand this
> *
> * Hint: -mask = 1+~mask
>
> and the slab allocator must sustain more than 10 times the allocations/sec:
> from lse netbench on sourceforge, 4-cpu, ext2, one minute:
> 4 million kmallocs,
> 5 million kmem_cache_alloc
> 721 000 rmqueue
> slab.c doesn't need to be simple, it must be fast.
>
> > and the potential for lifo in the free slabs is much more
> > sensible than the other factors you mentioned, of course there's less
> > probability of having to fall into the free slabs rather than in the
> > partial ones during allocations, but that doesn't mean it cannot
> > happen very often, but I will gladly suggest removing it if you prove
> > me wrong.
>
> Ok, so you agree that your changes are only beneficial in one case:
>
> kmem_cache_free(), uniprocessor or SMP not-per-cpu cached.

I wouldn't say "only in such case" but "mainly in such case" (there's not infinite ram in the per-cpu caches).

> If I can modify my slab allocator to guarantee it, you'd drop your
> patch?

I'm open to an alternate, more optimized approach; however, I have to say that using the struct list_head and maintaining the very clean three separate lists was really nice IMHO. 
Also I believe there are more interesting parts to optimize on the design side rather than making the slab.c implementation more complex for the sake of microoptimizations for microbenchmarks (as you told me, you couldn't measure any decrease in performance in real life in a sendfile benchmark; in fact the first run with the patch was a little faster and the second a little slower).

Andrea

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-11 19:36 ` Andrea Arcangeli @ 2001-09-11 20:43 ` Manfred Spraul 2001-09-12 14:18 ` Andrea Arcangeli 0 siblings, 1 reply; 28+ messages in thread From: Manfred Spraul @ 2001-09-11 20:43 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Alan Cox, torvalds [-- Attachment #1: Type: text/plain, Size: 2347 bytes --] Andrea Arcangeli wrote:

> >
> > Ok, so you agree that your changes are only beneficial in one case:
> >
> > kmem_cache_free(), uniprocessor or SMP not-per-cpu cached.
>
> I wouldn't say "only in such case" but "mainly in such case"
> (there's not infinite ram in the per-cpu caches).
>

Large objects were LIFO even without your patch. Small objects on UP with partial slabs were LIFO without your patch. Small objects on UP without any partial slabs were not LIFO, but it was simple to fix that (patch attached, not yet fully tested). Small objects on SMP are managed in the per-cpu arrays - and transferred in batches (by default 30-120 objects, depending on the size) to the backend slab lists. The backend was not LIFO (also fixed in the patch).

>
> Also I believe there are more interesting parts to optimize on the
> design side rather than making the slab.c implementation more complex
> for the sake of microoptimizations for microbenchmarks (as you told me,
> you couldn't measure any decrease in performance in real life in a
> sendfile benchmark; in fact the first run with the patch was a little
> faster and the second a little slower).
>

Interesting: you lose all microbenchmarks, your patch doesn't improve LIFO ordering, and you still think your patch is better? Could you explain why?

Btw, I didn't merge the current slab.c directly: Ingo Molnar compared it with his own version and sent it to Linus because it was faster than his version (8-way, tux optimization). 
But I agree that redesign is probably a good idea for 2.5, but that's 2.5 stuff: * microoptimization: try to avoid the division in kmem_cache_free_one. Might give a 20 % boost on UP i386, even more on cpus without an efficient hardware division. I have an (ugly) patch. Mark Hemment's version avoided the division in some cases, and that really helps. * redesign: the code relies on a fast virt_to_page(). Is that realistic with NUMA/CONFIG_DISCONTIGMEM? * redesign: inode and dcache currently drain the per-cpu caches very often to reduce the fragmentation - perhaps another, stronger defragmenting allocator for inode & dcache? CPU crosscalls can't be the perfect solution. * redesign: NUMA. But that redesign - virt_to_page is the first function in kmem_cache_free and kfree - you won't need a backend simplification, more a rewrite. -- Manfred [-- Attachment #2: patch-slab-lifo --] [-- Type: text/plain, Size: 3478 bytes --] --- 2.4/mm/slab.c Tue Sep 11 21:32:23 2001 +++ build-2.4/mm/slab.c Tue Sep 11 22:39:54 2001 @@ -85,9 +85,9 @@ * FORCED_DEBUG - 1 enables SLAB_RED_ZONE and SLAB_POISON (if possible) */ -#define DEBUG 0 -#define STATS 0 -#define FORCED_DEBUG 0 +#define DEBUG 1 +#define STATS 1 +#define FORCED_DEBUG 1 /* * Parameters for kmem_cache_reap @@ -448,7 +448,7 @@ /* Inc off-slab bufctl limit until the ceiling is hit. */ if (!(OFF_SLAB(sizes->cs_cachep))) { offslab_limit = sizes->cs_size-sizeof(slab_t); - offslab_limit /= 2; + offslab_limit /= sizeof(kmem_bufctl_t); } sprintf(name, "size-%Zd(DMA)",sizes->cs_size); sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0, @@ -1411,16 +1411,31 @@ moveslab_free: /* * was partial, now empty. - * c_firstnotfull might point to slabp - * FIXME: optimize + * c_firstnotfull might point to slabp. + * The code ensures LIFO ordering if there are no partial slabs. + * (allocation from partial slabs has higher priority that LIFO + * - we just found a freeable page!) 
*/ { - struct list_head *t = cachep->firstnotfull->prev; - - list_del(&slabp->list); - list_add_tail(&slabp->list, &cachep->slabs); - if (cachep->firstnotfull == &slabp->list) - cachep->firstnotfull = t->next; + slab_t* next = list_entry(slabp->list.next, slab_t, list); + if (&next->list != &cachep->slabs) { + if (next->inuse != cachep->num) { + if (&slabp->list == cachep->firstnotfull) + cachep->firstnotfull = &next->list; + list_del(&slabp->list); + list_add_tail(&slabp->list, &cachep->slabs); + } /* else { + The next slab is a free slab. That means + the slab with the freed object in in it's + correct position: behind all partial slabs, + in front of all other free slabs to ensure + LIFO. + } */ + }/* else { + the slab the freed object was in was the last slab in + the cache. That means it's already in the correct + position: behind all partial slabs (if any). + } */ return; } } @@ -1473,6 +1488,46 @@ #endif } +#if DEBUG +static void kmem_slabchain_test(kmem_cache_t *cachep) +{ + int pos = 0; + slab_t* walk; + unsigned long flags; + + spin_lock_irqsave(&cachep->spinlock, flags); + + walk = list_entry(cachep->slabs.next, slab_t, list); + while(&walk->list != &cachep->slabs) { + if (walk->inuse == cachep->num) { + if (pos > 0) + BUG(); + } else if (walk->inuse > 0) { + if (pos == 0) { + if (cachep->firstnotfull != &walk->list) + BUG(); + } + if (pos > 1) + BUG(); + pos = 1; /* found partial slabp */ + } else { + if (pos == 0) { + if (cachep->firstnotfull != &walk->list) + BUG(); + } + pos = 2; /* found free slabp */ + } + walk = list_entry(walk->list.next, slab_t, list); + } + if (pos == 0) { + if (cachep->firstnotfull != &cachep->slabs) + BUG(); + } + spin_unlock_irqrestore(&cachep->spinlock, flags); +} +#else +#define kmem_slabchain_test(cachep) do { } while(0) +#endif /** * kmem_cache_alloc - Allocate an object * @cachep: The cache to allocate from. 
@@ -1540,6 +1595,7 @@ local_irq_save(flags); __kmem_cache_free(cachep, objp); local_irq_restore(flags); + kmem_slabchain_test(cachep); } /** @@ -1561,6 +1617,7 @@ c = GET_PAGE_CACHE(virt_to_page(objp)); __kmem_cache_free(c, (void*)objp); local_irq_restore(flags); + kmem_slabchain_test(c); } kmem_cache_t * kmem_find_general_cachep (size_t size, int gfpflags) ^ permalink raw reply [flat|nested] 28+ messages in thread
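The division Manfred wants out of kmem_cache_free_one() computes the object index from the offset within the slab: objnr = (obj - slabp->s_mem) / cachep->objsize. Since objsize is fixed at cache-creation time, the divide can be replaced by one multiply and a shift against a precomputed reciprocal - the trick the kernel later adopted as reciprocal_value()/reciprocal_divide(). A hedged user-space sketch (the names follow the later kernel helpers; this is not the 2.4 code):

```c
#include <assert.h>
#include <stdint.h>

/* Computed once, at cache-creation time, for the fixed object size b. */
static uint32_t reciprocal_value(uint32_t b)
{
	return (uint32_t)((0x100000000ULL + b - 1) / b);
}

/* offset / objsize without a divide: one multiply plus a shift.
 * Exact as long as the dividend is small; slab offsets are bounded
 * by a few pages, which is why the trick is safe here. */
static uint32_t reciprocal_divide(uint32_t offset, uint32_t r)
{
	return (uint32_t)(((uint64_t)offset * r) >> 32);
}
```

Usage would look like objnr = reciprocal_divide(obj - s_mem, cache_reciprocal), with cache_reciprocal stored next to objsize in the cache descriptor (a hypothetical field, not one 2.4 has).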
* Re: Purpose of the mm/slab.c changes 2001-09-11 20:43 ` Manfred Spraul @ 2001-09-12 14:18 ` Andrea Arcangeli 0 siblings, 0 replies; 28+ messages in thread From: Andrea Arcangeli @ 2001-09-12 14:18 UTC (permalink / raw) To: Manfred Spraul; +Cc: linux-kernel, Alan Cox, torvalds [-- Attachment #1: Type: text/plain, Size: 1719 bytes --] On Tue, Sep 11, 2001 at 10:43:40PM +0200, Manfred Spraul wrote:

> Interesting: you lose all microbenchmarks, your patch doesn't improve
> LIFO ordering, and you still think your patch is better? Could you
> explain why?

I believe it's much cleaner to have proper lists. It also shrinks the caches in fifo order (shrinking is fifo and allocation is lifo). The main reason it was significantly slower in the microbenchmarks is that it wasn't microoptimized; the design of the three lists wasn't the real problem. I wrote this microbenchmark:

#include <linux/module.h>
#include <linux/slab.h>
#include <asm/msr.h>

#define NR_MALLOC 1000
#define SIZE 10

int init_module(void)
{
	unsigned long start, stop;
	int i;
	void * malloc[NR_MALLOC];

	rdtscl(start);
	for (i = 0; i < NR_MALLOC; i++)
		malloc[i] = kmalloc(SIZE, GFP_KERNEL);
	rdtscl(stop);

	for (i = 0; i < NR_MALLOC; i++)
		kfree(malloc[i]);

	printk("cycles %lu\n", stop - start);
	return 1;
}

these are the figures on a UP PII 450mhz:

mainline pre10 (slabs-list patch applied)
cycles 90236
cycles 89812
cycles 90106

mainline but with slabs-list backed out
cycles 85187
cycles 85386
cycles 85169

mainline but with slabs-list backed out and your latest lifo patch applied, but with debugging turned off (also I didn't check the offslab_limit, but it was applied)
cycles 85578
cycles 85856
cycles 85596

mainline with this new attached patch applied
cycles 85775
cycles 85963
cycles 85901

So in short I'd prefer to merge in mainline the attached patch that optimizes the slab lists taking full advantage of the three lists. Don't hesitate to complain if you still see problems with those microoptimizations applied. 
(btw, compiled with gcc 3.0.2) Andrea [-- Attachment #2: 00_slab-microoptimize-1 --] [-- Type: text/plain, Size: 4524 bytes --] --- 2.4.10pre8aa1/mm/slab.c.~1~ Wed Sep 12 03:23:43 2001 +++ 2.4.10pre8aa1/mm/slab.c Wed Sep 12 05:02:40 2001 @@ -926,8 +926,10 @@ break; slabp = list_entry(cachep->slabs_free.prev, slab_t, list); +#if DEBUG if (slabp->inuse) BUG(); +#endif list_del(&slabp->list); spin_unlock_irq(&cachep->spinlock); @@ -1215,7 +1217,7 @@ } static inline void * kmem_cache_alloc_one_tail (kmem_cache_t *cachep, - slab_t *slabp, int partial) + slab_t *slabp) { void *objp; @@ -1228,14 +1230,9 @@ objp = slabp->s_mem + slabp->free*cachep->objsize; slabp->free=slab_bufctl(slabp)[slabp->free]; - if (slabp->free == BUFCTL_END) { + if (__builtin_expect(slabp->free == BUFCTL_END, 0)) { list_del(&slabp->list); list_add(&slabp->list, &cachep->slabs_full); - } else { - if (!partial) { - list_del(&slabp->list); - list_add(&slabp->list, &cachep->slabs_partial); - } } #if DEBUG if (cachep->flags & SLAB_POISON) @@ -1262,20 +1259,23 @@ */ #define kmem_cache_alloc_one(cachep) \ ({ \ - slab_t *slabp; \ - struct list_head * slab_freelist; \ - int partial = 1; \ + struct list_head * slabs_partial, * entry; \ + slab_t *slabp; \ \ - slab_freelist = &(cachep)->slabs_partial; \ - if (list_empty(slab_freelist)) { \ - partial = 0; \ - slab_freelist = &(cachep)->slabs_free; \ - if (list_empty(slab_freelist)) \ + slabs_partial = &(cachep)->slabs_partial; \ + entry = slabs_partial->next; \ + if (__builtin_expect(entry == slabs_partial, 0)) { \ + struct list_head * slabs_free; \ + slabs_free = &(cachep)->slabs_free; \ + entry = slabs_free->next; \ + if (__builtin_expect(entry == slabs_free, 0)) \ goto alloc_new_slab; \ + list_del(entry); \ + list_add(entry, slabs_partial); \ } \ \ - slabp = list_entry(slab_freelist->next, slab_t, list); \ - kmem_cache_alloc_one_tail(cachep, slabp, partial); \ + slabp = list_entry(entry, slab_t, list); \ + kmem_cache_alloc_one_tail(cachep, slabp); \ }) 
#ifdef CONFIG_SMP @@ -1283,25 +1283,27 @@ { int batchcount = cachep->batchcount; cpucache_t* cc = cc_data(cachep); - struct list_head * slab_freelist; - int partial; - slab_t *slabp; spin_lock(&cachep->spinlock); while (batchcount--) { + struct list_head * slabs_partial, * entry; + slab_t *slabp; /* Get slab alloc is to come from. */ - slab_freelist = &(cachep)->slabs_partial; - partial = 1; - if (list_empty(slab_freelist)) { - partial = 0; - slab_freelist = &(cachep)->slabs_free; - if (list_empty(slab_freelist)) + slabs_partial = &(cachep)->slabs_partial; + entry = slabs_partial->next; + if (__builtin_expect(entry == slabs_partial, 0)) { + struct list_head * slabs_free; + slabs_free = &(cachep)->slabs_free; + entry = slabs_free->next; + if (__builtin_expect(entry == slabs_free, 0)) break; + list_del(entry); + list_add(entry, slabs_partial); } - slabp = list_entry(slab_freelist->next, slab_t, list); + slabp = list_entry(entry, slab_t, list); cc_entry(cc)[cc->avail++] = - kmem_cache_alloc_one_tail(cachep, slabp, partial); + kmem_cache_alloc_one_tail(cachep, slabp); } spin_unlock(&cachep->spinlock); @@ -1432,23 +1434,18 @@ STATS_DEC_ACTIVE(cachep); /* fixup slab chains */ - if (!--slabp->inuse) - goto moveslab_free; - if (slabp->inuse + 1 == cachep->num) - goto moveslab_partial; - return; - -moveslab_partial: - /* Was full. */ - list_del(&slabp->list); - list_add(&slabp->list, &cachep->slabs_partial); - return; - -moveslab_free: - /* Was partial, now empty. */ - list_del(&slabp->list); - list_add(&slabp->list, &cachep->slabs_free); - return; + { + int inuse = slabp->inuse; + if (__builtin_expect(!--slabp->inuse, 0)) { + /* Was partial or full, now empty. */ + list_del(&slabp->list); + list_add(&slabp->list, &cachep->slabs_free); + } else if (__builtin_expect(inuse == cachep->num, 0)) { + /* Was full. 
*/ + list_del(&slabp->list); + list_add(&slabp->list, &cachep->slabs_partial); + } + } } #ifdef CONFIG_SMP @@ -1756,8 +1753,10 @@ p = searchp->slabs_free.next; while (p != &searchp->slabs_free) { slabp = list_entry(p, slab_t, list); +#if DEBUG if (slabp->inuse) BUG(); +#endif full_free++; p = p->next; } @@ -1807,8 +1806,10 @@ if (p == &best_cachep->slabs_free) break; slabp = list_entry(p,slab_t,list); +#if DEBUG if (slabp->inuse) BUG(); +#endif list_del(&slabp->list); STATS_INC_REAPED(best_cachep); ^ permalink raw reply [flat|nested] 28+ messages in thread
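Andrea's patch leans on __builtin_expect() to tell gcc which branch is the hot path (partial list non-empty). A compact model of that control flow - pick_list() is a made-up name standing in for kmem_cache_alloc_one's list selection, not a kernel function:

```c
#include <assert.h>

/* gcc branch hints, spelled out as in the patch; the kernel later
 * wrapped them as likely()/unlikely(). */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

enum { FROM_PARTIAL = 0, FROM_FREE = 1, NEED_NEW_SLAB = -1 };

/* Models the allocation fast path: check slabs_partial first, and hint
 * both "partial empty" and "free empty" as the unlikely cases. */
static int pick_list(int partial_empty, int free_empty)
{
	if (unlikely(partial_empty)) {
		if (unlikely(free_empty))
			return NEED_NEW_SLAB;	/* goto alloc_new_slab */
		return FROM_FREE;	/* move a free slab to partial */
	}
	return FROM_PARTIAL;		/* common case, straight-line code */
}
```

The hints change only code layout and static branch prediction; the semantics are identical with or without them.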
* Re: Purpose of the mm/slab.c changes 2001-09-09 15:18 ` Manfred Spraul 2001-09-09 15:33 ` Andrea Arcangeli @ 2001-09-09 16:12 ` Alan Cox 2001-09-09 16:25 ` Linus Torvalds ` (2 subsequent siblings) 4 siblings, 0 replies; 28+ messages in thread From: Alan Cox @ 2001-09-09 16:12 UTC (permalink / raw) To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Alan Cox, torvalds

> lifo/fifo for unused slabs is obviously superfluous - free is free, it
> doesn't matter which free page is used first/last.

Which pages are likely to be in the CPU caches?

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 15:18 ` Manfred Spraul 2001-09-09 15:33 ` Andrea Arcangeli 2001-09-09 16:12 ` Alan Cox @ 2001-09-09 16:25 ` Linus Torvalds 2001-09-09 16:47 ` Alan Cox 2001-09-15 0:29 ` Albert D. Cahalan 2001-09-09 20:23 ` Rik van Riel 2001-09-10 2:28 ` Daniel Phillips 4 siblings, 2 replies; 28+ messages in thread From: Linus Torvalds @ 2001-09-09 16:25 UTC (permalink / raw) To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Alan Cox On Sun, 9 Sep 2001, Manfred Spraul wrote: > > > > it provides lifo allocations from both partial and unused slabs. > > > > lifo/fifo for unused slabs is obviously superflous - free is free, it > doesn't matter which free page is used first/last. You're full of crap. LIFO is obviously superior due to cache re-use. Linus ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 16:25 ` Linus Torvalds @ 2001-09-09 16:47 ` Alan Cox 2001-09-09 16:55 ` Manfred Spraul ` (4 more replies) 2001-09-15 0:29 ` Albert D. Cahalan 1 sibling, 5 replies; 28+ messages in thread From: Alan Cox @ 2001-09-09 16:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Alan Cox

> > doesn't matter which free page is used first/last.
>
> You're full of crap.
> LIFO is obviously superior due to cache re-use.

Interesting question however. On SMP without sufficient per-CPU slab caches, is that still the case?

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 16:47 ` Alan Cox @ 2001-09-09 16:55 ` Manfred Spraul 2001-09-09 17:01 ` Linus Torvalds ` (3 subsequent siblings) 4 siblings, 0 replies; 28+ messages in thread From: Manfred Spraul @ 2001-09-09 16:55 UTC (permalink / raw) To: Alan Cox; +Cc: Linus Torvalds, Andrea Arcangeli, linux-kernel Alan Cox wrote:

> >
> > doesn't matter which free page is used first/last.
> >
> > You're full of crap.
> > LIFO is obviously superior due to cache re-use.
>
> Interesting question however. On SMP without sufficient per-CPU slab caches,
> is that still the case?

Correct. SMP was perfect LIFO even without Andrea's changes. I thought Andrea tried to reduce the fragmentation, therefore I wrote "free is free". But even for cache re-use his changes are not a big change: the main fifo/lifo ordering on UP is mandated by the defragmentation property of the slab allocator.

Afaics there is exactly one case where my code is not lifo and Andrea's is: kmem_cache_free frees the last object in a slab, each slab contains more than one object, and there are no further partial slabs. In all other cases Andrea just adds list_del();list_add() instead of changes to the firstnotfull pointer.

full->partial is/was lifo,
partial->partial doesn't change the lists at all,
partial->empty was fifo, is now lifo_if_no_partial_slab_exists.

-- Manfred

^ permalink raw reply [flat|nested] 28+ messages in thread
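The lifo/fifo distinction being argued throughout the thread is literally the difference between list_add() (insert at head) and list_add_tail() (insert at tail) on a kernel-style circular list. A minimal user-space re-implementation of the list primitives, just to make the reuse order concrete (the helper functions modelling "free a slab / allocate from the list" are illustrative, not kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void __list_add(struct list_head *n,
		       struct list_head *prev, struct list_head *next)
{
	next->prev = n; n->next = next; n->prev = prev; prev->next = n;
}

/* LIFO: a just-freed (cache-warm) slab goes to the head... */
static void list_add(struct list_head *n, struct list_head *h)
{ __list_add(n, h, h->next); }

/* ...FIFO: it goes to the tail and is reused last. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{ __list_add(n, h->prev, h); }

struct slab { int id; struct list_head list; };

/* Allocation always takes the first entry after the list head. */
static int first_id(struct list_head *h)
{
	return ((struct slab *)((char *)h->next -
		offsetof(struct slab, list)))->id;
}

/* "Free" slabs 1, 2, 3 in order with head insertion: which is reused? */
static int reuse_order_head(void)
{
	struct list_head h; struct slab s1 = {1}, s2 = {2}, s3 = {3};
	INIT_LIST_HEAD(&h);
	list_add(&s1.list, &h); list_add(&s2.list, &h); list_add(&s3.list, &h);
	return first_id(&h);
}

/* Same three frees with tail insertion. */
static int reuse_order_tail(void)
{
	struct list_head h; struct slab s1 = {1}, s2 = {2}, s3 = {3};
	INIT_LIST_HEAD(&h);
	list_add_tail(&s1.list, &h); list_add_tail(&s2.list, &h);
	list_add_tail(&s3.list, &h);
	return first_id(&h);
}
```

With head insertion the most recently freed slab (the one most likely still in the CPU cache) is reused first; with tail insertion the coldest one is.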
* Re: Purpose of the mm/slab.c changes 2001-09-09 16:47 ` Alan Cox 2001-09-09 16:55 ` Manfred Spraul @ 2001-09-09 17:01 ` Linus Torvalds 2001-09-09 17:22 ` Manfred Spraul ` (2 more replies) 2001-09-09 17:30 ` Andrea Arcangeli ` (2 subsequent siblings) 4 siblings, 3 replies; 28+ messages in thread From: Linus Torvalds @ 2001-09-09 17:01 UTC (permalink / raw) To: Alan Cox; +Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel On Sun, 9 Sep 2001, Alan Cox wrote:

> >
> > LIFO is obviously superior due to cache re-use.
>
> Interesting question however. On SMP without sufficient per-CPU slab caches,
> is that still the case?

That _should_ be fairly easily testable by just changing the cpucache tuning, which you can do through /proc. The default parameters look reasonable, though.

Note, however, that as far as the slab is concerned, this issue never exists, if only because all the lists are per-CPU. So the only way you can get cross-CPU cache behaviour is (a) when the actual _use_ of the slab is moved across CPU's and (b) when you have a page that ends up moving from one CPU to the other through page allocation when the slab caches run out.

Case (a) is clearly rather independent of the actual slab allocation logic, and depends on the _user_ of the allocation, not on the allocator itself.

Case (b) implies that the page allocator might also find per-CPU LIFO queues in front of the actual allocator useful. That might certainly be worth-while looking into - although it would also increase the risk of fragmentation.

(Doing per-CPU LIFO queues for the actual page allocator would potentially make page alloc/de-alloc much faster due to lower locking requirements too. So you might have a double performance win if anybody wants to try this out).

Linus

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 17:01 ` Linus Torvalds @ 2001-09-09 17:22 ` Manfred Spraul 2001-09-09 17:27 ` arjan 2001-09-09 17:35 ` Andrea Arcangeli 2 siblings, 0 replies; 28+ messages in thread From: Manfred Spraul @ 2001-09-09 17:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Alan Cox, Andrea Arcangeli, linux-kernel Linus Torvalds wrote: > > (Doing per-CPU LIFO queues for the actual page allocator would potentially > make page alloc/de-alloc much faster due to lower locking requirements > too. So you might have a double performance win if anybody wants to try > this out). > That patch already exists - Ingo wrote it, I think it's included in some RedHat kernels. But Andrea's changes mainly affect UP, not SMP. -- Manfred ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 17:01 ` Linus Torvalds 2001-09-09 17:22 ` Manfred Spraul @ 2001-09-09 17:27 ` arjan 2001-09-09 17:35 ` Andrea Arcangeli 2 siblings, 0 replies; 28+ messages in thread From: arjan @ 2001-09-09 17:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel In article <Pine.LNX.4.33.0109090952380.14365-100000@penguin.transmeta.com> you wrote: > Case (b) implies that the page allocator might also should also find > per-CPU LIFO queues in front of the actual allocator useful. That might > certainly be worth-while looking into - although it would also increase > the risk of fragementation. Hi, Ingo Molnar did such a patch already; attached below. This yields a 1.5% improvement on a 4 way machine using Oracle, more for other applications (as the oracle tests is also IO bound). Greetings, Arjan van de Ven diff -urN linux.org/include/linux/mmzone.h linux/include/linux/mmzone.h --- linux.org/include/linux/mmzone.h Wed Aug 15 17:21:11 2001 +++ linux/include/linux/mmzone.h Tue Sep 4 08:12:37 2001 @@ -7,12 +7,13 @@ #include <linux/config.h> #include <linux/spinlock.h> #include <linux/list.h> +#include <linux/smp.h> /* * Free memory management - zoned buddy allocator. */ -#define MAX_ORDER 10 +#define MAX_ORDER 11 typedef struct free_area_struct { struct list_head free_list; @@ -21,6 +22,14 @@ struct pglist_data; +#define MAX_PER_CPU_PAGES 64 + +typedef struct per_cpu_pages_s { + int nr_pages; + int max_nr_pages; + struct list_head head; +} __attribute__((aligned(L1_CACHE_BYTES))) per_cpu_t; + /* * On machines where it is needed (eg PCs) we divide physical memory * into multiple physical zones. 
On a PC we have 3 zones: @@ -33,6 +42,7 @@ /* * Commonly accessed fields: */ + per_cpu_t cpu_pages [NR_CPUS]; spinlock_t lock; unsigned long free_pages; unsigned long inactive_clean_pages; diff -urN linux.org/mm/page_alloc.c linux/mm/page_alloc.c --- linux.org/mm/page_alloc.c Thu Aug 16 12:43:02 2001 +++ linux/mm/page_alloc.c Tue Sep 4 08:12:37 2001 @@ -7,6 +7,7 @@ * Reshaped it to be a zoned allocator, Ingo Molnar, Red Hat, 1999 * Discontiguous memory support, Kanoj Sarcar, SGI, Nov 1999 * Zone balancing, Kanoj Sarcar, SGI, Jan 2000 + * Per-CPU page pool, Ingo Molnar, Red Hat, 2001 */ #include <linux/config.h> @@ -67,8 +68,16 @@ unsigned long index, page_idx, mask, flags; free_area_t *area; struct page *base; + per_cpu_t *per_cpu; zone_t *zone; + /* + * This late check is safe because reserved pages do not + * have a valid page->count. This trick avoids overhead + * in __free_pages(). + */ + if (PageReserved(page)) + return; if (page->buffers) BUG(); if (page->mapping) @@ -102,7 +111,18 @@ area = zone->free_area + order; - spin_lock_irqsave(&zone->lock, flags); + per_cpu = zone->cpu_pages + smp_processor_id(); + + __save_flags(flags); + __cli(); + if (!order && (per_cpu->nr_pages < per_cpu->max_nr_pages) && (!free_shortage())) { + list_add(&page->list, &per_cpu->head); + per_cpu->nr_pages++; + __restore_flags(flags); + return; + } + + spin_lock(&zone->lock); zone->free_pages -= mask; @@ -169,16 +189,35 @@ return page; } + static FASTCALL(struct page * rmqueue(zone_t *zone, unsigned long order)); static struct page * rmqueue(zone_t *zone, unsigned long order) { + per_cpu_t *per_cpu = zone->cpu_pages + smp_processor_id(); free_area_t * area = zone->free_area + order; unsigned long curr_order = order; struct list_head *head, *curr; unsigned long flags; struct page *page; + int threshold=0; - spin_lock_irqsave(&zone->lock, flags); + if (!(current->flags & PF_MEMALLOC)) threshold = per_cpu->max_nr_pages/8; + __save_flags(flags); + __cli(); + + if (!order && 
(per_cpu->nr_pages>threshold)) { + if (list_empty(&per_cpu->head)) + BUG(); + page = list_entry(per_cpu->head.next, struct page, list); + list_del(&page->list); + per_cpu->nr_pages--; + __restore_flags(flags); + + set_page_count(page, 1); + return page; + } + + spin_lock(&zone->lock); do { head = &area->free_list; curr = memlist_next(head); @@ -534,7 +573,7 @@ void __free_pages(struct page *page, unsigned long order) { - if (!PageReserved(page) && put_page_testzero(page)) + if (put_page_testzero(page)) __free_pages_ok(page, order); } @@ -796,6 +835,7 @@ offset = lmem_map - mem_map; for (j = 0; j < MAX_NR_ZONES; j++) { + int k; zone_t *zone = pgdat->node_zones + j; unsigned long mask; unsigned long size, realsize; @@ -807,6 +847,18 @@ printk("zone(%lu): %lu pages.\n", j, size); zone->size = size; zone->name = zone_names[j]; + + for (k = 0; k < NR_CPUS; k++) { + per_cpu_t *per_cpu = zone->cpu_pages + k; + + INIT_LIST_HEAD(&per_cpu->head); + per_cpu->max_nr_pages = realsize / smp_num_cpus / 128; + if (per_cpu->max_nr_pages > MAX_PER_CPU_PAGES) + per_cpu->max_nr_pages = MAX_PER_CPU_PAGES; + else + if (!per_cpu->max_nr_pages) + per_cpu->max_nr_pages = 1; + } zone->lock = SPIN_LOCK_UNLOCKED; zone->zone_pgdat = pgdat; zone->free_pages = 0; ^ permalink raw reply [flat|nested] 28+ messages in thread
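The structure of Ingo's patch - a small per-CPU LIFO pool consulted before the spinlocked buddy lists - can be modelled in a few lines. This sketch counts trips to the "global" allocator to show how the front cache absorbs lock traffic; all names are illustrative and malloc() stands in for the real buddy allocator:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PER_CPU_PAGES 8

struct per_cpu_pages {
	int nr_pages;
	void *stack[MAX_PER_CPU_PAGES];	/* LIFO: warm pages on top */
};

static int global_trips;	/* stands in for zone->lock acquisitions */

static void *global_alloc(void) { global_trips++; return malloc(4096); }
static void global_free(void *p) { global_trips++; free(p); }

/* Fast path: pop the most recently freed (cache-warm) page, no lock. */
static void *alloc_page_cpu(struct per_cpu_pages *pc)
{
	if (pc->nr_pages)
		return pc->stack[--pc->nr_pages];
	return global_alloc();
}

/* Fast path: push onto the per-CPU stack; spill to the zone when full. */
static void free_page_cpu(struct per_cpu_pages *pc, void *page)
{
	if (pc->nr_pages < MAX_PER_CPU_PAGES) {
		pc->stack[pc->nr_pages++] = page;
		return;
	}
	global_free(page);
}
```

A free followed by an allocation on the same CPU round-trips through the stack without ever touching the global pool, which is both the locking win and the cache-warmth win Linus describes.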
* Re: Purpose of the mm/slab.c changes 2001-09-09 17:01 ` Linus Torvalds 2001-09-09 17:22 ` Manfred Spraul 2001-09-09 17:27 ` arjan @ 2001-09-09 17:35 ` Andrea Arcangeli 2 siblings, 0 replies; 28+ messages in thread From: Andrea Arcangeli @ 2001-09-09 17:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: Alan Cox, Manfred Spraul, linux-kernel On Sun, Sep 09, 2001 at 10:01:32AM -0700, Linus Torvalds wrote: > (Doing per-CPU LIFO queues for the actual page allocator would potentially > make page alloc/de-alloc much faster due to lower locking requirements > too. So you might have a double performance win if anybody wants to try > this out). I recall seeing this one implemented during some auditing recently; I think it's in the tux patch, but I may remember wrong. Andrea ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 16:47 ` Alan Cox 2001-09-09 16:55 ` Manfred Spraul 2001-09-09 17:01 ` Linus Torvalds @ 2001-09-09 17:30 ` Andrea Arcangeli 2001-09-09 17:59 ` Fwd: 2.4.10-pre6 ramdisk driver broken? won't compile Stephan Gutschke 2001-09-09 20:26 ` Purpose of the mm/slab.c changes Rik van Riel 4 siblings, 0 replies; 28+ messages in thread From: Andrea Arcangeli @ 2001-09-09 17:30 UTC (permalink / raw) To: Alan Cox; +Cc: Linus Torvalds, Manfred Spraul, linux-kernel On Sun, Sep 09, 2001 at 05:47:12PM +0100, Alan Cox wrote: > > > doesn't matter which free page is used first/last. > > > > You're full of crap. > > LIFO is obviously superior due to cache re-use. > > Interesting question however. On SMP without sufficient per-CPU slab caches > is that still the case ? If we stay in the per-CPU caches then the other paths cannot run at all, so it cannot make any difference in such a case. Andrea ^ permalink raw reply [flat|nested] 28+ messages in thread
* Fwd: 2.4.10-pre6 ramdisk driver broken? won't compile 2001-09-09 16:47 ` Alan Cox ` (2 preceding siblings ...) 2001-09-09 17:30 ` Andrea Arcangeli @ 2001-09-09 17:59 ` Stephan Gutschke 2001-09-09 20:26 ` Purpose of the mm/slab.c changes Rik van Riel 4 siblings, 0 replies; 28+ messages in thread From: Stephan Gutschke @ 2001-09-09 17:59 UTC (permalink / raw) To: linux-kernel From: Stephan Gutschke <stephan@kernel.gutschke.com> To: Majordomo@vger.kernel.org Date: Sun, 09 Sep 2001 09:55:14 -800 Hi there, my kernel stops compiling with this: gcc -D__KERNEL__ -I/usr/src/linux-2.4.10-pre6/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fomit-frame-pointer -fno-strict-aliasing -fno-common -pipe - mpreferred-stack-boundary=2 -march=i686 -c -o rd.o rd.c rd.c: In function `rd_ioctl': rd.c:262: invalid type argument of `->' make[3]: *** [rd.o] Error 1 make[3]: Leaving directory `/usr/src/linux-2.4.10-pre6/drivers/block' make[2]: *** [first_rule] Error 2 make[2]: Leaving directory `/usr/src/linux-2.4.10-pre6/drivers/block' make[1]: *** [_subdir_block] Error 2 make[1]: Leaving directory `/usr/src/linux-2.4.10-pre6/drivers' make: *** [_dir_drivers] Error 2 I checked the patch (patch-2.4.10-pre6) and thought maybe there is a "&" missing, where someone changed &inode->... to rd_bdev[... ? Anyways, i changed it and it seems to compile ;) @@ -259,7 +259,7 @@ /* special: we want to release the ramdisk memory, it's not like with the other blockdevices where this ioctl only flushes away the buffer cache. */ - if ((atomic_read(&inode->i_bdev->bd_openers) > 2)) + if ((atomic_read(rd_bdev[minor]->bd_openers) > 2)) return -EBUSY; destroy_buffers(inode->i_rdev); rd_blocksizes[minor] = 0; Bye Stephan ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 16:47 ` Alan Cox ` (3 preceding siblings ...) 2001-09-09 17:59 ` Fwd: 2.4.10-pre6 ramdisk driver broken? won't compile Stephan Gutschke @ 2001-09-09 20:26 ` Rik van Riel 4 siblings, 0 replies; 28+ messages in thread From: Rik van Riel @ 2001-09-09 20:26 UTC (permalink / raw) To: Alan Cox; +Cc: Linus Torvalds, Manfred Spraul, Andrea Arcangeli, linux-kernel On Sun, 9 Sep 2001, Alan Cox wrote: > > > doesn't matter which free page is used first/last. > > > > You're full of crap. > > LIFO is obviously superior due to cache re-use. > > Interesting question however. On SMP without sufficient per-CPU > slab caches is that still the case ? By definition, no. If we're allocating and freeing the slabs at such a fast speed that the slabs which are NOT in the per-CPU caches are still in the cache, chances are our per-CPU caches are too small. regards, Rik -- IA64: a worthy successor to the i860. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 16:25 ` Linus Torvalds 2001-09-09 16:47 ` Alan Cox @ 2001-09-15 0:29 ` Albert D. Cahalan 1 sibling, 0 replies; 28+ messages in thread From: Albert D. Cahalan @ 2001-09-15 0:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Alan Cox Linus Torvalds writes: > On Sun, 9 Sep 2001, Manfred Spraul wrote: >>> it provides lifo allocations from both partial and unused slabs. >> >> lifo/fifo for unused slabs is obviously superfluous - free is free, it >> doesn't matter which free page is used first/last. > > You're full of crap. > > LIFO is obviously superior due to cache re-use. I think a bit of arch-specific code could do better. Since I happen to have the MPC7400 manual at hand, I'll use that for my example. This is the PowerPC "G4" chip. The L1 cache is 8-way, 32 bytes/block, and has 128 sets. The L1 cache block replacement algorithm will select an invalid block whenever possible. If an invalid block isn't available, then a dirty block must go out to L2. The L2 cache is 2-way, as large as 2 MB, and has FIFO replacement. Now, considering that allocator: The LIFO way: likely to suck up memory/L2 bandwidth writing out dirty data that we don't care about, plus you risk stalling the CPU to do this while in need of a free cache block. The alternative: invalidate the cache line. This takes just one instruction per cache line, and causes an address-only bus operation at worst. Completely unrelated code may run faster due to the availability of invalid cache lines. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 15:18 ` Manfred Spraul ` (2 preceding siblings ...) 2001-09-09 16:25 ` Linus Torvalds @ 2001-09-09 20:23 ` Rik van Riel 2001-09-09 20:44 ` Davide Libenzi 2001-09-10 2:28 ` Daniel Phillips 4 siblings, 1 reply; 28+ messages in thread From: Rik van Riel @ 2001-09-09 20:23 UTC (permalink / raw) To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Alan Cox, torvalds On Sun, 9 Sep 2001, Manfred Spraul wrote: > > it provides lifo allocations from both partial and unused slabs. > > lifo/fifo for unused slabs is obviously superfluous - free is > free, it doesn't matter which free page is used first/last. Mind that the L2 cache is often as much as 10 times faster than RAM, so it would be nice if we had a good chance that the slab we just allocated would be in L2 cache. regards, Rik -- IA64: a worthy successor to the i860. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 20:23 ` Rik van Riel @ 2001-09-09 20:44 ` Davide Libenzi 2001-09-09 20:45 ` Rik van Riel 0 siblings, 1 reply; 28+ messages in thread From: Davide Libenzi @ 2001-09-09 20:44 UTC (permalink / raw) To: Rik van Riel Cc: torvalds, Alan Cox, linux-kernel, Andrea Arcangeli, Manfred Spraul On 09-Sep-2001 Rik van Riel wrote: > On Sun, 9 Sep 2001, Manfred Spraul wrote: > >> > it provides lifo allocations from both partial and unused slabs. >> >> lifo/fifo for unused slabs is obviously superfluous - free is >> free, it doesn't matter which free page is used first/last. > > Mind that the L2 cache is often as much as 10 times faster > than RAM, so it would be nice if we had a good chance that > the slab we just allocated would be in L2 cache. Do You see it as a plus ? The newly allocated slab will very likely be written ( w/o regard to the old content ) and an L2 mapping will generate invalidate traffic. - Davide ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 20:44 ` Davide Libenzi @ 2001-09-09 20:45 ` Rik van Riel 2001-09-09 20:58 ` Davide Libenzi 0 siblings, 1 reply; 28+ messages in thread From: Rik van Riel @ 2001-09-09 20:45 UTC (permalink / raw) To: Davide Libenzi Cc: torvalds, Alan Cox, linux-kernel, Andrea Arcangeli, Manfred Spraul On Sun, 9 Sep 2001, Davide Libenzi wrote: > Do You see it as a plus ? > The new allocated slab will be very likely written ( w/o regard > about the old content ) and an L2 mapping will generate > invalidate traffic. If your invalidates are slower than your RAM, you should consider getting another computer. Rik -- IA64: a worthy successor to the i860. http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 20:45 ` Rik van Riel @ 2001-09-09 20:58 ` Davide Libenzi 2001-09-22 12:28 ` Ralf Baechle 0 siblings, 1 reply; 28+ messages in thread From: Davide Libenzi @ 2001-09-09 20:58 UTC (permalink / raw) To: Rik van Riel Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Alan Cox, torvalds On 09-Sep-2001 Rik van Riel wrote: > On Sun, 9 Sep 2001, Davide Libenzi wrote: > >> Do You see it as a plus ? >> The new allocated slab will be very likely written ( w/o regard >> about the old content ) and an L2 mapping will generate >> invalidate traffic. > > If your invalidates are slower than your RAM, you should > consider getting another computer. You mean a Sun, that uses a separate bus for snooping ? :) Besides not underestimating the cache coherency traffic ( which on many CPUs uses the main memory bus ), there's the fact that the old data eventually present in L2 won't be used by the new slab user. - Davide ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 20:58 ` Davide Libenzi @ 2001-09-22 12:28 ` Ralf Baechle 2001-09-22 21:03 ` Davide Libenzi 0 siblings, 1 reply; 28+ messages in thread From: Ralf Baechle @ 2001-09-22 12:28 UTC (permalink / raw) To: Davide Libenzi Cc: Rik van Riel, Manfred Spraul, Andrea Arcangeli, linux-kernel, Alan Cox, torvalds On Sun, Sep 09, 2001 at 01:58:44PM -0700, Davide Libenzi wrote: > >> Do You see it as a plus ? > >> The new allocated slab will be very likely written ( w/o regard > >> about the old content ) and an L2 mapping will generate > >> invalidate traffic. > > > > If your invalidates are slower than your RAM, you should > > consider getting another computer. > > You mean a Sun, that uses a separate bus for snooping ? :) > Besides to not under estimate the cache coherency traffic ( that on many CPUs > uses the main memory bus ) there's the fact that the old data eventually > present in L2 won't be used by the new slab user. That's actually what having a slab cache of pre-initialized elements tries to achieve. On anything that uses a MESI-like cache coherence protocol a cached dirty cache line that is written to will not cause any coherency traffic and thus be faster. Ralf -- "Embrace, Enhance, Eliminate" - it worked for the pope, it'll work for Bill. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-22 12:28 ` Ralf Baechle @ 2001-09-22 21:03 ` Davide Libenzi 2001-09-22 21:36 ` David S. Miller 0 siblings, 1 reply; 28+ messages in thread From: Davide Libenzi @ 2001-09-22 21:03 UTC (permalink / raw) To: Ralf Baechle Cc: torvalds, Alan Cox, linux-kernel, Andrea Arcangeli, Manfred Spraul, Rik van Riel On 22-Sep-2001 Ralf Baechle wrote: > On Sun, Sep 09, 2001 at 01:58:44PM -0700, Davide Libenzi wrote: > >> >> Do You see it as a plus ? >> >> The new allocated slab will be very likely written ( w/o regard >> >> about the old content ) and an L2 mapping will generate >> >> invalidate traffic. >> > >> > If your invalidates are slower than your RAM, you should >> > consider getting another computer. >> >> You mean a Sun, that uses a separate bus for snooping ? :) >> Besides to not under estimate the cache coherency traffic ( that on many CPUs >> uses the main memory bus ) there's the fact that the old data eventually >> present in L2 won't be used by the new slab user. > > That's actually what having a slab cache of pre-initialized elements tries > to achieve. > > On anything that uses a MESI-like cache coherence protocol a cached dirty > cache line that is written to will not cause any coherency traffic and > thus be faster. MESI is a bit more complicated than clean/dirty status. This is a very good state machine graph for MESI : http://odin.ee.uwa.edu.au/~morris/CA406/cache_coh.html Besides this, I don't get how a LIFO could help you. From a cache point of view, if the slab-free code runs on processor A and the slab-alloc code will run on a processor B, and these two ops are executed very close in time ( due to LIFO ), there's a good probability of modified->shared migration that will result in pushback ops. A FIFO will result in more time between free and alloc, with a good probability that the interlock will be relaxed. - Davide ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-22 21:03 ` Davide Libenzi @ 2001-09-22 21:36 ` David S. Miller 0 siblings, 0 replies; 28+ messages in thread From: David S. Miller @ 2001-09-22 21:36 UTC (permalink / raw) To: davidel; +Cc: ralf, torvalds, alan, linux-kernel, andrea, manfred, riel From: Davide Libenzi <davidel@xmailserver.org> Date: Sat, 22 Sep 2001 14:03:02 -0700 (PDT) Besides this, I don't get how a LIFO could help you. Actually, there is a school of thought which says that if you make the time between the free and re-alloc of a piece of memory as long as possible, you increase the likelihood that any dirty cache lines of that memory can be sent back to memory "quietly" during natural L2 cache line replacement. I don't necessarily subscribe to these ideas, but I do see the potential benefits. For one thing, it does have the potential to lead to more repeatable timings, at least in theory. Later, David S. Miller davem@redhat.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Purpose of the mm/slab.c changes 2001-09-09 15:18 ` Manfred Spraul ` (3 preceding siblings ...) 2001-09-09 20:23 ` Rik van Riel @ 2001-09-10 2:28 ` Daniel Phillips 4 siblings, 0 replies; 28+ messages in thread From: Daniel Phillips @ 2001-09-10 2:28 UTC (permalink / raw) To: Manfred Spraul, Andrea Arcangeli; +Cc: linux-kernel, Alan Cox, torvalds On September 9, 2001 05:18 pm, Manfred Spraul wrote: > lifo/fifo for unused slabs is obviously superfluous - free is free, it > doesn't matter which free page is used first/last. Welll... there is some difference with respect to the buddy allocator and fragmentation. <--- nit -- Daniel ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2001-09-22 21:36 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-09-09 11:05 Purpose of the mm/slab.c changes Manfred Spraul 2001-09-09 14:26 ` Andrea Arcangeli 2001-09-09 15:18 ` Manfred Spraul 2001-09-09 15:33 ` Andrea Arcangeli 2001-09-11 18:41 ` Manfred Spraul 2001-09-11 19:36 ` Andrea Arcangeli 2001-09-11 20:43 ` Manfred Spraul 2001-09-12 14:18 ` Andrea Arcangeli 2001-09-09 16:12 ` Alan Cox 2001-09-09 16:25 ` Linus Torvalds 2001-09-09 16:47 ` Alan Cox 2001-09-09 16:55 ` Manfred Spraul 2001-09-09 17:01 ` Linus Torvalds 2001-09-09 17:22 ` Manfred Spraul 2001-09-09 17:27 ` arjan 2001-09-09 17:35 ` Andrea Arcangeli 2001-09-09 17:30 ` Andrea Arcangeli 2001-09-09 17:59 ` Fwd: 2.4.10-pre6 ramdisk driver broken? won't compile Stephan Gutschke 2001-09-09 20:26 ` Purpose of the mm/slab.c changes Rik van Riel 2001-09-15 0:29 ` Albert D. Cahalan 2001-09-09 20:23 ` Rik van Riel 2001-09-09 20:44 ` Davide Libenzi 2001-09-09 20:45 ` Rik van Riel 2001-09-09 20:58 ` Davide Libenzi 2001-09-22 12:28 ` Ralf Baechle 2001-09-22 21:03 ` Davide Libenzi 2001-09-22 21:36 ` David S. Miller 2001-09-10 2:28 ` Daniel Phillips