From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51])
	by kanga.kvack.org (Postfix) with SMTP id 3138E6B0024
	for ; Mon, 16 May 2011 16:26:24 -0400 (EDT)
Message-Id: <20110516202605.274023469@linux.com>
Date: Mon, 16 May 2011 15:26:05 -0500
From: Christoph Lameter
Subject: [slubllv5 00/25] SLUB: Lockless freelists for objects V5
Sender: owner-linux-mm@kvack.org
List-ID:
To: Pekka Enberg
Cc: David Rientjes, Eric Dumazet, "H. Peter Anvin", linux-mm@kvack.org, Thomas Gleixner

V4->V5
- More cleanup. Remove gotos from __slab_alloc and __slab_free
- Some structural changes to alloc and free to clean up the code
- Statistics modifications folded in other patches.
- Fixes to patches already in Pekka's slabnext.
- Include missing upstream fixes

V3->V4
- Diffed against Pekka's slab/next tree.
- Numerous cleanups, in particular as a result of the removal of the #ifdef CMPXCHG_LOCAL stuff.
- Smaller cleanups wherever I saw something.

V2->V3
- Provide statistics
- Fallback logic to page lock if cmpxchg16b is not available.
- Better counter support
- More cleanups and clarifications

Well here is another result of my obsession with SLAB allocators. There must be some way to get an allocator done that is faster without queueing and I hope that we are now there (maybe only almost...). Any help with cleaning up the rough edges would be appreciated.

This patchset implements wider lockless operations in slub affecting most of the slowpaths. In particular the patch decreases the overhead in the performance critical section of __slab_free.

One test that I ran was "hackbench 200 process 200" on 2.6.29-rc3 under KVM

Run	SLAB	SLUB	SLUB LL
1st	35.2	35.9	31.9
2nd	34.6	30.8	27.9
3rd	33.8	29.9	28.8

Note that the SLUB version in 2.6.29-rc1 already has an optimized allocation and free path using this_cpu_cmpxchg_double().
SLUB LL takes it to new heights by also using cmpxchg_double() in the slowpaths (especially in the kfree() case where we frequently cannot use the fastpath because there is no queue).

The patch uses a cmpxchg_double (also introduced here) to do an atomic change on the state of a slab page that includes the following pieces of information:

1. Freelist pointer
2. Number of objects in use
3. Frozen state of a slab

Disabling of interrupts (which is a significant latency in the allocator paths) is avoided in the __slab_free case.

There are some concerns with this patch. The use of cmpxchg_double on fields of the page struct requires alignment of the fields to double word boundaries. That can only be accomplished by adding some padding to struct page, which blows it up to 64 bytes (on x86_64). Comments in the source describe these things in more detail.

The cmpxchg_double() operation introduced here could also be used to update other doublewords in the page struct in a lockless fashion. One can envision page state changes that involve flags and mappings, or maybe doing list operations locklessly (but with the current scheme we would need to update two other words elsewhere at the same time too, so another scheme would be needed).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id 555436B0028 for ; Mon, 16 May 2011 16:26:25 -0400 (EDT) Message-Id: <20110516202622.862544137@linux.com> Date: Mon, 16 May 2011 15:26:08 -0500 From: Christoph Lameter Subject: [slubllv5 03/25] slub: Make CONFIG_PAGE_ALLOC work with new fastpath References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=fixup Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner The fastpath can do a speculative access to a page that CONFIG_DEBUG_PAGEALLOC may have marked as invalid in order to retrieve the pointer to the next free object. Use probe_kernel_read in that case in order not to cause a page fault. Signed-off-by: Christoph Lameter --- mm/slub.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-10 14:31:28.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-10 14:31:35.000000000 -0500 @@ -261,6 +261,18 @@ static inline void *get_freepointer(stru return *(void **)(object + s->offset); } +static inline void *get_freepointer_safe(struct kmem_cache *s, void *object) +{ + void *p; + +#ifdef CONFIG_DEBUG_PAGEALLOC + probe_kernel_read(&p, (void **)(object + s->offset), sizeof(p)); +#else + p = get_freepointer(s, object); +#endif + return p; +} + static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp) { *(void **)(object + s->offset) = fp; @@ -1933,7 +1945,7 @@ redo: if (unlikely(!irqsafe_cpu_cmpxchg_double( s->cpu_slab->freelist, s->cpu_slab->tid, object, tid, - get_freepointer(s, object), next_tid(tid)))) { + 
get_freepointer_safe(s, object), next_tid(tid)))) { note_cmpxchg_failure("slab_alloc", s, tid); goto redo; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with SMTP id 891BA6B0025 for ; Mon, 16 May 2011 16:26:26 -0400 (EDT) Message-Id: <20110516202621.693228967@linux.com> Date: Mon, 16 May 2011 15:26:06 -0500 From: Christoph Lameter Subject: [slubllv5 01/25] slub: Avoid warning for !CONFIG_SLUB_DEBUG References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=fixup33 Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Move the #ifdef so that get_map is only defined if CONFIG_SLUB_DEBUG is defined. Signed-off-by: Christoph Lameter --- mm/slub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-12 11:38:42.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-12 11:39:40.000000000 -0500 @@ -326,6 +326,7 @@ static inline int oo_objects(struct kmem return x.x & OO_MASK; } +#ifdef CONFIG_SLUB_DEBUG /* * Determine a map of object in use on a page. * @@ -341,7 +342,6 @@ static void get_map(struct kmem_cache *s set_bit(slab_index(p, s, addr), map); } -#ifdef CONFIG_SLUB_DEBUG /* * Debug settings: */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id C2B178D003B for ; Mon, 16 May 2011 16:26:26 -0400 (EDT) Message-Id: <20110516202623.440437817@linux.com> Date: Mon, 16 May 2011 15:26:09 -0500 From: Christoph Lameter Subject: [slubllv5 04/25] slub: Push irq disable into allocate_slab() References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=push_irq_disable Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Do the irq handling in allocate_slab() instead of __slab_alloc(). __slab_alloc() is already cluttered and allocate_slab() is already fiddling around with gfp flags. Signed-off-by: Christoph Lameter --- mm/slub.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 11:40:38.031463496 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 11:42:31.921463363 -0500 @@ -1187,6 +1187,11 @@ static struct page *allocate_slab(struct struct kmem_cache_order_objects oo = s->oo; gfp_t alloc_gfp; + flags &= gfp_allowed_mask; + + if (flags & __GFP_WAIT) + local_irq_enable(); + flags |= s->allocflags; /* @@ -1203,12 +1208,15 @@ static struct page *allocate_slab(struct * Try a lower order alloc if possible */ page = alloc_slab_page(flags, node, oo); - if (!page) - return NULL; - stat(s, ORDER_FALLBACK); } + if (flags & __GFP_WAIT) + local_irq_disable(); + + if (!page) + return NULL; + if (kmemcheck_enabled && !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) { int pages = 1 << oo_order(oo); @@ -1850,15 +1858,8 @@ new_slab: goto load_freelist; } - gfpflags &= gfp_allowed_mask; - 
if (gfpflags & __GFP_WAIT) - local_irq_enable(); - page = new_slab(s, gfpflags, node); - if (gfpflags & __GFP_WAIT) - local_irq_disable(); - if (page) { c = __this_cpu_ptr(s->cpu_slab); stat(s, ALLOC_SLAB); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail6.bemta7.messagelabs.com (mail6.bemta7.messagelabs.com [216.82.255.55]) by kanga.kvack.org (Postfix) with ESMTP id 06EB290010C for ; Mon, 16 May 2011 16:26:26 -0400 (EDT) Message-Id: <20110516202622.292494949@linux.com> Date: Mon, 16 May 2011 15:26:07 -0500 From: Christoph Lameter Subject: [slubllv5 02/25] slub: Fix control flow in slab_alloc References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=fixup44 Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. 
Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Signed-off-by: Christoph Lameter --- mm/slub.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 13:00:39.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 13:01:40.171457827 -0500 @@ -1833,7 +1833,6 @@ new_slab: page = get_partial(s, gfpflags, node); if (page) { stat(s, ALLOC_FROM_PARTIAL); -load_from_page: c->node = page_to_nid(page); c->page = page; goto load_freelist; @@ -1856,8 +1855,9 @@ load_from_page: slab_lock(page); __SetPageSlubFrozen(page); - - goto load_from_page; + c->node = page_to_nid(page); + c->page = page; + goto load_freelist; } if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit()) slab_out_of_memory(s, gfpflags, node);
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id DBEBB6B0028 for ; Mon, 16 May 2011 16:26:27 -0400 (EDT) Message-Id: <20110516202625.197639928@linux.com> Date: Mon, 16 May 2011 15:26:12 -0500 From: Christoph Lameter Subject: [slubllv5 07/25] x86: Add support for cmpxchg_double References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=cmpxchg_double_x86 Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner A simple implementation that only supports the word size and does not have a fallback mode (would require a spinlock). Add 32 and 64 bit support for cmpxchg_double. 
cmpxchg double uses the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare and swap 2 machine words. This allows lockless algorithms to move more context information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal the support of that feature during kernel builds.

Signed-off-by: Christoph Lameter

---
 arch/x86/Kconfig.cpu              |  3 ++
 arch/x86/include/asm/cmpxchg_32.h | 46 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h | 45 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |  1 +
 4 files changed, 95 insertions(+)

Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-05-16 11:40:36.421463498 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-05-16 11:46:34.781463079 -0500
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define cmpxchg16b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
+		       : "=d"(__junk), "=a"(__ret)		\
+		       : "S"(ptr), "b"(__new1), "c"(__new2),	\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg16b (%%rsi)\n\t\tsetz %1\n\t"	\
+		       : "=d"(__junk), "=a"(__ret)		\
+		       : "S"((ptr)), "b"(__new1), "c"(__new2),	\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)			\
+({								\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);			\
+	VM_BUG_ON((unsigned long)(ptr) % 16);			\
+	cmpxchg16b((ptr), (o1), (o2), (n1),
(n2));	\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)		\
+({								\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);			\
+	VM_BUG_ON((unsigned long)(ptr) % 16);			\
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));	\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-05-16 11:40:36.431463498 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-05-16 11:46:34.781463079 -0500
@@ -280,4 +280,50 @@ static inline unsigned long cmpxchg_386(
 
 #endif
 
+#define cmpxchg8b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg8b (%%esi); setz %1"\
+		       : "=d"(__dummy), "=a" (__ret)		\
+		       : "S" ((ptr)), "a" (__old1), "d"(__old2),	\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg8b (%%esi); setz %1"		\
+		       : "=d"(__dummy), "=a"(__ret)		\
+		       : "S" ((ptr)), "a" (__old1), "d"(__old2),	\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)			\
+({								\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);			\
+	VM_BUG_ON((unsigned long)(ptr) % 8);			\
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2));		\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)		\
+({								\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);			\
+	VM_BUG_ON((unsigned long)(ptr) % 8);			\
+	cmpxchg8b_local((ptr), (o1), (o2), (n1), (n2));		\
+})
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
Index: linux-2.6/arch/x86/Kconfig.cpu
=================================================================== --- linux-2.6.orig/arch/x86/Kconfig.cpu 2011-05-16 11:40:36.401463498 -0500 +++ linux-2.6/arch/x86/Kconfig.cpu 2011-05-16 11:46:34.781463079 -0500 @@ -308,6 +308,9 @@ config X86_CMPXCHG config CMPXCHG_LOCAL def_bool X86_64 || (X86_32 && !M386) +config CMPXCHG_DOUBLE + def_bool X86_64 || (X86_32 && !M386) + config X86_L1_CACHE_SHIFT int default "7" if MPENTIUM4 || MPSC Index: linux-2.6/arch/x86/include/asm/cpufeature.h =================================================================== --- linux-2.6.orig/arch/x86/include/asm/cpufeature.h 2011-05-16 11:40:36.411463498 -0500 +++ linux-2.6/arch/x86/include/asm/cpufeature.h 2011-05-16 11:46:34.801463079 -0500 @@ -286,6 +286,7 @@ extern const char * const x86_power_flag #define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR) #define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ) #define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE) +#define cpu_has_cx16 boot_cpu_has(X86_FEATURE_CX16) #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64) # define cpu_has_invlpg 1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id AEEEF6B002B for ; Mon, 16 May 2011 16:26:26 -0400 (EDT) Message-Id: <20110516202624.013950205@linux.com> Date: Mon, 16 May 2011 15:26:10 -0500 From: Christoph Lameter Subject: [slubllv5 05/25] slub: Do not use frozen page flag but a bit in the page counters References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=frozen_field Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Do not use a page flag for the frozen bit. It needs to be part of the state that is handled with cmpxchg_double(). So use a bit in the counter struct in the page struct for that purpose. Also, all pages start out as frozen pages, so set the bit when the page is allocated. Signed-off-by: Christoph Lameter --- include/linux/mm_types.h | 5 +++-- include/linux/page-flags.h | 2 -- mm/slub.c | 12 ++++++------ 3 files changed, 9 insertions(+), 10 deletions(-) Index: linux-2.6/include/linux/mm_types.h =================================================================== --- linux-2.6.orig/include/linux/mm_types.h 2011-05-12 11:11:00.000000000 -0500 +++ linux-2.6/include/linux/mm_types.h 2011-05-12 15:36:09.000000000 -0500 @@ -41,8 +41,9 @@ struct page { * & limit reverse map searches. 
*/ struct { /* SLUB */ - u16 inuse; - u16 objects; + unsigned inuse:16; + unsigned objects:15; + unsigned frozen:1; }; }; union { Index: linux-2.6/include/linux/page-flags.h =================================================================== --- linux-2.6.orig/include/linux/page-flags.h 2011-05-12 11:11:00.000000000 -0500 +++ linux-2.6/include/linux/page-flags.h 2011-05-12 15:36:09.000000000 -0500 @@ -212,8 +212,6 @@ PAGEFLAG(SwapBacked, swapbacked) __CLEAR __PAGEFLAG(SlobFree, slob_free) -__PAGEFLAG(SlubFrozen, slub_frozen) - /* * Private page markings that may be used by the filesystem that owns the page * for its own purposes. Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-12 15:35:58.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-12 15:36:29.000000000 -0500 @@ -166,7 +166,7 @@ static inline int kmem_cache_debug(struc #define OO_SHIFT 16 #define OO_MASK ((1 << OO_SHIFT) - 1) -#define MAX_OBJS_PER_PAGE 65535 /* since page.objects is u16 */ +#define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */ /* Internal SLUB flags */ #define __OBJECT_POISON 0x80000000UL /* Poison object */ @@ -1025,7 +1025,7 @@ static noinline int free_debug_processin } /* Special debug activities for freeing objects */ - if (!PageSlubFrozen(page) && !page->freelist) + if (!page->frozen && !page->freelist) remove_full(s, page); if (s->flags & SLAB_STORE_USER) set_track(s, object, TRACK_FREE, addr); @@ -1414,7 +1414,7 @@ static inline int lock_and_freeze_slab(s { if (slab_trylock(page)) { __remove_partial(n, page); - __SetPageSlubFrozen(page); + page->frozen = 1; return 1; } return 0; @@ -1528,7 +1528,7 @@ static void unfreeze_slab(struct kmem_ca { struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - __ClearPageSlubFrozen(page); + page->frozen = 0; if (page->inuse) { if (page->freelist) { @@ -1866,7 +1866,7 @@ new_slab: flush_slab(s, c); slab_lock(page); - __SetPageSlubFrozen(page); + 
page->frozen = 1; c->node = page_to_nid(page); c->page = page; goto load_freelist; @@ -2043,7 +2043,7 @@ static void __slab_free(struct kmem_cach page->freelist = object; page->inuse--; - if (unlikely(PageSlubFrozen(page))) { + if (unlikely(page->frozen)) { stat(s, FREE_FROZEN); goto out_unlock; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id 07B636B0029 for ; Mon, 16 May 2011 16:26:27 -0400 (EDT) Message-Id: <20110516202624.616923279@linux.com> Date: Mon, 16 May 2011 15:26:11 -0500 From: Christoph Lameter Subject: [slubllv5 06/25] slub: Move page->frozen handling near where the page->freelist handling occurs References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=frozen_move Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner This is necessary because the frozen bit has to be handled in the same cmpxchg_double with the freelist and the counters. 
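To illustrate why the frozen bit has to sit in the same compared unit as the freelist and the counters, here is a minimal userspace sketch. It is not the kernel code: a single 64-bit word stands in for the freelist/counters doubleword, the freelist head is an object index instead of a pointer, and C11 atomic_compare_exchange_weak() stands in for cmpxchg_double(). The names toy_slab, toy_state() and toy_free() and the bit layout are assumptions made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed toy layout (not the kernel's): one 64-bit word stands in for the
 * freelist/counters doubleword.
 *   bits  0..31  freelist head, as an object index (0 means empty)
 *   bits 32..47  inuse
 *   bits 48..62  objects
 *   bit  63      frozen
 */
#define TOY_INUSE_SHIFT   32
#define TOY_OBJECTS_SHIFT 48
#define TOY_FROZEN        (UINT64_C(1) << 63)

struct toy_slab {
	_Atomic uint64_t state;
	uint32_t next[8];	/* next[i] links object i to the next free object */
};

static uint64_t toy_state(uint32_t head, unsigned inuse, unsigned objects,
			  bool frozen)
{
	return (uint64_t)head |
	       ((uint64_t)(inuse & 0xffffu) << TOY_INUSE_SHIFT) |
	       ((uint64_t)(objects & 0x7fffu) << TOY_OBJECTS_SHIFT) |
	       (frozen ? TOY_FROZEN : 0);
}

static uint32_t toy_head(uint64_t s)
{
	return (uint32_t)s;
}

static unsigned toy_inuse(uint64_t s)
{
	return (unsigned)(s >> TOY_INUSE_SHIFT) & 0xffffu;
}

static unsigned toy_objects(uint64_t s)
{
	return (unsigned)(s >> TOY_OBJECTS_SHIFT) & 0x7fffu;
}

static bool toy_frozen(uint64_t s)
{
	return (s & TOY_FROZEN) != 0;
}

/*
 * Free one object: link it in front of the freelist and decrement inuse in
 * a single compare-and-swap. Returns the frozen bit that was current when
 * the swap succeeded. The caller must own `object`.
 */
static bool toy_free(struct toy_slab *slab, uint32_t object)
{
	uint64_t old = atomic_load(&slab->state);
	uint64_t new;

	do {
		assert(toy_inuse(old) > 0);
		slab->next[object] = toy_head(old);
		new = toy_state(object, toy_inuse(old) - 1,
				toy_objects(old), toy_frozen(old));
	} while (!atomic_compare_exchange_weak(&slab->state, &old, new));

	return toy_frozen(old);
}
```

Because frozen travels in the compared word, a concurrent change of the frozen state makes the swap fail and the loop retries with the fresh value, so the caller always learns the frozen state that was current when its free took effect.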
Signed-off-by: Christoph Lameter --- mm/slub.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-12 15:36:29.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-12 15:37:30.000000000 -0500 @@ -1276,6 +1276,7 @@ static struct page *new_slab(struct kmem page->freelist = start; page->inuse = 0; + page->frozen = 1; out: return page; } @@ -1414,7 +1415,6 @@ static inline int lock_and_freeze_slab(s { if (slab_trylock(page)) { __remove_partial(n, page); - page->frozen = 1; return 1; } return 0; @@ -1528,7 +1528,6 @@ static void unfreeze_slab(struct kmem_ca { struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - page->frozen = 0; if (page->inuse) { if (page->freelist) { @@ -1661,6 +1660,7 @@ static void deactivate_slab(struct kmem_ } c->page = NULL; c->tid = next_tid(c->tid); + page->frozen = 0; unfreeze_slab(s, page, tail); } @@ -1821,6 +1821,8 @@ static void *__slab_alloc(struct kmem_ca stat(s, ALLOC_REFILL); load_freelist: + VM_BUG_ON(!page->frozen); + object = page->freelist; if (unlikely(!object)) goto another_slab; @@ -1845,6 +1847,7 @@ new_slab: page = get_partial(s, gfpflags, node); if (page) { stat(s, ALLOC_FROM_PARTIAL); + page->frozen = 1; c->node = page_to_nid(page); c->page = page; goto load_freelist; @@ -2370,6 +2373,7 @@ static void early_kmem_cache_node_alloc( BUG_ON(!n); page->freelist = get_freepointer(kmem_cache_node, n); page->inuse++; + page->frozen = 0; kmem_cache_node->node[node] = n; #ifdef CONFIG_SLUB_DEBUG init_object(kmem_cache_node, n, SLUB_RED_ACTIVE); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 6E87490010D for ; Mon, 16 May 2011 16:26:28 -0400 (EDT) Message-Id: <20110516202625.792645168@linux.com> Date: Mon, 16 May 2011 15:26:13 -0500 From: Christoph Lameter Subject: [slubllv5 08/25] mm: Rearrange struct page References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=resort_struct_page Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner We need to be able to use cmpxchg_double on the freelist and object count field in struct page. Rearrange the fields in struct page according to doubleword entities so that the freelist pointer comes before the counters. Do the rearranging with a future in mind where we use more doubleword atomics to avoid locking of updates to flags/mapping or lru pointers. Create another union to allow access to counters in struct page as a single unsigned long value. The doublewords must be properly aligned for cmpxchg_double to work. Sadly this increases the size of page struct by one word on some architectures. But as a result, page structs are now cacheline aligned on x86_64. Signed-off-by: Christoph Lameter --- include/linux/mm_types.h | 89 +++++++++++++++++++++++++++++++---------------- 1 file changed, 60 insertions(+), 29 deletions(-) Index: linux-2.6/include/linux/mm_types.h =================================================================== --- linux-2.6.orig/include/linux/mm_types.h 2011-05-06 12:03:46.000000000 -0500 +++ linux-2.6/include/linux/mm_types.h 2011-05-06 12:50:40.000000000 -0500 @@ -30,52 +30,74 @@ struct address_space; * moment. 
Note that we have no way to track which tasks are using * a page, though if it is a pagecache page, rmap structures can tell us * who is mapping it. + * + * The objects in struct page are organized in double word blocks in + * order to allows us to use atomic double word operations on portions + * of struct page. That is currently only used by slub but the arrangement + * allows the use of atomic double word operations on the flags/mapping + * and lru list pointers also. */ struct page { + /* First double word block */ unsigned long flags; /* Atomic flags, some possibly * updated asynchronously */ - atomic_t _count; /* Usage count, see below. */ + struct address_space *mapping; /* If low bit clear, points to + * inode address_space, or NULL. + * If page mapped as anonymous + * memory, low bit is set, and + * it points to anon_vma object: + * see PAGE_MAPPING_ANON below. + */ + /* Second double word */ union { - atomic_t _mapcount; /* Count of ptes mapped in mms, - * to show when page is mapped - * & limit reverse map searches. + struct { + pgoff_t index; /* Our offset within mapping. */ + atomic_t _mapcount; /* Count of ptes mapped in mms, + * to show when page is mapped + * & limit reverse map searches. + */ + atomic_t _count; /* Usage count, see below. */ + }; + + struct { /* SLUB cmpxchg_double area */ + void *freelist; + union { + unsigned long counters; + struct { + unsigned inuse:16; + unsigned objects:15; + unsigned frozen:1; + /* + * Kernel may make use of this field even when slub + * uses the rest of the double word! */ - struct { /* SLUB */ - unsigned inuse:16; - unsigned objects:15; - unsigned frozen:1; + atomic_t _count; + }; + }; }; }; + + /* Third double word block */ + struct list_head lru; /* Pageout list, eg. active_list + * protected by zone->lru_lock ! 
+ */ + + /* Remainder is not double word aligned */ union { - struct { - unsigned long private; /* Mapping-private opaque data: + unsigned long private; /* Mapping-private opaque data: * usually used for buffer_heads * if PagePrivate set; used for * swp_entry_t if PageSwapCache; * indicates order in the buddy * system if PG_buddy is set. */ - struct address_space *mapping; /* If low bit clear, points to - * inode address_space, or NULL. - * If page mapped as anonymous - * memory, low bit is set, and - * it points to anon_vma object: - * see PAGE_MAPPING_ANON below. - */ - }; #if USE_SPLIT_PTLOCKS - spinlock_t ptl; + spinlock_t ptl; #endif - struct kmem_cache *slab; /* SLUB: Pointer to slab */ - struct page *first_page; /* Compound tail pages */ + struct kmem_cache *slab; /* SLUB: Pointer to slab */ + struct page *first_page; /* Compound tail pages */ }; - union { - pgoff_t index; /* Our offset within mapping. */ - void *freelist; /* SLUB: freelist req. slab lock */ - }; - struct list_head lru; /* Pageout list, eg. active_list - * protected by zone->lru_lock ! - */ + /* * On machines where all RAM is mapped into kernel address space, * we can simply calculate the virtual address. On machines with @@ -101,7 +123,16 @@ struct page { */ void *shadow; #endif -}; +} +/* + * If another subsystem starts using the double word pairing for atomic + * operations on struct page then it must change the #if to ensure + * proper alignment of the page struct. + */ +#if defined(CONFIG_SLUB) && defined(CONFIG_CMPXCHG_LOCAL) + __attribute__((__aligned__(2*sizeof(unsigned long)))) +#endif +; /* * A region containing a mapping of a non-memory backed file under NOMMU -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
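The alignment requirement behind the struct page rearrangement above can be checked at compile time in a small userspace sketch. The struct below is a hypothetical, heavily reduced stand-in for struct page (toy_page, private_data and doubleword_aligned() are names invented here); only the aligned attribute is taken from the patch. It assumes a GCC or Clang compatible compiler and C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOUBLE_WORD (2 * sizeof(unsigned long))

/* Hypothetical, heavily reduced stand-in for struct page. */
struct toy_page {
	/* First double word */
	unsigned long flags;
	void *mapping;
	/* Second double word: the cmpxchg_double target */
	void *freelist;
	unsigned long counters;
	/* Third double word */
	void *lru_next;
	void *lru_prev;
	/* Remainder, not double word aligned */
	unsigned long private_data;
} __attribute__((__aligned__(2 * sizeof(unsigned long))));

/* The pair handed to a double word compare-and-swap must start a double word. */
static_assert(offsetof(struct toy_page, freelist) % DOUBLE_WORD == 0,
	      "freelist must start a double word");
static_assert(offsetof(struct toy_page, counters) ==
	      offsetof(struct toy_page, freelist) + sizeof(void *),
	      "counters must directly follow freelist");
/* The struct size must be a multiple too, or array elements lose alignment. */
static_assert(sizeof(struct toy_page) % DOUBLE_WORD == 0,
	      "array elements must keep the alignment");

/* Runtime check matching the VM_BUG_ON() in cmpxchg_double(). */
static int doubleword_aligned(const void *p)
{
	return (uintptr_t)p % DOUBLE_WORD == 0;
}
```

On a typical LP64 target the seven words of this toy struct occupy 56 bytes and the attribute pads them to 64, which is the same kind of one-word growth the patch description mentions for x86_64.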
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id 0A7528D003B for ; Mon, 16 May 2011 16:26:28 -0400 (EDT) Message-Id: <20110516202626.373428657@linux.com> Date: Mon, 16 May 2011 15:26:14 -0500 From: Christoph Lameter Subject: [slubllv5 09/25] slub: Add cmpxchg_double_slab() References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=cmpxchg_double_slab Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Add a function that operates on the second doubleword in the page struct and manipulates the object counters, the freelist and the frozen attribute. Signed-off-by: Christoph Lameter --- include/linux/slub_def.h | 1 + mm/slub.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 11:46:33.591463082 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 11:46:51.181463060 -0500 @@ -131,6 +131,9 @@ static inline int kmem_cache_debug(struc /* Enable to test recovery from slab corruption on boot */ #undef SLUB_RESILIENCY_TEST +/* Enable to log cmpxchg failures */ +#undef SLUB_DEBUG_CMPXCHG + /* * Mininum number of partial slabs. These will be left on the partial * lists even if they are empty. kmem_cache_shrink may reclaim them. 
@@ -170,6 +173,7 @@ static inline int kmem_cache_debug(struc /* Internal SLUB flags */ #define __OBJECT_POISON 0x80000000UL /* Poison object */ +#define __CMPXCHG_DOUBLE 0x40000000UL /* Use cmpxchg_double */ static int kmem_size = sizeof(struct kmem_cache); @@ -354,6 +358,37 @@ static void get_map(struct kmem_cache *s set_bit(slab_index(p, s, addr), map); } +static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page, + void *freelist_old, unsigned long counters_old, + void *freelist_new, unsigned long counters_new, + const char *n) +{ +#ifdef CONFIG_CMPXCHG_DOUBLE + if (s->flags & __CMPXCHG_DOUBLE) { + if (cmpxchg_double(&page->freelist, + freelist_old, counters_old, + freelist_new, counters_new)) + return 1; + } else +#endif + { + if (page->freelist == freelist_old && page->counters == counters_old) { + page->freelist = freelist_new; + page->counters = counters_new; + return 1; + } + } + + cpu_relax(); + stat(s, CMPXCHG_DOUBLE_FAIL); + +#ifdef SLUB_DEBUG_CMPXCHG + printk(KERN_INFO "%s %s: cmpxchg double redo ", n, s->name); +#endif + + return 0; +} + /* * Debug settings: */ @@ -2596,6 +2631,12 @@ static int kmem_cache_open(struct kmem_c } } +#ifdef CONFIG_CMPXCHG_DOUBLE + if (system_has_cmpxchg_double() && (s->flags & SLAB_DEBUG_FLAGS) == 0) + /* Enable fast mode */ + s->flags |= __CMPXCHG_DOUBLE; +#endif + /* * The larger the object size is, the more pages we want on the partial * list to avoid pounding the page allocator excessively. 
@@ -4493,6 +4534,8 @@ STAT_ATTR(DEACTIVATE_TO_HEAD, deactivate STAT_ATTR(DEACTIVATE_TO_TAIL, deactivate_to_tail); STAT_ATTR(DEACTIVATE_REMOTE_FREES, deactivate_remote_frees); STAT_ATTR(ORDER_FALLBACK, order_fallback); +STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail); +STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail); #endif static struct attribute *slab_attrs[] = { @@ -4550,6 +4593,8 @@ static struct attribute *slab_attrs[] = &deactivate_to_tail_attr.attr, &deactivate_remote_frees_attr.attr, &order_fallback_attr.attr, + &cmpxchg_double_fail_attr.attr, + &cmpxchg_double_cpu_fail_attr.attr, #endif #ifdef CONFIG_FAILSLAB &failslab_attr.attr, Index: linux-2.6/include/linux/slub_def.h =================================================================== --- linux-2.6.orig/include/linux/slub_def.h 2011-05-16 11:40:35.371463499 -0500 +++ linux-2.6/include/linux/slub_def.h 2011-05-16 11:46:51.181463060 -0500 @@ -33,6 +33,7 @@ enum stat_item { DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */ ORDER_FALLBACK, /* Number of times fallback was necessary */ CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */ + CMPXCHG_DOUBLE_FAIL, /* Number of times that cmpxchg double did not match */ NR_SLUB_STAT_ITEMS }; struct kmem_cache_cpu { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
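The fallback branch of cmpxchg_double_slab() in this patch can be read as a plain compare-both-then-store-both helper that callers wrap in a retry loop. The following is a minimal userspace sketch of that idea only, not the kernel code: `struct slab_state`, `double_cmpxchg_fallback()` and `take_whole_freelist()` are hypothetical names introduced for illustration, and the helper is not atomic on its own, so it assumes the caller serializes access (as the slab lock does for the non-cmpxchg path).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two adjacent words in struct page. */
struct slab_state {
	void *freelist;		/* first free object, or NULL */
	unsigned long counters;	/* packed object counters and frozen bit */
};

/*
 * Compare both words against the expected values and, only if both
 * match, install both new values. This is NOT atomic by itself: it
 * assumes the caller already serializes access, for example by
 * holding a lock.
 */
static bool double_cmpxchg_fallback(struct slab_state *st,
				    void *freelist_old, unsigned long counters_old,
				    void *freelist_new, unsigned long counters_new)
{
	if (st->freelist == freelist_old && st->counters == counters_old) {
		st->freelist = freelist_new;
		st->counters = counters_new;
		return true;
	}
	return false;
}

/*
 * Typical caller shape: snapshot the state, compute the desired state,
 * and retry until the update is accepted. Here the whole freelist is
 * taken and the counters are left unchanged.
 */
static void take_whole_freelist(struct slab_state *st, void **out)
{
	void *old_list;
	unsigned long old_counters;

	assert(st != NULL && out != NULL);
	do {
		old_list = st->freelist;
		old_counters = st->counters;
	} while (!double_cmpxchg_fallback(st, old_list, old_counters,
					  NULL, old_counters));
	*out = old_list;
}
```

With a real cmpxchg_double the compare and the two stores happen as one atomic step, which is why the patch only sets __CMPXCHG_DOUBLE when the hardware supports it and no debug flags are active.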
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with SMTP id 2D9668D004A for ; Mon, 16 May 2011 16:26:29 -0400 (EDT) Message-Id: <20110516202626.965065592@linux.com> Date: Mon, 16 May 2011 15:26:15 -0500 From: Christoph Lameter Subject: [slubllv5 10/25] slub: explicit list_lock taking References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=unlock_list_ops Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner The allocator fastpath rework does change the usage of the list_lock. Remove the list_lock processing from the functions that hide them from the critical sections and move them into those critical sections. This in turn simplifies the support functions (no __ variant needed anymore) and simplifies the lock handling on bootstrap. Signed-off-by: Christoph Lameter --- mm/slub.c | 74 ++++++++++++++++++++++++++++++-------------------------------- 1 file changed, 36 insertions(+), 38 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 11:46:51.181463060 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 11:46:58.311463052 -0500 @@ -917,25 +917,21 @@ static inline void slab_free_hook(struct /* * Tracking of fully allocated slabs for debugging purposes. 
*/ -static void add_full(struct kmem_cache_node *n, struct page *page) +static void add_full(struct kmem_cache *s, + struct kmem_cache_node *n, struct page *page) { - spin_lock(&n->list_lock); + if (!(s->flags & SLAB_STORE_USER)) + return; + list_add(&page->lru, &n->full); - spin_unlock(&n->list_lock); } static void remove_full(struct kmem_cache *s, struct page *page) { - struct kmem_cache_node *n; - if (!(s->flags & SLAB_STORE_USER)) return; - n = get_node(s, page_to_nid(page)); - - spin_lock(&n->list_lock); list_del(&page->lru); - spin_unlock(&n->list_lock); } /* Tracking of the number of slabs for debugging purposes */ @@ -1060,8 +1056,13 @@ static noinline int free_debug_processin } /* Special debug activities for freeing objects */ - if (!page->frozen && !page->freelist) + if (!page->frozen && !page->freelist) { + struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + + spin_lock(&n->list_lock); remove_full(s, page); + spin_unlock(&n->list_lock); + } if (s->flags & SLAB_STORE_USER) set_track(s, object, TRACK_FREE, addr); trace(s, page, object, 0); @@ -1420,36 +1421,26 @@ static __always_inline int slab_trylock( /* * Management of partially allocated slabs */ -static void add_partial(struct kmem_cache_node *n, +static inline void add_partial(struct kmem_cache_node *n, struct page *page, int tail) { - spin_lock(&n->list_lock); n->nr_partial++; if (tail) list_add_tail(&page->lru, &n->partial); else list_add(&page->lru, &n->partial); - spin_unlock(&n->list_lock); } -static inline void __remove_partial(struct kmem_cache_node *n, +static inline void remove_partial(struct kmem_cache_node *n, struct page *page) { list_del(&page->lru); n->nr_partial--; } -static void remove_partial(struct kmem_cache *s, struct page *page) -{ - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - - spin_lock(&n->list_lock); - __remove_partial(n, page); - spin_unlock(&n->list_lock); -} - /* - * Lock slab and remove from the partial list. 
+ * Lock slab, remove from the partial list and put the object into the + * per cpu freelist. * * Must hold list_lock. */ @@ -1457,7 +1448,7 @@ static inline int lock_and_freeze_slab(s struct page *page) { if (slab_trylock(page)) { - __remove_partial(n, page); + remove_partial(n, page); return 1; } return 0; @@ -1574,12 +1565,17 @@ static void unfreeze_slab(struct kmem_ca if (page->inuse) { if (page->freelist) { + spin_lock(&n->list_lock); add_partial(n, page, tail); + spin_unlock(&n->list_lock); stat(s, tail ? DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD); } else { stat(s, DEACTIVATE_FULL); - if (kmem_cache_debug(s) && (s->flags & SLAB_STORE_USER)) - add_full(n, page); + if (kmem_cache_debug(s) && (s->flags & SLAB_STORE_USER)) { + spin_lock(&n->list_lock); + add_full(s, n, page); + spin_unlock(&n->list_lock); + } } slab_unlock(page); } else { @@ -1595,7 +1591,9 @@ static void unfreeze_slab(struct kmem_ca * kmem_cache_shrink can reclaim any empty slabs from * the partial list. */ + spin_lock(&n->list_lock); add_partial(n, page, 1); + spin_unlock(&n->list_lock); slab_unlock(page); } else { slab_unlock(page); @@ -2095,7 +2093,11 @@ static void __slab_free(struct kmem_cach * then add it. */ if (unlikely(!prior)) { + struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + + spin_lock(&n->list_lock); add_partial(get_node(s, page_to_nid(page)), page, 1); + spin_unlock(&n->list_lock); stat(s, FREE_ADD_PARTIAL); } @@ -2109,7 +2111,11 @@ slab_empty: /* * Slab still on the partial list. 
*/ - remove_partial(s, page); + struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + + spin_lock(&n->list_lock); + remove_partial(n, page); + spin_unlock(&n->list_lock); stat(s, FREE_REMOVE_PARTIAL); } slab_unlock(page); @@ -2391,7 +2397,6 @@ static void early_kmem_cache_node_alloc( { struct page *page; struct kmem_cache_node *n; - unsigned long flags; BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node)); @@ -2418,14 +2423,7 @@ static void early_kmem_cache_node_alloc( init_kmem_cache_node(n, kmem_cache_node); inc_slabs_node(kmem_cache_node, node, page->objects); - /* - * lockdep requires consistent irq usage for each lock - * so even though there cannot be a race this early in - * the boot sequence, we still disable irqs. - */ - local_irq_save(flags); add_partial(n, page, 0); - local_irq_restore(flags); } static void free_kmem_cache_nodes(struct kmem_cache *s) @@ -2709,7 +2707,7 @@ static void free_partial(struct kmem_cac spin_lock_irqsave(&n->list_lock, flags); list_for_each_entry_safe(page, h, &n->partial, lru) { if (!page->inuse) { - __remove_partial(n, page); + remove_partial(n, page); discard_slab(s, page); } else { list_slab_objects(s, page, @@ -3047,7 +3045,7 @@ int kmem_cache_shrink(struct kmem_cache * may have freed the last object and be * waiting to release the slab. */ - __remove_partial(n, page); + remove_partial(n, page); slab_unlock(page); discard_slab(s, page); } else { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with SMTP id C8425900115 for ; Mon, 16 May 2011 16:26:31 -0400 (EDT) Message-Id: <20110516202629.872677981@linux.com> Date: Mon, 16 May 2011 15:26:20 -0500 From: Christoph Lameter Subject: [slubllv5 15/25] slub: Avoid disabling interrupts in free slowpath References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=slab_free_without_irqoff Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Disabling interrupts can be avoided now. However, list operations still require disabling interrupts since allocations can occur from interrupt contexts and there is no way to perform atomic list operations. So acquire the list lock opportunistically if there is a chance that list operations would be needed. This may result in needless synchronization in some cases but avoids synchronization in the majority of cases. Dropping interrupt handling significantly simplifies the slowpath. 
Signed-off-by: Christoph Lameter --- mm/slub.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:45:42.741458944 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:45:44.211458942 -0500 @@ -2182,11 +2182,10 @@ static void __slab_free(struct kmem_cach struct kmem_cache_node *n = NULL; unsigned long flags; - local_irq_save(flags); stat(s, FREE_SLOWPATH); if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr)) - goto out_unlock; + return; do { prior = page->freelist; @@ -2205,7 +2204,7 @@ static void __slab_free(struct kmem_cach * Otherwise the list_lock will synchronize with * other processors updating the list of slabs. */ - spin_lock(&n->list_lock); + spin_lock_irqsave(&n->list_lock, flags); } inuse = new.inuse; @@ -2221,7 +2220,7 @@ static void __slab_free(struct kmem_cach */ if (was_frozen) stat(s, FREE_FROZEN); - goto out_unlock; + return; } /* @@ -2244,11 +2243,7 @@ static void __slab_free(struct kmem_cach stat(s, FREE_ADD_PARTIAL); } } - - spin_unlock(&n->list_lock); - -out_unlock: - local_irq_restore(flags); + spin_unlock_irqrestore(&n->list_lock, flags); return; slab_empty: @@ -2260,8 +2255,7 @@ slab_empty: stat(s, FREE_REMOVE_PARTIAL); } - spin_unlock(&n->list_lock); - local_irq_restore(flags); + spin_unlock_irqrestore(&n->list_lock, flags); stat(s, FREE_SLAB); discard_slab(s, page); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
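The opportunistic locking in __slab_free can be illustrated with a small userspace sketch: the list lock is taken inside the compare-and-swap retry loop, and only when the state about to be installed could require list work (the slab was full or is about to become empty). This is a simplified model under stated assumptions, not the kernel code. It tracks only an `inuse` count with a single-word C11 compare-and-swap, ignores the frozen state and min_partial, and cannot model interrupt disabling. `struct node_sketch`, `struct slab_sketch` and `free_one()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for kmem_cache_node and struct page. */
struct node_sketch {
	atomic_bool list_lock;	/* toy spinlock protecting nr_partial */
	int nr_partial;
};

struct slab_sketch {
	_Atomic unsigned inuse;	/* objects currently allocated */
	unsigned objects;	/* total objects in the slab */
};

enum free_result { FREE_NO_LIST_WORK, FREE_ADDED_PARTIAL, FREE_SLAB_EMPTY };

static void list_lock(struct node_sketch *n)
{
	while (atomic_exchange(&n->list_lock, true))
		;	/* spin until the previous holder releases the lock */
}

static void list_unlock(struct node_sketch *n)
{
	atomic_store(&n->list_lock, false);
}

static enum free_result free_one(struct node_sketch *n, struct slab_sketch *slab)
{
	enum free_result res = FREE_NO_LIST_WORK;
	bool locked = false;
	unsigned old = atomic_load(&slab->inuse);
	unsigned next;

	do {
		assert(old > 0);
		next = old - 1;
		/*
		 * List work is only possible if the slab was full (it must
		 * be added to the partial list) or becomes empty. Take the
		 * lock speculatively, before the state change is published.
		 */
		if ((next == 0 || old == slab->objects) && !locked) {
			list_lock(n);
			locked = true;
		}
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(&slab->inuse, &old, next));

	if (!locked)
		return res;	/* common case: no lock, no list activity */

	if (next == 0) {
		if (old != slab->objects)
			n->nr_partial--;	/* was on the partial list */
		res = FREE_SLAB_EMPTY;
	} else if (old == slab->objects) {
		n->nr_partial++;		/* full slab becomes partial */
		res = FREE_ADDED_PARTIAL;
	}
	/* The lock may also be dropped here without any list processing. */
	list_unlock(n);
	return res;
}
```

If a retry changes the picture after the lock was taken, the sketch simply releases the lock without touching the list, which corresponds to the "needless synchronization" the commit message accepts as the price for the lock-free common case.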
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id A7CC0900114 for ; Mon, 16 May 2011 16:26:31 -0400 (EDT) Message-Id: <20110516202628.116720236@linux.com> Date: Mon, 16 May 2011 15:26:17 -0500 From: Christoph Lameter Subject: [slubllv5 12/25] slub: Rework allocator fastpaths References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=rework_fastpaths Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Rework the allocation paths so that updates of the page freelist, frozen state and number of objects use cmpxchg_double_slab(). Signed-off-by: Christoph Lameter --- mm/slub.c | 413 ++++++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 282 insertions(+), 131 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 11:47:01.591463049 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:45:33.241458954 -0500 @@ -987,11 +987,6 @@ static noinline int alloc_debug_processi if (!check_slab(s, page)) goto bad; - if (!on_freelist(s, page, object)) { - object_err(s, page, object, "Object already allocated"); - goto bad; - } - if (!check_valid_pointer(s, page, object)) { object_err(s, page, object, "Freelist Pointer check fails"); goto bad; @@ -1055,14 +1050,6 @@ static noinline int free_debug_processin goto fail; } - /* Special debug activities for freeing objects */ - if (!page->frozen && !page->freelist) { - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - - spin_lock(&n->list_lock); - remove_full(s, page); - spin_unlock(&n->list_lock); - } if (s->flags & SLAB_STORE_USER) set_track(s, object, TRACK_FREE, 
addr); trace(s, page, object, 0); @@ -1447,11 +1434,52 @@ static inline void remove_partial(struct static inline int lock_and_freeze_slab(struct kmem_cache *s, struct kmem_cache_node *n, struct page *page) { - if (slab_trylock(page)) { - remove_partial(n, page); + void *freelist; + unsigned long counters; + struct page new; + + + if (!slab_trylock(page)) + return 0; + + /* + * Zap the freelist and set the frozen bit. + * The old freelist is the list of objects for the + * per cpu allocation list. + */ + do { + freelist = page->freelist; + counters = page->counters; + new.counters = counters; + new.inuse = page->objects; + + VM_BUG_ON(new.frozen); + new.frozen = 1; + + } while (!cmpxchg_double_slab(s, page, + freelist, counters, + NULL, new.counters, + "lock and freeze")); + + remove_partial(n, page); + + if (freelist) { + /* Populate the per cpu freelist */ + this_cpu_write(s->cpu_slab->freelist, freelist); + this_cpu_write(s->cpu_slab->page, page); + this_cpu_write(s->cpu_slab->node, page_to_nid(page)); return 1; + } else { + /* + * Slab page came from the wrong list. No object to allocate + * from. Put it onto the correct list and continue partial + * scan. + */ + printk(KERN_ERR "SLUB: %s : Page without available objects on" + " partial list\n", s->name); + slab_unlock(page); + return 0; } - return 0; } /* @@ -1551,59 +1579,6 @@ static struct page *get_partial(struct k return get_any_partial(s, flags); } -/* - * Move a page back to the lists. - * - * Must be called with the slab lock held. - * - * On exit the slab lock will have been dropped. - */ -static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail) - __releases(bitlock) -{ - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - - if (page->inuse) { - - if (page->freelist) { - spin_lock(&n->list_lock); - add_partial(n, page, tail); - spin_unlock(&n->list_lock); - stat(s, tail ? 
DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD); - } else { - stat(s, DEACTIVATE_FULL); - if (kmem_cache_debug(s) && (s->flags & SLAB_STORE_USER)) { - spin_lock(&n->list_lock); - add_full(s, n, page); - spin_unlock(&n->list_lock); - } - } - slab_unlock(page); - } else { - stat(s, DEACTIVATE_EMPTY); - if (n->nr_partial < s->min_partial) { - /* - * Adding an empty slab to the partial slabs in order - * to avoid page allocator overhead. This slab needs - * to come after the other slabs with objects in - * so that the others get filled first. That way the - * size of the partial list stays small. - * - * kmem_cache_shrink can reclaim any empty slabs from - * the partial list. - */ - spin_lock(&n->list_lock); - add_partial(n, page, 1); - spin_unlock(&n->list_lock); - slab_unlock(page); - } else { - slab_unlock(page); - stat(s, FREE_SLAB); - discard_slab(s, page); - } - } -} - #ifdef CONFIG_PREEMPT /* * Calculate the next globally unique transaction for disambiguiation @@ -1673,37 +1648,158 @@ void init_kmem_cache_cpus(struct kmem_ca /* * Remove the cpu slab */ + +/* + * Remove the cpu slab + */ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c) - __releases(bitlock) { + enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE }; struct page *page = c->page; - int tail = 1; + struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + int lock = 0; + enum slab_modes l = M_NONE, m = M_NONE; + void *freelist; + void *nextfree; + int tail = 0; + struct page new; + struct page old; - if (page->freelist) + if (page->freelist) { stat(s, DEACTIVATE_REMOTE_FREES); + tail = 1; + } + + c->tid = next_tid(c->tid); + c->page = NULL; + freelist = c->freelist; + c->freelist = NULL; + /* - * Merge cpu freelist into slab freelist. Typically we get here - * because both freelists are empty. So this is unlikely - * to occur. + * Stage one: Free all available per cpu objects back + * to the page freelist while it is still frozen. Leave the + * last one. 
+ * + * There is no need to take the list->lock because the page + * is still frozen. */ - while (unlikely(c->freelist)) { - void **object; + while (freelist && (nextfree = get_freepointer(s, freelist))) { + void *prior; + unsigned long counters; + + do { + prior = page->freelist; + counters = page->counters; + set_freepointer(s, freelist, prior); + new.counters = counters; + new.inuse--; + VM_BUG_ON(!new.frozen); + + } while (!cmpxchg_double_slab(s, page, + prior, counters, + freelist, new.counters, + "drain percpu freelist")); - tail = 0; /* Hot objects. Put the slab first */ + freelist = nextfree; + } - /* Retrieve object from cpu_freelist */ - object = c->freelist; - c->freelist = get_freepointer(s, c->freelist); + /* + * Stage two: Ensure that the page is unfrozen while the + * list presence reflects the actual number of objects + * during unfreeze. + * + * We setup the list membership and then perform a cmpxchg + * with the count. If there is a mismatch then the page + * is not unfrozen but the page is on the wrong list. + * + * Then we restart the process which may have to remove + * the page from the list that we just put it on again + * because the number of objects in the slab may have + * changed. 
+ */ +redo: + + old.freelist = page->freelist; + old.counters = page->counters; + VM_BUG_ON(!old.frozen); + + /* Determine target state of the slab */ + new.counters = old.counters; + if (freelist) { + new.inuse--; + set_freepointer(s, freelist, old.freelist); + new.freelist = freelist; + } else + new.freelist = old.freelist; + + new.frozen = 0; + + if (!new.inuse && n->nr_partial < s->min_partial) + m = M_FREE; + else if (new.freelist) { + m = M_PARTIAL; + if (!lock) { + lock = 1; + /* + * Taking the spinlock removes the possibility + * that acquire_slab() will see a slab page that + * is frozen + */ + spin_lock(&n->list_lock); + } + } else { + m = M_FULL; + if (kmem_cache_debug(s) && !lock) { + lock = 1; + /* + * This also ensures that the scanning of full + * slabs from diagnostic functions will not see + * any frozen slabs. + */ + spin_lock(&n->list_lock); + } + } + + if (l != m) { + + if (l == M_PARTIAL) + + remove_partial(n, page); + + else if (l == M_FULL) + + remove_full(s, page); + + if (m == M_PARTIAL) { - /* And put onto the regular freelist */ - set_freepointer(s, object, page->freelist); - page->freelist = object; - page->inuse--; + add_partial(n, page, tail); + stat(s, tail ? 
DEACTIVATE_TO_TAIL : DEACTIVATE_TO_HEAD); + + } else if (m == M_FULL) { + + stat(s, DEACTIVATE_FULL); + add_full(s, n, page); + + } + } + + l = m; + if (!cmpxchg_double_slab(s, page, + old.freelist, old.counters, + new.freelist, new.counters, + "unfreezing slab")) + goto redo; + + slab_unlock(page); + + if (lock) + spin_unlock(&n->list_lock); + + if (m == M_FREE) { + stat(s, DEACTIVATE_EMPTY); + discard_slab(s, page); + stat(s, FREE_SLAB); } - c->page = NULL; - c->tid = next_tid(c->tid); - page->frozen = 0; - unfreeze_slab(s, page, tail); } static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c) @@ -1838,6 +1934,8 @@ static void *__slab_alloc(struct kmem_ca void **object; struct page *page; unsigned long flags; + struct page new; + unsigned long counters; local_irq_save(flags); #ifdef CONFIG_PREEMPT @@ -1860,26 +1958,33 @@ static void *__slab_alloc(struct kmem_ca if (unlikely(!node_match(c, node))) goto another_slab; - stat(s, ALLOC_REFILL); + stat(s, ALLOC_SLOWPATH); + + do { + object = page->freelist; + counters = page->counters; + new.counters = counters; + new.inuse = page->objects; + VM_BUG_ON(!new.frozen); + + } while (!cmpxchg_double_slab(s, page, + object, counters, + NULL, new.counters, + "__slab_alloc")); load_freelist: VM_BUG_ON(!page->frozen); - object = page->freelist; if (unlikely(!object)) goto another_slab; - if (kmem_cache_debug(s)) - goto debug; - c->freelist = get_freepointer(s, object); - page->inuse = page->objects; - page->freelist = NULL; + stat(s, ALLOC_REFILL); -unlock_out: slab_unlock(page); + + c->freelist = get_freepointer(s, object); c->tid = next_tid(c->tid); local_irq_restore(flags); - stat(s, ALLOC_SLOWPATH); return object; another_slab: @@ -1889,9 +1994,10 @@ new_slab: page = get_partial(s, gfpflags, node); if (page) { stat(s, ALLOC_FROM_PARTIAL); - page->frozen = 1; - c->node = page_to_nid(page); - c->page = page; + object = c->freelist; + + if (kmem_cache_debug(s)) + goto debug; goto load_freelist; } @@ 
-1899,12 +2005,19 @@ new_slab: if (page) { c = __this_cpu_ptr(s->cpu_slab); - stat(s, ALLOC_SLAB); if (c->page) flush_slab(s, c); + /* + * No other reference to the page yet so we can + * muck around with it freely without cmpxchg + */ + object = page->freelist; + page->freelist = NULL; + page->inuse = page->objects; + + stat(s, ALLOC_SLAB); slab_lock(page); - page->frozen = 1; c->node = page_to_nid(page); c->page = page; goto load_freelist; @@ -1913,14 +2026,16 @@ new_slab: slab_out_of_memory(s, gfpflags, node); local_irq_restore(flags); return NULL; + debug: - if (!alloc_debug_processing(s, page, object, addr)) - goto another_slab; + if (!object || !alloc_debug_processing(s, page, object, addr)) + goto new_slab; - page->inuse++; - page->freelist = get_freepointer(s, object); + c->freelist = get_freepointer(s, object); + deactivate_slab(s, c); c->node = NUMA_NO_NODE; - goto unlock_out; + local_irq_restore(flags); + return object; } /* @@ -2067,6 +2182,11 @@ static void __slab_free(struct kmem_cach { void *prior; void **object = (void *)x; + int was_frozen; + int inuse; + struct page new; + unsigned long counters; + struct kmem_cache_node *n = NULL; unsigned long flags; local_irq_save(flags); @@ -2076,32 +2196,65 @@ static void __slab_free(struct kmem_cach if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr)) goto out_unlock; - prior = page->freelist; - set_freepointer(s, object, prior); - page->freelist = object; - page->inuse--; - - if (unlikely(page->frozen)) { - stat(s, FREE_FROZEN); - goto out_unlock; - } + do { + prior = page->freelist; + counters = page->counters; + set_freepointer(s, object, prior); + new.counters = counters; + was_frozen = new.frozen; + new.inuse--; + if ((!new.inuse || !prior) && !was_frozen && !n) { + n = get_node(s, page_to_nid(page)); + /* + * Speculatively acquire the list_lock. + * If the cmpxchg does not succeed then we may + * drop the list_lock without any processing. 
+ * + * Otherwise the list_lock will synchronize with + * other processors updating the list of slabs. + */ + spin_lock(&n->list_lock); + } + inuse = new.inuse; - if (unlikely(!page->inuse)) - goto slab_empty; + } while (!cmpxchg_double_slab(s, page, + prior, counters, + object, new.counters, + "__slab_free")); + + if (likely(!n)) { + /* + * The list lock was not taken therefore no list + * activity can be necessary. + */ + if (was_frozen) + stat(s, FREE_FROZEN); + goto out_unlock; + } /* - * Objects left in the slab. If it was not on the partial list before - * then add it. + * was_frozen may have been set after we acquired the list_lock in + * an earlier loop. So we need to check it here again. */ - if (unlikely(!prior)) { - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + if (was_frozen) + stat(s, FREE_FROZEN); + else { + if (unlikely(!inuse && n->nr_partial > s->min_partial)) + goto slab_empty; - spin_lock(&n->list_lock); - add_partial(get_node(s, page_to_nid(page)), page, 1); - spin_unlock(&n->list_lock); - stat(s, FREE_ADD_PARTIAL); + /* + * Objects left in the slab. If it was not on the partial list before + * then add it. + */ + if (unlikely(!prior)) { + remove_full(s, page); + add_partial(n, page, 0); + stat(s, FREE_ADD_PARTIAL); + } } + spin_unlock(&n->list_lock); + out_unlock: slab_unlock(page); local_irq_restore(flags); @@ -2112,13 +2265,11 @@ slab_empty: /* * Slab still on the partial list. */ - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - - spin_lock(&n->list_lock); remove_partial(n, page); - spin_unlock(&n->list_lock); stat(s, FREE_REMOVE_PARTIAL); } + + spin_unlock(&n->list_lock); slab_unlock(page); local_irq_restore(flags); stat(s, FREE_SLAB); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
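The reworked paths in this patch copy `page->counters` into a scratch `struct page`, adjust `new.inuse` or `new.frozen`, and hand `new.counters` to cmpxchg_double_slab(). That pattern relies on the counter fields overlaying a single word. The overlay itself is defined elsewhere in the series, so the layout below (16 bits inuse, 15 bits objects, 1 bit frozen) is an assumption made for illustration, and `union slab_counters` and the helper functions are hypothetical names. The sketch shows the "zap the freelist and set the frozen bit" computation from lock_and_freeze_slab().

```c
#include <assert.h>

/* Assumed layout: the bitfields share storage with the counters word. */
union slab_counters {
	unsigned long counters;
	struct {
		unsigned inuse:16;	/* objects handed out */
		unsigned objects:15;	/* total objects in the slab */
		unsigned frozen:1;	/* slab is owned by one cpu */
	};
};

static unsigned long pack_counters(unsigned inuse, unsigned objects,
				   unsigned frozen)
{
	union slab_counters u;

	u.counters = 0;		/* clear any bits outside the bitfields */
	u.inuse = inuse;
	u.objects = objects;
	u.frozen = frozen;
	return u.counters;
}

/*
 * Compute the counters word for a slab whose entire freelist is being
 * claimed for a per cpu list: every object counts as in use and the
 * slab becomes frozen.
 */
static unsigned long freeze_and_claim_all(unsigned long counters)
{
	union slab_counters next;

	next.counters = counters;
	assert(!next.frozen);	/* mirrors VM_BUG_ON(new.frozen) */
	next.inuse = next.objects;
	next.frozen = 1;
	return next.counters;
}

static unsigned counters_inuse(unsigned long c)
{
	union slab_counters u;

	u.counters = c;
	return u.inuse;
}

static unsigned counters_objects(unsigned long c)
{
	union slab_counters u;

	u.counters = c;
	return u.objects;
}

static unsigned counters_frozen(unsigned long c)
{
	union slab_counters u;

	u.counters = c;
	return u.frozen;
}
```

Because all three fields live in one word, a single compare-and-exchange on that word (paired with the freelist pointer) publishes the new inuse count and the frozen bit together.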
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id A670E6B002D for ; Mon, 16 May 2011 16:26:34 -0400 (EDT) Message-Id: <20110516202630.444217953@linux.com> Date: Mon, 16 May 2011 15:26:21 -0500 From: Christoph Lameter Subject: [slubllv5 16/25] slub: Get rid of the another_slab label References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=eliminate_another_slab Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner We can avoid calling deactivate_slab() in special cases if the deactivation of slabs is done explicitly in each code flow that leads to new_slab. Signed-off-by: Christoph Lameter --- mm/slub.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:45:44.211458942 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:45:49.831458937 -0500 @@ -1951,8 +1951,10 @@ static void *__slab_alloc(struct kmem_ca if (!page) goto new_slab; - if (unlikely(!node_match(c, node))) - goto another_slab; + if (unlikely(!node_match(c, node))) { + deactivate_slab(s, c); + goto new_slab; + } stat(s, ALLOC_SLOWPATH); @@ -1972,7 +1974,7 @@ load_freelist: VM_BUG_ON(!page->frozen); if (unlikely(!object)) - goto another_slab; + goto new_slab; stat(s, ALLOC_REFILL); @@ -1981,9 +1983,6 @@ load_freelist: local_irq_restore(flags); return object; -another_slab: - deactivate_slab(s, c); - new_slab: page = get_partial(s, gfpflags, node); if (page) { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id D1284900116 for ; Mon, 16 May 2011 16:26:31 -0400 (EDT) Message-Id: <20110516202629.279893737@linux.com> Date: Mon, 16 May 2011 15:26:19 -0500 From: Christoph Lameter Subject: [slubllv5 14/25] slub: Disable interrupts in free_debug processing References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=irqoff_in_free_debug_processing Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner We will be calling free_debug_processing with interrupts enabled in some cases once the later patches are applied. Some of the functions called by free_debug_processing expect interrupts to be off, so disable them inside free_debug_processing. Signed-off-by: Christoph Lameter --- mm/slub.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-12 16:19:44.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-12 16:19:56.000000000 -0500 @@ -1035,6 +1035,10 @@ bad: static noinline int free_debug_processing(struct kmem_cache *s, struct page *page, void *object, unsigned long addr) { + unsigned long flags; + int rc = 0; + + local_irq_save(flags); slab_lock(page); if (!check_slab(s, page)) @@ -1051,7 +1055,7 @@ static noinline int free_debug_processin } if (!check_object(s, page, object, SLUB_RED_ACTIVE)) - return 0; + goto out; if (unlikely(s != page->slab)) { if (!PageSlab(page)) { @@ -1072,13 +1076,15 @@ static noinline int free_debug_processin set_track(s, object, TRACK_FREE, addr); trace(s, page, object, 0); init_object(s, object, SLUB_RED_INACTIVE); + rc = 1; +out: slab_unlock(page); - return 
1; + local_irq_restore(flags); + return rc; fail: slab_fix(s, "Object at 0x%p not freed", object); - slab_unlock(page); - return 0; + goto out; } static int __init setup_slub_debug(char *str) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 32A44900113 for ; Mon, 16 May 2011 16:26:30 -0400 (EDT) Message-Id: <20110516202627.545322180@linux.com> Date: Mon, 16 May 2011 15:26:16 -0500 From: Christoph Lameter Subject: [slubllv5 11/25] slub: Pass kmem_cache struct to lock and freeze slab References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=pass_kmem_cache_to_lock_and_freeze Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner We need more information about the slab for the cmpxchg implementation. Signed-off-by: Christoph Lameter --- mm/slub.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 11:46:58.311463052 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 11:47:01.591463049 -0500 @@ -1444,8 +1444,8 @@ static inline void remove_partial(struct * * Must hold list_lock. 
*/ -static inline int lock_and_freeze_slab(struct kmem_cache_node *n, - struct page *page) +static inline int lock_and_freeze_slab(struct kmem_cache *s, + struct kmem_cache_node *n, struct page *page) { if (slab_trylock(page)) { remove_partial(n, page); @@ -1457,7 +1457,8 @@ static inline int lock_and_freeze_slab(s /* * Try to allocate a partial slab from a specific node. */ -static struct page *get_partial_node(struct kmem_cache_node *n) +static struct page *get_partial_node(struct kmem_cache *s, + struct kmem_cache_node *n) { struct page *page; @@ -1472,7 +1473,7 @@ static struct page *get_partial_node(str spin_lock(&n->list_lock); list_for_each_entry(page, &n->partial, lru) - if (lock_and_freeze_slab(n, page)) + if (lock_and_freeze_slab(s, n, page)) goto out; page = NULL; out: @@ -1523,7 +1524,7 @@ static struct page *get_any_partial(stru if (n && cpuset_zone_allowed_hardwall(zone, flags) && n->nr_partial > s->min_partial) { - page = get_partial_node(n); + page = get_partial_node(s, n); if (page) { put_mems_allowed(); return page; @@ -1543,7 +1544,7 @@ static struct page *get_partial(struct k struct page *page; int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node; - page = get_partial_node(get_node(s, searchnode)); + page = get_partial_node(s, get_node(s, searchnode)); if (page || node != NUMA_NO_NODE) return page; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id EC34E6B0025 for ; Mon, 16 May 2011 16:26:31 -0400 (EDT) Message-Id: <20110516202628.699365728@linux.com> Date: Mon, 16 May 2011 15:26:18 -0500 From: Christoph Lameter Subject: [slubllv5 13/25] slub: Invert locking and avoid slab lock References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=slab_lock_subsume Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Locking slabs is no longer necessary if the arch supports cmpxchg operations and if no debugging features are used on a slab. If the arch does not support cmpxchg then we fall back to using the slab lock to do a cmpxchg-like operation. The patch also changes the lock order. Slab locks are now subordinate to the node lock. With that approach slab_trylock() is no longer necessary. Signed-off-by: Christoph Lameter --- mm/slub.c | 131 +++++++++++++++++++++++++------------------------------------- 1 file changed, 53 insertions(+), 78 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:45:33.241458954 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:45:39.451458948 -0500 @@ -2,10 +2,11 @@ * SLUB: A slab allocator that limits cache line use instead of queuing * objects in per cpu and per node lists. * - * The allocator synchronizes using per slab locks and only - * uses a centralized lock to manage a pool of partial slabs. + * The allocator synchronizes using per slab locks or atomic operatios + * and only uses a centralized lock to manage a pool of partial slabs.
* * (C) 2007 SGI, Christoph Lameter + * (C) 2011 Linux Foundation, Christoph Lameter */ #include @@ -32,15 +33,27 @@ /* * Lock order: - * 1. slab_lock(page) - * 2. slab->list_lock - * - * The slab_lock protects operations on the object of a particular - * slab and its metadata in the page struct. If the slab lock - * has been taken then no allocations nor frees can be performed - * on the objects in the slab nor can the slab be added or removed - * from the partial or full lists since this would mean modifying - * the page_struct of the slab. + * 1. slub_lock (Global Semaphore) + * 2. node->list_lock + * 3. slab_lock(page) (Only on some arches and for debugging) + * + * slub_lock + * + * The role of the slub_lock is to protect the list of all the slabs + * and to synchronize major metadata changes to slab cache structures. + * + * The slab_lock is only used for debugging and on arches that do not + * have the ability to do a cmpxchg_double. It only protects the second + * double word in the page struct. Meaning + * A. page->freelist -> List of object free in a page + * B. page->counters -> Counters of objects + * C. page->frozen -> frozen state + * + * If a slab is frozen then it is exempt from list management. It is not + * on any list. The processor that froze the slab is the one who can + * perform list operations on the page. Other processors may put objects + * onto the freelist but the processor that froze the slab is the only + * one that can retrieve the objects from the page's freelist. * * The list_lock protects the partial and full list on each node and * the partial slab counter. If taken then no new slabs may be added or @@ -53,20 +66,6 @@ * slabs, operations can continue without any centralized lock. F.e. * allocating a long series of objects that fill up slabs does not require * the list lock. - * - * The lock order is sometimes inverted when we are trying to get a slab - * off a list. 
We take the list_lock and then look for a page on the list - * to use. While we do that objects in the slabs may be freed. We can - * only operate on the slab if we have also taken the slab_lock. So we use - * a slab_trylock() on the slab. If trylock was successful then no frees - * can occur anymore and we can use the slab for allocations etc. If the - * slab_trylock() does not succeed then frees are in progress in the slab and - * we must stay away from it for a while since we may cause a bouncing - * cacheline if we try to acquire the lock. So go onto the next slab. - * If all pages are busy then we may allocate a new slab instead of reusing - * a partial slab. A new slab has no one operating on it and thus there is - * no danger of cacheline contention. - * * Interrupts are disabled during allocation and deallocation in order to * make the slab allocator safe to use in the context of an irq. In addition * interrupts are disabled to ensure that the processor does not change @@ -346,7 +345,7 @@ static inline int oo_objects(struct kmem /* * Determine a map of object in use on a page. * - * Slab lock or node listlock must be held to guarantee that the page does + * Node listlock must be held to guarantee that the page does * not vanish from under us. 
*/ static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map) @@ -358,6 +357,19 @@ static void get_map(struct kmem_cache *s set_bit(slab_index(p, s, addr), map); } +/* + * Per slab locking using the pagelock + */ +static __always_inline void slab_lock(struct page *page) +{ + bit_spin_lock(PG_locked, &page->flags); +} + +static __always_inline void slab_unlock(struct page *page) +{ + __bit_spin_unlock(PG_locked, &page->flags); +} + static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page, void *freelist_old, unsigned long counters_old, void *freelist_new, unsigned long counters_new, @@ -372,11 +384,14 @@ static inline bool cmpxchg_double_slab(s } else #endif { + slab_lock(page); if (page->freelist == freelist_old && page->counters == counters_old) { page->freelist = freelist_new; page->counters = counters_new; + slab_unlock(page); return 1; } + slab_unlock(page); } cpu_relax(); @@ -808,10 +823,11 @@ static int check_slab(struct kmem_cache static int on_freelist(struct kmem_cache *s, struct page *page, void *search) { int nr = 0; - void *fp = page->freelist; + void *fp; void *object = NULL; unsigned long max_objects; + fp = page->freelist; while (fp && nr <= page->objects) { if (fp == search) return 1; @@ -1019,6 +1035,8 @@ bad: static noinline int free_debug_processing(struct kmem_cache *s, struct page *page, void *object, unsigned long addr) { + slab_lock(page); + if (!check_slab(s, page)) goto fail; @@ -1054,10 +1072,12 @@ static noinline int free_debug_processin set_track(s, object, TRACK_FREE, addr); trace(s, page, object, 0); init_object(s, object, SLUB_RED_INACTIVE); + slab_unlock(page); return 1; fail: slab_fix(s, "Object at 0x%p not freed", object); + slab_unlock(page); return 0; } @@ -1385,27 +1405,6 @@ static void discard_slab(struct kmem_cac } /* - * Per slab locking using the pagelock - */ -static __always_inline void slab_lock(struct page *page) -{ - bit_spin_lock(PG_locked, &page->flags); -} - -static 
__always_inline void slab_unlock(struct page *page) -{ - __bit_spin_unlock(PG_locked, &page->flags); -} - -static __always_inline int slab_trylock(struct page *page) -{ - int rc = 1; - - rc = bit_spin_trylock(PG_locked, &page->flags); - return rc; -} - -/* * Management of partially allocated slabs */ static inline void add_partial(struct kmem_cache_node *n, @@ -1431,17 +1430,13 @@ static inline void remove_partial(struct * * Must hold list_lock. */ -static inline int lock_and_freeze_slab(struct kmem_cache *s, +static inline int acquire_slab(struct kmem_cache *s, struct kmem_cache_node *n, struct page *page) { void *freelist; unsigned long counters; struct page new; - - if (!slab_trylock(page)) - return 0; - /* * Zap the freelist and set the frozen bit. * The old freelist is the list of objects for the @@ -1477,7 +1472,6 @@ static inline int lock_and_freeze_slab(s */ printk(KERN_ERR "SLUB: %s : Page without available objects on" " partial list\n", s->name); - slab_unlock(page); return 0; } } @@ -1501,7 +1495,7 @@ static struct page *get_partial_node(str spin_lock(&n->list_lock); list_for_each_entry(page, &n->partial, lru) - if (lock_and_freeze_slab(s, n, page)) + if (acquire_slab(s, n, page)) goto out; page = NULL; out: @@ -1790,8 +1784,6 @@ redo: "unfreezing slab")) goto redo; - slab_unlock(page); - if (lock) spin_unlock(&n->list_lock); @@ -1805,7 +1797,6 @@ redo: static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c) { stat(s, CPUSLAB_FLUSH); - slab_lock(c->page); deactivate_slab(s, c); } @@ -1954,7 +1945,6 @@ static void *__slab_alloc(struct kmem_ca if (!page) goto new_slab; - slab_lock(page); if (unlikely(!node_match(c, node))) goto another_slab; @@ -1980,8 +1970,6 @@ load_freelist: stat(s, ALLOC_REFILL); - slab_unlock(page); - c->freelist = get_freepointer(s, object); c->tid = next_tid(c->tid); local_irq_restore(flags); @@ -2017,7 +2005,6 @@ new_slab: page->inuse = page->objects; stat(s, ALLOC_SLAB); - slab_lock(page); c->node = 
page_to_nid(page); c->page = page; goto load_freelist; @@ -2190,7 +2177,6 @@ static void __slab_free(struct kmem_cach unsigned long flags; local_irq_save(flags); - slab_lock(page); stat(s, FREE_SLOWPATH); if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr)) @@ -2256,7 +2242,6 @@ static void __slab_free(struct kmem_cach spin_unlock(&n->list_lock); out_unlock: - slab_unlock(page); local_irq_restore(flags); return; @@ -2270,7 +2255,6 @@ slab_empty: } spin_unlock(&n->list_lock); - slab_unlock(page); local_irq_restore(flags); stat(s, FREE_SLAB); discard_slab(s, page); @@ -3191,14 +3175,8 @@ int kmem_cache_shrink(struct kmem_cache * list_lock. page->inuse here is the upper limit. */ list_for_each_entry_safe(page, t, &n->partial, lru) { - if (!page->inuse && slab_trylock(page)) { - /* - * Must hold slab lock here because slab_free - * may have freed the last object and be - * waiting to release the slab. - */ + if (!page->inuse) { remove_partial(n, page); - slab_unlock(page); discard_slab(s, page); } else { list_move(&page->lru, @@ -3786,12 +3764,9 @@ static int validate_slab(struct kmem_cac static void validate_slab_slab(struct kmem_cache *s, struct page *page, unsigned long *map) { - if (slab_trylock(page)) { - validate_slab(s, page, map); - slab_unlock(page); - } else - printk(KERN_INFO "SLUB %s: Skipped busy slab 0x%p\n", - s->name, page); + slab_lock(page); + validate_slab(s, page, map); + slab_unlock(page); } static int validate_slab_node(struct kmem_cache *s, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
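The fallback branch that this patch adds to cmpxchg_double_slab() can be pictured in userspace roughly as follows. This is only a sketch under stated assumptions: struct fake_page, fake_page_init(), fake_slab_lock(), fake_slab_unlock() and locked_cmpxchg_double() are hypothetical names that do not appear in the patch, and a C11 atomic_flag stands in for the PG_locked bit spinlock. It shows the compare-both-words-then-store-both-words step being done under a per-page lock when no hardware cmpxchg_double is available.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct page words updated together. */
struct fake_page {
	atomic_flag lock;	/* plays the role of the PG_locked bit spinlock */
	void *freelist;		/* first free object, NULL if none */
	unsigned long counters;	/* packed inuse/objects/frozen in the kernel */
};

static void fake_page_init(struct fake_page *page, void *freelist,
			   unsigned long counters)
{
	atomic_flag_clear(&page->lock);
	page->freelist = freelist;
	page->counters = counters;
}

static void fake_slab_lock(struct fake_page *page)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&page->lock,
						 memory_order_acquire))
		;
}

static void fake_slab_unlock(struct fake_page *page)
{
	atomic_flag_clear_explicit(&page->lock, memory_order_release);
}

/*
 * Emulate a double word compare and exchange under the per-page lock.
 * Returns true and stores both new values only if both old values match.
 */
static bool locked_cmpxchg_double(struct fake_page *page,
				  void *freelist_old, unsigned long counters_old,
				  void *freelist_new, unsigned long counters_new)
{
	bool ok = false;

	assert(page != NULL);

	fake_slab_lock(page);
	if (page->freelist == freelist_old &&
	    page->counters == counters_old) {
		page->freelist = freelist_new;
		page->counters = counters_new;
		ok = true;
	}
	fake_slab_unlock(page);

	return ok;
}
```

As in the patch, a caller would sample freelist and counters, compute the new values, and retry when the exchange reports a mismatch. The sketch leaves out the statistics update and the cpu_relax() that the kernel function performs on failure.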
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 7B09790010D for ; Mon, 16 May 2011 16:26:33 -0400 (EDT) Message-Id: <20110516202631.025446049@linux.com> Date: Mon, 16 May 2011 15:26:22 -0500 From: Christoph Lameter Subject: [slubllv5 17/25] slub: Add statistics for the case that the current slab does not match the node References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=node_mismatch Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Slub reloads the per cpu slab if the page does not satisfy the NUMA condition. Track those reloads since they have a performance impact. Signed-off-by: Christoph Lameter --- include/linux/slub_def.h | 1 + mm/slub.c | 3 +++ 2 files changed, 4 insertions(+) Index: linux-2.6/include/linux/slub_def.h =================================================================== --- linux-2.6.orig/include/linux/slub_def.h 2011-05-16 12:51:07.661458566 -0500 +++ linux-2.6/include/linux/slub_def.h 2011-05-16 12:51:49.901458516 -0500 @@ -24,6 +24,7 @@ enum stat_item { ALLOC_FROM_PARTIAL, /* Cpu slab acquired from partial list */ ALLOC_SLAB, /* Cpu slab acquired from page allocator */ ALLOC_REFILL, /* Refill cpu slab from slab freelist */ + ALLOC_NODE_MISMATCH, /* Switching cpu slab */ FREE_SLAB, /* Slab freed to the page allocator */ CPUSLAB_FLUSH, /* Abandoning of the cpu slab */ DEACTIVATE_FULL, /* Cpu slab was full when deactivated */ Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:51:07.651458566 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:51:49.901458516 -0500 @@ -1952,6 +1952,7 @@ static void
*__slab_alloc(struct kmem_ca goto new_slab; if (unlikely(!node_match(c, node))) { + stat(s, ALLOC_NODE_MISMATCH); deactivate_slab(s, c); goto new_slab; } @@ -4650,6 +4651,7 @@ STAT_ATTR(FREE_REMOVE_PARTIAL, free_remo STAT_ATTR(ALLOC_FROM_PARTIAL, alloc_from_partial); STAT_ATTR(ALLOC_SLAB, alloc_slab); STAT_ATTR(ALLOC_REFILL, alloc_refill); +STAT_ATTR(ALLOC_NODE_MISMATCH, alloc_node_mismatch); STAT_ATTR(FREE_SLAB, free_slab); STAT_ATTR(CPUSLAB_FLUSH, cpuslab_flush); STAT_ATTR(DEACTIVATE_FULL, deactivate_full); @@ -4709,6 +4711,7 @@ static struct attribute *slab_attrs[] = &alloc_from_partial_attr.attr, &alloc_slab_attr.attr, &alloc_refill_attr.attr, + &alloc_node_mismatch_attr.attr, &free_slab_attr.attr, &cpuslab_flush_attr.attr, &deactivate_full_attr.attr, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with SMTP id E38D58D003B for ; Mon, 16 May 2011 16:26:34 -0400 (EDT) Message-Id: <20110516202632.861615235@linux.com> Date: Mon, 16 May 2011 15:26:25 -0500 From: Christoph Lameter Subject: [slubllv5 20/25] slub: slabinfo update for cmpxchg handling References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=update_slabinfo Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Update the statistics handling and the slabinfo tool to include the new statistics in the reports it generates. 
Signed-off-by: Christoph Lameter --- tools/slub/slabinfo.c | 57 ++++++++++++++++++++++++++++++++++---------------- 1 file changed, 39 insertions(+), 18 deletions(-) Index: linux-2.6/tools/slub/slabinfo.c =================================================================== --- linux-2.6.orig/tools/slub/slabinfo.c 2011-05-16 12:51:06.000000000 -0500 +++ linux-2.6/tools/slub/slabinfo.c 2011-05-16 12:52:08.501458494 -0500 @@ -2,8 +2,9 @@ * Slabinfo: Tool to get reports about slabs * * (C) 2007 sgi, Christoph Lameter + * (C) 2011 Linux Foundation, Christoph Lameter * - * Compile by: + * Compile with: * * gcc -o slabinfo slabinfo.c */ @@ -39,6 +40,8 @@ struct slabinfo { unsigned long cpuslab_flush, deactivate_full, deactivate_empty; unsigned long deactivate_to_head, deactivate_to_tail; unsigned long deactivate_remote_frees, order_fallback; + unsigned long cmpxchg_double_cpu_fail, cmpxchg_double_fail; + unsigned long alloc_node_mismatch, deactivate_bypass; int numa[MAX_NODES]; int numa_partial[MAX_NODES]; } slabinfo[MAX_SLABS]; @@ -99,7 +102,7 @@ static void fatal(const char *x, ...) static void usage(void) { - printf("slabinfo 5/7/2007. (c) 2007 sgi.\n\n" + printf("slabinfo 4/15/2011. 
(c) 2007 sgi/(c) 2011 Linux Foundation.\n\n" "slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n" "-a|--aliases Show aliases\n" "-A|--activity Most active slabs first\n" @@ -293,7 +296,7 @@ int line = 0; static void first_line(void) { if (show_activity) - printf("Name Objects Alloc Free %%Fast Fallb O\n"); + printf("Name Objects Alloc Free %%Fast Fallb O CmpX UL\n"); else printf("Name Objects Objsize Space " "Slabs/Part/Cpu O/S O %%Fr %%Ef Flg\n"); @@ -379,14 +382,14 @@ static void show_tracking(struct slabinf printf("\n%s: Kernel object allocation\n", s->name); printf("-----------------------------------------------------------------------\n"); if (read_slab_obj(s, "alloc_calls")) - printf(buffer); + printf("%s", buffer); else printf("No Data\n"); printf("\n%s: Kernel object freeing\n", s->name); printf("------------------------------------------------------------------------\n"); if (read_slab_obj(s, "free_calls")) - printf(buffer); + printf("%s", buffer); else printf("No Data\n"); @@ -400,7 +403,7 @@ static void ops(struct slabinfo *s) if (read_slab_obj(s, "ops")) { printf("\n%s: kmem_cache operations\n", s->name); printf("--------------------------------------------\n"); - printf(buffer); + printf("%s", buffer); } else printf("\n%s has no kmem_cache operations\n", s->name); } @@ -462,19 +465,32 @@ static void slab_stats(struct slabinfo * if (s->cpuslab_flush) printf("Flushes %8lu\n", s->cpuslab_flush); - if (s->alloc_refill) - printf("Refill %8lu\n", s->alloc_refill); - total = s->deactivate_full + s->deactivate_empty + - s->deactivate_to_head + s->deactivate_to_tail; + s->deactivate_to_head + s->deactivate_to_tail + s->deactivate_bypass; - if (total) - printf("Deactivate Full=%lu(%lu%%) Empty=%lu(%lu%%) " - "ToHead=%lu(%lu%%) ToTail=%lu(%lu%%)\n", - s->deactivate_full, (s->deactivate_full * 100) / total, - s->deactivate_empty, (s->deactivate_empty * 100) / total, - s->deactivate_to_head, (s->deactivate_to_head * 100) / total, + if (total) { + printf("\nSlab 
Deactivation Ocurrences %%\n"); + printf("-------------------------------------------------\n"); + printf("Slab full %7lu %3lu%%\n", + s->deactivate_full, (s->deactivate_full * 100) / total); + printf("Slab empty %7lu %3lu%%\n", + s->deactivate_empty, (s->deactivate_empty * 100) / total); + printf("Moved to head of partial list %7lu %3lu%%\n", + s->deactivate_to_head, (s->deactivate_to_head * 100) / total); + printf("Moved to tail of partial list %7lu %3lu%%\n", s->deactivate_to_tail, (s->deactivate_to_tail * 100) / total); + printf("Deactivation bypass %7lu %3lu%%\n", + s->deactivate_bypass, (s->deactivate_bypass * 100) / total); + printf("Refilled from foreign frees %7lu %3lu%%\n", + s->alloc_refill, (s->alloc_refill * 100) / total); + printf("Node mismatch %7lu %3lu%%\n", + s->alloc_node_mismatch, (s->alloc_node_mismatch * 100) / total); + } + + if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail) + printf("\nCmpxchg_double Looping\n------------------------\n"); + printf("Locked Cmpxchg Double redos %lu\nUnlocked Cmpxchg Double redos %lu\n", + s->cmpxchg_double_fail, s->cmpxchg_double_cpu_fail); } static void report(struct slabinfo *s) @@ -573,12 +589,13 @@ static void slabcache(struct slabinfo *s total_alloc = s->alloc_fastpath + s->alloc_slowpath; total_free = s->free_fastpath + s->free_slowpath; - printf("%-21s %8ld %10ld %10ld %3ld %3ld %5ld %1d\n", + printf("%-21s %8ld %10ld %10ld %3ld %3ld %5ld %1d %4ld %4ld\n", s->name, s->objects, total_alloc, total_free, total_alloc ? (s->alloc_fastpath * 100 / total_alloc) : 0, total_free ? 
(s->free_fastpath * 100 / total_free) : 0, - s->order_fallback, s->order); + s->order_fallback, s->order, s->cmpxchg_double_fail, + s->cmpxchg_double_cpu_fail); } else printf("%-21s %8ld %7d %8s %14s %4d %1d %3ld %3ld %s\n", @@ -1190,6 +1207,10 @@ static void read_slab_dir(void) slab->deactivate_to_tail = get_obj("deactivate_to_tail"); slab->deactivate_remote_frees = get_obj("deactivate_remote_frees"); slab->order_fallback = get_obj("order_fallback"); + slab->cmpxchg_double_cpu_fail = get_obj("cmpxchg_double_cpu_fail"); + slab->cmpxchg_double_fail = get_obj("cmpxchg_double_fail"); + slab->alloc_node_mismatch = get_obj("alloc_node_mismatch"); + slab->deactivate_bypass = get_obj("deactivate_bypass"); chdir(".."); if (slab->name[0] == ':') alias_targets++; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 752406B002C for ; Mon, 16 May 2011 16:26:34 -0400 (EDT) Message-Id: <20110516202632.259446673@linux.com> Date: Mon, 16 May 2011 15:26:24 -0500 From: Christoph Lameter Subject: [slubllv5 19/25] slub: Not necessary to check for empty slab on load_freelist References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=goto_load_freelist Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner load_freelist is now branched to only if there are objects available, so there is no need to check the object variable for NULL.
--- mm/slub.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:51:57.171458507 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:52:00.731458504 -0500 @@ -1983,9 +1983,6 @@ static void *__slab_alloc(struct kmem_ca NULL, new.counters, "__slab_alloc")); -load_freelist: - VM_BUG_ON(!page->frozen); - if (unlikely(!object)) { c->page = NULL; stat(s, DEACTIVATE_BYPASS); @@ -1994,6 +1991,8 @@ load_freelist: stat(s, ALLOC_REFILL); +load_freelist: + VM_BUG_ON(!page->frozen); c->freelist = get_freepointer(s, object); c->tid = next_tid(c->tid); local_irq_restore(flags); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id 58BA16B002A for ; Mon, 16 May 2011 16:26:34 -0400 (EDT) Message-Id: <20110516202631.680279544@linux.com> Date: Mon, 16 May 2011 15:26:23 -0500 From: Christoph Lameter Subject: [slubllv5 18/25] slub: fast release on full slab References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=slab_alloc_fast_release Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Make deactivation occur implicitly while checking out the current freelist. This avoids one cmpxchg operation on a slab that is now fully in use. 
Signed-off-by: Christoph Lameter --- include/linux/slub_def.h | 1 + mm/slub.c | 21 +++++++++++++++++++-- 2 files changed, 20 insertions(+), 2 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:51:49.901458516 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:51:57.171458507 -0500 @@ -1963,9 +1963,21 @@ static void *__slab_alloc(struct kmem_ca object = page->freelist; counters = page->counters; new.counters = counters; - new.inuse = page->objects; VM_BUG_ON(!new.frozen); + /* + * If there is no object left then we use this loop to + * deactivate the slab which is simple since no objects + * are left in the slab and therefore we do not need to + * put the page back onto the partial list. + * + * If there are objects left then we retrieve them + * and use them to refill the per cpu queue. + */ + + new.inuse = page->objects; + new.frozen = object != NULL; + } while (!cmpxchg_double_slab(s, page, object, counters, NULL, new.counters, @@ -1974,8 +1986,11 @@ static void *__slab_alloc(struct kmem_ca load_freelist: VM_BUG_ON(!page->frozen); - if (unlikely(!object)) + if (unlikely(!object)) { + c->page = NULL; + stat(s, DEACTIVATE_BYPASS); goto new_slab; + } stat(s, ALLOC_REFILL); @@ -4659,6 +4674,7 @@ STAT_ATTR(DEACTIVATE_EMPTY, deactivate_e STAT_ATTR(DEACTIVATE_TO_HEAD, deactivate_to_head); STAT_ATTR(DEACTIVATE_TO_TAIL, deactivate_to_tail); STAT_ATTR(DEACTIVATE_REMOTE_FREES, deactivate_remote_frees); +STAT_ATTR(DEACTIVATE_BYPASS, deactivate_bypass); STAT_ATTR(ORDER_FALLBACK, order_fallback); STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail); STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail); @@ -4719,6 +4735,7 @@ static struct attribute *slab_attrs[] = &deactivate_to_head_attr.attr, &deactivate_to_tail_attr.attr, &deactivate_remote_frees_attr.attr, + &deactivate_bypass_attr.attr, &order_fallback_attr.attr, &cmpxchg_double_fail_attr.attr, 
&cmpxchg_double_cpu_fail_attr.attr, Index: linux-2.6/include/linux/slub_def.h =================================================================== --- linux-2.6.orig/include/linux/slub_def.h 2011-05-16 12:51:49.901458516 -0500 +++ linux-2.6/include/linux/slub_def.h 2011-05-16 12:51:57.171458507 -0500 @@ -32,6 +32,7 @@ enum stat_item { DEACTIVATE_TO_HEAD, /* Cpu slab was moved to the head of partials */ DEACTIVATE_TO_TAIL, /* Cpu slab was moved to the tail of partials */ DEACTIVATE_REMOTE_FREES,/* Slab contained remotely freed objects */ + DEACTIVATE_BYPASS, /* Implicit deactivation */ ORDER_FALLBACK, /* Number of times fallback was necessary */ CMPXCHG_DOUBLE_CPU_FAIL,/* Failure of this_cpu_cmpxchg_double */ CMPXCHG_DOUBLE_FAIL, /* Number of times that cmpxchg double did not match */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id 0C562900113 for ; Mon, 16 May 2011 16:26:37 -0400 (EDT) Message-Id: <20110516202635.172662310@linux.com> Date: Mon, 16 May 2011 15:26:29 -0500 From: Christoph Lameter Subject: [slubllv5 24/25] slub: Remove gotos from __slab_free() References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=degotofy_slab_free Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. 
Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Signed-off-by: Christoph Lameter --- mm/slub.c | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 14:27:50.551451801 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 14:31:53.401451518 -0500 @@ -2259,34 +2259,34 @@ static void __slab_free(struct kmem_cach if (was_frozen) stat(s, FREE_FROZEN); else { - if (unlikely(!inuse && n->nr_partial > s->min_partial)) - goto slab_empty; + if (unlikely(inuse || n->nr_partial <= s->min_partial)) { + /* + * Objects left in the slab. If it was not on the partial list before + * then add it. + */ + if (unlikely(!prior)) { + remove_full(s, page); + add_partial(n, page, 0); + stat(s, FREE_ADD_PARTIAL); + } + } else { + /* Empty slab */ + if (prior) { + /* + * Slab still on the partial list. + */ + remove_partial(n, page); + stat(s, FREE_REMOVE_PARTIAL); + } - /* - * Objects left in the slab. If it was not on the partial list before - * then add it. - */ - if (unlikely(!prior)) { - remove_full(s, page); - add_partial(n, page, 0); - stat(s, FREE_ADD_PARTIAL); + spin_unlock_irqrestore(&n->list_lock, flags); + stat(s, FREE_SLAB); + discard_slab(s, page); + return; } } spin_unlock_irqrestore(&n->list_lock, flags); return; - -slab_empty: - if (prior) { - /* - * Slab still on the partial list. - */ - remove_partial(n, page); - stat(s, FREE_REMOVE_PARTIAL); - } - - spin_unlock_irqrestore(&n->list_lock, flags); - stat(s, FREE_SLAB); - discard_slab(s, page); } /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
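The branch structure that replaces the slab_empty label can be summarized as a small decision function. This is a hedged sketch, not code from the patch: enum free_action and classify_free() are hypothetical names, and "prior" is reduced to a boolean meaning that the slab already had free objects before this free (and so was already on the partial list). The list manipulation, locking and statistics calls are omitted.

```c
#include <stdbool.h>

/* Hypothetical summary of what __slab_free() does with the slab page. */
enum free_action {
	FREE_NOTHING,			/* no list changes needed */
	FREE_MOVE_TO_PARTIAL,		/* remove_full() + add_partial() */
	FREE_DISCARD,			/* discard_slab() only */
	FREE_DISCARD_FROM_PARTIAL	/* remove_partial() + discard_slab() */
};

static enum free_action classify_free(bool was_frozen, unsigned int inuse,
				      bool prior, unsigned long nr_partial,
				      unsigned long min_partial)
{
	/* A frozen slab is exempt from list management. */
	if (was_frozen)
		return FREE_NOTHING;

	/*
	 * Objects are left in the slab, or the node is short on partial
	 * slabs: keep the slab, adding it to the partial list if it was
	 * not there before.
	 */
	if (inuse || nr_partial <= min_partial)
		return prior ? FREE_NOTHING : FREE_MOVE_TO_PARTIAL;

	/* Empty slab and enough partial slabs: release the page. */
	return prior ? FREE_DISCARD_FROM_PARTIAL : FREE_DISCARD;
}
```

The condition in the second test is the De Morgan inversion of the old "!inuse && n->nr_partial > s->min_partial" check, which is what lets the empty-slab handling move into the else branch.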
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id EB9346B002F for ; Mon, 16 May 2011 16:26:36 -0400 (EDT) Message-Id: <20110516202634.023102369@linux.com> Date: Mon, 16 May 2011 15:26:27 -0500 From: Christoph Lameter Subject: [slubllv5 22/25] slub: pass kmem_cache_cpu pointer to get_partial() References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=push_c_into_get_partial Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Pass the kmem_cache_cpu pointer to get_partial(). That way we can avoid the this_cpu_write() statements. Signed-off-by: Christoph Lameter --- mm/slub.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:52:41.421458455 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:52:45.161458452 -0500 @@ -1437,7 +1437,8 @@ static inline void remove_partial(struct * Must hold list_lock.
*/ static inline int acquire_slab(struct kmem_cache *s, - struct kmem_cache_node *n, struct page *page) + struct kmem_cache_node *n, struct page *page, + struct kmem_cache_cpu *c) { void *freelist; unsigned long counters; @@ -1466,9 +1467,9 @@ static inline int acquire_slab(struct km if (freelist) { /* Populate the per cpu freelist */ - this_cpu_write(s->cpu_slab->freelist, freelist); - this_cpu_write(s->cpu_slab->page, page); - this_cpu_write(s->cpu_slab->node, page_to_nid(page)); + c->freelist = freelist; + c->page = page; + c->node = page_to_nid(page); return 1; } else { /* @@ -1486,7 +1487,7 @@ static inline int acquire_slab(struct km * Try to allocate a partial slab from a specific node. */ static struct page *get_partial_node(struct kmem_cache *s, - struct kmem_cache_node *n) + struct kmem_cache_node *n, struct kmem_cache_cpu *c) { struct page *page; @@ -1501,7 +1502,7 @@ static struct page *get_partial_node(str spin_lock(&n->list_lock); list_for_each_entry(page, &n->partial, lru) - if (acquire_slab(s, n, page)) + if (acquire_slab(s, n, page, c)) goto out; page = NULL; out: @@ -1512,7 +1513,8 @@ out: /* * Get a page from somewhere. Search in increasing NUMA distances. */ -static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags) +static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags, + struct kmem_cache_cpu *c) { #ifdef CONFIG_NUMA struct zonelist *zonelist; @@ -1552,7 +1554,7 @@ static struct page *get_any_partial(stru if (n && cpuset_zone_allowed_hardwall(zone, flags) && n->nr_partial > s->min_partial) { - page = get_partial_node(s, n); + page = get_partial_node(s, n, c); if (page) { put_mems_allowed(); return page; @@ -1567,16 +1569,17 @@ static struct page *get_any_partial(stru /* * Get a partial page, lock it and return it. 
*/ -static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node) +static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node, + struct kmem_cache_cpu *c) { struct page *page; int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node; - page = get_partial_node(s, get_node(s, searchnode)); + page = get_partial_node(s, get_node(s, searchnode), c); if (page || node != NUMA_NO_NODE) return page; - return get_any_partial(s, flags); + return get_any_partial(s, flags, c); } #ifdef CONFIG_PREEMPT @@ -1645,9 +1648,6 @@ void init_kmem_cache_cpus(struct kmem_ca for_each_possible_cpu(cpu) per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu); } -/* - * Remove the cpu slab - */ /* * Remove the cpu slab @@ -1999,7 +1999,7 @@ load_freelist: return object; new_slab: - page = get_partial(s, gfpflags, node); + page = get_partial(s, gfpflags, node, c); if (page) { stat(s, ALLOC_FROM_PARTIAL); object = c->freelist; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id C1072900115 for ; Mon, 16 May 2011 16:26:35 -0400 (EDT) Message-Id: <20110516202633.437021548@linux.com> Date: Mon, 16 May 2011 15:26:26 -0500 From: Christoph Lameter Subject: [slubllv5 21/25] slub: Prepare inuse field in new_slab() References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=new_slab Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner inuse will always be set to page->objects. 
There is no point in initializing the field to zero in new_slab() and then overwriting the value in __slab_alloc(). Signed-off-by: Christoph Lameter --- mm/slub.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 12:52:00.000000000 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 12:52:41.421458455 -0500 @@ -1332,7 +1332,7 @@ static struct page *new_slab(struct kmem set_freepointer(s, last, NULL); page->freelist = start; - page->inuse = 0; + page->inuse = page->objects; page->frozen = 1; out: return page; @@ -2022,7 +2022,6 @@ new_slab: */ object = page->freelist; page->freelist = NULL; - page->inuse = page->objects; stat(s, ALLOC_SLAB); c->node = page_to_nid(page); @@ -2563,7 +2562,7 @@ static void early_kmem_cache_node_alloc( n = page->freelist; BUG_ON(!n); page->freelist = get_freepointer(kmem_cache_node, n); - page->inuse++; + page->inuse = 1; page->frozen = 0; kmem_cache_node->node[node] = n; #ifdef CONFIG_SLUB_DEBUG -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 7CEF96B0032 for ; Mon, 16 May 2011 16:26:37 -0400 (EDT) Message-Id: <20110516202634.597471664@linux.com> Date: Mon, 16 May 2011 15:26:28 -0500 From: Christoph Lameter Subject: [slubllv5 23/25] slub: return object pointer from get_partial() / new_slab(). References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=object_instead_of_page_return Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. 
Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner There is no need anymore to return the pointer to a slab page from get_partial() since it can be assigned to the kmem_cache_cpu structures "page" field. Instead return an object pointer. That in turn allows a simplification of the spaghetti code in __slab_alloc(). Signed-off-by: Christoph Lameter --- mm/slub.c | 130 ++++++++++++++++++++++++++++++++++---------------------------- 1 file changed, 73 insertions(+), 57 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 14:11:37.531452935 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 14:24:19.781452046 -0500 @@ -1434,9 +1434,11 @@ static inline void remove_partial(struct * Lock slab, remove from the partial list and put the object into the * per cpu freelist. * + * Returns a list of objects or NULL if it fails. + * * Must hold list_lock. */ -static inline int acquire_slab(struct kmem_cache *s, +static inline void *acquire_slab(struct kmem_cache *s, struct kmem_cache_node *n, struct page *page, struct kmem_cache_cpu *c) { @@ -1467,10 +1469,11 @@ static inline int acquire_slab(struct km if (freelist) { /* Populate the per cpu freelist */ - c->freelist = freelist; c->page = page; c->node = page_to_nid(page); - return 1; + stat(s, ALLOC_FROM_PARTIAL); + + return freelist; } else { /* * Slab page came from the wrong list. No object to allocate @@ -1479,17 +1482,18 @@ static inline int acquire_slab(struct km */ printk(KERN_ERR "SLUB: %s : Page without available objects on" " partial list\n", s->name); - return 0; + return NULL; } } /* * Try to allocate a partial slab from a specific node. */ -static struct page *get_partial_node(struct kmem_cache *s, +static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, struct kmem_cache_cpu *c) { struct page *page; + void *object; /* * Racy check. 
If we mistakenly see no partial slabs then we @@ -1501,13 +1505,15 @@ static struct page *get_partial_node(str return NULL; spin_lock(&n->list_lock); - list_for_each_entry(page, &n->partial, lru) - if (acquire_slab(s, n, page, c)) + list_for_each_entry(page, &n->partial, lru) { + object = acquire_slab(s, n, page, c); + if (object) goto out; - page = NULL; + } + object = NULL; out: spin_unlock(&n->list_lock); - return page; + return object; } /* @@ -1521,7 +1527,7 @@ static struct page *get_any_partial(stru struct zoneref *z; struct zone *zone; enum zone_type high_zoneidx = gfp_zone(flags); - struct page *page; + void *object; /* * The defrag ratio allows a configuration of the tradeoffs between @@ -1554,10 +1560,10 @@ static struct page *get_any_partial(stru if (n && cpuset_zone_allowed_hardwall(zone, flags) && n->nr_partial > s->min_partial) { - page = get_partial_node(s, n, c); - if (page) { + object = get_partial_node(s, n, c); + if (object) { put_mems_allowed(); - return page; + return object; } } } @@ -1569,15 +1575,15 @@ static struct page *get_any_partial(stru /* * Get a partial page, lock it and return it. */ -static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node, +static void *get_partial(struct kmem_cache *s, gfp_t flags, int node, struct kmem_cache_cpu *c) { - struct page *page; + void *object; int searchnode = (node == NUMA_NO_NODE) ? 
numa_node_id() : node; - page = get_partial_node(s, get_node(s, searchnode), c); - if (page || node != NUMA_NO_NODE) - return page; + object = get_partial_node(s, get_node(s, searchnode), c); + if (object || node != NUMA_NO_NODE) + return object; return get_any_partial(s, flags, c); } @@ -1907,6 +1913,35 @@ slab_out_of_memory(struct kmem_cache *s, } } +static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags, + int node, struct kmem_cache_cpu **pc) +{ + void *object; + struct kmem_cache_cpu *c; + struct page *page = new_slab(s, flags, node); + + if (page) { + c = __this_cpu_ptr(s->cpu_slab); + if (c->page) + flush_slab(s, c); + + /* + * No other reference to the page yet so we can + * muck around with it freely without cmpxchg + */ + object = page->freelist; + page->freelist = NULL; + + stat(s, ALLOC_SLAB); + c->node = page_to_nid(page); + c->page = page; + *pc = c; + } else + object = NULL; + + return object; +} + /* * Slow path. The lockless freelist is empty or we need to perform * debugging duties. @@ -1929,7 +1964,6 @@ static void *__slab_alloc(struct kmem_ca unsigned long addr, struct kmem_cache_cpu *c) { void **object; - struct page *page; unsigned long flags; struct page new; unsigned long counters; @@ -1947,8 +1981,7 @@ static void *__slab_alloc(struct kmem_ca /* We handle __GFP_ZERO in the caller */ gfpflags &= ~__GFP_ZERO; - page = c->page; - if (!page) + if (!c->page) goto new_slab; if (unlikely(!node_match(c, node))) { @@ -1960,8 +1993,8 @@ static void *__slab_alloc(struct kmem_ca stat(s, ALLOC_SLOWPATH); do { - object = page->freelist; - counters = page->counters; + object = c->page->freelist; + counters = c->page->counters; new.counters = counters; VM_BUG_ON(!new.frozen); @@ -1973,12 +2006,12 @@ static void *__slab_alloc(struct kmem_ca * * If there are objects left then we retrieve them * and use them to refill the per cpu queue. 
- */ + */ - new.inuse = page->objects; + new.inuse = c->page->objects; new.frozen = object != NULL; - } while (!cmpxchg_double_slab(s, page, + } while (!cmpxchg_double_slab(s, c->page, object, counters, NULL, new.counters, "__slab_alloc")); @@ -1992,50 +2025,33 @@ static void *__slab_alloc(struct kmem_ca stat(s, ALLOC_REFILL); load_freelist: - VM_BUG_ON(!page->frozen); c->freelist = get_freepointer(s, object); c->tid = next_tid(c->tid); local_irq_restore(flags); return object; new_slab: - page = get_partial(s, gfpflags, node, c); - if (page) { - stat(s, ALLOC_FROM_PARTIAL); - object = c->freelist; + object = get_partial(s, gfpflags, node, c); - if (kmem_cache_debug(s)) - goto debug; - goto load_freelist; - } + if (unlikely(!object)) { - page = new_slab(s, gfpflags, node); + object = new_slab_objects(s, gfpflags, node, &c); - if (page) { - c = __this_cpu_ptr(s->cpu_slab); - if (c->page) - flush_slab(s, c); + if (unlikely(!object)) { + if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit()) + slab_out_of_memory(s, gfpflags, node); - /* - * No other reference to the page yet so we can - * muck around with it freely without cmpxchg - */ - object = page->freelist; - page->freelist = NULL; + local_irq_restore(flags); + return NULL; + } + } - stat(s, ALLOC_SLAB); - c->node = page_to_nid(page); - c->page = page; + if (likely(!kmem_cache_debug(s))) goto load_freelist; - } - if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit()) - slab_out_of_memory(s, gfpflags, node); - local_irq_restore(flags); - return NULL; -debug: - if (!object || !alloc_debug_processing(s, page, object, addr)) - goto new_slab; + /* Only entered in the debug case */ + if (!alloc_debug_processing(s, c->page, object, addr)) + goto new_slab; /* Slab failed checks. Next slab needed */ c->freelist = get_freepointer(s, object); deactivate_slab(s, c); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail6.bemta7.messagelabs.com (mail6.bemta7.messagelabs.com [216.82.255.55]) by kanga.kvack.org (Postfix) with ESMTP id AD7A390010C for ; Mon, 16 May 2011 16:26:38 -0400 (EDT) Message-Id: <20110516202635.739312612@linux.com> Date: Mon, 16 May 2011 15:26:30 -0500 From: Christoph Lameter Subject: [slubllv5 25/25] slub: Remove gotos from __slab_alloc() References: <20110516202605.274023469@linux.com> Content-Disposition: inline; filename=degotofy_slab_alloc Sender: owner-linux-mm@kvack.org List-ID: To: Pekka Enberg Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner Signed-off-by: Christoph Lameter --- mm/slub.c | 155 ++++++++++++++++++++++++++++++++++---------------------------- 1 file changed, 87 insertions(+), 68 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2011-05-16 15:00:46.511449498 -0500 +++ linux-2.6/mm/slub.c 2011-05-16 15:07:01.241449060 -0500 @@ -1942,6 +1942,64 @@ static inline void *new_slab_objects(str return object; } +/* Check if the current slab page is matching NUMA requirements. If not deactivate slab */ +static inline int node_is_matching(struct kmem_cache *s, struct kmem_cache_cpu *c, int node) +{ + if (!c->page) + return 0; + + if (!node_match(c, node)) { + stat(s, ALLOC_NODE_MISMATCH); + deactivate_slab(s, c); + return 0; + } else + return 1; +} + +/* + * Retrieve the page freelist locklessly. + * + * Return NULL and deactivate the current slab if no objects are available. 
+ */ +static inline void *get_freelist(struct kmem_cache *s, struct kmem_cache_cpu *c) +{ + struct page new; + unsigned long counters; + void *object; + + do { + object = c->page->freelist; + counters = c->page->counters; + new.counters = counters; + VM_BUG_ON(!new.frozen); + + /* + * If there is no object left then we use this loop to + * deactivate the slab which is simple since no objects + * are left in the slab and therefore we do not need to + * put the page back onto the partial list. + * + * If there are objects left then we retrieve them + * and use them to refill the per cpu queue. + */ + + new.inuse = c->page->objects; + new.frozen = object != NULL; + + } while (!cmpxchg_double_slab(s, c->page, + object, counters, + NULL, new.counters, + "__slab_alloc")); + + if (unlikely(!object)) { + c->page = NULL; + stat(s, DEACTIVATE_BYPASS); + } else + stat(s, ALLOC_REFILL); + + return object; +} + /* * Slow path. The lockless freelist is empty or we need to perform * debugging duties. @@ -1963,10 +2021,8 @@ static inline void *new_slab_objects(str static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, unsigned long addr, struct kmem_cache_cpu *c) { - void **object; + void *object; unsigned long flags; - struct page new; - unsigned long counters; local_irq_save(flags); #ifdef CONFIG_PREEMPT @@ -1981,81 +2037,44 @@ static void *__slab_alloc(struct kmem_ca /* We handle __GFP_ZERO in the caller */ gfpflags &= ~__GFP_ZERO; - if (!c->page) - goto new_slab; - - if (unlikely(!node_match(c, node))) { - stat(s, ALLOC_NODE_MISMATCH); - deactivate_slab(s, c); - goto new_slab; - } - - stat(s, ALLOC_SLOWPATH); - - do { - object = c->page->freelist; - counters = c->page->counters; - new.counters = counters; - VM_BUG_ON(!new.frozen); - - /* - * If there is no object left then we use this loop to - * deactivate the slab which is simple since no objects - * are left in the slab and therefore we do not need to - * put the page back onto the partial list. 
- * - * If there are objects left then we retrieve them - * and use them to refill the per cpu queue. - */ - - new.inuse = c->page->objects; - new.frozen = object != NULL; - - } while (!cmpxchg_double_slab(s, c->page, - object, counters, - NULL, new.counters, - "__slab_alloc")); - - if (unlikely(!object)) { - c->page = NULL; - stat(s, DEACTIVATE_BYPASS); - goto new_slab; - } + if (node_is_matching(s, c, node) && (object = get_freelist(s, c))) { - stat(s, ALLOC_REFILL); + c->freelist = get_freepointer(s, object); + c->tid = next_tid(c->tid); -load_freelist: - c->freelist = get_freepointer(s, object); - c->tid = next_tid(c->tid); - local_irq_restore(flags); - return object; + } else + while (1) { + object = get_partial(s, gfpflags, node, c); -new_slab: - object = get_partial(s, gfpflags, node, c); + if (unlikely(!object)) { - if (unlikely(!object)) { + object = new_slab_objects(s, gfpflags, node, &c); - object = new_slab_objects(s, gfpflags, node, &c); + if (unlikely(!object)) { + if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit()) + slab_out_of_memory(s, gfpflags, node); + break; + } + } - if (unlikely(!object)) { - if (!(gfpflags & __GFP_NOWARN) && printk_ratelimit()) - slab_out_of_memory(s, gfpflags, node); + if (likely(!kmem_cache_debug(s))) { - local_irq_restore(flags); - return NULL; + c->freelist = get_freepointer(s, object); + c->tid = next_tid(c->tid); + break; + + } else { + /* Only entered in the debug case */ + if (alloc_debug_processing(s, c->page, object, addr)) { + + c->freelist = get_freepointer(s, object); + deactivate_slab(s, c); + c->node = NUMA_NO_NODE; + break; + } } } - if (likely(!kmem_cache_debug(s))) - goto load_freelist; - - /* Only entered in the debug case */ - if (!alloc_debug_processing(s, c->page, object, addr)) - goto new_slab; /* Slab failed checks. 
Next slab needed */
-
-	c->freelist = get_freepointer(s, object);
-	deactivate_slab(s, c);
-	c->node = NUMA_NO_NODE;
 	local_irq_restore(flags);
 	return object;
 }

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with ESMTP id 330D390010B for ; Tue, 17 May 2011 00:53:03 -0400 (EDT)
Received: by wwi36 with SMTP id 36so104968wwi.26 for ; Mon, 16 May 2011 21:52:59 -0700 (PDT)
Subject: Re: [slubllv5 03/25] slub: Make CONFIG_PAGE_ALLOC work with new fastpath
From: Eric Dumazet
In-Reply-To: <20110516202622.862544137@linux.com>
References: <20110516202605.274023469@linux.com> <20110516202622.862544137@linux.com>
Content-Type: text/plain; charset="UTF-8"
Date: Tue, 17 May 2011 06:52:54 +0200
Message-ID: <1305607974.9466.42.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: Pekka Enberg, David Rientjes, "H. Peter Anvin", linux-mm@kvack.org, Thomas Gleixner

On Monday, 16 May 2011 at 15:26 -0500, Christoph Lameter wrote:
> plain text document attachment (fixup)
> Fastpath can do a speculative access to a page that CONFIG_DEBUG_PAGEALLOC may have
> marked as invalid to retrieve the pointer to the next free object.
>
> Use probe_kernel_read in that case in order not to cause a page fault.
>

Some credits would be good, it would certainly help both of us.
Reported-by: Eric Dumazet > Signed-off-by: Christoph Lameter Signed-off-by: Eric Dumazet > --- > mm/slub.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > Thanks -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with SMTP id E250B6B0012 for ; Tue, 17 May 2011 09:46:16 -0400 (EDT) Date: Tue, 17 May 2011 08:46:13 -0500 (CDT) From: Christoph Lameter Subject: Re: [slubllv5 03/25] slub: Make CONFIG_PAGE_ALLOC work with new fastpath In-Reply-To: <1305607974.9466.42.camel@edumazet-laptop> Message-ID: References: <20110516202605.274023469@linux.com> <20110516202622.862544137@linux.com> <1305607974.9466.42.camel@edumazet-laptop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: Pekka Enberg , David Rientjes , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner On Tue, 17 May 2011, Eric Dumazet wrote: > Some credits would be good, it would certainly help both of us. True. Sorry I just posted my queue without integrating tags. > Reported-by: Eric Dumazet > > > Signed-off-by: Christoph Lameter > > Signed-off-by: Eric Dumazet -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id EBB5B6B0025 for ; Tue, 17 May 2011 15:22:20 -0400 (EDT) Date: Tue, 17 May 2011 22:22:15 +0300 (EEST) From: Pekka Enberg Subject: Re: [slubllv5 03/25] slub: Make CONFIG_PAGE_ALLOC work with new fastpath In-Reply-To: Message-ID: References: <20110516202605.274023469@linux.com> <20110516202622.862544137@linux.com> <1305607974.9466.42.camel@edumazet-laptop> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: Eric Dumazet , Pekka Enberg , David Rientjes , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner On Tue, 17 May 2011, Christoph Lameter wrote: > On Tue, 17 May 2011, Eric Dumazet wrote: > >> Some credits would be good, it would certainly help both of us. > > True. Sorry I just posted my queue without integrating tags. > >> Reported-by: Eric Dumazet >> >>> Signed-off-by: Christoph Lameter >> >> Signed-off-by: Eric Dumazet Applied, with fixed changelog. Thanks! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with SMTP id C2AAC6B0011 for ; Thu, 26 May 2011 13:57:28 -0400 (EDT) Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double From: Pekka Enberg In-Reply-To: <20110516202625.197639928@linux.com> References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> Content-Type: text/plain; charset="ISO-8859-1" Date: Thu, 26 May 2011 20:57:25 +0300 Message-ID: <1306432645.16757.137.camel@jaguar> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: David Rientjes , Eric Dumazet , "H. Peter Anvin" , linux-mm@kvack.org, Thomas Gleixner , tj@kernel.org On Mon, 2011-05-16 at 15:26 -0500, Christoph Lameter wrote: > plain text document attachment (cmpxchg_double_x86) > A simple implementation that only supports the word size and does not > have a fallback mode (would require a spinlock). > > And 32 and 64 bit support for cmpxchg_double. cmpxchg double uses > the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare > and swap 2 machine words. This allows lockless algorithms to move more > context information through critical sections. > > Set a flag CONFIG_CMPXCHG_DOUBLE to signal the support of that feature > during kernel builds. > > Signed-off-by: Christoph Lameter You forgot to CC Tejun for this patch. 
> ---
>  arch/x86/Kconfig.cpu              |    3 ++
>  arch/x86/include/asm/cmpxchg_32.h |   46 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cpufeature.h |    1
>  4 files changed, 95 insertions(+)
>
> Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h 2011-05-16 11:40:36.421463498 -0500
> +++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h 2011-05-16 11:46:34.781463079 -0500
> @@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
>  	cmpxchg_local((ptr), (o), (n)); \
>  })
>
> +#define cmpxchg16b(ptr, o1, o2, n1, n2) \
> +({ \
> +	char __ret; \
> +	__typeof__(o2) __junk; \
> +	__typeof__(*(ptr)) __old1 = (o1); \
> +	__typeof__(o2) __old2 = (o2); \
> +	__typeof__(*(ptr)) __new1 = (n1); \
> +	__typeof__(o2) __new2 = (n2); \
> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
> +		     : "=d"(__junk), "=a"(__ret) \
> +		     : "S"(ptr), "b"(__new1), "c"(__new2), \
> +		       "a"(__old1), "d"(__old2)); \
> +	__ret; })
> +
> +
> +#define cmpxchg16b_local(ptr, o1, o2, n1, n2) \
> +({ \
> +	char __ret; \
> +	__typeof__(o2) __junk; \
> +	__typeof__(*(ptr)) __old1 = (o1); \
> +	__typeof__(o2) __old2 = (o2); \
> +	__typeof__(*(ptr)) __new1 = (n1); \
> +	__typeof__(o2) __new2 = (n2); \
> +	asm volatile("cmpxchg16b (%%rsi)\n\t\tsetz %1\n\t" \
> +		     : "=d"(__junk), "=a"(__ret) \
> +		     : "S"((ptr)), "b"(__new1), "c"(__new2), \
> +		       "a"(__old1), "d"(__old2)); \
> +	__ret; })
> +
> +#define cmpxchg_double(ptr, o1, o2, n1, n2) \
> +({ \
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
> +	VM_BUG_ON((unsigned long)(ptr) % 16); \
> +	cmpxchg16b((ptr), (o1), (o2), (n1), (n2)); \
> +})
> +
> +#define cmpxchg_double_local(ptr, o1, o2, n1, n2) \
> +({ \
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
> +	VM_BUG_ON((unsigned long)(ptr) % 16); \
> +	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2)); \
> +})
> +
> +#define system_has_cmpxchg_double() cpu_has_cx16
> +
>  #endif /* _ASM_X86_CMPXCHG_64_H */
> Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h 2011-05-16 11:40:36.431463498 -0500
> +++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h 2011-05-16 11:46:34.781463079 -0500
> @@ -280,4 +280,50 @@ static inline unsigned long cmpxchg_386(
>
>  #endif
>
> +#define cmpxchg8b(ptr, o1, o2, n1, n2) \
> +({ \
> +	char __ret; \
> +	__typeof__(o2) __dummy; \
> +	__typeof__(*(ptr)) __old1 = (o1); \
> +	__typeof__(o2) __old2 = (o2); \
> +	__typeof__(*(ptr)) __new1 = (n1); \
> +	__typeof__(o2) __new2 = (n2); \
> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg8b (%%esi); setz %1" \
> +		     : "=d"(__dummy), "=a" (__ret) \
> +		     : "S" ((ptr)), "a" (__old1), "d"(__old2), \
> +		       "b" (__new1), "c" (__new2) \
> +		     : "memory"); \
> +	__ret; })
> +
> +
> +#define cmpxchg8b_local(ptr, o1, o2, n1, n2) \
> +({ \
> +	char __ret; \
> +	__typeof__(o2) __dummy; \
> +	__typeof__(*(ptr)) __old1 = (o1); \
> +	__typeof__(o2) __old2 = (o2); \
> +	__typeof__(*(ptr)) __new1 = (n1); \
> +	__typeof__(o2) __new2 = (n2); \
> +	asm volatile("cmpxchg8b (%%esi); setz %1" \
> +		     : "=d"(__dummy), "=a"(__ret) \
> +		     : "S" ((ptr)), "a" (__old1), "d"(__old2), \
> +		       "b" (__new1), "c" (__new2) \
> +		     : "memory"); \
> +	__ret; })
> +
> +
> +#define cmpxchg_double(ptr, o1, o2, n1, n2) \
> +({ \
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
> +	VM_BUG_ON((unsigned long)(ptr) % 8); \
> +	cmpxchg8b((ptr), (o1), (o2), (n1), (n2)); \
> +})
> +
> +#define cmpxchg_double_local(ptr, o1, o2, n1, n2) \
> +({ \
> +	BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
> +	VM_BUG_ON((unsigned long)(ptr) % 8); \
> +	cmpxchg8b_local((ptr), (o1), (o2), (n1), (n2)); \
> +})
> +
>  #endif /* _ASM_X86_CMPXCHG_32_H */
> Index: linux-2.6/arch/x86/Kconfig.cpu
> ===================================================================
> --- linux-2.6.orig/arch/x86/Kconfig.cpu 2011-05-16 11:40:36.401463498 -0500
> +++ linux-2.6/arch/x86/Kconfig.cpu 2011-05-16 11:46:34.781463079 -0500
> @@ -308,6 +308,9 @@ config X86_CMPXCHG
>  config CMPXCHG_LOCAL
>  	def_bool X86_64 || (X86_32 && !M386)
>
> +config CMPXCHG_DOUBLE
> +	def_bool X86_64 || (X86_32 && !M386)
> +
>  config X86_L1_CACHE_SHIFT
>  	int
>  	default "7" if MPENTIUM4 || MPSC
> Index: linux-2.6/arch/x86/include/asm/cpufeature.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/cpufeature.h 2011-05-16 11:40:36.411463498 -0500
> +++ linux-2.6/arch/x86/include/asm/cpufeature.h 2011-05-16 11:46:34.801463079 -0500
> @@ -286,6 +286,7 @@ extern const char * const x86_power_flag
>  #define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
>  #define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ)
>  #define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
> +#define cpu_has_cx16 boot_cpu_has(X86_FEATURE_CX16)
>
>  #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
>  # define cpu_has_invlpg 1
>
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta12.messagelabs.com (mail6.bemta12.messagelabs.com [216.82.250.247]) by kanga.kvack.org (Postfix) with ESMTP id A35356B0011 for ; Thu, 26 May 2011 14:02:25 -0400 (EDT)
Date: Thu, 26 May 2011 13:02:13 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <1306432645.16757.137.camel@jaguar>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <1306432645.16757.137.camel@jaguar>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: Pekka Enberg
Cc: David Rientjes, Eric Dumazet, "H. Peter Anvin", linux-mm@kvack.org, Thomas Gleixner, tj@kernel.org

On Thu, 26 May 2011, Pekka Enberg wrote:

> On Mon, 2011-05-16 at 15:26 -0500, Christoph Lameter wrote:
> > plain text document attachment (cmpxchg_double_x86)
> > A simple implementation that only supports the word size and does not
> > have a fallback mode (would require a spinlock).
> >
> > And 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
> > the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
> > and swap 2 machine words. This allows lockless algorithms to move more
> > context information through critical sections.
> >
> > Set a flag CONFIG_CMPXCHG_DOUBLE to signal the support of that feature
> > during kernel builds.
> >
> > Signed-off-by: Christoph Lameter
>
> You forgot to CC Tejun for this patch.

Ok. I can do that but the patch is not in the same context of the per cpu
stuff that we worked on earlier. This is a fully locked version of
cmpxchg_double.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with ESMTP id 695026B0011 for ; Thu, 26 May 2011 14:12:11 -0400 (EDT)
Message-ID: <4DDE9670.3060709@zytor.com>
Date: Thu, 26 May 2011 11:05:36 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
In-Reply-To: <20110516202625.197639928@linux.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

On 05/16/2011 01:26 PM, Christoph Lameter wrote:
> A simple implementation that only supports the word size and does not
> have a fallback mode (would require a spinlock).
>
> And 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
> the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
> and swap 2 machine words. This allows lockless algorithms to move more
> context information through critical sections.
>
> Set a flag CONFIG_CMPXCHG_DOUBLE to signal the support of that feature
> during kernel builds.
>
> Signed-off-by: Christoph Lameter
>
>
> +config CMPXCHG_DOUBLE
> +	def_bool X86_64 || (X86_32 && !M386)
> +

CMPXCHG16B is not a baseline feature for the Linux x86-64 build, and
CMPXCHG8B is a Pentium, not a 486, feature.

Nacked-by: H. Peter Anvin

	-hpa
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id B0BB66B0011 for ; Thu, 26 May 2011 14:17:24 -0400 (EDT)
Date: Thu, 26 May 2011 13:17:11 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DDE9670.3060709@zytor.com>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

On Thu, 26 May 2011, H. Peter Anvin wrote:

> > +config CMPXCHG_DOUBLE
> > +	def_bool X86_64 || (X86_32 && !M386)
> > +
>
> CMPXCHG16B is not a baseline feature for the Linux x86-64 build, and
> CMPXCHG8B is a Pentium, not a 486, feature.
>
> Nacked-by: H. Peter Anvin

Hmmm... We may have to call it CONFIG_CMPXCHG_DOUBLE_POSSIBLE then?

Because the slub code tests the flag in the processor and will not use the
cmpxchg16b from the allocator if it's not there. It will then fall back to
using a bit lock in page struct.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 94F206B0011 for ; Thu, 26 May 2011 14:35:52 -0400 (EDT)
Message-ID: <4DDE9C01.2090104@zytor.com>
Date: Thu, 26 May 2011 11:29:21 -0700
From: "H.
Peter Anvin" MIME-Version: 1.0 Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Christoph Lameter Cc: Pekka Enberg , David Rientjes , Eric Dumazet , linux-mm@kvack.org, Thomas Gleixner On 05/26/2011 11:17 AM, Christoph Lameter wrote: > On Thu, 26 May 2011, H. Peter Anvin wrote: > >>> +config CMPXCHG_DOUBLE >>> + def_bool X86_64 || (X86_32 && !M386) >>> + >> >> CMPXCHG16B is not a baseline feature for the Linux x86-64 build, and >> CMPXCHG8G is a Pentium, not a 486, feature. >> >> Nacked-by: H. Peter Anvin > > Hmmm... We may have to call it CONFIG_CMPXCHG_DOUBLE_POSSIBLE then? > > Because the slub code tests the flag in the processor and will not use the > cmpxchg16b from the allocator if its not there. It will then fallback to > using a bit lock in page struct. > Well, if it is just about being "possible" then it should simply be true for all of x86. There is no reason to exclude i386 (which is all your above predicate does, it is exactly equivalent to !M386). -hpa -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id CBDC46B0011 for ; Thu, 26 May 2011 14:42:42 -0400 (EDT)
Date: Thu, 26 May 2011 13:42:39 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DDE9C01.2090104@zytor.com>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

On Thu, 26 May 2011, H. Peter Anvin wrote:

> On 05/26/2011 11:17 AM, Christoph Lameter wrote:
> > On Thu, 26 May 2011, H. Peter Anvin wrote:
> >
> >>> +config CMPXCHG_DOUBLE
> >>> +	def_bool X86_64 || (X86_32 && !M386)
> >>> +
> >>
> >> CMPXCHG16B is not a baseline feature for the Linux x86-64 build, and
> >> CMPXCHG8B is a Pentium, not a 486, feature.
> >>
> >> Nacked-by: H. Peter Anvin
> >
> > Hmmm... We may have to call it CONFIG_CMPXCHG_DOUBLE_POSSIBLE then?
> >
> > Because the slub code tests the flag in the processor and will not use the
> > cmpxchg16b from the allocator if it's not there. It will then fall back to
> > using a bit lock in page struct.
> >
>
> Well, if it is just about being "possible" then it should simply be true
> for all of x86. There is no reason to exclude i386 (which is all your
> above predicate does, it is exactly equivalent to !M386).

Ok. Possible means that the code for cmpxchg16b/8b will be compiled in. Then
how do I exclude the code if someone compiles a kernel for a processor
that certainly does not support these instructions?
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta7.messagelabs.com (mail6.bemta7.messagelabs.com [216.82.255.55]) by kanga.kvack.org (Postfix) with ESMTP id 35A6B6B0011 for ; Thu, 26 May 2011 17:16:33 -0400 (EDT)
Date: Thu, 26 May 2011 16:16:28 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DDE9C01.2090104@zytor.com>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

Here is a new patch that may address the concerns. The list of cpus that
support CMPXCHG_DOUBLE is not complete. Could someone help me complete it?



Subject: x86: Add support for cmpxchg_double

A simple implementation that only supports the word size and does not
have a fallback mode (would require a spinlock).

And 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
and swap 2 machine words. This allows lockless algorithms to move more
context information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal the support of that feature
during kernel builds.

Cc: tj@kernel.org
Signed-off-by: Christoph Lameter

---
 arch/x86/Kconfig.cpu              |   10 +++++++
 arch/x86/include/asm/cmpxchg_32.h |   48 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |    1
 4 files changed, 104 insertions(+)

Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-05-26 16:03:33.595608967 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-05-26 16:06:25.815607865 -0500
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n)); \
 })

+#define cmpxchg16b(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __junk; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
+		: "=d"(__junk), "=a"(__ret) \
+		: "S"(ptr), "b"(__new1), "c"(__new2), \
+		  "a"(__old1), "d"(__old2)); \
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __junk; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile("cmpxchg16b (%%rsi)\n\t\tsetz %1\n\t" \
+		: "=d"(__junk)_, "=a"(__ret) \
+		: "S"((ptr)), "b"(__new1), "c"(__new2), \
+		  "a"(__old1), "d"(__old2)); \
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+	VM_BUG_ON((unsigned long)(ptr) % 16); \
+	cmpxchg16b((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+	VM_BUG_ON((unsigned long)(ptr) % 16); \
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-05-26 16:03:33.615608967 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-05-26 16:07:27.895607465 -0500
@@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(

 #endif

+#define cmpxchg8b(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __dummy; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg8b (%%esi); setz %1"\
+		: "d="(__dummy), "=a" (__ret) \
+		: "S" ((ptr)), "a" (__old1), "d"(__old2), \
+		  "b" (__new1), "c" (__new2) \
+		: "memory"); \
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __dummy; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile("cmpxchg8b (%%esi); tsetz %1" \
+		: "d="(__dummy), "=a"(__ret) \
+		: "S" ((ptr)), "a" (__old), "d"(__old2), \
+		  "b" (__new1), "c" (__new2), \
+		: "memory"); \
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
+	VM_BUG_ON((unsigned long)(ptr) % 8); \
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
+	VM_BUG_ON((unsigned long)(ptr) % 8); \
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx8
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
Index: linux-2.6/arch/x86/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-05-26 16:03:33.625608967 -0500
+++ linux-2.6/arch/x86/Kconfig.cpu	2011-05-26 16:13:22.795605197 -0500
@@ -312,6 +312,16 @@ config X86_CMPXCHG
 config CMPXCHG_LOCAL
 	def_bool X86_64 || (X86_32 && !M386)

+#
+# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
+# for cmpxchg_double if it find processor flags that indicate that the
+# capabilities are available. CMPXCHG_DOUBLE only compiles in
+# detection support. It needs to be set if there is a chance that processor
+# supports these instructions.
+#
+config CMPXCHG_DOUBLE
+	def_bool GENERIC_CPU || X86_GENERIC || M486 || MPENTIUM4 || MATOM || MCORE2
+
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
Index: linux-2.6/arch/x86/include/asm/cpufeature.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cpufeature.h	2011-05-26 16:03:33.605608967 -0500
+++ linux-2.6/arch/x86/include/asm/cpufeature.h	2011-05-26 16:06:25.815607865 -0500
@@ -288,6 +288,7 @@ extern const char * const x86_power_flag
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)

 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg	1
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 45EE96B0011 for ; Thu, 26 May 2011 17:26:06 -0400 (EDT)
Received: by wwi36 with SMTP id 36so1041098wwi.26 for ; Thu, 26 May 2011 14:26:03 -0700 (PDT)
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
From: Eric Dumazet
In-Reply-To:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 26 May 2011 23:25:59 +0200
Message-ID: <1306445159.2543.25.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: "H. Peter Anvin", Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On Thursday 26 May 2011 at 16:16 -0500, Christoph Lameter wrote:
> Here is a new patch that may address the concerns. The list of cpus that
> support CMPXCHG_DOUBLE is not complete. Could someone help me complete it?
>
>
>
> Subject: x86: Add support for cmpxchg_double
>
> A simple implementation that only supports the word size and does not
> have a fallback mode (would require a spinlock).
>
> And 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
> the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
> and swap 2 machine words. This allows lockless algorithms to move more
> context information through critical sections.
>
> Set a flag CONFIG_CMPXCHG_DOUBLE to signal the support of that feature
> during kernel builds.
>
> Cc: tj@kernel.org
> Signed-off-by: Christoph Lameter
>
> ---
>  arch/x86/Kconfig.cpu              |   10 +++++++
>  arch/x86/include/asm/cmpxchg_32.h |   48 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/cpufeature.h |    1
>  4 files changed, 104 insertions(+)
>
> Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-05-26 16:03:33.595608967 -0500
> +++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-05-26 16:06:25.815607865 -0500
> @@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
>  	cmpxchg_local((ptr), (o), (n)); \
>  })
>
> +#define cmpxchg16b(ptr, o1, o2, n1, n2) \
> +({ \
> +	char __ret; \
> +	__typeof__(o2) __junk; \
> +	__typeof__(*(ptr)) __old1 = (o1); \
> +	__typeof__(o2) __old2 = (o2); \
> +	__typeof__(*(ptr)) __new1 = (n1); \
> +	__typeof__(o2) __new2 = (n2); \
> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \

If there is no emulation, why do you force rsi here ?

It could be something else, like "=m" (*ptr) ?

(same remark for other functions)

> +		: "=d"(__junk), "=a"(__ret) \
> +		: "S"(ptr), "b"(__new1), "c"(__new2), \
> +		  "a"(__old1), "d"(__old2)); \
> +	__ret; })
> +
> +

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with ESMTP id A41736B0011 for ; Thu, 26 May 2011 17:28:26 -0400 (EDT)
Message-ID: <4DDEC473.3050701@zytor.com>
Date: Thu, 26 May 2011 14:21:55 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

On 05/26/2011 02:16 PM, Christoph Lameter wrote:
> Here is a new patch that may address the concerns. The list of cpus that
> support CMPXCHG_DOUBLE is not complete. Could someone help me complete it?

> Index: linux-2.6/arch/x86/Kconfig.cpu
> ===================================================================
> --- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-05-26 16:03:33.625608967 -0500
> +++ linux-2.6/arch/x86/Kconfig.cpu	2011-05-26 16:13:22.795605197 -0500
> @@ -312,6 +312,16 @@ config X86_CMPXCHG
>  config CMPXCHG_LOCAL
>  	def_bool X86_64 || (X86_32 && !M386)
>
> +#
> +# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
> +# for cmpxchg_double if it find processor flags that indicate that the
> +# capabilities are available. CMPXCHG_DOUBLE only compiles in
> +# detection support. It needs to be set if there is a chance that processor
> +# supports these instructions.
> +#
> +config CMPXCHG_DOUBLE
> +	def_bool GENERIC_CPU || X86_GENERIC || M486 || MPENTIUM4 || MATOM || MCORE2
> +

How about:

	X86_64 || X86_GENERIC || !M386

	-hpa
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta7.messagelabs.com (mail6.bemta7.messagelabs.com [216.82.255.55]) by kanga.kvack.org (Postfix) with ESMTP id B372A6B0023 for ; Thu, 26 May 2011 17:38:05 -0400 (EDT)
Message-ID: <4DDEC6B4.4050509@zytor.com>
Date: Thu, 26 May 2011 14:31:32 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <1306445159.2543.25.camel@edumazet-laptop>
In-Reply-To: <1306445159.2543.25.camel@edumazet-laptop>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Eric Dumazet
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On 05/26/2011 02:25 PM, Eric Dumazet wrote:
>>
>> +#define cmpxchg16b(ptr, o1, o2, n1, n2) \
>> +({ \
>> +	char __ret; \
>> +	__typeof__(o2) __junk; \
>> +	__typeof__(*(ptr)) __old1 = (o1); \
>> +	__typeof__(o2) __old2 = (o2); \
>> +	__typeof__(*(ptr)) __new1 = (n1); \
>> +	__typeof__(o2) __new2 = (n2); \
>> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
>
> If there is no emulation, why do you force rsi here ?
>
> It could be something else, like "=m" (*ptr) ?
>
> (same remark for other functions)
>

"+m" (*ptr) please...

	-hpa
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id C355B6B0023 for ; Thu, 26 May 2011 17:45:09 -0400 (EDT)
Received: by wwi18 with SMTP id 18so4807238wwi.2 for ; Thu, 26 May 2011 14:45:07 -0700 (PDT)
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
From: Eric Dumazet
In-Reply-To: <4DDEC6B4.4050509@zytor.com>
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <1306445159.2543.25.camel@edumazet-laptop> <4DDEC6B4.4050509@zytor.com>
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 26 May 2011 23:45:03 +0200
Message-ID: <1306446303.2543.27.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On Thursday 26 May 2011 at 14:31 -0700, H. Peter Anvin wrote:
> On 05/26/2011 02:25 PM, Eric Dumazet wrote:
> >>
> >> +#define cmpxchg16b(ptr, o1, o2, n1, n2) \
> >> +({ \
> >> +	char __ret; \
> >> +	__typeof__(o2) __junk; \
> >> +	__typeof__(*(ptr)) __old1 = (o1); \
> >> +	__typeof__(o2) __old2 = (o2); \
> >> +	__typeof__(*(ptr)) __new1 = (n1); \
> >> +	__typeof__(o2) __new2 = (n2); \
> >> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
> >
> > If there is no emulation, why do you force rsi here ?
> >
> > It could be something else, like "=m" (*ptr) ?
> >
> > (same remark for other functions)
> >
>
> "+m" (*ptr) please...
>
> 	-hpa

Oh well, I guess I was fooled by :

(arch/x86/include/asm/cmpxchg_32.h)

static inline void set_64bit(volatile u64 *ptr, u64 value)
{
	u32 low = value;
	u32 high = value >> 32;
	u64 prev = *ptr;

	asm volatile("\n1:\t"
		     LOCK_PREFIX "cmpxchg8b %0\n\t"
		     "jnz 1b"
		     : "=m" (*ptr), "+A" (prev)
		     : "b" (low), "c" (high)
		     : "memory");
}

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id A2D936B0011 for ; Thu, 26 May 2011 20:56:20 -0400 (EDT)
Message-ID: <4DDEF524.9000109@zytor.com>
Date: Thu, 26 May 2011 17:49:40 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <1306445159.2543.25.camel@edumazet-laptop> <4DDEC6B4.4050509@zytor.com> <1306446303.2543.27.camel@edumazet-laptop>
In-Reply-To: <1306446303.2543.27.camel@edumazet-laptop>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Eric Dumazet
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On 05/26/2011 02:45 PM, Eric Dumazet wrote:
>>
>> "+m" (*ptr) please...
>>
>> 	-hpa
>
> Oh well, I guess I was fooled by :
>
> (arch/x86/include/asm/cmpxchg_32.h)
>
> static inline void set_64bit(volatile u64 *ptr, u64 value)
> {
> 	u32 low = value;
> 	u32 high = value >> 32;
> 	u64 prev = *ptr;
>
> 	asm volatile("\n1:\t"
> 		     LOCK_PREFIX "cmpxchg8b %0\n\t"
> 		     "jnz 1b"
> 		     : "=m" (*ptr), "+A" (prev)
> 		     : "b" (low), "c" (high)
> 		     : "memory");
> }
>

That's =m because the operation implemented by the asm() statement as a
whole is an assignment; the memory location after the entire asm()
statement has executed does not depend on the input value.

	-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with ESMTP id 020786B0011 for ; Thu, 26 May 2011 20:57:08 -0400 (EDT)
Message-ID: <4DDEF559.6040107@zytor.com>
Date: Thu, 26 May 2011 17:50:33 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

On 05/26/2011 02:16 PM, Christoph Lameter wrote:
> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \

Just spotted this: LOCK_PREFIX_HERE "lock; " is kind of redundant, no?

	-hpa
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 2473F6B0011 for ; Tue, 31 May 2011 11:10:38 -0400 (EDT)
Date: Tue, 31 May 2011 10:10:31 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DDEF559.6040107@zytor.com>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <4DDEF559.6040107@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Pekka Enberg, David Rientjes, Eric Dumazet, linux-mm@kvack.org, Thomas Gleixner

On Thu, 26 May 2011, H. Peter Anvin wrote:

> On 05/26/2011 02:16 PM, Christoph Lameter wrote:
> > +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
>
> Just spotted this: LOCK_PREFIX_HERE "lock; " is kind of redundant, no?

cmpxchg_386 does that too. Got it from there I guess.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 661EC6B0011 for ; Tue, 31 May 2011 11:13:17 -0400 (EDT)
Date: Tue, 31 May 2011 10:13:13 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <1306445159.2543.25.camel@edumazet-laptop>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <1306445159.2543.25.camel@edumazet-laptop>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: Eric Dumazet
Cc: "H. Peter Anvin", Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On Thu, 26 May 2011, Eric Dumazet wrote:

> > +#define cmpxchg16b(ptr, o1, o2, n1, n2) \
> > +({ \
> > +	char __ret; \
> > +	__typeof__(o2) __junk; \
> > +	__typeof__(*(ptr)) __old1 = (o1); \
> > +	__typeof__(o2) __old2 = (o2); \
> > +	__typeof__(*(ptr)) __new1 = (n1); \
> > +	__typeof__(o2) __new2 = (n2); \
> > +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
>
> If there is no emulation, why do you force rsi here ?
>
> It could be something else, like "=m" (*ptr) ?
>
> (same remark for other functions)

Well I ran into trouble with =m. Maybe +m will do. Will try again.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail6.bemta12.messagelabs.com (mail6.bemta12.messagelabs.com [216.82.250.247]) by kanga.kvack.org (Postfix) with ESMTP id C2ACB6B0011 for ; Tue, 31 May 2011 11:23:23 -0400 (EDT)
Message-ID: <4DE50632.90906@zytor.com>
Date: Tue, 31 May 2011 08:16:02 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <1306445159.2543.25.camel@edumazet-laptop>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On 05/31/2011 08:13 AM, Christoph Lameter wrote:
> On Thu, 26 May 2011, Eric Dumazet wrote:
>
>>> +#define cmpxchg16b(ptr, o1, o2, n1, n2) \
>>> +({ \
>>> +	char __ret; \
>>> +	__typeof__(o2) __junk; \
>>> +	__typeof__(*(ptr)) __old1 = (o1); \
>>> +	__typeof__(o2) __old2 = (o2); \
>>> +	__typeof__(*(ptr)) __new1 = (n1); \
>>> +	__typeof__(o2) __new2 = (n2); \
>>> +	asm volatile(LOCK_PREFIX_HERE "lock; cmpxchg16b (%%rsi);setz %1" \
>>
>> If there is no emulation, why do you force rsi here ?
>>
>> It could be something else, like "=m" (*ptr) ?
>>
>> (same remark for other functions)
>
> Well I ran into trouble with =m. Maybe +m will do. Will try again.
>

Yes, =m would be very wrong indeed.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id BDA596B0023 for ; Tue, 31 May 2011 12:53:32 -0400 (EDT)
Date: Tue, 31 May 2011 11:53:28 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DE50632.90906@zytor.com>
Message-ID:
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com> <4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com> <1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On Tue, 31 May 2011, H. Peter Anvin wrote:

> > Well I ran into trouble with =m. Maybe +m will do. Will try again.
> >
>
> Yes, =m would be very wrong indeed.


Subject: x86: Add support for cmpxchg_double

A simple implementation that only supports the word size and does not
have a fallback mode (would require a spinlock).

Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses
the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare
and swap 2 machine words. This allows lockless algorithms to move more
context information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
cmpxchg detection has been built into the kernel. Note that each subsystem
using cmpxchg_double has to implement a fallback mechanism as long as
we offer support for processors that do not implement cmpxchg_double.

Cc: tj@kernel.org
Signed-off-by: Christoph Lameter

---
 arch/x86/Kconfig.cpu              |   10 +++++++
 arch/x86/include/asm/cmpxchg_32.h |   48 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |    1
 4 files changed, 104 insertions(+)

Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-05-31 11:28:24.172948792 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-05-31 11:35:51.892945925 -0500
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n)); \
 })

+#define cmpxchg16b(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __junk; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile(LOCK_PREFIX "cmpxchg16b %2;setz %1" \
+		: "=d"(__junk), "=a"(__ret), "+m" (*ptr) \
+		: "b"(__new1), "c"(__new2), \
+		  "a"(__old1), "d"(__old2)); \
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __junk; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile("cmpxchg16b %2;setz %1" \
+		: "=d"(__junk), "=a"(__ret), "+m" (*ptr) \
+		: "b"(__new1), "c"(__new2), \
+		  "a"(__old1), "d"(__old2)); \
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+	VM_BUG_ON((unsigned long)(ptr) % 16); \
+	cmpxchg16b((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
+	VM_BUG_ON((unsigned long)(ptr) % 16); \
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-05-31 11:28:24.192948792 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-05-31 11:29:36.742948327 -0500
@@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(

 #endif

+#define cmpxchg8b(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __dummy; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile(LOCK_PREFIX_HERE "cmpxchg8b (%%esi); setz %1"\
+		: "d="(__dummy), "=a" (__ret) \
+		: "S" ((ptr)), "a" (__old1), "d"(__old2), \
+		  "b" (__new1), "c" (__new2) \
+		: "memory"); \
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2) \
+({ \
+	char __ret; \
+	__typeof__(o2) __dummy; \
+	__typeof__(*(ptr)) __old1 = (o1); \
+	__typeof__(o2) __old2 = (o2); \
+	__typeof__(*(ptr)) __new1 = (n1); \
+	__typeof__(o2) __new2 = (n2); \
+	asm volatile("cmpxchg8b (%%esi); tsetz %1" \
+		: "d="(__dummy), "=a"(__ret) \
+		: "S" ((ptr)), "a" (__old), "d"(__old2), \
+		  "b" (__new1), "c" (__new2), \
+		: "memory"); \
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
+	VM_BUG_ON((unsigned long)(ptr) % 8); \
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2) \
+({ \
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4); \
+	VM_BUG_ON((unsigned long)(ptr) % 8); \
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2)); \
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx8
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
Index: linux-2.6/arch/x86/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-05-31 11:28:24.202948792 -0500
+++ linux-2.6/arch/x86/Kconfig.cpu	2011-05-31 11:29:36.742948327 -0500
@@ -312,6 +312,16 @@ config X86_CMPXCHG
 config CMPXCHG_LOCAL
 	def_bool X86_64 || (X86_32 && !M386)
 
+#
+# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
+# for cmpxchg_double if it find processor flags that indicate that the
+# capabilities are available. CMPXCHG_DOUBLE only compiles in
+# detection support. It needs to be set if there is a chance that processor
+# supports these instructions.
+#
+config CMPXCHG_DOUBLE
+	def_bool GENERIC_CPU || X86_GENERIC || !M386
+
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
Index: linux-2.6/arch/x86/include/asm/cpufeature.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cpufeature.h	2011-05-31 11:28:24.182948792 -0500
+++ linux-2.6/arch/x86/include/asm/cpufeature.h	2011-05-31 11:29:36.742948327 -0500
@@ -288,6 +288,7 @@ extern const char * const x86_power_flag
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51])
	by kanga.kvack.org (Postfix) with ESMTP id 983C66B0011;
	Tue, 31 May 2011 19:23:30 -0400 (EDT)
Message-ID: <4DE576EA.6070906@zytor.com>
Date: Tue, 31 May 2011 16:16:58 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On 05/31/2011 09:53 AM, Christoph Lameter wrote:
> Index: linux-2.6/arch/x86/Kconfig.cpu
> ===================================================================
> --- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-05-31 11:28:24.202948792 -0500
> +++ linux-2.6/arch/x86/Kconfig.cpu	2011-05-31 11:29:36.742948327 -0500
> @@ -312,6 +312,16 @@ config X86_CMPXCHG
>  config CMPXCHG_LOCAL
>  	def_bool X86_64 || (X86_32 && !M386)
>
> +#
> +# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
> +# for cmpxchg_double if it find processor flags that indicate that the
> +# capabilities are available. CMPXCHG_DOUBLE only compiles in
> +# detection support. It needs to be set if there is a chance that processor
> +# supports these instructions.
> +#
> +config CMPXCHG_DOUBLE
> +	def_bool GENERIC_CPU || X86_GENERIC || !M386
> +
>  config X86_L1_CACHE_SHIFT
>  	int
>  	default "7" if MPENTIUM4 || MPSC

Per previous discussion:

- Drop this Kconfig option (it is irrelevant.)  CONFIG_CMPXCHG_LOCAL is
  different: it indicates that CMPXCHG is *guaranteed* to exist.

> +	asm volatile(LOCK_PREFIX_HERE "cmpxchg8b (%%esi); setz %1"\
> +		       : "d="(__dummy), "=a" (__ret)		\
> +		       : "S" ((ptr)), "a" (__old1), "d"(__old2),	\
> +		         "b" (__new1), "c" (__new2)		\
> +		       : "memory");				\
> +	__ret; })

> +	asm volatile("cmpxchg8b (%%esi); tsetz %1"		\
> +		       : "d="(__dummy), "=a"(__ret)		\
> +		       : "S" ((ptr)), "a" (__old), "d"(__old2),	\
> +		         "b" (__new1), "c" (__new2),		\
> +		       : "memory");				\
> +	__ret; })

d= is broken (won't even compile), and there is a typo in the opcode
(setz, not tsetz).

Use LOCK_PREFIX and +m here too.

	-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243])
	by kanga.kvack.org (Postfix) with SMTP id BE1E86B0011;
	Tue, 31 May 2011 19:49:10 -0400 (EDT)
Date: Tue, 31 May 2011 18:49:05 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DE576EA.6070906@zytor.com>
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: "H. Peter Anvin"
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On Tue, 31 May 2011, H. Peter Anvin wrote:

> On 05/31/2011 09:53 AM, Christoph Lameter wrote:
> > Index: linux-2.6/arch/x86/Kconfig.cpu
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-05-31 11:28:24.202948792 -0500
> > +++ linux-2.6/arch/x86/Kconfig.cpu	2011-05-31 11:29:36.742948327 -0500
> > @@ -312,6 +312,16 @@ config X86_CMPXCHG
> >  config CMPXCHG_LOCAL
> >  	def_bool X86_64 || (X86_32 && !M386)
> >
> > +#
> > +# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
> > +# for cmpxchg_double if it find processor flags that indicate that the
> > +# capabilities are available. CMPXCHG_DOUBLE only compiles in
> > +# detection support. It needs to be set if there is a chance that processor
> > +# supports these instructions.
> > +#
> > +config CMPXCHG_DOUBLE
> > +	def_bool GENERIC_CPU || X86_GENERIC || !M386
> > +
> >  config X86_L1_CACHE_SHIFT
> >  	int
> >  	default "7" if MPENTIUM4 || MPSC
>
> Per previous discussion:
>
> - Drop this Kconfig option (it is irrelevant.)  CONFIG_CMPXCHG_LOCAL is
> different: it indicates that CMPXCHG is *guaranteed* to exist.

Right but this is for cmpxchg16b which means that we need to check a
bit in the processor flags. Isn't this what you suggested?

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35])
	by kanga.kvack.org (Postfix) with ESMTP id D33176B0011;
	Tue, 31 May 2011 20:01:07 -0400 (EDT)
Message-ID: <4DE57FBB.8040408@zytor.com>
Date: Tue, 31 May 2011 16:54:35 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On 05/31/2011 04:49 PM, Christoph Lameter wrote:
>>>
>>> +#
>>> +# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
>>> +# for cmpxchg_double if it find processor flags that indicate that the
>>> +# capabilities are available. CMPXCHG_DOUBLE only compiles in
>>> +# detection support. It needs to be set if there is a chance that processor
>>> +# supports these instructions.
>>> +#
>>> +config CMPXCHG_DOUBLE
>>> +	def_bool GENERIC_CPU || X86_GENERIC || !M386
>>> +
>>>  config X86_L1_CACHE_SHIFT
>>>  	int
>>>  	default "7" if MPENTIUM4 || MPSC
>>
>> Per previous discussion:
>>
>> - Drop this Kconfig option (it is irrelevant.)  CONFIG_CMPXCHG_LOCAL is
>> different: it indicates that CMPXCHG is *guaranteed* to exist.
>
> Right but this is for cmpxchg16b which means that we need to check a
> bit in the processor flags. Isn't this what you suggested?
>

Per your own description:

"CMPXCHG_DOUBLE only compiles in detection support. It needs to be set
if there is a chance that processor supports these instructions."

That condition is always TRUE, so no Kconfig is needed.

	-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243])
	by kanga.kvack.org (Postfix) with SMTP id 214D86B0011;
	Wed, 1 Jun 2011 10:13:19 -0400 (EDT)
Date: Wed, 1 Jun 2011 09:13:15 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DE57FBB.8040408@zytor.com>
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com> <4DE57FBB.8040408@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: "H. Peter Anvin"
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On Tue, 31 May 2011, H. Peter Anvin wrote:

> On 05/31/2011 04:49 PM, Christoph Lameter wrote:
> >>>
> >>> +#
> >>> +# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
> >>> +# for cmpxchg_double if it find processor flags that indicate that the
> >>> +# capabilities are available. CMPXCHG_DOUBLE only compiles in
> >>> +# detection support. It needs to be set if there is a chance that processor
> >>> +# supports these instructions.
> >>> +#
> >>> +config CMPXCHG_DOUBLE
> >>> +	def_bool GENERIC_CPU || X86_GENERIC || !M386
> >>> +
> >>>  config X86_L1_CACHE_SHIFT
> >>>  	int
> >>>  	default "7" if MPENTIUM4 || MPSC
> >>
> >> Per previous discussion:
> >>
> >> - Drop this Kconfig option (it is irrelevant.)  CONFIG_CMPXCHG_LOCAL is
> >> different: it indicates that CMPXCHG is *guaranteed* to exist.
> >
> > Right but this is for cmpxchg16b which means that we need to check a
> > bit in the processor flags. Isn't this what you suggested?
> >
>
> Per your own description:
>
> "CMPXCHG_DOUBLE only compiles in detection support. It needs to be set
> if there is a chance that processor supports these instructions."
>
> That condition is always TRUE, so no Kconfig is needed.

There are several early processors (especially from AMD it seems) that do
not support cmpxchg16b. If one builds a kernel specifically for the early
cpus then the support does not need to be enabled.

This is also an issue going beyond x86. Other platforms mostly do not
support double word cmpxchg so the code for this feature also does not
need to be included for those builds.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35])
	by kanga.kvack.org (Postfix) with SMTP id 364926B0011;
	Wed, 1 Jun 2011 10:46:31 -0400 (EDT)
Date: Wed, 1 Jun 2011 09:46:26 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com> <4DE57FBB.8040408@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: "H. Peter Anvin"
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Tejun Heo, Thomas Gleixner

Ok newest rev. Fixed the 32 bit issues.

Subject: x86: Add support for cmpxchg_double

A simple implementation that only supports the word size and does not
have a fallback mode (would require a spinlock).
Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses the
cmpxchg8b or cmpxchg16b instruction on x86 processors to compare and swap
2 machine words. This allows lockless algorithms to move more context
information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
cmpxchg detection has been built into the kernel. Note that each subsystem
using cmpxchg_double has to implement a fallback mechanism as long as
we offer support for processors that do not implement cmpxchg_double.

Also various non-x86 architectures do not support double cmpxchg and will
require the fallback code.

Cc: tj@kernel.org
Signed-off-by: Christoph Lameter

---
 arch/x86/Kconfig.cpu              |   10 +++++++
 arch/x86/include/asm/cmpxchg_32.h |   48 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |    2 +
 4 files changed, 105 insertions(+)

Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 09:23:08.822443732 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 09:25:40.832442760 -0500
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define cmpxchg16b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg16b %2;setz %1"	\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg16b %2;setz %1"			\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 09:23:08.842443735 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 09:32:53.072439992 -0500
@@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(
 
 #endif
 
+#define cmpxchg8b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg8b %2; setz %1"	\
+		       : "=d"(__dummy), "=a" (__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg8b %2; setz %1"			\
+		       : "=d"(__dummy), "=a"(__ret), "m+" (*ptr)\
+		       : "a" (__old), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2),		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx8
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
Index: linux-2.6/arch/x86/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-06-01 09:23:08.862443735 -0500
+++ linux-2.6/arch/x86/Kconfig.cpu	2011-06-01 09:25:40.842442760 -0500
@@ -312,6 +312,16 @@ config X86_CMPXCHG
 config CMPXCHG_LOCAL
 	def_bool X86_64 || (X86_32 && !M386)
 
+#
+# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
+# for cmpxchg_double if it find processor flags that indicate that the
+# capabilities are available. CMPXCHG_DOUBLE only compiles in
+# detection support. It needs to be set if there is a chance that processor
+# supports these instructions.
+#
+config CMPXCHG_DOUBLE
+	def_bool GENERIC_CPU || X86_GENERIC || !M386
+
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
Index: linux-2.6/arch/x86/include/asm/cpufeature.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cpufeature.h	2011-06-01 09:23:08.832443731 -0500
+++ linux-2.6/arch/x86/include/asm/cpufeature.h	2011-06-01 09:38:25.012437868 -0500
@@ -288,6 +288,8 @@ extern const char * const x86_power_flag
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
+#define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail6.bemta7.messagelabs.com (mail6.bemta7.messagelabs.com [216.82.255.55])
	by kanga.kvack.org (Postfix) with ESMTP id 34D306B0011;
	Wed, 1 Jun 2011 11:48:21 -0400 (EDT)
Message-ID: <4DE65DB6.4050801@zytor.com>
Date: Wed, 01 Jun 2011 08:41:42 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com> <4DE57FBB.8040408@zytor.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Thomas Gleixner

On 06/01/2011 07:13 AM, Christoph Lameter wrote:
>>
>> Per your own description:
>>
>> "CMPXCHG_DOUBLE only compiles in detection support. It needs to be set
>> if there is a chance that processor supports these instructions."
>>
>> That condition is always TRUE, so no Kconfig is needed.
>
> There are several early processors (especially from AMD it seems) that do
> not support cmpxchg16b. If one builds a kernel specifically for the early
> cpus then the support does not need to be enabled.
>

We don't support building kernels specifically for those early CPUs as
far as I know.  Besides, it is a very small set.

Even if we did, the conditional as you have specified it is wrong, and I
mean "not even in the general ballpark of correct".

> This is also an issue going beyond x86. Other platforms mostly do not
> support double word cmpxchg so the code for this feature also does not
> need to be included for those builds.

That's fine; just set it unconditionally for x86.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3])
	by kanga.kvack.org (Postfix) with ESMTP id 4E7186B0011;
	Wed, 1 Jun 2011 11:49:31 -0400 (EDT)
Message-ID: <4DE65E02.8080303@zytor.com>
Date: Wed, 01 Jun 2011 08:42:58 -0700
From: "H. Peter Anvin"
MIME-Version: 1.0
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com> <4DE57FBB.8040408@zytor.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Tejun Heo, Thomas Gleixner

On 06/01/2011 07:46 AM, Christoph Lameter wrote:
> +#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
> +({								\
> +	char __ret;						\
> +	__typeof__(o2) __dummy;					\
> +	__typeof__(*(ptr)) __old1 = (o1);			\
> +	__typeof__(o2) __old2 = (o2);				\
> +	__typeof__(*(ptr)) __new1 = (n1);			\
> +	__typeof__(o2) __new2 = (n2);				\
> +	asm volatile("cmpxchg8b %2; setz %1"			\
> +		       : "=d"(__dummy), "=a"(__ret), "m+" (*ptr)\
> +		       : "a" (__old), "d"(__old2),		\
> +		         "b" (__new1), "c" (__new2),		\
> +		       : "memory");				\
> +	__ret; })

Another syntax error... did you even compile-test any of your patches on
32 bits?

> +#
> +# CMPXCHG_DOUBLE needs to be set to enable the kernel to use cmpxchg16/8b
> +# for cmpxchg_double if it find processor flags that indicate that the
> +# capabilities are available. CMPXCHG_DOUBLE only compiles in
> +# detection support. It needs to be set if there is a chance that processor
> +# supports these instructions.
> +#
> +config CMPXCHG_DOUBLE
> +	def_bool GENERIC_CPU || X86_GENERIC || !M386
> +

Still wrong.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51])
	by kanga.kvack.org (Postfix) with SMTP id 768196B0023;
	Wed, 1 Jun 2011 12:08:19 -0400 (EDT)
Date: Wed, 1 Jun 2011 11:08:14 -0500 (CDT)
From: Christoph Lameter
Subject: Re: [slubllv5 07/25] x86: Add support for cmpxchg_double
In-Reply-To: <4DE65E02.8080303@zytor.com>
References: <20110516202605.274023469@linux.com> <20110516202625.197639928@linux.com>
	<4DDE9670.3060709@zytor.com> <4DDE9C01.2090104@zytor.com>
	<1306445159.2543.25.camel@edumazet-laptop> <4DE50632.90906@zytor.com>
	<4DE576EA.6070906@zytor.com> <4DE57FBB.8040408@zytor.com>
	<4DE65E02.8080303@zytor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
To: "H. Peter Anvin"
Cc: Eric Dumazet, Pekka Enberg, David Rientjes, linux-mm@kvack.org, Tejun Heo, Thomas Gleixner

On Wed, 1 Jun 2011, H. Peter Anvin wrote:

> On 06/01/2011 07:46 AM, Christoph Lameter wrote:
> > +#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
> > +({								\
> > +	char __ret;						\
> > +	__typeof__(o2) __dummy;					\
> > +	__typeof__(*(ptr)) __old1 = (o1);			\
> > +	__typeof__(o2) __old2 = (o2);				\
> > +	__typeof__(*(ptr)) __new1 = (n1);			\
> > +	__typeof__(o2) __new2 = (n2);				\
> > +	asm volatile("cmpxchg8b %2; setz %1"			\
> > +		       : "=d"(__dummy), "=a"(__ret), "m+" (*ptr)\
> > +		       : "a" (__old), "d"(__old2),		\
> > +		         "b" (__new1), "c" (__new2),		\
> > +		       : "memory");				\
> > +	__ret; })
>
> Another syntax error... did you even compile-test any of your patches on
> 32 bits?

Yes. Compiled, built and I also ran a benchmark with them.
cmpxchg8b_local is not used right now.

New patch with CMPXCHG_DOUBLE always on and the typo fix.

Subject: x86: Add support for cmpxchg_double

A simple implementation that only supports the word size and does not
have a fallback mode (would require a spinlock).

Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses the
cmpxchg8b or cmpxchg16b instruction on x86 processors to compare and swap
2 machine words. This allows lockless algorithms to move more context
information through critical sections.

Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word
cmpxchg detection has been built into the kernel. Note that each subsystem
using cmpxchg_double has to implement a fallback mechanism as long as
we offer support for processors that do not implement cmpxchg_double.

Cc: tj@kernel.org
Signed-off-by: Christoph Lameter

---
 arch/x86/Kconfig.cpu              |    3 ++
 arch/x86/include/asm/cmpxchg_32.h |   48 ++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cmpxchg_64.h |   45 +++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/cpufeature.h |    2 +
 4 files changed, 98 insertions(+)

Index: linux-2.6/arch/x86/include/asm/cmpxchg_64.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:05.002406114 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_64.h	2011-06-01 11:01:48.222405834 -0500
@@ -151,4 +151,49 @@ extern void __cmpxchg_wrong_size(void);
 	cmpxchg_local((ptr), (o), (n));					\
 })
 
+#define cmpxchg16b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg16b %2;setz %1"	\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+
+#define cmpxchg16b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __junk;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg16b %2;setz %1"			\
+		       : "=d"(__junk), "=a"(__ret), "+m" (*ptr)	\
+		       : "b"(__new1), "c"(__new2),		\
+		         "a"(__old1), "d"(__old2));		\
+	__ret; })
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
+	VM_BUG_ON((unsigned long)(ptr) % 16);				\
+	cmpxchg16b_local((ptr), (o1), (o2), (n1), (n2));		\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx16
+
 #endif /* _ASM_X86_CMPXCHG_64_H */
Index: linux-2.6/arch/x86/include/asm/cmpxchg_32.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 11:01:05.022406109 -0500
+++ linux-2.6/arch/x86/include/asm/cmpxchg_32.h	2011-06-01 11:01:48.222405834 -0500
@@ -280,4 +280,52 @@ static inline unsigned long cmpxchg_386(
 
 #endif
 
+#define cmpxchg8b(ptr, o1, o2, n1, n2)				\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile(LOCK_PREFIX "cmpxchg8b %2; setz %1"	\
+		       : "=d"(__dummy), "=a" (__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg8b_local(ptr, o1, o2, n1, n2)			\
+({								\
+	char __ret;						\
+	__typeof__(o2) __dummy;					\
+	__typeof__(*(ptr)) __old1 = (o1);			\
+	__typeof__(o2) __old2 = (o2);				\
+	__typeof__(*(ptr)) __new1 = (n1);			\
+	__typeof__(o2) __new2 = (n2);				\
+	asm volatile("cmpxchg8b %2; setz %1"			\
+		       : "=d"(__dummy), "=a"(__ret), "+m" (*ptr)\
+		       : "a" (__old1), "d"(__old2),		\
+		         "b" (__new1), "c" (__new2)		\
+		       : "memory");				\
+	__ret; })
+
+
+#define cmpxchg_double(ptr, o1, o2, n1, n2)				\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define cmpxchg_double_local(ptr, o1, o2, n1, n2)			\
+({									\
+	BUILD_BUG_ON(sizeof(*(ptr)) != 4);				\
+	VM_BUG_ON((unsigned long)(ptr) % 8);				\
+	cmpxchg8b_local((ptr), (o1), (o2), (n1), (n2));			\
+})
+
+#define system_has_cmpxchg_double() cpu_has_cx8
+
 #endif /* _ASM_X86_CMPXCHG_32_H */
Index: linux-2.6/arch/x86/Kconfig.cpu
===================================================================
--- linux-2.6.orig/arch/x86/Kconfig.cpu	2011-06-01 11:01:05.032406108 -0500
+++ linux-2.6/arch/x86/Kconfig.cpu	2011-06-01 11:02:20.912405628 -0500
@@ -312,6 +312,9 @@ config X86_CMPXCHG
 config CMPXCHG_LOCAL
 	def_bool X86_64 || (X86_32 && !M386)
 
+config CMPXCHG_DOUBLE
+	def_bool y
+
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
Index: linux-2.6/arch/x86/include/asm/cpufeature.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/cpufeature.h	2011-06-01 11:01:05.012406112 -0500
+++ linux-2.6/arch/x86/include/asm/cpufeature.h	2011-06-01 11:01:48.222405834 -0500
@@ -288,6 +288,8 @@ extern const char * const x86_power_flag
 #define cpu_has_hypervisor	boot_cpu_has(X86_FEATURE_HYPERVISOR)
 #define cpu_has_pclmulqdq	boot_cpu_has(X86_FEATURE_PCLMULQDQ)
 #define cpu_has_perfctr_core	boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_cx8		boot_cpu_has(X86_FEATURE_CX8)
+#define cpu_has_cx16		boot_cpu_has(X86_FEATURE_CX16)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg		1
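
For illustration only (not part of the patch): the changelog above leaves
the fallback to each subsystem and notes that a generic one would require
a spinlock. A minimal userspace sketch of such a locked fallback, assuming
a single global lock, might look like the code below. The names
struct double_word, fallback_lock and cmpxchg_double_fallback() are
hypothetical and do not appear anywhere in the series, and C11 atomics
stand in for the kernel's spinlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-word slot; name and layout are illustrative only. */
struct double_word {
	_Alignas(2 * sizeof(unsigned long)) unsigned long first;
	unsigned long second;
};

/* One global lock, standing in for whatever lock a subsystem already owns. */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

/*
 * Locked emulation of a double-word compare-and-swap: if both words still
 * hold (o1, o2), replace them with (n1, n2) and return true; otherwise
 * leave them untouched and return false.
 */
static bool cmpxchg_double_fallback(struct double_word *ptr,
				    unsigned long o1, unsigned long o2,
				    unsigned long n1, unsigned long n2)
{
	bool swapped = false;

	/* Same alignment rule the patch checks with VM_BUG_ON(). */
	assert((uintptr_t)ptr % (2 * sizeof(unsigned long)) == 0);

	while (atomic_flag_test_and_set_explicit(&fallback_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	if (ptr->first == o1 && ptr->second == o2) {
		ptr->first = n1;
		ptr->second = n2;
		swapped = true;
	}

	atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
	return swapped;
}
```

A caller would pick between the two paths at run time, using the
instruction-based cmpxchg_double() when system_has_cmpxchg_double() is
true and the locked variant otherwise. The locked variant is only safe if
every other writer of the two words takes the same lock.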