* Re: [PATCH] SLUB use cmpxchg_local
2007-08-28 1:26 ` [PATCH] SLUB use cmpxchg_local Christoph Lameter
@ 2007-08-28 12:07 ` Mathieu Desnoyers
2007-08-28 19:42 ` Christoph Lameter
2007-09-04 20:02 ` Mathieu Desnoyers
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-08-28 12:07 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
Ok, I just had a look at the ia64 instruction set, and I fear that cmpxchg
must always come with acquire or release semantics. Is there any
cmpxchg equivalent on ia64 that is free of acquire and release
semantics? This implicit memory ordering in the instruction seems to be
responsible for the slowdown.
If no such primitive exists, then we should think about an irq
disable fallback for this local atomic operation. However, I would
prefer to let the cmpxchg_local primitive be bound to the "slow"
cmpxchg_acq and create something like _cmpxchg_local that would be
interrupt-safe, but not reentrant wrt NMIs.
This way, cmpxchg_local users could choose either the fast flavor
(_cmpxchg_local: not necessarily atomic wrt NMIs) or the most atomic
flavor (cmpxchg_local) available on the architecture. If you think of a
better name, please tell me... it could also be: fast version (mostly
used): cmpxchg_local(); slow, fully reentrant version:
cmpxchg_local_nmi().
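The difference between the two flavors can be sketched in user-space C. This is only an illustration under stated assumptions: the names cmpxchg_atomic_sketch and cmpxchg_plain_sketch are hypothetical, C11 <stdatomic.h> stands in for the architecture's cmpxchg instruction, and the plain version is only correct if the caller already keeps out whatever could interrupt it (for example by disabling interrupts).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical "most atomic" flavor: a real atomic compare-and-exchange.
 * Returns the value found in *p, like the kernel's cmpxchg().
 */
static inline void *cmpxchg_atomic_sketch(_Atomic(void *) *p, void *old,
					  void *new_val)
{
	void *expected = old;

	assert(p != NULL);
	/* On failure, `expected` is overwritten with the current value. */
	atomic_compare_exchange_strong(p, &expected, new_val);
	return expected;
}

/*
 * Hypothetical "fast" flavor: plain load, compare and store.
 * Not atomic by itself; the caller must provide the exclusion.
 */
static inline void *cmpxchg_plain_sketch(void **p, void *old, void *new_val)
{
	void *before;

	assert(p != NULL);
	before = *p;
	if (before == old)
		*p = new_val;
	return before;
}
```

Both return the previous value, so a caller detects success by comparing the return value with `old`, whichever flavor the architecture ends up providing.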
Mathieu
* Christoph Lameter (clameter@sgi.com) wrote:
> Measurements on IA64 slub w/per cpu vs slub w/per cpu/cmpxchg_local
> emulation. Results are not good:
>
> slub/per cpu
> 10000 times kmalloc(8)/kfree -> 105 cycles
> 10000 times kmalloc(16)/kfree -> 104 cycles
> 10000 times kmalloc(32)/kfree -> 105 cycles
> 10000 times kmalloc(64)/kfree -> 104 cycles
> 10000 times kmalloc(128)/kfree -> 104 cycles
> 10000 times kmalloc(256)/kfree -> 115 cycles
> 10000 times kmalloc(512)/kfree -> 116 cycles
> 10000 times kmalloc(1024)/kfree -> 115 cycles
> 10000 times kmalloc(2048)/kfree -> 115 cycles
> 10000 times kmalloc(4096)/kfree -> 115 cycles
> 10000 times kmalloc(8192)/kfree -> 117 cycles
> 10000 times kmalloc(16384)/kfree -> 439 cycles
> 10000 times kmalloc(32768)/kfree -> 800 cycles
>
>
> slub/per cpu + cmpxchg_local emulation
> 10000 times kmalloc(8)/kfree -> 143 cycles
> 10000 times kmalloc(16)/kfree -> 143 cycles
> 10000 times kmalloc(32)/kfree -> 143 cycles
> 10000 times kmalloc(64)/kfree -> 143 cycles
> 10000 times kmalloc(128)/kfree -> 143 cycles
> 10000 times kmalloc(256)/kfree -> 154 cycles
> 10000 times kmalloc(512)/kfree -> 154 cycles
> 10000 times kmalloc(1024)/kfree -> 154 cycles
> 10000 times kmalloc(2048)/kfree -> 154 cycles
> 10000 times kmalloc(4096)/kfree -> 155 cycles
> 10000 times kmalloc(8192)/kfree -> 155 cycles
> 10000 times kmalloc(16384)/kfree -> 440 cycles
> 10000 times kmalloc(32768)/kfree -> 819 cycles
> 10000 times kmalloc(65536)/kfree -> 902 cycles
>
>
> Parallel allocs:
>
> Kmalloc N*alloc N*free(16): 0=102/136 1=97/136 2=99/140 3=98/140 4=100/138
> 5=99/139 6=100/139 7=101/141 Average=99/139
>
> cmpxchg_local emulation
> Kmalloc N*alloc N*free(16): 0=116/147 1=116/145 2=115/151 3=115/147
> 4=115/149 5=117/147 6=116/148 7=116/146 Average=116/147
>
> Patch used:
>
> Index: linux-2.6/include/asm-ia64/atomic.h
> ===================================================================
> --- linux-2.6.orig/include/asm-ia64/atomic.h 2007-08-27 16:42:02.000000000 -0700
> +++ linux-2.6/include/asm-ia64/atomic.h 2007-08-27 17:50:24.000000000 -0700
> @@ -223,4 +223,17 @@ atomic64_add_negative (__s64 i, atomic64
> #define smp_mb__after_atomic_inc() barrier()
>
> #include <asm-generic/atomic.h>
> +
> +static inline void *cmpxchg_local(void **p, void *old, void *new)
> +{
> + unsigned long flags;
> + void *before;
> +
> + local_irq_save(flags);
> + before = *p;
> + if (likely(before == old))
> + *p = new;
> + local_irq_restore(flags);
> + return before;
> +}
> #endif /* _ASM_IA64_ATOMIC_H */
>
> kmem_cache_alloc before
>
> 0000000000008900 <kmem_cache_alloc>:
> 8900: 01 28 31 0e 80 05 [MII] alloc r37=ar.pfs,12,7,0
> 8906: 40 02 00 62 00 00 mov r36=b0
> 890c: 00 00 04 00 nop.i 0x0;;
> 8910: 0b 18 01 00 25 04 [MMI] mov r35=psr;;
> 8916: 00 00 04 0e 00 00 rsm 0x4000
> 891c: 00 00 04 00 nop.i 0x0;;
> 8920: 08 50 90 1b 19 21 [MMI] adds r10=3300,r13
> 8926: 70 02 80 00 42 40 mov r39=r32
> 892c: 05 00 c4 00 mov r42=b0
> 8930: 09 40 01 42 00 21 [MMI] mov r40=r33
> 8936: 00 00 00 02 00 20 nop.m 0x0
> 893c: f5 e7 ff 9f mov r41=-1;;
> 8940: 0b 48 00 14 10 10 [MMI] ld4 r9=[r10];;
> 8946: 00 00 00 02 00 00 nop.m 0x0
> 894c: 01 48 58 00 sxt4 r8=r9;;
> 8950: 0b 18 20 40 12 20 [MMI] shladd r3=r8,3,r32;;
> 8956: 20 80 0f 82 48 00 addl r2=8432,r3
> 895c: 00 00 04 00 nop.i 0x0;;
> 8960: 0a 00 01 04 18 10 [MMI] ld8 r32=[r2];;
> 8966: e0 a0 80 00 42 60 adds r14=20,r32
> 896c: 05 00 01 84 mov r43=r32
> 8970: 0b 10 01 40 18 10 [MMI] ld8 r34=[r32];;
> 8976: 70 00 88 0c 72 00 cmp.eq p7,p6=0,r34
> 897c: 00 00 04 00 nop.i 0x0;;
> 8980: cb 70 00 1c 10 90 [MMI] (p06) ld4 r14=[r14];;
> 8986: e1 70 88 24 40 00 (p06) shladd r14=r14,3,r34
> 898c: 00 00 04 00 nop.i 0x0;;
> 8990: c2 70 00 1c 18 10 [MII] (p06) ld8 r14=[r14]
> 8996: 00 00 00 02 00 00 nop.i 0x0;;
> 899c: 00 00 04 00 nop.i 0x0
> 89a0: d8 00 38 40 98 11 [MMB] (p06) st8 [r32]=r14
> 89a6: 00 00 00 02 00 03 nop.m 0x0
> 89ac: 30 00 00 40 (p06) br.cond.sptk.few 89d0 <kmem_cache_alloc+0xd0>
> 89b0: 11 00 00 00 01 00 [MIB] nop.m 0x0
> 89b6: 00 00 00 02 00 00 nop.i 0x0
> 89bc: 18 d8 ff 58 br.call.sptk.many b0=61c0 <__slab_alloc>;;
> 89c0: 08 10 01 10 00 21 [MMI] mov r34=r8
> 89c6: 00 00 00 02 00 00 nop.m 0x0
> 89cc: 00 00 04 00 nop.i 0x0
> 89d0: 03 00 00 00 01 00 [MII] nop.m 0x0
> 89d6: 20 01 00 00 49 20 mov r18=16384;;
> 89dc: 22 19 31 80 and r17=r18,r35;;
> 89e0: 0a 38 44 00 06 b8 [MMI] cmp.eq p7,p6=r17,r0;;
> 89e6: 01 00 04 0c 00 00 (p06) ssm 0x4000
> 89ec: 00 00 04 00 nop.i 0x0
> 89f0: eb 00 00 02 07 80 [MMI] (p07) rsm 0x4000;;
> 89f6: 01 00 00 60 00 00 (p06) srlz.d
> 89fc: 00 00 04 00 nop.i 0x0;;
> 8a00: 08 00 00 00 01 00 [MMI] nop.m 0x0
> 8a06: b0 00 88 14 72 e0 cmp.eq p11,p10=0,r34
> 8a0c: e1 09 01 52 extr.u r15=r33,15,1
> 8a10: 09 58 60 40 00 21 [MMI] adds r11=24,r32
> 8a16: 70 02 88 00 42 00 mov r39=r34
> 8a1c: 05 00 00 84 mov r40=r0;;
> 8a20: 42 71 04 00 00 e4 [MII] (p10) mov r14=1
> 8a26: e2 00 00 00 42 00 (p11) mov r14=r0;;
> 8a2c: 00 00 04 00 nop.i 0x0
> 8a30: 0b 80 3c 1c 0c 20 [MMI] and r16=r15,r14;;
> 8a36: 80 00 40 12 73 00 cmp4.eq p8,p9=0,r16
> 8a3c: 00 00 04 00 nop.i 0x0;;
> 8a40: 31 49 01 16 10 10 [MIB] (p09) ld4 r41=[r11]
> 8a46: 00 00 00 02 80 04 nop.i 0x0
> 8a4c: 08 00 00 51 (p09) br.call.spnt.many b0=8a40 <kmem_cache_alloc+0x140>;;
> 8a50: 08 00 00 00 01 00 [MMI] nop.m 0x0
> 8a56: 80 00 88 00 42 00 mov r8=r34
> 8a5c: 40 0a 00 07 mov b0=r36
> 8a60: 11 00 00 00 01 00 [MIB] nop.m 0x0
> 8a66: 00 28 01 55 00 80 mov.i ar.pfs=r37
> 8a6c: 08 00 84 00 br.ret.sptk.many b0;;
> 8a70: 08 00 00 00 01 00 [MMI] nop.m 0x0
> 8a76: 00 00 00 02 00 00 nop.m 0x0
> 8a7c: 00 00 04 00 nop.i 0x0
>
> kmem_cache_alloc with cmpxchg emulation:
>
> 0000000000008da0 <kmem_cache_alloc>:
> 8da0: 09 28 31 0e 80 05 [MMI] alloc r37=ar.pfs,12,7,0
> 8da6: a0 80 36 32 42 80 adds r10=3280,r13
> 8dac: 04 00 c4 00 mov r36=b0;;
> 8db0: 02 00 00 00 01 00 [MII] nop.m 0x0
> 8db6: 00 41 29 00 42 00 adds r16=40,r10;;
> 8dbc: 00 00 04 00 nop.i 0x0
> 8dc0: 0a 58 00 20 10 10 [MMI] ld4 r11=[r16];;
> 8dc6: e0 08 2c 00 42 00 adds r14=1,r11
> 8dcc: 00 00 04 00 nop.i 0x0
> 8dd0: 0b 00 00 00 01 00 [MMI] nop.m 0x0;;
> 8dd6: 00 70 40 20 23 00 st4 [r16]=r14
> 8ddc: 00 00 04 00 nop.i 0x0;;
> 8de0: 09 78 50 14 00 21 [MMI] adds r15=20,r10
> 8de6: 00 00 00 02 00 40 nop.m 0x0
> 8dec: 02 00 00 92 mov r18=16384;;
> 8df0: 0b 48 00 1e 10 10 [MMI] ld4 r9=[r15];;
> 8df6: 00 00 00 02 00 00 nop.m 0x0
> 8dfc: 01 48 58 00 sxt4 r8=r9;;
> 8e00: 0b 18 20 40 12 20 [MMI] shladd r3=r8,3,r32;;
> 8e06: 20 80 0f 82 48 00 addl r2=8432,r3
> 8e0c: 00 00 04 00 nop.i 0x0;;
> 8e10: 02 10 01 04 18 10 [MII] ld8 r34=[r2]
> 8e16: 00 00 00 02 00 20 nop.i 0x0;;
> 8e1c: 42 11 01 84 adds r17=20,r34
> 8e20: 09 00 00 00 01 00 [MMI] nop.m 0x0
> 8e26: f0 00 88 30 20 00 ld8 r15=[r34]
> 8e2c: 00 00 04 00 nop.i 0x0;;
> 8e30: 10 00 00 00 01 00 [MIB] nop.m 0x0
> 8e36: 60 00 3c 0e 72 03 cmp.eq p6,p7=0,r15
> 8e3c: 20 01 00 41 (p06) br.cond.spnt.few 8f50 <kmem_cache_alloc+0x1b0>
> 8e40: 0a b8 00 22 10 10 [MMI] ld4 r23=[r17];;
> 8e46: 60 b9 3c 24 40 00 shladd r22=r23,3,r15
> 8e4c: 00 00 04 00 nop.i 0x0
> 8e50: 0b 00 00 00 01 00 [MMI] nop.m 0x0;;
> 8e56: 40 01 58 30 20 00 ld8 r20=[r22]
> 8e5c: 00 00 04 00 nop.i 0x0;;
> 8e60: 0b a8 00 00 25 04 [MMI] mov r21=psr;;
> 8e66: 00 00 04 0e 00 00 rsm 0x4000
> 8e6c: 00 00 04 00 nop.i 0x0;;
> 8e70: 02 18 01 44 18 10 [MII] ld8 r35=[r34]
> 8e76: 30 91 54 18 40 00 and r19=r18,r21;;
> 8e7c: 00 00 04 00 nop.i 0x0
> 8e80: 0b 48 3c 46 08 78 [MMI] cmp.eq p9,p8=r15,r35;;
> 8e86: 02 a0 88 30 23 00 (p09) st8 [r34]=r20
> 8e8c: 00 00 04 00 nop.i 0x0;;
> 8e90: 0a 38 4c 00 06 b8 [MMI] cmp.eq p7,p6=r19,r0;;
> 8e96: 01 00 04 0c 00 00 (p06) ssm 0x4000
> 8e9c: 00 00 04 00 nop.i 0x0
> 8ea0: eb 00 00 02 07 80 [MMI] (p07) rsm 0x4000;;
> 8ea6: 01 00 00 60 00 00 (p06) srlz.d
> 8eac: 00 00 04 00 nop.i 0x0;;
> 8eb0: 11 00 00 00 01 00 [MIB] nop.m 0x0
> 8eb6: 00 00 00 02 00 04 nop.i 0x0
> 8ebc: 70 ff ff 49 (p08) br.cond.spnt.few 8e20 <kmem_cache_alloc+0x80>;;
> 8ec0: 03 00 00 00 01 00 [MII] nop.m 0x0
> 8ec6: 80 81 36 32 42 20 adds r24=3280,r13;;
> 8ecc: 83 c2 00 84 adds r25=40,r24;;
> 8ed0: 0a d8 00 32 10 10 [MMI] ld4 r27=[r25];;
> 8ed6: a0 f9 6f 7e 46 00 adds r26=-1,r27
> 8edc: 00 00 04 00 nop.i 0x0
> 8ee0: 0b 00 00 00 01 00 [MMI] nop.m 0x0;;
> 8ee6: 00 d0 64 20 23 00 st4 [r25]=r26
> 8eec: 00 00 04 00 nop.i 0x0;;
> 8ef0: 0b 90 40 30 00 21 [MMI] adds r18=16,r24;;
> 8ef6: 10 01 48 60 21 00 ld4.acq r17=[r18]
> 8efc: 00 00 04 00 nop.i 0x0;;
> 8f00: 11 00 00 00 01 00 [MIB] nop.m 0x0
> 8f06: c0 10 44 1a a8 06 tbit.z p12,p13=r17,1
> 8f0c: 08 00 00 51 (p13) br.call.spnt.many b0=8f00 <kmem_cache_alloc+0x160>;;
> 8f10: 02 00 00 00 01 00 [MII] nop.m 0x0
> 8f16: a0 f0 84 16 a8 c5 tbit.z p10,p11=r33,15;;
> 8f1c: 81 11 01 84 (p11) adds r14=24,r34
> 8f20: 62 39 01 46 00 e1 [MII] (p11) mov r39=r35
> 8f26: 82 02 00 00 42 00 (p11) mov r40=r0;;
> 8f2c: 00 00 04 00 nop.i 0x0
> 8f30: 79 49 01 1c 10 10 [MMB] (p11) ld4 r41=[r14]
> 8f36: 00 00 00 02 80 05 nop.m 0x0
> 8f3c: 08 00 00 51 (p11) br.call.spnt.many b0=8f30 <kmem_cache_alloc+0x190>;;
> 8f40: 10 00 00 00 01 00 [MIB] nop.m 0x0
> 8f46: 80 00 8c 00 42 00 mov r8=r35
> 8f4c: 30 00 00 40 br.few 8f70 <kmem_cache_alloc+0x1d0>
> 8f50: 08 38 01 40 00 21 [MMI] mov r39=r32
> 8f56: 80 02 84 00 42 40 mov r40=r33
> 8f5c: 05 20 01 84 mov r42=r36
> 8f60: 19 58 01 44 00 21 [MMB] mov r43=r34
> 8f66: 90 fa f3 ff 4f 00 mov r41=-1
> 8f6c: 28 d3 ff 58 br.call.sptk.many b0=6280 <__slab_alloc>;;
> 8f70: 00 00 00 00 01 00 [MII] nop.m 0x0
> 8f76: 00 20 05 80 03 00 mov b0=r36
> 8f7c: 00 00 04 00 nop.i 0x0
> 8f80: 11 00 00 00 01 00 [MIB] nop.m 0x0
> 8f86: 00 28 01 55 00 80 mov.i ar.pfs=r37
> 8f8c: 08 00 84 00 br.ret.sptk.many b0;;
> 8f90: 08 00 00 00 01 00 [MMI] nop.m 0x0
> 8f96: 00 00 00 02 00 00 nop.m 0x0
> 8f9c: 00 00 04 00 nop.i 0x0
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] SLUB use cmpxchg_local
2007-08-28 12:07 ` Mathieu Desnoyers
@ 2007-08-28 19:42 ` Christoph Lameter
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-08-28 19:42 UTC
To: Mathieu Desnoyers; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
On Tue, 28 Aug 2007, Mathieu Desnoyers wrote:
> Ok, I just had a look at the ia64 instruction set, and I fear that cmpxchg
> must always come with acquire or release semantics. Is there any
> cmpxchg equivalent on ia64 that is free of acquire and release
> semantics? This implicit memory ordering in the instruction seems to be
> responsible for the slowdown.
No. There is no cmpxchg used in the patches that I tested. The slowdown
seems to come from the need to serialize at barriers. Adding an interrupt
enable/disable in the middle of the hot path creates another serialization
point.
> If no such primitive exists, then we should think about an irq
> disable fallback for this local atomic operation. However, I would
> prefer to let the cmpxchg_local primitive be bound to the "slow"
> cmpxchg_acq and create something like _cmpxchg_local that would be
> interrupt-safe, but not reentrant wrt NMIs.
Ummm... That is what I did. See the included patch that you quoted. The
measurements show that such a fallback does not preserve performance
on IA64.
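To put the quoted numbers in proportion: for kmalloc(8) the fast path went from 105 to 143 cycles with the emulation. A throwaway helper (hypothetical, not part of any patch in this thread) computes the relative cost from two cycle counts:

```c
#include <assert.h>

/* Integer percentage by which `after` exceeds `before`, rounded down. */
static inline unsigned int percent_slowdown(unsigned int before,
					    unsigned int after)
{
	assert(before > 0 && after >= before);
	return (after - before) * 100u / before;
}
```

percent_slowdown(105, 143) gives 36 for kmalloc(8), and percent_slowdown(115, 154) gives 33 for kmalloc(256), so the irq-disable fallback costs roughly a third more cycles on the fast path in these measurements.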
* Re: [PATCH] SLUB use cmpxchg_local
2007-08-28 1:26 ` [PATCH] SLUB use cmpxchg_local Christoph Lameter
2007-08-28 12:07 ` Mathieu Desnoyers
@ 2007-09-04 20:02 ` Mathieu Desnoyers
2007-09-04 20:03 ` [PATCH] local_t protection (critical section) Mathieu Desnoyers
2007-09-04 20:04 ` [PATCH] slub - Use local_t protection Mathieu Desnoyers
3 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-04 20:02 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
* Christoph Lameter (clameter@sgi.com) wrote:
> Measurements on IA64 slub w/per cpu vs slub w/per cpu/cmpxchg_local
> emulation. Results are not good:
>
Hi Christoph,
I tried to come up with a patch set implementing the basics of a new
critical section: local_enter(flags) and local_exit(flags).
Can you try those on ia64 and tell me if the results are better?
See the next two posts...
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [PATCH] local_t protection (critical section)
2007-08-28 1:26 ` [PATCH] SLUB use cmpxchg_local Christoph Lameter
2007-08-28 12:07 ` Mathieu Desnoyers
2007-09-04 20:02 ` Mathieu Desnoyers
@ 2007-09-04 20:03 ` Mathieu Desnoyers
2007-09-04 20:04 ` [PATCH] slub - Use local_t protection Mathieu Desnoyers
3 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-04 20:03 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
local_t protection (critical section)
Adds local_enter(flags) and local_exit(flags) as primitives to surround critical
sections using local_t types.
On architectures providing fast atomic primitives, this turns into a preempt
disable/enable().
However, on architectures not providing such fast primitives, such as ia64, it
turns into a local irq disable/enable so that we can use *_local primitives that
are non atomic.
This is only preliminary work: made for testing ia64 with cmpxchg_local
(other local_* primitives still use atomic_long_t operations as fallback).
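The selection described above can be modeled in user space. This is a minimal sketch under stated assumptions, not the patch itself: the *_stub functions and the two state variables are hypothetical stand-ins for the kernel's preempt and irq primitives, and ARCH_HAS_FAST_LOCAL_ATOMICS is an invented switch that only illustrates the per-architecture choice.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-ins (hypothetical) for kernel state and primitives. */
static int preempt_depth;	/* models the preempt count */
static bool irqs_off;		/* models the local interrupt mask */

static inline void preempt_disable_stub(void) { preempt_depth++; }
static inline void preempt_enable_stub(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Models local_irq_save(): remember the old mask state, then mask. */
static inline unsigned long irq_save_stub(void)
{
	unsigned long was_off = irqs_off;

	irqs_off = true;
	return was_off;
}

/* Models local_irq_restore(): put back the state saved earlier. */
static inline void irq_restore_stub(unsigned long flags)
{
	irqs_off = (flags != 0);
}

#ifdef ARCH_HAS_FAST_LOCAL_ATOMICS	/* hypothetical config switch */
/* Atomic *_local ops are cheap: staying on this CPU is enough. */
#define local_enter(flags)	do { (void)(flags); preempt_disable_stub(); } while (0)
#define local_exit(flags)	do { (void)(flags); preempt_enable_stub(); } while (0)
#else
/* Non-atomic *_local ops: interrupts must be kept out as well. */
#define local_enter(flags)	do { (flags) = irq_save_stub(); } while (0)
#define local_exit(flags)	irq_restore_stub(flags)
#endif
```

Callers write the same local_enter(flags) / local_exit(flags) pair either way; only the header decides whether that costs a preempt count update or an interrupt mask change.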
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Lameter <clameter@sgi.com>
---
include/asm-generic/local.h | 3 +++
include/asm-i386/local.h | 3 +++
include/asm-ia64/intrinsics.h | 14 ++++++++++++--
3 files changed, 18 insertions(+), 2 deletions(-)
Index: linux-2.6-lttng/include/asm-generic/local.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/local.h 2007-09-04 15:32:02.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/local.h 2007-09-04 15:36:41.000000000 -0400
@@ -46,6 +46,9 @@ typedef struct
#define local_add_unless(l, a, u) atomic_long_add_unless((&(l)->a), (a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
+#define local_enter(flags) local_irq_save(flags)
+#define local_exit(flags) local_irq_restore(flags)
+
/* Non-atomic variants, ie. preemption disabled and won't be touched
* in interrupt, etc. Some archs can optimize this case well. */
#define __local_inc(l) local_set((l), local_read(l) + 1)
Index: linux-2.6-lttng/include/asm-i386/local.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/local.h 2007-09-04 15:28:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/local.h 2007-09-04 15:31:54.000000000 -0400
@@ -194,6 +194,9 @@ static __inline__ long local_sub_return(
})
#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+#define local_enter(flags) preempt_disable()
+#define local_exit(flags) preempt_enable()
+
/* On x86, these are no better than the atomic variants. */
#define __local_inc(l) local_inc(l)
#define __local_dec(l) local_dec(l)
Index: linux-2.6-lttng/include/asm-ia64/intrinsics.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-ia64/intrinsics.h 2007-09-04 15:47:24.000000000 -0400
+++ linux-2.6-lttng/include/asm-ia64/intrinsics.h 2007-09-04 15:49:41.000000000 -0400
@@ -160,8 +160,18 @@ extern long ia64_cmpxchg_called_with_bad
#define cmpxchg(ptr,o,n) cmpxchg_acq(ptr,o,n)
#define cmpxchg64(ptr,o,n) cmpxchg_acq(ptr,o,n)
-#define cmpxchg_local cmpxchg
-#define cmpxchg64_local cmpxchg64
+/* Must be executed between local_enter/local_exit. */
+static inline void *cmpxchg_local(void **p, void *old, void *new)
+{
+ unsigned long flags;
+ void *before;
+
+ before = *p;
+ if (likely(before == old))
+ *p = new;
+ return before;
+}
+#define cmpxchg64_local cmpxchg_local
#ifdef CONFIG_IA64_DEBUG_CMPXCHG
# define CMPXCHG_BUGCHECK_DECL int _cmpxchg_bugcheck_count = 128;
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [PATCH] slub - Use local_t protection
2007-08-28 1:26 ` [PATCH] SLUB use cmpxchg_local Christoph Lameter
` (2 preceding siblings ...)
2007-09-04 20:03 ` [PATCH] local_t protection (critical section) Mathieu Desnoyers
@ 2007-09-04 20:04 ` Mathieu Desnoyers
2007-09-04 20:45 ` Christoph Lameter
3 siblings, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-04 20:04 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
slub - Use local_t protection
Use local_enter/local_exit for protection in the fast path.
Depends on the cmpxchg_local slub patch.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Lameter <clameter@sgi.com>
---
mm/slub.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
Index: linux-2.6-lttng/mm/slub.c
===================================================================
--- linux-2.6-lttng.orig/mm/slub.c 2007-09-04 15:47:20.000000000 -0400
+++ linux-2.6-lttng/mm/slub.c 2007-09-04 15:52:07.000000000 -0400
@@ -1456,7 +1456,6 @@ static void *__slab_alloc(struct kmem_ca
unsigned long flags;
local_irq_save(flags);
- put_cpu_no_resched();
if (!c->page)
/* Slab was flushed */
goto new_slab;
@@ -1480,7 +1479,6 @@ load_freelist:
out:
slab_unlock(c->page);
local_irq_restore(flags);
- preempt_check_resched();
if (unlikely((gfpflags & __GFP_ZERO)))
memset(object, 0, c->objsize);
return object;
@@ -1524,7 +1522,6 @@ new_slab:
goto load_freelist;
}
local_irq_restore(flags);
- preempt_check_resched();
return NULL;
debug:
object = c->page->freelist;
@@ -1552,8 +1549,10 @@ static void __always_inline *slab_alloc(
{
void **object;
struct kmem_cache_cpu *c;
+ unsigned long flags;
- c = get_cpu_slab(s, get_cpu());
+ local_enter(flags);
+ c = get_cpu_slab(s, smp_processor_id());
redo:
object = c->freelist;
if (unlikely(!object))
@@ -1566,12 +1565,13 @@ redo:
object[c->offset]) != object))
goto redo;
- put_cpu();
+ local_exit(flags);
if (unlikely((gfpflags & __GFP_ZERO)))
memset(object, 0, c->objsize);
return object;
slow:
+ local_exit(flags);
return __slab_alloc(s, gfpflags, node, addr, c);
}
@@ -1605,7 +1605,6 @@ static void __slab_free(struct kmem_cach
void **object = (void *)x;
unsigned long flags;
- put_cpu();
local_irq_save(flags);
slab_lock(page);
@@ -1670,10 +1669,12 @@ static void __always_inline slab_free(st
void **object = (void *)x;
void **freelist;
struct kmem_cache_cpu *c;
+ unsigned long flags;
debug_check_no_locks_freed(object, s->objsize);
- c = get_cpu_slab(s, get_cpu());
+ local_enter(flags);
+ c = get_cpu_slab(s, smp_processor_id());
if (unlikely(c->node < 0))
goto slow;
redo:
@@ -1687,9 +1688,10 @@ redo:
!= freelist))
goto redo;
- put_cpu();
+ local_exit(flags);
return;
slow:
+ local_exit(flags);
__slab_free(s, page, x, addr, c->offset);
}
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [PATCH] slub - Use local_t protection
2007-09-04 20:04 ` [PATCH] slub - Use local_t protection Mathieu Desnoyers
@ 2007-09-04 20:45 ` Christoph Lameter
2007-09-05 13:03 ` Mathieu Desnoyers
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-09-04 20:45 UTC
To: Mathieu Desnoyers; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
On Tue, 4 Sep 2007, Mathieu Desnoyers wrote:
> @@ -1566,12 +1565,13 @@ redo:
> object[c->offset]) != object))
> goto redo;
>
> - put_cpu();
> + local_exit(flags);
> if (unlikely((gfpflags & __GFP_ZERO)))
> memset(object, 0, c->objsize);
>
> return object;
> slow:
> + local_exit(flags);
Here we can be rescheduled to another processor.
> return __slab_alloc(s, gfpflags, node, addr, c)
c may point to the wrong processor.
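The hazard can be modeled in user space. This is a sketch under stated assumptions: cpu_slab_sketch, get_cpu_slab_sketch(), migrate_to() and slab_is_local() are hypothetical names, and a plain integer stands in for smp_processor_id(). It only shows why a per-CPU pointer taken inside the protected section is stale once protection is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 2

struct cpu_slab_sketch {
	int cpu;	/* CPU this per-CPU structure belongs to */
};

static struct cpu_slab_sketch cpu_slabs[NR_CPUS_SKETCH] = { { 0 }, { 1 } };
static int current_cpu;	/* stand-in for smp_processor_id() */

/* Look up the structure of the CPU we are running on right now. */
static inline struct cpu_slab_sketch *get_cpu_slab_sketch(void)
{
	return &cpu_slabs[current_cpu];
}

/* Models the scheduler migrating the task while it is unprotected. */
static inline void migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	current_cpu = cpu;
}

/* True if `c` still describes the CPU the task is running on. */
static inline int slab_is_local(const struct cpu_slab_sketch *c)
{
	assert(c != NULL);
	return c->cpu == current_cpu;
}
```

A pointer obtained before migrate_to(1) fails slab_is_local() afterwards, and fetching it again with get_cpu_slab_sketch() makes it valid. In the patch, that means the slow path has to look up the per-CPU structure again after it has re-established its own protection.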
* Re: [PATCH] slub - Use local_t protection
2007-09-04 20:45 ` Christoph Lameter
@ 2007-09-05 13:03 ` Mathieu Desnoyers
2007-09-05 13:04 ` [PATCH] local_t protection (critical section) Mathieu Desnoyers
2007-09-05 13:06 ` [PATCH] slub - Use local_t protection Mathieu Desnoyers
2 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-05 13:03 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
* Christoph Lameter (clameter@sgi.com) wrote:
> On Tue, 4 Sep 2007, Mathieu Desnoyers wrote:
>
> > @@ -1566,12 +1565,13 @@ redo:
> > object[c->offset]) != object))
> > goto redo;
> >
> > - put_cpu();
> > + local_exit(flags);
> > if (unlikely((gfpflags & __GFP_ZERO)))
> > memset(object, 0, c->objsize);
> >
> > return object;
> > slow:
> > + local_exit(flags);
>
> Here we can be rescheduled to another processor.
>
> > return __slab_alloc(s, gfpflags, node, addr, c)
>
> c may point to the wrong processor.
Good point. The current CPU is not updated at the beginning of the
slow path.
I'll post the updated patchset. Comments are welcome, especially about
the naming scheme which is currently awkward.
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* [PATCH] local_t protection (critical section)
2007-09-04 20:45 ` Christoph Lameter
2007-09-05 13:03 ` Mathieu Desnoyers
@ 2007-09-05 13:04 ` Mathieu Desnoyers
2007-09-12 22:33 ` Christoph Lameter
2007-09-05 13:06 ` [PATCH] slub - Use local_t protection Mathieu Desnoyers
2 siblings, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-05 13:04 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
local_t protection (critical section)
Adds local_enter_save(flags) and local_exit_restore(flags) as primitives to
surround critical sections using local_t types.
On architectures providing fast atomic primitives, this turns into a preempt
disable/enable().
However, on architectures not providing such fast primitives, such as ia64, it
turns into a local irq disable/enable so that we can use *_local primitives that
are non atomic.
This is only preliminary work: made for testing ia64 with cmpxchg_local
(other local_* primitives still use atomic_long_t operations as fallback).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Lameter <clameter@sgi.com>
---
include/asm-generic/local.h | 9 +++++++++
include/asm-i386/local.h | 17 +++++++++++++++++
include/asm-ia64/intrinsics.h | 14 ++++++++++++--
3 files changed, 38 insertions(+), 2 deletions(-)
Index: linux-2.6-lttng/include/asm-generic/local.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/local.h 2007-09-04 15:32:02.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/local.h 2007-09-05 08:50:47.000000000 -0400
@@ -46,6 +46,15 @@ typedef struct
#define local_add_unless(l, a, u) atomic_long_add_unless((&(l)->a), (a), (u))
#define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
+#define local_enter_save(flags) local_irq_save(flags)
+#define local_exit_restore(flags) local_irq_restore(flags)
+#define local_enter() local_irq_disable()
+#define local_exit() local_irq_enable()
+#define local_nest_irq_save(flags) (flags)
+#define local_nest_irq_restore(flags) (flags)
+#define local_nest_irq_disable()
+#define local_nest_irq_enable()
+
/* Non-atomic variants, ie. preemption disabled and won't be touched
* in interrupt, etc. Some archs can optimize this case well. */
#define __local_inc(l) local_set((l), local_read(l) + 1)
Index: linux-2.6-lttng/include/asm-i386/local.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/local.h 2007-09-04 15:28:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/local.h 2007-09-05 08:49:19.000000000 -0400
@@ -194,6 +194,23 @@ static __inline__ long local_sub_return(
})
#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
+#define local_enter_save(flags) \
+ do { \
+ (flags); \
+ preempt_disable(); \
+ } while (0)
+#define local_exit_restore(flags) \
+ do { \
+ (flags); \
+ preempt_enable(); \
+ } while (0)
+#define local_enter() preempt_disable()
+#define local_exit() preempt_enable()
+#define local_nest_irq_save(flags) local_irq_save(flags)
+#define local_nest_irq_restore(flags) local_irq_restore(flags)
+#define local_nest_irq_disable() local_irq_disable()
+#define local_nest_irq_enable() local_irq_enable()
+
/* On x86, these are no better than the atomic variants. */
#define __local_inc(l) local_inc(l)
#define __local_dec(l) local_dec(l)
Index: linux-2.6-lttng/include/asm-ia64/intrinsics.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-ia64/intrinsics.h 2007-09-04 15:47:24.000000000 -0400
+++ linux-2.6-lttng/include/asm-ia64/intrinsics.h 2007-09-04 15:49:41.000000000 -0400
@@ -160,8 +160,18 @@ extern long ia64_cmpxchg_called_with_bad
#define cmpxchg(ptr,o,n) cmpxchg_acq(ptr,o,n)
#define cmpxchg64(ptr,o,n) cmpxchg_acq(ptr,o,n)
-#define cmpxchg_local cmpxchg
-#define cmpxchg64_local cmpxchg64
+/* Must be executed between local_enter/local_exit. */
+static inline void *cmpxchg_local(void **p, void *old, void *new)
+{
+ unsigned long flags;
+ void *before;
+
+ before = *p;
+ if (likely(before == old))
+ *p = new;
+ return before;
+}
+#define cmpxchg64_local cmpxchg_local
#ifdef CONFIG_IA64_DEBUG_CMPXCHG
# define CMPXCHG_BUGCHECK_DECL int _cmpxchg_bugcheck_count = 128;
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [PATCH] local_t protection (critical section)
2007-09-05 13:04 ` [PATCH] local_t protection (critical section) Mathieu Desnoyers
@ 2007-09-12 22:33 ` Christoph Lameter
2007-09-12 23:00 ` Mathieu Desnoyers
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Lameter @ 2007-09-12 22:33 UTC
To: Mathieu Desnoyers; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
On Wed, 5 Sep 2007, Mathieu Desnoyers wrote:
> Index: linux-2.6-lttng/include/asm-generic/local.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/asm-generic/local.h 2007-09-04 15:32:02.000000000 -0400
> +++ linux-2.6-lttng/include/asm-generic/local.h 2007-09-05 08:50:47.000000000 -0400
> @@ -46,6 +46,15 @@ typedef struct
> #define local_add_unless(l, a, u) atomic_long_add_unless((&(l)->a), (a), (u))
> #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
>
> +#define local_enter_save(flags) local_irq_save(flags)
> +#define local_exit_restore(flags) local_irq_restore(flags)
> +#define local_enter() local_irq_disable()
> +#define local_exit() local_irq_enable()
> +#define local_nest_irq_save(flags) (flags)
> +#define local_nest_irq_restore(flags) (flags)
> +#define local_nest_irq_disable()
> +#define local_nest_irq_enable()
> +
This list is going to increase with RT support in SLUB? Argh.
> Index: linux-2.6-lttng/include/asm-i386/local.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/asm-i386/local.h 2007-09-04 15:28:52.000000000 -0400
> +++ linux-2.6-lttng/include/asm-i386/local.h 2007-09-05 08:49:19.000000000 -0400
> @@ -194,6 +194,23 @@ static __inline__ long local_sub_return(
> })
> #define local_inc_not_zero(l) local_add_unless((l), 1, 0)
>
> +#define local_enter_save(flags) \
> + do { \
> + (flags); \
> + preempt_disable(); \
> + } while (0)
> +#define local_exit_restore(flags) \
> + do { \
> + (flags); \
> + preempt_enable(); \
> + } while (0)
Does this not result in warnings about a variable being unused or used
uninitialized?
* Re: [PATCH] local_t protection (critical section)
2007-09-12 22:33 ` Christoph Lameter
@ 2007-09-12 23:00 ` Mathieu Desnoyers
0 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-12 23:00 UTC
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
* Christoph Lameter (clameter@sgi.com) wrote:
> On Wed, 5 Sep 2007, Mathieu Desnoyers wrote:
>
> > Index: linux-2.6-lttng/include/asm-generic/local.h
> > ===================================================================
> > --- linux-2.6-lttng.orig/include/asm-generic/local.h 2007-09-04 15:32:02.000000000 -0400
> > +++ linux-2.6-lttng/include/asm-generic/local.h 2007-09-05 08:50:47.000000000 -0400
> > @@ -46,6 +46,15 @@ typedef struct
> > #define local_add_unless(l, a, u) atomic_long_add_unless((&(l)->a), (a), (u))
> > #define local_inc_not_zero(l) atomic_long_inc_not_zero(&(l)->a)
> >
> > +#define local_enter_save(flags) local_irq_save(flags)
> > +#define local_exit_restore(flags) local_irq_restore(flags)
> > +#define local_enter() local_irq_disable()
> > +#define local_exit() local_irq_enable()
> > +#define local_nest_irq_save(flags) (flags)
> > +#define local_nest_irq_restore(flags) (flags)
> > +#define local_nest_irq_disable()
> > +#define local_nest_irq_enable()
> > +
>
> This list is going to increase with RT support in SLUB? Argh.
>
AFAIK, there is no difference between local irq save/restore in mainline
vs. -RT. The same applies to preempt disable/enable.
The only thing we have to make sure of is that the irq disable and
preempt disable code paths are short and O(1).
>
> > Index: linux-2.6-lttng/include/asm-i386/local.h
> > ===================================================================
> > --- linux-2.6-lttng.orig/include/asm-i386/local.h 2007-09-04 15:28:52.000000000 -0400
> > +++ linux-2.6-lttng/include/asm-i386/local.h 2007-09-05 08:49:19.000000000 -0400
> > @@ -194,6 +194,23 @@ static __inline__ long local_sub_return(
> > })
> > #define local_inc_not_zero(l) local_add_unless((l), 1, 0)
> >
> > +#define local_enter_save(flags) \
> > + do { \
> > + (flags); \
> > + preempt_disable(); \
> > + } while (0)
>
>
> > +#define local_exit_restore(flags) \
> > + do { \
> > + (flags); \
> > + preempt_enable(); \
> > + } while (0)
>
>
> Does this not result in warnings about a variable being unused or used
> uninitialized?
No. The "(flags)" statement is there precisely because, without it, the
variable would not be used at all and gcc would warn about it.
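
As an illustration of that point, here is a small userspace sketch of a macro
that takes a flags argument it does not need. It uses the "(void)(flags)"
spelling, which is a variant of the bare "(flags);" statement in the patch and
is chosen here only because it also avoids a "statement with no effect"
diagnostic. sim_preempt_count and sim_depth_inside_section() are hypothetical
names, not kernel symbols.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel preempt counter. */
static int sim_preempt_count;

/*
 * The flags argument is not needed by this flavor, but callers still
 * declare it. Evaluating it as a void expression counts as a use, so the
 * compiler has no reason to report the caller's variable as unused.
 */
#define local_enter_save(flags)		\
	do {				\
		(void)(flags);		\
		sim_preempt_count++;	\
	} while (0)

#define local_exit_restore(flags)	\
	do {				\
		(void)(flags);		\
		sim_preempt_count--;	\
	} while (0)

/* Returns the nesting depth seen inside the critical section. */
static int sim_depth_inside_section(void)
{
	unsigned long flags = 0;	/* initialized only to keep the sketch warning-free */
	int depth;

	local_enter_save(flags);
	depth = sim_preempt_count;
	local_exit_restore(flags);

	assert(sim_preempt_count >= 0);
	return depth;
}
```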
I'm glad that some of the proposed changes may help. I'll let the
cmpxchg_local patches sleep for a while so I can concentrate my efforts
on text edit lock, immediate values and markers. I think what we'll
really need for the cmpxchg_local is two flavors: one that is as atomic
as possible (for things such as tracing), and the other one the fastest
possible (potentially using irq disable). A lot of per architecture
testing/fine tuning will be required though, and I don't have the
hardware to do this.
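
A rough userspace sketch of the two flavors, under stated assumptions: the
function names are made up for illustration, C11 atomics with relaxed ordering
stand in for an ordering-free hardware cmpxchg, and sim_irqs_enabled is a
hypothetical stand-in for the CPU interrupt flag. The second flavor is only
safe against whatever the simulated irq disable masks, which is why it would
not be reentrant with respect to NMIs, as noted earlier in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag. */
static bool sim_irqs_enabled = true;

/*
 * Flavor 1: "as atomic as possible". A single atomic compare-and-exchange
 * with relaxed ordering, i.e. no acquire or release semantics requested.
 * Returns the value found in *ptr; the swap happened iff it equals old.
 */
static long cmpxchg_local_atomic(_Atomic long *ptr, long old, long new_val)
{
	long expected = old;

	atomic_compare_exchange_strong_explicit(ptr, &expected, new_val,
						memory_order_relaxed,
						memory_order_relaxed);
	return expected;
}

/*
 * Flavor 2: "fastest possible" on hardware where cmpxchg is slow. A plain
 * load/compare/store sequence made safe against (maskable) interrupts by
 * disabling them around it. Nothing here protects against an NMI.
 */
static long cmpxchg_local_irqoff(long *ptr, long old, long new_val)
{
	bool saved = sim_irqs_enabled;
	long prev;

	sim_irqs_enabled = false;	/* simulated local_irq_save() */
	prev = *ptr;
	if (prev == old)
		*ptr = new_val;
	sim_irqs_enabled = saved;	/* simulated local_irq_restore() */

	assert(sim_irqs_enabled == saved);
	return prev;
}
```

Both functions share the same contract (return the previous value), so a
caller's retry loop would look identical regardless of which flavor an
architecture selects.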
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [PATCH] slub - Use local_t protection
2007-09-04 20:45 ` Christoph Lameter
2007-09-05 13:03 ` Mathieu Desnoyers
2007-09-05 13:04 ` [PATCH] local_t protection (critical section) Mathieu Desnoyers
@ 2007-09-05 13:06 ` Mathieu Desnoyers
2007-09-12 22:28 ` Christoph Lameter
2 siblings, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2007-09-05 13:06 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
slub - Use local_t protection
Use local_enter/local_exit for protection in the fast path.
Depends on the cmpxchg_local slub patch.
Changelog:
Add new primitives to switch from local critical section to interrupt disabled
section.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Christoph Lameter <clameter@sgi.com>
---
mm/slub.c | 55 ++++++++++++++++++++++++++++++++-----------------------
1 file changed, 32 insertions(+), 23 deletions(-)
Index: linux-2.6-lttng/mm/slub.c
=================================
--- linux-2.6-lttng.orig/mm/slub.c 2007-09-05 09:01:00.000000000 -0400
+++ linux-2.6-lttng/mm/slub.c 2007-09-05 09:05:34.000000000 -0400
@@ -1065,9 +1065,6 @@ static struct page *new_slab(struct kmem
BUG_ON(flags & GFP_SLAB_BUG_MASK);
- if (flags & __GFP_WAIT)
- local_irq_enable();
-
page = allocate_slab(s,
flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
if (!page)
@@ -1100,8 +1097,6 @@ static struct page *new_slab(struct kmem
page->freelist = start;
page->inuse = 0;
out:
- if (flags & __GFP_WAIT)
- local_irq_disable();
return page;
}
@@ -1455,8 +1450,7 @@ static void *__slab_alloc(struct kmem_ca
struct page *new;
unsigned long flags;
- local_irq_save(flags);
- put_cpu_no_resched();
+ local_nest_irq_save(flags);
if (!c->page)
/* Slab was flushed */
goto new_slab;
@@ -1479,8 +1473,7 @@ load_freelist:
c->freelist = object[c->offset];
out:
slab_unlock(c->page);
- local_irq_restore(flags);
- preempt_check_resched();
+ local_nest_irq_restore(flags);
if (unlikely((gfpflags & __GFP_ZERO)))
memset(object, 0, c->objsize);
return object;
@@ -1494,8 +1487,16 @@ new_slab:
c->page = new;
goto load_freelist;
}
-
+ if (gfpflags & __GFP_WAIT) {
+ local_nest_irq_enable();
+ local_exit();
+ }
new = new_slab(s, gfpflags, node);
+ if (gfpflags & __GFP_WAIT) {
+ local_enter();
+ local_nest_irq_disable();
+ }
+
if (new) {
c = get_cpu_slab(s, smp_processor_id());
if (c->page) {
@@ -1523,8 +1524,7 @@ new_slab:
c->page = new;
goto load_freelist;
}
- local_irq_restore(flags);
- preempt_check_resched();
+ local_nest_irq_restore(flags);
return NULL;
debug:
object = c->page->freelist;
@@ -1552,8 +1552,11 @@ static void __always_inline *slab_alloc(
{
void **object;
struct kmem_cache_cpu *c;
+ unsigned long flags;
+ void *ret;
- c = get_cpu_slab(s, get_cpu());
+ local_enter_save(flags);
+ c = get_cpu_slab(s, smp_processor_id());
redo:
object = c->freelist;
if (unlikely(!object))
@@ -1566,14 +1569,15 @@ redo:
object[c->offset]) != object))
goto redo;
- put_cpu();
+ local_exit_restore(flags);
if (unlikely((gfpflags & __GFP_ZERO)))
memset(object, 0, c->objsize);
return object;
slow:
- return __slab_alloc(s, gfpflags, node, addr, c);
-
+ ret = __slab_alloc(s, gfpflags, node, addr, c);
+ local_exit_restore(flags);
+ return ret;
}
void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
@@ -1605,8 +1609,7 @@ static void __slab_free(struct kmem_cach
void **object = (void *)x;
unsigned long flags;
- put_cpu();
- local_irq_save(flags);
+ local_nest_irq_save(flags);
slab_lock(page);
if (unlikely(SlabDebug(page)))
@@ -1632,7 +1635,7 @@ checks_ok:
out_unlock:
slab_unlock(page);
- local_irq_restore(flags);
+ local_nest_irq_restore(flags);
return;
slab_empty:
@@ -1643,7 +1646,7 @@ slab_empty:
remove_partial(s, page);
slab_unlock(page);
- local_irq_restore(flags);
+ local_nest_irq_restore(flags);
discard_slab(s, page);
return;
@@ -1670,10 +1673,12 @@ static void __always_inline slab_free(st
void **object = (void *)x;
void **freelist;
struct kmem_cache_cpu *c;
+ unsigned long flags;
debug_check_no_locks_freed(object, s->objsize);
- c = get_cpu_slab(s, get_cpu());
+ local_enter_save(flags);
+ c = get_cpu_slab(s, smp_processor_id());
if (unlikely(c->node < 0))
goto slow;
redo:
@@ -1687,10 +1692,11 @@ redo:
!= freelist))
goto redo;
- put_cpu();
+ local_exit_restore(flags);
return;
slow:
__slab_free(s, page, x, addr, c->offset);
+ local_exit_restore(flags);
}
void kmem_cache_free(struct kmem_cache *s, void *x)
@@ -2026,8 +2032,11 @@ static struct kmem_cache_node *early_kme
BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
+ if (gfpflags & __GFP_WAIT)
+ local_irq_enable();
page = new_slab(kmalloc_caches, gfpflags, node);
-
+ if (gfpflags & __GFP_WAIT)
+ local_irq_disable();
BUG_ON(!page);
if (page_to_nid(page) != node) {
printk(KERN_ERR "SLUB: Unable to allocate memory from "
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [PATCH] slub - Use local_t protection
2007-09-05 13:06 ` [PATCH] slub - Use local_t protection Mathieu Desnoyers
@ 2007-09-12 22:28 ` Christoph Lameter
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-09-12 22:28 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Peter Zijlstra, akpm, linux-kernel, mingo, linux-ia64
On Wed, 5 Sep 2007, Mathieu Desnoyers wrote:
> Use local_enter/local_exit for protection in the fast path.
Sorry that it took some time to get back to this issue. KS interfered.
> @@ -1494,8 +1487,16 @@ new_slab:
> c->page = new;
> goto load_freelist;
> }
> -
> + if (gfpflags & __GFP_WAIT) {
> + local_nest_irq_enable();
> + local_exit();
> + }
> new = new_slab(s, gfpflags, node);
> + if (gfpflags & __GFP_WAIT) {
> + local_enter();
> + local_nest_irq_disable();
> + }
> +
> if (new) {
> c = get_cpu_slab(s, smp_processor_id());
> if (c->page) {
Hmmmm... Definitely an interesting change to move the interrupt
enable/disable to __slab_alloc. But it looks like it is getting a bit
messy. All my attempts also ended like this. Sigh.
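
The pattern in the quoted hunk can be sketched in userspace as follows. This
is a simplified model, not the SLUB code: all sim_* names are hypothetical,
task migration is simulated by bumping a CPU index, and the handling of an
already-populated per-cpu slab is reduced to leaving it alone. The point it
tries to show is why the per-cpu pointer has to be re-read after the
protection was dropped around a possibly sleeping allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS 2

/* Hypothetical, stripped-down per-cpu slab descriptor. */
struct sim_cpu_slab {
	void *page;
};

static struct sim_cpu_slab sim_cpu_slabs[SIM_NR_CPUS];
static int sim_current_cpu;
static bool sim_in_critical_section;
static char sim_page_storage[64];

/*
 * Stand-in for new_slab(). A waiting allocation may sleep, so it must not
 * run inside the critical section, and the task may resume on another CPU.
 */
static void *sim_new_slab(bool may_wait)
{
	if (may_wait) {
		assert(!sim_in_critical_section);
		sim_current_cpu = (sim_current_cpu + 1) % SIM_NR_CPUS;
	}
	return sim_page_storage;
}

/*
 * Slow-path refill. Entered inside the critical section with c pointing at
 * the current CPU's descriptor. Returns the descriptor that is valid after
 * the allocation, which is not necessarily the one passed in.
 */
static struct sim_cpu_slab *sim_slow_path_refill(struct sim_cpu_slab *c,
						 bool may_wait)
{
	void *new_page;

	if (c->page)
		return c;			/* nothing to refill */

	if (may_wait)
		sim_in_critical_section = false; /* nest_irq_enable + local_exit */
	new_page = sim_new_slab(may_wait);
	if (may_wait)
		sim_in_critical_section = true;	 /* local_enter + nest_irq_disable */

	/* The old pointer may belong to another CPU now: fetch it again. */
	c = &sim_cpu_slabs[sim_current_cpu];
	if (new_page && !c->page)
		c->page = new_page;
	return c;
}
```

With a non-waiting allocation the section is never left, so no migration can
happen and the re-read returns the same descriptor.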
> @@ -2026,8 +2032,11 @@ static struct kmem_cache_node *early_kme
>
> BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>
> + if (gfpflags & __GFP_WAIT)
> + local_irq_enable();
> page = new_slab(kmalloc_caches, gfpflags, node);
> -
> + if (gfpflags & __GFP_WAIT)
> + local_irq_disable();
> BUG_ON(!page);
> if (page_to_nid(page) != node) {
> printk(KERN_ERR "SLUB: Unable to allocate memory from "
Hmmmm... Actually we could drop the irq disable / enable here since this
is boot code. That would also allow the removal of the later
local_irq_enable.
Good idea. I think I will do the moving of the interrupt enable/disable
independently.