public inbox for linux-kernel@vger.kernel.org
* [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
@ 2009-11-25 18:21 Pekka Enberg
  2009-11-25 18:27 ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Pekka Enberg @ 2009-11-25 18:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: Pekka Enberg, Christoph Lameter

The unlikely() annotation in slab_alloc() covers too much of the expression.
It's actually very likely that the object is not NULL, so apply unlikely() only
to the __GFP_ZERO test, as SLAB does.

The patch reduces kernel text by 29 bytes on x86-64:

   text	   data	    bss	    dec	    hex	filename
  24185	   8560	    176	  32921	   8099	mm/slub.o.orig
  24156	   8560	    176	  32892	   807c	mm/slub.o

Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4996fc7..0956396 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1735,7 +1735,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
 	}
 	local_irq_restore(flags);
 
-	if (unlikely((gfpflags & __GFP_ZERO) && object))
+	if (unlikely(gfpflags & __GFP_ZERO) && object)
 		memset(object, 0, objsize);
 
 	kmemcheck_slab_alloc(s, gfpflags, object, c->objsize);
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
  2009-11-25 18:21 [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation Pekka Enberg
@ 2009-11-25 18:27 ` Joe Perches
  2009-11-25 18:30   ` Pekka Enberg
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2009-11-25 18:27 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, Christoph Lameter

On Wed, 2009-11-25 at 20:21 +0200, Pekka Enberg wrote:
> The unlikely() annotation in slab_alloc() covers too much of the expression.
> It's actually very likely that the object is not NULL, so apply unlikely() only
> to the __GFP_ZERO test, as SLAB does.
[]
> +++ b/mm/slub.c
> @@ -1735,7 +1735,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
> -	if (unlikely((gfpflags & __GFP_ZERO) && object))
> +	if (unlikely(gfpflags & __GFP_ZERO) && object)
>  		memset(object, 0, objsize);

so why not use

	if (unlikely(gfpflags & __GFP_ZERO) && likely(object))




* Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
  2009-11-25 18:27 ` Joe Perches
@ 2009-11-25 18:30   ` Pekka Enberg
  2009-11-25 20:15     ` Christoph Lameter
  0 siblings, 1 reply; 7+ messages in thread
From: Pekka Enberg @ 2009-11-25 18:30 UTC (permalink / raw)
  To: Joe Perches; +Cc: linux-kernel, Christoph Lameter

Joe Perches wrote:
> On Wed, 2009-11-25 at 20:21 +0200, Pekka Enberg wrote:
>> The unlikely() annotation in slab_alloc() covers too much of the expression.
>> It's actually very likely that the object is not NULL, so apply unlikely() only
>> to the __GFP_ZERO test, as SLAB does.
> []
>> +++ b/mm/slub.c
>> @@ -1735,7 +1735,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
>> -	if (unlikely((gfpflags & __GFP_ZERO) && object))
>> +	if (unlikely(gfpflags & __GFP_ZERO) && object)
>>  		memset(object, 0, objsize);
> 
> so why not use
> 
> 	if (unlikely(gfpflags & __GFP_ZERO) && likely(object))

Because that has no effect on the generated code.


* Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
  2009-11-25 18:30   ` Pekka Enberg
@ 2009-11-25 20:15     ` Christoph Lameter
  2009-11-25 20:19       ` Pekka Enberg
  2009-11-26 20:27       ` Pekka Enberg
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Lameter @ 2009-11-25 20:15 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Joe Perches, linux-kernel

On Wed, 25 Nov 2009, Pekka Enberg wrote:

> Because that has no effect on the generated code.

Does this affect the cycle count in slab_alloc?



* Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
  2009-11-25 20:15     ` Christoph Lameter
@ 2009-11-25 20:19       ` Pekka Enberg
  2009-11-26 20:27       ` Pekka Enberg
  1 sibling, 0 replies; 7+ messages in thread
From: Pekka Enberg @ 2009-11-25 20:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Joe Perches, linux-kernel

On Wed, Nov 25, 2009 at 10:15 PM, Christoph Lameter
<cl@linux-foundation.org> wrote:
> On Wed, 25 Nov 2009, Pekka Enberg wrote:
>
>> Because that has no effect on the generated code.
>
> Does this affect the cycle count in slab_alloc?

Joe's suggestion doesn't change the generated code, my patch does and
probably changes cycle count. Let me see if I can run the in-kernel
slab tests.


* Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
  2009-11-25 20:15     ` Christoph Lameter
  2009-11-25 20:19       ` Pekka Enberg
@ 2009-11-26 20:27       ` Pekka Enberg
  2009-11-27 19:26         ` Christoph Lameter
  1 sibling, 1 reply; 7+ messages in thread
From: Pekka Enberg @ 2009-11-26 20:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Joe Perches, linux-kernel

On Wed, Nov 25, 2009 at 10:15 PM, Christoph Lameter
<cl@linux-foundation.org> wrote:
> On Wed, 25 Nov 2009, Pekka Enberg wrote:
>
>> Because that has no effect on the generated code.
>
> Does this affect the cycle count in slab_alloc?

I ran the synthetic benchmarks here and cycle count is roughly the
same for both versions.

SLUB 2.6.32-rc8:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  10000 times kmalloc(8) -> 118 cycles kfree -> 158 cycles
  10000 times kmalloc(16) -> 129 cycles kfree -> 162 cycles
  10000 times kmalloc(32) -> 136 cycles kfree -> 149 cycles
  10000 times kmalloc(64) -> 148 cycles kfree -> 168 cycles
  10000 times kmalloc(128) -> 189 cycles kfree -> 189 cycles
  10000 times kmalloc(256) -> 217 cycles kfree -> 242 cycles
  10000 times kmalloc(512) -> 254 cycles kfree -> 377 cycles
  10000 times kmalloc(1024) -> 284 cycles kfree -> 468 cycles
  10000 times kmalloc(2048) -> 304 cycles kfree -> 509 cycles
  10000 times kmalloc(4096) -> 388 cycles kfree -> 567 cycles
  10000 times kmalloc(8192) -> 523 cycles kfree -> 658 cycles
  10000 times kmalloc(16384) -> 790 cycles kfree -> 843 cycles
  2. Kmalloc: alloc/free test
  10000 times kmalloc(8)/kfree -> 199 cycles
  10000 times kmalloc(16)/kfree -> 197 cycles
  10000 times kmalloc(32)/kfree -> 197 cycles
  10000 times kmalloc(64)/kfree -> 196 cycles
  10000 times kmalloc(128)/kfree -> 197 cycles
  10000 times kmalloc(256)/kfree -> 197 cycles
  10000 times kmalloc(512)/kfree -> 210 cycles
  10000 times kmalloc(1024)/kfree -> 201 cycles
  10000 times kmalloc(2048)/kfree -> 186 cycles
  10000 times kmalloc(4096)/kfree -> 204 cycles
  10000 times kmalloc(8192)/kfree -> 202 cycles
  10000 times kmalloc(16384)/kfree -> 871 cycles
  Concurrent allocs
  =================
  Kmalloc N*alloc N*free(8): 0=160/182 1=160/179 Average=160/180
  Kmalloc N*alloc N*free(16): 0=164/189 1=163/186 Average=164/188
  Kmalloc N*alloc N*free(32): 0=177/194 1=176/192 Average=177/193
  Kmalloc N*alloc N*free(64): 0=206/211 1=202/209 Average=204/210
  Kmalloc N*alloc N*free(128): 0=238/283 1=238/288 Average=238/285
  Kmalloc N*alloc N*free(256): 0=286/465 1=282/462 Average=284/463
  Kmalloc N*alloc N*free(512): 0=385/604 1=382/600 Average=384/602
  Kmalloc N*alloc N*free(1024): 0=466/606 1=405/604 Average=436/605
  Kmalloc N*alloc N*free(2048): 0=417/650 1=415/659 Average=416/654
  Kmalloc N*alloc N*free(4096): 0=574/838 1=574/827 Average=574/833
  Kmalloc N*(alloc free)(8): 0=222 1=219 Average=221
  Kmalloc N*(alloc free)(16): 0=222 1=220 Average=221
  Kmalloc N*(alloc free)(32): 0=222 1=219 Average=221
  Kmalloc N*(alloc free)(64): 0=185 1=184 Average=184
  Kmalloc N*(alloc free)(128): 0=222 1=219 Average=221
  Kmalloc N*(alloc free)(256): 0=224 1=221 Average=223
  Kmalloc N*(alloc free)(512): 0=225 1=221 Average=223
  Kmalloc N*(alloc free)(1024): 0=225 1=223 Average=224
  Kmalloc N*(alloc free)(2048): 0=225 1=223 Average=224
  Kmalloc N*(alloc free)(4096): 0=225 1=222 Average=224
  Remote free test
  ================
  N*remote free(8): 0=8/187 1=161/0 Average=84/93
  N*remote free(16): 0=8/189 1=163/0 Average=86/94
  N*remote free(32): 0=8/198 1=177/0 Average=92/99
  N*remote free(64): 0=8/197 1=210/0 Average=109/98
  N*remote free(128): 0=8/214 1=249/0 Average=129/107
  N*remote free(256): 0=8/273 1=267/0 Average=137/136
  N*remote free(512): 0=8/380 1=283/0 Average=145/190
  N*remote free(1024): 0=8/466 1=300/0 Average=154/233
  N*remote free(2048): 0=8/486 1=325/0 Average=166/243
  N*remote free(4096): 0=8/564 1=420/0 Average=214/282
  1 alloc N free test
  ===================
  1 alloc N free(8): 0=211 1=152 Average=181
  1 alloc N free(16): 0=217 1=155 Average=186
  1 alloc N free(32): 0=243 1=165 Average=204
  1 alloc N free(64): 0=352 1=233 Average=293
  1 alloc N free(128): 0=356 1=232 Average=294
  1 alloc N free(256): 0=448 1=296 Average=372
  1 alloc N free(512): 0=468 1=308 Average=388
  1 alloc N free(1024): 0=450 1=282 Average=366
  1 alloc N free(2048): 0=391 1=218 Average=305
  1 alloc N free(4096): 0=508 1=286 Average=397


SLUB 2.6.32-rc8 + GFP_ZERO fix:

  Single thread testing
  =====================
  1. Kmalloc: Repeatedly allocate then free test
  10000 times kmalloc(8) -> 114 cycles kfree -> 146 cycles
  10000 times kmalloc(16) -> 131 cycles kfree -> 161 cycles
  10000 times kmalloc(32) -> 139 cycles kfree -> 186 cycles
  10000 times kmalloc(64) -> 162 cycles kfree -> 168 cycles
  10000 times kmalloc(128) -> 193 cycles kfree -> 190 cycles
  10000 times kmalloc(256) -> 218 cycles kfree -> 246 cycles
  10000 times kmalloc(512) -> 254 cycles kfree -> 390 cycles
  10000 times kmalloc(1024) -> 276 cycles kfree -> 450 cycles
  10000 times kmalloc(2048) -> 284 cycles kfree -> 492 cycles
  10000 times kmalloc(4096) -> 406 cycles kfree -> 536 cycles
  10000 times kmalloc(8192) -> 495 cycles kfree -> 630 cycles
  10000 times kmalloc(16384) -> 797 cycles kfree -> 788 cycles
  2. Kmalloc: alloc/free test
  10000 times kmalloc(8)/kfree -> 197 cycles
  10000 times kmalloc(16)/kfree -> 224 cycles
  10000 times kmalloc(32)/kfree -> 199 cycles
  10000 times kmalloc(64)/kfree -> 199 cycles
  10000 times kmalloc(128)/kfree -> 199 cycles
  10000 times kmalloc(256)/kfree -> 199 cycles
  10000 times kmalloc(512)/kfree -> 199 cycles
  10000 times kmalloc(1024)/kfree -> 199 cycles
  10000 times kmalloc(2048)/kfree -> 178 cycles
  10000 times kmalloc(4096)/kfree -> 208 cycles
  10000 times kmalloc(8192)/kfree -> 233 cycles
  10000 times kmalloc(16384)/kfree -> 875 cycles
  Concurrent allocs
  =================
  Kmalloc N*alloc N*free(8): 0=157/182 1=157/178 Average=157/180
  Kmalloc N*alloc N*free(16): 0=161/188 1=161/184 Average=161/186
  Kmalloc N*alloc N*free(32): 0=173/194 1=174/192 Average=173/193
  Kmalloc N*alloc N*free(64): 0=201/212 1=197/211 Average=199/211
  Kmalloc N*alloc N*free(128): 0=233/281 1=233/283 Average=233/282
  Kmalloc N*alloc N*free(256): 0=279/446 1=283/447 Average=281/446
  Kmalloc N*alloc N*free(512): 0=374/565 1=372/570 Average=373/567
  Kmalloc N*alloc N*free(1024): 0=394/591 1=391/593 Average=393/592
  Kmalloc N*alloc N*free(2048): 0=407/621 1=404/629 Average=405/625
  Kmalloc N*alloc N*free(4096): 0=559/786 1=558/788 Average=559/787
  Kmalloc N*(alloc free)(8): 0=216 1=214 Average=215
  Kmalloc N*(alloc free)(16): 0=216 1=214 Average=215
  Kmalloc N*(alloc free)(32): 0=216 1=214 Average=215
  Kmalloc N*(alloc free)(64): 0=216 1=214 Average=215
  Kmalloc N*(alloc free)(128): 0=216 1=214 Average=215
  Kmalloc N*(alloc free)(256): 0=216 1=214 Average=215
  Kmalloc N*(alloc free)(512): 0=216 1=215 Average=216
  Kmalloc N*(alloc free)(1024): 0=217 1=215 Average=216
  Kmalloc N*(alloc free)(2048): 0=217 1=215 Average=216
  Kmalloc N*(alloc free)(4096): 0=217 1=215 Average=216
  Remote free test
  ================
  N*remote free(8): 0=8/188 1=156/0 Average=82/94
  N*remote free(16): 0=8/189 1=159/0 Average=84/94
  N*remote free(32): 0=8/191 1=171/0 Average=89/95
  N*remote free(64): 0=8/197 1=200/0 Average=104/98
  N*remote free(128): 0=8/214 1=243/0 Average=125/107
  N*remote free(256): 0=8/266 1=258/0 Average=133/133
  N*remote free(512): 0=8/379 1=274/0 Average=141/189
  N*remote free(1024): 0=8/443 1=291/0 Average=150/221
  N*remote free(2048): 0=8/472 1=318/0 Average=163/236
  N*remote free(4096): 0=8/546 1=416/0 Average=212/273
  1 alloc N free test
  ===================
  1 alloc N free(8): 0=208 1=149 Average=179
  1 alloc N free(16): 0=219 1=158 Average=189
  1 alloc N free(32): 0=241 1=165 Average=203
  1 alloc N free(64): 0=311 1=207 Average=259
  1 alloc N free(128): 0=350 1=226 Average=288
  1 alloc N free(256): 0=439 1=292 Average=366
  1 alloc N free(512): 0=470 1=314 Average=392
  1 alloc N free(1024): 0=361 1=203 Average=282
  1 alloc N free(2048): 0=385 1=216 Average=300
  1 alloc N free(4096): 0=492 1=278 Average=385


* Re: [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation
  2009-11-26 20:27       ` Pekka Enberg
@ 2009-11-27 19:26         ` Christoph Lameter
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2009-11-27 19:26 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Joe Perches, linux-kernel

On Thu, 26 Nov 2009, Pekka Enberg wrote:

> I ran the synthetic benchmarks here and cycle count is roughly the
> same for both versions.

Ok.

Acked-by: Christoph Lameter <cl@linux-foundation.org>



end of thread, other threads:[~2009-11-27 19:26 UTC | newest]

Thread overview: 7+ messages
2009-11-25 18:21 [PATCH] SLUB: Fix __GFP_ZERO unlikely() annotation Pekka Enberg
2009-11-25 18:27 ` Joe Perches
2009-11-25 18:30   ` Pekka Enberg
2009-11-25 20:15     ` Christoph Lameter
2009-11-25 20:19       ` Pekka Enberg
2009-11-26 20:27       ` Pekka Enberg
2009-11-27 19:26         ` Christoph Lameter
