* hackbench regression since 2.6.25-rc
From: Zhang, Yanmin @ 2008-03-13 7:46 UTC (permalink / raw)
To: Kay Sievers, Greg Kroah-Hartman; +Cc: LKML

Compared with 2.6.24, on my 16-core tigerton, hackbench process mode has about
40% regression with 2.6.25-rc1, and more than 20% regression with kernel
2.6.25-rc4, because rc4 includes the patch reverting the scheduler load
balance change.

Command to start it:
#hackbench 100 process 2000
I ran it 3 times and summed the values.

I tried to investigate it by bisecting.
Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression.
Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 has no regression.

Any bisect between the above 2 tags causes a kernel hang. I manually checked
out points between these 2 tags many times and the kernel always panicked.

All patches between the 2 tags are part of the kobject restructure. I guess
the restructure creates more cache misses on the 16-core tigerton.

Any idea?

-yanmin

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-13 7:46 hackbench regression since 2.6.25-rc Zhang, Yanmin @ 2008-03-13 8:48 ` Andrew Morton 2008-03-13 9:28 ` Zhang, Yanmin 2008-03-13 15:14 ` Greg KH 1 sibling, 1 reply; 31+ messages in thread From: Andrew Morton @ 2008-03-13 8:48 UTC (permalink / raw) To: Zhang, Yanmin; +Cc: Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Thu, 13 Mar 2008 15:46:57 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote: > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > Command to start it. > #hackbench 100 process 2000 > I ran it for 3 times and sum the values. > > I tried to investiagte it by bisect. > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > these 2 tags for many times manually and kernel always paniced. > > All patches between the 2 tags are on kobject restructure. I guess such restructure > creates more cache miss on the 16-core tigerton. > That's pretty surprising - hackbench spends most of its time in userspace and zeroing out anonymous pages. It shouldn't be fiddling with kobjects much at all. Some kernel profiling might be needed here.. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-13 8:48 ` Andrew Morton @ 2008-03-13 9:28 ` Zhang, Yanmin 2008-03-13 9:52 ` Andrew Morton 2008-03-14 0:16 ` Christoph Lameter 0 siblings, 2 replies; 31+ messages in thread From: Zhang, Yanmin @ 2008-03-13 9:28 UTC (permalink / raw) To: Andrew Morton, Christoph Lameter Cc: Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Thu, 2008-03-13 at 01:48 -0700, Andrew Morton wrote: > On Thu, 13 Mar 2008 15:46:57 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote: > > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > Command to start it. > > #hackbench 100 process 2000 > > I ran it for 3 times and sum the values. > > > > I tried to investiagte it by bisect. > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > these 2 tags for many times manually and kernel always paniced. > > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > creates more cache miss on the 16-core tigerton. > > > > That's pretty surprising - hackbench spends most of its time in userspace > and zeroing out anonymous pages. No. vmstat showed hackbench spends almost 100% in sys. > It shouldn't be fiddling with kobjects > much at all. > > Some kernel profiling might be needed here.. Thanks for your kind reminder. I don't know why I forgot it. 
2.6.24 oprofile data:
CPU: Core 2, speed 1602 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name    app name      symbol name
40200494 43.3899  linux-2.6.24  linux-2.6.24  __slab_alloc
35338431 38.1421  linux-2.6.24  linux-2.6.24  add_partial_tail
2993156   3.2306  linux-2.6.24  linux-2.6.24  __slab_free
1365806   1.4742  linux-2.6.24  linux-2.6.24  sock_alloc_send_skb
1253820   1.3533  linux-2.6.24  linux-2.6.24  copy_user_generic_string
1141442   1.2320  linux-2.6.24  linux-2.6.24  unix_stream_recvmsg
846836    0.9140  linux-2.6.24  linux-2.6.24  unix_stream_sendmsg
777561    0.8393  linux-2.6.24  linux-2.6.24  kmem_cache_alloc
587127    0.6337  linux-2.6.24  linux-2.6.24  sock_def_readable

2.6.25-rc4 oprofile data:
CPU: Core 2, speed 1602 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name        app name          symbol name
46746994 43.3801  linux-2.6.25-rc4  linux-2.6.25-rc4  __slab_alloc
45986635 42.6745  linux-2.6.25-rc4  linux-2.6.25-rc4  add_partial
2577578   2.3919  linux-2.6.25-rc4  linux-2.6.25-rc4  __slab_free
1301644   1.2079  linux-2.6.25-rc4  linux-2.6.25-rc4  sock_alloc_send_skb
1185888   1.1005  linux-2.6.25-rc4  linux-2.6.25-rc4  copy_user_generic_string
969847    0.9000  linux-2.6.25-rc4  linux-2.6.25-rc4  unix_stream_recvmsg
806665    0.7486  linux-2.6.25-rc4  linux-2.6.25-rc4  kmem_cache_alloc
731059    0.6784  linux-2.6.25-rc4  linux-2.6.25-rc4  unix_stream_sendmsg

-yanmin
* Re: hackbench regression since 2.6.25-rc 2008-03-13 9:28 ` Zhang, Yanmin @ 2008-03-13 9:52 ` Andrew Morton 2008-03-14 0:16 ` Christoph Lameter 1 sibling, 0 replies; 31+ messages in thread From: Andrew Morton @ 2008-03-13 9:52 UTC (permalink / raw) To: Zhang, Yanmin Cc: Christoph Lameter, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Thu, 13 Mar 2008 17:28:58 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote: > On Thu, 2008-03-13 at 01:48 -0700, Andrew Morton wrote: > > On Thu, 13 Mar 2008 15:46:57 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote: > > > > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > > > Command to start it. > > > #hackbench 100 process 2000 > > > I ran it for 3 times and sum the values. > > > > > > I tried to investiagte it by bisect. > > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > > these 2 tags for many times manually and kernel always paniced. > > > > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > > creates more cache miss on the 16-core tigerton. > > > > > > > That's pretty surprising - hackbench spends most of its time in userspace > > and zeroing out anonymous pages. > No. vmstat showed hackbench spends almost 100% in sys. ah, I got confused about which test that is. > > It shouldn't be fiddling with kobjects > > much at all. > > > > Some kernel profiling might be needed here.. > Thanks for your kind reminder. I don't know why I forgot it. 
>
> 2.6.24 oprofile data:
> CPU: Core 2, speed 1602 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        image name    app name      symbol name
> 40200494 43.3899  linux-2.6.24  linux-2.6.24  __slab_alloc
> 35338431 38.1421  linux-2.6.24  linux-2.6.24  add_partial_tail
> 2993156   3.2306  linux-2.6.24  linux-2.6.24  __slab_free
> 1365806   1.4742  linux-2.6.24  linux-2.6.24  sock_alloc_send_skb
> 1253820   1.3533  linux-2.6.24  linux-2.6.24  copy_user_generic_string
> 1141442   1.2320  linux-2.6.24  linux-2.6.24  unix_stream_recvmsg
> 846836    0.9140  linux-2.6.24  linux-2.6.24  unix_stream_sendmsg
> 777561    0.8393  linux-2.6.24  linux-2.6.24  kmem_cache_alloc
> 587127    0.6337  linux-2.6.24  linux-2.6.24  sock_def_readable
>
> 2.6.25-rc4 oprofile data:
> CPU: Core 2, speed 1602 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        image name        app name          symbol name
> 46746994 43.3801  linux-2.6.25-rc4  linux-2.6.25-rc4  __slab_alloc
> 45986635 42.6745  linux-2.6.25-rc4  linux-2.6.25-rc4  add_partial
> 2577578   2.3919  linux-2.6.25-rc4  linux-2.6.25-rc4  __slab_free
> 1301644   1.2079  linux-2.6.25-rc4  linux-2.6.25-rc4  sock_alloc_send_skb
> 1185888   1.1005  linux-2.6.25-rc4  linux-2.6.25-rc4  copy_user_generic_string
> 969847    0.9000  linux-2.6.25-rc4  linux-2.6.25-rc4  unix_stream_recvmsg
> 806665    0.7486  linux-2.6.25-rc4  linux-2.6.25-rc4  kmem_cache_alloc
> 731059    0.6784  linux-2.6.25-rc4  linux-2.6.25-rc4  unix_stream_sendmsg

So slub got a little slower?  (Is slab any better?)

Still, I don't think there are any kobject operations in these codepaths,
are there?  Maybe some related to the network device, but I doubt it -
networking tends to go it alone on those things, mainly for performance
reasons.
* Re: hackbench regression since 2.6.25-rc 2008-03-13 9:28 ` Zhang, Yanmin 2008-03-13 9:52 ` Andrew Morton @ 2008-03-14 0:16 ` Christoph Lameter 2008-03-14 3:04 ` Zhang, Yanmin 1 sibling, 1 reply; 31+ messages in thread From: Christoph Lameter @ 2008-03-14 0:16 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar Could you recompile the kernel with slub performance statistics and post the output of slabinfo -AD ? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc
From: Zhang, Yanmin @ 2008-03-14 3:04 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Thu, 2008-03-13 at 17:16 -0700, Christoph Lameter wrote:
> Could you recompile the kernel with slub performance statistics and post
> the output of
>
> slabinfo -AD

Before testing with kernel 2.6.25-rc5:
Name                   Objects    Alloc     Free   %Fast
vm_area_struct            2795   135185   132587  93  29
:0004096                    25   119045   119043  99  98
:0000064                 12257   119671   107742  98  50
:0000192                  3312    78563    75370  92  21
:0000128                  4648    48143    43738  97  53
dentry                   15217    46675    31527  95  72
:0000080                 12784    33674    21206  99  97
:0000016                  4367    25871    23705  99  78
:0000096                  3001    22591    20084  99  92
buffer_head               5536    18147    12884  97  42
anon_vma                  1729    14948    14130  99  73

After testing:
Name                   Objects    Alloc     Free   %Fast
:0000192                  3428 80093958 80090708  92   8
:0000512                   374 80016030 80015715  68   7
vm_area_struct            2875   224524   221868  94  20
:0000064                 12408   134273   122227  98  47
:0004096                    24   127397   127395  99  98
:0000128                  4596    57837    53432  97  48
dentry                   15659    51402    35824  95  64
:0000016                  4584    29327    27161  99  76
:0000080                 12784    33674    21206  99  97
:0000096                  2998    26264    23757  99  93

So blocks 192 and 512 are very active and their fast free percentage is low.

-yanmin
* Re: hackbench regression since 2.6.25-rc 2008-03-14 3:04 ` Zhang, Yanmin @ 2008-03-14 3:30 ` Zhang, Yanmin 2008-03-14 5:28 ` Zhang, Yanmin 2008-03-14 6:34 ` Christoph Lameter 2008-03-14 6:32 ` Christoph Lameter 1 sibling, 2 replies; 31+ messages in thread From: Zhang, Yanmin @ 2008-03-14 3:30 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Fri, 2008-03-14 at 11:04 +0800, Zhang, Yanmin wrote: > On Thu, 2008-03-13 at 17:16 -0700, Christoph Lameter wrote: > > Could you recompile the kernel with slub performance statistics and post > > the output of > > > > slabinfo -AD > Before testing with kernel 2.6.25-rc5: > Name Objects Alloc Free %Fast > vm_area_struct 2795 135185 132587 93 29 > :0004096 25 119045 119043 99 98 > :0000064 12257 119671 107742 98 50 > :0000192 3312 78563 75370 92 21 > :0000128 4648 48143 43738 97 53 > dentry 15217 46675 31527 95 72 > :0000080 12784 33674 21206 99 97 > :0000016 4367 25871 23705 99 78 > :0000096 3001 22591 20084 99 92 > buffer_head 5536 18147 12884 97 42 > anon_vma 1729 14948 14130 99 73 > > > After testing: > Name Objects Alloc Free %Fast > :0000192 3428 80093958 80090708 92 8 > :0000512 374 80016030 80015715 68 7 > vm_area_struct 2875 224524 221868 94 20 > :0000064 12408 134273 122227 98 47 > :0004096 24 127397 127395 99 98 > :0000128 4596 57837 53432 97 48 > dentry 15659 51402 35824 95 64 > :0000016 4584 29327 27161 99 76 > :0000080 12784 33674 21206 99 97 > :0000096 2998 26264 23757 99 93 > > > So block 192 and 512's and very active and their fast free percentage is low. On my 8-core stoakley, there is no such regression. Below data is after testing. 
[root@lkp-st02-x8664 ~]# slabinfo -AD
Name                   Objects    Alloc     Free   %Fast
:0000192                  3170 80055388 80052280  92   1
:0000512                   316 80012750 80012466  69   1
vm_area_struct            2642   194700   192193  94  16
:0000064                  3846    74468    70820  97  53
:0004096                    15    69014    69012  98  97
:0000128                  1447    32920    31541  91   8
dentry                   13485    33060    19652  92  42
:0000080                 10639    23377    12953  98  98
:0000096                  1662    16496    15036  99  94
:0000832                   232    14422    14203  85  10
:0000016                  2733    15102    13372  99  14

So blocks 192 and 512's fast free percentages are even smaller than the ones
on tigerton.

Oprofile data on stoakley:

CPU: Core 2, speed 2660 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name          symbol name
2897265  25.7603  linux-2.6.25-rc5  __slab_alloc
2689900  23.9166  linux-2.6.25-rc5  add_partial
629355    5.5957  linux-2.6.25-rc5  copy_user_generic_string
552309    4.9107  linux-2.6.25-rc5  __slab_free
514792    4.5771  linux-2.6.25-rc5  sock_alloc_send_skb
500879    4.4534  linux-2.6.25-rc5  unix_stream_recvmsg
274798    2.4433  linux-2.6.25-rc5  __kmalloc_track_caller
230283    2.0475  linux-2.6.25-rc5  kfree
222286    1.9764  linux-2.6.25-rc5  unix_stream_sendmsg
217413    1.9331  linux-2.6.25-rc5  memset_c
211589    1.8813  linux-2.6.25-rc5  kmem_cache_alloc
151500    1.3470  linux-2.6.25-rc5  system_call
132262    1.1760  linux-2.6.25-rc5  sock_def_readable
123130    1.0948  linux-2.6.25-rc5  kmem_cache_free
109518    0.9738  linux-2.6.25-rc5  sock_wfree

yanmin
* Re: hackbench regression since 2.6.25-rc 2008-03-14 3:30 ` Zhang, Yanmin @ 2008-03-14 5:28 ` Zhang, Yanmin 2008-03-14 6:39 ` Christoph Lameter 2008-03-14 6:34 ` Christoph Lameter 1 sibling, 1 reply; 31+ messages in thread From: Zhang, Yanmin @ 2008-03-14 5:28 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Fri, 2008-03-14 at 11:30 +0800, Zhang, Yanmin wrote: > On Fri, 2008-03-14 at 11:04 +0800, Zhang, Yanmin wrote: > > On Thu, 2008-03-13 at 17:16 -0700, Christoph Lameter wrote: > > > Could you recompile the kernel with slub performance statistics and post > > > the output of > > > > > > slabinfo -AD > > Before testing with kernel 2.6.25-rc5: > > Name Objects Alloc Free %Fast > > vm_area_struct 2795 135185 132587 93 29 > > :0004096 25 119045 119043 99 98 > > :0000064 12257 119671 107742 98 50 > > :0000192 3312 78563 75370 92 21 > > :0000128 4648 48143 43738 97 53 > > dentry 15217 46675 31527 95 72 > > :0000080 12784 33674 21206 99 97 > > :0000016 4367 25871 23705 99 78 > > :0000096 3001 22591 20084 99 92 > > buffer_head 5536 18147 12884 97 42 > > anon_vma 1729 14948 14130 99 73 > > > > > > After testing: > > Name Objects Alloc Free %Fast > > :0000192 3428 80093958 80090708 92 8 > > :0000512 374 80016030 80015715 68 7 > > vm_area_struct 2875 224524 221868 94 20 > > :0000064 12408 134273 122227 98 47 > > :0004096 24 127397 127395 99 98 > > :0000128 4596 57837 53432 97 48 > > dentry 15659 51402 35824 95 64 > > :0000016 4584 29327 27161 99 76 > > :0000080 12784 33674 21206 99 97 > > :0000096 2998 26264 23757 99 93 > > > > > > So block 192 and 512's and very active and their fast free percentage is low. > On my 8-core stoakley, there is no such regression. Below data is after testing. 
>
> [root@lkp-st02-x8664 ~]# slabinfo -AD
> Name                   Objects    Alloc     Free   %Fast
> :0000192                  3170 80055388 80052280  92   1
> :0000512                   316 80012750 80012466  69   1
> vm_area_struct            2642   194700   192193  94  16
> :0000064                  3846    74468    70820  97  53
> :0004096                    15    69014    69012  98  97
> :0000128                  1447    32920    31541  91   8
> dentry                   13485    33060    19652  92  42
> :0000080                 10639    23377    12953  98  98
> :0000096                  1662    16496    15036  99  94
> :0000832                   232    14422    14203  85  10
> :0000016                  2733    15102    13372  99  14
>
> So the block 192 and 512's fast free percentage is even smaller than the ones on tigerton.
>
> Oprofile data on stoakley:
>
> CPU: Core 2, speed 2660 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        app name          symbol name
> 2897265  25.7603  linux-2.6.25-rc5  __slab_alloc
> 2689900  23.9166  linux-2.6.25-rc5  add_partial
> 629355    5.5957  linux-2.6.25-rc5  copy_user_generic_string
> 552309    4.9107  linux-2.6.25-rc5  __slab_free
> 514792    4.5771  linux-2.6.25-rc5  sock_alloc_send_skb
> 500879    4.4534  linux-2.6.25-rc5  unix_stream_recvmsg
> 274798    2.4433  linux-2.6.25-rc5  __kmalloc_track_caller
> 230283    2.0475  linux-2.6.25-rc5  kfree
> 222286    1.9764  linux-2.6.25-rc5  unix_stream_sendmsg
> 217413    1.9331  linux-2.6.25-rc5  memset_c
> 211589    1.8813  linux-2.6.25-rc5  kmem_cache_alloc
> 151500    1.3470  linux-2.6.25-rc5  system_call
> 132262    1.1760  linux-2.6.25-rc5  sock_def_readable
> 123130    1.0948  linux-2.6.25-rc5  kmem_cache_free
> 109518    0.9738  linux-2.6.25-rc5  sock_wfree

On tigerton, if I add "slub_max_order=3 slub_min_objects=16" to the kernel
boot cmdline, the result is improved significantly and it takes just 1/10
of the original testing time. Below is the new output of slabinfo -AD.
Name                   Objects    Alloc     Free   %Fast
:0000192                  3192 80087199 80084141  92   8
kmalloc-512                773 80016203 80015888  97   9
vm_area_struct            2787   223100   220525  94  17
:0004096                    68   118322   118320  99  98
:0000064                 12215   123575   111669  98  42
:0000128                  4616    53826    49422  97  45
dentry                   12373    49568    37286  95  65
:0000080                 12823    33755    21206  99  97

So kmalloc-512 is the key.

Then, I tested it on stoakley with the same kernel command line. The
improvement is about 50%. One important thing is that without the boot
parameter, hackbench on stoakley takes only 1/4 the time it takes on
tigerton. With the boot parameter, hackbench on tigerton is faster than
on stoakley.

Is it possible to initialize slub_min_objects based on the possible cpu
number? I mean, cpu_possible_map. We could calculate slub_min_objects
with a formula.

-yanmin
* Re: hackbench regression since 2.6.25-rc
From: Christoph Lameter @ 2008-03-14 6:39 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Fri, 14 Mar 2008, Zhang, Yanmin wrote:

> On tigerton, if I add "slub_max_order=3 slub_min_objects=16" to kernel
> boot cmdline, the result is improved significantly and it takes just
> 1/10 time of the original testing.

Hmmm... That means the updates to SLUB in mm will fix the regression that
you are seeing, because there we can use larger slab orders, with fallback,
for all slab caches. But I am still interested in getting to the details of
slub behavior on the 16p.

> So kmalloc-512 is the key.

Yeah, in 2.6.26-rc kmalloc-512 has 8 objects per slab. The mm version
increases that with a larger allocation size.

> Then, I tested it on stoakley with the same kernel commandline.
> Improvement is about 50%. One important thing is without the boot
> parameter, hackbench on stoakey takes only 1/4 time of the one on
> tigerton. With the boot parameter, hackbench on tigerton is faster than
> the one on stoakely.
>
> Is it possible to initiate slub_min_objects based on possible cpu
> number? I mean, cpu_possible_map(). We could calculate slub_min_objects
> by a formular.

Hmmm... Interesting. Let's first get the details for 2.6.25-rc. Then we can
start toying around with the slub version in mm to configure slub in such
a way that we get the best results on both machines.
* Re: hackbench regression since 2.6.25-rc
From: Zhang, Yanmin @ 2008-03-14 7:29 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Thu, 2008-03-13 at 23:39 -0700, Christoph Lameter wrote:
> On Fri, 14 Mar 2008, Zhang, Yanmin wrote:
>
> > On tigerton, if I add "slub_max_order=3 slub_min_objects=16" to kernel
> > boot cmdline, the result is improved significantly and it takes just
> > 1/10 time of the original testing.
>
> Hmmm... That means the updates to SLUB in mm will fix the regression that
> you are seeing because we there can use large orders of slabs and fallback
> for all slab caches. But I am still interested to get to the details of
> slub behavior on the 16p.
>
> > So kmalloc-512 is the key.
>
> Yeah in 2.6.26-rc kmalloc-512 has 8 objects per slab. The mm version
> increases that with a larger allocation size.
Would you like to give me a pointer to the patch? Is it one patch, or many patches?

> > Then, I tested it on stoakley with the same kernel commandline.
> > Improvement is about 50%. One important thing is without the boot
> > parameter, hackbench on stoakey takes only 1/4 time of the one on
> > tigerton. With the boot parameter, hackbench on tigerton is faster than
> > the one on stoakely.
> >
> > Is it possible to initiate slub_min_objects based on possible cpu
> > number? I mean, cpu_possible_map(). We could calculate slub_min_objects
> > by a formular.
>
> Hmmm... Interesting. Lets first get the details for 2.6.25-rc. Then we can
> start toying around with the slub version in mm to configure slub in such
> a way that we get best results on both machines.
Boot parameter "slub_max_order=3 slub_min_objects=16" could boost performance
both on stoakley and on tigerton.
So should we keep slub_min_objects scaling with the possible cpu number?
When a machine has more cpus, more processes/threads will run on it, and
they will take more time competing for the same resources. The slab
allocator is a typical such resource.

-yanmin
* Re: hackbench regression since 2.6.25-rc
From: Christoph Lameter @ 2008-03-14 21:05 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Fri, 14 Mar 2008, Zhang, Yanmin wrote:

> > Yeah in 2.6.26-rc kmalloc-512 has 8 objects per slab. The mm version
> > increases that with a larger allocation size.
> Would you like to give me a pointer to the patch? Is it one patch, or many patches?

If you do a git pull on the slab-mm branch of my VM tree on kernel.org,
then you have all you need. There will be an update in the next few days,
though, since some of the data you gave me already suggests a couple of
ways that things may be made better.

> > Hmmm... Interesting. Lets first get the details for 2.6.25-rc. Then we can
> > start toying around with the slub version in mm to configure slub in such
> > a way that we get best results on both machines.
> Boot parameter "slub_max_order=3 slub_min_objects=16" could boost perforamnce
> both on stoakley and on tigerton.

Well, the current slab-mm tree already does order 4 and min_objects=60,
which is probably overkill. The next git push on slab-mm will reduce that
to the values you found to be sufficient.

> So should we keep slub_min_objects scalable based on possible cpu
> number? When a machine has more cpu, it means more processes/threads
> will run on it and it will take more time when they compete for the same
> resources. SLAB is such a typical resource.

We would have to do some experiments to see how cpu counts affect multiple
benchmarks. If we can establish a consistent benefit from varying these
parameters based on processor count, then we should do so. There is already
one example in mm/vmstat.c of how this could be done.
* Re: hackbench regression since 2.6.25-rc 2008-03-14 3:30 ` Zhang, Yanmin 2008-03-14 5:28 ` Zhang, Yanmin @ 2008-03-14 6:34 ` Christoph Lameter 2008-03-14 7:23 ` Zhang, Yanmin 1 sibling, 1 reply; 31+ messages in thread From: Christoph Lameter @ 2008-03-14 6:34 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Fri, 14 Mar 2008, Zhang, Yanmin wrote: > > So block 192 and 512's and very active and their fast free percentage > > is low. > On my 8-core stoakley, there is no such regression. Below data is after testing. Ok get the detailed statistics for this configuration as well. Then we can see what kind of slub behavior changes between both configurations. The 16p is really one node? No strange variances in memory latencies? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc
From: Zhang, Yanmin @ 2008-03-14 7:23 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Thu, 2008-03-13 at 23:34 -0700, Christoph Lameter wrote:
> On Fri, 14 Mar 2008, Zhang, Yanmin wrote:
>
> > > So block 192 and 512's and very active and their fast free percentage
> > > is low.
> > On my 8-core stoakley, there is no such regression. Below data is after testing.
>
> Ok get the detailed statistics for this configuration as well. Then we
> can see what kind of slub behavior changes between both configurations.
I pasted such data in a prior email. I copy it below.

On my 8-core stoakley, there is no such regression. Below data is after testing.
[root@lkp-st02-x8664 ~]# slabinfo -AD
Name                   Objects    Alloc     Free   %Fast
:0000192                  3170 80055388 80052280  92   1
:0000512                   316 80012750 80012466  69   1
vm_area_struct            2642   194700   192193  94  16
:0000064                  3846    74468    70820  97  53
:0004096                    15    69014    69012  98  97
:0000128                  1447    32920    31541  91   8
dentry                   13485    33060    19652  92  42
:0000080                 10639    23377    12953  98  98
:0000096                  1662    16496    15036  99  94
:0000832                   232    14422    14203  85  10
:0000016                  2733    15102    13372  99  14

I ran it many times and got similar output from slabinfo.

> The 16p is really one node?
Yes. It's an SMP machine.

> No strange variances in memory latencies?
No.
* Re: hackbench regression since 2.6.25-rc 2008-03-14 7:23 ` Zhang, Yanmin @ 2008-03-14 21:06 ` Christoph Lameter 2008-03-17 7:50 ` Zhang, Yanmin 0 siblings, 1 reply; 31+ messages in thread From: Christoph Lameter @ 2008-03-14 21:06 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Fri, 14 Mar 2008, Zhang, Yanmin wrote: > On my 8-core stoakley, there is no such regression. Below data is after > testing. I was looking for the details on two slab caches. The comparison of the details statistics is likely very interesting because we will be able to see how the doubling of processor counts affects the internal behavior of slub. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc
From: Zhang, Yanmin @ 2008-03-17 7:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Fri, 2008-03-14 at 14:06 -0700, Christoph Lameter wrote:
> On Fri, 14 Mar 2008, Zhang, Yanmin wrote:
>
> > On my 8-core stoakley, there is no such regression. Below data is after
> > testing.
>
> I was looking for the details on two slab caches. The comparison of the
> details statistics is likely very interesting because we will be able to
> see how the doubling of processor counts affects the internal behavior of
> slub.
I collected more data on the 16-p tigerton to try to find a possible
relationship between slub_min_objects and processor number. Kernel is
2.6.25-rc5.

Command\slub_min_objects     | slub_min_objects=8 | 16 | 32   | 64
-------------------------------------------------------------------------------------
./hackbench 100 process 2000 | 250 seconds        | 23 | 18.6 | 17.5
./hackbench 200 process 2000 | 532                | 44 | 35.6 | 33.5

The first command line starts 4000 processes and the second starts 8000
processes. As the problematic slab is kmalloc-512, slub_min_objects=8 is
just the default configuration.

Oprofile data shows the ratio of __slab_alloc+__slab_free+add_partial has
no difference between the 2 command lines with the same kernel boot
parameters.

slub_min_objects                                            | 8      | 16     | 32     | 64
--------------------------------------------------------------------------------------------
slab (__slab_alloc+__slab_free+add_partial) cpu utilization | 88.00% | 44.00% | 13.00% | 12%

When slub_min_objects=32, we get a reasonable value. Beyond 32, the
improvement is very small. 32 is just possible_cpu_number*2 on my tigerton.

It's hard to say hackbench simulates real applications closely.
But it discloses a possible performance bottleneck. Last year, we once
captured the kmalloc-2048 issue with tbench. So the default
slub_min_objects needs to be revised. On the other hand, a slab is
allocated by alloc_page when its size is equal to or more than half a
page, so enlarging slub_min_objects won't create too many slab page
buffers.

As for NUMA, perhaps we could define slub_min_objects as
2*max_cpu_number_per_node.

-yanmin
* Re: hackbench regression since 2.6.25-rc
From: Christoph Lameter @ 2008-03-17 17:32 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Mon, 17 Mar 2008, Zhang, Yanmin wrote:

> slub_min_objects | 8 | 16 | 32 | 64
> --------------------------------------------------------------------------------------------
> slab(__slab_alloc+__slab_free+add_partial) cpu utilization | 88.00% | 44.00% | 13.00% | 12%
>
> When slub_min_objects=32, we could get a reasonable value. Beyond 32, the improvement
> is very small. 32 is just possible_cpu_number*2 on my tigerton.

Interesting. What is the optimal configuration for your 8p? Could you
figure out the optimal configurations for a 4p and a 2p machine as well?

> It's hard to say hackbench simulates real applications closely. But it discloses a possible
> performance bottlebeck. Last year, we once captured the kmalloc-2048 issue by tbench. So the
> default slub_min_objects need to be revised. In the other hand, slab is allocated by alloc_page
> when its size is equal to or more than a half page, so enlarging slub_min_objects won't create
> too many slab page buffers.
>
> As for NUMA, perhaps we could define slub_min_objects to 2*max_cpu_number_per_node.

Well, for a 4k cpu config this would set min_objects to 8192. So I think
we could implement a form of logarithmic scaling based on cpu counts,
comparable to what is done for the statistics update in vmstat.c:

fls(num_online_cpus()) = 4

So maybe

slub_min_objects = 8 + (2 + fls(num_online_cpus())) * 4
* Re: hackbench regression since 2.6.25-rc 2008-03-17 17:32 ` Christoph Lameter @ 2008-03-18 3:28 ` Zhang, Yanmin 2008-03-18 4:07 ` Christoph Lameter 0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Yanmin @ 2008-03-18 3:28 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Mon, 2008-03-17 at 10:32 -0700, Christoph Lameter wrote:
> On Mon, 17 Mar 2008, Zhang, Yanmin wrote:
>
> > slub_min_objects                                            | 8      | 16     | 32     | 64
> > --------------------------------------------------------------------------------------------
> > slab(__slab_alloc+__slab_free+add_partial) cpu utilization  | 88.00% | 44.00% | 13.00% | 12%
> >
> > When slub_min_objects=32, we could get a reasonable value. Beyond 32, the improvement
> > is very small. 32 is just possible_cpu_number*2 on my tigerton.
>
> Interesting. What is the optimal configuration for your 8p? Could you
> figure out the optimal configuration for a 4p and a 2p configuration?

I used an 8-core Stoakley for testing, and tried booting the kernel with maxcpus=4 and 2.
Just ran ./hackbench 100 process 2000.

processor number\slub_min_objects | slub_min_objects=8 | 16 | 32   | 64
--------------------------------------------------------------------------------------------
8p                                | 60 seconds         | 30 | 28.5 | 26.5
4p                                | 50 seconds         | 43 | 42   |
2p                                | 92 seconds         | 79 |      |

As Stoakley is just a multi-core machine without hyper-threading, I also tested on an old
Harwich machine which has 4 physical processors and 8 logical processors with hyper-threading.

processor number\slub_min_objects | slub_min_objects=8 | 16   | 32 | 64
--------------------------------------------------------------------------------------------
8p                                | 78.7 seconds       | 77.5 |    |

> > It's hard to say hackbench simulates real applications closely.
> > But it discloses a possible
> > performance bottleneck. Last year, we captured the kmalloc-2048 issue with tbench. So the
> > default slub_min_objects needs to be revised. On the other hand, a slab is allocated by alloc_page
> > when its size is half a page or more, so enlarging slub_min_objects won't create
> > too many slab page buffers.
> >
> > As for NUMA, perhaps we could set slub_min_objects to 2*max_cpu_number_per_node.
>
> Well for a 4k cpu config this would set min_objects to 8192. So I think
> we could implement a form of logarithmic scaling based on cpu
> counts comparable to what is done for the statistics update in vmstat.c
>
> fls(num_online_cpus()) = 4

num_online_cpus() as the input parameter is OK. A potential issue is how to handle cpu hot-plug.
When num_online_cpus()=16, fls(num_online_cpus())=5.

> So maybe
>
> slub_min_objects = 8 + (2 + fls(num_online_cpus())) * 4

So slub_min_objects = 8 + (1 + fls(num_online_cpus())) * 4.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-18 3:28 ` Zhang, Yanmin @ 2008-03-18 4:07 ` Christoph Lameter 0 siblings, 0 replies; 31+ messages in thread From: Christoph Lameter @ 2008-03-18 4:07 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Tue, 18 Mar 2008, Zhang, Yanmin wrote: > num_online_cpus as the input parameter is ok. A potential issue is how to consider cpu hot-plug. Yeah I used nr_cpu_ids instead in the patchset that I cced you on. Maybe continue discussion on that thread? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 3:04 ` Zhang, Yanmin 2008-03-14 3:30 ` Zhang, Yanmin @ 2008-03-14 6:32 ` Christoph Lameter 2008-03-14 7:14 ` Zhang, Yanmin 1 sibling, 1 reply; 31+ messages in thread From: Christoph Lameter @ 2008-03-14 6:32 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Fri, 14 Mar 2008, Zhang, Yanmin wrote: > After testing: > Name Objects Alloc Free %Fast > :0000192 3428 80093958 80090708 92 8 > :0000512 374 80016030 80015715 68 7 Ahhh... Okay those slabs did not change for 2.6.25-rc. Is there really a difference to 2.6.24? > So block 192 and 512's and very active and their fast free percentage is low. Yes but that is to be expected given that hackbench does allocate objects and then passes them to other processors for freeing. Could you get me more details on the two critical slabs? Do slabinfo -a and then pick one alias for each of those sizes. Then do slabinfo skbuff_head (whatever alias you want to use to refer to the slab) for each of them. Should give some more insight as to how slub behaves with these two slab caches. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 6:32 ` Christoph Lameter @ 2008-03-14 7:14 ` Zhang, Yanmin 2008-03-14 21:08 ` Christoph Lameter 0 siblings, 1 reply; 31+ messages in thread From: Zhang, Yanmin @ 2008-03-14 7:14 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Thu, 2008-03-13 at 23:32 -0700, Christoph Lameter wrote: > On Fri, 14 Mar 2008, Zhang, Yanmin wrote: > > > After testing: > > Name Objects Alloc Free %Fast > > :0000192 3428 80093958 80090708 92 8 > > :0000512 374 80016030 80015715 68 7 > > Ahhh... Okay those slabs did not change for 2.6.25-rc. Is there > really a difference to 2.6.24? As oprofile shows slub functions spend more than 80% cpu time, I would like to focus on optimizing SLUB before going back to 2.6.24. > > > So block 192 and 512's and very active and their fast free percentage is low. > > Yes but that is to be expected given that hackbench does allocate objects > and then passes them to other processors for freeing. > > Could you get me more details on the two critical slabs? Yes, definitely. > > Do slabinfo -a and then pick one alias for each of those sizes. They are skbuff_head_cache and kmalloc-512. 
>
> Then do
>
> slabinfo skbuff_head (whatever alias you want to use to refer to the slab)

Slabcache: skbuff_head_cache     Aliases:  7 Order :  0 Objects: 2848

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object : 192      Total  : 142       Sanity Checks : Off  Total: 581632
SlabObj: 192      Full   : 126       Redzoning     : Off  Used : 546816
SlabSiz: 4096     Partial: 0         Poisoning     : Off  Loss : 34816
Loss   : 0        CpuSlab: 16        Tracking      : Off  Lalig: 0
Align  : 8        Objects: 21        Tracing       : Off  Lpadd: 9088

skbuff_head_cache has no kmem_cache operations

skbuff_head_cache: Kernel object allocation
-----------------------------------------------------------------------
No Data

skbuff_head_cache: Kernel object freeing
------------------------------------------------------------------------
No Data

skbuff_head_cache: No NUMA information available.

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             74048234  6259131  92   7
Slowpath              6031994 73818377   7  92
Page Alloc              19746    19603   0   0
Add partial                 0  4658709   0   5
Remove partial        4639106    19603   5   0
RemoteObj/SlabFrozen        0  3887872   0   4
Total                80080228 80077508
Refill                6031979
Deactivate Full=4658836(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

Slabcache: kmalloc-512     Aliases:  1 Order :  0 Objects: 365

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object : 512      Total  : 61        Sanity Checks : Off  Total: 249856
SlabObj: 512      Full   : 36        Redzoning     : Off  Used : 186880
SlabSiz: 4096     Partial: 9         Poisoning     : Off  Loss : 62976
Loss   : 0        CpuSlab: 16        Tracking      : Off  Lalig: 0
Align  : 8        Objects: 8         Tracing       : Off  Lpadd: 0

kmalloc-512 has no kmem_cache operations

kmalloc-512: Kernel object allocation
-----------------------------------------------------------------------
No Data

kmalloc-512: Kernel object freeing
------------------------------------------------------------------------
No Data

kmalloc-512: No NUMA information available.

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             55039159  5006829  68   6
Slowpath             24975754 75007769  31  93
Page Alloc              73840    73779   0   0
Add partial                 0 24341085   0  30
Remove partial       24267297    73779  30   0
RemoteObj/SlabFrozen        0   953614   0   1
Total                80014913 80014598
Refill               24975738
Deactivate Full=24341121(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

>
> for each of them. Should give some more insight as to how slub behaves
> with these two slab caches.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 7:14 ` Zhang, Yanmin @ 2008-03-14 21:08 ` Christoph Lameter 2008-03-15 0:15 ` Christoph Lameter 2008-03-17 3:05 ` Zhang, Yanmin 0 siblings, 2 replies; 31+ messages in thread From: Christoph Lameter @ 2008-03-14 21:08 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Fri, 14 Mar 2008, Zhang, Yanmin wrote: > > Ahhh... Okay those slabs did not change for 2.6.25-rc. Is there > > really a difference to 2.6.24? > As oprofile shows slub functions spend more than 80% cpu time, I would like > to focus on optimizing SLUB before going back to 2.6.24. I thought you wanted to address a regression vs 2.6.24? > kmalloc-512: No NUMA information available. > > Slab Perf Counter Alloc Free %Al %Fr > -------------------------------------------------- > Fastpath 55039159 5006829 68 6 > Slowpath 24975754 75007769 31 93 > Page Alloc 73840 73779 0 0 > Add partial 0 24341085 0 30 > Remove partial 24267297 73779 30 0 ^^^ add partial/remove partial is likely the cause for trouble here. 30% is unacceptably high. The larger allocs will reduce the partial handling overhead. That is likely the effect that we see here. > Refill 24975738 Duh refills at 50%? We could try to just switch to another slab instead of reusing the existing one. May also affect the add/remove partial situation. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 21:08 ` Christoph Lameter @ 2008-03-15 0:15 ` Christoph Lameter 2008-03-17 3:35 ` Zhang, Yanmin 2008-03-17 3:05 ` Zhang, Yanmin 1 sibling, 1 reply; 31+ messages in thread
From: Christoph Lameter @ 2008-03-15 0:15 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

Here is a patch to just not perform refills but switch slabs instead.
Could you check what effect doing so has on the statistics you see on the 16p?

---
 mm/slub.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2008-03-14 16:49:36.000000000 -0700
+++ linux-2.6/mm/slub.c	2008-03-14 16:50:04.000000000 -0700
@@ -1474,10 +1474,7 @@ static void *__slab_alloc(struct kmem_ca
 		goto new_slab;
 
 	slab_lock(c->page);
-	if (unlikely(!node_match(c, node)))
-		goto another_slab;
-
-	stat(c, ALLOC_REFILL);
+	goto another_slab;
 
 load_freelist:
 	object = c->page->freelist;

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-15 0:15 ` Christoph Lameter @ 2008-03-17 3:35 ` Zhang, Yanmin 2008-03-17 17:27 ` Christoph Lameter 0 siblings, 1 reply; 31+ messages in thread
From: Zhang, Yanmin @ 2008-03-17 3:35 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Fri, 2008-03-14 at 17:15 -0700, Christoph Lameter wrote:
> Here is a patch to just not perform refills but switch slabs instead.
> Could you check what effect doing so has on the statistics you see on the 16p?
>
> ---
>  mm/slub.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2008-03-14 16:49:36.000000000 -0700
> +++ linux-2.6/mm/slub.c	2008-03-14 16:50:04.000000000 -0700
> @@ -1474,10 +1474,7 @@ static void *__slab_alloc(struct kmem_ca
> 		goto new_slab;
>
> 	slab_lock(c->page);
> -	if (unlikely(!node_match(c, node)))
> -		goto another_slab;
> -
> -	stat(c, ALLOC_REFILL);
> +	goto another_slab;
>
> load_freelist:
> 	object = c->page->freelist;

It doesn't help much. In 2.6.25-rc5, REFILL means a refill from either c->page->freelist or another_slab; its definition looks confusing. In the case of hackbench, c->page->freelist is mostly NULL.

With #hackbench 100 process 2000, 100*20*2 (4000 in total) processes are started. vmstat shows about 300~500 processes are in the RUNNING state, so every processor runqueue has more than 20 processes running on the 16p tigerton.

Below is the data with kernel 2.6.25-rc5+your_patch.
[ymzhang@lkp-tt01-x8664 ~]$ slabinfo kmalloc-512

Slabcache: kmalloc-512     Aliases:  1 Order :  0 Objects: 352

Sizes (bytes)     Slabs              Debug                Memory
------------------------------------------------------------------------
Object : 512      Total  : 56        Sanity Checks : Off  Total: 229376
SlabObj: 512      Full   : 36        Redzoning     : Off  Used : 180224
SlabSiz: 4096     Partial: 4         Poisoning     : Off  Loss : 49152
Loss   : 0        CpuSlab: 16        Tracking      : Off  Lalig: 0
Align  : 8        Objects: 8         Tracing       : Off  Lpadd: 0

kmalloc-512 has no kmem_cache operations

kmalloc-512: Kernel object allocation
-----------------------------------------------------------------------
No Data

kmalloc-512: Kernel object freeing
------------------------------------------------------------------------
No Data

kmalloc-512: No NUMA information available.

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             55883575  6130576  69   7
Slowpath             24131134 73883818  30  92
Page Alloc              84844    84788   0   0
Add partial            270625 23860257   0  29
Remove partial       24046290    84752  30   0
RemoteObj/SlabFrozen   270825   439015   0   0
Total                80014709 80014394
Deactivate Full=23860293(98%) Empty=200(0%) ToHead=0(0%) ToTail=270625(1%)

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-17 3:35 ` Zhang, Yanmin @ 2008-03-17 17:27 ` Christoph Lameter 0 siblings, 0 replies; 31+ messages in thread From: Christoph Lameter @ 2008-03-17 17:27 UTC (permalink / raw) To: Zhang, Yanmin Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar On Mon, 17 Mar 2008, Zhang, Yanmin wrote: > There is no much help. In 2.6.25-rc5, REFILL means refill from c->page->freelist > and another_slab. It's looks like its definition is confusing. In the case of > hackbench, mostly, c->page->freelist is NULL. REFILL means refilling the per cpu objects from the freelist of the per cpu slab page. That could be bad because it requires taking the slab lock on the slab page. > Slab Perf Counter Alloc Free %Al %Fr > -------------------------------------------------- > Fastpath 55883575 6130576 69 7 > Slowpath 24131134 73883818 30 92 > Page Alloc 84844 84788 0 0 > Add partial 270625 23860257 0 29 > Remove partial 24046290 84752 30 0 Hmmm... I was hoping that add/remove partial numbers would come down. Ok lets forget about the patch. Increasing min_objects does the trick. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 21:08 ` Christoph Lameter 2008-03-15 0:15 ` Christoph Lameter @ 2008-03-17 3:05 ` Zhang, Yanmin 1 sibling, 0 replies; 31+ messages in thread
From: Zhang, Yanmin @ 2008-03-17 3:05 UTC (permalink / raw)
To: Christoph Lameter
Cc: Andrew Morton, Kay Sievers, Greg Kroah-Hartman, LKML, Ingo Molnar

On Fri, 2008-03-14 at 14:08 -0700, Christoph Lameter wrote:
> On Fri, 14 Mar 2008, Zhang, Yanmin wrote:
>
> > > Ahhh... Okay those slabs did not change for 2.6.25-rc. Is there
> > > really a difference to 2.6.24?
> > As oprofile shows slub functions spend more than 80% cpu time, I would like
> > to focus on optimizing SLUB before going back to 2.6.24.
>
> I thought you wanted to address a regression vs 2.6.24?

Initially I wanted to do so, but oprofile data showed that neither 2.6.24 nor 2.6.25-rc is good with hackbench on tigerton. The slub_min_objects boot parameter boosts performance significantly. So I think we need to optimize it before addressing the regression.

> > kmalloc-512: No NUMA information available.
> >
> > Slab Perf Counter       Alloc     Free %Al %Fr
> > --------------------------------------------------
> > Fastpath             55039159  5006829  68   6
> > Slowpath             24975754 75007769  31  93
> > Page Alloc              73840    73779   0   0
> > Add partial                 0 24341085   0  30
> > Remove partial       24267297    73779  30   0
>
> ^^^ add partial/remove partial is likely the cause for
> trouble here. 30% is unacceptably high. The larger allocs will reduce the
> partial handling overhead. That is likely the effect that we see here.
>
> > Refill 24975738
>
> Duh refills at 50%? We could try to just switch to another slab instead of
> reusing the existing one. May also affect the add/remove partial
> situation.

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-13 7:46 hackbench regression since 2.6.25-rc Zhang, Yanmin 2008-03-13 8:48 ` Andrew Morton @ 2008-03-13 15:14 ` Greg KH 2008-03-13 16:19 ` Randy Dunlap 1 sibling, 1 reply; 31+ messages in thread From: Greg KH @ 2008-03-13 15:14 UTC (permalink / raw) To: Zhang, Yanmin; +Cc: Kay Sievers, LKML On Thu, Mar 13, 2008 at 03:46:57PM +0800, Zhang, Yanmin wrote: > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > Command to start it. > #hackbench 100 process 2000 > I ran it for 3 times and sum the values. > > I tried to investiagte it by bisect. > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > these 2 tags for many times manually and kernel always paniced. Where is the kernel panicing? The changeset right after the last one above: bc87d2fe7a1190f1c257af8a91fc490b1ee35954, is a change to efivars, are you using that in your .config? > All patches between the 2 tags are on kobject restructure. I guess such restructure > creates more cache miss on the 16-core tigerton. Nothing should be creating kobjects on a normal load like this, so a regression seems very odd. Unless the /sys/kernel/uids/ stuff is triggering this? Do you have a link to where I can get hackbench (google seems to find lots of reports with it, but not the source itself), so I can test to see if we are accidentally creating kobjects with this load? thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-13 15:14 ` Greg KH @ 2008-03-13 16:19 ` Randy Dunlap 2008-03-13 17:12 ` Greg KH 0 siblings, 1 reply; 31+ messages in thread From: Randy Dunlap @ 2008-03-13 16:19 UTC (permalink / raw) To: Greg KH; +Cc: Zhang, Yanmin, Kay Sievers, LKML On Thu, 13 Mar 2008 08:14:13 -0700 Greg KH wrote: > On Thu, Mar 13, 2008 at 03:46:57PM +0800, Zhang, Yanmin wrote: > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > Command to start it. > > #hackbench 100 process 2000 > > I ran it for 3 times and sum the values. > > > > I tried to investiagte it by bisect. > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > these 2 tags for many times manually and kernel always paniced. > > Where is the kernel panicing? The changeset right after the last one > above: bc87d2fe7a1190f1c257af8a91fc490b1ee35954, is a change to efivars, > are you using that in your .config? > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > creates more cache miss on the 16-core tigerton. > > Nothing should be creating kobjects on a normal load like this, so a > regression seems very odd. Unless the /sys/kernel/uids/ stuff is > triggering this? > > Do you have a link to where I can get hackbench (google seems to find > lots of reports with it, but not the source itself), so I can test to > see if we are accidentally creating kobjects with this load? 
The version that I see referenced most often (unscientifically :) is somewhere under people.redhat.com/mingo/, like so: http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c --- ~Randy ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-13 16:19 ` Randy Dunlap @ 2008-03-13 17:12 ` Greg KH 2008-03-14 0:50 ` Zhang, Yanmin 0 siblings, 1 reply; 31+ messages in thread From: Greg KH @ 2008-03-13 17:12 UTC (permalink / raw) To: Randy Dunlap; +Cc: Zhang, Yanmin, Kay Sievers, LKML On Thu, Mar 13, 2008 at 09:19:21AM -0700, Randy Dunlap wrote: > On Thu, 13 Mar 2008 08:14:13 -0700 Greg KH wrote: > > > On Thu, Mar 13, 2008 at 03:46:57PM +0800, Zhang, Yanmin wrote: > > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > > > Command to start it. > > > #hackbench 100 process 2000 > > > I ran it for 3 times and sum the values. > > > > > > I tried to investiagte it by bisect. > > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > > these 2 tags for many times manually and kernel always paniced. > > > > Where is the kernel panicing? The changeset right after the last one > > above: bc87d2fe7a1190f1c257af8a91fc490b1ee35954, is a change to efivars, > > are you using that in your .config? > > > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > > creates more cache miss on the 16-core tigerton. > > > > Nothing should be creating kobjects on a normal load like this, so a > > regression seems very odd. Unless the /sys/kernel/uids/ stuff is > > triggering this? > > > > Do you have a link to where I can get hackbench (google seems to find > > lots of reports with it, but not the source itself), so I can test to > > see if we are accidentally creating kobjects with this load? 
> > The version that I see referenced most often (unscientifically :) > is somewhere under people.redhat.com/mingo/, like so: > http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c Great, thanks for the link. In using that version, I do not see any kobjects being created at all when running the program. So I don't see how a kobject change could have caused any slowdown. Yanmin, is the above link the version you are using? Hm, running with "hackbench 100 process 2000" seems to lock up my laptop, maybe I shouldn't run 4000 tasks at once on such a memory starved machine... thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-13 17:12 ` Greg KH @ 2008-03-14 0:50 ` Zhang, Yanmin 2008-03-14 5:01 ` Greg KH 0 siblings, 1 reply; 31+ messages in thread From: Zhang, Yanmin @ 2008-03-14 0:50 UTC (permalink / raw) To: Greg KH; +Cc: Randy Dunlap, Kay Sievers, LKML On Thu, 2008-03-13 at 10:12 -0700, Greg KH wrote: > On Thu, Mar 13, 2008 at 09:19:21AM -0700, Randy Dunlap wrote: > > On Thu, 13 Mar 2008 08:14:13 -0700 Greg KH wrote: > > > > > On Thu, Mar 13, 2008 at 03:46:57PM +0800, Zhang, Yanmin wrote: > > > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > > > > > Command to start it. > > > > #hackbench 100 process 2000 > > > > I ran it for 3 times and sum the values. > > > > > > > > I tried to investiagte it by bisect. > > > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > > > these 2 tags for many times manually and kernel always paniced. > > > > > > Where is the kernel panicing? The changeset right after the last one > > > above: bc87d2fe7a1190f1c257af8a91fc490b1ee35954, is a change to efivars, > > > are you using that in your .config? > > > > > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > > > creates more cache miss on the 16-core tigerton. > > > > > > Nothing should be creating kobjects on a normal load like this, so a > > > regression seems very odd. Unless the /sys/kernel/uids/ stuff is > > > triggering this? 
> > >
> > > Do you have a link to where I can get hackbench (google seems to find
> > > lots of reports with it, but not the source itself), so I can test to
> > > see if we are accidentally creating kobjects with this load?
> >
> > The version that I see referenced most often (unscientifically :)
> > is somewhere under people.redhat.com/mingo/, like so:
> > http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
>
> Great, thanks for the link.
>
> In using that version, I do not see any kobjects being created at all
> when running the program. So I don't see how a kobject change could
> have caused any slowdown.
>
> Yanmin, is the above link the version you are using?

Yes.

> Hm, running with "hackbench 100 process 2000" seems to lock up my
> laptop, maybe I shouldn't run 4000 tasks at once on such a memory
> starved machine...

The issue doesn't exist on my 8-core Stoakley and on Tulsa. So I don't think you could reproduce it on a laptop.

From the oprofile data, perhaps we need to dig into SLUB first.

-yanmin

^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 0:50 ` Zhang, Yanmin @ 2008-03-14 5:01 ` Greg KH 2008-03-14 5:32 ` Zhang, Yanmin 0 siblings, 1 reply; 31+ messages in thread From: Greg KH @ 2008-03-14 5:01 UTC (permalink / raw) To: Zhang, Yanmin; +Cc: Randy Dunlap, Kay Sievers, LKML On Fri, Mar 14, 2008 at 08:50:19AM +0800, Zhang, Yanmin wrote: > On Thu, 2008-03-13 at 10:12 -0700, Greg KH wrote: > > On Thu, Mar 13, 2008 at 09:19:21AM -0700, Randy Dunlap wrote: > > > On Thu, 13 Mar 2008 08:14:13 -0700 Greg KH wrote: > > > > > > > On Thu, Mar 13, 2008 at 03:46:57PM +0800, Zhang, Yanmin wrote: > > > > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > > > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > > > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > > > > > > > Command to start it. > > > > > #hackbench 100 process 2000 > > > > > I ran it for 3 times and sum the values. > > > > > > > > > > I tried to investiagte it by bisect. > > > > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > > > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > > > > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > > > > these 2 tags for many times manually and kernel always paniced. > > > > > > > > Where is the kernel panicing? The changeset right after the last one > > > > above: bc87d2fe7a1190f1c257af8a91fc490b1ee35954, is a change to efivars, > > > > are you using that in your .config? > > > > > > > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > > > > creates more cache miss on the 16-core tigerton. > > > > > > > > Nothing should be creating kobjects on a normal load like this, so a > > > > regression seems very odd. Unless the /sys/kernel/uids/ stuff is > > > > triggering this? 
> > > > > > > > Do you have a link to where I can get hackbench (google seems to find > > > > lots of reports with it, but not the source itself), so I can test to > > > > see if we are accidentally creating kobjects with this load? > > > > > > The version that I see referenced most often (unscientifically :) > > > is somewhere under people.redhat.com/mingo/, like so: > > > http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c > > > > Great, thanks for the link. > > > > In using that version, I do not see any kobjects being created at all > > when running the program. So I don't see how a kobject change could > > have caused any slowdown. > > > > Yanmin, is the above link the version you are using? > Yes. > > > > > Hm, running with "hackbench 100 process 2000" seems to lock up my > > laptop, maybe I shouldn't run 4000 tasks at once on such a memory > > starved machine... > The issue doesn't exist on my 8-core stoakley and on tulsa. So I don't think > you could reproduce it on laptop. But I should see any kobjects being created and destroyed as you are thinking that is the problem here, right? And I don't see any, so I'm thinking that this is probably something else. I'm still interested in why your machine was oopsing when bisecting through the kobject commits. I thought it all should have worked without problems, as I spend enough time trying to ensure it was so... thanks, greg k-h ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: hackbench regression since 2.6.25-rc 2008-03-14 5:01 ` Greg KH @ 2008-03-14 5:32 ` Zhang, Yanmin 0 siblings, 0 replies; 31+ messages in thread From: Zhang, Yanmin @ 2008-03-14 5:32 UTC (permalink / raw) To: Greg KH; +Cc: Randy Dunlap, Kay Sievers, LKML On Thu, 2008-03-13 at 22:01 -0700, Greg KH wrote: > On Fri, Mar 14, 2008 at 08:50:19AM +0800, Zhang, Yanmin wrote: > > On Thu, 2008-03-13 at 10:12 -0700, Greg KH wrote: > > > On Thu, Mar 13, 2008 at 09:19:21AM -0700, Randy Dunlap wrote: > > > > On Thu, 13 Mar 2008 08:14:13 -0700 Greg KH wrote: > > > > > > > > > On Thu, Mar 13, 2008 at 03:46:57PM +0800, Zhang, Yanmin wrote: > > > > > > Comparing with 2.6.24, on my 16-core tigerton, hackbench process mode has about > > > > > > 40% regression with 2.6.25-rc1, and more than 20% regression with kernel > > > > > > 2.6.25-rc4, because rc4 includes the reverting patch of scheduler load balance. > > > > > > > > > > > > Command to start it. > > > > > > #hackbench 100 process 2000 > > > > > > I ran it for 3 times and sum the values. > > > > > > > > > > > > I tried to investiagte it by bisect. > > > > > > Kernel up to tag 0f4dafc0563c6c49e17fe14b3f5f356e4c4b8806 has the 20% regression. > > > > > > Kernel up to tag 6e90aa972dda8ef86155eefcdbdc8d34165b9f39 hasn't regression. > > > > > > > > > > > > Any bisect between above 2 tags cause kernel hang. I tried to checkout to a point between > > > > > > these 2 tags for many times manually and kernel always paniced. > > > > > > > > > > Where is the kernel panicing? The changeset right after the last one > > > > > above: bc87d2fe7a1190f1c257af8a91fc490b1ee35954, is a change to efivars, > > > > > are you using that in your .config? > > > > > > > > > > > All patches between the 2 tags are on kobject restructure. I guess such restructure > > > > > > creates more cache miss on the 16-core tigerton. > > > > > > > > > > Nothing should be creating kobjects on a normal load like this, so a > > > > > regression seems very odd. 
> > > > > Unless the /sys/kernel/uids/ stuff is
> > > > > triggering this?
> > > > >
> > > > > Do you have a link to where I can get hackbench (google seems to find
> > > > > lots of reports with it, but not the source itself), so I can test to
> > > > > see if we are accidentally creating kobjects with this load?
> > > >
> > > > The version that I see referenced most often (unscientifically :)
> > > > is somewhere under people.redhat.com/mingo/, like so:
> > > > http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
> > >
> > > Great, thanks for the link.
> > >
> > > In using that version, I do not see any kobjects being created at all
> > > when running the program. So I don't see how a kobject change could
> > > have caused any slowdown.
> > >
> > > Yanmin, is the above link the version you are using?
> > Yes.
> >
> > > Hm, running with "hackbench 100 process 2000" seems to lock up my
> > > laptop, maybe I shouldn't run 4000 tasks at once on such a memory
> > > starved machine...
> > The issue doesn't exist on my 8-core Stoakley and on Tulsa. So I don't think
> > you could reproduce it on a laptop.
>
> But I should see any kobjects being created and destroyed as you are
> thinking that is the problem here, right?

Not just thinking. That's based on lots of testing. But as you know, performance work is often complicated. Now I think maybe the kernel image change altered cache line alignment.

> And I don't see any, so I'm thinking that this is probably something
> else.

Yes.

> I'm still interested in why your machine was oopsing when bisecting
> through the kobject commits. I thought it all should have worked
> without problems, as I spend enough time trying to ensure it was so...

The kernel panicked after printing a warning in kref_get while executing add_disk in rd_init.

Thanks,
Yanmin

^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2008-03-18 4:08 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-03-13 7:46 hackbench regression since 2.6.25-rc Zhang, Yanmin 2008-03-13 8:48 ` Andrew Morton 2008-03-13 9:28 ` Zhang, Yanmin 2008-03-13 9:52 ` Andrew Morton 2008-03-14 0:16 ` Christoph Lameter 2008-03-14 3:04 ` Zhang, Yanmin 2008-03-14 3:30 ` Zhang, Yanmin 2008-03-14 5:28 ` Zhang, Yanmin 2008-03-14 6:39 ` Christoph Lameter 2008-03-14 7:29 ` Zhang, Yanmin 2008-03-14 21:05 ` Christoph Lameter 2008-03-14 6:34 ` Christoph Lameter 2008-03-14 7:23 ` Zhang, Yanmin 2008-03-14 21:06 ` Christoph Lameter 2008-03-17 7:50 ` Zhang, Yanmin 2008-03-17 17:32 ` Christoph Lameter 2008-03-18 3:28 ` Zhang, Yanmin 2008-03-18 4:07 ` Christoph Lameter 2008-03-14 6:32 ` Christoph Lameter 2008-03-14 7:14 ` Zhang, Yanmin 2008-03-14 21:08 ` Christoph Lameter 2008-03-15 0:15 ` Christoph Lameter 2008-03-17 3:35 ` Zhang, Yanmin 2008-03-17 17:27 ` Christoph Lameter 2008-03-17 3:05 ` Zhang, Yanmin 2008-03-13 15:14 ` Greg KH 2008-03-13 16:19 ` Randy Dunlap 2008-03-13 17:12 ` Greg KH 2008-03-14 0:50 ` Zhang, Yanmin 2008-03-14 5:01 ` Greg KH 2008-03-14 5:32 ` Zhang, Yanmin