* power increase issue on light load
@ 2011-06-23 2:43 Alex,Shi
2011-06-23 9:02 ` Peter Zijlstra
2011-07-01 5:44 ` Ming Lei
0 siblings, 2 replies; 17+ messages in thread
From: Alex,Shi @ 2011-06-23 2:43 UTC (permalink / raw)
To: ncrao, peterz, mingo
Cc: Chen, Tim C, Li, Shaohua, linux-kernel@vger.kernel.org
Commit c8b281161dfa4bb5d5be63fb036ce19347b88c63 causes light-load
benchmarks to use over 10% more system power on platforms such as NHM-EP
and the ThinkPad T410 laptop. The benchmarks are specpower and bltk-office.
I tried to track this issue down, but only found that deep C-state
residency dropped sharply, from about 90% to 30~40%, while C0 or C1
residency increased a lot on different machines.
Powertop only hints that there are slightly more RES interrupts, but when
I tried "perf probe native_smp_send_reschedule" I didn't find much.
I also checked /proc/schedstat, and can only be sure that load_balance
was called a bit more frequently, while pull_task() was called really rarely.
The following are the /proc/schedstat counter increases over about 300
seconds while running bltk-office.
The command used to collect them is:
#on a 16 LCPU system with 3 domain levels (0,1,2), the total number of
domains is 48; each domain line has 2 + 36 fields, so fs=38,
$cat /proc/schedstat > schedstat ; sleep x ; cat /proc/schedstat >>
schedstat ; cat schedstat | grep domain | sed '49 i \\n' | awk -v fs=38
'BEGIN { RS=""; FS=" " } { if ( NR ==1) for (i=0; i<NF; i++)
{ value1[i]=$i ; } ; if ( NR ==2) for (i=0; i<NF; i++) { value2[i]=
$i } } END {ORS=" "; for (i=0;i<NF;i++){ if (i%fs == 0) ll="\n"; else
ll=""; print value2[i] - value1[i] ll }; print "\n" }'
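The pipeline above is dense, so here is a minimal Python sketch of the same idea: take two /proc/schedstat snapshots and subtract the per-domain counters line by line. The function names and the tiny made-up snapshots are illustrative assumptions, not part of the original command, and real domain lines carry 36 counters rather than 3.

```python
# Hypothetical helpers, not from the original thread: diff the per-domain
# counters of two /proc/schedstat snapshots.
def domain_rows(snapshot_text):
    """Return one list of integer counters per 'domainN <cpumask> ...' line."""
    rows = []
    for line in snapshot_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("domain"):
            # fields[0] is the domain name, fields[1] the cpumask; skip both.
            rows.append([int(x) for x in fields[2:]])
    return rows


def domain_deltas(before_text, after_text):
    """Subtract the counters of matching domain lines, row by row."""
    before = domain_rows(before_text)
    after = domain_rows(after_text)
    return [[a - b for a, b in zip(row_after, row_before)]
            for row_before, row_after in zip(before, after)]


if __name__ == "__main__":
    # Made-up snapshots with 3 counters per domain, for illustration only.
    before = "cpu0 1 2 3\ndomain0 0003 10 20 30\ndomain1 00ff 1 1 1\n"
    after = "cpu0 4 5 6\ndomain0 0003 15 20 31\ndomain1 00ff 2 1 4\n"
    for row in domain_deltas(before, after):
        print(" ".join(str(v) for v in row))
```

Unlike the awk output, this sketch drops the domain name and cpumask columns instead of printing them as two leading zeros.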
BTW, the increase in imbalance is due to the SCALE increasing by a factor of 1024.
=============== schedstat on 2.6.39 =========
0 0 7819 7812 6 7168 1 0 0 7812 617 614 3 3072 0 0 0 614 19617 19165 103 462848 349 1 0 19165 0 0 0 0 0 0 0 0 0 15484 819 0
0 0 6262 6243 13 20716 6 0 0 6243 80 80 0 0 0 0 0 17 19268 19210 53 62326 5 0 0 19210 0 0 0 0 0 0 0 0 0 4898 10 0
0 0 1418 1416 1 2230 1 0 0 1416 1 1 0 0 0 0 0 0 19263 19262 1 1531 0 0 1 19261 0 0 0 0 0 0 0 0 0 878 439 0
0 0 1351 1348 3 3072 1 1 0 1348 13 13 0 0 0 0 0 13 2160 2159 1 1024 0 0 0 2159 1 0 1 0 0 0 0 0 0 1265 11 0
0 0 1066 1066 0 511 0 0 0 1066 9 9 0 0 0 0 0 4 2160 2159 1 1024 0 0 0 2159 0 0 0 0 0 0 0 0 0 393 1 0
0 0 532 506 15 24850 11 0 0 506 2 2 0 0 0 0 0 2 2160 2157 2 4085 1 0 0 2157 0 0 0 0 0 0 0 0 0 942 435 0
0 0 1963 1962 1 1024 0 0 0 1962 267 267 0 0 0 0 0 267 8372 8367 5 5120 0 0 0 8367 0 0 0 0 0 0 0 0 0 1502 2 0
0 0 1865 1712 143 167805 14 5 0 1712 185 185 0 0 0 0 0 6 8372 8259 109 120287 4 0 0 8259 4 0 4 0 0 0 0 0 0 5855 0 0
0 0 476 476 0 501 0 0 0 122 10 10 0 0 0 0 0 1 8368 8368 0 0 0 0 0 8368 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1227 1227 0 0 0 0 0 1227 1 1 0 0 0 0 0 1 301 301 0 0 0 0 0 301 0 0 0 0 0 0 0 0 0 7 0 0
0 0 1136 1124 11 14075 1 0 0 1124 1 1 0 0 0 0 0 0 301 298 3 3072 0 0 0 298 0 0 0 0 0 0 0 0 0 396 0 0
0 0 489 489 0 0 0 0 0 8 0 0 0 0 0 0 0 0 301 301 0 0 0 0 0 301 0 0 0 0 0 0 0 0 0 36 0 0
0 0 1374 1374 0 0 0 0 0 1374 5 5 0 0 0 0 0 5 428 428 0 0 0 0 0 428 0 0 0 0 0 0 0 0 0 34 0 0
0 0 1301 1151 134 163543 17 4 1 1150 3 3 0 0 0 0 0 0 428 352 72 78311 4 0 0 352 1 0 1 0 0 0 0 0 0 961 0 0
0 0 407 407 0 0 0 0 0 9 0 0 0 0 0 0 0 0 424 424 0 0 0 0 0 424 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1100 1100 0 0 0 0 0 1100 0 0 0 0 0 0 0 0 159 159 0 0 0 0 0 159 0 0 0 0 0 0 0 0 0 2 0 0
0 0 1092 1071 20 24826 1 0 0 1071 0 0 0 0 0 0 0 0 159 157 2 1914 0 0 0 157 0 0 0 0 0 0 0 0 0 26 0 0
0 0 470 470 0 0 0 0 0 0 0 0 0 0 0 0 0 0 159 158 1 2047 0 0 1 157 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1257 1257 0 0 0 0 0 1257 0 0 0 0 0 0 0 0 196 196 0 0 0 0 0 196 0 0 0 0 0 0 0 0 0 23 0 0
0 0 1210 1051 151 175150 12 5 1 1050 0 0 0 0 0 0 0 0 196 159 35 37885 2 0 0 159 4 0 4 0 0 0 0 0 0 55 0 0
0 0 486 486 0 0 0 0 0 1 0 0 0 0 0 0 0 0 194 194 0 0 0 0 0 194 0 0 0 0 0 0 0 0 0 1 0 0
0 0 988 988 0 0 0 0 0 988 0 0 0 0 0 0 0 0 115 115 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0 0
0 0 972 949 23 26105 0 0 0 949 0 0 0 0 0 0 0 0 115 115 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0 0
0 0 457 457 0 0 0 0 0 0 0 0 0 0 0 0 0 0 115 115 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4431 4411 10 20992 10 1 0 4411 522 520 0 2560 2 0 0 520 17075 16946 117 133119 12 0 3 16943 0 0 0 0 0 0 0 0 0 15708 2 0
0 0 4421 4415 6 7424 0 0 0 697 353 353 0 0 0 0 0 5 17063 17050 12 15225 1 0 1 17049 0 0 0 0 0 0 0 0 0 24 0 0
0 0 283 283 0 0 0 0 0 0 42 42 0 0 0 0 0 0 17062 17062 0 0 0 0 0 17062 0 0 0 0 0 0 0 0 0 65 0 0
0 0 1272 1271 1 1024 0 0 0 1271 7 7 0 0 0 0 0 7 534 534 0 0 0 0 0 534 0 0 0 0 0 0 0 0 0 1175 0 0
0 0 1223 1223 0 0 0 0 0 13 5 5 0 0 0 0 0 0 534 533 1 1535 0 0 0 533 0 0 0 0 0 0 0 0 0 15 0 0
0 0 31 31 0 0 0 0 0 0 5 5 0 0 0 0 0 0 534 534 0 0 0 0 0 534 0 0 0 0 0 0 0 0 0 23 0 0
0 0 1743 1742 1 1024 0 0 0 1742 106 105 0 1024 1 0 0 105 2575 2570 4 5120 1 0 0 2570 0 0 0 0 0 0 0 0 0 1369 0 0
0 0 1743 1742 1 1024 0 0 0 151 67 67 0 0 0 0 0 0 2574 2554 17 20344 3 0 0 2554 0 0 0 0 0 0 0 0 0 184 0 0
0 0 49 49 0 0 0 0 0 0 38 38 0 0 0 0 0 0 2571 2570 0 1536 1 0 0 2570 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1126 1126 0 0 0 0 0 1126 2 2 0 0 0 0 0 2 211 210 0 1024 1 0 0 210 0 0 0 0 0 0 0 0 0 42 0 0
0 0 1126 1126 0 0 0 0 0 1 2 2 0 0 0 0 0 0 210 210 0 0 0 0 0 210 0 0 0 0 0 0 0 0 0 2 0 0
0 0 1 1 0 0 0 0 0 0 2 2 0 0 0 0 0 0 210 210 0 0 0 0 0 210 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1177 1175 2 2048 0 0 0 1175 0 0 0 0 0 0 0 0 222 222 0 0 0 0 0 222 0 0 0 0 0 0 0 0 0 27 0 0
0 0 1177 1177 0 0 0 0 0 17 0 0 0 0 0 0 0 0 222 209 9 12916 4 0 1 208 0 0 0 0 0 0 0 0 0 18 0 0
0 0 11 11 0 0 0 0 0 0 1 1 0 0 0 0 0 0 218 218 0 0 0 0 0 218 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1070 1069 1 1024 0 0 0 1069 0 0 0 0 0 0 0 0 123 123 0 0 0 0 0 123 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1070 1070 0 0 0 0 0 2 0 0 0 0 0 0 0 0 123 122 1 1024 0 0 0 122 0 0 0 0 0 0 0 0 0 3 0 0
0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 123 123 0 0 0 0 0 123 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1059 1059 0 0 0 0 0 1059 0 0 0 0 0 0 0 0 147 147 0 0 0 0 0 147 0 0 0 0 0 0 0 0 0 22 0 0
0 0 1059 1059 0 0 0 0 0 4 0 0 0 0 0 0 0 0 147 141 6 6010 0 0 0 141 0 0 0 0 0 0 0 0 0 7 0 0
0 0 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 147 147 0 0 0 0 0 147 0 0 0 0 0 0 0 0 0 0 0 0
0 0 987 987 0 0 0 0 0 987 0 0 0 0 0 0 0 0 115 115 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0 0
0 0 987 987 0 0 0 0 0 0 0 0 0 0 0 0 0 0 115 115 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 115 115 0 0 0 0 0 115 0 0 0 0 0 0 0 0 0 0 0
============= for 3.0.0-rc4 ===================
0 0 9928 9926 2 2097152 0 0 0 9926 617 617 0 0 0 0 0 617 15813 15730 75 87031808 8 0 0 15730 0 0 0 0 0 0 0 0 0 15578 395 0
0 0 9882 9870 11 14942201 1 0 0 9870 489 489 0 0 0 0 0 18 15805 15794 10 11397255 1 0 0 15794 0 0 0 0 0 0 0 0 0 5812 22 0
0 0 1500 1498 1 2473919 1 0 0 1498 1 1 0 0 0 0 0 0 15804 15799 2 7864320 3 0 0 15799 0 0 0 0 0 0 0 0 0 1388 66 0
0 0 1764 1763 1 1048576 0 0 0 1763 102 102 0 0 0 0 0 102 4932 4930 1 2097152 1 0 0 4930 0 0 0 0 0 0 0 0 0 996 6 0
0 0 1764 1749 8 15728632 7 0 0 1749 90 90 0 0 0 0 0 6 4931 4877 54 56885246 0 0 0 4877 0 0 0 0 0 0 0 0 0 2183 11 0
0 0 652 631 15 21226946 6 0 0 631 5 5 0 0 0 0 0 0 4931 4764 167 181927108 0 0 1 4763 0 0 0 0 0 0 0 0 0 427 60 0
0 0 3577 3575 2 2097152 0 0 0 3575 113 113 0 0 0 0 0 113 2396 2390 6 6291456 0 0 1 2389 0 0 0 0 0 0 0 0 0 243 0 0
0 0 2896 2753 134 157411412 10 1 3 2750 96 96 0 0 0 0 0 1 2396 2338 56 62914553 2 0 0 2338 1 0 1 0 0 0 0 0 0 6115 0 0
0 0 644 644 0 0 0 0 2 174 2 2 0 0 0 0 0 1 2394 2389 2 8912888 3 0 0 2389 0 0 0 0 0 0 0 0 0 9 0 0
0 0 1573 1573 0 0 0 0 0 1573 170 170 0 0 0 0 0 170 4734 4731 1 3145728 2 0 0 4731 0 0 0 0 0 0 0 0 0 437 0 0
0 0 1451 1440 9 11796478 2 0 0 1440 96 96 0 0 0 0 0 1 4732 4699 33 35913726 0 0 0 4699 0 0 0 0 0 0 0 0 0 3137 0 0
0 0 558 558 0 0 0 0 0 13 17 17 0 0 0 0 0 0 4732 4729 2 4194291 1 0 0 4729 0 0 0 0 0 0 0 0 0 846 0 0
0 0 1690 1690 0 0 0 0 0 1690 1 1 0 0 0 0 0 1 297 297 0 0 0 0 0 297 0 0 0 0 0 0 0 0 0 56 0 0
0 0 1593 1440 144 171092595 13 5 0 1440 1 1 0 0 0 0 0 0 297 274 21 23330810 2 0 0 274 4 0 4 0 0 0 0 0 0 510 0 0
0 0 580 580 0 0 0 0 0 5 1 1 0 0 0 0 0 0 295 294 1 1048571 0 0 0 294 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1660 1660 0 0 0 0 0 1660 1 1 0 0 0 0 0 1 205 205 0 0 0 0 0 205 0 0 0 0 0 0 0 0 0 4 0 0
0 0 1660 1629 26 32243706 5 0 0 1629 1 1 0 0 0 0 0 0 205 185 17 20310149 3 0 0 185 0 0 0 0 0 0 0 0 0 185 0 0
0 0 573 573 0 0 0 0 0 3 1 1 0 0 0 0 0 0 202 202 0 0 0 0 0 202 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1483 1483 0 0 0 0 0 1483 0 0 0 0 0 0 0 0 146 146 0 0 0 0 0 146 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1375 1223 147 173014981 7 3 1 1222 0 0 0 0 0 0 0 0 146 134 8 12183684 4 0 0 134 2 0 2 0 0 0 0 0 0 21 0 0
0 0 558 558 0 0 0 0 0 1 0 0 0 0 0 0 0 0 142 142 0 0 0 0 0 142 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1372 1372 0 0 0 0 0 1372 0 0 0 0 0 0 0 0 129 129 0 0 0 0 0 129 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1372 1343 22 30408700 8 3 0 1343 0 0 0 0 0 0 0 0 129 124 3 5242879 2 0 0 124 1 0 1 0 0 0 0 0 0 11 0 0
0 0 542 542 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 127 0 0 0 0 0 127 0 0 0 0 0 0 0 0 0 0 0 0
0 0 12209 12190 16 20971520 3 0 0 12190 296 294 1 2097152 1 0 0 294 16022 15892 123 136839168 7 0 12 15880 0 0 0 0 0 0 0 0 0 16130 2 0
0 0 12206 12190 13 20185088 3 0 0 1180 195 195 0 0 0 0 0 0 16015 16006 8 11010047 1 0 1 16005 0 0 0 0 0 0 0 0 0 34 0 0
0 0 678 678 0 0 0 0 0 0 16 16 0 0 0 0 0 0 16014 16012 1 3145718 1 0 0 16012 0 0 0 0 0 0 0 0 0 32 1 0
0 0 1830 1828 2 2097152 0 0 0 1828 123 123 0 0 0 0 0 123 4970 4967 2 3145728 1 0 0 4967 0 0 0 0 0 0 0 0 0 1157 0 0
0 0 1830 1829 0 911495 1 0 0 65 95 95 0 0 0 0 0 0 4969 4966 2 2883582 1 0 0 4966 0 0 0 0 0 0 0 0 0 9 0 0
0 0 24 24 0 0 0 0 0 0 38 38 0 0 0 0 0 0 4968 4965 2 3670011 1 0 0 4965 0 0 0 0 0 0 0 0 0 2 0 0
0 0 2816 2813 3 3145728 0 0 0 2813 4 4 0 0 0 0 0 4 544 544 0 0 0 0 0 544 0 0 0 0 0 0 0 0 0 224 0 0
0 0 2816 2814 1 1960071 1 0 0 78 3 3 0 0 0 0 0 0 544 534 9 10747903 1 0 0 534 0 0 0 0 0 0 0 0 0 31 0 0
0 0 30 30 0 0 0 0 0 0 5 5 0 0 0 0 0 0 543 542 1 1048576 0 0 0 542 0 0 0 0 0 0 0 0 0 8 0 0
0 0 1551 1550 1 1048576 0 0 0 1550 8 8 0 0 0 0 0 8 672 669 2 3145728 1 0 0 669 0 0 0 0 0 0 0 0 0 411 0 0
0 0 1551 1551 0 0 0 0 0 18 7 7 0 0 0 0 0 0 671 668 3 3145727 0 0 0 668 0 0 0 0 0 0 0 0 0 302 0 0
0 0 14 14 0 0 0 0 0 0 3 3 0 0 0 0 0 0 671 671 0 524288 0 0 0 671 0 0 0 0 0 0 0 0 0 3 0 0
0 0 1378 1378 0 0 0 0 0 1378 0 0 0 0 0 0 0 0 169 169 0 0 0 0 0 169 0 0 0 0 0 0 0 0 0 58 0 0
0 0 1357 1356 0 1048576 1 0 0 6 0 0 0 0 0 0 0 0 169 163 6 7864317 0 0 0 163 0 0 0 0 0 0 0 0 0 18 0 0
0 0 15 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 169 169 0 0 0 0 0 169 0 0 0 0 0 0 0 0 0 3 0 0
0 0 1232 1232 0 0 0 0 0 1232 0 0 0 0 0 0 0 0 126 126 0 0 0 0 0 126 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1232 1232 0 0 0 0 0 2 0 0 0 0 0 0 0 0 126 123 3 2883583 0 0 0 123 0 0 0 0 0 0 0 0 0 3 0 0
0 0 2 2 0 0 0 0 0 0 1 1 0 0 0 0 0 0 126 126 0 0 0 0 0 126 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1222 1222 0 0 0 0 0 1222 1 1 0 0 0 0 0 1 122 122 0 0 0 0 0 122 0 0 0 0 0 0 0 0 0 1 0 0
0 0 1113 1112 0 911495 1 0 0 13 1 1 0 0 0 0 0 0 122 118 4 4319367 0 0 0 118 0 0 0 0 0 0 0 0 0 2 0 0
0 0 47 47 0 0 0 0 0 0 0 0 0 0 0 0 0 0 122 122 0 0 0 0 0 122 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1222 1222 0 0 0 0 0 1222 0 0 0 0 0 0 0 0 118 118 0 0 0 0 0 118 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1222 1222 0 0 0 0 0 0 0 0 0 0 0 0 0 0 118 116 2 2097152 0 0 0 116 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 118 118 0 0 0 0 0 118 0 0 0 0 0 0 0 0 0 0 0
Any ideas about this?
Regards
Alex
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-06-23 2:43 power increase issue on light load Alex,Shi
@ 2011-06-23 9:02 ` Peter Zijlstra
2011-06-24 0:41 ` Alex,Shi
2011-07-01 5:44 ` Ming Lei
1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2011-06-23 9:02 UTC (permalink / raw)
To: Alex,Shi
Cc: ncrao, mingo, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org
On Thu, 2011-06-23 at 10:43 +0800, Alex,Shi wrote:
> Commit c8b281161dfa4bb5d5be63fb036ce19347b88c63 causes light-load
> benchmarks to use over 10% more system power on platforms such as NHM-EP
> and the ThinkPad T410 laptop. The benchmarks are specpower and bltk-office.
>
> I tried to track this issue down, but only found that deep C-state
> residency dropped sharply, from about 90% to 30~40%, while C0 or C1
> residency increased a lot on different machines.
>
> Powertop only hints that there are slightly more RES interrupts, but when
> I tried "perf probe native_smp_send_reschedule" I didn't find much.
>
> I also checked /proc/schedstat, and can only be sure that load_balance
> was called a bit more frequently, while pull_task() was called really rarely.
>
>
> The following are the /proc/schedstat counter increases over about 300
> seconds while running bltk-office.
> The command used to collect them is:
> #on a 16 LCPU system with 3 domain levels (0,1,2), the total number of
> domains is 48; each domain line has 2 + 36 fields, so fs=38,
>
> $cat /proc/schedstat > schedstat ; sleep x ; cat /proc/schedstat >>
> schedstat ; cat schedstat | grep domain | sed '49 i \\n' | awk -v fs=38
> 'BEGIN { RS=""; FS=" " } { if ( NR ==1) for (i=0; i<NF; i++)
> { value1[i]=$i ; } ; if ( NR ==2) for (i=0; i<NF; i++) { value2[i]=
> $i } } END {ORS=" "; for (i=0;i<NF;i++){ if (i%fs == 0) ll="\n"; else
> ll=""; print value2[i] - value1[i] ll }; print "\n" }'
/proc/schedstat is already a massive pain to interpret and then you go
and mangle things even more and expect me to try and understand that
crap? I don't think so, life is too short.
> BTW, the increase in imbalance is due to the SCALE increasing by a factor of 1024.
> Any ideas about this?
What happens if you try something like the below. Increased imbalance
might lead to more load-balance action, which might lead to more task
migration/waking up of cpus etc.
If the below makes any difference, Nikhil's changes have a funny that
needs to be caught.
---
include/linux/sched.h | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a837b20..84121d6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -808,15 +808,9 @@ enum cpu_idle_type {
* when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
* increased costs.
*/
-#if BITS_PER_LONG > 32
-# define SCHED_LOAD_RESOLUTION 10
-# define scale_load(w) ((w) << SCHED_LOAD_RESOLUTION)
-# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
-#else
# define SCHED_LOAD_RESOLUTION 0
# define scale_load(w) (w)
# define scale_load_down(w) (w)
-#endif
#define SCHED_LOAD_SHIFT (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE (1L << SCHED_LOAD_SHIFT)
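To make the effect of the removed branch concrete, here is a small Python sketch that mirrors the macros in the hunk above. It is only an arithmetic illustration, not kernel code; the function names are borrowed from the macros and the `resolution` parameter is an assumption added so both configurations can be compared side by side.

```python
# Arithmetic mirror of the macros in the diff above (illustration only).
def sched_load_scale(resolution):
    # SCHED_LOAD_SHIFT = 10 + SCHED_LOAD_RESOLUTION
    # SCHED_LOAD_SCALE = 1L << SCHED_LOAD_SHIFT
    return 1 << (10 + resolution)


def scale_load(w, resolution):
    # scale_load(w) is (w) << SCHED_LOAD_RESOLUTION
    return w << resolution


def scale_load_down(w, resolution):
    # scale_load_down(w) is (w) >> SCHED_LOAD_RESOLUTION
    return w >> resolution


print(sched_load_scale(0))    # resolution 0, as forced by the diff
print(sched_load_scale(10))   # resolution 10, the BITS_PER_LONG > 32 branch
print(scale_load(1024, 10))   # a weight of 1024 in the high-resolution units
```

The two scale values are 1024 and 1048576. That matches the magnitudes in the two schedstat dumps posted at the start of the thread, where values such as 1024 and 3072 on 2.6.39 correspond to 1048576 and 2097152 on 3.0.0-rc4.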
* Re: power increase issue on light load
2011-06-23 9:02 ` Peter Zijlstra
@ 2011-06-24 0:41 ` Alex,Shi
2011-06-28 0:02 ` Alex,Shi
0 siblings, 1 reply; 17+ messages in thread
From: Alex,Shi @ 2011-06-24 0:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: ncrao@google.com, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, len.brown
On Thu, 2011-06-23 at 17:02 +0800, Peter Zijlstra wrote:
> On Thu, 2011-06-23 at 10:43 +0800, Alex,Shi wrote:
> > Commit c8b281161dfa4bb5d5be63fb036ce19347b88c63 causes light-load
> > benchmarks to use over 10% more system power on platforms such as NHM-EP
> > and the ThinkPad T410 laptop. The benchmarks are specpower and bltk-office.
> >
> > I tried to track this issue down, but only found that deep C-state
> > residency dropped sharply, from about 90% to 30~40%, while C0 or C1
> > residency increased a lot on different machines.
> >
> > Powertop only hints that there are slightly more RES interrupts, but when
> > I tried "perf probe native_smp_send_reschedule" I didn't find much.
> >
> > I also checked /proc/schedstat, and can only be sure that load_balance
> > was called a bit more frequently, while pull_task() was called really rarely.
> >
> >
> > The following are the /proc/schedstat counter increases over about 300
> > seconds while running bltk-office.
> > The command used to collect them is:
> > #on a 16 LCPU system with 3 domain levels (0,1,2), the total number of
> > domains is 48; each domain line has 2 + 36 fields, so fs=38,
> >
> > $cat /proc/schedstat > schedstat ; sleep x ; cat /proc/schedstat >>
> > schedstat ; cat schedstat | grep domain | sed '49 i \\n' | awk -v fs=38
> > 'BEGIN { RS=""; FS=" " } { if ( NR ==1) for (i=0; i<NF; i++)
> > { value1[i]=$i ; } ; if ( NR ==2) for (i=0; i<NF; i++) { value2[i]=
> > $i } } END {ORS=" "; for (i=0;i<NF;i++){ if (i%fs == 0) ll="\n"; else
> > ll=""; print value2[i] - value1[i] ll }; print "\n" }'
>
> /proc/schedstat is already a massive pain to interpret and then you go
> and mangle things even more and expect me to try and understand that
> crap? I don't think so, life is too short.
>
> > BTW, the increase in imbalance is due to the SCALE increasing by a factor of 1024.
>
> > Any ideas about this?
>
> What happens if you try something like the below. Increased imbalance
> might lead to more load-balance action, which might lead to more task
> migration/waking up of cpus etc.
>
> If the below makes any difference, Nikhil's changes have a funny that
> needs to be caught.
Yes, it mostly removes the commit's effect, so the power recovered.
In fact the only suspicious thing I found is the large imbalance, but that
is what the commit wants to...
* Re: power increase issue on light load
2011-06-24 0:41 ` Alex,Shi
@ 2011-06-28 0:02 ` Alex,Shi
2011-06-28 14:59 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Alex,Shi @ 2011-06-28 0:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: ncrao@google.com, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, len.brown
> >
> > What happens if you try something like the below. Increased imbalance
> > might lead to more load-balance action, which might lead to more task
> > migration/waking up of cpus etc.
> >
> > If the below makes any difference, Nikhil's changes have a funny that
> > needs to be caught.
>
> Yes, it mostly removes the commit's effect, so the power recovered.
>
> In fact the only suspicious thing I found is the large imbalance, but that
> is what the commit wants ...
Any further comments on this?
* Re: power increase issue on light load
2011-06-28 0:02 ` Alex,Shi
@ 2011-06-28 14:59 ` Peter Zijlstra
2011-06-28 17:13 ` Nikhil Rao
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2011-06-28 14:59 UTC (permalink / raw)
To: Alex,Shi
Cc: ncrao@google.com, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, len.brown
On Tue, 2011-06-28 at 08:02 +0800, Alex,Shi wrote:
> > >
> > > What happens if you try something like the below. Increased imbalance
> > > might lead to more load-balance action, which might lead to more task
> > > migration/waking up of cpus etc.
> > >
> > > If the below makes any difference, Nikhil's changes have a funny that
> > > needs to be caught.
> >
> > > Yes, it mostly removes the commit's effect, so the power recovered.
> > >
> > > In fact the only suspicious thing I found is the large imbalance, but that
> > > is what the commit wants ...
>
> Any further comments on this?
I had a look over all that stuff, but I couldn't find an obvious unit
mis-match in any of the imbalance code. Nikhil any clue?
* Re: power increase issue on light load
2011-06-28 14:59 ` Peter Zijlstra
@ 2011-06-28 17:13 ` Nikhil Rao
2011-06-29 2:30 ` Nikhil Rao
0 siblings, 1 reply; 17+ messages in thread
From: Nikhil Rao @ 2011-06-28 17:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Alex, Shi, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, len.brown
On Tue, Jun 28, 2011 at 7:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2011-06-28 at 08:02 +0800, Alex,Shi wrote:
>> > >
>> > > What happens if you try something like the below. Increased imbalance
>> > > might lead to more load-balance action, which might lead to more task
>> > > migration/waking up of cpus etc.
>> > >
>> > > If the below makes any difference, Nikhil's changes have a funny that
>> > > needs to be caught.
>> >
>> > Yes, it mostly removes the commit's effect, so the power recovered.
>> >
>> > In fact the only suspicious thing I found is the large imbalance, but that
>> > is what the commit wants ...
>>
>> Any further comments on this?
>
> I had a look over all that stuff, but I couldn't find an obvious unit
> mis-match in any of the imbalance code. Nikhil any clue?
>
Sorry for the late reply. My mailbox filters failed me :-(
Alex -- I'm looking into this issue. Will get back to you soon.
-Thanks,
Nikhil
* Re: power increase issue on light load
2011-06-28 17:13 ` Nikhil Rao
@ 2011-06-29 2:30 ` Nikhil Rao
2011-06-29 3:22 ` Alex,Shi
2011-06-30 0:07 ` Nikhil Rao
0 siblings, 2 replies; 17+ messages in thread
From: Nikhil Rao @ 2011-06-29 2:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Alex, Shi, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, len.brown
On Tue, Jun 28, 2011 at 10:13 AM, Nikhil Rao <ncrao@google.com> wrote:
> On Tue, Jun 28, 2011 at 7:59 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, 2011-06-28 at 08:02 +0800, Alex,Shi wrote:
>>> > >
>>> > > What happens if you try something like the below. Increased imbalance
>>> > > might lead to more load-balance action, which might lead to more task
>>> > > migration/waking up of cpus etc.
>>> > >
>>> > > If the below makes any difference, Nikhil's changes have a funny that
>>> > > needs to be caught.
>>> >
>>> > Yes, it mostly removes the commit's effect, so the power recovered.
>>> >
>>> > In fact the only suspicious thing I found is the large imbalance, but that
>>> > is what the commit wants ...
>>>
>>> Any further comments on this?
>>
>> I had a look over all that stuff, but I couldn't find an obvious unit
>> mis-match in any of the imbalance code. Nikhil any clue?
>>
>
> Sorry for the late reply. My mailbox filters failed me :-(
>
> Alex -- I'm looking into this issue. Will get back to you soon.
>
Looking at the schedstat data Alex posted:
- Distribution of load balances across cores looks about the same.
- Load balancer does more idle balances on 3.0-rc4 as compared to
2.6.39 on SMT and NUMA domains. Busy and newidle balances are a mixed
bag.
- I see far fewer affine wakeups on 3.0-rc4 as compared to 2.6.39.
About half as many affine wakeups on SMT and about a quarter as many
on NUMA.
I'm investigating the impact of the load resolution patchset on
effective load and wake affine calculations. This seems to be the most
obvious difference from the schedstat data.
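The affine-wakeup observation can be checked against the posted dumps with a short Python sketch. It assumes that the last three counters on each row are ttwu_wake_remote, ttwu_move_affine and ttwu_move_balance (the schedstat version 15 layout), and that the first three rows of each dump are cpu0's domain0 to domain2; neither assumption is stated in the thread. The numbers are copied from those rows, and the names below are hypothetical.

```python
# Last three counters of the first three rows of each posted dump.
# Assumed order: (ttwu_wake_remote, ttwu_move_affine, ttwu_move_balance).
TTWU_2_6_39 = {
    "domain0": (15484, 819, 0),
    "domain1": (4898, 10, 0),
    "domain2": (878, 439, 0),
}
TTWU_3_0_RC4 = {
    "domain0": (15578, 395, 0),
    "domain1": (5812, 22, 0),
    "domain2": (1388, 66, 0),
}


def affine_ratio(new, old):
    """Per-domain ratio of ttwu_move_affine (index 1), new kernel over old."""
    return {dom: new[dom][1] / old[dom][1] for dom in old if old[dom][1]}


ratios = affine_ratio(TTWU_3_0_RC4, TTWU_2_6_39)
for dom, ratio in sorted(ratios.items()):
    print(f"{dom}: {ratio:.2f}")
```

For these three rows the ratios come out at 0.48, 2.20 and 0.15. This covers a single CPU only, so it need not match a summary taken over all 16 CPUs.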
Alex -- I have a couple of questions about your test setup and results.
- What is the impact on throughput of these benchmarks?
- Would it be possible to get a "perf sched" trace on these two kernels?
- I'm assuming the three sched domains are SMT, MC and NUMA. Is that
right? Do you have any powersavings balance or special sched domain
flags enabled?
- Are you using group scheduling? If so, what does your setup look like?
-Thanks,
Nikhil
* Re: power increase issue on light load
2011-06-29 2:30 ` Nikhil Rao
@ 2011-06-29 3:22 ` Alex,Shi
2011-06-29 6:55 ` Alex,Shi
2011-06-30 0:07 ` Nikhil Rao
1 sibling, 1 reply; 17+ messages in thread
From: Alex,Shi @ 2011-06-29 3:22 UTC (permalink / raw)
To: Nikhil Rao
Cc: Peter Zijlstra, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, Brown, Len
>
> Looking at the schedstat data Alex posted:
> - Distribution of load balances across cores looks about the same.
> - Load balancer does more idle balances on 3.0-rc4 as compared to
> 2.6.39 on SMT and NUMA domains. Busy and newidle balances are a mixed
> bag.
> - I see far fewer affine wakeups on 3.0-rc4 as compared to 2.6.39.
> About half as many affine wakeups on SMT and about a quarter as many
> on NUMA.
>
> I'm investigating the impact of the load resolution patchset on
> effective load and wake affine calculations. This seems to be the most
> obvious difference from the schedstat data.
>
> Alex -- I have a couple of questions about your test setup and results.
> - What is the impact on throughput of these benchmarks?
On both bltk-office and light-load specpower (10%/20%/30% load), the
throughput is almost unchanged on my NHM-EP server and T410 laptop.
> - Would it be possible to get a "perf sched" trace on these two kernels?
I will run the tests again and give you the data later, but I didn't find
any more useful data in 'perf record -e sched*'.
> - I'm assuming the three sched domains are SMT, MC and NUMA. Is that
> right? Do you have any powersavings balance or special sched domain
> flags enabled?
Yes, and sched_mc_power_savings and sched_smt_power_savings were
both set. The NHM-EP domains look like this:
CPU15 attaching sched-domain:
domain 0: span 7,15 level SIBLING
groups: 15 (cpu_power = 589) 7 (cpu_power = 589)
domain 1: span 1,3,5,7,9,11,13,15 level MC
groups: 7,15 (cpu_power = 1178) 1,9 (cpu_power = 1178) 3,11 (cpu_power = 1178) 5,13 (cpu_power = 1178)
domain 2: span 0-15 level NODE
groups: 1,3,5,7,9,11,13,15 (cpu_power = 4712) 0,2,4,6,8,10,12,14 (cpu_power = 4712)
> - Are you using group scheduling? If so, what does your setup look like?
I enabled FAIR group scheduling by default, but I have also tried disabling
it and the problem is the same, so it isn't related to group scheduling.
>
> -Thanks,
> Nikhil
* Re: power increase issue on light load
2011-06-29 3:22 ` Alex,Shi
@ 2011-06-29 6:55 ` Alex,Shi
2011-06-30 0:26 ` Nikhil Rao
0 siblings, 1 reply; 17+ messages in thread
From: Alex,Shi @ 2011-06-29 6:55 UTC (permalink / raw)
To: Nikhil Rao
Cc: Peter Zijlstra, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, Brown, Len
On Wed, 2011-06-29 at 11:22 +0800, Alex,Shi wrote:
> >
> > Looking at the schedstat data Alex posted:
> > - Distribution of load balances across cores looks about the same.
> > - Load balancer does more idle balances on 3.0-rc4 as compared to
> > 2.6.39 on SMT and NUMA domains. Busy and newidle balances are a mixed
> > bag.
> > - I see far fewer affine wakeups on 3.0-rc4 as compared to 2.6.39.
> > About half as many affine wakeups on SMT and about a quarter as many
> > on NUMA.
> >
> > I'm investigating the impact of the load resolution patchset on
> > effective load and wake affine calculations. This seems to be the most
> > obvious difference from the schedstat data.
> >
> > Alex -- I have a couple of questions about your test setup and results.
> > - What is the impact on throughput of these benchmarks?
>
> On both bltk-office and light-load specpower (10%/20%/30% load), the
> throughput is almost unchanged on my NHM-EP server and T410 laptop.
> > - Would it be possible to get a "perf sched" trace on these two kernels?
I tried 'perf sched record' and then 'perf sched trace' as the usage text
shows, but in fact 'perf sched' doesn't support the 'trace' command now.
Since 'perf sched record' uses 'perf record -e sched:xxx' to do the
recording, I used 'perf record' directly. The following info was collected
over 300 seconds on my NHM-EP for the bltk-office benchmark.
[alexs@lkp-ne01 ~]$ grep -e Events.*sched linux-2.6/perf-report-3.0.0-rc5
# Events: 11K sched:sched_wakeup
# Events: 1K sched:sched_wakeup_new
# Events: 24K sched:sched_switch
# Events: 3K sched:sched_migrate_task
# Events: 851 sched:sched_process_free
# Events: 1K sched:sched_process_exit
# Events: 1K sched:sched_process_wait
# Events: 1K sched:sched_process_fork
# Events: 12K sched:sched_stat_wait
# Events: 9K sched:sched_stat_sleep
# Events: 452 sched:sched_stat_iowait
# Events: 16K sched:sched_stat_runtime
[alexs@lkp-ne01 ~]$
[alexs@lkp-ne01 ~]$
[alexs@lkp-ne01 ~]$ grep -e Events.*sched /mnt/linux-2.6.39/perf-report-2.6.39
# Events: 5K sched:sched_wakeup
# Events: 615 sched:sched_wakeup_new
# Events: 11K sched:sched_switch
# Events: 2K sched:sched_migrate_task
# Events: 541 sched:sched_process_free
# Events: 692 sched:sched_process_exit
# Events: 1K sched:sched_process_wait
# Events: 615 sched:sched_process_fork
# Events: 6K sched:sched_stat_wait
# Events: 4K sched:sched_stat_sleep
# Events: 178 sched:sched_stat_iowait
# Events: 9K sched:sched_stat_runtime
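To compare the two runs at a glance, the "# Events:" lines can be turned into per-event ratios. The Python sketch below is a rough illustration with hypothetical helper names. It treats a trailing "K" as a factor of 1000, which is an assumption about how perf rounds these headers, so the ratios are coarse. Only three lines from each report above are used as sample input.

```python
import re

# Matches lines such as "# Events: 11K sched:sched_wakeup".
_EVENT_LINE = re.compile(r"# Events:\s+(\d+)(K?)\s+(\S+)")


def parse_event_counts(report_text):
    """Map event name to an approximate count, reading 'K' as x1000."""
    counts = {}
    for number, kilo, name in _EVENT_LINE.findall(report_text):
        counts[name] = int(number) * (1000 if kilo else 1)
    return counts


def count_ratios(new_counts, old_counts):
    """Ratio new/old for every event present (and non-zero) in both runs."""
    return {name: new_counts[name] / old_counts[name]
            for name in new_counts if old_counts.get(name)}


REPORT_3_0_RC5 = """# Events: 11K sched:sched_wakeup
# Events: 24K sched:sched_switch
# Events: 851 sched:sched_process_free
"""
REPORT_2_6_39 = """# Events: 5K sched:sched_wakeup
# Events: 11K sched:sched_switch
# Events: 541 sched:sched_process_free
"""

event_ratios = count_ratios(parse_event_counts(REPORT_3_0_RC5),
                            parse_event_counts(REPORT_2_6_39))
for name, ratio in sorted(event_ratios.items()):
    print(f"{name}: {ratio:.2f}x")
```

On this sample the wakeup and switch counts are roughly doubled on 3.0.0-rc5, although the rounded "K" figures limit how much can be read into the exact ratios.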
* Re: power increase issue on light load
2011-06-29 2:30 ` Nikhil Rao
2011-06-29 3:22 ` Alex,Shi
@ 2011-06-30 0:07 ` Nikhil Rao
2011-06-30 8:34 ` Alex,Shi
1 sibling, 1 reply; 17+ messages in thread
From: Nikhil Rao @ 2011-06-30 0:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Alex, Shi, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, len.brown
On Tue, Jun 28, 2011 at 7:30 PM, Nikhil Rao <ncrao@google.com> wrote:
> Looking at the schedstat data Alex posted:
> - Distribution of load balances across cores looks about the same.
> - Load balancer does more idle balances on 3.0-rc4 as compared to
> 2.6.39 on SMT and NUMA domains. Busy and newidle balances are a mixed
> bag.
> - I see far fewer affine wakeups on 3.0-rc4 as compared to 2.6.39.
> About half as many affine wakeups on SMT and about a quarter as many
> on NUMA.
>
> I'm investigating the impact of the load resolution patchset on
> effective load and wake affine calculations. This seems to be the most
> obvious difference from the schedstat data.
>
I went through the math in effective load and wake affine and I think
it should be OK. There are a couple of corner cases where increasing
sched load resolution can change the result of wake affine -- I've
listed them below. However, I am not convinced you are hitting these
cases often enough to make a noticeable difference. I'm looking into
the other LB paths...
- One corner case is because of rounding error in the shares update
path. Let's say the shares update logic assigned weight A to a sched
entity in the case with scaled resolution, and it assigned weight B
without scaling weights. Now, we expect A/1024 = B, but this is not
always the case because of rounding error. The difference between (A
and B*1024) gets amplified in wake_affine() since it multiplies
(weight+effective load) with imbalance pct and cpu power -- we
effectively scale this up by 5 orders of magnitude. In cases where
prev_eff_load and this_eff_load are pretty close, this difference can
result in a different result in wake_affine().
- There's a corner case in effective_load(), where if a task wakes up
on an empty cfs_rq, you could hit the clamp in effective_load (i.e. <
MIN_SHARES) which can affect prev_eff_load (you get a lower number --
making it less likely to do an affine wakeup). I think this patch
(against 3.0-rc4) will address that issue -- can you please give this
a try?
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 433491c..6fcfbfc 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1442,8 +1442,8 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 			wl = tg->shares;
 
 		/* zero point is MIN_SHARES */
-		if (wl < MIN_SHARES)
-			wl = MIN_SHARES;
+		if (wl < scale_load(MIN_SHARES))
+			wl = scale_load(MIN_SHARES);
 		wl -= se->load.weight;
 		wg = 0;
 	}
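Both corner cases described above can be illustrated with a few lines of integer arithmetic in Python. This is only a sketch under stated assumptions: `split_share` is a hypothetical proportional split with integer division standing in for the shares update path, not the kernel's actual formula, and MIN_SHARES is assumed to be 2, a value the thread does not state.

```python
RESOLUTION = 10   # SCHED_LOAD_RESOLUTION when BITS_PER_LONG > 32
MIN_SHARES = 2    # assumed value; not given anywhere in the thread


def scale_load(w):
    return w << RESOLUTION


def split_share(shares, weight, total_weight):
    """Hypothetical proportional split using integer division."""
    return (shares * weight) // total_weight


# Rounding corner case: B uses unscaled weights, A uses scaled weights.
B = split_share(1024, 1, 3)
A = split_share(scale_load(1024), scale_load(1), scale_load(3))
print(A, B * 1024, A - B * 1024)


# Clamp corner case: the floor applied to wl before and after the patch.
def clamp_before_patch(wl):
    return max(wl, MIN_SHARES)


def clamp_after_patch(wl):
    return max(wl, scale_load(MIN_SHARES))


print(clamp_before_patch(5), clamp_after_patch(5))
```

In the first case A is 349525 while B * 1024 is 349184, so the two differ by 341 load units before wake_affine() multiplies them further. In the second case a small scaled value such as 5 passes the old clamp unchanged but is raised to 2048 by the patched one.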
* Re: power increase issue on light load
2011-06-29 6:55 ` Alex,Shi
@ 2011-06-30 0:26 ` Nikhil Rao
2011-06-30 8:38 ` Alex,Shi
0 siblings, 1 reply; 17+ messages in thread
From: Nikhil Rao @ 2011-06-30 0:26 UTC (permalink / raw)
To: Alex,Shi
Cc: Peter Zijlstra, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, Brown, Len
On Tue, Jun 28, 2011 at 11:55 PM, Alex,Shi <alex.shi@intel.com> wrote:
>> > - Would it be possible to get a "perf sched" trace on these two kernels?
>
> I tried 'perf sched record' and then 'perf sched trace' as the usage text
> shows, but in fact 'perf sched' doesn't support the 'trace' command now.
I believe this was renamed to "perf script" in
133dc4c39c57eeef2577ca5b4ed24765b7a78ce2.
> Since 'perf sched record' uses 'perf record -e sched:xxx' to do the
> recording, I used 'perf record' directly. The following info was collected
> over 300 seconds on my NHM-EP for the bltk-office benchmark.
>
> [alexs@lkp-ne01 ~]$ grep -e Events.*sched linux-2.6/perf-report-3.0.0-rc5
> # Events: 11K sched:sched_wakeup
> # Events: 1K sched:sched_wakeup_new
> # Events: 24K sched:sched_switch
> # Events: 3K sched:sched_migrate_task
> # Events: 851 sched:sched_process_free
> # Events: 1K sched:sched_process_exit
> # Events: 1K sched:sched_process_wait
> # Events: 1K sched:sched_process_fork
> # Events: 12K sched:sched_stat_wait
> # Events: 9K sched:sched_stat_sleep
> # Events: 452 sched:sched_stat_iowait
> # Events: 16K sched:sched_stat_runtime
> [alexs@lkp-ne01 ~]$
> [alexs@lkp-ne01 ~]$
> [alexs@lkp-ne01 ~]$ grep -e
> Events.*sched /mnt/linux-2.6.39/perf-report-2.6.39
> # Events: 5K sched:sched_wakeup
> # Events: 615 sched:sched_wakeup_new
> # Events: 11K sched:sched_switch
> # Events: 2K sched:sched_migrate_task
> # Events: 541 sched:sched_process_free
> # Events: 692 sched:sched_process_exit
> # Events: 1K sched:sched_process_wait
> # Events: 615 sched:sched_process_fork
> # Events: 6K sched:sched_stat_wait
> # Events: 4K sched:sched_stat_sleep
> # Events: 178 sched:sched_stat_iowait
> # Events: 9K sched:sched_stat_runtime
>
>
Thanks for the data but these raw counts are not very useful. Can you
please send either the binary file or the ascii trace output for the
two kernels? Also -- a 300s trace might be too much; about 30s should
be sufficient.
-Thanks,
Nikhil
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-06-30 0:07 ` Nikhil Rao
@ 2011-06-30 8:34 ` Alex,Shi
0 siblings, 0 replies; 17+ messages in thread
From: Alex,Shi @ 2011-06-30 8:34 UTC (permalink / raw)
To: Nikhil Rao
Cc: Peter Zijlstra, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, Brown, Len
On Thu, 2011-06-30 at 08:07 +0800, Nikhil Rao wrote:
> On Tue, Jun 28, 2011 at 7:30 PM, Nikhil Rao <ncrao@google.com> wrote:
> > Looking at the schedstat data Alex posted:
> > - Distribution of load balances across cores looks about the same.
> > - Load balancer does more idle balances on 3.0-rc4 as compared to
> > 2.6.39 on SMT and NUMA domains. Busy and newidle balances are a mixed
> > bag.
> > - I see far fewer affine wakeups on 3.0-rc4 as compared to 2.6.39.
> > About half as many affine wakeups on SMT and about a quarter as many
> > on NUMA.
> >
> > I'm investigating the impact of the load resolution patchset on
> > effective load and wake affine calculations. This seems to be the most
> > obvious difference from the schedstat data.
> >
>
> I went through the math in effective load and wake affine and I think
> it should be OK. There are a couple of corner cases where increasing
> sched load resolution can change the result of wake affine -- I've
> listed them below. However, I not convinced you are hitting these
> cases often enough to make a noticeable difference. I'm looking into
> the other LB paths...
>
> - One corner case is because of rounding error in the shares update
> path. Let's say the shares update logic assigned weight A to a sched
> entity in the case with scaled resolution, and it assigned weight B
> without scaling weights. Now, we expect A/1024 = B, but this is not
> always the case because of rounding error. The difference between (A
> and B*1024) gets amplified in wake_affine() since it multiplies
> (weight+effective load) with imbalance pct and cpu power -- we
> effectively scale this up by 5 orders of magnitude. In cases where
> prev_eff_load and this_eff_load are pretty close, this difference can
> result in a different result in wake_affine().
>
> - There's a corner case in effective_load(), where if a task wakes up
> on an empty cfs_rq, you could hit the clamp in effective_load (i.e. <
> MIN_SHARES) which can affect prev_eff_load (you get a lower number --
> making it less likely to do an affine wakeup). I think this patch
> (against 3.0-rc4) will address that issue -- can you please give this
> a try?
I tried disabling CONFIG_FAIR_GROUP_SCHED, and the problem is still there.
So this patch won't have any effect.
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 433491c..6fcfbfc 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1442,8 +1442,8 @@ static long effective_load(struct task_group
> *tg, int cpu, long wl, long wg)
> wl = tg->shares;
>
> /* zero point is MIN_SHARES */
> - if (wl < MIN_SHARES)
> - wl = MIN_SHARES;
> + if (wl < scale_load(MIN_SHARES))
> + wl = scale_load(MIN_SHARES);
> wl -= se->load.weight;
> wg = 0;
> }
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-06-30 0:26 ` Nikhil Rao
@ 2011-06-30 8:38 ` Alex,Shi
0 siblings, 0 replies; 17+ messages in thread
From: Alex,Shi @ 2011-06-30 8:38 UTC (permalink / raw)
To: Nikhil Rao
Cc: Peter Zijlstra, mingo@elte.hu, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org, Brown, Len
On Thu, 2011-06-30 at 08:26 +0800, Nikhil Rao wrote:
> On Tue, Jun 28, 2011 at 11:55 PM, Alex,Shi <alex.shi@intel.com> wrote:
> >> > - Would it be possible to get a "perf sched" trace on these two kernels?
> >
> > I tried the 'perf sched record' and then 'perf sched trace' as usage
> > show. but in fact, the 'perf sched' doesn't support 'trace' command now.
>
> I believe this was renamed to "perf script" in
> 133dc4c39c57eeef2577ca5b4ed24765b7a78ce2.
>
> > since the 'perf sched record' is using 'perf record -e sched:xxx' to do
> > record. I used 'perf record' directly. The follow info collected in 300'
> > on my NHM-EP for benchmark bltk-office.
> >
> > [alexs@lkp-ne01 ~]$ grep -e Events.*sched
> > linux-2.6/perf-report-3.0.0-rc5
> > # Events: 11K sched:sched_wakeup
> > # Events: 1K sched:sched_wakeup_new
> > # Events: 24K sched:sched_switch
> > # Events: 3K sched:sched_migrate_task
> > # Events: 851 sched:sched_process_free
> > # Events: 1K sched:sched_process_exit
> > # Events: 1K sched:sched_process_wait
> > # Events: 1K sched:sched_process_fork
> > # Events: 12K sched:sched_stat_wait
> > # Events: 9K sched:sched_stat_sleep
> > # Events: 452 sched:sched_stat_iowait
> > # Events: 16K sched:sched_stat_runtime
> > [alexs@lkp-ne01 ~]$
> > [alexs@lkp-ne01 ~]$
> > [alexs@lkp-ne01 ~]$ grep -e
> > Events.*sched /mnt/linux-2.6.39/perf-report-2.6.39
> > # Events: 5K sched:sched_wakeup
> > # Events: 615 sched:sched_wakeup_new
> > # Events: 11K sched:sched_switch
> > # Events: 2K sched:sched_migrate_task
> > # Events: 541 sched:sched_process_free
> > # Events: 692 sched:sched_process_exit
> > # Events: 1K sched:sched_process_wait
> > # Events: 615 sched:sched_process_fork
> > # Events: 6K sched:sched_stat_wait
> > # Events: 4K sched:sched_stat_sleep
> > # Events: 178 sched:sched_stat_iowait
> > # Events: 9K sched:sched_stat_runtime
> >
> >
>
> Thanks for the data but these raw counts are not very useful. Can you
> please send either the binary file or the ascii trace output for the
> two kernels? Also -- a 300s trace might be too much; about 30s should
> be sufficient.
The trace output file is very big, and I guess nobody else is interested
in it. I will send it to you separately. :)
>
> -Thanks,
> Nikhil
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-06-23 2:43 power increase issue on light load Alex,Shi
2011-06-23 9:02 ` Peter Zijlstra
@ 2011-07-01 5:44 ` Ming Lei
2011-07-01 18:00 ` Nikhil Rao
1 sibling, 1 reply; 17+ messages in thread
From: Ming Lei @ 2011-07-01 5:44 UTC (permalink / raw)
To: Alex,Shi
Cc: ncrao, peterz, mingo, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org
Hi,
On Thu, Jun 23, 2011 at 10:43 AM, Alex,Shi <alex.shi@intel.com> wrote:
> commit c8b281161dfa4bb5d5be63fb036ce19347b88c63 causes light load
> benchmark use more than 10% system power on platform NHM-EP and laptop
> Thinkpad T410 etc. The benchmarks are specpower and bltk office.
>
> I tried to track this issue, but only find deep C sate time reduced
> much, about from 90% to 30~40%, the C0 or C1 state increase much on
> different machines.
>
> Powertop just hints RES interrupts has a bit more. but when I try "perf
> probe native_smp_send_reschedule". I didn't find much.
I see the problem with -rc5 on my T410 too: the fan runs constantly,
and powertop reports "Wakeups-from-idle per second : 508.6".
thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-07-01 5:44 ` Ming Lei
@ 2011-07-01 18:00 ` Nikhil Rao
2011-07-01 23:51 ` Ming Lei
2011-07-04 0:45 ` Alex,Shi
0 siblings, 2 replies; 17+ messages in thread
From: Nikhil Rao @ 2011-07-01 18:00 UTC (permalink / raw)
To: Ming Lei
Cc: Alex,Shi, peterz, mingo, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org
Hi Ming,
On Thu, Jun 30, 2011 at 10:44 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> Hi,
>
> On Thu, Jun 23, 2011 at 10:43 AM, Alex,Shi <alex.shi@intel.com> wrote:
>> commit c8b281161dfa4bb5d5be63fb036ce19347b88c63 causes light load
>> benchmark use more than 10% system power on platform NHM-EP and laptop
>> Thinkpad T410 etc. The benchmarks are specpower and bltk office.
>>
>> I tried to track this issue, but only find deep C sate time reduced
>> much, about from 90% to 30~40%, the C0 or C1 state increase much on
>> different machines.
>>
>> Powertop just hints RES interrupts has a bit more. but when I try "perf
>> probe native_smp_send_reschedule". I didn't find much.
>
> I see the problem on -rc5 in my T410 too, the fan has been ringing always
> and powertop reports "Wakeups-from-idle per second : 508.6".
>
Thanks for reporting this issue. It looks like the increased resolution
has a negative impact on power savings balance. I'm looking into this
issue based on Alex's data, but if you can provide a trace or something
else that I can work with, it would be really helpful.
-Thanks,
Nikhil
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-07-01 18:00 ` Nikhil Rao
@ 2011-07-01 23:51 ` Ming Lei
2011-07-04 0:45 ` Alex,Shi
1 sibling, 0 replies; 17+ messages in thread
From: Ming Lei @ 2011-07-01 23:51 UTC (permalink / raw)
To: Nikhil Rao
Cc: Alex,Shi, peterz, mingo, Chen, Tim C, Li, Shaohua,
linux-kernel@vger.kernel.org
Hi,
On Sat, Jul 2, 2011 at 2:00 AM, Nikhil Rao <ncrao@google.com> wrote:
> Thanks for reporting this issue. Looks like the increased resolution
> has a negative impact on power savings balance. I'm looking into this
> issue based on Alex's data, but If you can give a trace or something
> that I can work with it will be really helpful.
No problem, if I can do it. Which traces do you need to figure out
the root cause?
thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: power increase issue on light load
2011-07-01 18:00 ` Nikhil Rao
2011-07-01 23:51 ` Ming Lei
@ 2011-07-04 0:45 ` Alex,Shi
1 sibling, 0 replies; 17+ messages in thread
From: Alex,Shi @ 2011-07-04 0:45 UTC (permalink / raw)
To: Nikhil Rao
Cc: Ming Lei, peterz@infradead.org, mingo@elte.hu, Chen, Tim C,
Li, Shaohua, linux-kernel@vger.kernel.org
On Sat, 2011-07-02 at 02:00 +0800, Nikhil Rao wrote:
> Hi Ming,
>
> On Thu, Jun 30, 2011 at 10:44 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> > Hi,
> >
> > On Thu, Jun 23, 2011 at 10:43 AM, Alex,Shi <alex.shi@intel.com> wrote:
> >> commit c8b281161dfa4bb5d5be63fb036ce19347b88c63 causes light load
> >> benchmark use more than 10% system power on platform NHM-EP and laptop
> >> Thinkpad T410 etc. The benchmarks are specpower and bltk office.
> >>
> >> I tried to track this issue, but only find deep C sate time reduced
> >> much, about from 90% to 30~40%, the C0 or C1 state increase much on
> >> different machines.
> >>
> >> Powertop just hints RES interrupts has a bit more. but when I try "perf
> >> probe native_smp_send_reschedule". I didn't find much.
> >
> > I see the problem on -rc5 in my T410 too, the fan has been ringing always
> > and powertop reports "Wakeups-from-idle per second : 508.6".
> >
>
> Thanks for reporting this issue. Looks like the increased resolution
> has a negative impact on power savings balance. I'm looking into this
> issue based on Alex's data, but If you can give a trace or something
> that I can work with it will be really helpful.
Oops, I wasn't aware until this e-mail that my Evolution client had not
sent the trace output to you. I have just resent it, and it should be OK
this time. Please check it and give me a response.
Thanks!
>
> -Thanks,
> Nikhil
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2011-06-23 2:43 power increase issue on light load Alex,Shi
2011-06-23 9:02 ` Peter Zijlstra
2011-06-24 0:41 ` Alex,Shi
2011-06-28 0:02 ` Alex,Shi
2011-06-28 14:59 ` Peter Zijlstra
2011-06-28 17:13 ` Nikhil Rao
2011-06-29 2:30 ` Nikhil Rao
2011-06-29 3:22 ` Alex,Shi
2011-06-29 6:55 ` Alex,Shi
2011-06-30 0:26 ` Nikhil Rao
2011-06-30 8:38 ` Alex,Shi
2011-06-30 0:07 ` Nikhil Rao
2011-06-30 8:34 ` Alex,Shi
2011-07-01 5:44 ` Ming Lei
2011-07-01 18:00 ` Nikhil Rao
2011-07-01 23:51 ` Ming Lei
2011-07-04 0:45 ` Alex,Shi