* OOPS: 2.6.16-rc6 cpufreq_conservative
@ 2006-03-18 20:25 Parag Warudkar
2006-03-18 21:40 ` Linus Torvalds
0 siblings, 1 reply; 31+ messages in thread
From: Parag Warudkar @ 2006-03-18 20:25 UTC (permalink / raw)
To: linux-kernel, alex-kernel, jun.nakajima, Linus Torvalds
I can reproduce the below OOPS by doing
$ modprobe cpufreq_conservative
$ echo conservative > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo conservative > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
Which brings up a question - Do we really support difference scaling governors
for different cpu cores?
This is a Centrino Duo Laptop.
Unable to handle kernel NULL pointer dereference at virtual address 0000001c
printing eip:
f834e6d0
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: cpufreq_conservative oprofile ntfs snd_pcm_oss
snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device eth1394
ohci1394 i2c_i801 i2c_core hw_random ipw3945 ieee80211 ieee80211_crypt
firmware_class snd_hda_intel snd_hda_codec snd_pcm snd_timer snd
snd_page_alloc i915 drm cpufreq_ondemand speedstep_centrino b44
CPU: 1
EIP: 0060:[<f834e6d0>] Not tainted VLI
EFLAGS: 00010206 (2.6.16-rc6 #3)
EIP is at dbs_check_cpu+0x220/0x400 [cpufreq_conservative]
eax: 00000000 ebx: 00000001 ecx: 00000001 edx: 7840d1bc
esi: 78445730 edi: 00000002 ebp: 78445730 esp: 7a1c1ef0
ds: 007b es: 007b ss: 0068
Process events/1 (pid: 9, threadinfo=7a1c0000 task=7a33fa90)
Stack: <0>7840d1bc 00000002 00000001 7a2d3a80 00000000 f834f704 7a1aa1c0
00000000
f834e938 00000000 00000202 7a1c0000 f834f700 78132869 00000000 00000000
18a60e00 7a1aa1d0 7a1aa1e8 f834e8b0 00000202 7a1c0000 7a1aa1d8 7a1aa1d0
Call Trace:
[<f834e938>] do_dbs_timer+0x88/0xc0 [cpufreq_conservative]
[<78132869>] run_workqueue+0x79/0xf0
[<f834e8b0>] do_dbs_timer+0x0/0xc0 [cpufreq_conservative]
[<78132a38>] worker_thread+0x158/0x180
[<7811b0d0>] default_wake_function+0x0/0x20
[<7811b0d0>] default_wake_function+0x0/0x20
[<781328e0>] worker_thread+0x0/0x180
[<7813664c>] kthread+0xbc/0x100
[<78136590>] kthread+0x0/0x100
[<78101285>] kernel_thread_helper+0x5/0x10
Code: 0f bc c0 83 f8 03 bb 02 00 00 00 0f 4c d8 83 fb 01 77 49 89 ee 8d b6 00
00 00 00 8b 04 9d 04 80 44 78 bf 02 00 00 00 01 f0 8b 00 <8b> 40 1c 89 7c 24
04 c7 04 24 bc d1 40 78 89 04 9d 48 fa 34 f8
^ permalink raw reply [flat|nested] 31+ messages in thread* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-18 20:25 OOPS: 2.6.16-rc6 cpufreq_conservative Parag Warudkar @ 2006-03-18 21:40 ` Linus Torvalds 2006-03-18 22:09 ` Parag Warudkar ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-18 21:40 UTC (permalink / raw) To: Parag Warudkar Cc: Linux Kernel Mailing List, alex-kernel, jun.nakajima, Dave Jones Gaah. Looking at the assembly code to try to figure out where the oops happened, one thing is clear: "for_each_cpu_mask()" generates some truly horrible code. C code that looks simple ends up being tons of complex assembly language due to inline functions etc. We've made it way too easy for people to write code that just absolutely _sucks_. It's extra sad, because disassembling Parag's oops, it looks like he has limited NR_CPU's to 2 (appropriate for his setup), yet we literally do things like: <code+0>: bsf %eax,%eax <code+3>: cmp $0x3,%eax <code+6>: mov $0x2,%ebx <code+11>: cmovl %eax,%ebx <code+14>: cmp $0x1,%ebx <code+17>: ja 0x80495fc <code+19>: mov %ebp,%esi <code+21>: lea 0x0(%esi),%esi <code+27>: mov 0x78448004(,%ebx,4),%eax <code+34>: mov $0x2,%edi <code+39>: add %esi,%eax ** <code+41>: mov (%eax),%eax ** oops here ** <code+43>: mov 0x1c(%eax),%eax <code+46>: mov %edi,0x4(%esp) <code+50>: movl $0x7840d1bc,(%esp) <code+57>: mov %eax,0x???(,%ebx,4) where 90% of that code seems to be just cruft from __next_cpu(): the "bsf", the comparisons against 3 (n+1), the $0x2 setup for calling find_next_bit().. Anyway, I _think_ it's this one: for_each_online_cpu(j) { dbs_info = &per_cpu(cpu_dbs_info, j); requested_freq[j] = dbs_info->cur_policy->cur; } where dbs_info->cur_policy seems to be NULL. Maybe. Exactly because 90% of the instructions come from that generic stuff around it, it's hard to be sure, and I don't have the same compiler version Parag does. Linus On Sat, 18 Mar 2006, Parag Warudkar wrote: > > Unable to handle kernel NULL pointer dereference at virtual address 0000001c > printing eip: > f834e6d0 > *pde = 00000000 > Oops: 0000 [#1] > PREEMPT SMP > Modules linked in: cpufreq_conservative oprofile ntfs snd_pcm_oss > snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device eth1394 > ohci1394 i2c_i801 i2c_core hw_random ipw3945 ieee80211 ieee80211_crypt > firmware_class snd_hda_intel snd_hda_codec snd_pcm snd_timer snd > snd_page_alloc i915 drm cpufreq_ondemand speedstep_centrino b44 > CPU: 1 > EIP: 0060:[<f834e6d0>] Not tainted VLI > EFLAGS: 00010206 (2.6.16-rc6 #3) > EIP is at dbs_check_cpu+0x220/0x400 [cpufreq_conservative] > eax: 00000000 ebx: 00000001 ecx: 00000001 edx: 7840d1bc > esi: 78445730 edi: 00000002 ebp: 78445730 esp: 7a1c1ef0 > ds: 007b es: 007b ss: 0068 > Process events/1 (pid: 9, threadinfo=7a1c0000 task=7a33fa90) > Stack: <0>7840d1bc 00000002 00000001 7a2d3a80 00000000 f834f704 7a1aa1c0 > 00000000 > f834e938 00000000 00000202 7a1c0000 f834f700 78132869 00000000 00000000 > 18a60e00 7a1aa1d0 7a1aa1e8 f834e8b0 00000202 7a1c0000 7a1aa1d8 7a1aa1d0 > Call Trace: > [<f834e938>] do_dbs_timer+0x88/0xc0 [cpufreq_conservative] > [<78132869>] run_workqueue+0x79/0xf0 > [<f834e8b0>] do_dbs_timer+0x0/0xc0 [cpufreq_conservative] > [<78132a38>] worker_thread+0x158/0x180 > [<7811b0d0>] default_wake_function+0x0/0x20 > [<7811b0d0>] default_wake_function+0x0/0x20 > [<781328e0>] worker_thread+0x0/0x180 > [<7813664c>] kthread+0xbc/0x100 > [<78136590>] kthread+0x0/0x100 > [<78101285>] kernel_thread_helper+0x5/0x10 > Code: 0f bc c0 83 f8 03 bb 02 00 00 00 0f 4c d8 83 fb 01 77 49 89 ee 8d b6 00 > 00 00 00 8b 04 9d 04 80 44 78 bf 02 00 00 00 01 f0 8b 00 <8b> 40 1c 89 7c 24 > 04 c7 04 24 bc d1 40 78 89 04 9d 48 fa 34 f8 > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-18 21:40 ` Linus Torvalds @ 2006-03-18 22:09 ` Parag Warudkar 2006-03-18 23:12 ` Linus Torvalds 2006-03-18 22:26 ` Parag Warudkar 2006-03-19 0:53 ` Andrew Morton 2 siblings, 1 reply; 31+ messages in thread From: Parag Warudkar @ 2006-03-18 22:09 UTC (permalink / raw) To: Linus Torvalds Cc: Linux Kernel Mailing List, alex-kernel, jun.nakajima, Dave Jones On Saturday 18 March 2006 16:40, Linus Torvalds wrote: > Anyway, I _think_ it's this one: > > for_each_online_cpu(j) { > dbs_info = &per_cpu(cpu_dbs_info, j); > requested_freq[j] = dbs_info->cur_policy->cur; > } Here is the disassembly of cpufrequency_conservative.o:dbs_check_cpu() from my setup, if it is of any help. 000004b0 <dbs_check_cpu>: 4b0: 55 push %ebp 4b1: bd 00 00 00 00 mov $0x0,%ebp 4b6: 89 e8 mov %ebp,%eax 4b8: 57 push %edi 4b9: 56 push %esi 4ba: 53 push %ebx 4bb: 83 ec 10 sub $0x10,%esp 4be: 8b 54 24 24 mov 0x24(%esp),%edx 4c2: 8b 0c 95 00 00 00 00 mov 0x0(,%edx,4),%ecx 4c9: 01 c8 add %ecx,%eax 4cb: 8b 50 0c mov 0xc(%eax),%edx 4ce: 85 d2 test %edx,%edx 4d0: 0f 84 ba 01 00 00 je 690 <dbs_check_cpu+0x1e0> 4d6: 66 83 3d 10 00 00 00 cmpw $0x0,0x10 4dd: 00 4de: 8b 00 mov (%eax),%eax 4e0: 89 44 24 0c mov %eax,0xc(%esp) 4e4: 0f 84 ae 01 00 00 je 698 <dbs_check_cpu+0x1e8> 4ea: 8b 4c 24 0c mov 0xc(%esp),%ecx 4ee: bf ff ff ff ff mov $0xffffffff,%edi 4f3: 8b 01 mov (%ecx),%eax 4f5: 85 c0 test %eax,%eax 4f7: 0f 85 23 02 00 00 jne 720 <dbs_check_cpu+0x270> 4fd: b8 20 00 00 00 mov $0x20,%eax 502: eb 52 jmp 556 <dbs_check_cpu+0xa6> 504: 8b 14 9d 00 00 00 00 mov 0x0(,%ebx,4),%edx 50b: 89 e8 mov %ebp,%eax 50d: 8b 0d 60 00 00 00 mov 0x60,%ecx 513: 8d 34 02 lea (%edx,%eax,1),%esi 516: b8 00 00 00 00 mov $0x0,%eax 51b: 8d 04 02 lea (%edx,%eax,1),%eax 51e: 8b 50 30 mov 0x30(%eax),%edx 521: 03 50 28 add 0x28(%eax),%edx 524: 85 c9 test %ecx,%ecx 526: 74 03 je 52b <dbs_check_cpu+0x7b> 528: 03 50 08 add 0x8(%eax),%edx 52b: 8b 4e 04 mov 0x4(%esi),%ecx 52e: 89 d0 mov %edx,%eax 530: 89 56 04 mov %edx,0x4(%esi) 533: 29 c8 sub %ecx,%eax 535: 39 f8 cmp %edi,%eax 537: 0f 42 f8 cmovb %eax,%edi 53a: 8d 43 01 lea 0x1(%ebx),%eax 53d: 89 44 24 08 mov %eax,0x8(%esp) 541: b8 02 00 00 00 mov $0x2,%eax 546: 89 44 24 04 mov %eax,0x4(%esp) 54a: 8b 44 24 0c mov 0xc(%esp),%eax 54e: 89 04 24 mov %eax,(%esp) 551: e8 fc ff ff ff call 552 <dbs_check_cpu+0xa2> 556: 83 f8 03 cmp $0x3,%eax 559: bb 02 00 00 00 mov $0x2,%ebx 55e: 0f 4c d8 cmovl %eax,%ebx 561: 83 fb 01 cmp $0x1,%ebx 564: 76 9e jbe 504 <dbs_check_cpu+0x54> 566: 8d 04 bf lea (%edi,%edi,4),%eax 569: 8b 35 50 00 00 00 mov 0x50,%esi 56f: b9 64 00 00 00 mov $0x64,%ecx 574: 8d 04 80 lea (%eax,%eax,4),%eax 577: ba fe ff ff 7f mov $0x7ffffffe,%edx 57c: 8d 1c 85 00 00 00 00 lea 0x0(,%eax,4),%ebx 583: a1 58 00 00 00 mov 0x58,%eax 588: 29 c1 sub %eax,%ecx 58a: 81 fe c0 e0 ff ff cmp $0xffffe0c0,%esi 590: 77 10 ja 5a2 <dbs_check_cpu+0xf2> 592: 8d 96 9f 0f 00 00 lea 0xf9f(%esi),%edx 598: b8 d3 4d 62 10 mov $0x10624dd3,%eax 59d: f7 e2 mul %edx 59f: c1 ea 08 shr $0x8,%edx 5a2: 0f af ca imul %edx,%ecx 5a5: 39 cb cmp %ecx,%ebx 5a7: 0f 83 7b 01 00 00 jae 728 <dbs_check_cpu+0x278> 5ad: 8b 54 24 24 mov 0x24(%esp),%edx 5b1: 31 c0 xor %eax,%eax 5b3: 8b 4c 24 0c mov 0xc(%esp),%ecx 5b7: 89 04 95 00 00 00 00 mov %eax,0x0(,%edx,4) 5be: 8b 01 mov (%ecx),%eax 5c0: 85 c0 test %eax,%eax 5c2: 0f 84 ca 02 00 00 je 892 <dbs_check_cpu+0x3e2> 5c8: 0f bc c0 bsf %eax,%eax 5cb: 83 f8 03 cmp $0x3,%eax 5ce: bb 02 00 00 00 mov $0x2,%ebx 5d3: 0f 4c d8 cmovl %eax,%ebx 5d6: 83 fb 01 cmp $0x1,%ebx 5d9: 77 40 ja 61b <dbs_check_cpu+0x16b> 5db: 89 ee mov %ebp,%esi 5dd: 8d 76 00 lea 0x0(%esi),%esi 5e0: 8b 14 9d 00 00 00 00 mov 0x0(,%ebx,4),%edx 5e7: 01 f2 add %esi,%edx 5e9: 8b 42 04 mov 0x4(%edx),%eax 5ec: 89 42 08 mov %eax,0x8(%edx) 5ef: 8d 43 01 lea 0x1(%ebx),%eax 5f2: bb 02 00 00 00 mov $0x2,%ebx 5f7: 89 44 24 08 mov %eax,0x8(%esp) 5fb: b8 02 00 00 00 mov $0x2,%eax 600: 89 44 24 04 mov %eax,0x4(%esp) 604: 8b 44 24 0c mov 0xc(%esp),%eax 608: 89 04 24 mov %eax,(%esp) 60b: e8 fc ff ff ff call 60c <dbs_check_cpu+0x15c> 610: 83 f8 03 cmp $0x3,%eax 613: 0f 4c d8 cmovl %eax,%ebx 616: 83 fb 01 cmp $0x1,%ebx 619: 76 c5 jbe 5e0 <dbs_check_cpu+0x130> 61b: 8b 54 24 24 mov 0x24(%esp),%edx 61f: 8b 0c 95 08 00 00 00 mov 0x8(,%edx,4),%ecx 626: 8b 54 24 0c mov 0xc(%esp),%edx 62a: 8b 42 18 mov 0x18(%edx),%eax 62d: 39 c1 cmp %eax,%ecx 62f: 74 5f je 690 <dbs_check_cpu+0x1e0> 631: 8b 15 64 00 00 00 mov 0x64,%edx 637: 0f af d0 imul %eax,%edx 63a: b8 1f 85 eb 51 mov $0x51eb851f,%eax 63f: f7 e2 mul %edx 641: b8 05 00 00 00 mov $0x5,%eax 646: c1 ea 05 shr $0x5,%edx 649: 0f 44 d0 cmove %eax,%edx 64c: 8d 04 11 lea (%ecx,%edx,1),%eax 64f: 8b 4c 24 24 mov 0x24(%esp),%ecx 653: 89 04 8d 08 00 00 00 mov %eax,0x8(,%ecx,4) 65a: 8b 4c 24 0c mov 0xc(%esp),%ecx 65e: 8b 51 18 mov 0x18(%ecx),%edx 661: 39 d0 cmp %edx,%eax 663: 76 0d jbe 672 <dbs_check_cpu+0x1c2> 665: 89 d0 mov %edx,%eax 667: 8b 54 24 24 mov 0x24(%esp),%edx 66b: 89 04 95 08 00 00 00 mov %eax,0x8(,%edx,4) 672: 89 44 24 04 mov %eax,0x4(%esp) 676: 8b 4c 24 0c mov 0xc(%esp),%ecx 67a: ba 01 00 00 00 mov $0x1,%edx 67f: 89 54 24 08 mov %edx,0x8(%esp) 683: 89 0c 24 mov %ecx,(%esp) 686: e8 fc ff ff ff call 687 <dbs_check_cpu+0x1d7> 68b: 90 nop 68c: 8d 74 26 00 lea 0x0(%esi),%esi 690: 83 c4 10 add $0x10,%esp 693: 5b pop %ebx 694: 5e pop %esi 695: 5f pop %edi 696: 5d pop %ebp 697: c3 ret 698: a1 00 00 00 00 mov 0x0,%eax 69d: 85 c0 test %eax,%eax 69f: 0f 84 e3 01 00 00 je 888 <dbs_check_cpu+0x3d8> 6a5: 0f bc c0 bsf %eax,%eax 6a8: 83 f8 03 cmp $0x3,%eax 6ab: bb 02 00 00 00 mov $0x2,%ebx 6b0: 0f 4c d8 cmovl %eax,%ebx 6b3: 83 fb 01 cmp $0x1,%ebx 6b6: 77 49 ja 701 <dbs_check_cpu+0x251> 6b8: 89 ee mov %ebp,%esi 6ba: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 6c0: 8b 04 9d 00 00 00 00 mov 0x0(,%ebx,4),%eax 6c7: bf 02 00 00 00 mov $0x2,%edi 6cc: 01 f0 add %esi,%eax 6ce: 8b 00 mov (%eax),%eax 6d0: 8b 40 1c mov 0x1c(%eax),%eax 6d3: 89 7c 24 04 mov %edi,0x4(%esp) 6d7: c7 04 24 00 00 00 00 movl $0x0,(%esp) 6de: 89 04 9d 08 00 00 00 mov %eax,0x8(,%ebx,4) 6e5: 8d 43 01 lea 0x1(%ebx),%eax 6e8: bb 02 00 00 00 mov $0x2,%ebx 6ed: 89 44 24 08 mov %eax,0x8(%esp) 6f1: e8 fc ff ff ff call 6f2 <dbs_check_cpu+0x242> 6f6: 83 f8 03 cmp $0x3,%eax 6f9: 0f 4c d8 cmovl %eax,%ebx 6fc: 83 fb 01 cmp $0x1,%ebx 6ff: 76 bf jbe 6c0 <dbs_check_cpu+0x210> 701: 8b 4c 24 0c mov 0xc(%esp),%ecx 705: bb 01 00 00 00 mov $0x1,%ebx 70a: bf ff ff ff ff mov $0xffffffff,%edi 70f: 66 89 1d 10 00 00 00 mov %bx,0x10 716: 8b 01 mov (%ecx),%eax 718: 85 c0 test %eax,%eax 71a: 0f 84 dd fd ff ff je 4fd <dbs_check_cpu+0x4d> 720: 0f bc c0 bsf %eax,%eax 723: e9 2e fe ff ff jmp 556 <dbs_check_cpu+0xa6> 728: 8b 54 24 24 mov 0x24(%esp),%edx 72c: 8b 04 95 00 00 00 00 mov 0x0(,%edx,4),%eax 733: 40 inc %eax 734: 89 04 95 00 00 00 00 mov %eax,0x0(,%edx,4) 73b: 8b 15 54 00 00 00 mov 0x54,%edx 741: 39 d0 cmp %edx,%eax 743: 0f 82 47 ff ff ff jb 690 <dbs_check_cpu+0x1e0> 749: 8b 4c 24 0c mov 0xc(%esp),%ecx 74d: bf ff ff ff ff mov $0xffffffff,%edi 752: 8b 01 mov (%ecx),%eax 754: 85 c0 test %eax,%eax 756: 0f 85 40 01 00 00 jne 89c <dbs_check_cpu+0x3ec> 75c: b8 20 00 00 00 mov $0x20,%eax 761: 83 f8 03 cmp $0x3,%eax 764: bb 02 00 00 00 mov $0x2,%ebx 769: 0f 4c d8 cmovl %eax,%ebx 76c: 83 fb 01 cmp $0x1,%ebx 76f: 77 64 ja 7d5 <dbs_check_cpu+0x325> 771: 89 ee mov %ebp,%esi 773: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 779: 8d bc 27 00 00 00 00 lea 0x0(%edi),%edi 780: 8b 04 9d 00 00 00 00 mov 0x0(,%ebx,4),%eax 787: 01 f0 add %esi,%eax 789: 8b 50 04 mov 0x4(%eax),%edx 78c: 8b 68 08 mov 0x8(%eax),%ebp 78f: 89 d1 mov %edx,%ecx 791: 29 e9 sub %ebp,%ecx 793: 89 50 08 mov %edx,0x8(%eax) 796: 8d 43 01 lea 0x1(%ebx),%eax 799: 39 f9 cmp %edi,%ecx 79b: 89 44 24 08 mov %eax,0x8(%esp) 79f: 8b 44 24 0c mov 0xc(%esp),%eax 7a3: bd 02 00 00 00 mov $0x2,%ebp 7a8: 0f 42 f9 cmovb %ecx,%edi 7ab: 89 6c 24 04 mov %ebp,0x4(%esp) 7af: 89 04 24 mov %eax,(%esp) 7b2: e8 fc ff ff ff call 7b3 <dbs_check_cpu+0x303> 7b7: 83 f8 03 cmp $0x3,%eax 7ba: ba 02 00 00 00 mov $0x2,%edx 7bf: 0f 4d c2 cmovge %edx,%eax 7c2: 83 f8 01 cmp $0x1,%eax 7c5: 89 c3 mov %eax,%ebx 7c7: 76 b7 jbe 780 <dbs_check_cpu+0x2d0> 7c9: 8b 35 50 00 00 00 mov 0x50,%esi 7cf: 8b 15 54 00 00 00 mov 0x54,%edx 7d5: 8d 04 bf lea (%edi,%edi,4),%eax 7d8: 8b 4c 24 24 mov 0x24(%esp),%ecx 7dc: 8d 04 80 lea (%eax,%eax,4),%eax 7df: 8d 1c 85 00 00 00 00 lea 0x0(,%eax,4),%ebx 7e6: 89 f0 mov %esi,%eax 7e8: 0f af c2 imul %edx,%eax 7eb: 8b 35 5c 00 00 00 mov 0x5c,%esi 7f1: 31 ff xor %edi,%edi 7f3: ba fe ff ff 7f mov $0x7ffffffe,%edx 7f8: 89 3c 8d 00 00 00 00 mov %edi,0x0(,%ecx,4) 7ff: b9 64 00 00 00 mov $0x64,%ecx 804: 29 f1 sub %esi,%ecx 806: 3d c0 e0 ff ff cmp $0xffffe0c0,%eax 80b: 77 10 ja 81d <dbs_check_cpu+0x36d> 80d: 8d 90 9f 0f 00 00 lea 0xf9f(%eax),%edx 813: b8 d3 4d 62 10 mov $0x10624dd3,%eax 818: f7 e2 mul %edx 81a: c1 ea 08 shr $0x8,%edx 81d: 0f af ca imul %edx,%ecx 820: 39 cb cmp %ecx,%ebx 822: 0f 86 68 fe ff ff jbe 690 <dbs_check_cpu+0x1e0> 828: 8b 44 24 24 mov 0x24(%esp),%eax 82c: 8b 54 24 0c mov 0xc(%esp),%edx 830: 8b 0c 85 08 00 00 00 mov 0x8(,%eax,4),%ecx 837: 3b 4a 14 cmp 0x14(%edx),%ecx 83a: 0f 84 50 fe ff ff je 690 <dbs_check_cpu+0x1e0> 840: a1 64 00 00 00 mov 0x64,%eax 845: 85 c0 test %eax,%eax 847: 0f 84 43 fe ff ff je 690 <dbs_check_cpu+0x1e0> 84d: 8b 5a 18 mov 0x18(%edx),%ebx 850: ba 1f 85 eb 51 mov $0x51eb851f,%edx 855: 0f af c3 imul %ebx,%eax 858: f7 e2 mul %edx 85a: b8 05 00 00 00 mov $0x5,%eax 85f: c1 ea 05 shr $0x5,%edx 862: 0f 44 d0 cmove %eax,%edx 865: 89 c8 mov %ecx,%eax 867: 8b 4c 24 24 mov 0x24(%esp),%ecx 86b: 29 d0 sub %edx,%eax 86d: 89 04 8d 08 00 00 00 mov %eax,0x8(,%ecx,4) 874: 8b 4c 24 0c mov 0xc(%esp),%ecx 878: 8b 51 14 mov 0x14(%ecx),%edx 87b: 39 d0 cmp %edx,%eax 87d: 0f 83 ef fd ff ff jae 672 <dbs_check_cpu+0x1c2> 883: e9 dd fd ff ff jmp 665 <dbs_check_cpu+0x1b5> 888: b8 20 00 00 00 mov $0x20,%eax 88d: e9 16 fe ff ff jmp 6a8 <dbs_check_cpu+0x1f8> 892: b8 20 00 00 00 mov $0x20,%eax 897: e9 2f fd ff ff jmp 5cb <dbs_check_cpu+0x11b> 89c: 0f bc c0 bsf %eax,%eax 89f: e9 bd fe ff ff jmp 761 <dbs_check_cpu+0x2b1> 8a4: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 8aa: 8d bf 00 00 00 00 lea 0x0(%edi),%edi ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-18 22:09 ` Parag Warudkar @ 2006-03-18 23:12 ` Linus Torvalds 0 siblings, 0 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-18 23:12 UTC (permalink / raw) To: Parag Warudkar Cc: Linux Kernel Mailing List, alex-kernel, jun.nakajima, Dave Jones On Sat, 18 Mar 2006, Parag Warudkar wrote: > > Here is the disassembly of cpufrequency_conservative.o:dbs_check_cpu() from my > setup, if it is of any help. Yup. Just confirming that this seems to be what I was pointing to. It's the first "for_each_online_cpu()" loop: the reason it's pretty far down in the function disassembly is that the early "init_flag == 0" test jumps off into the latter part of the thing, even though the code is early. That's just gcc re-ordering stuff. > 000004b0 <dbs_check_cpu>: > 4b0: 55 push %ebp > 4b1: bd 00 00 00 00 mov $0x0,%ebp > 4b6: 89 e8 mov %ebp,%eax > 4b8: 57 push %edi > 4b9: 56 push %esi > 4ba: 53 push %ebx > 4bb: 83 ec 10 sub $0x10,%esp > 4be: 8b 54 24 24 mov 0x24(%esp),%edx > 4c2: 8b 0c 95 00 00 00 00 mov 0x0(,%edx,4),%ecx > 4c9: 01 c8 add %ecx,%eax > 4cb: 8b 50 0c mov 0xc(%eax),%edx > 4ce: 85 d2 test %edx,%edx > 4d0: 0f 84 ba 01 00 00 je 690 <dbs_check_cpu+0x1e0> This is the "init_flag == 0" test: > 4d6: 66 83 3d 10 00 00 00 cmpw $0x0,0x10 > 4dd: 00 > 4de: 8b 00 mov (%eax),%eax > 4e0: 89 44 24 0c mov %eax,0xc(%esp) > 4e4: 0f 84 ae 01 00 00 je 698 <dbs_check_cpu+0x1e8> ....... This is for_each_online_cpu(j) { > 698: a1 00 00 00 00 mov 0x0,%eax > 69d: 85 c0 test %eax,%eax > 69f: 0f 84 e3 01 00 00 je 888 <dbs_check_cpu+0x3d8> > 6a5: 0f bc c0 bsf %eax,%eax > 6a8: 83 f8 03 cmp $0x3,%eax > 6ab: bb 02 00 00 00 mov $0x2,%ebx > 6b0: 0f 4c d8 cmovl %eax,%ebx > 6b3: 83 fb 01 cmp $0x1,%ebx > 6b6: 77 49 ja 701 <dbs_check_cpu+0x251> > 6b8: 89 ee mov %ebp,%esi > 6ba: 8d b6 00 00 00 00 lea 0x0(%esi),%esi dbs_info = &per_cpu(cpu_dbs_info, j); > 6c0: 8b 04 9d 00 00 00 00 mov 0x0(,%ebx,4),%eax > 6c7: bf 02 00 00 00 mov $0x2,%edi > 6cc: 01 f0 add %esi,%eax > 6ce: 8b 00 mov (%eax),%eax And this is the oopsing instruction: > 6d0: 8b 40 1c mov 0x1c(%eax),%eax > 6d3: 89 7c 24 04 mov %edi,0x4(%esp) > 6d7: c7 04 24 00 00 00 00 movl $0x0,(%esp) And this is the "requested_freq[j]" assignment: > 6de: 89 04 9d 08 00 00 00 mov %eax,0x8(,%ebx,4) > 6e5: 8d 43 01 lea 0x1(%ebx),%eax > 6e8: bb 02 00 00 00 mov $0x2,%ebx > 6ed: 89 44 24 08 mov %eax,0x8(%esp) This is really a "call find_next_bit" > 6f1: e8 fc ff ff ff call 6f2 <dbs_check_cpu+0x242> > 6f6: 83 f8 03 cmp $0x3,%eax > 6f9: 0f 4c d8 cmovl %eax,%ebx > 6fc: 83 fb 01 cmp $0x1,%ebx > 6ff: 76 bf jbe 6c0 <dbs_check_cpu+0x210> And that wass the end of that loop: > 701: 8b 4c 24 0c mov 0xc(%esp),%ecx > 705: bb 01 00 00 00 mov $0x1,%ebx > 70a: bf ff ff ff ff mov $0xffffffff,%edi > 70f: 66 89 1d 10 00 00 00 mov %bx,0x10 And that was setting "init_flag = 1" Horrid, horrid assembly language from that simple "for_each_online_cpu(j)" loop. Oh, well. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-18 21:40 ` Linus Torvalds 2006-03-18 22:09 ` Parag Warudkar @ 2006-03-18 22:26 ` Parag Warudkar 2006-03-19 0:53 ` Andrew Morton 2 siblings, 0 replies; 31+ messages in thread From: Parag Warudkar @ 2006-03-18 22:26 UTC (permalink / raw) To: Linus Torvalds Cc: Linux Kernel Mailing List, alex-kernel, jun.nakajima, Dave Jones On Saturday 18 March 2006 16:40, Linus Torvalds wrote: > Anyway, I _think_ it's this one: > > for_each_online_cpu(j) { > dbs_info = &per_cpu(cpu_dbs_info, j); > requested_freq[j] = dbs_info->cur_policy->cur; > } > > where dbs_info->cur_policy seems to be NULL. Maybe. That was right on target. I just put a check in the code which confirms that for some reason cur_policy for cpu0 is NULL. Parag ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-18 21:40 ` Linus Torvalds 2006-03-18 22:09 ` Parag Warudkar 2006-03-18 22:26 ` Parag Warudkar @ 2006-03-19 0:53 ` Andrew Morton 2006-03-19 2:38 ` Linus Torvalds 2006-03-19 6:34 ` Parag Warudkar 2 siblings, 2 replies; 31+ messages in thread From: Andrew Morton @ 2006-03-19 0:53 UTC (permalink / raw) To: Linus Torvalds Cc: kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej Linus Torvalds <torvalds@osdl.org> wrote: > > Gaah. Looking at the assembly code to try to figure out where the oops > happened, one thing is clear: "for_each_cpu_mask()" generates some truly > horrible code. C code that looks simple ends up being tons of complex > assembly language due to inline functions etc. > > We've made it way too easy for people to write code that just absolutely > _sucks_. Yes, uninlining merely first_cpu() shrinks my usual vmlinux by 2.5k. I'll fix it up. Meanwhile, I suppose we need to check that pointer for NULL as Parag suggests. I might stick a once-off WARN_ON() in there so someone gets in and works out why we keep on having to graft mysterious null-pointer avoidances into cpufreq. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 0:53 ` Andrew Morton @ 2006-03-19 2:38 ` Linus Torvalds 2006-03-19 5:08 ` Paul Jackson 2006-03-19 18:46 ` Linus Torvalds 2006-03-19 6:34 ` Parag Warudkar 1 sibling, 2 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 2:38 UTC (permalink / raw) To: Andrew Morton Cc: kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sat, 18 Mar 2006, Andrew Morton wrote: > > Yes, uninlining merely first_cpu() shrinks my usual vmlinux by 2.5k. I'll > fix it up. The thing is, we could do _so_ much better if we just fixed the "calling convention" for that loop. In particular, if we instead of for_each_cpu_mask(cpu, mask) .. do something with cpu .. had for_each_cpu_mask(cpu, mask) { .. do something with cpu .. } end_for_each_online_cpu; we could do #if NR_CPUS <= BITS_IN_LONG #define for_each_cpu_mask(cpu, mask) \ { unsigned long __cpubits = (mask)->bits[0]; \ for (cpu = 0; __cpubits; cpu++, cpubits >>= 1) { \ if (!(__cpubits & 1)) \ continue; #define end_for_each_cpu_mask } } then the code we'd generate would be (a) tons smaller (b) lots faster than what we do now. No uninlining necessary, because we'd simply not have any complex code to either inline or to call. The reason the code sucks right now is that the interface is bad, and causes us to do insane things to make it work at all with that syntax. (We already have this kind of thing for the "do_each_thread / while_each_thread" interface to make it work sanely. Also, anybody who has ever played with "sparse" has seen the wonderful and flexible FOR_EACH_PTR() .. END_FOR_EACH_PTR() interface that allows for efficient pointer list traversal). Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 2:38 ` Linus Torvalds @ 2006-03-19 5:08 ` Paul Jackson 2006-03-19 17:43 ` Linus Torvalds 2006-03-19 18:46 ` Linus Torvalds 1 sibling, 1 reply; 31+ messages in thread From: Paul Jackson @ 2006-03-19 5:08 UTC (permalink / raw) To: Linus Torvalds Cc: akpm, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej If find_first_bit and find_next_bit suck so bad, then why just fix their use in the for_each_cpu_mask(). How about the other uses? Such as the other 50 odd places in the kernel that call them directly, such as: $ grep ' = find_[a-z]*_bit' lib/bitmap.c rbot = cur = find_first_bit(maskp, nmaskbits); cur = find_next_bit(maskp, nmaskbits, cur+1); i = find_first_bit(buf, bits); i = find_next_bit(buf, bits, i + 1); for (i = find_first_bit(buf, bits); i = find_next_bit(buf, bits, i + 1)) for (oldbit = find_first_bit(src, bits); oldbit = find_next_bit(src, bits, oldbit + 1)) { Perhaps the common interface to these find_*_bit routines should be out of line, with perhaps just a couple of key calls from the scheduler using the inline form. And if we fix the cpu loop to the API Linus suggests, we should do the same with the node loops, such as used in: $ grep for_each.*_node mm/slab.c for_each_node(i) { for_each_node(i) for_each_online_node(i) { for_each_online_node(node) { for_each_online_node(node) { for_each_online_node(node) { for_each_online_node(node) { for_each_online_node(i) { for_each_online_node(i) { for_each_online_node(node) { for_each_online_node(node) { for_each_online_node(node) { for_each_online_node(node) { And for those of us with too many CPUs, how about something like (totally untested and probably totally bogus): #if NR_CPUS <= BITS_IN_LONG ... as per Linus, shifting a mask right ... #else #define for_each_cpu_mask(cpu, mask) { for (cpu = 0; cpu < NR_CPUS; cpu++) { if (!(test_bit(cpu, mask)) continue; #endif #define end_for_each_cpu_mask } } -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 5:08 ` Paul Jackson @ 2006-03-19 17:43 ` Linus Torvalds 0 siblings, 0 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 17:43 UTC (permalink / raw) To: Paul Jackson Cc: akpm, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sat, 18 Mar 2006, Paul Jackson wrote: > > If find_first_bit and find_next_bit suck so bad, then why just fix > their use in the for_each_cpu_mask(). How about the other uses? find_first_bit/find_next_bit don't actually suck. They're perfectly efficient for large bitmaps, which is what they were designed for (ie they were written for doing the free-block/inode bitmaps for minixfs originally). In other words, they are meant for bitmaps of _thousands_ of bits, and for potentially sparse bitmaps. Now, the CPU masks _can_ be large too, but commonly they are not. In this example, the "bitmap" was exactly two bits: cpu 0 and cpu 1. Using a complex "find_next_bit" for something like that is just overkill. > And if we fix the cpu loop to the API Linus suggests, we should do the > same with the node loops Yes. Almost ever "for_each_xyzzy" tends to be more easily written as a macro if we also have a ending condition. > And for those of us with too many CPUs, how about something like > (totally untested and probably totally bogus): > > #if NR_CPUS <= BITS_IN_LONG > ... as per Linus, shifting a mask right ... > #else > #define for_each_cpu_mask(cpu, mask) > { for (cpu = 0; cpu < NR_CPUS; cpu++) { > if (!(test_bit(cpu, mask)) > continue; > #endif > > #define end_for_each_cpu_mask } } Yes, except you'd probably be better off with testing whole words at a time, since a large NR_CPUs can often be because somebody wanted to support 256 CPU machines, and then used the same binary on a 8-way setup. That would most helpfully be done with a double loop (outer one iterating over the bitmask words, inner one over the bits), but then it's hard to make "break" do the right thing inside the loop. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 2:38 ` Linus Torvalds 2006-03-19 5:08 ` Paul Jackson @ 2006-03-19 18:46 ` Linus Torvalds 2006-03-19 19:02 ` Linus Torvalds 1 sibling, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 18:46 UTC (permalink / raw) To: Andrew Morton Cc: kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sat, 18 Mar 2006, Linus Torvalds wrote: > > The thing is, we could do _so_ much better if we just fixed the "calling > convention" for that loop. Actually, looking at it a bit more, I think we can do better even _without_ fixing the "calling convention". Namely like this. This should make the "NR_CPUS <= 16" case have a _much_ simpler loop. I picked "16" at random. The loop is much faster and much simpler (it doesn't use complex instructions like "find first bit", but the downside is that if the CPU mask is very sparse, it will take more of those (very simple) iterations to figure out that it's empty. I suspect the "16" could be raised to 32 (at which point it would cover all the default vaules), but somebody should probably take a look at how much this shrinks the kernel. I have to admit that it's a bit evil to leave a dangling "else" in a macro, but it beautifully handles the "must be followed by exactly one C statement" requirement of the old macro ;) So instead of calling it "evil", let's just say that it's "creative macro-writing". (Btw, we could make this even cleaner by making the non-SMP case define "cpu_isset()" to 1 - at that point the UP and the "low CPU count" #defines could be merged into one). Linus -- diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 60e56c6..a659f42 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -313,11 +313,17 @@ static inline void __cpus_remap(cpumask_ bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits); } -#if NR_CPUS > 1 +#if NR_CPUS > 16 #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = first_cpu(mask); \ (cpu) < NR_CPUS; \ (cpu) = next_cpu((cpu), (mask))) +#elif NR_CPUS > 1 +#define for_each_cpu_mask(cpu, mask) \ + for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \ + if (!cpu_isset(cpu, mask)) \ + continue; \ + else #else /* NR_CPUS == 1 */ #define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++) #endif /* NR_CPUS */ ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 18:46 ` Linus Torvalds @ 2006-03-19 19:02 ` Linus Torvalds 2006-03-19 19:33 ` Linus Torvalds 0 siblings, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 19:02 UTC (permalink / raw) To: Andrew Morton Cc: kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Linus Torvalds wrote: > > Actually, looking at it a bit more, I think we can do better even > _without_ fixing the "calling convention". Here's the "before and after" example of nr_context_switches() from kernel/sched.c with this patch on x86-64 (selected because the whole function is just basically one simple "loop over all possible CPU's and return the sum of context switches"). I'll do the "after" first, because it's actually readable: nr_context_switches: pushq %rbp xorl %esi, %esi xorl %ecx, %ecx movq %rsp, %rbp .L210: #APP btl %ecx,cpu_possible_map(%rip) sbbl %eax,%eax #NO_APP testl %eax, %eax je .L211 movq _cpu_pda(,%rcx,8), %rdx movq $per_cpu__runqueues, %rax addq 8(%rdx), %rax addq 40(%rax), %rsi .L211: incq %rcx cmpq $8, %rcx jne .L210 leave movq %rsi, %rax ret ie the loop starts at .L210, end basically iterates %rcx from 0..7, and you can read the assembly and it actually makes sense. (Whether it _works_ is another matter: I haven't actually booted the patched kernel ;) Compare that to the "before" state: nr_context_switches: pushq %rbp movl $8, %eax movq cpu_possible_map(%rip), %rdi #APP bsfq %rdi,%rdx ; cmovz %rax,%rdx #NO_APP cmpl $9, %edx movq %rdx, %rax movl $8, %edx cmovge %edx, %eax movq %rsp, %rbp pushq %rbx movslq %eax,%rcx xorl %r8d, %r8d jmp .L261 .L262: movq _cpu_pda(,%rcx,8), %rdx movl $8, %esi movq $per_cpu__runqueues, %rax movq %rdi, %rbx movq 8(%rdx), %rdx addq 40(%rax,%rdx), %r8 movl %ecx, %edx movl %esi, %eax leal 1(%rdx), %ecx subl %edx, %eax decl %eax shrq %cl, %rbx cltq movq %rbx, %rcx #APP bsfq %rcx,%rbx ; cmovz %rax,%rbx #NO_APP leal 1(%rdx,%rbx), %edx cmpl $9, %edx cmovge %esi, %edx movslq %edx,%rcx .L261: cmpq $7, %rcx jbe .L262 popq %rbx leave movq %r8, %rax ret which is not only obviously bigger (40 instructions vs just 18), I also dare anybody to actually read and understand the assembly. Now, the simple version will iterate 8 times over the loop, while the more complex version will iterate just as many times as there are CPU's in the actual system. So there's a trade-off. The "load the CPU mask first and then shift it down by one every time" thing (that required different syntax) would have been able to exit early. This one isn't. So the syntax change would still help a lot (and would avoid the "btl"). Of course, if people were to set CONFIG_NR_CPUS appropriately for their system, the shorter version gets increasingly better (since it then won't do any unnecessary work). Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 19:02 ` Linus Torvalds @ 2006-03-19 19:33 ` Linus Torvalds 2006-03-19 19:40 ` Al Viro 2006-03-20 6:12 ` Willy Tarreau 0 siblings, 2 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 19:33 UTC (permalink / raw) To: Andrew Morton Cc: kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Linus Torvalds wrote: > > Now, the simple version will iterate 8 times over the loop, while the more > complex version will iterate just as many times as there are CPU's in the > actual system. So there's a trade-off. The "load the CPU mask first and > then shift it down by one every time" thing (that required different > syntax) would have been able to exit early. This one isn't. So the syntax > change would still help a lot (and would avoid the "btl"). Hah. My "abuse every gcc feature and do really ugly macros" dark side has been vetted, and it came up with the appended work of art. Doing a "break" inside a conditional by using the gcc statement expressions is sublime. And it works. It's a couple of instructions longer than the shortest version (but still about half the size of the horror we have now), but the instructions are simpler (just a shift rather than a "btl"), and it now knows to break out early if there are no CPU's left to check. It has the added advantage that because it uses simpler core instructions, gcc can actually optimize it better - in "nr_context_switches()" gcc happily hoisted the "mask->bits[0]" load to outside the loop, for example. With NR_CPUS==2, gcc even apparently unrolls the loop, to the point where it isn't even a loop at all. Good boy (at least for that case - I sure hope it won't do that for some of the larger loops ;). Of course, I shouldn't say "works", since it is still totally untested. It _looks_ good, and that statement expression usage is just _so_ ugly it's cute. Linus --- diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 60e56c6..17a965c 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -313,11 +313,20 @@ static inline void __cpus_remap(cpumask_ bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits); } -#if NR_CPUS > 1 +#define check_for_each_cpu(cpu, mask) \ + ({ unsigned long __bits = (mask).bits[0] >> (cpu); if (!__bits) break; __bits & 1; }) + +#if NR_CPUS > 32 #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = first_cpu(mask); \ (cpu) < NR_CPUS; \ (cpu) = next_cpu((cpu), (mask))) +#elif NR_CPUS > 1 +#define for_each_cpu_mask(cpu, mask) \ + for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \ + if (!check_for_each_cpu(cpu, mask)) \ + continue; \ + else #else /* NR_CPUS == 1 */ #define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++) #endif /* NR_CPUS */ ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 19:33 ` Linus Torvalds @ 2006-03-19 19:40 ` Al Viro 2006-03-19 20:01 ` Linus Torvalds 2006-03-20 6:12 ` Willy Tarreau 1 sibling, 1 reply; 31+ messages in thread From: Al Viro @ 2006-03-19 19:40 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, Mar 19, 2006 at 11:33:17AM -0800, Linus Torvalds wrote: > Doing a "break" inside a conditional by using the gcc statement > expressions is sublime. > > And it works. In the version of gcc you've tested. With options and phase of moon being what they had been. IOW, you are awfully optimistic - it's not just using gcc extension, it's using undocumented (in the best case) behaviour outside the intended use of that extension. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 19:40 ` Al Viro @ 2006-03-19 20:01 ` Linus Torvalds 2006-03-19 20:31 ` Linus Torvalds 0 siblings, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 20:01 UTC (permalink / raw) To: Al Viro Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Al Viro wrote: > > In the version of gcc you've tested. With options and phase of moon > being what they had been. IOW, you are awfully optimistic - it's not > just using gcc extension, it's using undocumented (in the best case) > behaviour outside the intended use of that extension. I admit that it's ugly, but it's not undocumented. It flows directly from "statements as expression". Once you do that, you have to do flow control with them. The end result may be _surprising_, the same way Duff's device is surprising (and for the same reason). But a C compiler that doesn't support Duff's device is not a C compiler. And this is really no different: it may not bestandard C: but it _is_ standard and documented GNU C. And btw, this is _not_ new behaviour for the kernel. We have used non-local control behaviour in statement expressions before, just do a git grep '({.*return' to see at least ten cases of that (in fact, check out NFA_PUT(), which does a goto for the failure case in networking). That grep misses all the multi-line cases, so I assume there are more of them. So this definitely works, and is not new. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 20:01 ` Linus Torvalds @ 2006-03-19 20:31 ` Linus Torvalds 2006-03-19 20:47 ` Andrew Morton 2006-03-19 20:57 ` Parag Warudkar 0 siblings, 2 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 20:31 UTC (permalink / raw) To: Al Viro Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Linus Torvalds wrote: > > So this definitely works, and is not new. Btw, don't get me wrong - I think it would be preferable to use the "for_each/end_for_each" syntax, since that would make the macros simpler, and would possibly allow for somewhat simpler generated code. But that would require each of the 300+ "for_each.*_cpu()" uses in the current kernel to be modified and checked. In the meantime, people who are interested can test out how much difference the trivial patch makes for them.. For me, it made a 4970 byte difference in code size. Noticeable? Maybe not (it's about 0.15% of my kernel text-size), but it _is_ better code, and next time somebody gets an oops near there, at least the crap code won't be hiding the cause ;) Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 20:31 ` Linus Torvalds @ 2006-03-19 20:47 ` Andrew Morton 2006-03-19 22:18 ` Linus Torvalds 2006-03-19 20:57 ` Parag Warudkar 1 sibling, 1 reply; 31+ messages in thread From: Andrew Morton @ 2006-03-19 20:47 UTC (permalink / raw) To: Linus Torvalds Cc: viro, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej Linus Torvalds <torvalds@osdl.org> wrote: > > For me, it made a 4970 byte difference in code size. > That's about the same saving as uninlining first_cpu() and next_cpu() provides. Anything which iterates across multiple CPUs is cachemiss heaven - I doubt if this is performance-critical code. Or at least if it is, we have bigger problems.. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 20:47 ` Andrew Morton @ 2006-03-19 22:18 ` Linus Torvalds 2006-03-19 22:35 ` Andrew Morton 2006-03-19 22:46 ` Linus Torvalds 0 siblings, 2 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 22:18 UTC (permalink / raw) To: Andrew Morton Cc: viro, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Andrew Morton wrote: > > That's about the same saving as uninlining first_cpu() and next_cpu() > provides. Btw, we could do that better for UP too. How does this patch look? It makes "for_each_cpu()" look the same regardless of whether it is UP/SMP, by simply making first_cpu() and next_cpu() work appropriately. No ugly macros this time ;) Linus --- diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 60e56c6..c8dbc24 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -212,17 +212,15 @@ static inline void __cpus_shift_left(cpu bitmap_shift_left(dstp->bits, srcp->bits, n, nbits); } -#define first_cpu(src) __first_cpu(&(src), NR_CPUS) -static inline int __first_cpu(const cpumask_t *srcp, int nbits) -{ - return min_t(int, nbits, find_first_bit(srcp->bits, nbits)); -} - -#define next_cpu(n, src) __next_cpu((n), &(src), NR_CPUS) -static inline int __next_cpu(int n, const cpumask_t *srcp, int nbits) -{ - return min_t(int, nbits, find_next_bit(srcp->bits, nbits, n+1)); -} +#if NR_CPUS > 1 +extern int fastcall __first_cpu(const cpumask_t *srcp); +extern int fastcall __next_cpu(int n, const cpumask_t *srcp); +#define first_cpu(srcp) __first_cpu(&(srcp)) +#define next_cpu(n,srcp) __next_cpu((n),&(srcp)) +#else +#define first_cpu(srcp) (0) +#define next_cpu(n, srcp) ((n)+1) +#endif #define cpumask_of_cpu(cpu) \ ({ \ @@ -313,14 +311,10 @@ static inline void __cpus_remap(cpumask_ bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits); } -#if NR_CPUS > 1 #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = first_cpu(mask); \ (cpu) < NR_CPUS; \ (cpu) = next_cpu((cpu), (mask))) -#else /* NR_CPUS == 1 */ -#define for_each_cpu_mask(cpu, mask) for ((cpu) = 0; (cpu) < 1; (cpu)++) -#endif /* NR_CPUS */ /* * The following particular system cpumasks and operations manage diff --git a/kernel/cpu.c b/kernel/cpu.c index e882c6b..0a9cc97 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -10,6 +10,7 @@ #include <linux/sched.h> #include <linux/unistd.h> #include <linux/cpu.h> +#include <linux/cpumask.h> #include <linux/module.h> #include <linux/kthread.h> #include <linux/stop_machine.h> @@ -20,6 +21,18 @@ static DECLARE_MUTEX(cpucontrol); static struct notifier_block *cpu_chain; +int __first_cpu(const cpumask_t *srcp) +{ + return min_t(int, NR_CPUS, find_first_bit(srcp->bits, NR_CPUS)); +} +EXPORT_SYMBOL(__first_cpu); + +int __next_cpu(int n, const cpumask_t *srcp) +{ + return min_t(int, NR_CPUS, find_next_bit(srcp->bits, NR_CPUS, n+1)); +} +EXPORT_SYMBOL(__next_cpu); + #ifdef CONFIG_HOTPLUG_CPU static struct task_struct *lock_cpu_hotplug_owner; static int lock_cpu_hotplug_depth; ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 22:18 ` Linus Torvalds @ 2006-03-19 22:35 ` Andrew Morton 2006-03-19 22:55 ` Linus Torvalds 2006-03-19 22:46 ` Linus Torvalds 1 sibling, 1 reply; 31+ messages in thread From: Andrew Morton @ 2006-03-19 22:35 UTC (permalink / raw) To: Linus Torvalds Cc: viro, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej Linus Torvalds <torvalds@osdl.org> wrote: > > How does this patch look? > Wonderful - coz it's just like mine ;) (4 patches temp-joined below) - I made SMP=n next_cpu() just a constant "1". - It's odd that the code uses (NR_CPUS>1) rather than CONFIG_SMP. On x86 (at least), SMP=y,NR_CPUS==1 isn't allowed anyway. From: Andrew Morton <akpm@osdl.org> text data bss dec hex filename before: 3490577 1322408 360000 5172985 4eeef9 vmlinux after: 3488027 1322496 360128 5170651 4ee5db vmlinux Cc: Paul Jackson <pj@sgi.com> DESC cpumask: uninline next_cpu() EDESC From: Andrew Morton <akpm@osdl.org> text data bss dec hex filename before: 3488027 1322496 360128 5170651 4ee5db vmlinux after: 3485112 1322480 359968 5167560 4ed9c8 vmlinux 2931 bytes saved Cc: Paul Jackson <pj@sgi.com> DESC cpumask: uninline highest_possible_processor_id() EDESC From: Andrew Morton <akpm@osdl.org> Shrinks the only caller (net/bridge/netfilter/ebtables.c) by 174 bytes. Also, optimise highest_possible_processor_id() out of existence on CONFIG_SMP=n. Cc: Paul Jackson <pj@sgi.com> DESC cpumask: uninline any_online_cpu() EDESC From: Andrew Morton <akpm@osdl.org> text data bss dec hex filename before: 3605597 1363528 363328 5332453 515de5 vmlinux after: 3605295 1363612 363200 5332107 515c8b vmlinux 218 bytes saved. Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n. This function seems inefficient. Can't we simply AND the two masks, then use find_first_bit()? Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> --- include/linux/cpumask.h | 46 ++++++++++++++------------------------ lib/Makefile | 2 + lib/cpumask.c | 45 +++++++++++++++++++++++++++++++++++++ 3 files changed, 64 insertions(+), 29 deletions(-) diff -puN include/linux/cpumask.h~cpumask-uninline-first_cpu include/linux/cpumask.h --- devel/include/linux/cpumask.h~cpumask-uninline-first_cpu 2006-03-19 14:26:35.000000000 -0800 +++ devel-akpm/include/linux/cpumask.h 2006-03-19 14:28:45.000000000 -0800 @@ -212,17 +212,15 @@ static inline void __cpus_shift_left(cpu bitmap_shift_left(dstp->bits, srcp->bits, n, nbits); } -#define first_cpu(src) __first_cpu(&(src), NR_CPUS) -static inline int __first_cpu(const cpumask_t *srcp, int nbits) -{ - return min_t(int, nbits, find_first_bit(srcp->bits, nbits)); -} - -#define next_cpu(n, src) __next_cpu((n), &(src), NR_CPUS) -static inline int __next_cpu(int n, const cpumask_t *srcp, int nbits) -{ - return min_t(int, nbits, find_next_bit(srcp->bits, nbits, n+1)); -} +#ifdef CONFIG_SMP +int __first_cpu(const cpumask_t *srcp); +#define first_cpu(src) __first_cpu(&(src)) +int __next_cpu(int n, const cpumask_t *srcp); +#define next_cpu(n, src) __next_cpu((n), &(src)) +#else +#define first_cpu(src) 0 +#define next_cpu(n, src) 1 +#endif #define cpumask_of_cpu(cpu) \ ({ \ @@ -398,27 +396,17 @@ extern cpumask_t cpu_present_map; #define cpu_present(cpu) ((cpu) == 0) #endif -#define any_online_cpu(mask) \ -({ \ - int cpu; \ - for_each_cpu_mask(cpu, (mask)) \ - if (cpu_online(cpu)) \ - break; \ - cpu; \ -}) +#ifdef CONFIG_SMP +int highest_possible_processor_id(void); +#define any_online_cpu(mask) __any_online_cpu(&(mask)) +int __any_online_cpu(const cpumask_t *mask); +#else +#define highest_possible_processor_id() 0 +#define any_online_cpu(mask) 0 +#endif #define for_each_cpu(cpu) for_each_cpu_mask((cpu), cpu_possible_map) #define for_each_online_cpu(cpu) for_each_cpu_mask((cpu), cpu_online_map) #define for_each_present_cpu(cpu) for_each_cpu_mask((cpu), cpu_present_map) -/* Find the highest possible smp_processor_id() */ -#define highest_possible_processor_id() \ -({ \ - unsigned int cpu, highest = 0; \ - for_each_cpu_mask(cpu, cpu_possible_map) \ - highest = cpu; \ - highest; \ -}) - - #endif /* __LINUX_CPUMASK_H */ diff -puN /dev/null lib/cpumask.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ devel-akpm/lib/cpumask.c 2006-03-19 14:28:45.000000000 -0800 @@ -0,0 +1,45 @@ +#include <linux/kernel.h> +#include <linux/bitops.h> +#include <linux/cpumask.h> +#include <linux/module.h> + +int __first_cpu(const cpumask_t *srcp) +{ + return min_t(int, NR_CPUS, find_first_bit(srcp->bits, NR_CPUS)); +} +EXPORT_SYMBOL(__first_cpu); + +int __next_cpu(int n, const cpumask_t *srcp) +{ + return min_t(int, NR_CPUS, find_next_bit(srcp->bits, NR_CPUS, n+1)); +} +EXPORT_SYMBOL(__next_cpu); + +/* + * Find the highest possible smp_processor_id() + * + * Note: if we're prepared to assume that cpu_possible_map never changes + * (reasonable) then this function should cache its return value. + */ +int highest_possible_processor_id(void) +{ + unsigned int cpu; + unsigned highest = 0; + + for_each_cpu_mask(cpu, cpu_possible_map) + highest = cpu; + return highest; +} +EXPORT_SYMBOL(highest_possible_processor_id); + +int __any_online_cpu(const cpumask_t *mask) +{ + int cpu; + + for_each_cpu_mask(cpu, *mask) { + if (cpu_online(cpu)) + break; + } + return cpu; +} +EXPORT_SYMBOL(__any_online_cpu); diff -puN lib/Makefile~cpumask-uninline-first_cpu lib/Makefile --- devel/lib/Makefile~cpumask-uninline-first_cpu 2006-03-19 14:26:35.000000000 -0800 +++ devel-akpm/lib/Makefile 2006-03-19 14:26:35.000000000 -0800 @@ -7,6 +7,8 @@ lib-y := errno.o ctype.o string.o vsprin idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \ sha1.o +lib-$(CONFIG_SMP) += cpumask.o + lib-y += kobject.o kref.o kobject_uevent.o klist.o obj-y += sort.o parser.o halfmd4.o iomap_copy.o _ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 22:35 ` Andrew Morton @ 2006-03-19 22:55 ` Linus Torvalds 0 siblings, 0 replies; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 22:55 UTC (permalink / raw) To: Andrew Morton Cc: viro, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Andrew Morton wrote: > > Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n. > > This function seems inefficient. Can't we simply AND the two masks, then use > find_first_bit()? Then you'd need to generate a temporary cpumask thing. Not a big deal as long as it fits in an "unsigned long", but since the online-cpu thing is likely dense in any relevant cpu-mask, I actually think "any_online_cpu()" as it stands now is likely to be simpler/more efficient than generating a temporary mask. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 22:18 ` Linus Torvalds 2006-03-19 22:35 ` Andrew Morton @ 2006-03-19 22:46 ` Linus Torvalds 2006-03-19 23:04 ` Andrew Morton 1 sibling, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2006-03-19 22:46 UTC (permalink / raw) To: Andrew Morton Cc: viro, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej On Sun, 19 Mar 2006, Linus Torvalds wrote: > > How does this patch look? It makes "for_each_cpu()" look the same > regardless of whether it is UP/SMP, by simply making first_cpu() and > next_cpu() work appropriately. It's still about 200 bytes larger than the inlined "smarter code", but yeah, the unlinlining is certainly the smaller change and doesn't depend on the compiler being as smart. Here's another uninlining patch if you want it. It doesn't make much of a difference on x86 (since the ffs/fls things are inlined to single instructions rather than making use of the generic routines, so the fact that we now link against the generic versions actually increases code size that isn't ever used), but if anybody is collecting patches that removes unnecessary inlining, it definitely looks like the right thing for v850/sh/sh64/arm etc that use the generic versions.. Linus --- diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 208650b..2d3ed35 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -1,74 +1,28 @@ #ifndef _LINUX_BITOPS_H #define _LINUX_BITOPS_H + +#include <linux/linkage.h> #include <asm/types.h> /* * ffs: find first bit set. This is defined the same way as * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). + * + * fls: find last bit set. */ -static inline int generic_ffs(int x) -{ - int r = 1; - - if (!x) - return 0; - if (!(x & 0xffff)) { - x >>= 16; - r += 16; - } - if (!(x & 0xff)) { - x >>= 8; - r += 8; - } - if (!(x & 0xf)) { - x >>= 4; - r += 4; - } - if (!(x & 3)) { - x >>= 2; - r += 2; - } - if (!(x & 1)) { - x >>= 1; - r += 1; - } - return r; -} +extern int fastcall generic_ffs(int x); +extern int fastcall generic_fls(int x); /* - * fls: find last bit set. + * hweightN: returns the hamming weight (i.e. the number + * of bits set) of a N-bit word */ - -static __inline__ int generic_fls(int x) -{ - int r = 32; - - if (!x) - return 0; - if (!(x & 0xffff0000u)) { - x <<= 16; - r -= 16; - } - if (!(x & 0xff000000u)) { - x <<= 8; - r -= 8; - } - if (!(x & 0xf0000000u)) { - x <<= 4; - r -= 4; - } - if (!(x & 0xc0000000u)) { - x <<= 2; - r -= 2; - } - if (!(x & 0x80000000u)) { - x <<= 1; - r -= 1; - } - return r; -} +unsigned int fastcall generic_hweight64(__u64 w); +unsigned int fastcall generic_hweight32(unsigned int w); +unsigned int fastcall generic_hweight16(unsigned int w); +unsigned int fastcall generic_hweight8(unsigned int w); /* * Include this here because some architectures need generic_ffs/fls in @@ -76,7 +30,6 @@ static __inline__ int generic_fls(int x) */ #include <asm/bitops.h> - static inline int generic_fls64(__u64 x) { __u32 h = x >> 32; @@ -103,51 +56,6 @@ static __inline__ int get_count_order(un return order; } -/* - * hweightN: returns the hamming weight (i.e. the number - * of bits set) of a N-bit word - */ - -static inline unsigned int generic_hweight32(unsigned int w) -{ - unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555); - res = (res & 0x33333333) + ((res >> 2) & 0x33333333); - res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F); - res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF); - return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF); -} - -static inline unsigned int generic_hweight16(unsigned int w) -{ - unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555); - res = (res & 0x3333) + ((res >> 2) & 0x3333); - res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F); - return (res & 0x00FF) + ((res >> 8) & 0x00FF); -} - -static inline unsigned int generic_hweight8(unsigned int w) -{ - unsigned int res = (w & 0x55) + ((w >> 1) & 0x55); - res = (res & 0x33) + ((res >> 2) & 0x33); - return (res & 0x0F) + ((res >> 4) & 0x0F); -} - -static inline unsigned long generic_hweight64(__u64 w) -{ -#if BITS_PER_LONG < 64 - return generic_hweight32((unsigned int)(w >> 32)) + - generic_hweight32((unsigned int)w); -#else - u64 res; - res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul); - res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul); - res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful); - res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul); - res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul); - return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul); -#endif -} - static inline unsigned long hweight_long(unsigned long w) { return sizeof(w) == 4 ? generic_hweight32(w) : generic_hweight64(w); diff --git a/lib/Makefile b/lib/Makefile index 648b2c1..bba36f1 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -5,7 +5,7 @@ lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \ bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \ idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \ - sha1.o + sha1.o bitops.o lib-y += kobject.o kref.o kobject_uevent.o klist.o diff --git a/lib/bitops.c b/lib/bitops.c new file mode 100644 index 0000000..e5e4cab --- /dev/null +++ b/lib/bitops.c @@ -0,0 +1,106 @@ +#include <linux/bitops.h> +#include <linux/module.h> + +int generic_ffs(int x) +{ + int r = 1; + + if (!x) + return 0; + if (!(x & 0xffff)) { + x >>= 16; + r += 16; + } + if (!(x & 0xff)) { + x >>= 8; + r += 8; + } + if (!(x & 0xf)) { + x >>= 4; + r += 4; + } + if (!(x & 3)) { + x >>= 2; + r += 2; + } + if (!(x & 1)) { + x >>= 1; + r += 1; + } + return r; +} +EXPORT_SYMBOL(generic_ffs); + +int generic_fls(int x) +{ + int r = 32; + + if (!x) + return 0; + if (!(x & 0xffff0000u)) { + x <<= 16; + r -= 16; + } + if (!(x & 0xff000000u)) { + x <<= 8; + r -= 8; + } + if (!(x & 0xf0000000u)) { + x <<= 4; + r -= 4; + } + if (!(x & 0xc0000000u)) { + x <<= 2; + r -= 2; + } + if (!(x & 0x80000000u)) { + x <<= 1; + r -= 1; + } + return r; +} +EXPORT_SYMBOL(generic_fls); + +unsigned int generic_hweight32(unsigned int w) +{ + unsigned int res = (w & 0x55555555) + ((w >> 1) & 0x55555555); + res = (res & 0x33333333) + ((res >> 2) & 0x33333333); + res = (res & 0x0F0F0F0F) + ((res >> 4) & 0x0F0F0F0F); + res = (res & 0x00FF00FF) + ((res >> 8) & 0x00FF00FF); + return (res & 0x0000FFFF) + ((res >> 16) & 0x0000FFFF); +} +EXPORT_SYMBOL(generic_hweight32); + +unsigned int generic_hweight16(unsigned int w) +{ + unsigned int res = (w & 0x5555) + ((w >> 1) & 0x5555); + res = (res & 0x3333) + ((res >> 2) & 0x3333); + res = (res & 0x0F0F) + ((res >> 4) & 0x0F0F); + return (res & 0x00FF) + ((res >> 8) & 0x00FF); +} +EXPORT_SYMBOL(generic_hweight16); + +unsigned int generic_hweight8(unsigned int w) +{ + unsigned int res = (w & 0x55) + ((w >> 1) & 0x55); + res = (res & 0x33) + ((res >> 2) & 0x33); + return (res & 0x0F) + ((res >> 4) & 0x0F); +} +EXPORT_SYMBOL(generic_hweight8); + +unsigned int generic_hweight64(__u64 w) +{ +#if BITS_PER_LONG < 64 + return generic_hweight32((unsigned int)(w >> 32)) + + generic_hweight32((unsigned int)w); +#else + u64 res; + res = (w & 0x5555555555555555ul) + ((w >> 1) & 0x5555555555555555ul); + res = (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul); + res = (res & 0x0F0F0F0F0F0F0F0Ful) + ((res >> 4) & 0x0F0F0F0F0F0F0F0Ful); + res = (res & 0x00FF00FF00FF00FFul) + ((res >> 8) & 0x00FF00FF00FF00FFul); + res = (res & 0x0000FFFF0000FFFFul) + ((res >> 16) & 0x0000FFFF0000FFFFul); + return (res & 0x00000000FFFFFFFFul) + ((res >> 32) & 0x00000000FFFFFFFFul); +#endif +} +EXPORT_SYMBOL(generic_hweight64); ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 22:46 ` Linus Torvalds @ 2006-03-19 23:04 ` Andrew Morton 0 siblings, 0 replies; 31+ messages in thread From: Andrew Morton @ 2006-03-19 23:04 UTC (permalink / raw) To: Linus Torvalds Cc: viro, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej Linus Torvalds <torvalds@osdl.org> wrote: > > Here's another uninlining patch if you want it. > Yes, the bitops need that. We have a 50-patch series against the bitops code queued up. It's from Akinobu Mita. It consolidates of all the C-coded operations which archtectures are presently duplicating. In toto: 80 files changed, 1271 insertions(+), 4999 deletions(-) It does include uninlining of the hweight functions, although I note that include/asm-generic/bitops/__ffs.h remains inlined. I don't know how many architectures are using the generic code though. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 20:31 ` Linus Torvalds 2006-03-19 20:47 ` Andrew Morton @ 2006-03-19 20:57 ` Parag Warudkar 1 sibling, 0 replies; 31+ messages in thread From: Parag Warudkar @ 2006-03-19 20:57 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Andrew Morton, linux-kernel, alex-kernel, jun.nakajima, davej On Sunday 19 March 2006 15:31, Linus Torvalds wrote: > In the meantime, people who are > interested can test out how much difference the trivial patch makes for > them.. 4464 bytes saving in text size for me on x86 - boots and seems to work fine. Parag ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 19:33 ` Linus Torvalds 2006-03-19 19:40 ` Al Viro @ 2006-03-20 6:12 ` Willy Tarreau 2006-03-20 6:26 ` Linus Torvalds 2006-03-20 8:22 ` Peter T. Breuer 1 sibling, 2 replies; 31+ messages in thread From: Willy Tarreau @ 2006-03-20 6:12 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej, viro On Sun, Mar 19, 2006 at 11:33:17AM -0800, Linus Torvalds wrote: > Of course, I shouldn't say "works", since it is still totally untested. It > _looks_ good, and that statement expression usage is just _so_ ugly it's > cute. > > Linus At least, you could have moved the macro closer to where it's used. It's very uncommon to break a statement within an if condition, and it's not expected that the macro you're calling does a break under you. It took me several minutes to understand precisely how this works. Now it seems trivial, but I guess that at 3am I would have gone to bed instead. One first enhancement would be to make it easier to understand by putting it closer to its user : #elif NR_CPUS > 1 #define check_for_each_cpu(cpu, mask) \ ({ unsigned long __bits = (mask).bits[0] >> (cpu); if (!__bits) break; __bits & 1; }) #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \ if (!check_for_each_cpu(cpu, mask)) \ continue; \ else Now, does removing the macro completely change the output code ? I think that if something written like this produces the same code, it would be easier to read : #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \ unsigned long __bits = (mask).bits[0] >> (cpu); \ if (!__bits) \ break; \ if (!__bits & 1) \ continue; \ else Regards, Willy ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-20 6:12 ` Willy Tarreau @ 2006-03-20 6:26 ` Linus Torvalds 2006-03-20 7:18 ` Willy Tarreau 2006-03-20 8:22 ` Peter T. Breuer 1 sibling, 1 reply; 31+ messages in thread From: Linus Torvalds @ 2006-03-20 6:26 UTC (permalink / raw) To: Willy Tarreau Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej, viro On Mon, 20 Mar 2006, Willy Tarreau wrote: > > Now, does removing the macro completely change the output code ? > I think that if something written like this produces the same > code, it would be easier to read : > > #define for_each_cpu_mask(cpu, mask) \ > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \ > unsigned long __bits = (mask).bits[0] >> (cpu); \ > if (!__bits) \ > break; \ > if (!__bits & 1) \ > continue; \ > else Absolutely, but now it has a dangling "{" that didn't get closed. So the above would definitely be more readable, it just doesn't actually work. Unless you'd do the "end_for_each_cpu" define (to close the statement), and update the 300+ places that use this. Which might well be worth it. So the subtle "break from the middle of a statement expression" was just a rather hacky way to avoid having to change all the users of this macro. Linus ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-20 6:26 ` Linus Torvalds @ 2006-03-20 7:18 ` Willy Tarreau 2006-03-21 6:32 ` Willy Tarreau 0 siblings, 1 reply; 31+ messages in thread From: Willy Tarreau @ 2006-03-20 7:18 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej, viro On Sun, Mar 19, 2006 at 10:26:30PM -0800, Linus Torvalds wrote: > > > On Mon, 20 Mar 2006, Willy Tarreau wrote: > > > > Now, does removing the macro completely change the output code ? > > I think that if something written like this produces the same > > code, it would be easier to read : > > > > #define for_each_cpu_mask(cpu, mask) \ > > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \ > > unsigned long __bits = (mask).bits[0] >> (cpu); \ > > if (!__bits) \ > > break; \ > > if (!__bits & 1) \ > > continue; \ > > else > > Absolutely, but now it has a dangling "{" that didn't get closed. So the > above would definitely be more readable, it just doesn't actually work. > > Unless you'd do the "end_for_each_cpu" define (to close the statement), > and update the 300+ places that use this. Which might well be worth it. > > So the subtle "break from the middle of a statement expression" was just a > rather hacky way to avoid having to change all the users of this macro. > > Linus Oh, you're right, now I understand your motivation in doing this. Then perhaps using your trick but applying it to the whole for loop would make it easier to read ? #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \ ({ unsigned long __bits = (mask).bits[0] >> (cpu); \ if (!__bits) \ break; \ if (!__bits & 1) \ continue; \ else \ ... \ }) Please note that I've not read the rest of the code, so there may be some problems left. However, if the above works, I find it easier to read. And in this case, yes, it's interesting to be able to break from within an expression. Cheers, Willy ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-20 7:18 ` Willy Tarreau @ 2006-03-21 6:32 ` Willy Tarreau 0 siblings, 0 replies; 31+ messages in thread From: Willy Tarreau @ 2006-03-21 6:32 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, kernel-stuff, linux-kernel, alex-kernel, jun.nakajima, davej, viro On Mon, Mar 20, 2006 at 08:18:46AM +0100, Willy Tarreau wrote: > On Sun, Mar 19, 2006 at 10:26:30PM -0800, Linus Torvalds wrote: > > > > > > On Mon, 20 Mar 2006, Willy Tarreau wrote: > > > > > > Now, does removing the macro completely change the output code ? > > > I think that if something written like this produces the same > > > code, it would be easier to read : > > > > > > #define for_each_cpu_mask(cpu, mask) \ > > > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \ > > > unsigned long __bits = (mask).bits[0] >> (cpu); \ > > > if (!__bits) \ > > > break; \ > > > if (!__bits & 1) \ > > > continue; \ > > > else > > > > Absolutely, but now it has a dangling "{" that didn't get closed. So the > > above would definitely be more readable, it just doesn't actually work. > > > > Unless you'd do the "end_for_each_cpu" define (to close the statement), > > and update the 300+ places that use this. Which might well be worth it. > > > > So the subtle "break from the middle of a statement expression" was just a > > rather hacky way to avoid having to change all the users of this macro. > > > > Linus > > Oh, you're right, now I understand your motivation in doing this. > Then perhaps using your trick but applying it to the whole for loop > would make it easier to read ? > > #define for_each_cpu_mask(cpu, mask) \ > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \ > ({ unsigned long __bits = (mask).bits[0] >> (cpu); \ > if (!__bits) \ > break; \ > if (!__bits & 1) \ > continue; \ > else \ > ... \ > }) Well, forget it ! I did not realize that the 'else' was what called the inner loop. So this construct it not possible at all either for the same reason. Sorry for not having read the code before posting. Cheers, Willy ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-20 6:12 ` Willy Tarreau 2006-03-20 6:26 ` Linus Torvalds @ 2006-03-20 8:22 ` Peter T. Breuer 1 sibling, 0 replies; 31+ messages in thread From: Peter T. Breuer @ 2006-03-20 8:22 UTC (permalink / raw) To: Willy Tarreau; +Cc: linux-kernel In article <20060320061212.GG21493@w.ods.org> you wrote: > On Sun, Mar 19, 2006 at 11:33:17AM -0800, Linus Torvalds wrote: >> Of course, I shouldn't say "works", since it is still totally untested. It >> _looks_ good, and that statement expression usage is just _so_ ugly it's >> cute. > #define for_each_cpu_mask(cpu, mask) \ > for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) { \ > unsigned long __bits = (mask).bits[0] >> (cpu); \ > if (!__bits) \ > break; \ > if (!__bits & 1) \ > continue; \ > else While that's slightly wrong (needs a closing } supplied by the user), it does inspire me to point out that one can put the skips in the ordinary statement part of the for and use the if else idea to make sure that the for needs just one statement following (i.e. no dangling } supplied by the user) #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \ if (!((mask).bits[0] >> (cpu)) { \ break; \ } else if (!((mask).bits[0] >> (cpu)) & 1) { \ continue; \ } else I expect common subexpression optimization in the compiler to remove the repetition here. If not, somebody else can think about it! Peter ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 0:53 ` Andrew Morton 2006-03-19 2:38 ` Linus Torvalds @ 2006-03-19 6:34 ` Parag Warudkar 2006-03-19 12:00 ` Alexander Clouter 1 sibling, 1 reply; 31+ messages in thread From: Parag Warudkar @ 2006-03-19 6:34 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, linux-kernel, alex-kernel, jun.nakajima, davej On Saturday 18 March 2006 19:53, Andrew Morton wrote: > I might stick a once-off WARN_ON() in there so someone gets in > and works out why we keep on having to graft mysterious null-pointer > avoidances into cpufreq. cpufreq_conservative should be marked broken on SMP - I have used it on UP boxes without trouble but I can't even safely modprobe it on SMP - it nearly ate my filesystem. And there seem to be multiple different problems with it - I get different oopses depending upon whether or not I have loaded it before or after the ondemand module. Weird enough - cpufreq_conservative shares much of it's code with cpufreq_ondemand, which works without any problem. Let me know if anyone has objection to marking cpufreq_conservative depends !SMP - I am planning to submit a patch soon. Parag ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 6:34 ` Parag Warudkar @ 2006-03-19 12:00 ` Alexander Clouter 2006-03-19 14:06 ` Parag Warudkar 0 siblings, 1 reply; 31+ messages in thread From: Alexander Clouter @ 2006-03-19 12:00 UTC (permalink / raw) To: Parag Warudkar Cc: Andrew Morton, Linus Torvalds, linux-kernel, jun.nakajima, davej [-- Attachment #1: Type: text/plain, Size: 1719 bytes --] Hi, Parag Warudkar <kernel-stuff@comcast.net> [20060319 01:34:01 -0500]: > > On Saturday 18 March 2006 19:53, Andrew Morton wrote: > > I might stick a once-off WARN_ON() in there so someone gets in > > and works out why we keep on having to graft mysterious null-pointer > > avoidances into cpufreq. > cpufreq_conservative should be marked broken on SMP - I have used it on UP > boxes without trouble but I can't even safely modprobe it on SMP - it nearly > ate my filesystem. > > And there seem to be multiple different problems with it - I get different > oopses depending upon whether or not I have loaded it before or after the > ondemand module. Weird enough - cpufreq_conservative shares much of it's > code with cpufreq_ondemand, which works without any problem. > Well its drifted a bit, however I submitted a number of patches here about two weeks ago to bring it back into line and hopefully make it HOTPLUG safe. The new set of patches pretty much make conservative's codebase identical to ondemands....as no one has posted back having used these or anything what am I to do?! > Let me know if anyone has objection to marking cpufreq_conservative > depends !SMP - I am planning to submit a patch soon. > Doesn't bother me, what does is no one is trying to updates to conservative before deciding to declare it borked? Hey ho.... Alex > Parag -- _______________________________________ / Misfortunes arrive on wings and leave \ \ on foot. / --------------------------------------- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || || [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 12:00 ` Alexander Clouter @ 2006-03-19 14:06 ` Parag Warudkar 2006-03-19 17:34 ` Alexander Clouter 0 siblings, 1 reply; 31+ messages in thread From: Parag Warudkar @ 2006-03-19 14:06 UTC (permalink / raw) To: Alexander Clouter Cc: Andrew Morton, Linus Torvalds, linux-kernel, jun.nakajima, davej On Sunday 19 March 2006 07:00, Alexander Clouter wrote: > Well its drifted a bit, however I submitted a number of patches here about > two weeks ago to bring it back into line and hopefully make it HOTPLUG > safe. > > The new set of patches pretty much make conservative's codebase identical > to ondemands....as no one has posted back having used these or anything > what am I to do?! The codebase already seems identical to ondemand - Are your patches in 2.6.16-rc6 or -mm? If they are - let me know which. If you posted them but they haven't yet made it into either -mm or mainline can you please post links to all your patches please? I can test them. Why do we even have conservative and ondemand as two separate modules given they share huge amount of code - perhaps make conservative an optional behaviour of ondemand or alteast make a common lib which both use? Thanks Parag ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: OOPS: 2.6.16-rc6 cpufreq_conservative 2006-03-19 14:06 ` Parag Warudkar @ 2006-03-19 17:34 ` Alexander Clouter 0 siblings, 0 replies; 31+ messages in thread From: Alexander Clouter @ 2006-03-19 17:34 UTC (permalink / raw) To: Parag Warudkar Cc: Andrew Morton, Linus Torvalds, linux-kernel, jun.nakajima, davej [-- Attachment #1: Type: text/plain, Size: 2651 bytes --] Hi, Parag Warudkar <kernel-stuff@comcast.net> [20060319 09:06:25 -0500]: > > The codebase already seems identical to ondemand - Are your patches in > 2.6.16-rc6 or -mm? If they are - let me know which. If you posted them but > they haven't yet made it into either -mm or mainline can you please post > links to all your patches please? I can test them. > Well I submitted them back on 2006-02-24: http://marc.theaimsgroup.com/?l=linux-kernel&m=114079151404567&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114079151425558&w=2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114079151417220&w=2 It has been pointed out I did forget to CC the cpufreq folk, so thats completely my fault; I thought I had. :-/ > Why do we even have conservative and ondemand as two separate modules given > they share huge amount of code - perhaps make conservative an optional > behaviour of ondemand or alteast make a common lib which both use? > Originally the 'conservative' feature was just a sysfs flag that could be set however it was rejected for a number of reasons; one of them quite rightly, we have a modular system that can take stacks of cpufreq governors so lets use that :) Also, more importantly, bugs in my bits do not affect the original author. It makes more sense to keep things this way as the internal code then does not need a bunch of if{}'s scattered around and also I have a number of extra sysfs files to tweak which would either do nothing in 'ondemand' mode or have to be magically created and destroyed. Either way, its probably neater this way and its *my* duty to make sure anything changing for ondemand is considered for conservative. If you look at a lot of the userland tools that have come out, it would be a pain to have them consider the exception class of handling the combined ondemand/conservative. Breaking out the thing into a library probably would be awkward as all the similar code is actually inline in functions, putting that in a seperate file would be pointless. Hopefully when you apply my patches and then do a diff between the ondemand and conservative governors you will see what I mean. Cheers Alex -- ________________________________________ / Fortune finishes the great quotations, \ | #3 | | | | Birds of a feather flock to a newly | \ washed car. / ---------------------------------------- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || || [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2006-03-21 6:36 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-03-18 20:25 OOPS: 2.6.16-rc6 cpufreq_conservative Parag Warudkar 2006-03-18 21:40 ` Linus Torvalds 2006-03-18 22:09 ` Parag Warudkar 2006-03-18 23:12 ` Linus Torvalds 2006-03-18 22:26 ` Parag Warudkar 2006-03-19 0:53 ` Andrew Morton 2006-03-19 2:38 ` Linus Torvalds 2006-03-19 5:08 ` Paul Jackson 2006-03-19 17:43 ` Linus Torvalds 2006-03-19 18:46 ` Linus Torvalds 2006-03-19 19:02 ` Linus Torvalds 2006-03-19 19:33 ` Linus Torvalds 2006-03-19 19:40 ` Al Viro 2006-03-19 20:01 ` Linus Torvalds 2006-03-19 20:31 ` Linus Torvalds 2006-03-19 20:47 ` Andrew Morton 2006-03-19 22:18 ` Linus Torvalds 2006-03-19 22:35 ` Andrew Morton 2006-03-19 22:55 ` Linus Torvalds 2006-03-19 22:46 ` Linus Torvalds 2006-03-19 23:04 ` Andrew Morton 2006-03-19 20:57 ` Parag Warudkar 2006-03-20 6:12 ` Willy Tarreau 2006-03-20 6:26 ` Linus Torvalds 2006-03-20 7:18 ` Willy Tarreau 2006-03-21 6:32 ` Willy Tarreau 2006-03-20 8:22 ` Peter T. Breuer 2006-03-19 6:34 ` Parag Warudkar 2006-03-19 12:00 ` Alexander Clouter 2006-03-19 14:06 ` Parag Warudkar 2006-03-19 17:34 ` Alexander Clouter
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox