* [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Justin P. Mattock @ 2009-01-07  0:12 UTC
  To: linux-kernel

After pulling from git today, I'm unable to shut the machine down completely
(the system just sits there with this message on the screen):

* will now halt
[  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
[  286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
[  286.550598] Oops: 0002 [#1] SMP
[  286.552206] last sysfs file: /sys/block/sda/removeable
[  286.553844] Modules linked in: hidp radeon drm agpgart btusb rfcomm 
bnep sco l2cap bluetooth fan battery container ipt_LOG xt_limit 
xt_tcpudp xt_state ipt_addrtype nf_nat_irc nf_conntrack_irc nf_nat_ftp 
nf_nat nf_conntrack_ftp ipmi_watchdog ipmi_msghandler uvcvideo 
isight_firmware uinput arpt_mangle arptable_filter arp_tables 
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle 
iptable_filter ip_tables x_tables coretemp eeprom acpi_cpufreq 
cpufreq_powersave cpufreq_performance cpufreq_ondemand 
cpufreq_conservative appletouch snd_had_codec_idt ohci1394 ehci_hcd 
snd_hda_intel snd_hda_codec thermal ath9k uhci_hcd ieee1394 joydev 
pata_acpi snd_hwdep snd_pcm snd_page_alloc video ac button processor 
applesmc evdev
[  286.560580]
[  286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5 
#1) MacBookPro2,2
[  286.560580] EIP: 0060:[<c0150ca4>] EFLAGS: 00010293 CPU: 0
[  286.560580] EIP: is at __stop_machine+0x88/0xe3
[  286.560580] EAX: 6b6b6b6b EBX: 00000000 ECX: 6b6b6b6b EDX: 00000000
[  286.560580] ESI: c054abe0 EDI: c03d03a4 EBP: f1a29e54 ESP: f1a29e44
[  286.560580]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30 
task.ti=f1a28000)
[  286.560580] Stack:
[  286.560580] f1a29e60 c054abe0 00000001 00000010 f1a29e7c c03d04e4 
ffffffea 00000010
[  286.560580] 00000001 00000003 00000022 00000001 4321fedc c054abe0 
f1a29e94 c012a57e
[  286.560580] 00000000 ffffffff 4321fedc 28121969 f1a29e9c c01360c0 
f1a29fb0 c0136301
[  286.560580] Call Trace:
[  286.560580]  [<c03d04e4>] ? _cpu_down+0x10f/0x234
[  286.560580]  [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
[  286.560580]  [<c01360c0>] ? kernel_poweroff+0x22/0x39
[  286.560580]  [<c0136301>] ? sys_reboot+0xde/0x14c
[  286.560580]  [<c01331b2>] ? complete_signal+0x179/0x191
[  286.560580]  [<c0133396>] ? send_signal+0x1cc/0x1e1
[  286.560580]  [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
[  286.560580]  [<c0133b65>] ? group_send_signal_info+0x58/0x61
[  286.560580]  [<c0133b9e>] ? kill_pid_info+0x30/0x3a
[  286.560580]  [<c0133d49>] ? sys_kill+0x75/0x13a
[  286.560580]  [<c01a06cb>] ? mntput_no_expire+ox1f/0x101
[  286.560580]  [<c019b3b3>] ? dput+0x1e/0x105
[  286.560580]  [<c018ef87>] ?  __fput+0x150/0x158
[  286.560580]  [<c0157abf>] ? audit_syscall_entry+0x137/0x159
[  286.560580]  [<c010329f>] ? sysenter_do_call+0x12/0x34
[  286.560580] Code: c7 05 10 06 62 c0 00 00 00 00 a3 f4 05 62 c0 c7 05 
ec 05 62 c0 01 00 00 00 83 cb ff eb 2d a1 1c 06 62 c0 f7 d0 8b 0c 98 8d 
41 04 <c7> 01 00 00 00 00 89 41 04 89 41 08 c7 41 0c ff 0c 15 c0 89 d8
[  286.560580] EIP: [<c0150ca4>] __stop_machine+0x88/0xe3 SS:ESP 
0068:f1a29e44
[  286.639215] ---[ end trace 5b080c1ab14203ae ] ---
Segmentation fault

After this message appears, if I hold down the power button,
the system shuts off after a few seconds.
(BTW, hopefully the numbers are correct;
writing this down manually is a bit of a pain.)
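Because the trace was copied by hand, some slips can be caught mechanically. Each call-trace entry has the form symbol+0xOFFSET/0xSIZE, where OFFSET is the position inside the function and SIZE is the function's length, so OFFSET should normally be smaller than SIZE. The following is a minimal user-space sketch; valid_trace_symbol() and parse_hex() are hypothetical helpers, not kernel or kallsyms APIs. It would flag, for example, the "ox1f" in the mntput_no_expire entry:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parse "0x<hex digits>" at p into *out. Returns a pointer just past the
 * digits, or NULL if the "0x" prefix or the digits are missing. */
static const char *parse_hex(const char *p, unsigned long *out)
{
	const char *start;
	unsigned long v = 0;

	if (p[0] != '0' || p[1] != 'x')
		return NULL;
	p += 2;
	start = p;
	while (isxdigit((unsigned char)*p)) {
		int c = tolower((unsigned char)*p);

		v = v * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
		p++;
	}
	if (p == start)
		return NULL;
	*out = v;
	return p;
}

/* Check one "symbol+0xOFFSET/0xSIZE" field from an oops call trace.
 * Hypothetical helper for sanity-checking hand-copied traces. */
bool valid_trace_symbol(const char *s)
{
	const char *plus = strrchr(s, '+');
	const char *p;
	unsigned long off = 0, size = 0;

	if (plus == NULL || plus == s)
		return false;           /* missing symbol name or '+' */
	p = parse_hex(plus + 1, &off);
	if (p == NULL || *p != '/')
		return false;           /* offset must be "0x..." followed by '/' */
	p = parse_hex(p + 1, &size);
	if (p == NULL || *p != '\0')
		return false;           /* size must be "0x..." and end the field */
	return off < size;          /* the offset should fall inside the function */
}
```

For example, valid_trace_symbol("__stop_machine+0x88/0xe3") returns true, while valid_trace_symbol("mntput_no_expire+ox1f/0x101") returns false because "ox" is not a hex prefix.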

regards;

Justin P. Mattock


* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Pekka Enberg @ 2009-01-07  6:48 UTC
  To: Justin P. Mattock; +Cc: linux-kernel, Rusty Russell

Hi Justin,

On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
> With pulling git today I'm unable to shut the machine down completely.
> (the system just sits there with the message on the screen);
>
> * will now halt
> [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b

That looks like a use-after-free in __stop_machine(), so let's cc Rusty.
If you want, you can convert the oops location into human-readable
form. Just search for "GDB" in Documentation/BUG-HUNTING for
instructions on how to do that. And don't forget to send your .config.
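The hint behind that diagnosis is the faulting address itself. Assuming slab debugging with poisoning was enabled in this kernel (the .config had not been posted yet), freed slab objects are filled with the byte 0x6b (POISON_FREE in include/linux/poison.h), so a pointer loaded from a freed object reads as 0x6b6b6b6b. That matches both the paging-request address and ECX in the register dump. A small sketch of the check; looks_like_freed_object() is an illustrative helper, not a kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte used for use-after-free poisoning (include/linux/poison.h). */
#define POISON_FREE 0x6b

/* True if all four bytes of a 32-bit value (e.g. a faulting address or a
 * register from a 32-bit oops) equal POISON_FREE. Such a value was most
 * likely loaded from an object that had already been freed. */
bool looks_like_freed_object(uint32_t v)
{
	for (int shift = 0; shift < 32; shift += 8) {
		if (((v >> shift) & 0xffu) != POISON_FREE)
			return false;
	}
	return true;
}
```

Applied to this oops, the faulting address and ECX (6b6b6b6b) match, while an ordinary kernel address such as ESI (c054abe0) does not.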

> [...]

* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Justin P. Mattock @ 2009-01-07  8:13 UTC
  To: Pekka Enberg; +Cc: linux-kernel, Rusty Russell

Pekka Enberg wrote:
> Hi Justin,
>
> On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>   
>> With pulling git today I'm unable to shut the machine down completely.
>> (the system just sits there with the message on the screen);
>>
>> * will now halt
>> [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
>>     
>
> That looks like use-after-free in __stop_machine() so lets cc Rusty.
> If you want, you can convert the oops location into human-readable
> form. Just search for "GDB" in Documentation/BUG-HUNTING for
> instructions how to do that. And don't forget to send your .config.
>
>   
>> [...]
That's nice, thanks for the info.
I like the idea of using gdb to disassemble
this text.
Since I don't know what I'm doing yet,
I'll have to do my homework on this
(I'm really curious to see what it shows).

regards;

Justin P. Mattock




* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Pekka Enberg @ 2009-01-07  8:30 UTC
  To: Justin P. Mattock; +Cc: linux-kernel, Rusty Russell, heiko.carstens

On Wed, Jan 7, 2009 at 8:48 AM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>> With pulling git today I'm unable to shut the machine down completely.
>> (the system just sits there with the message on the screen);
>>
>> * will now halt
>> [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
>
> That looks like use-after-free in __stop_machine() so lets cc Rusty.
> If you want, you can convert the oops location into human-readable
> form. Just search for "GDB" in Documentation/BUG-HUNTING for
> instructions how to do that. And don't forget to send your .config.
>
>> [  286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
>> [...]
>> [  286.560580] Code: c7 05 10 06 62 c0 00 00 00 00 a3 f4 05 62 c0 c7 05 ec
>> 05 62 c0 01 00 00 00 83 cb ff eb 2d a1 1c 06 62 c0 f7 d0 8b 0c 98 8d 41 04
>> <c7> 01 00 00 00 00 89 41 04 89 41 08 c7 41 0c ff 0c 15 c0 89 d8
>> [...]

scripts/decodecode gives us:

   0:   c7 05 10 06 62 c0 00    movl   $0x0,0xc0620610
   7:   00 00 00
   a:   a3 f4 05 62 c0          mov    %eax,0xc06205f4
   f:   c7 05 ec 05 62 c0 01    movl   $0x1,0xc06205ec
  16:   00 00 00
  19:   83 cb ff                or     $0xffffffff,%ebx
  1c:   eb 2d                   jmp    0x4b
  1e:   a1 1c 06 62 c0          mov    0xc062061c,%eax
  23:   f7 d0                   not    %eax
  25:   8b 0c 98                mov    (%eax,%ebx,4),%ecx
  28:   8d 41 04                lea    0x4(%ecx),%eax
  2b:   c7 01 00 00 00 00       movl   $0x0,(%ecx)
  31:   89 41 04                mov    %eax,0x4(%ecx)
  34:   89 41 08                mov    %eax,0x8(%ecx)
  37:   c7 41 0c ff 0c 15 c0    movl   $0xc0150cff,0xc(%ecx)
  3e:   89 d8                   mov    %ebx,%eax
   0:   c7 01 00 00 00 00       movl   $0x0,(%ecx)      <-- oops
   6:   89 41 04                mov    %eax,0x4(%ecx)
   9:   89 41 08                mov    %eax,0x8(%ecx)
   c:   c7 41 0c ff 0c 15 c0    movl   $0xc0150cff,0xc(%ecx)
  13:   89 d8                   mov    %ebx,%eax
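The offsets in the left column can be checked against the raw Code: line. The kernel prints some bytes before the faulting instruction and marks that instruction's first byte with <..>. Counting the bytes before <c7> gives 43 = 0x2b, which is exactly where decodecode places movl $0x0,(%ecx). Since EIP is c0150ca4 (__stop_machine+0x88), the dump starts at c0150ca4 - 0x2b = c0150c79, and __stop_machine itself starts at c0150ca4 - 0x88 = c0150c1c. A minimal sketch of the byte count; faulting_byte_offset() is a hypothetical helper that expects only the hex bytes after "Code:":

```c
#include <assert.h>
#include <ctype.h>

/* Return the number of bytes that precede the "<xx>" marker in the hex
 * dump from an oops "Code:" line, i.e. the faulting instruction's offset
 * from the start of the dump. Pass only the bytes after "Code:".
 * Returns -1 if there is no marker or a token is not a hex byte. */
int faulting_byte_offset(const char *bytes)
{
	const char *p = bytes;
	int count = 0;

	for (;;) {
		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			return -1;          /* reached the end without a marker */
		if (*p == '<')
			return count;       /* first byte of the faulting instruction */
		if (!isxdigit((unsigned char)p[0]) || !isxdigit((unsigned char)p[1]))
			return -1;          /* not a two-digit hex byte */
		count++;
		p += 2;
	}
}
```

Fed the byte list from this oops (line breaks are skipped as whitespace), it returns 0x2b.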

objdump -S -d kernel/stop_machine.o looks like this on my machine:

        /* Schedule the stop_cpu work on all cpus: hold this CPU so one
         * doesn't hit this CPU until we're ready. */
        get_cpu();
        for_each_online_cpu(i) {
                sm_work = percpu_ptr(stop_machine_work, i);
                INIT_WORK(sm_work, stop_cpu);
  8b:   c7 01 00 00 00 00       movl   $0x0,(%ecx)
  91:   c7 41 0c e0 00 00 00    movl   $0xe0,0xc(%ecx)

where

#define INIT_WORK(_work, _func)                                         \
        do {                                                            \
                (_work)->data = (atomic_long_t) WORK_DATA_INIT();       \

and the offset of ->data is zero and WORK_DATA_INIT() expands to
ATOMIC_LONG_INIT(0), so it looks to me like 'sm_work' is used after it has
been freed. Perhaps stop_machine_destroy() was called before or in
parallel with __stop_machine()? So let's cc Heiko as well.
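Instructions 0x1e to 0x37 of the decoded Code: line can be read back into C. Assume the 2.6.28-era percpu_ptr(), which kept per-CPU allocations bit-inverted (hence the not %eax) and looked up percpu_data->ptrs[cpu]. Also assume a 32-bit struct work_struct with data at offset 0, the entry list_head at 4 and 8, and func at 0xc. Under those assumptions, the loop body undisguises stop_machine_work, loads this CPU's work pointer, and then runs INIT_WORK(). If the per-CPU area had already been freed and poisoned, the load would yield 0x6b6b6b6b (the ECX value in the oops) and the very first store would fault. The types and functions below are hypothetical user-space stand-ins, not the kernel headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 2        /* hypothetical CPU count for the model */

/* Simplified stand-ins for the kernel types (not the real headers). */
struct list_head { struct list_head *next, *prev; };

struct work_struct {
	long data;                              /* atomic_long_t in the kernel */
	struct list_head entry;
	void (*func)(struct work_struct *work);
};

struct percpu_data { void *ptrs[NR_CPUS_SKETCH]; };

/* The old percpu_ptr() stored the allocation as ~pointer; undo that. */
static struct percpu_data *undisguise(void *disguised)
{
	return (struct percpu_data *)~(uintptr_t)disguised;
}

static void stop_cpu(struct work_struct *work) { (void)work; }

/* Model of "sm_work = percpu_ptr(stop_machine_work, i); INIT_WORK(...)". */
struct work_struct *init_stop_work(void *stop_machine_work, int cpu)
{
	/* not %eax; mov (%eax,%ebx,4),%ecx */
	struct work_struct *w = undisguise(stop_machine_work)->ptrs[cpu];

	w->data = 0;                  /* movl $0x0,(%ecx): the oopsing store */
	w->entry.next = &w->entry;    /* lea 0x4(%ecx),%eax; mov %eax,0x4(%ecx) */
	w->entry.prev = &w->entry;    /* mov %eax,0x8(%ecx) */
	w->func = stop_cpu;           /* movl $0xc0150cff,0xc(%ecx), presumably &stop_cpu */
	return w;
}
```

On a 64-bit host the field offsets double, but the order (data, entry.next, entry.prev, func) is the same as in the disassembly.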

                        Pekka

* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Heiko Carstens @ 2009-01-07  9:15 UTC
  To: Pekka Enberg; +Cc: Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 07, 2009 at 10:30:56AM +0200, Pekka Enberg wrote:
> On Wed, Jan 7, 2009 at 8:48 AM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> > On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
> > <justinmattock@gmail.com> wrote:
> >> With pulling git today I'm unable to shut the machine down completely.
> >> (the system just sits there with the message on the screen);
[...]
> >> [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
> >
> > That looks like use-after-free in __stop_machine() so lets cc Rusty.
> > If you want, you can convert the oops location into human-readable
> > form. Just search for "GDB" in Documentation/BUG-HUNTING for
> > instructions how to do that. And don't forget to send your .config.
[...]
> >> [  286.560580] EIP: is at __stop_machine+0x88/0xe3
> >> [  286.560580]  [<c03d04e4>] ? _cpu_down+0x10f/0x234
> >> [  286.560580]  [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
> >> [  286.560580]  [<c01360c0>] ? kernel_poweroff+0x22/0x39
> >> [  286.560580]  [<c0136301>] ? sys_reboot+0xde/0x14c
> >> [  286.560580]  [<c01331b2>] ? complete_signal+0x179/0x191
> >> [  286.560580]  [<c0133396>] ? send_signal+0x1cc/0x1e1
> >> [  286.560580]  [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
> >> [  286.560580]  [<c0133b65>] ? group_send_signal_info+0x58/0x61
> >> [  286.560580]  [<c0133b9e>] ? kill_pid_info+0x30/0x3a
> >> [  286.560580]  [<c0133d49>] ? sys_kill+0x75/0x13a
> >> [  286.560580]  [<c01a06cb>] ? mntput_no_expire+ox1f/0x101
> >> [  286.560580]  [<c019b3b3>] ? dput+0x1e/0x105
> >> [  286.560580]  [<c018ef87>] ?  __fput+0x150/0x158
> >> [  286.560580]  [<c0157abf>] ? audit_syscall_entry+0x137/0x159
> >> [  286.560580]  [<c010329f>] ? sysenter_do_call+0x12/0x34
[...]
> and offset of ->data is zero and WORK_DATA_INIT() expands to
> ATOMIC_LONG_INIT(0) so looks to me like 'sm_work' is used after it has
> been free'd. Perhaps stop_machine_destroy() was called before or in
> parallel to __stop_machine()? So lets cc Heiko as well.

I forgot to convert disable_nonboot_cpus to stop_machine_create/destroy,
so it's a use-before-even-allocated bug.
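The underlying rule is a create/destroy pairing. stop_machine_create() sets up the resources that _cpu_down() relies on and stop_machine_destroy() releases them, so every caller of _cpu_down() has to bracket it with the pair, as cpu_down() already did. The following is a hypothetical, single-threaded user-space model of that pattern. sm_create(), sm_destroy() and cpu_down_locked() are illustrative names, the reference count is an assumption of the sketch, and the locking the kernel needs is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model: one shared work area, set up on first use. */
static int refcount;
static int *work_area;          /* stands in for the stop_machine resources */

int sm_create(void)
{
	if (refcount == 0) {
		work_area = malloc(sizeof(*work_area));
		if (work_area == NULL)
			return -ENOMEM;
	}
	refcount++;
	return 0;
}

void sm_destroy(void)
{
	if (--refcount == 0) {
		free(work_area);
		/* Cleared here so misuse is detectable below; a pointer left
		 * dangling would instead point at freed (poisoned) memory. */
		work_area = NULL;
	}
}

/* Like _cpu_down(): assumes the caller already called sm_create(). */
static int cpu_down_locked(int cpu)
{
	if (work_area == NULL)
		return -EINVAL;         /* the kernel code had no such check and oopsed */
	*work_area = cpu;           /* "use" the shared resource */
	return 0;
}

/* Shape of the fixed disable_nonboot_cpus(): create first and bail out
 * on failure, then destroy after the loop on every remaining path. */
int disable_nonboot_cpus_sketch(int nr_cpus)
{
	int cpu, error;

	error = sm_create();
	if (error)
		return error;
	for (cpu = 1; cpu < nr_cpus && !error; cpu++)
		error = cpu_down_locked(cpu);
	sm_destroy();
	return error;
}
```

Calling cpu_down_locked() without a prior sm_create() is the modeled bug; disable_nonboot_cpus_sketch() follows the same create/early-return/destroy shape as the kernel fix.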

The patch below should hopefully fix it:

Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus

From: Heiko Carstens <heiko.carstens@de.ibm.com>

disable_nonboot_cpus calls _cpu_down directly. However, _cpu_down relies on
the stop_machine kernel threads having been created in advance by the caller
(as cpu_down does).

So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
as well.

Fixes this bug:

[  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
[  286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
[  286.550598] Oops: 0002 [#1] SMP
[  286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
[  286.560580] EIP: is at __stop_machine+0x88/0xe3
[  286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
[  286.560580] Call Trace:
[  286.560580]  [<c03d04e4>] ? _cpu_down+0x10f/0x234
[  286.560580]  [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
[  286.560580]  [<c01360c0>] ? kernel_poweroff+0x22/0x39
[  286.560580]  [<c0136301>] ? sys_reboot+0xde/0x14c
[  286.560580]  [<c01331b2>] ? complete_signal+0x179/0x191
[  286.560580]  [<c0133396>] ? send_signal+0x1cc/0x1e1
[  286.560580]  [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
[  286.560580]  [<c0133b65>] ? group_send_signal_info+0x58/0x61
[  286.560580]  [<c0133b9e>] ? kill_pid_info+0x30/0x3a
[  286.560580]  [<c0133d49>] ? sys_kill+0x75/0x13a
[  286.560580]  [<c01a06cb>] ? mntput_no_expire+ox1f/0x101
[  286.560580]  [<c019b3b3>] ? dput+0x1e/0x105
[  286.560580]  [<c018ef87>] ?  __fput+0x150/0x158
[  286.560580]  [<c0157abf>] ? audit_syscall_entry+0x137/0x159
[  286.560580]  [<c010329f>] ? sysenter_do_call+0x12/0x34

Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/cpu.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/cpu.c
===================================================================
--- linux-2.6.orig/kernel/cpu.c
+++ linux-2.6/kernel/cpu.c
@@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;
 
 int disable_nonboot_cpus(void)
 {
-	int cpu, first_cpu, error = 0;
+	int cpu, first_cpu, error;
 
+	error = stop_machine_create();
+	if (error)
+		return error;
 	cpu_maps_update_begin();
 	first_cpu = cpumask_first(cpu_online_mask);
 	/* We take down all of the non-boot CPUs in one shot to avoid races
@@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
 	cpu_maps_update_done();
+	stop_machine_destroy();
 	return error;
 }
 

* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Pekka Enberg @ 2009-01-07  9:19 UTC
  To: Heiko Carstens; +Cc: Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
> I missed to convert disable_nonboot_cpus to
> stop_machine_create/destroy.
> So it's a use before-even-allocated bug.
> 
> The patch below should hopefully fix it:
> 
> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
> 
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> disable_nonboot_cpus calls directly _cpu_down. _cpu_down however relies on
> the in advanced created stop_machine kernel threads which should be created
> by the caller (like cpu_down does).
> 
> So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
> as well.
> 
> Fixes this bug:
> 
> [...]
> 
> Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

Looks good to me!

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>

> ---
>  kernel/cpu.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/kernel/cpu.c
> ===================================================================
> --- linux-2.6.orig/kernel/cpu.c
> +++ linux-2.6/kernel/cpu.c
> @@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;
>  
>  int disable_nonboot_cpus(void)
>  {
> -	int cpu, first_cpu, error = 0;
> +	int cpu, first_cpu, error;
>  
> +	error = stop_machine_create();
> +	if (error)
> +		return error;
>  	cpu_maps_update_begin();
>  	first_cpu = cpumask_first(cpu_online_mask);
>  	/* We take down all of the non-boot CPUs in one shot to avoid races
> @@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
>  		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
>  	}
>  	cpu_maps_update_done();
> +	stop_machine_destroy();
>  	return error;
>  }
>  


* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Jeff Chua @ 2009-01-07 11:36 UTC
  To: Pekka Enberg
  Cc: Heiko Carstens, Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
>> I missed to convert disable_nonboot_cpus to
>> stop_machine_create/destroy.
>> So it's a use before-even-allocated bug.
>> The patch below should hopefully fix it:
>> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
> Looks good to me!

This also fixes the suspend-to-ram/disk problem. Without it, the
system will just hang.

Thanks for the fix.

Thanks,
Jeff.

* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Heiko Carstens @ 2009-01-07 12:27 UTC
  To: Jeff Chua; +Cc: Pekka Enberg, Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 07, 2009 at 07:36:57PM +0800, Jeff Chua wrote:
> On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> > On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
> >> I missed to convert disable_nonboot_cpus to
> >> stop_machine_create/destroy.
> >> So it's a use before-even-allocated bug.
> >> The patch below should hopefully fix it:
> >> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
> > Looks good to me!
> 
> This also fixes the suspend-to-ram/disk problem. Without it, the
> system will just hang.
> 
> Thanks for the fix.

Did you also see the reboot problem and does the patch fix it for you?

* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
From: Jeff Chua @ 2009-01-07 13:51 UTC
  To: Heiko Carstens
  Cc: Pekka Enberg, Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 7, 2009 at 8:27 PM, Heiko Carstens
<heiko.carstens@de.ibm.com> wrote:
> On Wed, Jan 07, 2009 at 07:36:57PM +0800, Jeff Chua wrote:
>> On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>> > On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
>> >> I missed to convert disable_nonboot_cpus to
>> >> stop_machine_create/destroy.
>> >> So it's a use before-even-allocated bug.
>> >> The patch below should hopefully fix it:
>> >> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
>> > Looks good to me!
>>
>> This also fixes the suspend-to-ram/disk problem. Without it, the
>> system will just hang.

> Did you also see the reboot problem and does the patch fix it for you?

I never had a problem with rebooting, just suspend hanging, which is
really annoying: I'd walk away, come back hours later, realize that the
suspend had hung, and have to do a hard reboot.

Thanks,
Jeff.


* [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 13:51             ` Jeff Chua
@ 2009-01-07 15:19               ` Heiko Carstens
  2009-01-07 15:23                 ` Ingo Molnar
  2009-01-07 15:30                 ` Frédéric Weisbecker
  2009-01-07 15:28               ` [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
  1 sibling, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2009-01-07 15:19 UTC
  To: Linus Torvalds, Andrew Morton, Rusty Russell
  Cc: Pekka Enberg, Justin P. Mattock, linux-kernel, Jeff Chua

From: Heiko Carstens <heiko.carstens@de.ibm.com>

disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
caller already created the stop_machine workqueue (like cpu_down does).
Otherwise a call to stop_machine will lead to accesses to random memory
regions.

When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85
"stop_machine: introduce stop_machine_create/destroy") I missed the second
call site of _cpu_down.
So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
as well.

Fixes suspend-to-ram/disk and also this bug:

[  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
[  286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
[  286.550598] Oops: 0002 [#1] SMP
[  286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
[  286.560580] EIP: is at __stop_machine+0x88/0xe3
[  286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
[  286.560580] Call Trace:
[  286.560580]  [<c03d04e4>] ? _cpu_down+0x10f/0x234
[  286.560580]  [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
[  286.560580]  [<c01360c0>] ? kernel_poweroff+0x22/0x39
[  286.560580]  [<c0136301>] ? sys_reboot+0xde/0x14c
[  286.560580]  [<c01331b2>] ? complete_signal+0x179/0x191
[  286.560580]  [<c0133396>] ? send_signal+0x1cc/0x1e1
[  286.560580]  [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
[  286.560580]  [<c0133b65>] ? group_send_signal_info+0x58/0x61
[  286.560580]  [<c0133b9e>] ? kill_pid_info+0x30/0x3a
[  286.560580]  [<c0133d49>] ? sys_kill+0x75/0x13a
[  286.560580]  [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
[  286.560580]  [<c019b3b3>] ? dput+0x1e/0x105
[  286.560580]  [<c018ef87>] ? __fput+0x150/0x158
[  286.560580]  [<c0157abf>] ? audit_syscall_entry+0x137/0x159
[  286.560580]  [<c010329f>] ? sysenter_do_call+0x12/0x34

Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/cpu.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/cpu.c
===================================================================
--- linux-2.6.orig/kernel/cpu.c
+++ linux-2.6/kernel/cpu.c
@@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;
 
 int disable_nonboot_cpus(void)
 {
-	int cpu, first_cpu, error = 0;
+	int cpu, first_cpu, error;
 
+	error = stop_machine_create();
+	if (error)
+		return error;
 	cpu_maps_update_begin();
 	first_cpu = cpumask_first(cpu_online_mask);
 	/* We take down all of the non-boot CPUs in one shot to avoid races
@@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
 	cpu_maps_update_done();
+	stop_machine_destroy();
 	return error;
 }
 


* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:19               ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
@ 2009-01-07 15:23                 ` Ingo Molnar
  2009-01-07 15:30                 ` Frédéric Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2009-01-07 15:23 UTC
  To: Heiko Carstens
  Cc: Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg,
	Justin P. Mattock, linux-kernel, Jeff Chua


* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> 
> disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the 
> caller already created the stop_machine workqueue (like cpu_down does). 
> Otherwise a call to stop_machine will lead to accesses to random memory 
> regions.

btw., I got this crash earlier today:

CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0
eth0: no IPv6 routers present
BUG: Bad page state in process cc1  pfn:00879
page:c101b894 flags:00000400 count:0 mapcount:0 mapping:(null) index:0
Pid: 3060, comm: cc1 Not tainted 2.6.28-tip-07641-gb97d41d-dirty #14985
Call Trace:
 [<c016ce8b>] bad_page+0xcf/0xe5
 [<c016d3b4>] free_pages_check+0xa7/0xc5
 [<c016d400>] free_hot_cold_page+0x2e/0x138
 [<c014751c>] ? __lock_acquire+0x127/0x29d
 [<c016d558>] free_hot_page+0xf/0x11
 [<c0170963>] put_page+0x76/0x7c
 [<c0185071>] ? constant_test_bit+0x9/0x20
 [<c0187149>] kfree+0x30/0xe5
 [<c0164993>] ? trace_hardirqs_on+0x8/0x1c
 [<c01547dd>] free_user_ns+0x1d/0x20
 [<c01547c0>] ? free_user_ns+0x0/0x20
 [<c02c7a41>] kref_put+0x18/0x22
 [<c0132d4c>] put_user_ns+0x16/0x18
 [<c0132f52>] free_uid+0x59/0xc8
 [<c0136239>] ? groups_free+0x36/0x3a
 [<c0140406>] put_cred_rcu+0x5f/0x70
 [<c01598fb>] __rcu_process_callbacks+0x168/0x1f8
 [<c03031be>] ? isicom_tx+0x0/0x31f
 [<c01599b1>] rcu_process_callbacks+0x26/0x46
 [<c012f11d>] __do_softirq+0x9d/0x139
 [<c012f080>] ? __do_softirq+0x0/0x139
 <IRQ>  [<c012efe2>] ? irq_exit+0x4c/0x83
 [<c05cc586>] ? __irqentry_text_start+0x6e/0x7c
 [<c0103f61>] ? apic_timer_interrupt+0x2d/0x34

and I applied your patch (from the other thread) and never saw this bug
again.

So if it's the same bug (it appears to be) then you have my:

Tested-by: Ingo Molnar <mingo@elte.hu>


	Ingo


* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07 13:51             ` Jeff Chua
  2009-01-07 15:19               ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
@ 2009-01-07 15:28               ` Justin P. Mattock
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-07 15:28 UTC
  To: Jeff Chua; +Cc: Heiko Carstens, Pekka Enberg, linux-kernel, Rusty Russell

Jeff Chua wrote:
> On Wed, Jan 7, 2009 at 8:27 PM, Heiko Carstens
> <heiko.carstens@de.ibm.com> wrote:
>   
>> On Wed, Jan 07, 2009 at 07:36:57PM +0800, Jeff Chua wrote:
>>     
>>> On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>>>       
>>>> On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
>>>>         
>>>>> I missed to convert disable_nonboot_cpus to
>>>>> stop_machine_create/destroy.
>>>>> So it's a use before-even-allocated bug.
>>>>> The patch below should hopefully fix it:
>>>>> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
>>>>>           
>>>> Looks good to me!
>>>>         
>>> This also fixes the suspend-to-ram/disk problem. Without it, the
>>> system will just hang.
>>>       
>
>   
>> Did you also see the reboot problem and does the patch fix it for you?
>>     
>
> I never had a problem with rebooting, just the suspend hanging, which is
> really annoying ... walking away, coming back hours later,
> realizing that the suspend had hung, and having to do a hard reboot.
>
> Thanks,
> Jeff.
>
>   
Man!! I missed this whole conversation
(passed out, too tired).
I'll go ahead and apply the patch and let you
know if I get the freeze at shutdown.
I'm also still interested in learning
how to take a debug message and dissect
it to find the exact location of the problem
(but that can wait).

regards;

Justin P. Mattock


* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:19               ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
  2009-01-07 15:23                 ` Ingo Molnar
@ 2009-01-07 15:30                 ` Frédéric Weisbecker
  2009-01-07 15:52                   ` Justin P. Mattock
  2009-01-08  5:13                   ` Justin P. Mattock
  1 sibling, 2 replies; 15+ messages in thread
From: Frédéric Weisbecker @ 2009-01-07 15:30 UTC
  To: Heiko Carstens
  Cc: Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg,
	Justin P. Mattock, linux-kernel, Jeff Chua

2009/1/7 Heiko Carstens <heiko.carstens@de.ibm.com>:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
> caller already created the stop_machine workqueue (like cpu_down does).
> Otherwise a call to stop_machine will lead to accesses to random memory
> regions.


That should explain why suspend to disk failed on my box yesterday at
the processor stage...
Thanks!


* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:30                 ` Frédéric Weisbecker
@ 2009-01-07 15:52                   ` Justin P. Mattock
  2009-01-08  5:13                   ` Justin P. Mattock
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-07 15:52 UTC
  To: Frédéric Weisbecker
  Cc: Heiko Carstens, Linus Torvalds, Andrew Morton, Rusty Russell,
	Pekka Enberg, linux-kernel, Jeff Chua

Frédéric Weisbecker wrote:
> That should explain why suspend to disk failed on my box yesterday at
> the processor stage...
> Thanks!
>
>   
O.K., applied the patch
and shut down the machine
a few times: no freeze, no bug message.
Sweet!!
Now I'm going to try to dissect
a bug message for educational purposes.
Thanks for the assistance.

regards;

Justin P. Mattock


* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:30                 ` Frédéric Weisbecker
  2009-01-07 15:52                   ` Justin P. Mattock
@ 2009-01-08  5:13                   ` Justin P. Mattock
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-08  5:13 UTC
  To: Frédéric Weisbecker
  Cc: Heiko Carstens, Linus Torvalds, Andrew Morton, Rusty Russell,
	Pekka Enberg, linux-kernel, Jeff Chua

Frédéric Weisbecker wrote:
> That should explain why suspend to disk failed on my box yesterday at
> the processor stage...
> Thanks!
>
>   
I hate to ask this, but I'm going to
anyway: when running
gdb /usr/src/linux/vmlinux
(hoping to see if gdb will catch the bug),
I keep getting:
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
You can't do that without a process to debug.

If I do:
(gdb) disassemble __stop_machine
(as described in Documentation),
I see a bit of info.

How do I start, or figure out, a process
to debug? E.g., the bug message
that I wrote down says Pid: 3273, but
entering that as (gdb) r 3273
results in a SIGKILL.

regards;

Justin P. Mattock


end of thread

Thread overview: 15+ messages
2009-01-07  0:12 [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
2009-01-07  6:48 ` Pekka Enberg
2009-01-07  8:13   ` Justin P. Mattock
2009-01-07  8:30   ` Pekka Enberg
2009-01-07  9:15     ` Heiko Carstens
2009-01-07  9:19       ` Pekka Enberg
2009-01-07 11:36         ` Jeff Chua
2009-01-07 12:27           ` Heiko Carstens
2009-01-07 13:51             ` Jeff Chua
2009-01-07 15:19               ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
2009-01-07 15:23                 ` Ingo Molnar
2009-01-07 15:30                 ` Frédéric Weisbecker
2009-01-07 15:52                   ` Justin P. Mattock
2009-01-08  5:13                   ` Justin P. Mattock
2009-01-07 15:28               ` [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
