* [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
@ 2009-01-07  0:12 Justin P. Mattock
  2009-01-07  6:48 ` Pekka Enberg
  0 siblings, 1 reply; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-07  0:12 UTC (permalink / raw)
  To: linux-kernel

With pulling git today I'm unable to shut the machine down completely.
(the system just sits there with the message on the screen);

 * will now halt
[ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
[ 286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
[ 286.550598] Oops: 0002 [#1] SMP
[ 286.552206] last sysfs file: /sys/block/sda/removable
[ 286.553844] Modules linked in: hidp radeon drm agpgart btusb rfcomm bnep
sco l2cap bluetooth fan battery container ipt_LOG xt_limit xt_tcpudp
xt_state ipt_addrtype nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat
nf_conntrack_ftp ipmi_watchdog ipmi_msghandler uvcvideo isight_firmware
uinput arpt_mangle arptable_filter arp_tables nf_conntrack_ipv4 nf_conntrack
nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables coretemp
eeprom acpi_cpufreq cpufreq_powersave cpufreq_performance cpufreq_ondemand
cpufreq_conservative appletouch snd_hda_codec_idt ohci1394 ehci_hcd
snd_hda_intel snd_hda_codec thermal ath9k uhci_hcd ieee1394 joydev pata_acpi
snd_hwdep snd_pcm snd_page_alloc video ac button processor applesmc evdev
[ 286.560580]
[ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5 #1) MacBookPro2,2
[ 286.560580] EIP: 0060:[<c0150ca4>] EFLAGS: 00010293 CPU: 0
[ 286.560580] EIP: is at __stop_machine+0x88/0xe3
[ 286.560580] EAX: 6b6b6b6b EBX: 00000000 ECX: 6b6b6b6b EDX: 00000000
[ 286.560580] ESI: c054abe0 EDI: c03d03a4 EBP: f1a29e54 ESP: f1a29e44
[ 286.560580] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30 task.ti=f1a28000)
[ 286.560580] Stack:
[ 286.560580] f1a29e60 c054abe0 00000001 00000010 f1a29e7c c03d04e4 ffffffea 00000010
[ 286.560580] 00000001 00000003 00000022 00000001 4321fedc c054abe0 f1a29e94 c012a57e
[ 286.560580] 00000000 ffffffff 4321fedc 28121969 f1a29e9c c01360c0 f1a29fb0 c0136301
[ 286.560580] Call Trace:
[ 286.560580] [<c03d04e4>] ? _cpu_down+0x10f/0x234
[ 286.560580] [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
[ 286.560580] [<c01360c0>] ? kernel_poweroff+0x22/0x39
[ 286.560580] [<c0136301>] ? sys_reboot+0xde/0x14c
[ 286.560580] [<c01331b2>] ? complete_signal+0x179/0x191
[ 286.560580] [<c0133396>] ? send_signal+0x1cc/0x1e1
[ 286.560580] [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
[ 286.560580] [<c0133b65>] ? group_send_signal_info+0x58/0x61
[ 286.560580] [<c0133b9e>] ? kill_pid_info+0x30/0x3a
[ 286.560580] [<c0133d49>] ? sys_kill+0x75/0x13a
[ 286.560580] [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
[ 286.560580] [<c019b3b3>] ? dput+0x1e/0x105
[ 286.560580] [<c018ef87>] ? __fput+0x150/0x158
[ 286.560580] [<c0157abf>] ? audit_syscall_entry+0x137/0x159
[ 286.560580] [<c010329f>] ? sysenter_do_call+0x12/0x34
[ 286.560580] Code: c7 05 10 06 62 c0 00 00 00 00 a3 f4 05 62 c0 c7 05 ec
05 62 c0 01 00 00 00 83 cb ff eb 2d a1 1c 06 62 c0 f7 d0 8b 0c 98 8d 41 04
<c7> 01 00 00 00 00 89 41 04 89 41 08 c7 41 0c ff 0c 15 c0 89 d8
[ 286.560580] EIP: [<c0150ca4>] __stop_machine+0x88/0xe3 SS:ESP 0068:f1a29e44
[ 286.639215] ---[ end trace 5b080c1ab14203ae ]---
Segmentation fault

after this message appears, if I hold down the start button
the system shuts off after a few seconds.
(BTW hopefully the numbers are correct,
manually writing this down, is a bit of a pain);

regards;

Justin P. Mattock

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07  0:12 [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
@ 2009-01-07  6:48 ` Pekka Enberg
  2009-01-07  8:13   ` Justin P. Mattock
  2009-01-07  8:30   ` Pekka Enberg
  0 siblings, 2 replies; 15+ messages in thread
From: Pekka Enberg @ 2009-01-07  6:48 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: linux-kernel, Rusty Russell

Hi Justin,

On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
<justinmattock@gmail.com> wrote:
> With pulling git today I'm unable to shut the machine down completely.
> (the system just sits there with the message on the screen);
>
>  * will now halt
> [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b

That looks like use-after-free in __stop_machine() so let's cc Rusty.
If you want, you can convert the oops location into human-readable
form. Just search for "GDB" in Documentation/BUG-HUNTING for
instructions how to do that. And don't forget to send your .config.
> [...]

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07  6:48 ` Pekka Enberg
@ 2009-01-07  8:13   ` Justin P. Mattock
  2009-01-07  8:30   ` Pekka Enberg
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-07  8:13 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, Rusty Russell

Pekka Enberg wrote:
> Hi Justin,
>
> On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>> With pulling git today I'm unable to shut the machine down completely.
>> (the system just sits there with the message on the screen);
>>
>>  * will now halt
>> [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
>
> That looks like use-after-free in __stop_machine() so let's cc Rusty.
> If you want, you can convert the oops location into human-readable
> form. Just search for "GDB" in Documentation/BUG-HUNTING for
> instructions how to do that. And don't forget to send your .config.
>
>> [...]

That's nice, thanks for the info.
I like the idea of using gcc to disassemble this text.
Since I don't know what I'm doing, I'll have to do my homework on this.
(really curious to see what this does);

regards;

Justin P. Mattock

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07  6:48 ` Pekka Enberg
  2009-01-07  8:13   ` Justin P. Mattock
@ 2009-01-07  8:30   ` Pekka Enberg
  2009-01-07  9:15     ` Heiko Carstens
  1 sibling, 1 reply; 15+ messages in thread
From: Pekka Enberg @ 2009-01-07  8:30 UTC (permalink / raw)
  To: Justin P. Mattock; +Cc: linux-kernel, Rusty Russell, heiko.carstens

On Wed, Jan 7, 2009 at 8:48 AM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
> <justinmattock@gmail.com> wrote:
>> With pulling git today I'm unable to shut the machine down completely.
>> (the system just sits there with the message on the screen);
>>
>>  * will now halt
>> [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
>
> That looks like use-after-free in __stop_machine() so let's cc Rusty.
> If you want, you can convert the oops location into human-readable
> form. Just search for "GDB" in Documentation/BUG-HUNTING for
> instructions how to do that. And don't forget to send your .config.
>
>> [...]
>> (BTW hopefully the numbers are correct,
>> manually writing this down, is a bit of a pain);

scripts/decodecode gives us:

   0:   c7 05 10 06 62 c0 00    movl   $0x0,0xc0620610
   7:   00 00 00
   a:   a3 f4 05 62 c0          mov    %eax,0xc06205f4
   f:   c7 05 ec 05 62 c0 01    movl   $0x1,0xc06205ec
  16:   00 00 00
  19:   83 cb ff                or     $0xffffffff,%ebx
  1c:   eb 2d                   jmp    0x4b
  1e:   a1 1c 06 62 c0          mov    0xc062061c,%eax
  23:   f7 d0                   not    %eax
  25:   8b 0c 98                mov    (%eax,%ebx,4),%ecx
  28:   8d 41 04                lea    0x4(%ecx),%eax
  2b:   c7 01 00 00 00 00       movl   $0x0,(%ecx)
  31:   89 41 04                mov    %eax,0x4(%ecx)
  34:   89 41 08                mov    %eax,0x8(%ecx)
  37:   c7 41 0c ff 0c 15 c0    movl   $0xc0150cff,0xc(%ecx)
  3e:   89 d8                   mov    %ebx,%eax

   0:   c7 01 00 00 00 00       movl   $0x0,(%ecx)   <-- oops
   6:   89 41 04                mov    %eax,0x4(%ecx)
   9:   89 41 08                mov    %eax,0x8(%ecx)
   c:   c7 41 0c ff 0c 15 c0    movl   $0xc0150cff,0xc(%ecx)
  13:   89 d8                   mov    %ebx,%eax

objdump -S -d kernel/stop_machine.o looks like this on my machine:

        /* Schedule the stop_cpu work on all cpus: hold this CPU so one
         * doesn't hit this CPU until we're ready. */
        get_cpu();
        for_each_online_cpu(i) {
                sm_work = percpu_ptr(stop_machine_work, i);
                INIT_WORK(sm_work, stop_cpu);
  8b:   c7 01 00 00 00 00       movl   $0x0,(%ecx)
  91:   c7 41 0c e0 00 00 00    movl   $0xe0,0xc(%ecx)

where

#define INIT_WORK(_work, _func)                                         \
        do {                                                            \
                (_work)->data = (atomic_long_t) WORK_DATA_INIT();       \

and offset of ->data is zero and WORK_DATA_INIT() expands to
ATOMIC_LONG_INIT(0) so looks to me like 'sm_work' is used after it has
been free'd. Perhaps stop_machine_destroy() was called before or in
parallel to __stop_machine()? So let's cc Heiko as well.

                        Pekka

^ permalink raw reply [flat|nested] 15+ messages in thread
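[Editorial sketch, not part of the original thread.] The value 6b6b6b6b in the fault address and in EAX/ECX is what a 32-bit load from slab-poisoned memory looks like: 0x6b is the POISON_FREE byte from include/linux/poison.h, which slab debugging writes over freed objects. That is why a use-after-free was suspected at this point. A minimal userspace sketch of that check follows; the helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte that slab debugging writes over freed objects
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE 0x6b

/* Hypothetical helper: the 32-bit word a CPU would load from
 * memory that is entirely filled with the free poison. */
static uint32_t poisoned_word(void)
{
	uint32_t w;

	memset(&w, POISON_FREE, sizeof(w));
	return w;
}

/* Hypothetical helper: true if a faulting address or register
 * value matches the free poison pattern. */
static bool looks_like_freed_pointer(uint32_t value)
{
	return value == poisoned_word();
}
```

Fed the values from the oops, the check matches EAX and ECX (6b6b6b6b) but not ESI (c054abe0). A match only says the pointer was read out of poisoned memory; it does not say how the code came to read there.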
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07  8:30   ` Pekka Enberg
@ 2009-01-07  9:15     ` Heiko Carstens
  2009-01-07  9:19       ` Pekka Enberg
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2009-01-07  9:15 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 07, 2009 at 10:30:56AM +0200, Pekka Enberg wrote:
> On Wed, Jan 7, 2009 at 8:48 AM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> > On Wed, Jan 7, 2009 at 2:12 AM, Justin P. Mattock
> > <justinmattock@gmail.com> wrote:
> >> With pulling git today I'm unable to shut the machine down completely.
> >> (the system just sits there with the message on the screen);

[...]

> >> [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
> >
> > That looks like use-after-free in __stop_machine() so let's cc Rusty.
> > If you want, you can convert the oops location into human-readable
> > form. Just search for "GDB" in Documentation/BUG-HUNTING for
> > instructions how to do that. And don't forget to send your .config.

[...]

> >> [ 286.560580] EIP: is at __stop_machine+0x88/0xe3
> >> [ 286.560580] [<c03d04e4>] ? _cpu_down+0x10f/0x234
> >> [ 286.560580] [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
> >> [ 286.560580] [<c01360c0>] ? kernel_poweroff+0x22/0x39
> >> [ 286.560580] [<c0136301>] ? sys_reboot+0xde/0x14c
> >> [ 286.560580] [<c01331b2>] ? complete_signal+0x179/0x191
> >> [ 286.560580] [<c0133396>] ? send_signal+0x1cc/0x1e1
> >> [ 286.560580] [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
> >> [ 286.560580] [<c0133b65>] ? group_send_signal_info+0x58/0x61
> >> [ 286.560580] [<c0133b9e>] ? kill_pid_info+0x30/0x3a
> >> [ 286.560580] [<c0133d49>] ? sys_kill+0x75/0x13a
> >> [ 286.560580] [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
> >> [ 286.560580] [<c019b3b3>] ? dput+0x1e/0x105
> >> [ 286.560580] [<c018ef87>] ? __fput+0x150/0x158
> >> [ 286.560580] [<c0157abf>] ? audit_syscall_entry+0x137/0x159
> >> [ 286.560580] [<c010329f>] ? sysenter_do_call+0x12/0x34

[...]

> and offset of ->data is zero and WORK_DATA_INIT() expands to
> ATOMIC_LONG_INIT(0) so looks to me like 'sm_work' is used after it has
> been free'd. Perhaps stop_machine_destroy() was called before or in
> parallel to __stop_machine()? So let's cc Heiko as well.

I forgot to convert disable_nonboot_cpus to stop_machine_create/destroy.
So it's a use-before-even-allocated bug.

The patch below should hopefully fix it:

Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus

From: Heiko Carstens <heiko.carstens@de.ibm.com>

disable_nonboot_cpus calls _cpu_down directly. However, _cpu_down relies on
the stop_machine kernel threads having been created in advance by the
caller (as cpu_down does).

So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
as well.

Fixes this bug:

[ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
[ 286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
[ 286.550598] Oops: 0002 [#1] SMP
[ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
[ 286.560580] EIP: is at __stop_machine+0x88/0xe3
[ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
[ 286.560580] Call Trace:
[ 286.560580] [<c03d04e4>] ? _cpu_down+0x10f/0x234
[ 286.560580] [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
[ 286.560580] [<c01360c0>] ? kernel_poweroff+0x22/0x39
[ 286.560580] [<c0136301>] ? sys_reboot+0xde/0x14c
[ 286.560580] [<c01331b2>] ? complete_signal+0x179/0x191
[ 286.560580] [<c0133396>] ? send_signal+0x1cc/0x1e1
[ 286.560580] [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
[ 286.560580] [<c0133b65>] ? group_send_signal_info+0x58/0x61
[ 286.560580] [<c0133b9e>] ? kill_pid_info+0x30/0x3a
[ 286.560580] [<c0133d49>] ? sys_kill+0x75/0x13a
[ 286.560580] [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
[ 286.560580] [<c019b3b3>] ? dput+0x1e/0x105
[ 286.560580] [<c018ef87>] ? __fput+0x150/0x158
[ 286.560580] [<c0157abf>] ? audit_syscall_entry+0x137/0x159
[ 286.560580] [<c010329f>] ? sysenter_do_call+0x12/0x34

Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/cpu.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/cpu.c
===================================================================
--- linux-2.6.orig/kernel/cpu.c
+++ linux-2.6/kernel/cpu.c
@@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;
 
 int disable_nonboot_cpus(void)
 {
-	int cpu, first_cpu, error = 0;
+	int cpu, first_cpu, error;
 
+	error = stop_machine_create();
+	if (error)
+		return error;
 	cpu_maps_update_begin();
 	first_cpu = cpumask_first(cpu_online_mask);
 	/* We take down all of the non-boot CPUs in one shot to avoid races
@@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
 	cpu_maps_update_done();
+	stop_machine_destroy();
 	return error;
 }

^ permalink raw reply [flat|nested] 15+ messages in thread
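[Editorial sketch, not part of the original thread.] The patch restores a create/use/destroy pairing that one caller of _cpu_down had skipped. The userspace model below is only an illustration under stated assumptions: sm_create, sm_run, sm_destroy and disable_all are hypothetical stand-ins for stop_machine_create, __stop_machine, stop_machine_destroy and the patched disable_nonboot_cpus, and the user count is an assumption of the sketch. The NULL check in sm_run is added so that the missing-create case returns an error instead of crashing; the oops suggests the kernel code of the time had no such check.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the stop_machine work state. */
static int *work_area;	/* NULL until sm_create() has run */
static int users;	/* callers currently between create and destroy */

/* Allocate the shared work area on first use. */
static int sm_create(void)
{
	if (users == 0) {
		work_area = calloc(1, sizeof(*work_area));
		if (!work_area)
			return -ENOMEM;
	}
	users++;
	return 0;
}

/* Drop one user and free the work area when the last one leaves. */
static void sm_destroy(void)
{
	if (users > 0 && --users == 0) {
		free(work_area);
		work_area = NULL;
	}
}

/* Needs the work area to exist; the check is illustrative only. */
static int sm_run(void)
{
	if (!work_area)
		return -EINVAL;
	*work_area += 1;
	return 0;
}

/* Shape of the patched caller: create first, destroy on the way out. */
static int disable_all(void)
{
	int error = sm_create();

	if (error)
		return error;
	error = sm_run();
	sm_destroy();
	return error;
}
```

Calling sm_run() without a preceding sm_create() models the unpatched disable_nonboot_cpus path; disable_all() models the patched one and leaves no work area allocated afterwards.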
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07  9:15     ` Heiko Carstens
@ 2009-01-07  9:19       ` Pekka Enberg
  2009-01-07 11:36         ` Jeff Chua
  0 siblings, 1 reply; 15+ messages in thread
From: Pekka Enberg @ 2009-01-07  9:19 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
> I forgot to convert disable_nonboot_cpus to stop_machine_create/destroy.
> So it's a use-before-even-allocated bug.
>
> The patch below should hopefully fix it:
>
> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
>
> [...]
>
> Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

Looks good to me!

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>

> [...]

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07  9:19       ` Pekka Enberg
@ 2009-01-07 11:36         ` Jeff Chua
  2009-01-07 12:27           ` Heiko Carstens
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Chua @ 2009-01-07 11:36 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Heiko Carstens, Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
>> I missed to convert disable_nonboot_cpus to
>> stop_machine_create/destroy.
>> So it's a use before-even-allocated bug.

>> The patch below should hopefully fix it:

>> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus

> Looks good to me!

This also fixes the suspend-to-ram/disk problem. Without it, the
system will just hang.

Thanks for the fix.

Thanks,
Jeff.

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07 11:36         ` Jeff Chua
@ 2009-01-07 12:27           ` Heiko Carstens
  2009-01-07 13:51             ` Jeff Chua
  0 siblings, 1 reply; 15+ messages in thread
From: Heiko Carstens @ 2009-01-07 12:27 UTC (permalink / raw)
  To: Jeff Chua; +Cc: Pekka Enberg, Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 07, 2009 at 07:36:57PM +0800, Jeff Chua wrote:
> On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> > On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
> >> I missed to convert disable_nonboot_cpus to
> >> stop_machine_create/destroy.
> >> So it's a use before-even-allocated bug.
>
> >> The patch below should hopefully fix it:
>
> >> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
>
> > Looks good to me!
>
> This also fixes the suspend-to-ram/disk problem. Without it, the
> system will just hang.
>
> Thanks for the fix.

Did you also see the reboot problem and does the patch fix it for you?

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07 12:27           ` Heiko Carstens
@ 2009-01-07 13:51             ` Jeff Chua
  2009-01-07 15:19               ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
  2009-01-07 15:28               ` [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Chua @ 2009-01-07 13:51 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Pekka Enberg, Justin P. Mattock, linux-kernel, Rusty Russell

On Wed, Jan 7, 2009 at 8:27 PM, Heiko Carstens
<heiko.carstens@de.ibm.com> wrote:
> On Wed, Jan 07, 2009 at 07:36:57PM +0800, Jeff Chua wrote:
>> On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>> > On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
>> >> I missed to convert disable_nonboot_cpus to
>> >> stop_machine_create/destroy.
>> >> So it's a use before-even-allocated bug.
>> >> The patch below should hopefully fix it:
>> >> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
>> > Looks good to me!
>>
>> This also fixes the suspend-to-ram/disk problem. Without it, the
>> system will just hang.

> Did you also see the reboot problem and does the patch fix it for you?

I never had a problem with rebooting. Just suspend hanging, which is
really annoying ... walking away, coming back hours later, realizing
that the suspend hung, and having to do a hard reboot.

Thanks,
Jeff.

^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 13:51 ` Jeff Chua
@ 2009-01-07 15:19 ` Heiko Carstens
  2009-01-07 15:23 ` Ingo Molnar
  2009-01-07 15:30 ` Frédéric Weisbecker
  2009-01-07 15:28 ` [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
  1 sibling, 2 replies; 15+ messages in thread
From: Heiko Carstens @ 2009-01-07 15:19 UTC
To: Linus Torvalds, Andrew Morton, Rusty Russell
Cc: Pekka Enberg, Justin P. Mattock, linux-kernel, Jeff Chua

From: Heiko Carstens <heiko.carstens@de.ibm.com>

disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
caller already created the stop_machine workqueue (like cpu_down does).
Otherwise a call to stop_machine will lead to accesses to random memory
regions.

When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85
"stop_machine: introduce stop_machine_create/destroy") I missed the second
call site of _cpu_down.
So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
as well.

Fixes suspend-to-ram/disk and also this bug:

[ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
[ 286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
[ 286.550598] Oops: 0002 [#1] SMP
[ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
[ 286.560580] EIP: is at __stop_machine+0x88/0xe3
[ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
[ 286.560580] Call Trace:
[ 286.560580] [<c03d04e4>] ? _cpu_down+0x10f/0x234
[ 286.560580] [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
[ 286.560580] [<c01360c0>] ? kernel_poweroff+0x22/0x39
[ 286.560580] [<c0136301>] ? sys_reboot+0xde/0x14c
[ 286.560580] [<c01331b2>] ? complete_signal+0x179/0x191
[ 286.560580] [<c0133396>] ? send_signal+0x1cc/0x1e1
[ 286.560580] [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
[ 286.560580] [<c0133b65>] ? group_send_signal_info+0x58/0x61
[ 286.560580] [<c0133b9e>] ? kill_pid_info+0x30/0x3a
[ 286.560580] [<c0133d49>] ? sys_kill+0x75/0x13a
[ 286.560580] [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
[ 286.560580] [<c019b3b3>] ? dput+0x1e/0x105
[ 286.560580] [<c018ef87>] ? __fput+0x150/0x158
[ 286.560580] [<c0157abf>] ? audit_syscall_entry+0x137/0x159
[ 286.560580] [<c010329f>] ? sysenter_do_call+0x12/0x34

Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
 kernel/cpu.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/cpu.c
===================================================================
--- linux-2.6.orig/kernel/cpu.c
+++ linux-2.6/kernel/cpu.c
@@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;

 int disable_nonboot_cpus(void)
 {
-	int cpu, first_cpu, error = 0;
+	int cpu, first_cpu, error;

+	error = stop_machine_create();
+	if (error)
+		return error;
 	cpu_maps_update_begin();
 	first_cpu = cpumask_first(cpu_online_mask);
 	/* We take down all of the non-boot CPUs in one shot to avoid races
@@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
 		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
 	}
 	cpu_maps_update_done();
+	stop_machine_destroy();
 	return error;
 }
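The calling convention this patch restores can be sketched in user-space C. This is only an illustration under stated assumptions: `sm_create`, `sm_destroy`, `sm_use` and `take_down_all` are hypothetical stand-ins for stop_machine_create, stop_machine_destroy, _cpu_down and disable_nonboot_cpus, and the reference-counted allocation is an assumed model, not the kernel's actual stop_machine code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: a resource that must exist before it is used. */
static int *work;      /* stands in for the stop_machine workqueue data */
static int refcount;   /* number of callers currently holding it */

/* Allocate the resource on first use. Returns 0 on success, -1 on failure. */
static int sm_create(void)
{
	if (refcount == 0) {
		work = calloc(4, sizeof(*work));
		if (!work)
			return -1;
	}
	refcount++;
	return 0;
}

/* Drop one reference and free the resource when the last caller is done. */
static void sm_destroy(void)
{
	assert(refcount > 0);
	if (--refcount == 0) {
		free(work);
		work = NULL;
	}
}

/*
 * Like _cpu_down in the description above: relies on the caller having
 * called sm_create() first. The NULL check exists only so this sketch can
 * report the misuse; the oops in this thread is what happens without one.
 */
static int sm_use(void)
{
	if (!work)
		return -1;
	work[0]++;
	return 0;
}

/* The fixed call site: create, use, then destroy on the way out. */
static int take_down_all(void)
{
	int error = sm_create();

	if (error)
		return error;
	error = sm_use();
	sm_destroy();
	return error;
}
```

In this model, calling `sm_use()` on its own fails because nothing was allocated, while `take_down_all()` succeeds and leaves no reference behind. That mirrors the shape of the diff: one create call before the work and one destroy call on the exit path.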
* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:19 ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
@ 2009-01-07 15:23 ` Ingo Molnar
  2009-01-07 15:30 ` Frédéric Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2009-01-07 15:23 UTC
To: Heiko Carstens
Cc: Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg, Justin P. Mattock, linux-kernel, Jeff Chua

* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
> caller already created the stop_machine workqueue (like cpu_down does).
> Otherwise a call to stop_machine will lead to accesses to random memory
> regions.

btw., i got this crash earlier today:

CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0
eth0: no IPv6 routers present
BUG: Bad page state in process cc1 pfn:00879
page:c101b894 flags:00000400 count:0 mapcount:0 mapping:(null) index:0
Pid: 3060, comm: cc1 Not tainted 2.6.28-tip-07641-gb97d41d-dirty #14985
Call Trace:
 [<c016ce8b>] bad_page+0xcf/0xe5
 [<c016d3b4>] free_pages_check+0xa7/0xc5
 [<c016d400>] free_hot_cold_page+0x2e/0x138
 [<c014751c>] ? __lock_acquire+0x127/0x29d
 [<c016d558>] free_hot_page+0xf/0x11
 [<c0170963>] put_page+0x76/0x7c
 [<c0185071>] ? constant_test_bit+0x9/0x20
 [<c0187149>] kfree+0x30/0xe5
 [<c0164993>] ? trace_hardirqs_on+0x8/0x1c
 [<c01547dd>] free_user_ns+0x1d/0x20
 [<c01547c0>] ? free_user_ns+0x0/0x20
 [<c02c7a41>] kref_put+0x18/0x22
 [<c0132d4c>] put_user_ns+0x16/0x18
 [<c0132f52>] free_uid+0x59/0xc8
 [<c0136239>] ? groups_free+0x36/0x3a
 [<c0140406>] put_cred_rcu+0x5f/0x70
 [<c01598fb>] __rcu_process_callbacks+0x168/0x1f8
 [<c03031be>] ? isicom_tx+0x0/0x31f
 [<c01599b1>] rcu_process_callbacks+0x26/0x46
 [<c012f11d>] __do_softirq+0x9d/0x139
 [<c012f080>] ? __do_softirq+0x0/0x139
 <IRQ> [<c012efe2>] ? irq_exit+0x4c/0x83
 [<c05cc586>] ? __irqentry_text_start+0x6e/0x7c
 [<c0103f61>] ? apic_timer_interrupt+0x2d/0x34

and i applied your patch (from the other thread) and never saw this bug
again. So if it's the same bug (it appears to be) then you have my:

Tested-by: Ingo Molnar <mingo@elte.hu>

	Ingo
* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:19 ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
  2009-01-07 15:23 ` Ingo Molnar
@ 2009-01-07 15:30 ` Frédéric Weisbecker
  2009-01-07 15:52 ` Justin P. Mattock
  2009-01-08 5:13 ` Justin P. Mattock
  1 sibling, 2 replies; 15+ messages in thread
From: Frédéric Weisbecker @ 2009-01-07 15:30 UTC
To: Heiko Carstens
Cc: Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg, Justin P. Mattock, linux-kernel, Jeff Chua

2009/1/7 Heiko Carstens <heiko.carstens@de.ibm.com>:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
> caller already created the stop_machine workqueue (like cpu_down does).
> Otherwise a call to stop_machine will lead to accesses to random memory
> regions.
>
> [...]

That should explain why suspend to disk failed on my box yesterday on
the processors stage...
Thanks!
* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:30 ` Frédéric Weisbecker
@ 2009-01-07 15:52 ` Justin P. Mattock
  2009-01-08 5:13 ` Justin P. Mattock
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-07 15:52 UTC
To: Frédéric Weisbecker
Cc: Heiko Carstens, Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg, linux-kernel, Jeff Chua

Frédéric Weisbecker wrote:
> 2009/1/7 Heiko Carstens <heiko.carstens@de.ibm.com>:
>
>> [...]
>
> That should explain why suspend to disk failed on my box yesterday on
> the processors stage...
> Thanks!
>

O.K. applied the patch, and shut down the machine a few times;
no freeze, no bug message. sweet!!
Now I'm gonna try and dismantle a bug message for educational purposes.
Thanks for the assistance.

regards;

Justin P. Mattock
* Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
  2009-01-07 15:30 ` Frédéric Weisbecker
  2009-01-07 15:52 ` Justin P. Mattock
@ 2009-01-08 5:13 ` Justin P. Mattock
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-08 5:13 UTC
To: Frédéric Weisbecker
Cc: Heiko Carstens, Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg, linux-kernel, Jeff Chua

Frédéric Weisbecker wrote:
> 2009/1/7 Heiko Carstens <heiko.carstens@de.ibm.com>:
>
>> [...]
>
> That should explain why suspend to disk failed on my box yesterday on
> the processors stage...
> Thanks!
>

I hate to ask this, but I'm going to anyway:
when running gdb /usr/src/linux/vmlinux
(hoping to see if gdb will catch the bug);
I keep getting:

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
You can't do that without a process to debug.

If I do a:
(gdb) disassemble __stop_machine
(as described in Documentation);
I'll see a bit of info.
How do I start or figure out a process to debug?
e.g. under the bug message that I wrote down,
it says Pid: 3273
entering that in (gdb) r 3273
results in a SIGKILL.

regards;

Justin P. Mattock
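A note on reading the oops by hand, offered as a sketch and not as a reply from the thread. vmlinux is the kernel image, not a user-space program, so `r 3273` asks gdb to execute it with 3273 as an argument, which is likely why it gets killed. The pid in the oops is only the halt process that happened to be running. The useful part is the location `__stop_machine+0x88/0xe3`: offset 0x88 into a function that is 0xe3 bytes long. With a vmlinux built with debug info, `(gdb) list *(__stop_machine+0x88)` should map that to a source line without running anything. The fault address 6b6b6b6b is a hint as well: 0x6b is the byte that slab debugging writes over freed objects (POISON_FREE in include/linux/poison.h), so the pointer was probably read from freed memory. The C helpers below use hypothetical names, are not a kernel API, and assume a 32-bit address as in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for illustration; not a kernel API. */
struct oops_loc {
	char symbol[64];
	unsigned long offset;   /* bytes from the start of the function */
	unsigned long size;     /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE". Returns 0 on success, -1 if malformed. */
static int parse_oops_loc(const char *text, struct oops_loc *out)
{
	assert(text && out);
	memset(out, 0, sizeof(*out));
	if (sscanf(text, "%63[^+]+0x%lx/0x%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}

/* True if the low 32 bits are all 0x6b, the slab free-poison byte. */
static int looks_like_slab_poison(unsigned long addr)
{
	return (addr & 0xffffffffUL) == 0x6b6b6b6bUL;
}
```

Feeding it `"__stop_machine+0x88/0xe3"` yields the symbol `__stop_machine` with offset 0x88 and size 0xe3. The hand-copied entry `mntput_no_expire+ox1f/0x101` from the original report is rejected, which is one way to spot a transcription slip in a trace written down from the screen.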
* Re: [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
  2009-01-07 13:51 ` Jeff Chua
  2009-01-07 15:19 ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
@ 2009-01-07 15:28 ` Justin P. Mattock
  1 sibling, 0 replies; 15+ messages in thread
From: Justin P. Mattock @ 2009-01-07 15:28 UTC
To: Jeff Chua
Cc: Heiko Carstens, Pekka Enberg, linux-kernel, Rusty Russell

Jeff Chua wrote:
> On Wed, Jan 7, 2009 at 8:27 PM, Heiko Carstens
> <heiko.carstens@de.ibm.com> wrote:
>
>> On Wed, Jan 07, 2009 at 07:36:57PM +0800, Jeff Chua wrote:
>>
>>> On Wed, Jan 7, 2009 at 5:19 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
>>>
>>>> On Wed, 2009-01-07 at 10:15 +0100, Heiko Carstens wrote:
>>>>
>>>>> I missed to convert disable_nonboot_cpus to
>>>>> stop_machine_create/destroy.
>>>>> So it's a use before-even-allocated bug.
>>>>> The patch below should hopefully fix it:
>>>>> Subject: [PATCH] cpu hotplug: add stop_machine_create/destroy to disable_nonboot_cpus
>>>>>
>>>> Looks good to me!
>>>>
>>> This also fixes the suspend-to-ram/disk problem. Without it, the
>>> system will just hang.
>>>
>
>
>> Did you also see the reboot problem and does the patch fix it for you?
>>
>
> I never had problem with rebooting. Just suspend hanging which is
> really annoying ... walking away and come back hours later and
> realized that the suspend is hanging and having to do a hard boot.
>
> Thanks,
> Jeff.
>
>

Man!! I missed this whole conversation (passed out, too tired);
I'll go ahead and apply the patch and let you know if I get
the freeze at shutdown. Then I'm still interested in knowing
how to take a debug message and dissect it to find the exact location
of the problem. (but could do that later);

regards;

Justin P. Mattock
Thread overview: 15+ messages
2009-01-07 0:12 [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock
2009-01-07 6:48 ` Pekka Enberg
2009-01-07 8:13 ` Justin P. Mattock
2009-01-07 8:30 ` Pekka Enberg
2009-01-07 9:15 ` Heiko Carstens
2009-01-07 9:19 ` Pekka Enberg
2009-01-07 11:36 ` Jeff Chua
2009-01-07 12:27 ` Heiko Carstens
2009-01-07 13:51 ` Jeff Chua
2009-01-07 15:19 ` [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus Heiko Carstens
2009-01-07 15:23 ` Ingo Molnar
2009-01-07 15:30 ` Frédéric Weisbecker
2009-01-07 15:52 ` Justin P. Mattock
2009-01-08 5:13 ` Justin P. Mattock
2009-01-07 15:28 ` [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b Justin P. Mattock