From: "Justin P. Mattock" <justinmattock@gmail.com>
To: "Frédéric Weisbecker" <fweisbec@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	linux-kernel@vger.kernel.org,
	Jeff Chua <jeff.chua.linux@gmail.com>
Subject: Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus
Date: Wed, 07 Jan 2009 21:13:27 -0800
Message-ID: <49658B77.7010407@gmail.com>
In-Reply-To: <c62985530901070730j7515a106x2a56a6d8a14fbbc5@mail.gmail.com>

Frédéric Weisbecker wrote:
> 2009/1/7 Heiko Carstens <heiko.carstens@de.ibm.com>:
>   
>> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>>
>> disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
>> caller already created the stop_machine workqueue (like cpu_down does).
>> Otherwise a call to stop_machine will lead to accesses to random memory
>> regions.
>>
>> When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85
>> "stop_machine: introduce stop_machine_create/destroy") I missed the second
>> call site of _cpu_down.
>> So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
>> as well.
>>
>> Fixes suspend-to-ram/disk and also this bug:
>>
>> [  286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
>> [  286.548940] IP: [<c0150ca4>] __stop_machine+0x88/0xe3
>> [  286.550598] Oops: 0002 [#1] SMP
>> [  286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
>> [  286.560580] EIP: is at __stop_machine+0x88/0xe3
>> [  286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
>> [  286.560580] Call Trace:
>> [  286.560580]  [<c03d04e4>] ? _cpu_down+0x10f/0x234
>> [  286.560580]  [<c012a57e>] ? disable_nonboot_cpus+0x58/0xdc
>> [  286.560580]  [<c01360c0>] ? kernel_poweroff+0x22/0x39
>> [  286.560580]  [<c0136301>] ? sys_reboot+0xde/0x14c
>> [  286.560580]  [<c01331b2>] ? complete_signal+0x179/0x191
>> [  286.560580]  [<c0133396>] ? send_signal+0x1cc/0x1e1
>> [  286.560580]  [<c03de418>] ? _spin_unlock_irqrestore+0x2d/0x3c
>> [  286.560580]  [<c0133b65>] ? group_send_signal_info+0x58/0x61
>> [  286.560580]  [<c0133b9e>] ? kill_pid_info+0x30/0x3a
>> [  286.560580]  [<c0133d49>] ? sys_kill+0x75/0x13a
>> [  286.560580]  [<c01a06cb>] ? mntput_no_expire+0x1f/0x101
>> [  286.560580]  [<c019b3b3>] ? dput+0x1e/0x105
>> [  286.560580]  [<c018ef87>] ?  __fput+0x150/0x158
>> [  286.560580]  [<c0157abf>] ? audit_syscall_entry+0x137/0x159
>> [  286.560580]  [<c010329f>] ? sysenter_do_call+0x12/0x34
>>
>> Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
>> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
>> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>> ---
>>  kernel/cpu.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6/kernel/cpu.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/cpu.c
>> +++ linux-2.6/kernel/cpu.c
>> @@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;
>>
>>  int disable_nonboot_cpus(void)
>>  {
>> -       int cpu, first_cpu, error = 0;
>> +       int cpu, first_cpu, error;
>>
>> +       error = stop_machine_create();
>> +       if (error)
>> +               return error;
>>        cpu_maps_update_begin();
>>        first_cpu = cpumask_first(cpu_online_mask);
>>        /* We take down all of the non-boot CPUs in one shot to avoid races
>> @@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
>>                printk(KERN_ERR "Non-boot CPUs are not disabled\n");
>>        }
>>        cpu_maps_update_done();
>> +       stop_machine_destroy();
>>        return error;
>>  }
>>
>>     
>
>
> That should explain why suspend to disk failed on my box yesterday on
> the processors stage...
> Thanks!
>
>   
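The create/destroy pairing that the quoted patch adds to
disable_nonboot_cpus can be sketched in plain userspace C. This is only
an illustration under stated assumptions: every mock_* name below is
hypothetical, the "workqueue" is just a heap-allocated counter, and none
of this is the kernel's actual stop_machine implementation.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel functions. */
static int *mock_wq;	/* stands in for the stop_machine workqueue */

/* Allocate the shared resource before any mock_cpu_down() call. */
static int mock_stop_machine_create(void)
{
	mock_wq = malloc(sizeof(*mock_wq));
	if (!mock_wq)
		return -1;
	*mock_wq = 0;
	return 0;
}

/* Release the shared resource once all mock_cpu_down() calls are done. */
static void mock_stop_machine_destroy(void)
{
	free(mock_wq);
	mock_wq = NULL;
}

/* Stand-in for _cpu_down(): only valid while the resource exists. */
static int mock_cpu_down(int cpu)
{
	(void)cpu;
	if (!mock_wq)
		return -1;	/* per the patch description, the unpatched
				 * kernel path accessed random memory here */
	(*mock_wq)++;
	return 0;
}

/* Mirrors the shape of the patched disable_nonboot_cpus(). */
static int mock_disable_nonboot_cpus(int ncpus)
{
	int cpu, error;

	error = mock_stop_machine_create();
	if (error)
		return error;
	for (cpu = 1; cpu < ncpus; cpu++) {
		error = mock_cpu_down(cpu);
		if (error)
			break;
	}
	mock_stop_machine_destroy();
	return error;
}
```

Calling mock_cpu_down() on its own fails in this sketch, while
mock_disable_nonboot_cpus(4) succeeds because it brackets the loop with
create and destroy, which is the same shape as the diff above.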
I hate to ask this, but I'm going to anyway.
When running
gdb /usr/src/linux/vmlinux
(hoping to see whether gdb will catch the bug),
I keep getting:
Program terminated with signal SIGKILL, Killed.
The program no longer exists.
You can't do that without a process to debug.

If I do:
(gdb) disassemble __stop_machine
(as described in Documentation),
I see a bit of info.

How do I start, or figure out, a process
to debug? E.g. the bug message
that I wrote down says Pid: 3273;
entering that as (gdb) r 3273
results in a SIGKILL.

regards;

Justin P. Mattock
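
A note on the gdb question above, offered as a sketch rather than a
definitive answer: in gdb, "r 3273" starts the loaded program with 3273
as its argument, and vmlinux is a kernel image rather than a userspace
program, so there is no process to run. What can be done with
"gdb vmlinux" is static inspection of the symbol+offset pairs printed in
the oops. The C helper below pulls those fields out of a trace line and
builds a "list *(symbol+offset)" command for gdb. The names oops_frame,
parse_oops_frame and format_gdb_list are hypothetical, and the list
command only shows source lines if the kernel was built with debug info.

```c
#include <stdio.h>
#include <string.h>

/* One decoded "[<addr>] symbol+0xoff/0xsize" entry from an oops. */
struct oops_frame {
	unsigned long addr;	/* absolute address inside the brackets */
	char symbol[64];	/* function name */
	unsigned long offset;	/* offset of the address into the function */
	unsigned long size;	/* total size of the function */
};

/* Returns 0 on success, -1 if the line holds no well-formed entry. */
static int parse_oops_frame(const char *line, struct oops_frame *out)
{
	const char *p = strstr(line, "[<");
	const char *q;

	if (!p || sscanf(p, "[<%lx>]", &out->addr) != 1)
		return -1;
	q = strstr(p, ">]");
	if (!q)
		return -1;
	q += 2;
	/* Skip blanks and the optional '?' marker before the symbol. */
	while (*q == ' ' || *q == '?')
		q++;
	if (sscanf(q, "%63[^+ ]+%lx/%lx",
		   out->symbol, &out->offset, &out->size) != 3)
		return -1;
	return 0;
}

/* Builds a gdb command such as "list *(__stop_machine+0x88)". */
static int format_gdb_list(const struct oops_frame *f, char *buf, size_t len)
{
	return snprintf(buf, len, "list *(%s+0x%lx)", f->symbol, f->offset);
}
```

Feeding it the "IP:" line from the oops yields the symbol __stop_machine
with offset 0x88, and the resulting command can be typed at the (gdb)
prompt after loading vmlinux, with no process required.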




