All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wen Congyang <wency@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	mingo@redhat.com, peterz@infradead.org, miaox@cn.fujitsu.com,
	linux-kernel@vger.kernel.org, linux-numa@vger.kernel.org
Subject: Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.
Date: Tue, 09 Oct 2012 18:22:09 +0800	[thread overview]
Message-ID: <5073FAD1.1000403@cn.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1210090259250.24261@chino.kir.corp.google.com>

At 10/09/2012 06:04 PM, David Rientjes Wrote:
> On Tue, 9 Oct 2012, Tang Chen wrote:
> 
>>>> Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
>>>> should be called at CPU_DYING level and migrate_tasks() still sees a valid
>>>> cpu.
>>
>> As Wen said below, nid is now set to -1 when cpu is hotremoved.
>> I reproduce this problem in this situation:
>>
>> all cpus are online, and hot remove a system board directorily, without
>> offlining any cpu.
>>
>> As a result, the removed cpu's nid is set to -1, and this causes
>> problems.
>>
> 
> Let's add Andrew to the cc list then, because I'm nacking 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm 
> tree for this reason.
> 
> We can only clear a cpu-to-node mapping when the cpu is completely 
> offline, not before or during the CPU_DYING stage.  Kernel code, such as 

I clear cpu-to-node mapping when the cpu is hotremoved. If the cpu is onlined,
it will be offlined before clearing cpu-to-node mapping.

Here is the code in driver/acpi/processor_driver.c:
=============
static int acpi_processor_handle_eject(struct acpi_processor *pr)
{
	if (cpu_online(pr->id))
		cpu_down(pr->id);      <========== cpu is offlined here.

	arch_unregister_cpu(pr->id);
	acpi_unmap_lsapic(pr->id);     <========== I clear the mapping here.
	return (0);
}
=============

The real problem is that: we don't migrate this task to another cpu when
the cpu is offlined. I guess this task is not in running state, and it
is not in the cpu's runqueue.

Thanks
Wen Congyang

> the sched code that you are now trying to "fix", depends on this mapping 
> to work correctly; obviously no audit was done of cpu hotplug code 
> depending on it before the patch was proposed.
> 
> I say "fix" because even this workaround isn't a good solution since it 
> would be much better to pick another cpu on the same node as the offlining 
> cpu for the runqueue before falling back to the set of all allowed nodes.  
> We lose all NUMA affinity information with that patch.  There's no reason 
> why we shouldn't know the node of a cpu that is being offlined.
> 
> So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch.  
> After it's removed because it's buggy, this "fix" will no longer be 
> necessary.
> 

This patch is try to fix a bug: the kernel will be panicked after removing a
node.

  reply	other threads:[~2012-10-09 10:22 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-08  2:59 [PATCH] Do not use cpu_to_node() to find an offlined cpu's node Tang Chen
2012-10-09  6:21 ` David Rientjes
2012-10-09  8:34   ` Wen Congyang
2012-10-09  8:39     ` Tang Chen
2012-10-09 10:04       ` David Rientjes
2012-10-09 10:22         ` Wen Congyang [this message]
2012-10-09 20:00           ` David Rientjes
2012-10-09 10:57 ` Peter Zijlstra
2012-10-09 20:36   ` David Rientjes
2012-10-09 20:47     ` Peter Zijlstra
2012-10-09 23:27       ` David Rientjes
2012-10-10  2:06         ` Wen Congyang
2012-10-10  3:48           ` Wen Congyang
2012-10-10  9:10         ` Peter Zijlstra
2012-10-10  9:33           ` Wen Congyang
2012-10-10  9:51             ` Peter Zijlstra
2012-10-10 10:10               ` Wen Congyang
2012-10-10 10:07                 ` Peter Zijlstra
2012-10-10 20:30           ` David Rientjes
2012-10-10 20:37             ` Andrew Morton
2012-10-10 20:57               ` David Rientjes
2012-10-18  0:52                 ` David Rientjes
2012-10-18  2:51                   ` Tang Chen
2012-10-18  3:29                     ` David Rientjes
2012-10-19 11:18                       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5073FAD1.1000403@cn.fujitsu.com \
    --to=wency@cn.fujitsu.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-numa@vger.kernel.org \
    --cc=miaox@cn.fujitsu.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=tangchen@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.