Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Tang Chen <tangchen@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-numa@vger.kernel.org, wency@cn.fujitsu.com,
	mingo@kernel.org, Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH] Update sched_domains_numa_masks when new cpus are onlined.
Date: Mon, 24 Sep 2012 11:38:56 +0200	[thread overview]
Message-ID: <1348479536.11847.25.camel@twins> (raw)
In-Reply-To: <1347963128-25942-1-git-send-email-tangchen@cn.fujitsu.com>

Why are you cc'ing x86 and numa folks but not a single scheduler person
when you're patching scheduler stuff?

On Tue, 2012-09-18 at 18:12 +0800, Tang Chen wrote:
> Once array sched_domains_numa_masks is defined, it is never updated.
> When a new cpu on a new node is onlined,

Hmm, so there's hardware where you can boot with smaller nr_node_ids
than possible.. I guess that makes sense.

>  the coincident member in
> sched_domains_numa_masks is not initialized, and all the masks are 0.
> As a result, the build_overlap_sched_groups() will initialize a NULL
> sched_group for the new cpu on the new node, which will lead to kernel panic.

<snip>

> This patch registers a new notifier for cpu hotplug notify chain, and
> updates sched_domains_numa_masks every time a new cpu is onlined or offlined.

Urgh, more hotplug notifiers.. a well.

> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>  kernel/sched/core.c |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 62 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fbf1fd0..66b36ab 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6711,6 +6711,14 @@ static void sched_init_numa(void)
>  	 * numbers.
>  	 */
>  
> +	/*
> +	 * Since sched_domains_numa_levels is also used in other functions as
> +	 * an index for sched_domains_numa_masks[][], we should reset it here in
> +	 * case sched_domains_numa_masks[][] fails to be initialized. And set it
> +	 * to 'level' when sched_domains_numa_masks[][] is fully initialized.
> +	 */
> +	sched_domains_numa_levels = 0;

This isn't strictly needed for this patch right? I don't see anybody
calling sched_init_numa() a second time (although they should)..

>  	sched_domains_numa_masks = kzalloc(sizeof(void *) * level, GFP_KERNEL);
>  	if (!sched_domains_numa_masks)
>  		return;
> @@ -6765,11 +6773,64 @@ static void sched_init_numa(void)
>  	}
>  
>  	sched_domain_topology = tl;
> +
> +	sched_domains_numa_levels = level;
> +}
> +
> +static void sched_domains_numa_masks_set(int cpu)
> +{
> +	int i, j;
> +	int node = cpu_to_node(cpu);
> +
> +	for (i = 0; i < sched_domains_numa_levels; i++)
> +		for (j = 0; j < nr_node_ids; j++)
> +			if (node_distance(j, node) <= sched_domains_numa_distance[i])
> +				cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
> +}
> +
> +static void sched_domains_numa_masks_clear(int cpu)
> +{
> +	int i, j;
> +	for (i = 0; i < sched_domains_numa_levels; i++)
> +		for (j = 0; j < nr_node_ids; j++)
> +			cpumask_clear_cpu(cpu, sched_domains_numa_masks[i][j]);
> +}

Aside from the coding style nit of wanting braces over multi-line
statements even though not strictly required, I really don't see how
this could possibly be right.. 

We do this because nr_node_ids changed, right? This means the entire
distance table grew/shrunk, which means we should do the level scan
again.

> @@ -7218,6 +7279,7 @@ void __init sched_init_smp(void)
>         mutex_unlock(&sched_domains_mutex);
>         put_online_cpus();
>  
> +       hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE);
>         hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
>         hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);

OK, so you really want your notifier to run before cpuset_cpu_active
because otherwise you get that crash, yet you fail with the whole order
thing.. You should not _ever_ rely on registration order.

next prev parent reply	other threads:[~2012-09-24  9:38 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-18 10:12 [PATCH] Update sched_domains_numa_masks when new cpus are onlined Tang Chen
2012-09-24  5:52 ` Tang Chen
2012-09-24  9:38 ` Peter Zijlstra [this message]
2012-09-24  9:57   ` Srivatsa S. Bhat
2012-09-24 10:12     ` Peter Zijlstra
2012-09-24 10:17       ` Srivatsa S. Bhat
2012-09-25  2:39     ` Tang Chen
2012-09-25  2:39   ` Tang Chen
2012-09-25 10:33     ` Peter Zijlstra
2012-09-25 11:45       ` Tang Chen
2012-09-25 11:50         ` Peter Zijlstra
2012-09-25 12:02           ` Tang Chen
2012-09-25 11:07     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1348479536.11847.25.camel@twins \
    --to=peterz@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-numa@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=tangchen@cn.fujitsu.com \
    --cc=tglx@linutronix.de \
    --cc=wency@cn.fujitsu.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.