All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael wang <wangyun@linux.vnet.ibm.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: sfr@canb.auug.org.au, LKML <linux-kernel@vger.kernel.org>,
	paulus@samba.org, alistair@popple.id.au,
	nfont@linux.vnet.ibm.com,
	Andrew Morton <akpm@linux-foundation.org>,
	rcj@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
Date: Tue, 08 Apr 2014 10:39:31 +0800	[thread overview]
Message-ID: <53436163.1010102@linux.vnet.ibm.com> (raw)
In-Reply-To: <53427AC1.9010801@linux.vnet.ibm.com>

Hi, Srivatsa

It's nice to have you confirmed the fix, and thanks for the well-writing
comments, will apply them and send out the new patch later :)

Regards,
Michael Wang

On 04/07/2014 06:15 PM, Srivatsa S. Bhat wrote:
> Hi Michael,
> 
> On 04/02/2014 08:59 AM, Michael wang wrote:
>> During the testing, we encounter below WARN followed by Oops:
>>
>> 	WARNING: at kernel/sched/core.c:6218
>> 	...
>> 	NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
>> 	LR [c000000000101358] .build_sched_domains+0xec8/0x1200
>> 	PACATMSCRATCH [800000000000f032]
>> 	Call Trace:
>> 	[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
>> 	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
>> 	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
>> 	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
>> 	...
>> 	Oops: Kernel access of bad area, sig: 11 [#1]
>> 	...
>> 	NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
>> 	LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
>> 	PACATMSCRATCH [8000000000029032]
>> 	Call Trace:
>> 	[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
>> 	[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
>> 	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
>> 	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
>> 	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
>> 	...
>>
>> This was caused by that 'sd->groups == NULL' after building groups, which
>> was caused by the empty 'sd->span'.
>>
>> The cpu's domain contain nothing because the cpu was assigned to wrong
>> node inside arch_update_cpu_topology() by calling update_lookup_table()
>> with the uninitialized param, in the case when there is nothing to be
>> update.
>>
> 
> Can you reword the above paragraph to something like this:
> 
> The cpu's domain contained nothing because the cpu was assigned to a wrong
> node, due to the following unfortunate sequence of events:
> 
> 1. The hypervisor sent a topology update to the guest OS, to notify changes
>    to the cpu-node mapping. However, the update was actually redundant - i.e.,
>    the "new" mapping was exactly the same as the old one.
> 
> 2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
>    the 'for-loop' in arch_update_cpu_topology().
> 
> 3. So we ended up calling stop-machine() with an empty cpumask list, which made
>    stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
>    the cpu to run the payload (the update_cpu_topology() function).
> 
> 4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
>    is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
>    finds update->cpu as well as update->new_nid to be 0. In other words, we
>    end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.
> 
> This causes the sched-domain rebuild code to break and crash the system.
> 
> 
>> Thus we should stop the updating in such cases, this patch will achieve
>> this and fix the issue.
>>
> 
> We can reword this part as:
> 
> Fix this by skipping the topology update in cases where we find that
> the topology has not actually changed in reality (ie., spurious updates).
> 
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> CC: Stephen Rothwell <sfr@canb.auug.org.au>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Robert Jennings <rcj@linux.vnet.ibm.com>
>> CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>> CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>> CC: Alistair Popple <alistair@popple.id.au>
>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c |    9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 30a42e2..6757690 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1591,6 +1591,14 @@ int arch_update_cpu_topology(void)
>>  		cpu = cpu_last_thread_sibling(cpu);
>>  	}
>>  
>> +	/*
>> +	 * The 'cpu_associativity_changes_mask' could be cleared if
>> +	 * all the cpus it indicates won't change their node, in
>> +	 * which case the 'updated_cpus' will be empty.
>> +	 */
> 
> How about rewording the comment like this:
> 
> In cases where we have nothing to update (because the updates list
> is too short or because the new topology is same as the old one),
> skip invoking update_cpu_topology() via stop-machine(). This is
> necessary (and not just a fast-path optimization) because stop-machine
> can end up electing a random CPU to run update_cpu_topology(), and
> thus trick us into setting up incorrect cpu-node mappings (since
> 'updates' is kzalloc()'ed).
> 
> Regards,
> Srivatsa S. Bhat
> 
>> +	if (!cpumask_weight(&updated_cpus))
>> +		goto out;
>> +
>>  	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
>>  
>>  	/*
>> @@ -1612,6 +1620,7 @@ int arch_update_cpu_topology(void)
>>  		changed = 1;
>>  	}
>>  
>> +out:
>>  	kfree(updates);
>>  	return changed;
>>  }
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

WARNING: multiple messages have this Message-ID (diff)
From: Michael wang <wangyun@linux.vnet.ibm.com>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org,
	LKML <linux-kernel@vger.kernel.org>,
	benh@kernel.crashing.org, paulus@samba.org,
	nfont@linux.vnet.ibm.com, sfr@canb.auug.org.au,
	Andrew Morton <akpm@linux-foundation.org>,
	rcj@linux.vnet.ibm.com, alistair@popple.id.au
Subject: Re: [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
Date: Tue, 08 Apr 2014 10:39:31 +0800	[thread overview]
Message-ID: <53436163.1010102@linux.vnet.ibm.com> (raw)
In-Reply-To: <53427AC1.9010801@linux.vnet.ibm.com>

Hi, Srivatsa

It's nice to have you confirmed the fix, and thanks for the well-writing
comments, will apply them and send out the new patch later :)

Regards,
Michael Wang

On 04/07/2014 06:15 PM, Srivatsa S. Bhat wrote:
> Hi Michael,
> 
> On 04/02/2014 08:59 AM, Michael wang wrote:
>> During the testing, we encounter below WARN followed by Oops:
>>
>> 	WARNING: at kernel/sched/core.c:6218
>> 	...
>> 	NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
>> 	LR [c000000000101358] .build_sched_domains+0xec8/0x1200
>> 	PACATMSCRATCH [800000000000f032]
>> 	Call Trace:
>> 	[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
>> 	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
>> 	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
>> 	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
>> 	...
>> 	Oops: Kernel access of bad area, sig: 11 [#1]
>> 	...
>> 	NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
>> 	LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
>> 	PACATMSCRATCH [8000000000029032]
>> 	Call Trace:
>> 	[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
>> 	[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
>> 	[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
>> 	[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
>> 	[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
>> 	...
>>
>> This was caused by that 'sd->groups == NULL' after building groups, which
>> was caused by the empty 'sd->span'.
>>
>> The cpu's domain contain nothing because the cpu was assigned to wrong
>> node inside arch_update_cpu_topology() by calling update_lookup_table()
>> with the uninitialized param, in the case when there is nothing to be
>> update.
>>
> 
> Can you reword the above paragraph to something like this:
> 
> The cpu's domain contained nothing because the cpu was assigned to a wrong
> node, due to the following unfortunate sequence of events:
> 
> 1. The hypervisor sent a topology update to the guest OS, to notify changes
>    to the cpu-node mapping. However, the update was actually redundant - i.e.,
>    the "new" mapping was exactly the same as the old one.
> 
> 2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
>    the 'for-loop' in arch_update_cpu_topology().
> 
> 3. So we ended up calling stop-machine() with an empty cpumask list, which made
>    stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
>    the cpu to run the payload (the update_cpu_topology() function).
> 
> 4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
>    is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
>    finds update->cpu as well as update->new_nid to be 0. In other words, we
>    end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.
> 
> This causes the sched-domain rebuild code to break and crash the system.
> 
> 
>> Thus we should stop the updating in such cases, this patch will achieve
>> this and fix the issue.
>>
> 
> We can reword this part as:
> 
> Fix this by skipping the topology update in cases where we find that
> the topology has not actually changed in reality (ie., spurious updates).
> 
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>> CC: Stephen Rothwell <sfr@canb.auug.org.au>
>> CC: Andrew Morton <akpm@linux-foundation.org>
>> CC: Robert Jennings <rcj@linux.vnet.ibm.com>
>> CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>> CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>> CC: Alistair Popple <alistair@popple.id.au>
>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c |    9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 30a42e2..6757690 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1591,6 +1591,14 @@ int arch_update_cpu_topology(void)
>>  		cpu = cpu_last_thread_sibling(cpu);
>>  	}
>>  
>> +	/*
>> +	 * The 'cpu_associativity_changes_mask' could be cleared if
>> +	 * all the cpus it indicates won't change their node, in
>> +	 * which case the 'updated_cpus' will be empty.
>> +	 */
> 
> How about rewording the comment like this:
> 
> In cases where we have nothing to update (because the updates list
> is too short or because the new topology is same as the old one),
> skip invoking update_cpu_topology() via stop-machine(). This is
> necessary (and not just a fast-path optimization) because stop-machine
> can end up electing a random CPU to run update_cpu_topology(), and
> thus trick us into setting up incorrect cpu-node mappings (since
> 'updates' is kzalloc()'ed).
> 
> Regards,
> Srivatsa S. Bhat
> 
>> +	if (!cpumask_weight(&updated_cpus))
>> +		goto out;
>> +
>>  	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
>>  
>>  	/*
>> @@ -1612,6 +1620,7 @@ int arch_update_cpu_topology(void)
>>  		changed = 1;
>>  	}
>>  
>> +out:
>>  	kfree(updates);
>>  	return changed;
>>  }
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


  reply	other threads:[~2014-04-08  2:39 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-02  3:29 [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update Michael wang
2014-04-03  8:50 ` Srivatsa S. Bhat
2014-04-03  8:50   ` Srivatsa S. Bhat
2014-04-04  3:48   ` Michael wang
2014-04-04  3:48     ` Michael wang
2014-04-07  9:51     ` Srivatsa S. Bhat
2014-04-07  9:51       ` Srivatsa S. Bhat
2014-04-07 10:15 ` Srivatsa S. Bhat
2014-04-07 10:15   ` Srivatsa S. Bhat
2014-04-08  2:39   ` Michael wang [this message]
2014-04-08  2:39     ` Michael wang
2014-04-08  3:19 ` [PATCH v2] " Michael wang
2014-04-08  8:24   ` Srivatsa S. Bhat
2014-04-08  8:24     ` Srivatsa S. Bhat
2014-04-16  3:14   ` Michael wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53436163.1010102@linux.vnet.ibm.com \
    --to=wangyun@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=alistair@popple.id.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=nfont@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=rcj@linux.vnet.ibm.com \
    --cc=sfr@canb.auug.org.au \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.