From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from e28smtp09.in.ibm.com (e28smtp09.in.ibm.com [122.248.162.9]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41BEE1400FD for ; Tue, 8 Apr 2014 12:39:43 +1000 (EST) Received: from /spool/local by e28smtp09.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 8 Apr 2014 08:09:39 +0530 Received: from d28relay02.in.ibm.com (d28relay02.in.ibm.com [9.184.220.59]) by d28dlp02.in.ibm.com (Postfix) with ESMTP id E09103940023 for ; Tue, 8 Apr 2014 08:09:36 +0530 (IST) Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66]) by d28relay02.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id s382dVSB1704310 for ; Tue, 8 Apr 2014 08:09:31 +0530 Received: from d28av04.in.ibm.com (localhost [127.0.0.1]) by d28av04.in.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s382dYvd006694 for ; Tue, 8 Apr 2014 08:09:35 +0530 Message-ID: <53436163.1010102@linux.vnet.ibm.com> Date: Tue, 08 Apr 2014 10:39:31 +0800 From: Michael wang MIME-Version: 1.0 To: "Srivatsa S. Bhat" Subject: Re: [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update References: <533B8431.8090507@linux.vnet.ibm.com> <53427AC1.9010801@linux.vnet.ibm.com> In-Reply-To: <53427AC1.9010801@linux.vnet.ibm.com> Content-Type: text/plain; charset=UTF-8 Cc: sfr@canb.auug.org.au, LKML , paulus@samba.org, alistair@popple.id.au, nfont@linux.vnet.ibm.com, Andrew Morton , rcj@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Hi, Srivatsa It's nice to have you confirmed the fix, and thanks for the well-writing comments, will apply them and send out the new patch later :) Regards, Michael Wang On 04/07/2014 06:15 PM, Srivatsa S. Bhat wrote: > Hi Michael, > > On 04/02/2014 08:59 AM, Michael wang wrote: >> During the testing, we encounter below WARN followed by Oops: >> >> WARNING: at kernel/sched/core.c:6218 >> ... >> NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200 >> LR [c000000000101358] .build_sched_domains+0xec8/0x1200 >> PACATMSCRATCH [800000000000f032] >> Call Trace: >> [c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200 >> [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510 >> [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0 >> [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30 >> ... >> Oops: Kernel access of bad area, sig: 11 [#1] >> ... >> NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0 >> LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200 >> PACATMSCRATCH [8000000000029032] >> Call Trace: >> [c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0 >> [c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200 >> [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510 >> [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0 >> [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30 >> ... >> >> This was caused by that 'sd->groups == NULL' after building groups, which >> was caused by the empty 'sd->span'. >> >> The cpu's domain contain nothing because the cpu was assigned to wrong >> node inside arch_update_cpu_topology() by calling update_lookup_table() >> with the uninitialized param, in the case when there is nothing to be >> update. >> > > Can you reword the above paragraph to something like this: > > The cpu's domain contained nothing because the cpu was assigned to a wrong > node, due to the following unfortunate sequence of events: > > 1. The hypervisor sent a topology update to the guest OS, to notify changes > to the cpu-node mapping. However, the update was actually redundant - i.e., > the "new" mapping was exactly the same as the old one. > > 2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting > the 'for-loop' in arch_update_cpu_topology(). > > 3. So we ended up calling stop-machine() with an empty cpumask list, which made > stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as > the cpu to run the payload (the update_cpu_topology() function). > > 4. This causes update_cpu_topology() to be run by CPU0. And since 'updates' > is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology() > finds update->cpu as well as update->new_nid to be 0. In other words, we > end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly. > > This causes the sched-domain rebuild code to break and crash the system. > > >> Thus we should stop the updating in such cases, this patch will achieve >> this and fix the issue. >> > > We can reword this part as: > > Fix this by skipping the topology update in cases where we find that > the topology has not actually changed in reality (ie., spurious updates). > >> CC: Benjamin Herrenschmidt >> CC: Paul Mackerras >> CC: Nathan Fontenot >> CC: Stephen Rothwell >> CC: Andrew Morton >> CC: Robert Jennings >> CC: Jesse Larrew >> CC: "Srivatsa S. Bhat" >> CC: Alistair Popple >> Signed-off-by: Michael Wang >> --- >> arch/powerpc/mm/numa.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >> index 30a42e2..6757690 100644 >> --- a/arch/powerpc/mm/numa.c >> +++ b/arch/powerpc/mm/numa.c >> @@ -1591,6 +1591,14 @@ int arch_update_cpu_topology(void) >> cpu = cpu_last_thread_sibling(cpu); >> } >> >> + /* >> + * The 'cpu_associativity_changes_mask' could be cleared if >> + * all the cpus it indicates won't change their node, in >> + * which case the 'updated_cpus' will be empty. >> + */ > > How about rewording the comment like this: > > In cases where we have nothing to update (because the updates list > is too short or because the new topology is same as the old one), > skip invoking update_cpu_topology() via stop-machine(). This is > necessary (and not just a fast-path optimization) because stop-machine > can end up electing a random CPU to run update_cpu_topology(), and > thus trick us into setting up incorrect cpu-node mappings (since > 'updates' is kzalloc()'ed). > > Regards, > Srivatsa S. Bhat > >> + if (!cpumask_weight(&updated_cpus)) >> + goto out; >> + >> stop_machine(update_cpu_topology, &updates[0], &updated_cpus); >> >> /* >> @@ -1612,6 +1620,7 @@ int arch_update_cpu_topology(void) >> changed = 1; >> } >> >> +out: >> kfree(updates); >> return changed; >> } >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ >