All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: sfr@canb.auug.org.au,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	jlarrew@linux.vnet.ibm.com, paulus@samba.org,
	alistair@popple.id.au, nfont@linux.vnet.ibm.com,
	Andrew Morton <akpm@linux-foundation.org>,
	rcj@linux.vnet.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
Date: Mon, 07 Apr 2014 15:21:06 +0530	[thread overview]
Message-ID: <5342750A.3060307@linux.vnet.ibm.com> (raw)
In-Reply-To: <533E2BA2.6000107@linux.vnet.ibm.com>

Hi Michael,

On 04/04/2014 09:18 AM, Michael wang wrote:
> Hi, Srivatsa
> 
> Thanks for your reply :)
> 
> On 04/03/2014 04:50 PM, Srivatsa S. Bhat wrote:
> [snip]
>>
>> Now, the interesting thing to note here is that, if CPU0's node was already
>> set as node0, *nothing* should go wrong, since its just a redundant update.
>> However, if CPU0's original node mapping was something different, or if
>> node0 doesn't even exist in the machine, then the system can crash.
> 
> By printk I confirmed all cpus was belong to node 1 at very beginning,
> and things become magically after the wrong updating...
>

Ok, thanks!
 
>>
>> Have you verified that CPU0's node mapping is different from node 0?
>> That is, boot the kernel with "numa=debug" in the kernel command line and
>> it will print out the cpu-to-node associativity during boot. That way you
>> can figure out what was the original associativity that was set. This will
>> confirm the theory that the hypervisor sent a redundant update, but because
>> of the weird pre-allocation using kzalloc that we do inside
>> arch_update_cpu_topology(), we wrongly updated CPU0's mapping as CPU0 <-> Node0.
> 
> Associativity should changes, otherwise we won't continue the updating,
> and empty updates[] was confirmed to show up inside
> arch_update_cpu_topology().
> 

Ah, ok, that makes it very clear. So, I agree that your patch is correct,
but I think the comment in your patch can be enhanced a bit. I'll suggest
something if I manage to come up with a better wording.

> What I can't make sure is whether this is legal, notify changes but no
> changes happen sounds weird...however, even if it's legal, a check in
> here still make sense IMHO.
> 

That looks like a bug in the hypervisor/firmware. But the Linux kernel should
be able to handle such NULL updates without crashing. So yes, your patch makes
sense to me.

Thank you!

Regards,
Srivatsa S. Bhat

>>
>>> Thus we should stop the updating in such cases, this patch will achieve
>>> this and fix the issue.
>>>
>>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> CC: Paul Mackerras <paulus@samba.org>
>>> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>> CC: Stephen Rothwell <sfr@canb.auug.org.au>
>>> CC: Andrew Morton <akpm@linux-foundation.org>
>>> CC: Robert Jennings <rcj@linux.vnet.ibm.com>
>>> CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>>> CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>>> CC: Alistair Popple <alistair@popple.id.au>
>>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/mm/numa.c |    9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index 30a42e2..6757690 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1591,6 +1591,14 @@ int arch_update_cpu_topology(void)
>>>  		cpu = cpu_last_thread_sibling(cpu);
>>>  	}
>>>
>>> +	/*
>>> +	 * The 'cpu_associativity_changes_mask' could be cleared if
>>> +	 * all the cpus it indicates won't change their node, in
>>> +	 * which case the 'updated_cpus' will be empty.
>>> +	 */
>>> +	if (!cpumask_weight(&updated_cpus))
>>> +		goto out;
>>> +
>>>  	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
>>>
>>>  	/*
>>> @@ -1612,6 +1620,7 @@ int arch_update_cpu_topology(void)
>>>  		changed = 1;
>>>  	}
>>>
>>> +out:
>>>  	kfree(updates);
>>>  	return changed;
>>>  }
>>>
>>

WARNING: multiple messages have this Message-ID (diff)
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org,
	LKML <linux-kernel@vger.kernel.org>,
	benh@kernel.crashing.org, paulus@samba.org,
	nfont@linux.vnet.ibm.com, sfr@canb.auug.org.au,
	Andrew Morton <akpm@linux-foundation.org>,
	rcj@linux.vnet.ibm.com, jlarrew@linux.vnet.ibm.com,
	alistair@popple.id.au,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Subject: Re: [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
Date: Mon, 07 Apr 2014 15:21:06 +0530	[thread overview]
Message-ID: <5342750A.3060307@linux.vnet.ibm.com> (raw)
In-Reply-To: <533E2BA2.6000107@linux.vnet.ibm.com>

Hi Michael,

On 04/04/2014 09:18 AM, Michael wang wrote:
> Hi, Srivatsa
> 
> Thanks for your reply :)
> 
> On 04/03/2014 04:50 PM, Srivatsa S. Bhat wrote:
> [snip]
>>
>> Now, the interesting thing to note here is that, if CPU0's node was already
>> set as node0, *nothing* should go wrong, since its just a redundant update.
>> However, if CPU0's original node mapping was something different, or if
>> node0 doesn't even exist in the machine, then the system can crash.
> 
> By printk I confirmed all cpus was belong to node 1 at very beginning,
> and things become magically after the wrong updating...
>

Ok, thanks!
 
>>
>> Have you verified that CPU0's node mapping is different from node 0?
>> That is, boot the kernel with "numa=debug" in the kernel command line and
>> it will print out the cpu-to-node associativity during boot. That way you
>> can figure out what was the original associativity that was set. This will
>> confirm the theory that the hypervisor sent a redundant update, but because
>> of the weird pre-allocation using kzalloc that we do inside
>> arch_update_cpu_topology(), we wrongly updated CPU0's mapping as CPU0 <-> Node0.
> 
> Associativity should changes, otherwise we won't continue the updating,
> and empty updates[] was confirmed to show up inside
> arch_update_cpu_topology().
> 

Ah, ok, that makes it very clear. So, I agree that your patch is correct,
but I think the comment in your patch can be enhanced a bit. I'll suggest
something if I manage to come up with a better wording.

> What I can't make sure is whether this is legal, notify changes but no
> changes happen sounds weird...however, even if it's legal, a check in
> here still make sense IMHO.
> 

That looks like a bug in the hypervisor/firmware. But the Linux kernel should
be able to handle such NULL updates without crashing. So yes, your patch makes
sense to me.

Thank you!

Regards,
Srivatsa S. Bhat

>>
>>> Thus we should stop the updating in such cases, this patch will achieve
>>> this and fix the issue.
>>>
>>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> CC: Paul Mackerras <paulus@samba.org>
>>> CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
>>> CC: Stephen Rothwell <sfr@canb.auug.org.au>
>>> CC: Andrew Morton <akpm@linux-foundation.org>
>>> CC: Robert Jennings <rcj@linux.vnet.ibm.com>
>>> CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
>>> CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>>> CC: Alistair Popple <alistair@popple.id.au>
>>> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/mm/numa.c |    9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index 30a42e2..6757690 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1591,6 +1591,14 @@ int arch_update_cpu_topology(void)
>>>  		cpu = cpu_last_thread_sibling(cpu);
>>>  	}
>>>
>>> +	/*
>>> +	 * The 'cpu_associativity_changes_mask' could be cleared if
>>> +	 * all the cpus it indicates won't change their node, in
>>> +	 * which case the 'updated_cpus' will be empty.
>>> +	 */
>>> +	if (!cpumask_weight(&updated_cpus))
>>> +		goto out;
>>> +
>>>  	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
>>>
>>>  	/*
>>> @@ -1612,6 +1620,7 @@ int arch_update_cpu_topology(void)
>>>  		changed = 1;
>>>  	}
>>>
>>> +out:
>>>  	kfree(updates);
>>>  	return changed;
>>>  }
>>>
>>


  reply	other threads:[~2014-04-07  9:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-02  3:29 [PATCH] power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update Michael wang
2014-04-03  8:50 ` Srivatsa S. Bhat
2014-04-03  8:50   ` Srivatsa S. Bhat
2014-04-04  3:48   ` Michael wang
2014-04-04  3:48     ` Michael wang
2014-04-07  9:51     ` Srivatsa S. Bhat [this message]
2014-04-07  9:51       ` Srivatsa S. Bhat
2014-04-07 10:15 ` Srivatsa S. Bhat
2014-04-07 10:15   ` Srivatsa S. Bhat
2014-04-08  2:39   ` Michael wang
2014-04-08  2:39     ` Michael wang
2014-04-08  3:19 ` [PATCH v2] " Michael wang
2014-04-08  8:24   ` Srivatsa S. Bhat
2014-04-08  8:24     ` Srivatsa S. Bhat
2014-04-16  3:14   ` Michael wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5342750A.3060307@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=alistair@popple.id.au \
    --cc=jlarrew@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=nfont@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=rcj@linux.vnet.ibm.com \
    --cc=sfr@canb.auug.org.au \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=wangyun@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.