linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>, linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH] powerpc/numa: add ability to disable and debug topology updates
Date: Mon, 15 Sep 2014 16:54:55 -0700	[thread overview]
Message-ID: <20140915235455.GB5238@linux.vnet.ibm.com> (raw)
In-Reply-To: <1410757536.32643.1.camel@concordia>

Hi Michael,

On 15.09.2014 [15:05:36 +1000], Michael Ellerman wrote:
> On Tue, 2014-09-09 at 13:09 -0700, Nishanth Aravamudan wrote:
> > We have hit a few customer issues with the topology update code (VPHN
> > and PRRN). It would be nice to be able to debug the notifications coming
> > from the hypervisor in both cases to the LPAR, as well as to disable
> > reacting to the notifications, to narrow down the source of the
> > problems. Add a basic level of such functionality, similar to the numa=
> > command-line parameter.
> > 
> > Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> > ---
> > This is pretty rough, but has been useful in the field already. I'm not
> > sure if more information would be useful than this basic amount.
> > 
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 5ae8608ca9f5..6e3b9e3a2ab4 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -3370,6 +3370,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  			e.g. base its process migration decisions on it.
> >  			Default is on.
> >  
> > +	topology_updates= [KNL, PPC, NUMA]
> > +			Format: {off | debug}
> > +			Specify if the kernel should ignore (off) or
> > +			emit more information (debug) when the
> > +			hypervisor sends NUMA topology updates to an
> > +			LPAR.
> 
> Boot-time parameters are kind of a pain, not least because they
> require a reboot to activate.

Thanks for the review, this is a good point. In the case we're
hitting, we're actually crashing at boot because of possibly dodgy
firmware data.

> Does it really need to be a boot param, or could it be a debugfs or
> sysctl flag? ie. do we need to disable it immediately at boot or would
> it be OK if it was /etc/rc.local or similar that turned it off ?

We need it off at boot, potentially. An LPAR does not indicate that it
will or will not respond to the events in any synchronous fashion, so
the hypervisor is free to send them to us whenever.

> It looks like arch_update_cpu_topology() is called quite early, from
> init_sched_domains(), but in that case I don't see how
> cpu_associativity_changes_mask can be non-zero, ie. we'll never do any
> work via that path.

Correct, that path is probably fine. But we can get these hypervisor
events fairly early in boot.

> As far as the debug goes, we could just use pr_debug() with
> CONFIG_DYNAMIC_DEBUG, it's not quite as easy to enable as a kernel
> parameter but for the odd bit of debugging it should be fine.

That's a good point, I wonder with that mechanism if we should perhaps
extend/remove other static debugging methods in favor of that (e.g.,
numa=debug)?

> I guess the downside there is you have to do some work before you know
> if you're printing anything out. More below ...
> 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index d7737a542fd7..72c5ad313cbe 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1160,6 +1160,28 @@ static int __init early_numa(char *p)
> >  }
> >  early_param("numa", early_numa);
> >  
> > +static int topology_updates_enabled = 1;
> > +static int topology_updates_debug = 0;
> > +
> > +static int __init early_topology_updates(char *p)
> > +{
> > +	if (!p)
> > +		return 0;
> > +
> > +	if (strstr(p, "off")) {
> > +		printk(KERN_INFO "Disabling topology updates\n");
> > +		topology_updates_enabled = 0;
> > +	}
> 
> Can you add a:
> 	#define pr_fmt(fmt) "numa: " fmt
> 
> At the top of the file and then use pr_xxx() for these?

Yes.

> > +	if (strstr(p, "debug")) {
> > +		printk(KERN_INFO "Enabling topology updates debug\n");
> > +		topology_updates_debug = 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +early_param("topology_updates", early_topology_updates);
> > +
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  /*
> >   * Find the node associated with a hot added memory section for
> > @@ -1546,6 +1568,9 @@ int arch_update_cpu_topology(void)
> >  	struct device *dev;
> >  	int weight, new_nid, i = 0;
> >  
> > +	if (!topology_updates_enabled)
> > +		return 0;
> > +
> > 	weight = cpumask_weight(&cpu_associativity_changes_mask);
> > 	if (!weight)
> > 		return 0;
> > @@ -1610,6 +1635,25 @@ int arch_update_cpu_topology(void)
> >  	 *
> >  	 * And for the similar reason, we will skip all the following updating.
> >  	 */
> 
> The comment should stay attached to the check below of updated_cpus, ie. this
> block should preceed the comment.

Yep.

> > +	if (topology_updates_debug) {
> > +		char *buf = kmalloc_array(NR_CPUS*5, sizeof(char), GFP_KERNEL);
> > +		cpumask_scnprintf(buf, NR_CPUS*5, &updated_cpus);
> > +		printk(KERN_DEBUG "Topology update for the following CPUs:\n");
> > +		printk(KERN_DEBUG " %s\n", buf);
> > +		printk(KERN_DEBUG "cpumask_weight(&updated_cpus)) = %u\n",
> > +						cpumask_weight(&updated_cpus));
> 
> Do we really need to print the cpumask? The same information is
> available below right, just in a more verbose form?

Yes, sorry, this is a "living" patch, and I haven't fully cleaned it up.
Will remove the cpumask bodge in v2.

> Assuming we can skip that we can just use pr_debug() for these I think
> and drop the debug flag. Or is this a super hot-path that I'm not
> aware of?

Not super hot, but when it breaks, your LPAR is unusable.

> > +		if (cpumask_weight(&updated_cpus)) {
> > +			for (ud = &updates[0]; ud; ud = ud->next) {
> > +				printk(KERN_DEBUG "cpu %d moving from node %d "
> > +						  "to %d\n", ud->cpu,
> > +						  ud->old_nid, ud->new_nid);
> > +			}
> > +		}
> > +		kfree(buf);
> > +	}
> > +
> >  	if (!cpumask_weight(&updated_cpus))
> >  		goto out;
> >  
> > @@ -1807,8 +1851,10 @@ static const struct file_operations topology_ops = {
> >  
> >  static int topology_update_init(void)
> >  {
> > -	start_topology_update();
> > -	proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops);
> > +	if (topology_updates_enabled) {
> > +		start_topology_update();
> > +		proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops);
> > +	}
> 
> It was already broken, but can you fix the (lack of) error handling
> here please?

Yep, will make it the first patch in the series.

-Nish

  reply	other threads:[~2014-09-15 23:55 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-09 20:09 [RFC PATCH] powerpc/numa: add ability to disable and debug topology updates Nishanth Aravamudan
2014-09-15  5:05 ` Michael Ellerman
2014-09-15 23:54   ` Nishanth Aravamudan [this message]
2014-09-17  3:38     ` Michael Ellerman
2014-09-16 19:42 ` Nathan Fontenot
2014-09-16 20:10   ` Nishanth Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140915235455.GB5238@linux.vnet.ibm.com \
    --to=nacc@linux.vnet.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).