linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: lorenzo.pieralisi@arm.com (Lorenzo Pieralisi)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v3] drivers: psci: PSCI checker module
Date: Wed, 26 Oct 2016 14:17:52 +0100	[thread overview]
Message-ID: <20161026131752.GA15478@red-moon> (raw)
In-Reply-To: <20161025183436.GF3716@linux.vnet.ibm.com>

On Tue, Oct 25, 2016 at 11:34:36AM -0700, Paul E. McKenney wrote:

[...]

> > > +static int __init psci_checker(void)
> > > +{
> > > +	int ret;
> > > +
> > > +	/*
> > > +	 * Since we're in an initcall, we assume that all the CPUs that all
> > > +	 * CPUs that can be onlined have been onlined.
> > > +	 *
> > > +	 * The tests assume that hotplug is enabled but nobody else is using it,
> > > +	 * otherwise the results will be unpredictable. However, since there
> > > +	 * is no userspace yet in initcalls, that should be fine.
> > 
> > I do not think it is. If you run a kernel with, say,
> > CONFIG_LOCK_TORTURE_TEST, cpus may disappear from the radar while
> > running the PSCI checker test itself; that at least would confuse the
> > checker, and that's just an example.
> > 
> > I added Paul to check what are the assumptions behind the torture test
> > hotplug tests, in particular if there are any implicit assumptions for
> > it to work (ie it is the only kernel test hotplugging cpus in and out
> > (?)), what I know is that the PSCI checker assumptions are not correct.
> 
> Both CONFIG_LOCK_TORTURE_TEST and CONFIG_RCU_TORTURE_TEST can and will
> hotplug CPUs.  The locktorture.onoff_holdoff and rcutorture.onoff_holdoff
> kernel parameters can delay the start of CPU-hotplug testing, and in
> my testing I set this delay to 30 seconds after boot.
> 
> One approach would be to make your test refuse to run if either of
> the lock/RCU torture tests was running.  Or do what Lorenzo suggests
> below.  The torture tests aren't crazy enough to offline the last CPU.
> Though they do try, just for effect, in cases where the last CPU is
> marked cpu_is_hotpluggable().  ;-)

Thank you Paul. I have an additional question though. Is there any
implicit assumption in LOCK/RCU torture tests whereby nothing else
in the kernel is hotplugging cpus in/out (through cpu_down()/up())
while they are running ?

I am asking because that's the main reason behind my query. Those tests
hotplug cpus in and out through cpu_down/up() but AFAICS nothing
prevents another piece of code in the kernel to call those functions and
the tests may just fail in that case (ie trying to cpu_down() a cpu
that is not online).

Are false negatives contemplated (or I am missing something) ?

I just would like to understand if what this patch currently does
is safe and sound. I think that calling cpu_down() and cpu_up()
is always safe, but the test can result in false negatives if
other kernel subsystems (eg LOCK torture test) are calling those
APIs in parallel (because cpu_down()/cpu_up() calls can fail - eg
trying to cpu_down() a cpu that is not online any longer, since it
was taken down by another kernel control path), that's the question
I have.

Thanks !
Lorenzo

> 
> 						Thanx, Paul
> 
> > There is something simple you can do to get this "fixed".
> > 
> > You can use the new API James implemented for hibernate,
> > that allows you to disable (ie PSCI CPU OFF) all "secondary" cpus
> > other than the primary one passed in as parameter:
> > 
> > freeze_secondary_cpus(int primary);
> > 
> > that function will _cpu_down() all online cpus other than "primary"
> > in one go, without any interference allowed from other bits of the
> > kernel. It requires an enable_nonboot_cpus() counterpart, and you
> > can do that for every online cpus you detect (actually you can even
> > avoid using the online cpu mask and use the present mask to carry
> > out the test). If there is a resident trusted OS you can just
> > trigger the test with primary == tos_resident_cpu, since all
> > others are bound to fail (well, you can run them and check they
> > do fail, it is a checker after all).
> > 
> > You would lose the capability of hotplugging a "cluster" at a time, but
> > I do not think it is a big problem, the test above would cover it
> > and more importantly, it is safe to execute.
> > 
> > Or we can augment the torture test API to restrict the cpumask
> > it actually uses to offline/online cpus (I am referring to
> > torture_onoff(), kernel/torture.c).
> > 
> > Further comments appreciated since I am not sure I grokked the
> > assumption the torture tests make about hotplugging cpus in and out,
> > I will go through the commits logs again to find more info.
> > 
> > Thanks !
> > Lorenzo
> > 
> > > +	 */
> > > +	nb_available_cpus = num_online_cpus();
> > > +
> > > +	/* Check PSCI operations are set up and working. */
> > > +	ret = psci_ops_check();
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	pr_info("PSCI checker started using %u CPUs\n", nb_available_cpus);
> > > +
> > > +	pr_info("Starting hotplug tests\n");
> > > +	ret = hotplug_tests();
> > > +	if (ret == 0)
> > > +		pr_info("Hotplug tests passed OK\n");
> > > +	else if (ret > 0)
> > > +		pr_err("%d error(s) encountered in hotplug tests\n", ret);
> > > +	else {
> > > +		pr_err("Out of memory\n");
> > > +		return ret;
> > > +	}
> > > +
> > > +	pr_info("Starting suspend tests (%d cycles per state)\n",
> > > +		NUM_SUSPEND_CYCLE);
> > > +	ret = suspend_tests();
> > > +	if (ret == 0)
> > > +		pr_info("Suspend tests passed OK\n");
> > > +	else if (ret > 0)
> > > +		pr_err("%d error(s) encountered in suspend tests\n", ret);
> > > +	else {
> > > +		switch (ret) {
> > > +		case -ENOMEM:
> > > +			pr_err("Out of memory\n");
> > > +			break;
> > > +		case -ENODEV:
> > > +			pr_warn("Could not start suspend tests on any CPU\n");
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	pr_info("PSCI checker completed\n");
> > > +	return ret < 0 ? ret : 0;
> > > +}
> > > +late_initcall(psci_checker);
> > > -- 
> > > 2.10.0
> > > 
> > 
> 

  reply	other threads:[~2016-10-26 13:17 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-20 14:51 [PATCH v3] drivers: psci: PSCI checker module Kevin Brodsky
2016-10-25 15:45 ` Lorenzo Pieralisi
2016-10-25 18:34   ` Paul E. McKenney
2016-10-26 13:17     ` Lorenzo Pieralisi [this message]
2016-10-26 15:18       ` Paul E. McKenney
2016-10-26 17:10         ` Lorenzo Pieralisi
2016-10-26 17:22           ` Paul E. McKenney
2016-10-26 17:35             ` Lorenzo Pieralisi
2016-10-26 18:11               ` Paul E. McKenney
2016-10-27  9:13                 ` Lorenzo Pieralisi
2016-10-27 12:51                   ` Kevin Brodsky
2016-10-27 14:54                     ` Paul E. McKenney
2016-10-27 16:06                       ` Kevin Brodsky
2016-10-27 16:32                         ` Lorenzo Pieralisi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161026131752.GA15478@red-moon \
    --to=lorenzo.pieralisi@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).