public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Su Yue <suy.fnst@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, josh@joshtriplett.org,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	jiangshanlai@gmail.com, "Li, Philip" <philip.li@intel.com>,
	lkp-developer@eclists.intel.com
Subject: Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
Date: Tue, 22 Jan 2019 19:22:51 -0800
Message-ID: <20190123032251.GG4240@linux.ibm.com>
In-Reply-To: <c2cf5125-2545-c325-0393-0dba4aab379d@cn.fujitsu.com>

On Tue, Jan 22, 2019 at 04:42:19PM +0800, Su Yue wrote:
> Thanks for your quick reply! Paul
> 
> On 1/22/19 12:01 PM, Paul E. McKenney wrote:
> >On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
> >>Hi, guys
> >>   While running rcutorture tests with "onoff_interval", some tests
> >>failed and the results look like this:
> >>
> >>=====================================================================
> >>[  316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
> >>nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
> >>test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1
> >>fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
> >>test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
> >>stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0
> >>n_barrier_cbs=0 onoff_interval=3 onoff_holdoff=0
> >>====================================================================
> >>
> >>I am wondering about the meaning of "RCU_HOTPLUG". Is it expected because
> >>CPU hotplug is enabled in the test? Or does it just represent another
> >>type of failure?
> >
> >This says that at least one CPU hotplug operation failed, that is,
> >the CPU didn't actually come online or go offline as requested.  If you
> >are introducing CPU hotplug to an architecture, this usually indicates
> >that you have bugs in your CPU-hotplug code.  Or it might be that
> 
> It should be the first case, since there are no RCU CPU stall warnings.
> 
> >RCU grace periods failed to progress -- though this would normally
> >also result in RCU CPU stall warnings.
> >
> >There should be lines containing "ver:" in your console output.  What
> >does one of the later ones say?
> >
> 
> The line says:
> ======================================================================
> [  318.850175] busted_srcud-torture: rtc:           (null) ver:
> 27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke:
> 0 rtbre: 0 rtbf: 0 rtb: 0
> nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542
> (HZ=1000) barrier: 0/0:0

Yes, you have many more offline attempts (5310) than successes (2640),
which is why RCU_HOTPLUG was printed.
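
For anyone wanting to check this mechanically, here is a minimal userspace
sketch that pulls the four counters out of the "onoff:" field and compares
successes against attempts.  It assumes the field order is
online successes/attempts, then offline successes/attempts; the struct and
the helper names parse_onoff() and hotplug_failed() are made up for this
example and are not part of rcutorture.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the four "onoff:" counters. */
struct onoff_counts {
	long online_ok, online_tries;
	long offline_ok, offline_tries;
};

/*
 * Hypothetical helper: find "onoff: a/b:c/d" in a stats line.
 * Assumed order: online successes/attempts, offline successes/attempts.
 * Returns 0 on success, -1 if the field is missing or malformed.
 */
static int parse_onoff(const char *line, struct onoff_counts *c)
{
	const char *p;

	assert(line && c);
	p = strstr(line, "onoff: ");
	if (!p)
		return -1;
	if (sscanf(p, "onoff: %ld/%ld:%ld/%ld",
		   &c->online_ok, &c->online_tries,
		   &c->offline_ok, &c->offline_tries) != 4)
		return -1;
	return 0;
}

/* Nonzero if any online or offline attempt did not succeed. */
static int hotplug_failed(const struct onoff_counts *c)
{
	return c->online_ok != c->online_tries ||
	       c->offline_ok != c->offline_tries;
}
```

Feeding it the line quoted above gives 2639/2639 for online and 2640/5310
for offline, so hotplug_failed() returns nonzero.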

> =====================================================================
> 
> And here are useful errors:
> =====================================================================
> kern  :info  : [  135.379693] KVM setup async PF for cpu 1
> kern  :info  : [  135.381412] kvm-stealtime: cpu 1, msr 23fd16180
> kern  :alert : [  135.386897] busted_srcud-torture:torture_onoff

Just so you know, busted_srcud can sometimes fail by design.  Hence
the "busted" in the name.  But failure didn't happen this time.

> task: onlined 1
> kern  :alert : [  135.408241] busted_srcud-torture:torture_onoff
> task: offlining 1
> kern  :info  : [  135.423310] Unregister pv shared memory for cpu 1
> kern  :info  : [  135.427940] smpboot: CPU 1 is now offline
> kern  :alert : [  135.430106] busted_srcud-torture:torture_onoff
> task: offlined 1
> kern  :alert : [  135.436404] busted_srcud-torture:torture_onoff
> task: offlining 0
> kern  :alert : [  135.446173] busted_srcud-torture:torture_onoff
> task: offline 0 failed: errno -16
> kern  :alert : [  135.453076] busted_srcud-torture:torture_onoff
> task: offlining 0
> kern  :alert : [  135.457461] busted_srcud-torture:torture_onoff
> task: offline 0 failed: errno -16
> 
> 
> =====================================================================
> There are only two CPUs on the VM. The torture test tries to offline the
> last one, but the attempt fails with -EBUSY.
> 
> I spent some time reading kernel/torture.c.
> Here is the loop in torture_onoff():
> 
>         while (!torture_must_stop()) {
>                 cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
>                 if (!torture_offline(cpu,
>                                      &n_offline_attempts, &n_offline_successes,
>                                      &sum_offline, &min_offline, &max_offline))
>                         torture_online(cpu,
>                                        &n_online_attempts, &n_online_successes,
>                                        &sum_online, &min_online, &max_online);
>                 schedule_timeout_interruptible(onoff_interval);
>         }
> 
> torture_offline() and torture_online() don't check in advance whether the
> current CPU is the only usable one.

That does appear to be the case, and that would be a problem with
the CONFIG_BOOTPARAM_HOTPLUG_CPU0 listed below.

Good catch!

> Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
> there is only one online and hotpluggable CPU, then
> n_offline_successes != n_offline_attempts, which causes "End of test:
> RCU_HOTPLUG".
> 
> Do I misunderstand something above? Feel free to correct me.

Does the following patch help?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/torture.c b/kernel/torture.c
index a03ff722352b..2b6700ca2a43 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -101,6 +101,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
 
 	if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
 		return false;
+	if (num_online_cpus() <= 1)
+		return false;  /* Can't offline the last CPU. */
 
 	if (verbose > 1)
 		pr_alert("%s" TORTURE_FLAG
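
------------------------------------------------------------------------

To see why the extra check should bring attempts and successes back into
agreement, here is a rough userspace simulation of the onoff loop on a
two-CPU system.  Everything in it is an assumption made for illustration:
the sim_*() names are hypothetical, the CPU is picked by simple alternation
rather than torture_random(), and "offlining the last CPU fails" stands in
for the -EBUSY seen in the log.  It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NCPUS 2	/* assumption: two-CPU VM, as in the report */

struct sim {
	bool online[SIM_NCPUS];
	long offl_attempts, offl_successes;
};

static int sim_num_online(const struct sim *s)
{
	int n = 0;

	for (int i = 0; i < SIM_NCPUS; i++)
		n += s->online[i];
	return n;
}

/* Returns true if an offline attempt was made, whether or not it worked. */
static bool sim_offline(struct sim *s, int cpu, bool guard)
{
	if (!s->online[cpu])
		return false;
	if (guard && sim_num_online(s) <= 1)
		return false;	/* the patch's check: no attempt is counted */
	s->offl_attempts++;
	if (sim_num_online(s) > 1) {	/* otherwise: stand-in for -EBUSY */
		s->online[cpu] = false;
		s->offl_successes++;
	}
	return true;
}

static void sim_online(struct sim *s, int cpu)
{
	s->online[cpu] = true;	/* no-op if the CPU is already online */
}

/* Run the loop and return offline attempts minus offline successes. */
static long sim_run(bool guard, int iterations)
{
	struct sim s = { .online = { true, true } };

	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++) {
		int cpu = i % SIM_NCPUS;	/* alternation, not torture_random() */

		if (!sim_offline(&s, cpu, guard))
			sim_online(&s, cpu);
	}
	return s.offl_attempts - s.offl_successes;
}
```

With these assumptions, sim_run(false, 100) returns a positive count of
failed offline attempts, while sim_run(true, 100) returns 0 because the
last online CPU is never counted as an attempt.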

