From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Su Yue <suy.fnst@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, josh@joshtriplett.org,
rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
jiangshanlai@gmail.com, "Li, Philip" <philip.li@intel.com>,
lkp-developer@eclists.intel.com
Subject: Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
Date: Thu, 24 Jan 2019 12:54:52 -0800 [thread overview]
Message-ID: <20190124205452.GU4240@linux.ibm.com> (raw)
In-Reply-To: <8f6fc868-b420-bcf1-6b4d-1ca616aa6e4c@cn.fujitsu.com>
On Thu, Jan 24, 2019 at 03:00:37PM +0800, Su Yue wrote:
> On 1/23/19 11:22 AM, Paul E. McKenney wrote:
> >On Tue, Jan 22, 2019 at 04:42:19PM +0800, Su Yue wrote:
> >>Thanks for your quick reply! Paul
> >>
> >>On 1/22/19 12:01 PM, Paul E. McKenney wrote:
> >>>On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote:
> >>>>Hi, guys
> >>>> While running rcutorture tests with "onoff_interval", some tests
> >>>>failed and results show like:
> >>>>
> >>>>=====================================================================
> >>>>[ 316.354501] srcud-torture:--- End of test: RCU_HOTPLUG:
> >>>>nreaders=1 nfakewriters=4 stat_interval=60 verbose=2
> >>>>test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\
> >>>>s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0
> >>>>test_boost_interval=7 test_boost_duration=4 shutdown_secs=0
> >>>>stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\
> >>>>rrier_cbs=0 onoff_interval=3 onoff_holdoff=0
> >>>>====================================================================
> >>>>
> >>>>I am wondering that meaning of "RCU_HOTPLUG". Is it expected because
> >>>>cpu hotplug is enabled in the test? Or just represents another type of
> >>>>failure?
> >>>
> >>>This says that at least one CPU hotplug operation failed, that is,
> >>>the CPU didn't actually come online or go offline as requested. If you
> >>>are introducing CPU hotplug to an architecture, this usually indicates
> >>>that you have bugs in your CPU-hotplug code. Or it nmight be that
> >>
> >>It should hit the case since there is no RCU CPU stall warnings.
> >>
> >>>RCU grace periods failed to progress -- though this would normally
> >>>also result in RCU CPU stall warnings.
> >>>
> >>>There should be lines containing "ver:" in your console output. What
> >>>does one of the later one of these say?
> >>>
> >>
> >>The line says:
> >>======================================================================
> >>[ 318.850175] busted_srcud-torture: rtc: (null) ver:
> >>27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke:
> >>0 rtbre: 0 rtbf: 0 rtb: 0 \
> >>nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542
> >>(HZ=1000) barrier: 0/0:0
> >
> >Yes, you have many more offline attempts than successes, which is
> >why RCU_HOTPLUG was printed.
> >
> >>=====================================================================
> >>
> >>And here are useful errors:
> >>=====================================================================
> >>kern :info : [ 135.379693] KVM setup async PF for cpu 1
> >>kern :info : [ 135.381412] kvm-stealtime: cpu 1, msr 23fd16180
> >>kern :alert : [ 135.386897] busted_srcud-torture:torture_onoff
> >
> >Just so your know, busted_srcud can sometimes fail by design. Hence
> >the "busted" in the name. But failure didn't happen this time.
> >
>
> Yes..The corner case I mentioned actually happened in every "onoff"
> tests whatever the torture_type is.
>
> >>task: onlined 1
> >>kern :alert : [ 135.408241] busted_srcud-torture:torture_onoff
> >>task: offlining 1
> >>kern :info : [ 135.423310] Unregister pv shared memory for cpu 1
> >>kern :info : [ 135.427940] smpboot: CPU 1 is now offline
> >>kern :alert : [ 135.430106] busted_srcud-torture:torture_onoff
> >>task: offlined 1
> >>kern :alert : [ 135.436404] busted_srcud-torture:torture_onoff
> >>task: offlining 0
> >>kern :alert : [ 135.446173] busted_srcud-torture:torture_onoff
> >>task: offline 0 failed: errno -16
> >>kern :alert : [ 135.453076] busted_srcud-torture:torture_onoff
> >>task: offlining 0
> >>kern :alert : [ 135.457461] busted_srcud-torture:torture_onoff
> >>task: offline 0 failed: errno -16
> >>
> >>
> >>=====================================================================
> >>There are only two CPUs on the VM. Torture try to offline the last one
> >>but -EBUSY occured.
> >>
> >>I spent time to understand kernel/torture.c.
> >>There is torture_onoff():
> >>
> >>225 while (!torture_must_stop()) {
> >>226 cpu = (torture_random(&rand) >> 4) % (maxcpu + 1);
> >>227 if (!torture_offline(cpu,
> >>228 &n_offline_attempts,
> >>&n_offline_successes,
> >>229 &sum_offline, &min_offline,
> >>&max_offline))
> >>230 torture_online(cpu,
> >>231 &n_online_attempts,
> >>&n_online_successes,
> >>232 &sum_online, &min_online,
> >>&max_online);
> >>233 schedule_timeout_interruptible(onoff_interval);
> >>234 }
> >>235
> >>
> >>torture_offline() and torture_offline() don't pre judge if the current
> >>cpu is only one usable.
> >
> >That does appear to be the case, and that would be a problem with
> >the CONFIG_BOOTPARAM_HOTPLUG_CPU0 listed below.
> >
> >Good catch!
> >
> >>Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If
> >>there are only one oneline and hotplugable cpux, then
> >>n_offline_successes != n_offline_attempts which caused "End of test:
> >>RCU_HOTPLUG".
> >>
> >>Does I misunderstand something above? Feel free to correct me.
> >
> >Does the following patch help?
> >
>
> Yes, no more "errnor: -16" in dmesg and "End of test: SUCCESS" is in
> the end.
>
> Thanks for your patch.
> If the patch is to be sent in format, you can add:
>
> Tested-By: Su Yue <suy.fnst@cn.fujitsu.com>
Very good, applied! And especially thank you for finding this and
bringing it to my attention!!!
Thanx, Paul
> ---
> Su
> > Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >diff --git a/kernel/torture.c b/kernel/torture.c
> >index a03ff722352b..2b6700ca2a43 100644
> >--- a/kernel/torture.c
> >+++ b/kernel/torture.c
> >@@ -101,6 +101,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
> > if (!cpu_online(cpu) || !cpu_is_hotpluggable(cpu))
> > return false;
> >+ if (num_online_cpus() <= 1)
> >+ return false; /* Can't offline the last CPU. */
> > if (verbose > 1)
> > pr_alert("%s" TORTURE_FLAG
> >
> >
> >
>
>
prev parent reply other threads:[~2019-01-24 20:55 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-22 3:40 rcutorture: meaning of "End of test: RCU_HOTPLUG" Su Yue
2019-01-22 4:01 ` Paul E. McKenney
2019-01-22 8:42 ` Su Yue
2019-01-23 3:22 ` Paul E. McKenney
2019-01-24 7:00 ` Su Yue
2019-01-24 20:54 ` Paul E. McKenney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190124205452.GU4240@linux.ibm.com \
--to=paulmck@linux.ibm.com \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp-developer@eclists.intel.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=philip.li@intel.com \
--cc=rostedt@goodmis.org \
--cc=suy.fnst@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.