Linux RCU subsystem development
From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Zhouyi Zhou <zhouzhouyi@gmail.com>,
	"moderated list:ARM/STM32 ARCHITECTURE" 
	<linux-arm-kernel@lists.infradead.org>,
	Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	rcu <rcu@vger.kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: arm64 torture test hotplug failures (offlining causes -EBUSY)
Date: Wed, 18 Jan 2023 22:37:08 +0000
Message-ID: <Y8h0lDnQ9Z0VtXfV@google.com>
In-Reply-To: <20230118040058.GV2948950@paulmck-ThinkPad-P17-Gen-1>

On Tue, Jan 17, 2023 at 08:00:58PM -0800, Paul E. McKenney wrote:
[...]
> > > > > Is there a plan to make CPU hotplug failures more frequent?
> > > >
> > > > I am not aware of such a plan but I was going by "There are quite some
> > > > reasons why a CPU-hotplug or a hot-unplug operation can fail, which is
> > > > not a fatal problem, really." in [1].
> > > >
> > > > What about an rcutorture option to skip hotplug for a certain CPU id,
> > > > e.g. rcutorture.skip_hotplug_cpus="0"? It can be a last resort. But
> > > > we/I should debug this issue more before getting to that.
> > >
> > > Yes, in fact there already are some checks along those lines, for example,
> > > the torture_offline() function's check of cpu_is_hotpluggable().  So for
> > > example, as I understand it, a CONFIG_NO_HZ_FULL=y system should mark
> > > the housekeeping CPU as !cpu_is_hotpluggable().
> > 
> > I don't think CONFIG_NO_HZ_FULL does any such marking (at least I am
> > not seeing it). Even on x86, if you enable
> > CONFIG_BOOTPARAM_HOTPLUG_CPU0=y and CONFIG_NO_HZ_FULL=y, and run
> > rcutorture with boot args:
> > 
> > nohz_full=0-3 rcutorture.onoff_interval=100 rcutorture.onoff_holdoff=2
> > rcutorture.shutdown_secs=30
> > 
> > You will see this in the kernel logs:
> > [    2.816022] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> > [    2.975913] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> > 
> > So the RCU torture test clearly thought the CPUs were hot-pluggable, when
> > there was a chance for them to return -EBUSY (due to housekeeping and
> > whatnot). So this issue seems to be architecture independent, in that
> > sense.
> > 
> > So the 2 ways forward I see are:
> > - Make the torture test aware of which CPUs are 'housekeeping' CPUs
> > - Make it possible to turn off CPU0 hotplugging on ARM64 by default
> > (via CONFIG or boot option).
> > 
> > Another option could be to forgive -EBUSY on CPU0 for
> > CONFIG_NO_HZ_FULL=y.  Is it possible to assign a non-0 CPU id as a
> > housekeeping CPU?
> 
> I would be happier to forgive failure to offline housekeeping CPUs than
> blanket forgiveness of CPU 0.  Especially given that I recently got
> burned by a non-zero boot cpu.  ;-)
> 
> But wouldn't it be even better for cpu_is_hotpluggable() to know the
> NO_HZ_FULL rules of the road?

That's a great idea. I found a way to do that without having to do the
EXPORT_SYMBOL (like in Zhouyi's patch).

Would the following be acceptable (only build-tested)?

I can run more tests and submit a patch:

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 55405ebf23ab..f73bc520b70e 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -487,7 +487,8 @@ static const struct attribute_group *cpu_root_attr_groups[] = {
 bool cpu_is_hotpluggable(unsigned int cpu)
 {
 	struct device *dev = get_cpu_device(cpu);
-	return dev && container_of(dev, struct cpu, dev)->hotpluggable;
+	return dev && container_of(dev, struct cpu, dev)->hotpluggable
+		&& tick_nohz_cpu_hotpluggable(cpu);
 }
 EXPORT_SYMBOL_GPL(cpu_is_hotpluggable);
 
diff --git a/include/linux/tick.h b/include/linux/tick.h
index bfd571f18cfd..9459fef5b857 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -216,6 +216,7 @@ extern void tick_nohz_dep_set_signal(struct task_struct *tsk,
 				     enum tick_dep_bits bit);
 extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
 				       enum tick_dep_bits bit);
+extern bool tick_nohz_cpu_hotpluggable(unsigned int cpu);
 
 /*
  * The below are tick_nohz_[set,clear]_dep() wrappers that optimize off-cases
@@ -280,6 +281,7 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
 
 static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
 static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }
+static inline bool tick_nohz_cpu_hotpluggable(unsigned int cpu) { return true; }
 
 static inline void tick_dep_set(enum tick_dep_bits bit) { }
 static inline void tick_dep_clear(enum tick_dep_bits bit) { }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9c6f661fb436..d1cc7525240e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -522,6 +522,11 @@ static int tick_nohz_cpu_down(unsigned int cpu)
 	return 0;
 }
 
+bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
+{
+	return tick_nohz_cpu_down(cpu) == 0;
+}
+
 void __init tick_nohz_init(void)
 {
 	int cpu, ret;
-- 
2.39.0.246.g2a6d74b583-goog
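
To make the intended polarity of the check easier to review, here is a
small user-space model of the combined predicate. It is only a sketch:
the model_* names, NR_MODEL_CPUS and the three state variables are
hypothetical stand-ins, not kernel symbols, and the -EBUSY condition
inside model_tick_nohz_cpu_down() is my assumption about what
tick_nohz_cpu_down() checks (nohz_full active and the CPU being the
timekeeping CPU), since the diff context above only shows its tail.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4		/* hypothetical CPU count for the model */
#define MODEL_EBUSY 16		/* matches "errno -16" in the rcutorture log */

/* Hypothetical stand-ins for kernel state, not real kernel variables. */
static bool model_dev_hotpluggable[NR_MODEL_CPUS] = { true, true, true, true };
static bool model_nohz_full_running = true;	/* as if booted with nohz_full= */
static int model_timekeeper_cpu = 0;		/* CPU doing housekeeping duty */

/* Assumed behaviour of tick_nohz_cpu_down(): refuse the timekeeping CPU. */
static int model_tick_nohz_cpu_down(int cpu)
{
	if (model_nohz_full_running && model_timekeeper_cpu == cpu)
		return -MODEL_EBUSY;
	return 0;
}

/* Mirrors the new helper: hotpluggable iff the down callback would succeed. */
static bool model_tick_nohz_cpu_hotpluggable(int cpu)
{
	return model_tick_nohz_cpu_down(cpu) == 0;
}

/* Mirrors cpu_is_hotpluggable(): device flag AND the tick-side answer. */
static bool model_cpu_is_hotpluggable(int cpu)
{
	return cpu >= 0 && cpu < NR_MODEL_CPUS
		&& model_dev_hotpluggable[cpu]
		&& model_tick_nohz_cpu_hotpluggable(cpu);
}
```

With this state, model_cpu_is_hotpluggable(0) is false and CPUs 1-3 are
true, so a caller like torture_offline() would skip CPU 0 instead of
logging an -EBUSY failure. With model_nohz_full_running cleared, every
CPU is reported hotpluggable again, which is what the
!CONFIG_NO_HZ_FULL stub returning true is meant to give.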

