Linux RCU subsystem development
From: Joel Fernandes <joel@joelfernandes.org>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Pingfan Liu <kernelfans@gmail.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	rcu@vger.kernel.org, David Woodhouse <dwmw@amazon.co.uk>,
	Neeraj Upadhyay <quic_neeraju@quicinc.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>
Subject: Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining
Date: Wed, 9 Nov 2022 18:55:36 +0000
Message-ID: <Y2v3qM3hwca7Rc2P@google.com>
In-Reply-To: <20221107160726.GA3892067@paulmck-ThinkPad-P17-Gen-1>

On Mon, Nov 07, 2022 at 08:07:26AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 03, 2022 at 09:51:43AM -0700, Paul E. McKenney wrote:
> > On Mon, Oct 31, 2022 at 11:24:37AM +0800, Pingfan Liu wrote:
> > > On Fri, Oct 28, 2022 at 1:46 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Mon, Oct 10, 2022 at 09:55:26AM +0800, Pingfan Liu wrote:
> > > > > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > [...]
> > > > >
> > > > > > >
> > > > > > > But unfortunately, I did not keep the data. I will run it again and
> > > > > > > submit the data.
> > > > > >
> > > > >
> > > > > I have finished the test on a machine with two sockets and 256 cpus.
> > > > > The test runs against the kernel with three commits reverted.
> > > > >   96926686deab ("rcu: Make CPU-hotplug removal operations enable tick")
> > > > >   53e87e3cdc15 ("timers/nohz: Last resort update jiffies on nohz_full
> > > > > IRQ entry")
> > > > >   a1ff03cd6fb9c5 ("tick: Detect and fix jiffies update stall")
> > > > >
> > > > > Summary from console.log
> > > > > "
> > > > >  --- Sat Oct  8 11:34:02 AM EDT 2022 Test summary:
> > > > > Results directory:
> > > > > /home/linux/tools/testing/selftests/rcutorture/res/2022.10.07-23.10.54
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration
> > > > > 125h --bootargs rcutorture.onoff_interval=200
> > > > > rcutorture.onoff_holdoff=30 --configs 32*TREE04
> > > > > TREE04 ------- 1365444 GPs (3.03432/s) n_max_cbs: 850290
> > > > > TREE04 no success message, 2897 successful version messages
> > > > > Completed in 44512 vs. 450000
> > > > > TREE04.10 ------- 1331565 GPs (2.95903/s) n_max_cbs: 909075
> > > > > TREE04.10 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.11 ------- 1331535 GPs (2.95897/s) n_max_cbs: 1213974
> > > > > TREE04.11 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.12 ------- 1322160 GPs (2.93813/s) n_max_cbs: 2615313
> > > > > TREE04.12 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.13 ------- 1320032 GPs (2.9334/s) n_max_cbs: 914751
> > > > > TREE04.13 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.14 ------- 1339969 GPs (2.97771/s) n_max_cbs: 1560203
> > > > > TREE04.14 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.15 ------- 1318805 GPs (2.93068/s) n_max_cbs: 1757478
> > > > > TREE04.15 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > TREE04.16 ------- 1340633 GPs (2.97918/s) n_max_cbs: 1377647
> > > > > TREE04.16 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > TREE04.17 ------- 1322798 GPs (2.93955/s) n_max_cbs: 1266344
> > > > > TREE04.17 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.18 ------- 1346302 GPs (2.99178/s) n_max_cbs: 1030713
> > > > > TREE04.18 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.19 ------- 1322499 GPs (2.93889/s) n_max_cbs: 917118
> > > > > TREE04.19 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > ...
> > > > > TREE04.4 ------- 1310283 GPs (2.91174/s) n_max_cbs: 2146905
> > > > > TREE04.4 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.5 ------- 1333238 GPs (2.96275/s) n_max_cbs: 1027172
> > > > > TREE04.5 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.6 ------- 1313915 GPs (2.91981/s) n_max_cbs: 1017511
> > > > > TREE04.6 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.7 ------- 1341871 GPs (2.98194/s) n_max_cbs: 816265
> > > > > TREE04.7 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.8 ------- 1339412 GPs (2.97647/s) n_max_cbs: 1316404
> > > > > TREE04.8 no success message, 2897 successful version messages
> > > > > Completed in 44511 vs. 450000
> > > > > TREE04.9 ------- 1327240 GPs (2.94942/s) n_max_cbs: 1409531
> > > > > TREE04.9 no success message, 2897 successful version messages
> > > > > Completed in 44510 vs. 450000
> > > > > 32 runs with runtime errors.
> > > > >  --- Done at Sat Oct  8 11:34:10 AM EDT 2022 (12:23:16) exitcode 2
> > > > > "
> > > > > I have no idea about the test so just arbitrarily pick up the
> > > > > console.log of TREE04.10 as an example. Please get it from attachment.
> > > >
> > > > Very good, thank you!
> > > >
> > > > Could you please clearly indicate what you tested?  For example, if
> > > > you have an externally visible git tree, please point me at the tree
> > > > and the SHA-1.  Or send a patch series clearly indicating what it is
> > > > based on.
> > > >
> > > 
> > > Yes, it is a good way to eliminate any unexpected mistakes before a rigid test.
> > > 
> > > Please clone it from https://github.com/pfliu/linux.git  branch:
> > > rcu#revert_tick_dep
> > 
> > Thank you very much!
> > 
> > > > Then I can try a long run on a larger collection of systems.
> > > >
> > > 
> > > Thank you very much.
> > > 
> > > > If that works out, we can see about adjustments to mainline.  ;-)
> > > >
> > > 
> > > Eager to see.
> > 
> > I ran 200 hours of TREE04 and got an RCU CPU stall warning.  I ran 2000
> > hours on v6.0, which precedes these commits, and everything passed.
> > 
> > I will run more, primarily on v6.0, but that is what I have thus far.
> > At the moment, I have some concerns about this change.
> 
> OK, so I have run a total of 8000 hours on v6.0 without failure.  I have
> run 4200 hours on rcu#revert_tick_dep with 15 failures.  The ones I
> looked at were RCU CPU stall warnings with timer failures.
> 
> This data suggests that the kernel is not yet ready for that commit
> to be reverted.
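
For a rough sense of scale on the numbers quoted above (15 failures in
4200 hours on rcu#revert_tick_dep versus none in 8000 hours on v6.0), here
is a minimal back-of-the-envelope sketch in Python. It assumes failures
arrive as a Poisson process at a constant rate, which is a simplification
and not something claimed in this thread; the function names are made up
for illustration.

```python
import math


def failures_per_1000h(failures, hours):
    """Observed failure rate, scaled to failures per 1000 test hours."""
    return 1000.0 * failures / hours


def prob_zero_failures(rate_per_hour, hours):
    """P(no failures in `hours`) for a Poisson process (assumed model)."""
    return math.exp(-rate_per_hour * hours)


# Figures taken from the quoted test reports.
revert_failures, revert_hours = 15, 4200   # rcu#revert_tick_dep
v60_hours = 8000                           # v6.0, no failures observed

revert_rate = revert_failures / revert_hours  # failures per hour

print("revert tree: %.2f failures per 1000 hours"
      % failures_per_1000h(revert_failures, revert_hours))
print("P(0 failures in %d h at that rate) = %.1e"
      % (v60_hours, prob_zero_failures(revert_rate, v60_hours)))
```

Under that assumption the reverted tree fails about 3.57 times per 1000
hours, and a clean 8000-hour run at that rate would have a probability on
the order of 1e-13, so the difference between the two trees is unlikely
to be noise.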

Even if the tests pass, can we really survive without the patch
that he reverted?
https://github.com/pfliu/linux/commit/03179ef33e8e2608184ade99a27f760f9d01e6b7

If stop-machine on a CPU spends a good amount of time in kernel mode while a
grace period starts on another CPU, then we're kind of screwed if we don't
have the tick enabled, right?

Or did we make any changes to stop-machine such that this is no longer an
issue?

thanks,

 - Joel

