From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: "He, Bo" <bo.he@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"josh@joshtriplett.org" <josh@joshtriplett.org>,
"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
"jiangshanlai@gmail.com" <jiangshanlai@gmail.com>,
"Zhang, Jun" <jun.zhang@intel.com>,
"Xiao, Jin" <jin.xiao@intel.com>,
"Zhang, Yanmin" <yanmin.zhang@intel.com>,
"Bai, Jie A" <jie.a.bai@intel.com>
Subject: Re: rcu_preempt caused oom
Date: Tue, 11 Dec 2018 18:24:46 -0800
Message-ID: <20181212022446.GV4170@linux.ibm.com>
In-Reply-To: <CD6925E8781EFD4D8E11882D20FC406D52A18E53@SHSMSX104.ccr.corp.intel.com>
On Wed, Dec 12, 2018 at 01:37:40AM +0000, He, Bo wrote:
> We reproduced the hung_task panic with the patch "Improve diagnostics for failed RCU grace-period start" applied. Unfortunately, possibly because of the loglevel, show_rcu_gp_kthreads() did not print any logs. We will improve the build and rerun the test to double-check.
Well, at least the diagnostics didn't prevent the problem from happening. ;-)
Thanx, Paul
> -----Original Message-----
> From: Paul E. McKenney <paulmck@linux.ibm.com>
> Sent: Tuesday, December 11, 2018 12:47 PM
> To: He, Bo <bo.he@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>; linux-kernel@vger.kernel.org; josh@joshtriplett.org; mathieu.desnoyers@efficios.com; jiangshanlai@gmail.com; Zhang, Jun <jun.zhang@intel.com>; Xiao, Jin <jin.xiao@intel.com>; Zhang, Yanmin <yanmin.zhang@intel.com>; Bai, Jie A <jie.a.bai@intel.com>
> Subject: Re: rcu_preempt caused oom
>
> On Mon, Dec 10, 2018 at 04:38:38PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 10, 2018 at 06:56:18AM +0000, He, Bo wrote:
> > > Hi,
> > > We have started the test with CONFIG_PROVE_RCU=y, and have also added a 2s check to detect the preempt RCU hang. We hope to get more useful logs tomorrow.
> > > I have also enclosed the config and the debug patches for your review.
> >
> > I instead suggest the (lightly tested) debug patch shown below, which
> > tracks wakeups of RCU's grace-period kthreads and dumps them out if a
> > given requested grace period fails to start. Again, it is necessary
> > to build with CONFIG_PROVE_RCU=y, that is, with CONFIG_PROVE_LOCKING=y.
>
> Right. This time without commenting out the wakeup as a test of the diagnostic. :-/
>
> Please use the patch below instead of the one that I sent in my previous email.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit adfc7dff659495a3433d5084256be59eee0ac6df
> Author: Paul E. McKenney <paulmck@linux.ibm.com>
> Date: Mon Dec 10 16:33:59 2018 -0800
>
> rcu: Improve diagnostics for failed RCU grace-period start
>
> Backported from v4.21/v5.0
>
> If a grace period fails to start (for example, because you commented
> out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
> will invoke rcu_check_gp_start_stall(), which will notice and complain.
> However, this complaint is lacking crucial debugging information such
> as when the last wakeup executed and what the value of ->gp_seq was at
> that time. This commit therefore removes the current pr_alert() from
> rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
> which has been updated to print the needed information, which is collected
> by rcu_gp_kthread_wake().
>
> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0b760c1369f7..4bcd8753e293 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -626,25 +626,57 @@ void rcu_sched_force_quiescent_state(void)
> }
> EXPORT_SYMBOL_GPL(rcu_sched_force_quiescent_state);
>
> +/*
> + * Convert a ->gp_state value to a character string.
> + */
> +static const char *gp_state_getname(short gs)
> +{
> + if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> + return "???";
> + return gp_state_names[gs];
> +}
> +
> +/*
> + * Return the root node of the specified rcu_state structure.
> + */
> +static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> +{
> + return &rsp->node[0];
> +}
> +
> /*
> * Show the state of the grace-period kthreads.
> */
> void show_rcu_gp_kthreads(void)
> {
> int cpu;
> + unsigned long j;
> + unsigned long ja;
> + unsigned long jr;
> + unsigned long jw;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> struct rcu_state *rsp;
>
> + j = jiffies;
> for_each_rcu_flavor(rsp) {
> - pr_info("%s: wait state: %d ->state: %#lx\n",
> - rsp->name, rsp->gp_state, rsp->gp_kthread->state);
> + ja = j - READ_ONCE(rsp->gp_activity);
> + jr = j - READ_ONCE(rsp->gp_req_activity);
> + jw = j - READ_ONCE(rsp->gp_wake_time);
> + pr_info("%s: wait state: %s(%d) ->state: %#lx delta ->gp_activity %lu ->gp_req_activity %lu ->gp_wake_time %lu ->gp_wake_seq %ld ->gp_seq %ld ->gp_seq_needed %ld ->gp_flags %#x\n",
> + rsp->name, gp_state_getname(rsp->gp_state),
> + rsp->gp_state,
> + rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL,
> + ja, jr, jw, (long)READ_ONCE(rsp->gp_wake_seq),
> + (long)READ_ONCE(rsp->gp_seq),
> + (long)READ_ONCE(rcu_get_root(rsp)->gp_seq_needed),
> + READ_ONCE(rsp->gp_flags));
> rcu_for_each_node_breadth_first(rsp, rnp) {
> if (ULONG_CMP_GE(rsp->gp_seq, rnp->gp_seq_needed))
> continue;
> - pr_info("\trcu_node %d:%d ->gp_seq %lu ->gp_seq_needed %lu\n",
> - rnp->grplo, rnp->grphi, rnp->gp_seq,
> - rnp->gp_seq_needed);
> + pr_info("\trcu_node %d:%d ->gp_seq %ld ->gp_seq_needed %ld\n",
> + rnp->grplo, rnp->grphi, (long)rnp->gp_seq,
> + (long)rnp->gp_seq_needed);
> if (!rcu_is_leaf_node(rnp))
> continue;
> for_each_leaf_node_possible_cpu(rnp, cpu) {
> @@ -653,8 +685,8 @@ void show_rcu_gp_kthreads(void)
> ULONG_CMP_GE(rsp->gp_seq,
> rdp->gp_seq_needed))
> continue;
> - pr_info("\tcpu %d ->gp_seq_needed %lu\n",
> - cpu, rdp->gp_seq_needed);
> + pr_info("\tcpu %d ->gp_seq_needed %ld\n",
> + cpu, (long)rdp->gp_seq_needed);
> }
> }
> /* sched_show_task(rsp->gp_kthread); */
> @@ -690,14 +722,6 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
> }
> EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);
>
> -/*
> - * Return the root node of the specified rcu_state structure.
> - */
> -static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
> -{
> - return &rsp->node[0];
> -}
> -
> /*
> * Enter an RCU extended quiescent state, which can be either the
> * idle loop or adaptive-tickless usermode execution.
> @@ -1285,16 +1309,6 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
> rsp->n_force_qs_gpstart = READ_ONCE(rsp->n_force_qs);
> }
>
> -/*
> - * Convert a ->gp_state value to a character string.
> - */
> -static const char *gp_state_getname(short gs)
> -{
> - if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names))
> - return "???";
> - return gp_state_names[gs];
> -}
> -
> /*
> * Complain about starvation of grace-period kthread.
> */
> @@ -1693,7 +1707,8 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
> * Don't do a self-awaken, and don't bother awakening when there is
> * nothing for the grace-period kthread to do (as in several CPUs
> * raced to awaken, and we lost), and finally don't try to awaken
> - * a kthread that has not yet been created.
> + * a kthread that has not yet been created. If all those checks are
> + * passed, track some debug information and awaken.
> */
> static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> {
> @@ -1701,6 +1716,8 @@ static void rcu_gp_kthread_wake(struct rcu_state *rsp)
> !READ_ONCE(rsp->gp_flags) ||
> !rsp->gp_kthread)
> return;
> + WRITE_ONCE(rsp->gp_wake_time, jiffies);
> + WRITE_ONCE(rsp->gp_wake_seq, READ_ONCE(rsp->gp_seq));
> swake_up_one(&rsp->gp_wq);
> }
>
> @@ -2802,16 +2819,11 @@ rcu_check_gp_start_stall(struct rcu_state *rsp, struct rcu_node *rnp,
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> return;
> }
> - pr_alert("%s: g%ld->%ld gar:%lu ga:%lu f%#x gs:%d %s->state:%#lx\n",
> - __func__, (long)READ_ONCE(rsp->gp_seq),
> - (long)READ_ONCE(rnp_root->gp_seq_needed),
> - j - rsp->gp_req_activity, j - rsp->gp_activity,
> - rsp->gp_flags, rsp->gp_state, rsp->name,
> - rsp->gp_kthread ? rsp->gp_kthread->state : 0x1ffffL);
> WARN_ON(1);
> if (rnp_root != rnp)
> raw_spin_unlock_rcu_node(rnp_root);
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + show_rcu_gp_kthreads();
> }
>
> /*
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4e74df768c57..0e051d9b5f1a 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -327,6 +327,8 @@ struct rcu_state {
> struct swait_queue_head gp_wq; /* Where GP task waits. */
> short gp_flags; /* Commands for GP task. */
> short gp_state; /* GP kthread sleep state. */
> + unsigned long gp_wake_time; /* Last GP kthread wake. */
> + unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */
>
> /* End of fields guarded by root rcu_node's lock. */
>
>