Re: rcu: endless stalls - Paul E. McKenney

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mike Galbraith <efault@gmx.de>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: Re: rcu: endless stalls
Date: Thu, 14 Jun 2012 09:47:47 -0700	[thread overview]
Message-ID: <20120614164746.GG2458@linux.vnet.ibm.com> (raw)
In-Reply-To: <1339659932.7347.76.camel@marge.simpson.net>

On Thu, Jun 14, 2012 at 09:45:32AM +0200, Mike Galbraith wrote:
> On Wed, 2012-06-13 at 09:12 +0200, Mike Galbraith wrote: 
> > On Wed, 2012-06-13 at 07:56 +0200, Mike Galbraith wrote:
> > 
> > > Question remains though.  Maybe the box hit some other problem that led
> > > to death by RCU gripage, but the info I received indicated the box was
> > > in the midst of a major spin-fest.
> > 
> > To (maybe) speak more clearly, since it's a mutex like any other mutex
> > that loads of CPUs can hit if you've got loads of CPUs, did huge box
> > driver do something that we don't expect so many CPUs to be doing, thus
> > instigate simultaneous exit trouble (ie shoot self in foot), or did that
> > mutex addition create the exit trouble which box appeared to be having?
> 
> Crickets chirping.. I know what _that_ means: "tsk tsk, you dummy" :)

In my case, it means that I suspect you would rather me continue working
on my series of patches to further reduce RCU grace-period-initialization
latencies on large systems than worry about this patch.

If I understand correctly, your patch did what you wanted in the
situation at hand.  I have some quibbles, please see below.

> I suspected that would happen, but asked anyway because I couldn't
> imagine even 4096 CPUs getting tangled up for an _eternity_ trying to go
> to sleep, but the lock which landed after 32-stable where these beasts
> earn their daily fuel rods was splattered all over the event.  Oh well.
> 
> So, I can forget that and just make the thing not gripe itself to death
> should a stall for whatever reason be encountered again.
> 
> Rather than mucking about with rcu_cpu_stall_suppress, how about adjust
> timeout as you proceed, and block report functions?  That way, there's
> no fiddling with things used elsewhere, and it shouldn't matter how
> badly console is being be hammered, you get a full report, and maybe
> even only one.

That sounds like a very good interim approach to me!

> Hm, maybe I should forget hoping to keep check_cpu_stall() happy too,
> and only silently ignore it when busy.

But this is a good way to accumulate a variety of stalls, so not
recommended.

							Thanx, Paul

> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 0da7b88..e9dd654 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -727,24 +727,29 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
>  	rsp->jiffies_stall = jiffies + jiffies_till_stall_check();
>  }
> 
> +int rcu_stall_report_in_progress;
> +
>  static void print_other_cpu_stall(struct rcu_state *rsp)
>  {
>  	int cpu;
>  	long delta;
>  	unsigned long flags;
>  	int ndetected;
> -	struct rcu_node *rnp = rcu_get_root(rsp);
> +	struct rcu_node *root = rcu_get_root(rsp);

s/root/rnp_root/, please -- consistency with other places in the code.
But see below.

> +	struct rcu_node *rnp;
> 
>  	/* Only let one CPU complain about others per time interval. */
> 
> -	raw_spin_lock_irqsave(&rnp->lock, flags);
> +	raw_spin_lock_irqsave(&root->lock, flags);

On a 4096-CPU system, I bet that this needs to be trylock, with
a bare "return" on failure to acquire the lock.

Of course, that fails if someone is holding this lock for some other
reason.  So I believe that you need a separate ->stalllock in the
rcu_state structure for this purpose.  You won't need to disable irqs
when acquiring the lock.

>  	delta = jiffies - rsp->jiffies_stall;
> -	if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp)) {
> -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp) ||
> +			rcu_stall_report_in_progress) {

If you conditionally acquire the new ->stalllock, you shouldn't need
this added check.

> +		raw_spin_unlock_irqrestore(&root->lock, flags);
>  		return;
>  	}
>  	rsp->jiffies_stall = jiffies + 3 * jiffies_till_stall_check() + 3;
> -	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	rcu_stall_report_in_progress++;

And with the new ->stalllock, rcu_stall_report_in_progress can go away.

> +	raw_spin_unlock_irqrestore(&root->lock, flags);
> 
>  	/*
>  	 * OK, time to rat on our buddy...
> @@ -765,16 +770,23 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
>  				print_cpu_stall_info(rsp, rnp->grplo + cpu);
>  				ndetected++;
>  			}
> +
> +		/*
> +		 * Push the timeout back as we go.  With a slow serial
> +		 * console on a large machine, this may take a while.
> +		 */ 
> +		raw_spin_lock_irqsave(&root->lock, flags);
> +		rsp->jiffies_stall = jiffies + 3 * jiffies_till_stall_check() + 3;
> +		raw_spin_unlock_irqrestore(&root->lock, flags);

And the separate ->stalllock should make this unnecessary as well.
However, updating rsp->jiffies_stall periodically is a good idea
in order to decrease cache thrashing on ->stalllock.

>  	}
> 
>  	/*
>  	 * Now rat on any tasks that got kicked up to the root rcu_node
>  	 * due to CPU offlining.
>  	 */
> -	rnp = rcu_get_root(rsp);
> -	raw_spin_lock_irqsave(&rnp->lock, flags);
> -	ndetected = rcu_print_task_stall(rnp);
> -	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	raw_spin_lock_irqsave(&root->lock, flags);
> +	ndetected = rcu_print_task_stall(root);
> +	raw_spin_unlock_irqrestore(&root->lock, flags);

And the separate lock makes this change unnecessary, also.

>  	print_cpu_stall_info_end();
>  	printk(KERN_CONT "(detected by %d, t=%ld jiffies)\n",
> @@ -784,6 +796,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
>  	else if (!trigger_all_cpu_backtrace())
>  		dump_stack();
> 
> +	raw_spin_lock_irqsave(&root->lock, flags);
> +	rcu_stall_report_in_progress--;
> +	raw_spin_unlock_irqrestore(&root->lock, flags);

Ditto here.

> +
>  	/* If so configured, complain about tasks blocking the grace period. */
> 
>  	rcu_print_detail_task_stall(rsp);
> @@ -796,6 +812,17 @@ static void print_cpu_stall(struct rcu_state *rsp)
>  	unsigned long flags;
>  	struct rcu_node *rnp = rcu_get_root(rsp);
> 
> +	raw_spin_lock_irqsave(&rnp->lock, flags);
> +	if (rcu_stall_report_in_progress) {
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +		return;
> +	}
> +
> +	/* Reset timeout, dump_stack() may take a while on large machines. */
> +	rsp->jiffies_stall = jiffies + 3 * jiffies_till_stall_check() + 3;
> +	rcu_stall_report_in_progress++;
> +	raw_spin_unlock_irqrestore(&rnp->lock, flags);

And do trylock (without disabling irqs) here as well.  No need for
rcu_stall_report_in_progress.  You can update rsp->jiffies_stall
to reduce cache thrashing on ->stalllock.

> +
>  	/*
>  	 * OK, time to rat on ourselves...
>  	 * See Documentation/RCU/stallwarn.txt for info on how to debug
> @@ -813,6 +840,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
>  	if (ULONG_CMP_GE(jiffies, rsp->jiffies_stall))
>  		rsp->jiffies_stall = jiffies +
>  				     3 * jiffies_till_stall_check() + 3;
> +	rcu_stall_report_in_progress--;

And ->stalllock makes this unnecessary as well.

>  	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> 
>  	set_need_resched();  /* kick ourselves to get things going. */
> 
> -Mike
>

next prev parent reply	other threads:[~2012-06-14 16:51 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-11 10:06 rcu: endless stalls Mike Galbraith
2012-06-11 13:39 ` Paul E. McKenney
2012-06-11 14:22   ` Mike Galbraith
2012-06-11 16:54     ` Paul E. McKenney
2012-06-11 17:20     ` Mike Galbraith
2012-06-11 18:01       ` Paul E. McKenney
2012-06-11 18:10         ` Mike Galbraith
2012-06-13  3:35           ` Mike Galbraith
2012-06-13  4:31             ` Hugh Dickins
2012-06-13  5:56               ` Mike Galbraith
2012-06-13  7:12                 ` Mike Galbraith
2012-06-14  7:45                   ` Mike Galbraith
2012-06-14 16:47                     ` Paul E. McKenney [this message]
2012-06-14 21:34                       ` Mike Galbraith
2012-06-15  7:49                       ` Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120614164746.GG2458@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.