Re: [PATCH 08/11] rcu: don't use negative ->rcu_read_lock_nesting

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Lai Jiangshan <laijs@linux.alibaba.com>
To: paulmck@kernel.org
Cc: linux-kernel@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	rcu@vger.kernel.org
Subject: Re: [PATCH 08/11] rcu: don't use negative ->rcu_read_lock_nesting
Date: Mon, 18 Nov 2019 09:54:29 +0800	[thread overview]
Message-ID: <77222fe8-db55-d09f-e8fd-e6f1a10f9dc3@linux.alibaba.com> (raw)
In-Reply-To: <20191117215339.GD2889@paulmck-ThinkPad-P72>



On 2019/11/18 5:53 上午, Paul E. McKenney wrote:
> On Sat, Nov 16, 2019 at 09:04:56PM +0800, Lai Jiangshan wrote:
>> On 2019/11/1 8:33 下午, Paul E. McKenney wrote:
>>> On Thu, Oct 31, 2019 at 10:08:03AM +0000, Lai Jiangshan wrote:
>>>> Negative ->rcu_read_lock_nesting was introduced to prevent
>>>> scheduler deadlock which was just prevented by deferred qs.
>>>> So negative ->rcu_read_lock_nesting is useless now and
>>>> rcu_read_unlock() can be simplified.
>>>>
>>>> And negative ->rcu_read_lock_nesting is bug-prone,
>>>> it is good to kill it.
>>>>
>>>> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
>>>> ---
>>>>    kernel/rcu/tree_exp.h    | 30 ++----------------------------
>>>>    kernel/rcu/tree_plugin.h | 21 +++++----------------
>>>>    2 files changed, 7 insertions(+), 44 deletions(-)
>>>>
>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>> index c0d06bce35ea..9dcbd2734620 100644
>>>> --- a/kernel/rcu/tree_exp.h
>>>> +++ b/kernel/rcu/tree_exp.h
>>>> @@ -621,11 +621,11 @@ static void rcu_exp_handler(void *unused)
>>>>    	 * report the quiescent state, otherwise defer.
>>>>    	 */
>>>>    	if (!t->rcu_read_lock_nesting) {
>>>> +		rdp->exp_deferred_qs = true;
>>>>    		if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
>>>>    		    rcu_dynticks_curr_cpu_in_eqs()) {
>>>> -			rcu_report_exp_rdp(rdp);
>>>> +			rcu_preempt_deferred_qs(t);
>>>>    		} else {
>>>> -			rdp->exp_deferred_qs = true;
>>>>    			set_tsk_need_resched(t);
>>>>    			set_preempt_need_resched();
>>>>    		}
>>>> @@ -646,32 +646,6 @@ static void rcu_exp_handler(void *unused)
>>>>    		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
>>>>    		return;
>>>>    	}
>>>> -
>>>> -	/*
>>>> -	 * The final and least likely case is where the interrupted
>>>> -	 * code was just about to or just finished exiting the RCU-preempt
>>>> -	 * read-side critical section, and no, we can't tell which.
>>>> -	 * So either way, set ->deferred_qs to flag later code that
>>>> -	 * a quiescent state is required.
>>>> -	 *
>>>> -	 * If the CPU is fully enabled (or if some buggy RCU-preempt
>>>> -	 * read-side critical section is being used from idle), just
>>>> -	 * invoke rcu_preempt_deferred_qs() to immediately report the
>>>> -	 * quiescent state.  We cannot use rcu_read_unlock_special()
>>>> -	 * because we are in an interrupt handler, which will cause that
>>>> -	 * function to take an early exit without doing anything.
>>>> -	 *
>>>> -	 * Otherwise, force a context switch after the CPU enables everything.
>>>> -	 */
>>>> -	rdp->exp_deferred_qs = true;
>>>> -	if (rcu_preempt_need_deferred_qs(t) &&
>>>> -	    (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
>>>> -	    WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs()))) {
>>>> -		rcu_preempt_deferred_qs(t);
>>>> -	} else {
>>>> -		set_tsk_need_resched(t);
>>>> -		set_preempt_need_resched();
>>>> -	}
>>>>    }
>>>>    /* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>> index dbded2b8c792..c62631c79463 100644
>>>> --- a/kernel/rcu/tree_plugin.h
>>>> +++ b/kernel/rcu/tree_plugin.h
>>>> @@ -344,8 +344,6 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
>>>>    }
>>>>    /* Bias and limit values for ->rcu_read_lock_nesting. */
>>>> -#define RCU_NEST_BIAS INT_MAX
>>>> -#define RCU_NEST_NMAX (-INT_MAX / 2)
>>>>    #define RCU_NEST_PMAX (INT_MAX / 2)
>>>>    /*
>>>> @@ -373,21 +371,15 @@ void __rcu_read_unlock(void)
>>>>    {
>>>>    	struct task_struct *t = current;
>>>> -	if (t->rcu_read_lock_nesting != 1) {
>>>> -		--t->rcu_read_lock_nesting;
>>>> -	} else {
>>>> +	if (--t->rcu_read_lock_nesting == 0) {
>>>>    		barrier();  /* critical section before exit code. */
>>>> -		t->rcu_read_lock_nesting = -RCU_NEST_BIAS;
>>>> -		barrier();  /* assign before ->rcu_read_unlock_special load */
>>>
>>> But if we take an interrupt here, and the interrupt handler contains
>>> an RCU read-side critical section, don't we end up in the same hole
>>> that resulted in this article when the corresponding rcu_read_unlock()
>>> executes?  https://lwn.net/Articles/453002/
>>
>> Hello, Paul
>>
>> I'm replying the email of V1, which is relying on deferred_qs changes
>> in [PATCH 07/11] (V1).
>> ([PATCH 04/11](V1) relies on it too as you pointed out)
>>
>> I hope I can answer the question wrt https://lwn.net/Articles/453002/
>> maybe partially.
>>
>> With the help of deferred_qs mechanism and the special.b.deferred_qs
>> bit, I HOPED rcu_read_unlock_special() can find if itself is
>> risking in scheduler locks via special.b.deferred_qs bit.
>>
>> --t->rcu_read_lock_nesting;
>> //outmost rcu c.s, rcu_read_lock_nesting is 0. but special is not zero
>> INTERRUPT
>>   // the fallowing code will normally be in_interrupt()
>>   // or NOT in_interrupt() when wakeup_softirqd() in invoke_softirq()
>>   // or NOT in_interrupt() when preempt_shedule_irq()
>>   // or other cases I missed.
>>   scheduler_lock()
>>   rcu_read_lock()
>>   rcu_read_unlock()
>>    // special has been set but with no special.b.deferred_qs
>>    rcu_read_unlock_special()
>>     raise_softirq_irqoff()
>>      wake_up() when !in_interrupt() // dead lock
>>
>> preempt_shedule_irq() is guaranteed to clear rcu_read_unlock_special
>> when rcu_read_lock_nesting = 0 before calling into scheduler locks.
>>
>> But, at least, what caused my hope to be failed was the case
>> wakeup_softirqd() in invoke_softirq() (which was once protected by
>> softirq in about 2 years between ec433f0c5152 and facd8b80c67a).
>> I don't think it is hard to fix it if we keep using
>> special.b.deferred_qs as this V1 series.
> 
> It is quite possible that special.b.deferred_qs might be useful
> for debugging.  But it should now be possible to take care of the
> nohz_full issue for expedited grace periods, which might in turn allow
> rcu_read_unlock_special() to avoid acquiring scheduler locks.
> 
> This could avoid the need for negative ->rcu_read_lock_nesting,
> in turn allowing your simplified _rcu_read_unlock().
> 
> Would you like to do the expedited grace-period modifications, or
> would you rather that I do so?
> 

Hello, Paul

To be honest, I didn't known there was special issue about
nohz_full with expedited grace periods until several days before
you told me. I just thought that it is requested to be expedited
so that we need to wake up something to handle it ASAP.

IOW, I'm not in a position to do the expedited grace-period
modifications before I learnt enough about it. I would be very
obliged that you do so. I believe it will be a better solution
than this one or the one in V2 relying on preempt_count.

Thanks
Lai

next prev parent reply	other threads:[~2019-11-18  1:54 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-31 10:07 [PATCH 00/11] rcu: introduce percpu rcu_preempt_depth Lai Jiangshan
2019-10-31 10:07 ` [PATCH 01/11] rcu: avoid leaking exp_deferred_qs into next GP Lai Jiangshan
2019-10-31 13:43   ` Paul E. McKenney
2019-10-31 18:19     ` Lai Jiangshan
2019-10-31 19:00       ` Paul E. McKenney
2019-10-31 10:07 ` [PATCH 02/11] rcu: fix bug when rcu_exp_handler() in nested interrupt Lai Jiangshan
2019-10-31 13:47   ` Paul E. McKenney
2019-10-31 14:20     ` Lai Jiangshan
2019-10-31 14:31     ` Paul E. McKenney
2019-10-31 15:14       ` Lai Jiangshan
2019-10-31 18:52         ` Paul E. McKenney
2019-11-01  0:19           ` Boqun Feng
2019-11-01  2:29             ` Lai Jiangshan
2019-10-31 10:07 ` [PATCH 03/11] rcu: clean up rcu_preempt_deferred_qs_irqrestore() Lai Jiangshan
2019-10-31 13:52   ` Paul E. McKenney
2019-10-31 15:25     ` Lai Jiangshan
2019-10-31 18:57       ` Paul E. McKenney
2019-10-31 19:02         ` Paul E. McKenney
2019-10-31 10:07 ` [PATCH 04/11] rcu: cleanup rcu_preempt_deferred_qs() Lai Jiangshan
2019-10-31 14:10   ` Paul E. McKenney
2019-10-31 14:35     ` Lai Jiangshan
2019-10-31 15:07       ` Paul E. McKenney
2019-10-31 18:33         ` Lai Jiangshan
2019-10-31 22:45           ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 05/11] rcu: clean all rcu_read_unlock_special after report qs Lai Jiangshan
2019-11-01 11:54   ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 06/11] rcu: clear t->rcu_read_unlock_special in one go Lai Jiangshan
2019-11-01 12:10   ` Paul E. McKenney
2019-11-01 16:58     ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 07/11] rcu: set special.b.deferred_qs before wake_up() Lai Jiangshan
2019-10-31 10:08 ` [PATCH 08/11] rcu: don't use negative ->rcu_read_lock_nesting Lai Jiangshan
2019-11-01 12:33   ` Paul E. McKenney
2019-11-16 13:04     ` Lai Jiangshan
2019-11-17 21:53       ` Paul E. McKenney
2019-11-18  1:54         ` Lai Jiangshan [this message]
2019-11-18 14:57           ` Paul E. McKenney
2019-10-31 10:08 ` [PATCH 09/11] rcu: wrap usages of rcu_read_lock_nesting Lai Jiangshan
2019-10-31 10:08 ` [PATCH 10/11] rcu: clear the special.b.need_qs in rcu_note_context_switch() Lai Jiangshan
2019-10-31 10:08 ` [PATCH 11/11] x86,rcu: use percpu rcu_preempt_depth Lai Jiangshan
2019-11-01 12:58   ` Paul E. McKenney
2019-11-01 13:13     ` Peter Zijlstra
2019-11-01 14:30       ` Paul E. McKenney
2019-11-01 15:32         ` Lai Jiangshan
2019-11-01 16:21           ` Paul E. McKenney
2019-11-01 16:21             ` Paul E. McKenney
2019-11-01 15:47       ` Lai Jiangshan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=77222fe8-db55-d09f-e8fd-e6f1a10f9dc3@linux.alibaba.com \
    --to=laijs@linux.alibaba.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.