All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neeraj Upadhyay <neeraju@codeaurora.org>
To: David Chen <david.chen@nutanix.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
Date: Wed, 18 Aug 2021 12:49:11 +0530	[thread overview]
Message-ID: <d30dc0f2-1a91-b0fb-cb59-aed0696bfa33@codeaurora.org> (raw)
In-Reply-To: <CO1PR02MB8489899CD7101180B2759D9C94FD9@CO1PR02MB8489.namprd02.prod.outlook.com>



On 8/17/2021 3:32 AM, David Chen wrote:
> 
> 
>> -----Original Message-----
>> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Sent: Monday, August 16, 2021 12:31 PM
>> To: David Chen <david.chen@nutanix.com>
>> Cc: stable@vger.kernel.org; Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
>> Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
>>
>> On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
>>> Hi Greg,
>>>
>>> We recently hit a hung task timeout issue in synchronize_rcu_expedited on
>> 4.14 branch.
>>> The issue seems to be identical to the one described in `fd6bc19d7676
>>> rcu: Fix missed wakeup of exp_wq waiters` Can we backport it to 4.14 and
>> 4.19 branch?
>>> The patch doesn't apply cleanly, but it should be trivial to resolve,
>>> just do this
>>>
>>> -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp-
>>> expedited_sequence) & 0x3]);
>>> +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
>>>
>>> I don't know if we should do it for 4.9, because the handling of sequence
>> number is a bit different.
>>
>> Please provide a working backport, me hand-editing patches does not scale,
>> and this way you get the proper credit for backporting it (after testing it).
> 
> Sure, appended at the end.
> 
>>
>> You have tested, this, right?
> 
> I don't have a good repro for the original issue, so I only ran rcutorture and
> some basic work load test to see if anything obvious went wrong.
> 
>>
>> thanks,
>>
>> greg k-h
> 
> --------
> 
>  From 307a212335fe143027e3a9f7a9d548beead7ba33 Mon Sep 17 00:00:00 2001
> From: Neeraj Upadhyay <neeraju@codeaurora.org>
> Date: Tue, 19 Nov 2019 03:17:07 +0000
> Subject: [PATCH] rcu: Fix missed wakeup of exp_wq waiters
> 
> [ Upstream commit fd6bc19d7676a060a171d1cf3dcbf6fd797eb05f ]
> 
> Tasks waiting within exp_funnel_lock() for an expedited grace period to
> elapse can be starved due to the following sequence of events:
> 
> 1.	Tasks A and B both attempt to start an expedited grace
> 	period at about the same time.	This grace period will have
> 	completed when the lower four bits of the rcu_state structure's
> 	->expedited_sequence field are 0b'0100', for example, when the
> 	initial value of this counter is zero.	Task A wins, and thus
> 	does the actual work of starting the grace period, including
> 	acquiring the rcu_state structure's .exp_mutex and sets the
> 	counter to 0b'0001'.
> 
> 2.	Because task B lost the race to start the grace period, it
> 	waits on ->expedited_sequence to reach 0b'0100' inside of
> 	exp_funnel_lock(). This task therefore blocks on the rcu_node
> 	structure's ->exp_wq[1] field, keeping in mind that the
> 	end-of-grace-period value of ->expedited_sequence (0b'0100')
> 	is shifted down two bits before indexing the ->exp_wq[] field.
> 
> 3.	Task C attempts to start another expedited grace period,
> 	but blocks on ->exp_mutex, which is still held by Task A.
> 
> 4.	The aforementioned expedited grace period completes, so that
> 	->expedited_sequence now has the value 0b'0100'.  A kworker task
> 	therefore acquires the rcu_state structure's ->exp_wake_mutex
> 	and starts awakening any tasks waiting for this grace period.
> 
> 5.	One of the first tasks awakened happens to be Task A.  Task A
> 	therefore releases the rcu_state structure's ->exp_mutex,
> 	which allows Task C to start the next expedited grace period,
> 	which causes the lower four bits of the rcu_state structure's
> 	->expedited_sequence field to become 0b'0101'.
> 
> 6.	Task C's expedited grace period completes, so that the lower four
> 	bits of the rcu_state structure's ->expedited_sequence field now
> 	become 0b'1000'.
> 
> 7.	The kworker task from step 4 above continues its wakeups.
> 	Unfortunately, the wake_up_all() refetches the rcu_state
> 	structure's .expedited_sequence field:
> 
> 	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);

Minor: On these kernel versions, we had rsp pointer, per RCU flavor, 
whereas post 4.20 kernel versions, we have a single rcu_state. So, the 
commit message can be corrected here. The  functionality is mostly
unchanged and same fix is applicable.

> 
> 	This results in the wakeup being applied to the rcu_node
> 	structure's ->exp_wq[2] field, which is unfortunate given that
> 	Task B is instead waiting on ->exp_wq[1].
> 
> On a busy system, no harm is done (or at least no permanent harm is done).
> Some later expedited grace period will redo the wakeup.  But on a quiet
> system, such as many embedded systems, it might be a good long time before
> there was another expedited grace period.  On such embedded systems,
> this situation could therefore result in a system hang.
> 
> This issue manifested as DPM device timeout during suspend (which
> usually qualifies as a quiet time) due to a SCSI device being stuck in
> _synchronize_rcu_expedited(), with the following stack trace:
> 
> 	schedule()
> 	synchronize_rcu_expedited()
> 	synchronize_rcu()
> 	scsi_device_quiesce()
> 	scsi_bus_suspend()
> 	dpm_run_callback()
> 	__device_suspend()
> 
> This commit therefore prevents such delays, timeouts, and hangs by
> making rcu_exp_wait_wake() use its "s" argument consistently instead of
> refetching from rcu_state.expedited_sequence.
> 
> Fixes: 3b5f668e715b ("rcu: Overlap wakeups with next expedited grace period")
> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> Signed-off-by: David Chen <david.chen@nutanix.com>
> ---
>   kernel/rcu/tree_exp.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 46d61b597731..f90d10c1c3c8 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -534,7 +534,7 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
>   			spin_unlock(&rnp->exp_lock);
>   		}
>   		smp_mb(); /* All above changes before wakeup. */
> -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
>   	}
>   	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
>   	mutex_unlock(&rsp->exp_wake_mutex);
> 


Acked-by: Neeraj Upadhyay <neeraju@codeaurora.org>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

      parent reply	other threads:[~2021-08-18  7:19 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16 19:19 Request for backport fd6bc19d7676 to 4.14 and 4.19 branch David Chen
2021-08-16 19:30 ` Greg Kroah-Hartman
2021-08-16 22:02   ` David Chen
2021-08-17  6:16     ` Greg Kroah-Hartman
2021-08-17 18:47       ` David Chen
2021-08-18  6:55         ` Greg Kroah-Hartman
2021-08-19  0:28           ` David Chen
2021-09-23  7:52             ` Greg Kroah-Hartman
2021-08-18  7:19     ` Neeraj Upadhyay [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d30dc0f2-1a91-b0fb-cb59-aed0696bfa33@codeaurora.org \
    --to=neeraju@codeaurora.org \
    --cc=david.chen@nutanix.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.