public inbox for linux-kernel@vger.kernel.org
* [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
@ 2024-04-03 10:52 Neeraj Upadhyay
  2024-04-04 17:46 ` Paul E. McKenney
  2024-04-04 21:42 ` Frederic Weisbecker
  0 siblings, 2 replies; 6+ messages in thread
From: Neeraj Upadhyay @ 2024-04-03 10:52 UTC (permalink / raw)
  To: paulmck, frederic, joel, urezki, josh, boqun.feng, rostedt,
	mathieu.desnoyers, jiangshanlai, qiang.zhang1211
  Cc: rcu, linux-kernel, neeraj.upadhyay, Neeraj Upadhyay

When all wait heads are in use, which can happen when
rcu_sr_normal_gp_cleanup_work()'s callback processing
is slow, any new synchronize_rcu() user's rcu_synchronize
node's processing is deferred to future grace periods. This
can result in a long list of synchronize_rcu() invocations
waiting for full grace period processing, which can delay
freeing of memory. Mitigate this problem by using the first
node in the list as the wait tail when all wait heads are in use.
While methods to speed up callback processing would be needed
to recover from this situation, allowing new nodes to complete
their grace period can help prevent delays due to a fixed
number of wait head nodes.
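As a rough, self-contained sketch of this fallback (a toy model in plain C, not kernel code: the pool size, struct, and helper names are simplified stand-ins for SR_NORMAL_GP_WAIT_HEAD_MAX, the srs wait-head pool, and rcu_sr_get_wait_head()):

```c
#include <stddef.h>

#define WAIT_HEAD_MAX 5  /* stand-in for SR_NORMAL_GP_WAIT_HEAD_MAX */

struct node { struct node *next; };

static struct node wait_heads[WAIT_HEAD_MAX];
static int heads_in_use;

/* Hand out a free dummy wait head, or NULL when all are in use. */
static struct node *get_wait_head(void)
{
    if (heads_in_use == WAIT_HEAD_MAX)
        return NULL;
    return &wait_heads[heads_in_use++];
}

/*
 * Pick the wait tail for the upcoming grace period: normally a dummy
 * wait head is injected ahead of the queued rcu_synchronize nodes,
 * but when the pool is exhausted the first queued node itself becomes
 * the wait tail, so the users behind it are still covered by _this_
 * grace period rather than deferred to a future one.
 */
static struct node *pick_wait_tail(struct node *first)
{
    struct node *wh = get_wait_head();

    if (wh) {
        wh->next = first;   /* inject a wait-dummy-node */
        return wh;
    }
    return first;           /* all heads busy: reuse the first node */
}
```

When the pool is exhausted, pick_wait_tail() hands back the first queued node itself, mirroring the wait_head = first fallback in the patch below.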

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
---
Changes since v1:
* Fix use-after-free issue in rcu_sr_normal_gp_cleanup() (Frederic)
* Remove WARN_ON_ONCE(!rcu_sr_is_wait_head()) for wait and done tail
  (Frederic)
* Rebase on top of commit 1c56d246027f ("rcu/tree: Reduce wake up
  for synchronize_rcu() common case") (Joel)
---
 kernel/rcu/tree.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a7c7a2b2b527..fe4a59d7cf61 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1464,14 +1464,11 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
  * for this new grace period. Given that there are a fixed
  * number of wait nodes, if all wait nodes are in use
  * (which can happen when kworker callback processing
- * is delayed) and additional grace period is requested.
- * This means, a system is slow in processing callbacks.
- *
- * TODO: If a slow processing is detected, a first node
- * in the llist should be used as a wait-tail for this
- * grace period, therefore users which should wait due
- * to a slow process are handled by _this_ grace period
- * and not next.
+ * is delayed), first node in the llist is used as wait
+ * tail for this grace period. This means, the first node
+ * has to go through additional grace periods before it is
+ * part of the wait callbacks. This should be ok, as
+ * the system is slow in processing callbacks anyway.
  *
  * Below is an illustration of how the done and wait
  * tail pointers move from one set of rcu_synchronize nodes
@@ -1642,7 +1639,6 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
 		return;
 	}
 
-	WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
 	head = done->next;
 	done->next = NULL;
 
@@ -1682,13 +1678,21 @@ static void rcu_sr_normal_gp_cleanup(void)
 
 	rcu_state.srs_wait_tail = NULL;
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_wait_tail);
-	WARN_ON_ONCE(!rcu_sr_is_wait_head(wait_tail));
 
 	/*
 	 * Process (a) and (d) cases. See an illustration.
 	 */
 	llist_for_each_safe(rcu, next, wait_tail->next) {
-		if (rcu_sr_is_wait_head(rcu))
+		/*
+		 * The done tail may reference a rcu_synchronize node.
+		 * Stop at done tail, as using rcu_sr_normal_complete()
+		 * from this path can result in use-after-free. This
+		 * may occur if, following the wake-up of the synchronize_rcu()
+		 * wait contexts and freeing up of node memory,
+		 * rcu_sr_normal_gp_cleanup_work() accesses the done tail and
+		 * its subsequent nodes.
+		 */
+		if (wait_tail->next == rcu_state.srs_done_tail)
 			break;
 
 		rcu_sr_normal_complete(rcu);
@@ -1743,15 +1747,17 @@ static bool rcu_sr_normal_gp_init(void)
 		return start_new_poll;
 
 	wait_head = rcu_sr_get_wait_head();
-	if (!wait_head) {
-		// Kick another GP to retry.
+	if (wait_head) {
+		/* Inject a wait-dummy-node. */
+		llist_add(wait_head, &rcu_state.srs_next);
+	} else {
+		// Kick another GP for first node.
 		start_new_poll = true;
-		return start_new_poll;
+		if (first == rcu_state.srs_done_tail)
+			return start_new_poll;
+		wait_head = first;
 	}
 
-	/* Inject a wait-dummy-node. */
-	llist_add(wait_head, &rcu_state.srs_next);
-
 	/*
 	 * A waiting list of rcu_synchronize nodes should be empty on
 	 * this step, since a GP-kthread, rcu_gp_init() -> gp_cleanup(),
-- 
2.34.1
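For illustration, the stop-at-done-tail rule in the cleanup path can be modeled with a plain singly linked list (a hedged sketch, not the kernel's llist API; complete_ready_nodes() is a hypothetical stand-in for the patched loop in rcu_sr_normal_gp_cleanup()):

```c
#include <stddef.h>

struct node { struct node *next; };

/*
 * Simplified model of the patched cleanup loop: complete the nodes
 * that follow wait_tail, but stop before done_tail. The kworker
 * (rcu_sr_normal_gp_cleanup_work() in the real code) owns done_tail
 * and the nodes behind it; completing them here would let their
 * memory be freed while the kworker may still dereference it.
 */
static int complete_ready_nodes(struct node *wait_tail,
                                const struct node *done_tail)
{
    int completed = 0;
    struct node *n = wait_tail->next;

    while (n && n != done_tail) {   /* never enter kworker territory */
        struct node *next = n->next;

        completed++;    /* stands in for rcu_sr_normal_complete(n) */
        n = next;
    }
    return completed;
}
```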



* Re: [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
  2024-04-03 10:52 [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use Neeraj Upadhyay
@ 2024-04-04 17:46 ` Paul E. McKenney
  2024-04-04 21:42 ` Frederic Weisbecker
  1 sibling, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2024-04-04 17:46 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: frederic, joel, urezki, josh, boqun.feng, rostedt,
	mathieu.desnoyers, jiangshanlai, qiang.zhang1211, rcu,
	linux-kernel, neeraj.upadhyay

On Wed, Apr 03, 2024 at 04:22:12PM +0530, Neeraj Upadhyay wrote:
> When all wait heads are in use, which can happen when
> rcu_sr_normal_gp_cleanup_work()'s callback processing
> is slow, any new synchronize_rcu() user's rcu_synchronize
> node's processing is deferred to future grace periods. This
> can result in a long list of synchronize_rcu() invocations
> waiting for full grace period processing, which can delay
> freeing of memory. Mitigate this problem by using the first
> node in the list as the wait tail when all wait heads are in use.
> While methods to speed up callback processing would be needed
> to recover from this situation, allowing new nodes to complete
> their grace period can help prevent delays due to a fixed
> number of wait head nodes.
> 
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>

Seeing no objections, I have queued this for testing and review alongside
the other synchronize_rcu() speedup patches, thank you!

							Thanx, Paul

> ---
> Changes since v1:
> * Fix use-after-free issue in rcu_sr_normal_gp_cleanup() (Frederic)
> * Remove WARN_ON_ONCE(!rcu_sr_is_wait_head()) for wait and done tail
>   (Frederic)
> * Rebase on top of commit 1c56d246027f ("rcu/tree: Reduce wake up
>   for synchronize_rcu() common case") (Joel)
> ---
>  kernel/rcu/tree.c | 40 +++++++++++++++++++++++-----------------
>  1 file changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index a7c7a2b2b527..fe4a59d7cf61 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1464,14 +1464,11 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
>   * for this new grace period. Given that there are a fixed
>   * number of wait nodes, if all wait nodes are in use
>   * (which can happen when kworker callback processing
> - * is delayed) and additional grace period is requested.
> - * This means, a system is slow in processing callbacks.
> - *
> - * TODO: If a slow processing is detected, a first node
> - * in the llist should be used as a wait-tail for this
> - * grace period, therefore users which should wait due
> - * to a slow process are handled by _this_ grace period
> - * and not next.
> + * is delayed), first node in the llist is used as wait
> + * tail for this grace period. This means, the first node
> + * has to go through additional grace periods before it is
> + * part of the wait callbacks. This should be ok, as
> + * the system is slow in processing callbacks anyway.
>   *
>   * Below is an illustration of how the done and wait
>   * tail pointers move from one set of rcu_synchronize nodes
> @@ -1642,7 +1639,6 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
>  		return;
>  	}
>  
> -	WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
>  	head = done->next;
>  	done->next = NULL;
>  
> @@ -1682,13 +1678,21 @@ static void rcu_sr_normal_gp_cleanup(void)
>  
>  	rcu_state.srs_wait_tail = NULL;
>  	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_wait_tail);
> -	WARN_ON_ONCE(!rcu_sr_is_wait_head(wait_tail));
>  
>  	/*
>  	 * Process (a) and (d) cases. See an illustration.
>  	 */
>  	llist_for_each_safe(rcu, next, wait_tail->next) {
> -		if (rcu_sr_is_wait_head(rcu))
> +		/*
> +		 * The done tail may reference a rcu_synchronize node.
> +		 * Stop at done tail, as using rcu_sr_normal_complete()
> +		 * from this path can result in use-after-free. This
> +		 * may occur if, following the wake-up of the synchronize_rcu()
> +		 * wait contexts and freeing up of node memory,
> +		 * rcu_sr_normal_gp_cleanup_work() accesses the done tail and
> +		 * its subsequent nodes.
> +		 */
> +		if (wait_tail->next == rcu_state.srs_done_tail)
>  			break;
>  
>  		rcu_sr_normal_complete(rcu);
> @@ -1743,15 +1747,17 @@ static bool rcu_sr_normal_gp_init(void)
>  		return start_new_poll;
>  
>  	wait_head = rcu_sr_get_wait_head();
> -	if (!wait_head) {
> -		// Kick another GP to retry.
> +	if (wait_head) {
> +		/* Inject a wait-dummy-node. */
> +		llist_add(wait_head, &rcu_state.srs_next);
> +	} else {
> +		// Kick another GP for first node.
>  		start_new_poll = true;
> -		return start_new_poll;
> +		if (first == rcu_state.srs_done_tail)
> +			return start_new_poll;
> +		wait_head = first;
>  	}
>  
> -	/* Inject a wait-dummy-node. */
> -	llist_add(wait_head, &rcu_state.srs_next);
> -
>  	/*
>  	 * A waiting list of rcu_synchronize nodes should be empty on
>  	 * this step, since a GP-kthread, rcu_gp_init() -> gp_cleanup(),
> -- 
> 2.34.1
> 


* Re: [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
  2024-04-03 10:52 [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use Neeraj Upadhyay
  2024-04-04 17:46 ` Paul E. McKenney
@ 2024-04-04 21:42 ` Frederic Weisbecker
  2024-04-05  2:15   ` Neeraj Upadhyay
  1 sibling, 1 reply; 6+ messages in thread
From: Frederic Weisbecker @ 2024-04-04 21:42 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: paulmck, joel, urezki, josh, boqun.feng, rostedt,
	mathieu.desnoyers, jiangshanlai, qiang.zhang1211, rcu,
	linux-kernel, neeraj.upadhyay

On Wed, Apr 03, 2024 at 04:22:12PM +0530, Neeraj Upadhyay wrote:
> When all wait heads are in use, which can happen when
> rcu_sr_normal_gp_cleanup_work()'s callback processing
> is slow, any new synchronize_rcu() user's rcu_synchronize
> node's processing is deferred to future grace periods. This
> can result in a long list of synchronize_rcu() invocations
> waiting for full grace period processing, which can delay
> freeing of memory. Mitigate this problem by using the first
> node in the list as the wait tail when all wait heads are in use.
> While methods to speed up callback processing would be needed
> to recover from this situation, allowing new nodes to complete
> their grace period can help prevent delays due to a fixed
> number of wait head nodes.
> 
> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>

Looking at it again, I'm not sure it's a good idea to
optimize the thing that far. It's already a tricky state machine
to review, and the workqueue has SR_NORMAL_GP_WAIT_HEAD_MAX - 1 = 4
grace periods' worth of time to execute. Such a tense situation may
happen, of course, but should we really work around it?

I'll let you guys judge. In the meantime, I haven't found any correctness
issues:

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks.


* Re: [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
  2024-04-04 21:42 ` Frederic Weisbecker
@ 2024-04-05  2:15   ` Neeraj Upadhyay
  2024-04-09  8:49     ` Uladzislau Rezki
  0 siblings, 1 reply; 6+ messages in thread
From: Neeraj Upadhyay @ 2024-04-05  2:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: paulmck, joel, urezki, josh, boqun.feng, rostedt,
	mathieu.desnoyers, jiangshanlai, qiang.zhang1211, rcu,
	linux-kernel, neeraj.upadhyay



On 4/5/2024 3:12 AM, Frederic Weisbecker wrote:
> On Wed, Apr 03, 2024 at 04:22:12PM +0530, Neeraj Upadhyay wrote:
>> When all wait heads are in use, which can happen when
>> rcu_sr_normal_gp_cleanup_work()'s callback processing
>> is slow, any new synchronize_rcu() user's rcu_synchronize
>> node's processing is deferred to future grace periods. This
>> can result in a long list of synchronize_rcu() invocations
>> waiting for full grace period processing, which can delay
>> freeing of memory. Mitigate this problem by using the first
>> node in the list as the wait tail when all wait heads are in use.
>> While methods to speed up callback processing would be needed
>> to recover from this situation, allowing new nodes to complete
>> their grace period can help prevent delays due to a fixed
>> number of wait head nodes.
>>
>> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> 
> Looking at it again, I'm not sure it's a good idea to
> optimize the thing that far. It's already a tricky state machine
> to review, and the workqueue has SR_NORMAL_GP_WAIT_HEAD_MAX - 1 = 4
> grace periods' worth of time to execute. Such a tense situation may
> happen, of course, but should we really work around it?
> 
> I'll let you guys judge. In the meantime, I haven't found any correctness

I agree that this adds more complexity for handling a scenario
which is not expected to happen often. Also, this does not help
much to recover from the situation, as most of the callbacks are still
blocked on kworker execution. The intent was to keep the patch ready, in
case we see the fixed SR_NORMAL_GP_WAIT_HEAD_MAX as a blocking factor.
It's fine from my side if we want to hold off on this one. Uladzislau,
what do you think?

> issues:
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> 

Thanks!


- Neeraj

> Thanks.


* Re: [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
  2024-04-05  2:15   ` Neeraj Upadhyay
@ 2024-04-09  8:49     ` Uladzislau Rezki
  2024-04-09 17:14       ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Uladzislau Rezki @ 2024-04-09  8:49 UTC (permalink / raw)
  To: Neeraj Upadhyay, Frederic Weisbecker
  Cc: Frederic Weisbecker, paulmck, joel, urezki, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang.zhang1211, rcu,
	linux-kernel, neeraj.upadhyay

Hello, Neeraj, Frederic!

> 
> On 4/5/2024 3:12 AM, Frederic Weisbecker wrote:
> > On Wed, Apr 03, 2024 at 04:22:12PM +0530, Neeraj Upadhyay wrote:
> >> When all wait heads are in use, which can happen when
> >> rcu_sr_normal_gp_cleanup_work()'s callback processing
> >> is slow, any new synchronize_rcu() user's rcu_synchronize
> >> node's processing is deferred to future grace periods. This
> >> can result in a long list of synchronize_rcu() invocations
> >> waiting for full grace period processing, which can delay
> >> freeing of memory. Mitigate this problem by using the first
> >> node in the list as the wait tail when all wait heads are in use.
> >> While methods to speed up callback processing would be needed
> >> to recover from this situation, allowing new nodes to complete
> >> their grace period can help prevent delays due to a fixed
> >> number of wait head nodes.
> >>
> >> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> > 
> > Looking at it again, I'm not sure it's a good idea to
> > optimize the thing that far. It's already a tricky state machine
> > to review, and the workqueue has SR_NORMAL_GP_WAIT_HEAD_MAX - 1 = 4
> > grace periods' worth of time to execute. Such a tense situation may
> > happen, of course, but should we really work around it?
> > 
> > I'll let you guys judge. In the meantime, I haven't found any correctness
> 
> I agree that this adds more complexity for handling a scenario
> which is not expected to happen often. Also, this does not help
> much to recover from the situation, as most of the callbacks are still
> blocked on kworker execution. The intent was to keep the patch ready, in
> case we see the fixed SR_NORMAL_GP_WAIT_HEAD_MAX as a blocking factor.
> It's fine from my side if we want to hold off on this one. Uladzislau,
> what do you think?
> 
I agree with Frederic, and we discussed this patch with Neeraj! I think
the state machine is a bit complex as of now. Let's hold it off for now.

--
Uladzislau Rezki


* Re: [PATCH v2] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
  2024-04-09  8:49     ` Uladzislau Rezki
@ 2024-04-09 17:14       ` Paul E. McKenney
  0 siblings, 0 replies; 6+ messages in thread
From: Paul E. McKenney @ 2024-04-09 17:14 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Neeraj Upadhyay, Frederic Weisbecker, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang.zhang1211, rcu,
	linux-kernel, neeraj.upadhyay

On Tue, Apr 09, 2024 at 10:49:30AM +0200, Uladzislau Rezki wrote:
> Hello, Neeraj, Frederic!
> 
> > 
> > On 4/5/2024 3:12 AM, Frederic Weisbecker wrote:
> > > On Wed, Apr 03, 2024 at 04:22:12PM +0530, Neeraj Upadhyay wrote:
> > >> When all wait heads are in use, which can happen when
> > >> rcu_sr_normal_gp_cleanup_work()'s callback processing
> > >> is slow, any new synchronize_rcu() user's rcu_synchronize
> > >> node's processing is deferred to future grace periods. This
> > >> can result in a long list of synchronize_rcu() invocations
> > >> waiting for full grace period processing, which can delay
> > >> freeing of memory. Mitigate this problem by using the first
> > >> node in the list as the wait tail when all wait heads are in use.
> > >> While methods to speed up callback processing would be needed
> > >> to recover from this situation, allowing new nodes to complete
> > >> their grace period can help prevent delays due to a fixed
> > >> number of wait head nodes.
> > >>
> > >> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
> > > 
> > > Looking at it again, I'm not sure it's a good idea to
> > > optimize the thing that far. It's already a tricky state machine
> > > to review, and the workqueue has SR_NORMAL_GP_WAIT_HEAD_MAX - 1 = 4
> > > grace periods' worth of time to execute. Such a tense situation may
> > > happen, of course, but should we really work around it?
> > > 
> > > I'll let you guys judge. In the meantime, I haven't found any correctness
> > 
> > I agree that this adds more complexity for handling a scenario
> > which is not expected to happen often. Also, this does not help
> > much to recover from the situation, as most of the callbacks are still
> > blocked on kworker execution. The intent was to keep the patch ready, in
> > case we see the fixed SR_NORMAL_GP_WAIT_HEAD_MAX as a blocking factor.
> > It's fine from my side if we want to hold off on this one. Uladzislau,
> > what do you think?
> > 
> I agree with Frederic, and we discussed this patch with Neeraj! I think
> the state machine is a bit complex as of now. Let's hold it off for now.

There is always the next merge window, should it be required.

							Thanx, Paul
