public inbox for rcu@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH rcu 0/7] kvfree_rcu updates for v6.1
@ 2022-08-31 18:09 Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 1/3] rcu: Back off upon fill_page_cache_func() allocation failure Paul E. McKenney
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:09 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series contains a few kvfree_rcu() updates:

1.	Back off upon fill_page_cache_func() allocation failure, courtesy
	of Michal Hocko.

2.	Fix kfree_rcu_shrink_count() return value, courtesy of "Joel
	Fernandes (Google)".

3.	Update KFREE_DRAIN_JIFFIES interval, courtesy of "Uladzislau Rezki
	(Sony)".

						Thanx, Paul

------------------------------------------------------------------------

 b/kernel/rcu/tree.c |   17 +++++++++--------
 kernel/rcu/tree.c   |   25 ++++++++++++++++++++-----
 2 files changed, 29 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH rcu 1/3] rcu: Back off upon fill_page_cache_func() allocation failure
  2022-08-31 18:09 [PATCH rcu 0/7] kvfree_rcu updates for v6.1 Paul E. McKenney
@ 2022-08-31 18:09 ` Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 2/3] rcu/kfree: Fix kfree_rcu_shrink_count() return value Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 3/3] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval Paul E. McKenney
  2 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:09 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Michal Hocko,
	Uladzislau Rezki, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Josh Triplett, Mathieu Desnoyers, Lai Jiangshan,
	Joel Fernandes

From: Michal Hocko <mhocko@suse.com>

The fill_page_cache_func() function allocates couple of pages to store
kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
allocation which can fail under memory pressure. The function will,
however keep retrying even when the previous attempt has failed.

This retrying is in theory correct, but in practice the allocation is
invoked from workqueue context, which means that if the memory reclaim
gets stuck, these retries can hog the worker for quite some time.
Although the workqueues subsystem automatically adjusts concurrency, such
adjustment is not guaranteed to happen until the worker context sleeps.
And the fill_page_cache_func() function's retry loop is not guaranteed
to sleep (see the should_reclaim_retry() function).

And we have seen this function cause workqueue lockups:

kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
[...]
kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
kernel:   pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
kernel:     in-flight: 1917:fill_page_cache_func
kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor

Originally, we thought that the root cause of this lockup was several
retries with direct reclaim, but this is not yet confirmed.  Furthermore,
we have seen similar lockups without any heavy memory pressure.  This
suggests that there are other factors contributing to these lockups.
However, it is not really clear that endless retries are desireable.

So let's make the fill_page_cache_func() function back off after
allocation failure.

Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 79aea7df4345e..eb435941e92fd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3183,15 +3183,16 @@ static void fill_page_cache_func(struct work_struct *work)
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
-		if (bnode) {
-			raw_spin_lock_irqsave(&krcp->lock, flags);
-			pushed = put_cached_bnode(krcp, bnode);
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
+		if (!bnode)
+			break;
 
-			if (!pushed) {
-				free_page((unsigned long) bnode);
-				break;
-			}
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		pushed = put_cached_bnode(krcp, bnode);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		if (!pushed) {
+			free_page((unsigned long) bnode);
+			break;
 		}
 	}
 
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH rcu 2/3] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-08-31 18:09 [PATCH rcu 0/7] kvfree_rcu updates for v6.1 Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 1/3] rcu: Back off upon fill_page_cache_func() allocation failure Paul E. McKenney
@ 2022-08-31 18:09 ` Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 3/3] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval Paul E. McKenney
  2 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:09 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Uladzislau Rezki, Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

As per the comments in include/linux/shrinker.h, .count_objects callback
should return the number of freeable items, but if there are no objects
to free, SHRINK_EMPTY should be returned. The only time 0 is returned
should be when we are unable to determine the number of objects, or the
cache should be skipped for another reason.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index eb435941e92fd..3d234d536d4c3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3372,7 +3372,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 		atomic_set(&krcp->backoff_page_cache_fill, 1);
 	}
 
-	return count;
+	return count == 0 ? SHRINK_EMPTY : count;
 }
 
 static unsigned long
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH rcu 3/3] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval
  2022-08-31 18:09 [PATCH rcu 0/7] kvfree_rcu updates for v6.1 Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 1/3] rcu: Back off upon fill_page_cache_func() allocation failure Paul E. McKenney
  2022-08-31 18:09 ` [PATCH rcu 2/3] rcu/kfree: Fix kfree_rcu_shrink_count() return value Paul E. McKenney
@ 2022-08-31 18:09 ` Paul E. McKenney
  2 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:09 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

Currently the monitor work is scheduled with a fixed interval of HZ/20,
which is roughly 50 milliseconds. The drawback of this approach is
low utilization of the 512 page slots in scenarios with infrequence
kvfree_rcu() calls.  For example on an Android system:

<snip>
  kworker/3:3-507     [003] ....   470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
  kworker/6:1-76      [006] ....   470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
  kworker/6:1-76      [006] ....   470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
  kworker/3:3-507     [003] ....   471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
  kworker/1:1-73      [001] ....   471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
  kworker/1:1-73      [001] ....   471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
  kworker/0:4-1411    [000] ....   472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
  kworker/0:4-1411    [000] ....   472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
  kworker/6:1-76      [006] ....   472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
<snip>

In many cases, out of 512 slots, fewer than 10 were actually used.
In order to improve batching and make utilization more efficient this
commit sets a drain interval to a fixed 5-seconds interval. Floods are
detected when a page fills quickly, and in that case, the reclaim work
is re-scheduled for the next scheduling-clock tick (jiffy).

After this change:

<snip>
  kworker/7:1-371     [007] ....  5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121
  kworker/7:1-371     [007] ....  5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47
  kworker/7:1-371     [007] ....  5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510
  kworker/7:1-371     [007] ....  5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169
  kworker/7:1-371     [007] ....  5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510
  kworker/5:6-9428    [005] ....  5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123
  kworker/4:7-9434    [004] ....  5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322
  kworker/4:7-9434    [004] ....  5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185
  kworker/7:1-371     [007] ....  5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510
<snip>

Here, all but one of the cases, more than one hundreds slots were used,
representing an order-of-magnitude improvement.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3d234d536d4c3..7b90478b752e8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2832,7 +2832,7 @@ EXPORT_SYMBOL_GPL(call_rcu);
 
 
 /* Maximum number of jiffies to wait before draining a batch. */
-#define KFREE_DRAIN_JIFFIES (HZ / 50)
+#define KFREE_DRAIN_JIFFIES (5 * HZ)
 #define KFREE_N_BATCHES 2
 #define FREE_N_CHANNELS 2
 
@@ -3093,6 +3093,21 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
 	return !!krcp->head;
 }
 
+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	long delay, delay_left;
+
+	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
+	if (delayed_work_pending(&krcp->monitor_work)) {
+		delay_left = krcp->monitor_work.timer.expires - jiffies;
+		if (delay < delay_left)
+			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+		return;
+	}
+	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+}
+
 /*
  * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
@@ -3150,7 +3165,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// work to repeat an attempt. Because previous batches are
 	// still in progress.
 	if (need_offload_krc(krcp))
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);
 
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
@@ -3339,7 +3354,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
@@ -3415,7 +3430,7 @@ void __init kfree_rcu_scheduler_running(void)
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (need_offload_krc(krcp))
-			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+			schedule_delayed_monitor_work(krcp);
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-08-31 18:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-31 18:09 [PATCH rcu 0/7] kvfree_rcu updates for v6.1 Paul E. McKenney
2022-08-31 18:09 ` [PATCH rcu 1/3] rcu: Back off upon fill_page_cache_func() allocation failure Paul E. McKenney
2022-08-31 18:09 ` [PATCH rcu 2/3] rcu/kfree: Fix kfree_rcu_shrink_count() return value Paul E. McKenney
2022-08-31 18:09 ` [PATCH rcu 3/3] rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox