netdev.vger.kernel.org archive mirror
* [PATCH 0/9] fix softlockup in run_timer_softirq
@ 2023-05-05 11:33 Liu Jian
  2023-05-05 11:33 ` [PATCH 1/9] softirq: Rewrite softirq processing loop Liu Jian
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

I encountered the run_timer_softirq lockup issue[1] many times during fuzz
testing. I analyzed __run_timers() and found the following problem.

In the while loop of __run_timers(), if there are too many timers or
ill-behaved timer handler functions, the processing time of the expired
timers can repeatedly exceed the time wheel's next_expiry, and the
function loops indefinitely.

This series builds on Peter's softirq_needs_break patchset[2] and uses its
timeout/break logic to jump out of the loop.

[1] https://lore.kernel.org/lkml/fb8d80434b2148e78c0032c6c70a8b4d@huawei.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=core/softirq
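
The timeout/break pattern the series applies to __run_timers() and the
other vectors can be sketched as a small userspace simulation. This is an
illustration only: the names mirror the kernel patches, but sched_clock()
is mocked with a fake counter and need_resched() is omitted.

```c
/* Simplified userspace sketch of the series' break logic: a fake
 * monotonic clock stands in for sched_clock(), and the work loop bails
 * out once the 2 ms budget is exhausted instead of spinning forever on
 * an ever-refilling work source. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC      1000000ULL
#define MAX_SOFTIRQ_TIME   (2 * NSEC_PER_MSEC)

static uint64_t fake_clock_ns;              /* stand-in for sched_clock() */
static uint64_t sched_clock(void) { return fake_clock_ns; }

static bool softirq_needs_break(uint64_t start, uint64_t timo)
{
	return sched_clock() - start >= timo;   /* need_resched() omitted */
}

/* Run "expired timer" work until done or over budget; returns how many
 * work items ran before the loop broke out. */
static int run_timers_bounded(int pending_work, uint64_t cost_ns)
{
	uint64_t start = sched_clock();
	int ran = 0;

	while (pending_work--) {
		fake_clock_ns += cost_ns;           /* each handler consumes time */
		ran++;
		if (softirq_needs_break(start, MAX_SOFTIRQ_TIME))
			break;                          /* would re-raise the softirq */
	}
	return ran;
}
```

With expensive handlers the loop now terminates after the budget instead
of running until the work source drains.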

Liu Jian (1):
  softirq, timer: Use softirq_needs_break()

Peter Zijlstra (8):
  softirq: Rewrite softirq processing loop
  softirq: Use sched_clock() based timeout
  softirq: Factor loop termination condition
  softirq: Allow early break
  softirq: Context aware timeout
  softirq: Provide a softirq_needs_break() API
  softirq,net: Use softirq_needs_break()
  softirq,rcu: Use softirq_needs_break()

 Documentation/admin-guide/sysctl/net.rst | 11 +--
 include/linux/interrupt.h                |  5 ++
 kernel/rcu/tree.c                        | 29 ++-----
 kernel/rcu/tree_nocb.h                   |  2 +-
 kernel/softirq.c                         | 97 +++++++++++++++---------
 kernel/time/timer.c                      | 12 ++-
 net/core/dev.c                           |  6 +-
 net/core/dev.h                           |  1 -
 net/core/sysctl_net_core.c               |  8 --
 9 files changed, 86 insertions(+), 85 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/9] softirq: Rewrite softirq processing loop
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-05 11:33 ` [PATCH 2/9] softirq: Use sched_clock() based timeout Liu Jian
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

Simplify the softirq processing loop by using the bitmap APIs.
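
A hypothetical userspace analogue of the rewrite: instead of the old
ffs()-and-shift walk with pointer arithmetic on softirq_vec, each set bit
of the pending mask indexes the vector directly. The names below are
stand-ins, not the kernel implementation.

```c
/* Userspace illustration of walking a pending bitmask the way
 * for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) does: visit each set
 * bit in ascending order and use the bit number as a direct index. */
#include <assert.h>

#define NR_SOFTIRQS 10

static int order[NR_SOFTIRQS];
static int n_ran;

static void run_vec(unsigned int vec_nr)
{
	order[n_ran++] = (int)vec_nr;
}

/* Equivalent of the for_each_set_bit() loop in the patch. */
static void process_pending(unsigned long pending)
{
	for (unsigned int vec_nr = 0; vec_nr < NR_SOFTIRQS; vec_nr++) {
		if (!(pending & (1UL << vec_nr)))
			continue;
		pending &= ~(1UL << vec_nr);   /* __clear_bit(vec_nr, &pending) */
		run_vec(vec_nr);               /* h = softirq_vec + vec_nr */
	}
}
```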

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/softirq.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 1b725510dd0f..bff5debf6ce6 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -531,9 +531,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
+	unsigned long pending;
+	unsigned int vec_nr;
 	bool in_hardirq;
-	__u32 pending;
-	int softirq_bit;
 
 	/*
 	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
@@ -554,15 +554,13 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	local_irq_enable();
 
-	h = softirq_vec;
+	for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) {
+		unsigned int prev_count;
 
-	while ((softirq_bit = ffs(pending))) {
-		unsigned int vec_nr;
-		int prev_count;
+		__clear_bit(vec_nr, &pending);
 
-		h += softirq_bit - 1;
+		h = softirq_vec + vec_nr;
 
-		vec_nr = h - softirq_vec;
 		prev_count = preempt_count();
 
 		kstat_incr_softirqs_this_cpu(vec_nr);
@@ -576,8 +574,6 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
-		h++;
-		pending >>= softirq_bit;
 	}
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
-- 
2.34.1



* [PATCH 2/9] softirq: Use sched_clock() based timeout
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
  2023-05-05 11:33 ` [PATCH 1/9] softirq: Rewrite softirq processing loop Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-08  4:08   ` Jason Xing
  2023-05-05 11:33 ` [PATCH 3/9] softirq: Factor loop termination condition Liu Jian
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

Replace the jiffies based timeout with a sched_clock() based one.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/softirq.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index bff5debf6ce6..59f16a9af5d1 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -27,6 +27,7 @@
 #include <linux/tick.h>
 #include <linux/irq.h>
 #include <linux/wait_bit.h>
+#include <linux/sched/clock.h>
 
 #include <asm/softirq_stack.h>
 
@@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void)
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME	(2 * NSEC_PER_MSEC)
 #define MAX_SOFTIRQ_RESTART 10
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -527,9 +528,9 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
 
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
-	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
+	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
 	unsigned int vec_nr;
@@ -584,7 +585,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
+		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
 		    --max_restart)
 			goto restart;
 
-- 
2.34.1



* [PATCH 3/9] softirq: Factor loop termination condition
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
  2023-05-05 11:33 ` [PATCH 1/9] softirq: Rewrite softirq processing loop Liu Jian
  2023-05-05 11:33 ` [PATCH 2/9] softirq: Use sched_clock() based timeout Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-05 11:33 ` [PATCH 4/9] softirq: Allow early break Liu Jian
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/softirq.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 59f16a9af5d1..48a81d8ae49a 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -477,22 +477,6 @@ asmlinkage __visible void do_softirq(void)
 
 #endif /* !CONFIG_PREEMPT_RT */
 
-/*
- * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
- * certain cases, such as stop_machine(), jiffies may cease to
- * increment and so we need the MAX_SOFTIRQ_RESTART limit as
- * well to make sure we eventually return from this method.
- *
- * These limits have been established via experimentation.
- * The two things to balance is latency against fairness -
- * we want to handle softirqs as soon as possible, but they
- * should not be able to lock up the box.
- */
-#define MAX_SOFTIRQ_TIME	(2 * NSEC_PER_MSEC)
-#define MAX_SOFTIRQ_RESTART 10
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 /*
  * When we run softirqs from irq_exit() and thus on the hardirq stack we need
@@ -526,10 +510,33 @@ static inline bool lockdep_softirq_start(void) { return false; }
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+/*
+ * We restart softirq processing but break the loop if need_resched() is set or
+ * after 2 ms. The MAX_SOFTIRQ_RESTART guarantees a loop termination if
+ * sched_clock() were ever to stall.
+ *
+ * These limits have been established via experimentation.  The two things to
+ * balance is latency against fairness - we want to handle softirqs as soon as
+ * possible, but they should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_TIME	(2 * NSEC_PER_MSEC)
+#define MAX_SOFTIRQ_RESTART	10
+
+static inline bool __softirq_needs_break(u64 start)
+{
+	if (need_resched())
+		return true;
+
+	if (sched_clock() - start >= MAX_SOFTIRQ_TIME)
+		return true;
+
+	return false;
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
+	unsigned int max_restart = MAX_SOFTIRQ_RESTART;
 	unsigned long old_flags = current->flags;
-	int max_restart = MAX_SOFTIRQ_RESTART;
 	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
@@ -585,8 +592,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
-		    --max_restart)
+		if (!__softirq_needs_break(start) && --max_restart)
 			goto restart;
 
 		wakeup_softirqd();
-- 
2.34.1



* [PATCH 4/9] softirq: Allow early break
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
                   ` (2 preceding siblings ...)
  2023-05-05 11:33 ` [PATCH 3/9] softirq: Factor loop termination condition Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-05 12:10   ` Eric Dumazet
  2023-05-05 11:33 ` [PATCH 5/9] softirq: Context aware timeout Liu Jian
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

Allow terminating the softirq processing loop without finishing the
vectors.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/softirq.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 48a81d8ae49a..e2cad5d108c8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -582,6 +582,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
+
+		if (pending && __softirq_needs_break(start))
+			break;
 	}
 
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
@@ -590,13 +593,14 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	local_irq_disable();
 
-	pending = local_softirq_pending();
-	if (pending) {
-		if (!__softirq_needs_break(start) && --max_restart)
-			goto restart;
+	if (pending)
+		or_softirq_pending(pending);
+	else if ((pending = local_softirq_pending()) &&
+		 !__softirq_needs_break(start) &&
+		 --max_restart)
+		goto restart;
 
-		wakeup_softirqd();
-	}
+	wakeup_softirqd();
 
 	account_softirq_exit(current);
 	lockdep_softirq_end(in_hardirq);
-- 
2.34.1



* [PATCH 5/9] softirq: Context aware timeout
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
                   ` (3 preceding siblings ...)
  2023-05-05 11:33 ` [PATCH 4/9] softirq: Allow early break Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-05 11:33 ` [PATCH 6/9] softirq: Provide a softirq_needs_break() API Liu Jian
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

Reduce the softirq timeout when it is preempting an RT task.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/softirq.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index e2cad5d108c8..baa08ae1604f 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -522,12 +522,12 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
 #define MAX_SOFTIRQ_TIME	(2 * NSEC_PER_MSEC)
 #define MAX_SOFTIRQ_RESTART	10
 
-static inline bool __softirq_needs_break(u64 start)
+static inline bool __softirq_needs_break(u64 start, u64 timo)
 {
 	if (need_resched())
 		return true;
 
-	if (sched_clock() - start >= MAX_SOFTIRQ_TIME)
+	if (sched_clock() - start >= timo)
 		return true;
 
 	return false;
@@ -537,6 +537,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
 	unsigned int max_restart = MAX_SOFTIRQ_RESTART;
 	unsigned long old_flags = current->flags;
+	u64 timo = MAX_SOFTIRQ_TIME;
 	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
@@ -556,6 +557,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	in_hardirq = lockdep_softirq_start();
 	account_softirq_enter(current);
 
+	if (__this_cpu_read(ksoftirqd) != current && task_is_realtime(current))
+		timo >>= 2;
+
 restart:
 	/* Reset the pending bitmask before enabling irqs */
 	set_softirq_pending(0);
@@ -583,7 +587,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 			preempt_count_set(prev_count);
 		}
 
-		if (pending && __softirq_needs_break(start))
+		if (pending && __softirq_needs_break(start, timo))
 			break;
 	}
 
@@ -596,7 +600,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 	if (pending)
 		or_softirq_pending(pending);
 	else if ((pending = local_softirq_pending()) &&
-		 !__softirq_needs_break(start) &&
+		 !__softirq_needs_break(start, timo) &&
 		 --max_restart)
 		goto restart;
 
-- 
2.34.1



* [PATCH 6/9] softirq: Provide a softirq_needs_break() API
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
                   ` (4 preceding siblings ...)
  2023-05-05 11:33 ` [PATCH 5/9] softirq: Context aware timeout Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-05 11:33 ` [PATCH 7/9] softirq,net: Use softirq_needs_break() Liu Jian
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

So that all open_softirq() users can employ the same break conditions
and work off the same time-base in case of multiple pending vectors.
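
How an open_softirq() handler would consume the new API can be sketched
in userspace. Everything here is mocked (the struct, the clock, the
raise): a sketch of the calling pattern, not the kernel code.

```c
/* Userspace mock of the proposed softirq_needs_break() API, showing the
 * pattern a vector handler can follow: drain work, but yield and
 * re-raise once the shared budget set up by __do_softirq() is spent. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct softirq_action { uint64_t start; uint64_t timo; };

static uint64_t clock_ns;                   /* stand-in for sched_clock() */

static bool softirq_needs_break(struct softirq_action *h)
{
	return clock_ns - h->start >= h->timo;
}

static int items_left = 100;
static bool reraised;

static void example_action(struct softirq_action *h)
{
	while (items_left > 0) {
		items_left--;
		clock_ns += 100000;                 /* each item "costs" 0.1 ms */
		if (softirq_needs_break(h)) {
			reraised = (items_left > 0);    /* __raise_softirq_irqoff() */
			break;
		}
	}
}
```

All handlers checking the same start/timo pair means multiple pending
vectors share one time-base rather than each consuming a full budget.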

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 include/linux/interrupt.h |  5 +++++
 kernel/softirq.c          | 16 ++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce40b04b..94d1047fe0c3 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -584,8 +584,13 @@ extern const char * const softirq_to_name[NR_SOFTIRQS];
 struct softirq_action
 {
 	void	(*action)(struct softirq_action *);
+	u64	start;
+	u64	timo;
 };
 
+extern struct softirq_action softirq_action_now(void);
+extern bool softirq_needs_break(struct softirq_action *action);
+
 asmlinkage void do_softirq(void);
 asmlinkage void __do_softirq(void);
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index baa08ae1604f..dc4299e40dad 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -533,6 +533,20 @@ static inline bool __softirq_needs_break(u64 start, u64 timo)
 	return false;
 }
 
+bool softirq_needs_break(struct softirq_action *h)
+{
+	return __softirq_needs_break(h->start, h->timo);
+}
+
+struct softirq_action softirq_action_now(void)
+{
+	struct softirq_action h = {
+		.start = sched_clock(),
+		.timo = MAX_SOFTIRQ_TIME,
+	};
+	return h;
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
 	unsigned int max_restart = MAX_SOFTIRQ_RESTART;
@@ -572,6 +586,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 		__clear_bit(vec_nr, &pending);
 
 		h = softirq_vec + vec_nr;
+		h->start = start;
+		h->timo = timo;
 
 		prev_count = preempt_count();
 
-- 
2.34.1



* [PATCH 7/9] softirq,net: Use softirq_needs_break()
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
                   ` (5 preceding siblings ...)
  2023-05-05 11:33 ` [PATCH 6/9] softirq: Provide a softirq_needs_break() API Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-08  6:21   ` Jason Xing
  2023-05-05 11:33 ` [PATCH 8/9] softirq,rcu: " Liu Jian
  2023-05-05 11:33 ` [PATCH 9/9] softirq, timer: " Liu Jian
  8 siblings, 1 reply; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

SoftIRQs now provide their own timeout/break code; use that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 Documentation/admin-guide/sysctl/net.rst | 11 +----------
 net/core/dev.c                           |  6 +-----
 net/core/dev.h                           |  1 -
 net/core/sysctl_net_core.c               |  8 --------
 4 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 466c560b0c30..095c60788c61 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -267,16 +267,7 @@ netdev_budget
 
 Maximum number of packets taken from all interfaces in one polling cycle (NAPI
 poll). In one polling cycle interfaces which are registered to polling are
-probed in a round-robin manner. Also, a polling cycle may not exceed
-netdev_budget_usecs microseconds, even if netdev_budget has not been
-exhausted.
-
-netdev_budget_usecs
----------------------
-
-Maximum number of microseconds in one NAPI polling cycle. Polling
-will exit when either netdev_budget_usecs have elapsed during the
-poll cycle or the number of packets processed reaches netdev_budget.
+probed in a round-robin manner.
 
 netdev_max_backlog
 ------------------
diff --git a/net/core/dev.c b/net/core/dev.c
index 735096d42c1d..70b6726beee6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4321,7 +4321,6 @@ int netdev_tstamp_prequeue __read_mostly = 1;
 unsigned int sysctl_skb_defer_max __read_mostly = 64;
 int netdev_budget __read_mostly = 300;
 /* Must be at least 2 jiffes to guarantee 1 jiffy timeout */
-unsigned int __read_mostly netdev_budget_usecs = 2 * USEC_PER_SEC / HZ;
 int weight_p __read_mostly = 64;           /* old backlog weight */
 int dev_weight_rx_bias __read_mostly = 1;  /* bias for backlog weight */
 int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
@@ -6659,8 +6658,6 @@ static int napi_threaded_poll(void *data)
 static __latent_entropy void net_rx_action(struct softirq_action *h)
 {
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
-	unsigned long time_limit = jiffies +
-		usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
 	int budget = READ_ONCE(netdev_budget);
 	LIST_HEAD(list);
 	LIST_HEAD(repoll);
@@ -6699,8 +6696,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
 		 * Allow this to run for 2 jiffies since which will allow
 		 * an average latency of 1.5/HZ.
 		 */
-		if (unlikely(budget <= 0 ||
-			     time_after_eq(jiffies, time_limit))) {
+		if (unlikely(budget <= 0 || softirq_needs_break(h))) {
 			sd->time_squeeze++;
 			break;
 		}
diff --git a/net/core/dev.h b/net/core/dev.h
index e075e198092c..e64a60c767ab 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -39,7 +39,6 @@ void dev_addr_check(struct net_device *dev);
 
 /* sysctls not referred to from outside net/core/ */
 extern int		netdev_budget;
-extern unsigned int	netdev_budget_usecs;
 extern unsigned int	sysctl_skb_defer_max;
 extern int		netdev_tstamp_prequeue;
 extern int		netdev_unregister_timeout_secs;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 782273bb93c2..59765c805f5b 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -595,14 +595,6 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= SYSCTL_ONE,
 		.extra2		= &max_skb_frags,
 	},
-	{
-		.procname	= "netdev_budget_usecs",
-		.data		= &netdev_budget_usecs,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= SYSCTL_ZERO,
-	},
 	{
 		.procname	= "fb_tunnels_only_for_init_net",
 		.data		= &sysctl_fb_tunnels_only_for_init_net,
-- 
2.34.1



* [PATCH 8/9] softirq,rcu: Use softirq_needs_break()
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
                   ` (6 preceding siblings ...)
  2023-05-05 11:33 ` [PATCH 7/9] softirq,net: Use softirq_needs_break() Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  2023-05-05 11:33 ` [PATCH 9/9] softirq, timer: " Liu Jian
  8 siblings, 0 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

From: Peter Zijlstra <peterz@infradead.org>

SoftIRQs now provide their own timeout/break code; use that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/rcu/tree.c      | 29 +++++++----------------------
 kernel/rcu/tree_nocb.h |  2 +-
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f52ff7241041..1942e3db4145 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -397,10 +397,6 @@ static bool rcu_kick_kthreads;
 static int rcu_divisor = 7;
 module_param(rcu_divisor, int, 0644);
 
-/* Force an exit from rcu_do_batch() after 3 milliseconds. */
-static long rcu_resched_ns = 3 * NSEC_PER_MSEC;
-module_param(rcu_resched_ns, long, 0644);
-
 /*
  * How long the grace period must be before we start recruiting
  * quiescent-state help from rcu_note_context_switch().
@@ -2050,7 +2046,7 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
  * Invoke any RCU callbacks that have made it to the end of their grace
  * period.  Throttle as specified by rdp->blimit.
  */
-static void rcu_do_batch(struct rcu_data *rdp)
+static void rcu_do_batch(struct softirq_action *h, struct rcu_data *rdp)
 {
 	int div;
 	bool __maybe_unused empty;
@@ -2058,7 +2054,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count = 0;
-	long pending, tlimit = 0;
+	long pending;
 
 	/* If no callbacks are ready, just return. */
 	if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
@@ -2082,12 +2078,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	div = READ_ONCE(rcu_divisor);
 	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 	bl = max(rdp->blimit, pending >> div);
-	if (in_serving_softirq() && unlikely(bl > 100)) {
-		long rrn = READ_ONCE(rcu_resched_ns);
-
-		rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
-		tlimit = local_clock() + rrn;
-	}
 	trace_rcu_batch_start(rcu_state.name,
 			      rcu_segcblist_n_cbs(&rdp->cblist), bl);
 	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
@@ -2126,13 +2116,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
 			 * Make sure we don't spend too much time here and deprive other
 			 * softirq vectors of CPU cycles.
 			 */
-			if (unlikely(tlimit)) {
-				/* only call local_clock() every 32 callbacks */
-				if (likely((count & 31) || local_clock() < tlimit))
-					continue;
-				/* Exceeded the time limit, so leave. */
+			if (unlikely(!(count & 31)) && softirq_needs_break(h))
 				break;
-			}
 		} else {
 			// In rcuoc context, so no worries about depriving
 			// other softirq vectors of CPU cycles.
@@ -2320,7 +2305,7 @@ static void strict_work_handler(struct work_struct *work)
 }
 
 /* Perform RCU core processing work for the current CPU.  */
-static __latent_entropy void rcu_core(void)
+static __latent_entropy void rcu_core(struct softirq_action *h)
 {
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
@@ -2374,7 +2359,7 @@ static __latent_entropy void rcu_core(void)
 	/* If there are callbacks ready, invoke them. */
 	if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist) &&
 	    likely(READ_ONCE(rcu_scheduler_fully_active))) {
-		rcu_do_batch(rdp);
+		rcu_do_batch(h, rdp);
 		/* Re-invoke RCU core processing if there are callbacks remaining. */
 		if (rcu_segcblist_ready_cbs(&rdp->cblist))
 			invoke_rcu_core();
@@ -2391,7 +2376,7 @@ static __latent_entropy void rcu_core(void)
 
 static void rcu_core_si(struct softirq_action *h)
 {
-	rcu_core();
+	rcu_core(h);
 }
 
 static void rcu_wake_cond(struct task_struct *t, int status)
@@ -2462,7 +2447,7 @@ static void rcu_cpu_kthread(unsigned int cpu)
 		*workp = 0;
 		local_irq_enable();
 		if (work)
-			rcu_core();
+			rcu_core(NULL);
 		local_bh_enable();
 		if (*workp == 0) {
 			trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f2280616f9d5..44fc907fdb5e 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -951,7 +951,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	 * instances of this callback would execute concurrently.
 	 */
 	local_bh_disable();
-	rcu_do_batch(rdp);
+	rcu_do_batch(NULL, rdp);
 	local_bh_enable();
 	lockdep_assert_irqs_enabled();
 	rcu_nocb_lock_irqsave(rdp, flags);
-- 
2.34.1



* [PATCH 9/9] softirq, timer: Use softirq_needs_break()
  2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
                   ` (7 preceding siblings ...)
  2023-05-05 11:33 ` [PATCH 8/9] softirq,rcu: " Liu Jian
@ 2023-05-05 11:33 ` Liu Jian
  8 siblings, 0 replies; 15+ messages in thread
From: Liu Jian @ 2023-05-05 11:33 UTC (permalink / raw)
  To: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu
  Cc: liujian56, linux-doc, linux-kernel, rcu, netdev

In the while loop of __run_timers(), if there are too many timers or
ill-behaved timer handler functions, the processing time of the expired
timers can repeatedly exceed the time wheel's next_expiry, and the
function loops indefinitely.

To prevent this, use the timeout/break logic provided by the softirq
core. If the running time exceeds the limit, break out of the loop and
raise an additional TIMER_SOFTIRQ to handle the timers that remain.

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 kernel/time/timer.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 63a8ce7177dd..70744a469a39 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1992,7 +1992,7 @@ void timer_clear_idle(void)
  * __run_timers - run all expired timers (if any) on this CPU.
  * @base: the timer vector to be processed.
  */
-static inline void __run_timers(struct timer_base *base)
+static inline void __run_timers(struct timer_base *base, struct softirq_action *h)
 {
 	struct hlist_head heads[LVL_DEPTH];
 	int levels;
@@ -2020,6 +2020,12 @@ static inline void __run_timers(struct timer_base *base)
 
 		while (levels--)
 			expire_timers(base, heads + levels);
+
+		if (softirq_needs_break(h)) {
+			if (time_after_eq(jiffies, base->next_expiry))
+				__raise_softirq_irqoff(TIMER_SOFTIRQ);
+			break;
+		}
 	}
 	raw_spin_unlock_irq(&base->lock);
 	timer_base_unlock_expiry(base);
@@ -2032,9 +2038,9 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
-	__run_timers(base);
+	__run_timers(base, h);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
-		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
+		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]), h);
 }
 
 /*
-- 
2.34.1



* Re: [PATCH 4/9] softirq: Allow early break
  2023-05-05 11:33 ` [PATCH 4/9] softirq: Allow early break Liu Jian
@ 2023-05-05 12:10   ` Eric Dumazet
  2023-05-08  3:37     ` liujian (CE)
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2023-05-05 12:10 UTC (permalink / raw)
  To: Liu Jian
  Cc: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, kuba, pabeni, peterz, frankwoo, Rhinewuwu,
	linux-doc, linux-kernel, rcu, netdev

On Fri, May 5, 2023 at 1:24 PM Liu Jian <liujian56@huawei.com> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> Allow terminating the softirq processing loop without finishing the
> vectors.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  kernel/softirq.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 48a81d8ae49a..e2cad5d108c8 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -582,6 +582,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>                                prev_count, preempt_count());
>                         preempt_count_set(prev_count);
>                 }
> +
> +               if (pending && __softirq_needs_break(start))
> +                       break;

This means that some softirqs may starve, because

for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS)

always iterates over the bits in the same order.




>         }
>
>         if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
> @@ -590,13 +593,14 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>
>         local_irq_disable();
>
> -       pending = local_softirq_pending();
> -       if (pending) {
> -               if (!__softirq_needs_break(start) && --max_restart)
> -                       goto restart;
> +       if (pending)
> +               or_softirq_pending(pending);
> +       else if ((pending = local_softirq_pending()) &&
> +                !__softirq_needs_break(start) &&
> +                --max_restart)
> +               goto restart;
>
> -               wakeup_softirqd();
> -       }
> +       wakeup_softirqd();
>
>         account_softirq_exit(current);
>         lockdep_softirq_end(in_hardirq);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/9] softirq: Allow early break
  2023-05-05 12:10   ` Eric Dumazet
@ 2023-05-08  3:37     ` liujian (CE)
  0 siblings, 0 replies; 15+ messages in thread
From: liujian (CE) @ 2023-05-08  3:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, kuba, pabeni, peterz, frankwoo, Rhinewuwu,
	linux-doc, linux-kernel, rcu, netdev



On 2023/5/5 20:10, Eric Dumazet wrote:
> On Fri, May 5, 2023 at 1:24 PM Liu Jian <liujian56@huawei.com> wrote:
>>
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> Allow terminating the softirq processing loop without finishing the
>> vectors.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>>   kernel/softirq.c | 16 ++++++++++------
>>   1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index 48a81d8ae49a..e2cad5d108c8 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -582,6 +582,9 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>>                                 prev_count, preempt_count());
>>                          preempt_count_set(prev_count);
>>                  }
>> +
>> +               if (pending && __softirq_needs_break(start))
>> +                       break;
> 
> This means that some softirqs may starve, because
> 
> for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS)
> 
> always iterates over the bits in the same order.
> 
> 
Yes, we need to record the softirqs that were not executed and continue 
with them next time. I will fix this in the next version.
Thanks.
> 
> 
>>          }
>>
>>          if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
>> @@ -590,13 +593,14 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>>
>>          local_irq_disable();
>>
>> -       pending = local_softirq_pending();
>> -       if (pending) {
>> -               if (!__softirq_needs_break(start) && --max_restart)
>> -                       goto restart;
>> +       if (pending)
>> +               or_softirq_pending(pending);
>> +       else if ((pending = local_softirq_pending()) &&
>> +                !__softirq_needs_break(start) &&
>> +                --max_restart)
>> +               goto restart;
>>
>> -               wakeup_softirqd();
>> -       }
>> +       wakeup_softirqd();
>>
>>          account_softirq_exit(current);
>>          lockdep_softirq_end(in_hardirq);
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/9] softirq: Use sched_clock() based timeout
  2023-05-05 11:33 ` [PATCH 2/9] softirq: Use sched_clock() based timeout Liu Jian
@ 2023-05-08  4:08   ` Jason Xing
  2023-05-08 17:51     ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Xing @ 2023-05-08  4:08 UTC (permalink / raw)
  To: Liu Jian
  Cc: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu, linux-doc, linux-kernel, rcu, netdev

On Fri, May 5, 2023 at 7:25 PM Liu Jian <liujian56@huawei.com> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> Replace the jiffies based timeout with a sched_clock() based one.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  kernel/softirq.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index bff5debf6ce6..59f16a9af5d1 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -27,6 +27,7 @@
>  #include <linux/tick.h>
>  #include <linux/irq.h>
>  #include <linux/wait_bit.h>
> +#include <linux/sched/clock.h>
>
>  #include <asm/softirq_stack.h>
>
> @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void)
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME       (2 * NSEC_PER_MSEC)

I wonder whether this affects servers that set HZ to a value other
than the default of 1000.

Thanks,
Jason

>  #define MAX_SOFTIRQ_RESTART 10
>
>  #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -527,9 +528,9 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
>
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> -       unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
>         unsigned long old_flags = current->flags;
>         int max_restart = MAX_SOFTIRQ_RESTART;
> +       u64 start = sched_clock();
>         struct softirq_action *h;
>         unsigned long pending;
>         unsigned int vec_nr;
> @@ -584,7 +585,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>
>         pending = local_softirq_pending();
>         if (pending) {
> -               if (time_before(jiffies, end) && !need_resched() &&
> +               if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
>                     --max_restart)
>                         goto restart;
>
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/9] softirq,net: Use softirq_needs_break()
  2023-05-05 11:33 ` [PATCH 7/9] softirq,net: Use softirq_needs_break() Liu Jian
@ 2023-05-08  6:21   ` Jason Xing
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2023-05-08  6:21 UTC (permalink / raw)
  To: Liu Jian
  Cc: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	tglx, sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo,
	Rhinewuwu, linux-doc, linux-kernel, rcu, netdev

On Fri, May 5, 2023 at 7:27 PM Liu Jian <liujian56@huawei.com> wrote:
>
> From: Peter Zijlstra <peterz@infradead.org>
>
> SoftIRQs provide their own timeout/break code now, use that.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  Documentation/admin-guide/sysctl/net.rst | 11 +----------
>  net/core/dev.c                           |  6 +-----
>  net/core/dev.h                           |  1 -
>  net/core/sysctl_net_core.c               |  8 --------
>  4 files changed, 2 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> index 466c560b0c30..095c60788c61 100644
> --- a/Documentation/admin-guide/sysctl/net.rst
> +++ b/Documentation/admin-guide/sysctl/net.rst
> @@ -267,16 +267,7 @@ netdev_budget
>
>  Maximum number of packets taken from all interfaces in one polling cycle (NAPI
>  poll). In one polling cycle interfaces which are registered to polling are
> -probed in a round-robin manner. Also, a polling cycle may not exceed
> -netdev_budget_usecs microseconds, even if netdev_budget has not been
> -exhausted.
> -
> -netdev_budget_usecs
> ----------------------
> -
> -Maximum number of microseconds in one NAPI polling cycle. Polling
> -will exit when either netdev_budget_usecs have elapsed during the
> -poll cycle or the number of packets processed reaches netdev_budget.
> +probed in a round-robin manner.
>
>  netdev_max_backlog
>  ------------------
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 735096d42c1d..70b6726beee6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4321,7 +4321,6 @@ int netdev_tstamp_prequeue __read_mostly = 1;
>  unsigned int sysctl_skb_defer_max __read_mostly = 64;
>  int netdev_budget __read_mostly = 300;
>  /* Must be at least 2 jiffes to guarantee 1 jiffy timeout */
> -unsigned int __read_mostly netdev_budget_usecs = 2 * USEC_PER_SEC / HZ;
>  int weight_p __read_mostly = 64;           /* old backlog weight */
>  int dev_weight_rx_bias __read_mostly = 1;  /* bias for backlog weight */
>  int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
> @@ -6659,8 +6658,6 @@ static int napi_threaded_poll(void *data)
>  static __latent_entropy void net_rx_action(struct softirq_action *h)
>  {
>         struct softnet_data *sd = this_cpu_ptr(&softnet_data);
> -       unsigned long time_limit = jiffies +
> -               usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
>         int budget = READ_ONCE(netdev_budget);
>         LIST_HEAD(list);
>         LIST_HEAD(repoll);
> @@ -6699,8 +6696,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
>                  * Allow this to run for 2 jiffies since which will allow
>                  * an average latency of 1.5/HZ.
>                  */
> -               if (unlikely(budget <= 0 ||
> -                            time_after_eq(jiffies, time_limit))) {
> +               if (unlikely(budget <= 0 || softirq_needs_break(h))) {
>                         sd->time_squeeze++;
>                         break;
>                 }
> diff --git a/net/core/dev.h b/net/core/dev.h
> index e075e198092c..e64a60c767ab 100644
> --- a/net/core/dev.h
> +++ b/net/core/dev.h
> @@ -39,7 +39,6 @@ void dev_addr_check(struct net_device *dev);
>
>  /* sysctls not referred to from outside net/core/ */
>  extern int             netdev_budget;
> -extern unsigned int    netdev_budget_usecs;
>  extern unsigned int    sysctl_skb_defer_max;
>  extern int             netdev_tstamp_prequeue;
>  extern int             netdev_unregister_timeout_secs;
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index 782273bb93c2..59765c805f5b 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -595,14 +595,6 @@ static struct ctl_table net_core_table[] = {
>                 .extra1         = SYSCTL_ONE,
>                 .extra2         = &max_skb_frags,
>         },
> -       {
> -               .procname       = "netdev_budget_usecs",
> -               .data           = &netdev_budget_usecs,
> -               .maxlen         = sizeof(unsigned int),
> -               .mode           = 0644,
> -               .proc_handler   = proc_dointvec_minmax,
> -               .extra1         = SYSCTL_ZERO,
> -       },

I can't help asking whether we really need to remove this sysctl knob,
since it lets users tune the behavior. It is useful in some cases, I
believe. Do you have any evidence or numbers to show we can get rid
of it?

Thanks,
Jason

>         {
>                 .procname       = "fb_tunnels_only_for_init_net",
>                 .data           = &sysctl_fb_tunnels_only_for_init_net,
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/9] softirq: Use sched_clock() based timeout
  2023-05-08  4:08   ` Jason Xing
@ 2023-05-08 17:51     ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2023-05-08 17:51 UTC (permalink / raw)
  To: Jason Xing, Liu Jian
  Cc: corbet, paulmck, frederic, quic_neeraju, joel, josh, boqun.feng,
	rostedt, mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	sboyd, davem, edumazet, kuba, pabeni, peterz, frankwoo, Rhinewuwu,
	linux-doc, linux-kernel, rcu, netdev

On Mon, May 08 2023 at 12:08, Jason Xing wrote:
> On Fri, May 5, 2023 at 7:25 PM Liu Jian <liujian56@huawei.com> wrote:
>> @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void)
>>   * we want to handle softirqs as soon as possible, but they
>>   * should not be able to lock up the box.
>>   */
>> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
>> +#define MAX_SOFTIRQ_TIME       (2 * NSEC_PER_MSEC)
>
> I wonder whether this affects servers that set HZ to a value other
> than the default of 1000.

The result of msecs_to_jiffies(2) for different HZ values:

HZ=100     1
HZ=250     1
HZ=1000    2

So depending on when the softirq processing starts, this gives the
following ranges in which the timeout ends:

HZ=100    0 - 10ms
HZ=250    0 -  4ms
HZ=1000   1 -  2ms

But as the various softirq handlers have their own notions of timeouts,
loop limits etc., and the timeout is only checked after _all_ pending
bits of each iteration have been processed, the outcome of this is a
lottery.

Due to that the sched_clock() change per se won't have too much impact,
but if the overall changes to consolidate the break conditions are in
place, I think it will have observable effects.

Making this consistent is definitely a good thing, but it won't solve
the underlying problem of soft interrupt processing at all.

We definitely need to spend more thought on pulling things out of
soft-interrupt context so that these functionalities come under proper
resource control by the scheduler.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-05-08 17:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-05 11:33 [PATCH 0/9] fix softlockup in run_timer_softirq Liu Jian
2023-05-05 11:33 ` [PATCH 1/9] softirq: Rewrite softirq processing loop Liu Jian
2023-05-05 11:33 ` [PATCH 2/9] softirq: Use sched_clock() based timeout Liu Jian
2023-05-08  4:08   ` Jason Xing
2023-05-08 17:51     ` Thomas Gleixner
2023-05-05 11:33 ` [PATCH 3/9] softirq: Factor loop termination condition Liu Jian
2023-05-05 11:33 ` [PATCH 4/9] softirq: Allow early break Liu Jian
2023-05-05 12:10   ` Eric Dumazet
2023-05-08  3:37     ` liujian (CE)
2023-05-05 11:33 ` [PATCH 5/9] softirq: Context aware timeout Liu Jian
2023-05-05 11:33 ` [PATCH 6/9] softirq: Provide a softirq_needs_break() API Liu Jian
2023-05-05 11:33 ` [PATCH 7/9] softirq,net: Use softirq_needs_break() Liu Jian
2023-05-08  6:21   ` Jason Xing
2023-05-05 11:33 ` [PATCH 8/9] softirq,rcu: " Liu Jian
2023-05-05 11:33 ` [PATCH 9/9] softirq, timer: " Liu Jian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).