public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock
@ 2023-08-01 13:24 Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 1/6] softirq: Turn set_softirq_pending() to reset_softirq_pending() Frederic Weisbecker
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

Networking softirqs can take time, holding up the execution of block,
tasklet, timer and RCU callbacks.

People have fought hard against this big softirq lock, proposing more
and more hacks over the years to deal with the resulting fundamental
unfairness, which is not only a problem for RT users.

Here is a proposal for an entry point to dealing with that issue in the
long term. The purpose is to take a journey similar to the one we took
with the BKL push-down, but with timers. Most timers are unrelated to
other softirq vectors; those can simply be tagged with the new
TIMER_SOFTINTERRUPTIBLE flag, which makes a callback soft-interruptible.
The others can carry TIMER_SOFTINTERRUPTIBLE after they get converted
to use appropriate synchronization against other vectors' callbacks
(using spin_lock_bh() for example).

Once all timers are dealt with after a few years (famous last words),
they can be handled separately from the softirq infrastructure.

RCU could follow a similar treatment, if we manage to find room for a
flag somewhere...

(Only -RT and x86 are supported for now)

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/softirq-preemptible

HEAD: c233aee141ddb78d07b2f7311be38cfc286de654

Thanks,
	Frederic
---

Frederic Weisbecker (6):
      softirq: Turn set_softirq_pending() to reset_softirq_pending()
      softirq: Make softirq handling entry/exit generally available
      softirq: Introduce softirq disabled mask
      x86/softirq: Support softirq disabled mask
      timers: Introduce soft-interruptible timers
      timers: Make process_timeout() soft-interruptible


 arch/Kconfig                   |  3 +++
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/current.h |  1 +
 arch/x86/include/asm/hardirq.h |  1 +
 include/linux/bottom_half.h    |  9 +++++++++
 include/linux/interrupt.h      | 15 ++++++++++++---
 include/linux/timer.h          |  5 +++--
 kernel/softirq.c               | 40 ++++++++++++++++++++++++++++++++--------
 kernel/time/timer.c            | 20 +++++++++++++++++++-
 9 files changed, 81 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH 1/6] softirq: Turn set_softirq_pending() to reset_softirq_pending()
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
@ 2023-08-01 13:24 ` Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 2/6] softirq: Make softirq handling entry/exit generally available Frederic Weisbecker
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

set_softirq_pending() is only ever used to reset the pending vector's
mask to 0. Make the function more specialized for that very purpose and
rename it accordingly.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/interrupt.h | 2 +-
 kernel/softirq.c          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index bf82980f569d..2099fe3980bc 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -524,7 +524,7 @@ DECLARE_STATIC_KEY_FALSE(force_irqthreads_key);
 #endif
 
 #define local_softirq_pending()	(__this_cpu_read(local_softirq_pending_ref))
-#define set_softirq_pending(x)	(__this_cpu_write(local_softirq_pending_ref, (x)))
+#define reset_softirq_pending()	(__this_cpu_write(local_softirq_pending_ref, 0))
 #define or_softirq_pending(x)	(__this_cpu_or(local_softirq_pending_ref, (x)))
 
 #endif /* local_softirq_pending */
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 623985f18833..1a3c3fe341ea 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -532,7 +532,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 restart:
 	/* Reset the pending bitmask before enabling irqs */
-	set_softirq_pending(0);
+	reset_softirq_pending();
 
 	local_irq_enable();
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/6] softirq: Make softirq handling entry/exit generally available
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 1/6] softirq: Turn set_softirq_pending() to reset_softirq_pending() Frederic Weisbecker
@ 2023-08-01 13:24 ` Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 3/6] softirq: Introduce softirq disabled mask Frederic Weisbecker
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

In order to prepare for re-enabling softirqs from vector callbacks that
are known to be safe, make the code that increments the preempt count
while servicing softirqs more generally available.

No intended behaviour change.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/bottom_half.h |  7 +++++++
 kernel/softirq.c            | 22 ++++++++++++++++------
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index fc53e0ad56d9..2243c7de4917 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -33,6 +33,13 @@ static inline void local_bh_enable(void)
 	__local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
 }
 
+static inline void local_bh_enter(void)
+{
+	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+}
+
+extern void local_bh_exit(void);
+
 #ifdef CONFIG_PREEMPT_RT
 extern bool local_bh_blocked(void);
 #else
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 1a3c3fe341ea..ba998d572ef4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -247,21 +247,26 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 }
 EXPORT_SYMBOL(__local_bh_enable_ip);
 
+inline void local_bh_exit(void)
+{
+	__local_bh_enable(SOFTIRQ_OFFSET, true);
+	WARN_ON_ONCE(in_interrupt());
+}
+
 /*
  * Invoked from ksoftirqd_run() outside of the interrupt disabled section
  * to acquire the per CPU local lock for reentrancy protection.
  */
 static inline void ksoftirqd_run_begin(void)
 {
-	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+	local_bh_enter();
 	local_irq_disable();
 }
 
 /* Counterpart to ksoftirqd_run_begin() */
 static inline void ksoftirqd_run_end(void)
 {
-	__local_bh_enable(SOFTIRQ_OFFSET, true);
-	WARN_ON_ONCE(in_interrupt());
+	local_bh_exit();
 	local_irq_enable();
 }
 
@@ -389,15 +394,20 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 }
 EXPORT_SYMBOL(__local_bh_enable_ip);
 
+inline void local_bh_exit(void)
+{
+	__local_bh_enable(SOFTIRQ_OFFSET);
+	WARN_ON_ONCE(in_interrupt());
+}
+
 static inline void softirq_handle_begin(void)
 {
-	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
+	local_bh_enter();
 }
 
 static inline void softirq_handle_end(void)
 {
-	__local_bh_enable(SOFTIRQ_OFFSET);
-	WARN_ON_ONCE(in_interrupt());
+	local_bh_exit();
 }
 
 static inline void ksoftirqd_run_begin(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 3/6] softirq: Introduce softirq disabled mask
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 1/6] softirq: Turn set_softirq_pending() to reset_softirq_pending() Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 2/6] softirq: Make softirq handling entry/exit generally available Frederic Weisbecker
@ 2023-08-01 13:24 ` Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 4/6] x86/softirq: Support " Frederic Weisbecker
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

(DISCLAIMER: contains -RT bits)

Once softirq vectors are able to re-enable softirqs when deemed safe,
for example when a timer callback is tagged as soft-interruptible by
other softirq vectors, care must be taken to ensure a given vector is
not re-entrant, i.e. a vector can be interrupted by other vectors but
not by itself.

In order to prepare for this, introduce a softirq disabled mask so that
vectors can disable themselves before re-enabling softirqs.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/Kconfig                |  3 +++
 include/linux/bottom_half.h |  2 ++
 include/linux/interrupt.h   | 15 ++++++++++++---
 kernel/softirq.c            | 16 +++++++++++++++-
 4 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..d23968860ddf 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1358,6 +1358,9 @@ config RELR
 config ARCH_HAS_MEM_ENCRYPT
 	bool
 
+config ARCH_HAS_SOFTIRQ_DISABLED_MASK
+       bool
+
 config ARCH_HAS_CC_PLATFORM
 	bool
 
diff --git a/include/linux/bottom_half.h b/include/linux/bottom_half.h
index 2243c7de4917..d5b37b580c79 100644
--- a/include/linux/bottom_half.h
+++ b/include/linux/bottom_half.h
@@ -42,6 +42,8 @@ extern void local_bh_exit(void);
 
 #ifdef CONFIG_PREEMPT_RT
 extern bool local_bh_blocked(void);
+extern void local_bh_vec_enable(int vec);
+extern void local_bh_vec_disable(int vec);
 #else
 static inline bool local_bh_blocked(void) { return false; }
 #endif
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 2099fe3980bc..7819d16d8d6f 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -523,8 +523,16 @@ DECLARE_STATIC_KEY_FALSE(force_irqthreads_key);
 #define local_softirq_pending_ref irq_stat.__softirq_pending
 #endif
 
-#define local_softirq_pending()	(__this_cpu_read(local_softirq_pending_ref))
-#define reset_softirq_pending()	(__this_cpu_write(local_softirq_pending_ref, 0))
+#if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_ARCH_HAS_SOFTIRQ_DISABLED_MASK)
+#define local_softirq_disabled()	(__this_cpu_read(local_softirq_disabled_ref))
+#else
+#define local_softirq_disabled()	(0)
+#endif
+
+#define local_softirq_pending()	(__this_cpu_read(local_softirq_pending_ref) & \
+				 ~local_softirq_disabled())
+#define reset_softirq_pending() (__this_cpu_and(local_softirq_pending_ref, \
+				local_softirq_disabled()))
 #define or_softirq_pending(x)	(__this_cpu_or(local_softirq_pending_ref, (x)))
 
 #endif /* local_softirq_pending */
@@ -614,7 +622,8 @@ extern void raise_hrtimer_softirq(void);
 
 static inline unsigned int local_pending_timers(void)
 {
-        return __this_cpu_read(pending_timer_softirq);
+        return __this_cpu_read(pending_timer_softirq) &
+		~local_softirq_disabled();
 }
 
 #else
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ba998d572ef4..a394f78de627 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -297,6 +297,18 @@ void do_softirq_post_smp_call_flush(unsigned int was_pending)
 		invoke_softirq();
 }
 
+#ifdef CONFIG_ARCH_HAS_SOFTIRQ_DISABLED_MASK
+void local_bh_vec_enable(int vec)
+{
+	__this_cpu_and(local_softirq_disabled_ref, ~vec);
+}
+
+void local_bh_vec_disable(int vec)
+{
+	__this_cpu_or(local_softirq_disabled_ref, vec);
+}
+#endif
+
 #else /* CONFIG_PREEMPT_RT */
 
 /*
@@ -1009,11 +1021,13 @@ static int timersd_should_run(unsigned int cpu)
 static void run_timersd(unsigned int cpu)
 {
 	unsigned int timer_si;
+	unsigned long timersd_vecs = (1 << TIMER_SOFTIRQ) | (1 << HRTIMER_SOFTIRQ);
 
 	ksoftirqd_run_begin();
 
 	timer_si = local_pending_timers();
-	__this_cpu_write(pending_timer_softirq, 0);
+	__this_cpu_and(pending_timer_softirq,
+		       local_softirq_disabled() & timersd_vecs);
 	or_softirq_pending(timer_si);
 
 	__do_softirq();
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 4/6] x86/softirq: Support softirq disabled mask
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-08-01 13:24 ` [RFC PATCH 3/6] softirq: Introduce softirq disabled mask Frederic Weisbecker
@ 2023-08-01 13:24 ` Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 5/6] timers: Introduce soft-interruptible timers Frederic Weisbecker
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

Support the new softirq disabled vectors mask and put it in the per-CPU
hot structure along with the pending vectors mask, as both are used
closely together.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/x86/Kconfig               | 1 +
 arch/x86/include/asm/current.h | 1 +
 arch/x86/include/asm/hardirq.h | 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a2a410d13e39..f1cd68b672dc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,6 +93,7 @@ config X86
 	select ARCH_HAS_COPY_MC			if X86_64
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SET_DIRECT_MAP
+	select ARCH_HAS_SOFTIRQ_DISABLED_MASK
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index a1168e7b69e5..b86ca9c0ddc2 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -23,6 +23,7 @@ struct pcpu_hot {
 			unsigned long		top_of_stack;
 			void			*hardirq_stack_ptr;
 			u16			softirq_pending;
+			u16			softirq_disabled;
 #ifdef CONFIG_X86_64
 			bool			hardirq_stack_inuse;
 #else
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 66837b8c67f1..933cf05e5738 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -61,6 +61,7 @@ extern u64 arch_irq_stat(void);
 #define arch_irq_stat		arch_irq_stat
 
 #define local_softirq_pending_ref       pcpu_hot.softirq_pending
+#define local_softirq_disabled_ref       pcpu_hot.softirq_disabled
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)
 static inline void kvm_set_cpu_l1tf_flush_l1d(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 5/6] timers: Introduce soft-interruptible timers
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2023-08-01 13:24 ` [RFC PATCH 4/6] x86/softirq: Support " Frederic Weisbecker
@ 2023-08-01 13:24 ` Frederic Weisbecker
  2023-08-01 13:24 ` [RFC PATCH 6/6] timers: Make process_timeout() soft-interruptible Frederic Weisbecker
  2023-08-07 12:50 ` [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Sebastian Andrzej Siewior
  6 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

Most timers are unrelated to networking or other softirq vectors (RCU,
NET_*, HRTIMER, TASKLET, ...). Yet when a timer batch is executing,
other softirq vectors have to wait for the timers batch completion
even though there is nothing to synchronize against most callbacks.

However, there is no automatic way to determine if a timer callback is
safely soft-interruptible by other vectors. So the only long-term viable
approach to solve this is to adopt a progressive push-down solution
similar to the one used for getting rid of the big kernel lock.

Introduce a new TIMER_SOFTINTERRUPTIBLE flag which tells the timer
subsystem that a callback is safely soft-interruptible by other vectors,
either because it's completely unrelated to them or because it uses the
appropriate local_bh_disable()/spin_lock_bh() on narrowed-down regions.

Once all timers are dealt with after a few years, it will become
possible to run timers out of the softirqs processing.

It's worth noting, though, that while the softirq infrastructure
supports soft-interruption of a TIMER_SOFTINTERRUPTIBLE timer, it
doesn't yet allow a TIMER_SOFTINTERRUPTIBLE timer to soft-interrupt
other vectors, even though nothing prevents that from happening from a
correctness point of view; more tweaks are needed to support it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/timer.h |  5 +++--
 kernel/time/timer.c   | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 9162f275819a..fbe40bacc8c3 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -61,13 +61,14 @@ struct timer_list {
  * should be placed on a particular CPU, then add_timer_on() has to be
  * used.
  */
-#define TIMER_CPUMASK		0x0003FFFF
+#define TIMER_CPUMASK		0x0001FFFF /* If 1 more bit is needed, flags must be 64 */
+#define TIMER_SOFTINTERRUPTIBLE	0x00020000
 #define TIMER_MIGRATING		0x00040000
 #define TIMER_BASEMASK		(TIMER_CPUMASK | TIMER_MIGRATING)
 #define TIMER_DEFERRABLE	0x00080000
 #define TIMER_PINNED		0x00100000
 #define TIMER_IRQSAFE		0x00200000
-#define TIMER_INIT_FLAGS	(TIMER_DEFERRABLE | TIMER_PINNED | TIMER_IRQSAFE)
+#define TIMER_INIT_FLAGS	(TIMER_SOFTINTERRUPTIBLE | TIMER_DEFERRABLE | TIMER_PINNED | TIMER_IRQSAFE)
 #define TIMER_ARRAYSHIFT	22
 #define TIMER_ARRAYMASK		0xFFC00000
 
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 7cad6fe3c035..1e43f54def0e 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1676,6 +1676,7 @@ static void call_timer_fn(struct timer_list *timer,
 			  unsigned long baseclk)
 {
 	int count = preempt_count();
+	bool softinterruptible = false;
 
 #ifdef CONFIG_LOCKDEP
 	/*
@@ -1689,6 +1690,17 @@ static void call_timer_fn(struct timer_list *timer,
 
 	lockdep_copy_map(&lockdep_map, &timer->lockdep_map);
 #endif
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) &&
+	    IS_ENABLED(CONFIG_ARCH_HAS_SOFTIRQ_DISABLED_MASK) &&
+	    timer->flags & TIMER_SOFTINTERRUPTIBLE)
+		softinterruptible = true;
+
+	if (softinterruptible) {
+		local_bh_vec_disable(1 << TIMER_SOFTIRQ);
+		local_bh_exit();
+	}
+
 	/*
 	 * Couple the lock chain with the lock chain at
 	 * timer_delete_sync() by acquiring the lock_map around the fn()
@@ -1702,6 +1714,12 @@ static void call_timer_fn(struct timer_list *timer,
 
 	lock_map_release(&lockdep_map);
 
+	if (softinterruptible) {
+		local_bh_enter();
+		local_bh_vec_enable(1 << TIMER_SOFTIRQ);
+	}
+
+
 	if (count != preempt_count()) {
 		WARN_ONCE(1, "timer: %pS preempt leak: %08x -> %08x\n",
 			  fn, count, preempt_count());
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 6/6] timers: Make process_timeout() soft-interruptible
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2023-08-01 13:24 ` [RFC PATCH 5/6] timers: Introduce soft-interruptible timers Frederic Weisbecker
@ 2023-08-01 13:24 ` Frederic Weisbecker
  2023-08-07 12:50 ` [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Sebastian Andrzej Siewior
  6 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-01 13:24 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Sebastian Andrzej Siewior, Peter Zijlstra,
	Thomas Gleixner, Paul E . McKenney, Linus Torvalds,
	Anna-Maria Behnsen, Eric Dumazet

The most frequently used timer callback, process_timeout(), only
performs a wake-up, so it can be safely interrupted by other vectors.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1e43f54def0e..0f6eb9c0901c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2180,7 +2180,7 @@ signed long __sched schedule_timeout(signed long timeout)
 	expire = timeout + jiffies;
 
 	timer.task = current;
-	timer_setup_on_stack(&timer.timer, process_timeout, 0);
+	timer_setup_on_stack(&timer.timer, process_timeout, TIMER_SOFTINTERRUPTIBLE);
 	__mod_timer(&timer.timer, expire, MOD_TIMER_NOTPENDING);
 	schedule();
 	del_timer_sync(&timer.timer);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock
  2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2023-08-01 13:24 ` [RFC PATCH 6/6] timers: Make process_timeout() soft-interruptible Frederic Weisbecker
@ 2023-08-07 12:50 ` Sebastian Andrzej Siewior
  2023-08-07 15:22   ` Frederic Weisbecker
  6 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-08-07 12:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Thomas Gleixner, Paul E . McKenney,
	Linus Torvalds, Anna-Maria Behnsen, Eric Dumazet

On 2023-08-01 15:24:35 [+0200], Frederic Weisbecker wrote:
> Networking softirqs can take time, holding up the execution of block,
> tasklet, timer and RCU callbacks.
> 
> People have fought hard against this big softirq lock, proposing more
> and more hacks over the years to deal with the resulting fundamental
> unfairness, which is not only a problem for RT users.
> 
> Here is a proposal for an entry point to dealing with that issue in the
> long term. The purpose is to take a journey similar to the one we took
> with the BKL push-down, but with timers. Most timers are unrelated to
> other softirq vectors; those can simply be tagged with the new
> TIMER_SOFTINTERRUPTIBLE flag, which makes a callback soft-interruptible.
> The others can carry TIMER_SOFTINTERRUPTIBLE after they get converted
> to use appropriate synchronization against other vectors' callbacks
> (using spin_lock_bh() for example).

This doesn't work as proposed because of lock ordering:
|======================================================
|WARNING: possible circular locking dependency detected
|6.5.0-rc4-rt1+ #220 Not tainted
|------------------------------------------------------
|ktimers/0/15 is trying to acquire lock:
|ffff88817b41b6d8 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0xb7/0x1a0
|
|but task is already holding lock:
|ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: run_timer_softirq+0x61/0x3f0
|
|which lock already depends on the new lock.
|
|
|the existing dependency chain (in reverse order) is:
|
|-> #1 (&base->expiry_lock){+.+.}-{2:2}:
|       lock_acquire+0xd4/0x2f0
|       rt_spin_lock+0x21/0xf0
|       run_timer_softirq+0x61/0x3f0
|       __do_softirq+0x19b/0x4cb
|       run_timersd+0x92/0xf0
|       smpboot_thread_fn+0x211/0x330
|       kthread+0x110/0x130
|       ret_from_fork+0x2b/0x40
|       ret_from_fork_asm+0x1b/0x30
|
|-> #0 ((softirq_ctrl.lock)){+.+.}-{2:2}:
|       check_prev_add+0xe2/0xd60
|       __lock_acquire+0x132d/0x1700
|       lock_acquire+0xd4/0x2f0
|       rt_spin_lock+0x21/0xf0
|       __local_bh_disable_ip+0xb7/0x1a0
|       call_timer_fn+0x172/0x310
|       run_timer_softirq+0x331/0x3f0
|       __do_softirq+0x19b/0x4cb
|       run_timersd+0x92/0xf0
|       smpboot_thread_fn+0x211/0x330
|       kthread+0x110/0x130
|       ret_from_fork+0x2b/0x40
|       ret_from_fork_asm+0x1b/0x30
|
|other info that might help us debug this:
|
| Possible unsafe locking scenario:
|
|       CPU0                    CPU1
|       ----                    ----
|  lock(&base->expiry_lock);
|                               lock((softirq_ctrl.lock));
|                               lock(&base->expiry_lock);
|  lock((softirq_ctrl.lock));
|
| *** DEADLOCK ***
|
|2 locks held by ktimers/0/15:
| #0: ffffffff826e9ce0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5d/0xf0
| #1: ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: run_timer_softirq+0x61/0x3f0

I posted a different series last week where I drop the lock for other
reasons at a different spot where it is safe to do so. It needs adapting
the other softirq handlers (besides TIMER_SOFTIRQ), but there is no
need to deal with the individual callbacks.

However, how do you continue here? Assuming all timers are marked
TIMER_SOFTINTERRUPTIBLE then you could avoid the BH-lock at the
timer-softirq.
But when is a timer considered safe? Would the lack of the _bh suffix be
that and you would simply add it during your push down? 
Then you continue the same thing for the remaining softirqs. And once
you are done, you would remove that RT lock within local_bh_disable()?
This isn't something a !RT user would benefit from, right?

The other idea I have (besides the preemption point in each softirq
handler, mentioned earlier) is to simply drop the BH-lock on RT. Unlike
mainline, RT wouldn't deadlock then. The only thing that would be
missing is synchronisation against local_bh_disable() used purely as a
lock for variables. From what I remember from the various BH-models we
had in RT in the past, that was the only thing that exploded.

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock
  2023-08-07 12:50 ` [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Sebastian Andrzej Siewior
@ 2023-08-07 15:22   ` Frederic Weisbecker
  2023-08-08  7:15     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2023-08-07 15:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, Peter Zijlstra, Thomas Gleixner, Paul E . McKenney,
	Linus Torvalds, Anna-Maria Behnsen, Eric Dumazet

On Mon, Aug 07, 2023 at 02:50:20PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-08-01 15:24:35 [+0200], Frederic Weisbecker wrote:
> > Networking softirqs can take time, holding up the execution of block,
> > tasklet, timer and RCU callbacks.
> > 
> > People have fought hard against this big softirq lock, proposing more
> > and more hacks over the years to deal with the resulting fundamental
> > unfairness, which is not only a problem for RT users.
> > 
> > Here is a proposal for an entry point to dealing with that issue in the
> > long term. The purpose is to take a journey similar to the one we took
> > with the BKL push-down, but with timers. Most timers are unrelated to
> > other softirq vectors; those can simply be tagged with the new
> > TIMER_SOFTINTERRUPTIBLE flag, which makes a callback soft-interruptible.
> > The others can carry TIMER_SOFTINTERRUPTIBLE after they get converted
> > to use appropriate synchronization against other vectors' callbacks
> > (using spin_lock_bh() for example).
> 
> This doesn't work as proposed because of lock ordering:
> |======================================================
> |WARNING: possible circular locking dependency detected
> |6.5.0-rc4-rt1+ #220 Not tainted
> |------------------------------------------------------
> |ktimers/0/15 is trying to acquire lock:
> |ffff88817b41b6d8 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0xb7/0x1a0
> |
> |but task is already holding lock:
> |ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: run_timer_softirq+0x61/0x3f0
> |
> |which lock already depends on the new lock.
> |
> |
> |the existing dependency chain (in reverse order) is:
> |
> |-> #1 (&base->expiry_lock){+.+.}-{2:2}:
> |       lock_acquire+0xd4/0x2f0
> |       rt_spin_lock+0x21/0xf0
> |       run_timer_softirq+0x61/0x3f0
> |       __do_softirq+0x19b/0x4cb
> |       run_timersd+0x92/0xf0
> |       smpboot_thread_fn+0x211/0x330
> |       kthread+0x110/0x130
> |       ret_from_fork+0x2b/0x40
> |       ret_from_fork_asm+0x1b/0x30
> |
> |-> #0 ((softirq_ctrl.lock)){+.+.}-{2:2}:
> |       check_prev_add+0xe2/0xd60
> |       __lock_acquire+0x132d/0x1700
> |       lock_acquire+0xd4/0x2f0
> |       rt_spin_lock+0x21/0xf0
> |       __local_bh_disable_ip+0xb7/0x1a0
> |       call_timer_fn+0x172/0x310
> |       run_timer_softirq+0x331/0x3f0
> |       __do_softirq+0x19b/0x4cb
> |       run_timersd+0x92/0xf0
> |       smpboot_thread_fn+0x211/0x330
> |       kthread+0x110/0x130
> |       ret_from_fork+0x2b/0x40
> |       ret_from_fork_asm+0x1b/0x30
> |
> |other info that might help us debug this:
> |
> | Possible unsafe locking scenario:
> |
> |       CPU0                    CPU1
> |       ----                    ----
> |  lock(&base->expiry_lock);
> |                               lock((softirq_ctrl.lock));
> |                               lock(&base->expiry_lock);
> |  lock((softirq_ctrl.lock));
> |
> | *** DEADLOCK ***
> |
> |2 locks held by ktimers/0/15:
> | #0: ffffffff826e9ce0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5d/0xf0
> | #1: ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at:
> |run_timer_softirq+0x61/0x3f0

Right, that needs to be pulled up to the caller before it releases the lock.

> 
> I posted a different series last week where I drop the lock for other
> reasons at a different spot where it is safe to do so. It needs adapting
> the other softirq handlers (besides TIMER_SOFTIRQ), but there is no
> need to deal with the individual callbacks.

I've seen that, yes. I have yet to review it more deeply, but it looks
like a good idea in any case to have a preemption point between timer
callbacks.

RCU already does a similar thing from its rcu boost processing with
cond_resched_tasks_rcu_qs().

> 
> However, how do you continue here? Assuming all timers are marked
> TIMER_SOFTINTERRUPTIBLE then you could avoid the BH-lock at the
> timer-softirq.
> But when is a timer considered safe? Would the lack of the _bh suffix be
> that and you would simply add it during your push down?

Yeah, that requires manual inspection. A timer that obviously doesn't
mess with other softirqs, as is the case most of the time, can simply
get the flag.

Other timers can be dealt with individually, with local_bh_disable() or
spin_lock_bh() around the relevant critical sections.

> Then you continue the same thing for the remaining softirqs. And once
> you are done, you would remove that RT lock within local_bh_disable()?
> This isn't something a !RT user would benefit from, right?

Why not? A long-lasting ksoftirqd run handling lots of NET_RX could be
interrupted by a timer/rcu softirq in !RT, for example. Further, there
could even be more than one ksoftirqd if necessary, though I doubt it.

> The other idea I have (besides the preemption point in each softirq
> handler, mentioned earlier) is to simply drop the BH-lock on RT. Unlike
> mainline, RT wouldn't deadlock then. The only thing that would be
> missing is synchronisation against local_bh_disable() used purely as a
> lock for variables. From what I remember from the various BH-models we
> had in RT in the past, that was the only thing that exploded.

I thought the issue was about timers fiddling with per-cpu state,
assuming they wouldn't be disturbed by other vectors, and thus lacking
local_bh_disable() in appropriate places. Or perhaps I misunderstood?

Otherwise all timers can carry TIMER_SOFTINTERRUPTIBLE right away, right?

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock
  2023-08-07 15:22   ` Frederic Weisbecker
@ 2023-08-08  7:15     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-08-08  7:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Thomas Gleixner, Paul E . McKenney,
	Linus Torvalds, Anna-Maria Behnsen, Eric Dumazet

On 2023-08-07 17:22:10 [+0200], Frederic Weisbecker wrote:
> > However, how do you continue here? Assuming all timers are marked
> > TIMER_SOFTINTERRUPTIBLE, you could avoid the BH-lock in the
> > timer softirq.
> > But when is a timer considered safe? Would the lack of the _bh suffix
> > indicate that, and you would simply add it during your push down?
> 
> Yeah, that requires manual inspection. A timer that obviously doesn't
> mess with other softirqs, as is the case most of the time, can simply
> get the flag.
> 
> Other timers can be dealt with individually, using local_bh_disable(),
> spin_lock_bh(), or some other critical section.

The question is what _bh() would keep doing if you intend to get rid of
the lock.

> > Then you continue the same thing for the remaining softirqs. And once
> > you are done, you would remove that RT lock within local_bh_disable()?
> > This isn't something a !RT user would benefit from, right?
> 
> Why not? A long-lasting ksoftirqd handling lots of NET_RX could be
> interrupted by a timer/RCU softirq in !RT, for example. Further, there
> could even be more than one ksoftirqd if necessary, though I doubt it.

That would require more thinking on how to accomplish this. It is
generally assumed that a softirq callback always makes progress if you
look at it from a different CPU. Therefore a loop like the one in
__timer_delete_sync() is "okay" because the callback will continue and
finish. If you drop the BH bits from the preemption counter and allow
preemption, then the other CPU could spin until that timer gets back on
the CPU and finishes its work. There is more of this kind of logic…
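The forward-progress assumption can be sketched like this (a simplified
sketch, not the actual __timer_delete_sync() implementation):

```c
/* Simplified sketch of a __timer_delete_sync()-style wait: this CPU
 * spins until the callback running on another CPU has finished.  The
 * loop is only bounded if that callback cannot be preempted
 * indefinitely, which is exactly the assumption discussed above. */
int timer_delete_sync_sketch(struct timer_list *timer)
{
	for (;;) {
		int ret = try_to_del_timer_sync(timer);
		if (ret >= 0)
			return ret;	/* deactivated or not pending */
		cpu_relax();		/* -1: callback still running */
	}
}
```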

But I don't think that you want ksoftirqd handling NET_RX in the first
place. However, having a long-running NET_RX is a problem on RT because
it blocks all other force-threaded interrupts. Mainline doesn't have
this problem unless it uses threaded interrupts.
The other problematic thing is that everything is mixed up, so you can't
really distinguish between a TASKLET used by USB and one used by SCSI,
not to mention NET_RX vs TASKLET. If you don't have the luxury of moving
them to a different CPU, I *think* assigning them a priority would be a
good thing.

> > The other idea I have (besides the preemption point in each softirq
> > handler mentioned earlier) is to simply drop the BH-lock on RT. Unlike
> > mainline, RT wouldn't deadlock then. The only thing that would be
> > missing is synchronisation against local_bh_disable() used only as a
> > lock for variables. From what I remember of the various BH models we
> > had in RT in the past, that was the only thing that exploded.
> 
> I thought the issue was about timers fiddling with per-CPU state,
> assuming they wouldn't be disturbed by other vectors, and thus lacking
> local_bh_disable() in appropriate places. Or perhaps I misunderstood?
> 
> Otherwise all timers can carry TIMER_SOFTINTERRUPTIBLE right away, right?

The last time I looked at it, the issue was just the usage of per-CPU
variables. For instance, assume a timer invokes napi_consume_skb().
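To make the hazard concrete, a hypothetical fragment (my_pending_skb
and the surrounding names are made up; whether napi_consume_skb() is
actually safe here depends on which per-CPU caches it touches):

```c
/* Hypothetical: a timer callback freeing skbs via napi_consume_skb().
 * That function may touch per-CPU state also used by NET_RX, so once
 * the timer is soft-interruptible it must fence off BH explicitly. */
static void my_timer_fn(struct timer_list *t)
{
	struct sk_buff *skb = __this_cpu_read(my_pending_skb); /* made up */

	local_bh_disable();		/* keep NET_RX off this CPU */
	napi_consume_skb(skb, 0);	/* budget 0: not in NAPI context */
	local_bh_enable();
}
```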

> Thanks.

Sebastian


end of thread, other threads:[~2023-08-08 19:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-01 13:24 [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Frederic Weisbecker
2023-08-01 13:24 ` [RFC PATCH 1/6] softirq: Turn set_softirq_pending() to reset_softirq_pending() Frederic Weisbecker
2023-08-01 13:24 ` [RFC PATCH 2/6] softirq: Make softirq handling entry/exit generally available Frederic Weisbecker
2023-08-01 13:24 ` [RFC PATCH 3/6] softirq: Introduce softirq disabled mask Frederic Weisbecker
2023-08-01 13:24 ` [RFC PATCH 4/6] x86/softirq: Support " Frederic Weisbecker
2023-08-01 13:24 ` [RFC PATCH 5/6] timers: Introduce soft-interruptible timers Frederic Weisbecker
2023-08-01 13:24 ` [RFC PATCH 6/6] timers: Make process_timeout() soft-interruptible Frederic Weisbecker
2023-08-07 12:50 ` [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Sebastian Andrzej Siewior
2023-08-07 15:22   ` Frederic Weisbecker
2023-08-08  7:15     ` Sebastian Andrzej Siewior
