public inbox for linux-kernel@vger.kernel.org
* [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls
@ 2024-12-31 15:01 Zhongqiu Han
  2024-12-31 16:08 ` Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Zhongqiu Han @ 2024-12-31 15:01 UTC
  To: anna-maria, frederic, tglx; +Cc: linux-kernel, quic_zhonhan

If the timer is deferrable and NO_HZ_COMMON is enabled, the function
get_timer_cpu_base() will call per_cpu_ptr() twice. Optimize the function
to avoid potentially redundant per_cpu_ptr() calls.

One of the call paths of get_timer_cpu_base() goes through
lock_timer_base(), which contains a loop. Within this loop,
get_timer_base() is called, and it in turn calls get_timer_cpu_base().
On this path, get_timer_cpu_base() is a hotspot function: it is called
approximately 13,000 times in 12 seconds on test x86 KVM machines.

lock_timer_base(){
    for(;;) {
    ...
    --> get_timer_base() [inline]
        --> get_timer_cpu_base() [inline]
    ...
    }
}

With the patch, the assembly code (on x86 and ARM64) executed in the loop
is reduced. Comparative tests on x86 KVM virtual machines, measuring the
runtime before and after the optimization (in nanoseconds), show that the
runtime distribution shifts toward smaller intervals.

Runtime (ns)  Before   After
[0-19]             0       0
[20-39]            6    1014
[40-59]           41    2198
[60-79]           93    2073
[80-99]          814    3081
[100-119]       5262    3268
[120-139]       4510     671
[140-159]       2202     468
[160-179]         81     158
[180-199]         15     160
[200-219]          3      54
[220-239]          2       7
[240-259]          2       3
[260-279]          0       0
[280-299]          0       1
[300-319]          0       0
total          13031   13156

Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
---
 kernel/time/timer.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a5860bf6d16f..40706cb36920 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -956,33 +956,29 @@ static int detach_if_pending(struct timer_list *timer, struct timer_base *base,
 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 {
 	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
-	struct timer_base *base;
-
-	base = per_cpu_ptr(&timer_bases[index], cpu);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
 	 * to use the deferrable base.
 	 */
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
-		base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
-	return base;
+		index = BASE_DEF;
+
+	return per_cpu_ptr(&timer_bases[index], cpu);
 }
 
 static inline struct timer_base *get_timer_this_cpu_base(u32 tflags)
 {
 	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
-	struct timer_base *base;
-
-	base = this_cpu_ptr(&timer_bases[index]);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
 	 * to use the deferrable base.
 	 */
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
-		base = this_cpu_ptr(&timer_bases[BASE_DEF]);
-	return base;
+		index = BASE_DEF;
+
+	return this_cpu_ptr(&timer_bases[index]);
 }
 
 static inline struct timer_base *get_timer_base(u32 tflags)
-- 
2.25.1



* Re: [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls
  2024-12-31 15:01 [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls Zhongqiu Han
@ 2024-12-31 16:08 ` Frederic Weisbecker
  2025-01-15 21:12 ` Thomas Gleixner
  2025-01-16  8:08 ` [tip: timers/core] timers: Optimize get_timer_[this_]cpu_base() tip-bot2 for Zhongqiu Han
  2 siblings, 0 replies; 5+ messages in thread
From: Frederic Weisbecker @ 2024-12-31 16:08 UTC
  To: Zhongqiu Han; +Cc: anna-maria, tglx, linux-kernel

On Tue, Dec 31, 2024 at 11:01:15PM +0800, Zhongqiu Han wrote:
> If the timer is deferrable and NO_HZ_COMMON is enabled, the function
> get_timer_cpu_base() will call per_cpu_ptr() twice. Optimize the function
> to avoid potentially redundant per_cpu_ptr() calls.
> 
> [...]
> 
> Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>


* Re: [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls
  2024-12-31 15:01 [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls Zhongqiu Han
  2024-12-31 16:08 ` Frederic Weisbecker
@ 2025-01-15 21:12 ` Thomas Gleixner
  2025-01-16  3:36   ` Zhongqiu Han
  2025-01-16  8:08 ` [tip: timers/core] timers: Optimize get_timer_[this_]cpu_base() tip-bot2 for Zhongqiu Han
  2 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2025-01-15 21:12 UTC
  To: Zhongqiu Han, anna-maria, frederic; +Cc: linux-kernel, quic_zhonhan

On Tue, Dec 31 2024 at 23:01, Zhongqiu Han wrote:
> If the timer is deferrable and NO_HZ_COMMON is enabled, the function
> get_timer_cpu_base() will call per_cpu_ptr() twice. Optimize the function
> to avoid potentially redundant per_cpu_ptr() calls.

This lacks an explanation for the second hunk which changes
get_timer_this_cpu_base().

> One of the call paths of the get_timer_cpu_base() function is through the
> lock_timer_base() function, which contains a loop. Within this loop, the
> get_timer_base() func is called, and in turn, it calls the
> get_timer_cpu_base() function. And in such a path, get_timer_cpu_base is
> a hotspot function. It is called approximately 13,000 times in 12 seconds
> on test x86 KVM machines.

Which is roughly once per millisecond and depending on the number of
CPUs that's far from a hotspot.
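
The arithmetic behind that estimate, using the figures from the changelog.
The per-CPU term assumes the calls are spread evenly over N CPUs; the thread
does not state N for the test machine.

```latex
\frac{13\,000\ \text{calls}}{12\ \text{s}} \approx 1083\ \text{calls/s}
\;\Rightarrow\;
\frac{1}{1083\ \text{s}^{-1}} \approx 0.92\ \text{ms between calls (system-wide)},
\qquad
\approx 0.92\,N\ \text{ms between calls per CPU}
```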

I'm not against the change per se, but this change log is a bit over the
top, aside from not mentioning the second hunk. I'll fix it up when
applying.

Thanks,

        tglx


* Re: [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls
  2025-01-15 21:12 ` Thomas Gleixner
@ 2025-01-16  3:36   ` Zhongqiu Han
  0 siblings, 0 replies; 5+ messages in thread
From: Zhongqiu Han @ 2025-01-16  3:36 UTC
  To: Thomas Gleixner, anna-maria, frederic; +Cc: linux-kernel

On 1/16/2025 5:12 AM, Thomas Gleixner wrote:
> On Tue, Dec 31 2024 at 23:01, Zhongqiu Han wrote:
>> If the timer is deferrable and NO_HZ_COMMON is enabled, the function
>> get_timer_cpu_base() will call per_cpu_ptr() twice. Optimize the function
>> to avoid potentially redundant per_cpu_ptr() calls.
> 
> This lacks an explanation for the second hunk which changes
> get_timer_this_cpu_base().
> 
Acknowledged.

>> One of the call paths of the get_timer_cpu_base() function is through the
>> lock_timer_base() function, which contains a loop. Within this loop, the
>> get_timer_base() func is called, and in turn, it calls the
>> get_timer_cpu_base() function. And in such a path, get_timer_cpu_base is
>> a hotspot function. It is called approximately 13,000 times in 12 seconds
>> on test x86 KVM machines.
> 
> Which is roughly once per millisecond and depending on the number of
> CPUs that's far from a hotspot.

Acknowledged.
> 
> I'm not against the change per se, but this change log is a bit over the
> top, aside from not mentioning the second hunk. I'll fix it up when
> applying.
> 
> Thanks,
> 
>          tglx
Hi tglx / Frederic,
Thanks for the review.
Since you will fix up the commit message when applying, I will not
send a v2. Thanks~



-- 
Thx and BRs,
Zhongqiu Han


* [tip: timers/core] timers: Optimize get_timer_[this_]cpu_base()
  2024-12-31 15:01 [PATCH] timers: Optimize get_timer_cpu_base() to reduce potentially redundant per_cpu_ptr() calls Zhongqiu Han
  2024-12-31 16:08 ` Frederic Weisbecker
  2025-01-15 21:12 ` Thomas Gleixner
@ 2025-01-16  8:08 ` tip-bot2 for Zhongqiu Han
  2 siblings, 0 replies; 5+ messages in thread
From: tip-bot2 for Zhongqiu Han @ 2025-01-16  8:08 UTC
  To: linux-tip-commits
  Cc: Zhongqiu Han, Thomas Gleixner, Frederic Weisbecker, x86,
	linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     3ec955713d9617059d2fc8f2816d0b95ace72256
Gitweb:        https://git.kernel.org/tip/3ec955713d9617059d2fc8f2816d0b95ace72256
Author:        Zhongqiu Han <quic_zhonhan@quicinc.com>
AuthorDate:    Tue, 31 Dec 2024 23:01:15 +08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 16 Jan 2025 09:04:23 +01:00

timers: Optimize get_timer_[this_]cpu_base()

If a timer is deferrable and NO_HZ_COMMON is enabled, get_timer_cpu_base()
and get_timer_this_cpu_base() invoke per_cpu_ptr() and this_cpu_ptr()
twice.

While this seems to be cheap, get_timer_cpu_base() can be called in a loop
in lock_timer_base().

Optimize the functions by updating the base index for deferrable timers and
retrieving the actual base pointer once.

In both cases the resulting assembly code of those helpers becomes smaller,
which results in a ~30% execution time reduction for a lock_timer_base()
microbenchmark.

Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/all/20241231150115.1978342-1-quic_zhonhan@quicinc.com

---
 kernel/time/timer.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a5860bf..40706cb 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -956,33 +956,29 @@ static int detach_if_pending(struct timer_list *timer, struct timer_base *base,
 static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu)
 {
 	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
-	struct timer_base *base;
-
-	base = per_cpu_ptr(&timer_bases[index], cpu);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
 	 * to use the deferrable base.
 	 */
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
-		base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
-	return base;
+		index = BASE_DEF;
+
+	return per_cpu_ptr(&timer_bases[index], cpu);
 }
 
 static inline struct timer_base *get_timer_this_cpu_base(u32 tflags)
 {
 	int index = tflags & TIMER_PINNED ? BASE_LOCAL : BASE_GLOBAL;
-	struct timer_base *base;
-
-	base = this_cpu_ptr(&timer_bases[index]);
 
 	/*
 	 * If the timer is deferrable and NO_HZ_COMMON is set then we need
 	 * to use the deferrable base.
 	 */
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE))
-		base = this_cpu_ptr(&timer_bases[BASE_DEF]);
-	return base;
+		index = BASE_DEF;
+
+	return this_cpu_ptr(&timer_bases[index]);
 }
 
 static inline struct timer_base *get_timer_base(u32 tflags)
