public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86/vdso: Use non-serializing instruction rdtsc
@ 2023-05-16  6:52 Rong Tao
  2023-05-16 14:12 ` Dave Hansen
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Rong Tao @ 2023-05-16  6:52 UTC
  To: tglx
  Cc: rtoax, Rong Tao, Ingo Molnar, Borislav Petkov, Dave Hansen,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

From: Rong Tao <rongtao@cestc.cn>

Replacing rdtscp or 'lfence;rdtsc' with the non-serializing instruction
rdtsc makes vDSO clock_gettime() about 40% faster, at the cost of a small
loss of precision.

The RDTSCP instruction is not a serializing instruction, but it does wait
until all previous instructions have executed and all previous loads are
globally visible. The RDTSC instruction is not a serializing instruction.
It does not necessarily wait until all previous instructions have been
executed before reading the counter.
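
To make the three variants concrete, here is a minimal user-space sketch
using the compiler intrinsics from <x86intrin.h>. The function names are
hypothetical and this is not the kernel's implementation; it only
illustrates the ordering difference described above, assuming an x86-64
build with GCC or Clang.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Plain RDTSC: the read may execute before earlier instructions finish. */
static inline uint64_t tsc_unordered(void)
{
    return __rdtsc();
}

/* LFENCE; RDTSC: earlier instructions complete before the counter is read. */
static inline uint64_t tsc_lfence(void)
{
    _mm_lfence();
    return __rdtsc();
}

/* RDTSCP: waits for earlier instructions and also returns IA32_TSC_AUX. */
static inline uint64_t tsc_rdtscp(void)
{
    unsigned int aux; /* receives IA32_TSC_AUX, unused in this sketch */
    return __rdtscp(&aux);
}
#endif
```

The patch below switches the vDSO from the ordered forms to the
equivalent of tsc_unordered().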

Measure the time consumed by vDSO clock_gettime() with the following loop
(pseudocode):

    count = 1000 * 1000 * 100;
    while (count--)
        clock_gettime(CLOCK_REALTIME, &ts);

Timing comparison (total elapsed time for the whole loop):

     Environment      | rdtsc_ordered() (ns) | rdtsc() (ns) | Improvement
    ------------------+----------------------+--------------+-------------
     Physical machine |      1269147289      |  759067324   |     40%
     Guest OS (KVM)   |      1756615963      |  995823886   |     43%
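
The loop above could be turned into a self-contained benchmark along the
following lines. This is a sketch, not the program used to produce the
numbers in the table: the helper names now_ns() and bench_clock_gettime()
and the choice of CLOCK_MONOTONIC for the outer timing are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: issue `count` back-to-back
 * clock_gettime(CLOCK_REALTIME) calls and return the total elapsed
 * time in nanoseconds.
 */
static uint64_t bench_clock_gettime(uint64_t count)
{
    struct timespec ts;
    uint64_t start = now_ns();

    while (count--)
        clock_gettime(CLOCK_REALTIME, &ts);

    return now_ns() - start;
}
```

Calling bench_clock_gettime(1000 * 1000 * 100) from main() and printing
the result gives a total comparable to the table. Dividing the reported
totals by the 100,000,000 iterations gives roughly 12.7 ns versus 7.6 ns
per call on the physical machine, and 17.6 ns versus 10.0 ns in the KVM
guest.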

Signed-off-by: Rong Tao <rongtao@cestc.cn>
---
 arch/x86/include/asm/vdso/gettimeofday.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 4cf6794f9d68..342d29106208 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -228,7 +228,7 @@ static u64 vread_pvclock(void)
 		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
 			return U64_MAX;
 
-		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
+		ret = __pvclock_read_cycles(pvti, rdtsc());
 	} while (pvclock_read_retry(pvti, version));
 
 	return ret;
@@ -246,7 +246,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc();
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check.  Otherwise we
-- 
2.39.1



Thread overview: 7+ messages
2023-05-16  6:52 [PATCH] x86/vdso: Use non-serializing instruction rdtsc Rong Tao
2023-05-16 14:12 ` Dave Hansen
2023-05-16 17:57   ` H. Peter Anvin
2023-05-16 20:39     ` Thomas Gleixner
2023-05-16 14:20 ` Thomas Gleixner
2023-05-16 21:53 ` Andy Lutomirski
2023-05-17  0:41 ` Rong Tao
