From: Jiri Wiesner <jwiesner@suse.de>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Mel Gorman <mgorman@suse.de>, LKML <linux-kernel@vger.kernel.org>,
Linux-Net <netdev@vger.kernel.org>
Subject: [RFC PATCH] clocksource: increase watchdog retries
Date: Wed, 27 Oct 2021 18:43:52 +0200 [thread overview]
Message-ID: <20211027164352.GA23273@incl> (raw)
A recent change to the clocksource watchdog in commit db3a34e17433
("clocksource: Retry clock read if long delays detected") has caused a
severe performance regression in TCP throughput tests. Netperf executed on
localhost was used for testing. The regression was more than 80%. On the
testing machine, the HPET clocksource was used to detect delays in reading
the TSC clocksource, which was the selected clocksource. In 10% of the
boots of the machine, TSC was marked unstable and the HPET clocksource was
selected as the best clocksource:
[ 13.669682] clocksource: timekeeping watchdog on CPU6: hpet read-back delay of 60666ns, attempt 4, marking unstable
[ 13.669827] tsc: Marking TSC unstable due to clocksource watchdog
[ 13.669917] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 13.670048] sched_clock: Marking unstable (11633513890, 2036384489)<-(13965149377, -295250974)
[ 13.672268] clocksource: Checking clocksource tsc synchronization from CPU 0 to CPUs 1-3,6-7.
[ 13.673148] clocksource: Switched to clocksource hpet
The earliest occurrence was this:
[ 3.423636] clocksource: timekeeping watchdog on CPU2: hpet read-back delay of 61833ns, attempt 4, marking unstable
[ 3.435182] tsc: Marking TSC unstable due to clocksource watchdog
[ 3.455228] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
[ 3.459182] hpet0: 8 comparators, 64-bit 24.000000 MHz counter
[ 3.471195] clocksource: Switched to clocksource hpet
The HPET clocksource suffers from lock contention when its read() function
is executed on multiple CPUs concurrently. A perf profile from the netperf
test (netperf ran on CPU 1, netserver ran on CPU 0):
Samples: 14K of event 'bus-cycles'
Overhead Command Shared Object Symbol CPU
43.83% netperf [kernel.vmlinux] [k] read_hpet 001
40.86% netserver [kernel.vmlinux] [k] read_hpet 000
2.27% netperf [kernel.vmlinux] [k] syscall_exit_to_user_mode 001
2.19% netserver [kernel.vmlinux] [k] syscall_exit_to_user_mode 000
0.96% netserver [kernel.vmlinux] [k] entry_SYSCALL_64 000
0.92% swapper [kernel.vmlinux] [k] read_hpet 000
For timestamping, TCP needs to execute ktime_get() in both the transmit
and receive path. Lock contention caused by HPET on 2 CPUs was enough to
lose 88% of the throughput measured with TSC (1.6 Gbit/s with HPET, 13
Gbit/s with TSC). The lock contention can also be reproduced by switching
to HPET via sysfs.
Tests were carried out to tweak the value of the
clocksource.max_cswd_read_retries parameter. The results indicate that
setting the number of retries to 50 mitigates the issue on the testing
machine, but it does not make it go away entirely:
clocksource.max_cswd_read_retries=3
Reboots: 100 TSC unstable: 10
Reboots: 300 TSC unstable: 32
clocksource.max_cswd_read_retries=5
Reboots: 100 TSC unstable: 5
clocksource.max_cswd_read_retries=10
Reboots: 100 TSC unstable: 6
clocksource.max_cswd_read_retries=50
Reboots: 100 TSC unstable: 0
Reboots: 300 TSC unstable: 1
The testing machine has a Skylake CPU (Intel(R) Xeon(R) CPU E3-1240 v5 @
3.50GHz) with 4 cores (8 CPUs when SMT is on). Perhaps, the number of
retries to mitigate the issue could depend on the number of online CPUs on
the system. Tweaking clocksource.verify_n_cpus had no effect:
clocksource.max_cswd_read_retries=3 clocksource.verify_n_cpus=1
Reboots: 100 TSC unstable: 11
The issue has been observed on both Intel and AMD machines, and it is not
specific to Skylake CPUs. The observed regression varies but, so far, tens
of per cent have been observed.
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++--
kernel/time/clocksource.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc0..b8bebca0f520 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -603,8 +603,8 @@
clocksource.max_cswd_read_retries= [KNL]
Number of clocksource_watchdog() retries due to
external delays before the clock will be marked
- unstable. Defaults to three retries, that is,
- four attempts to read the clock under test.
+ unstable. Defaults to fifty retries, that is,
+ fiftyone attempts to read the clock under test.
clocksource.verify_n_cpus= [KNL]
Limit the number of CPUs checked for clocksources
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b8a14d2fb5ba..c15de711617a 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -199,7 +199,7 @@ void clocksource_mark_unstable(struct clocksource *cs)
spin_unlock_irqrestore(&watchdog_lock, flags);
}
-ulong max_cswd_read_retries = 3;
+ulong max_cswd_read_retries = 50;
module_param(max_cswd_read_retries, ulong, 0644);
EXPORT_SYMBOL_GPL(max_cswd_read_retries);
static int verify_n_cpus = 8;
--
2.26.2
next reply other threads:[~2021-10-27 16:43 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-27 16:43 Jiri Wiesner [this message]
2021-10-27 21:38 ` [RFC PATCH] clocksource: increase watchdog retries Paul E. McKenney
2021-10-28 16:20 ` Jiri Wiesner
2021-10-28 18:42 ` Paul E. McKenney
2021-11-01 10:28 ` Jiri Wiesner
2021-11-02 0:09 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211027164352.GA23273@incl \
--to=jwiesner@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=netdev@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).