From: Juri Lelli <juri.lelli@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
Anna-Maria Gleixner <anna-maria@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH 01/10] timer: Prevent base->clk from moving backward
Date: Wed, 1 Jul 2020 18:35:04 +0200 [thread overview]
Message-ID: <20200701163504.GD9670@localhost.localdomain> (raw)
In-Reply-To: <20200701011030.14324-2-frederic@kernel.org>
Hi,
Thanks for the set, first of all. Reviewing and testing it.
On 01/07/20 03:10, Frederic Weisbecker wrote:
> When a timer is enqueued with a negative delta (ie: expiry is below
> base->clk), it gets added to the wheel as expiring now (base->clk).
>
> Yet the value that gets stored in base->next_expiry, while calling
> trigger_dyntick_cpu(), is the initial timer->expires value. The
> resulting state becomes:
>
> base->next_expiry < base->clk
>
> On the next timer enqueue, forward_timer_base() may accidentally
> rewind base->clk. As a possible outcome, timers may expire way too
> early, the worst case being that the highest wheel levels get spuriously
> processed again.
>
> To prevent from that, make sure that base->next_expiry doesn't get below
> base->clk.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> ---
> kernel/time/timer.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 398e6eadb861..9a838d38dbe6 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -584,7 +584,15 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
> * Set the next expiry time and kick the CPU so it can reevaluate the
> * wheel:
> */
> - base->next_expiry = timer->expires;
> + if (time_before(timer->expires, base->clk)) {
> + /*
> + * Prevent from forward_timer_base() moving the base->clk
> + * backward
> + */
> + base->next_expiry = base->clk;
> + } else {
> + base->next_expiry = timer->expires;
> + }
> wake_up_nohz_cpu(base->cpu);
> }
>
> @@ -896,10 +904,13 @@ static inline void forward_timer_base(struct timer_base *base)
> * If the next expiry value is > jiffies, then we fast forward to
> * jiffies otherwise we forward to the next expiry value.
> */
> - if (time_after(base->next_expiry, jnow))
> + if (time_after(base->next_expiry, jnow)) {
> base->clk = jnow;
> - else
> + } else {
> + if (WARN_ON_ONCE(time_before(base->next_expiry, base->clk)))
> + return;
I actually ported it to latest RT tree (v5.6.17-rt10) w/o conflicts,
but hit this one above:
...
000: Kernel command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.17-rt10 root=/dev/mapper/rhel_rt--qe--04-root ro crashkernel=auto resume=/dev/mapper/rhel_rt--qe--04-swap rd.lvm.lv=rhel_rt-qe-04/root rd.lvm.lv=rhel_rt-qe-04/swap console=ttyS0,115200
000: mem auto-init: stack:off, heap alloc:off, heap free:off
000: Memory: 2102240K/134089036K available (12292K kernel code, 2449K rwdata, 4332K rodata, 2248K init, 15392K bss, 2278548K reserved, 0K cma-reserved)
000: random: get_random_u64 called from cache_random_seq_create+0x7c/0x140 with crng_init=0
000: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=24, Nodes=2
000: Kernel/User page tables isolation: enabled
000: ftrace: allocating 37680 entries in 148 pages
000: ftrace: allocated 148 pages with 3 groups
000: rcu: Preemptible hierarchical RCU implementation.
000: rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=24.
000: rcu: RCU priority boosting: priority 1 delay 500 ms.
000: rcu: RCU_SOFTIRQ processing moved to rcuc kthreads.
000: No expedited grace period (rcu_normal_after_boot).
000: Tasks RCU enabled.
000: rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
000: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=24
000: NR_IRQS: 524544, nr_irqs: 1432, preallocated irqs: 16
000: random: crng done (trusting CPU's manufacturer)
000: Console: colour VGA+ 80x25
000: printk: console [ttyS0] enabled
000: ACPI: Core revision 20200110
000: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
000: APIC: Switch to symmetric I/O mode setup
000: DMAR: Host address width 46
000: DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
000: DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
000: DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
000: DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020df
000: DMAR: RMRR base: 0x00000079190000 end: 0x00000079192fff
000: DMAR: RMRR base: 0x000000791f4000 end: 0x000000791f7fff
000: DMAR: RMRR base: 0x000000791de000 end: 0x000000791f3fff
000: DMAR: RMRR base: 0x000000791cb000 end: 0x000000791dbfff
000: DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ddfff
000: DMAR: RMRR base: 0x0000005a661000 end: 0x0000005a6a0fff
000: DMAR-IR: IOAPIC id 10 under DRHD base 0xfbffc000 IOMMU 0
000: DMAR-IR: IOAPIC id 8 under DRHD base 0xc7ffc000 IOMMU 1
000: DMAR-IR: IOAPIC id 9 under DRHD base 0xc7ffc000 IOMMU 1
000: DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
000: DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
000: DMAR-IR: Enabled IRQ remapping in x2apic mode
000: x2apic enabled
000: Switched APIC routing to cluster x2apic.
000: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
000: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x228d095820e, max_idle_ns: 440795295198 ns
000: ------------[ cut here ]------------
000: WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:897 add_timer_on+0x129/0x140
000: Modules linked in:
000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.17-rt10 #1
000: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
000: RIP: 0010:add_timer_on+0x129/0x140
000: Code: 24 48 89 ef e8 e8 ca 7d 00 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75 1f 48 83 c4 10 5b 5d 41 5c c3 48 89 45 48 eb ca 0f 0b <0f> 0b eb c4 e8 53 94 e9 ff e9 7b ff ff ff e8 64 c4 f6 ff 0f 1f 40
000: RSP: 0000:ffffffffb2c03e80 EFLAGS: 00010083
000: RAX: 00000000fffb6c25 RBX: ffffffffb3886540 RCX: 0000000000000000
000: RDX: 00000000fffb6c20 RSI: ffffffffb2c03e80 RDI: 0000000000000001
000: RBP: ffff92687fa19300 R08: 0000000000000000 R09: 0000000000000001
000: R10: ffffffffb2c5b4e0 R11: 3235393730343420 R12: 0000000000000000
000: R13: ffffffffb2cd42c0 R14: 0000000000000000 R15: ffff92687fa27f40
000: FS: 0000000000000000(0000) GS:ffff92687fa00000(0000) knlGS:0000000000000000
000: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
000: CR2: ffff92610ac01000 CR3: 00000008c920e001 CR4: 00000000000606b0
000: Call Trace:
000: clocksource_select_watchdog+0x144/0x1a0
000: __clocksource_register_scale+0x88/0xf0
000: tsc_init+0x1a1/0x268
000: start_kernel+0x4ae/0x56e
000: secondary_startup_64+0xb6/0xc0
000: ---[ end trace 0000000000000001 ]---
Guess you might be faster to understand what I'm missing. :-)
Best,
Juri
next prev parent reply other threads:[~2020-07-01 16:35 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-01 1:10 [RFC PATCH 00/10] timer: Reduce timers softirq (and other optimizations) Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 01/10] timer: Prevent base->clk from moving backward Frederic Weisbecker
2020-07-01 16:35 ` Juri Lelli [this message]
2020-07-01 23:20 ` Frederic Weisbecker
2020-07-02 9:59 ` Juri Lelli
2020-07-02 14:04 ` Frederic Weisbecker
2020-07-02 14:32 ` Frederic Weisbecker
2020-07-02 15:57 ` Juri Lelli
2020-07-02 9:48 ` Thomas Gleixner
2020-07-01 1:10 ` [RFC PATCH 02/10] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 03/10] timer: Simplify LVL_START() and calc_index() Frederic Weisbecker
2020-07-02 11:59 ` Thomas Gleixner
2020-07-02 12:27 ` Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 04/10] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 05/10] timers: Always keep track of next expiry Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 06/10] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 07/10] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 08/10] timer: Spare timer softirq until next expiry Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 09/10] timer: Remove must_forward_clk Frederic Weisbecker
2020-07-01 1:10 ` [RFC PATCH 10/10] timer: Lower base clock forwarding threshold Frederic Weisbecker
2020-07-02 13:21 ` Thomas Gleixner
2020-07-02 13:32 ` Frederic Weisbecker
2020-07-02 15:14 ` Thomas Gleixner
2020-07-03 0:12 ` Frederic Weisbecker
2020-07-03 9:13 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200701163504.GD9670@localhost.localdomain \
--to=juri.lelli@redhat.com \
--cc=anna-maria@linutronix.de \
--cc=frederic@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.