public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Juri Lelli <juri.lelli@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH 01/10] timer: Prevent base->clk from moving backward
Date: Wed, 1 Jul 2020 18:35:04 +0200	[thread overview]
Message-ID: <20200701163504.GD9670@localhost.localdomain> (raw)
In-Reply-To: <20200701011030.14324-2-frederic@kernel.org>

Hi,

Thanks for the set, first of all. Reviewing and testing it.

On 01/07/20 03:10, Frederic Weisbecker wrote:
> When a timer is enqueued with a negative delta (ie: expiry is below
> base->clk), it gets added to the wheel as expiring now (base->clk).
> 
> Yet the value that gets stored in base->next_expiry, while calling
> trigger_dyntick_cpu(), is the initial timer->expires value. The
> resulting state becomes:
> 
> 	base->next_expiry < base->clk
> 
> On the next timer enqueue, forward_timer_base() may accidentally
> rewind base->clk. As a possible outcome, timers may expire way too
> early, the worst case being that the highest wheel levels get spuriously
> processed again.
> 
> To prevent from that, make sure that base->next_expiry doesn't get below
> base->clk.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> ---
>  kernel/time/timer.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 398e6eadb861..9a838d38dbe6 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -584,7 +584,15 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
>  	 * Set the next expiry time and kick the CPU so it can reevaluate the
>  	 * wheel:
>  	 */
> -	base->next_expiry = timer->expires;
> +	if (time_before(timer->expires, base->clk)) {
> +		/*
> +		 * Prevent from forward_timer_base() moving the base->clk
> +		 * backward
> +		 */
> +		base->next_expiry = base->clk;
> +	} else {
> +		base->next_expiry = timer->expires;
> +	}
>  	wake_up_nohz_cpu(base->cpu);
>  }
>  
> @@ -896,10 +904,13 @@ static inline void forward_timer_base(struct timer_base *base)
>  	 * If the next expiry value is > jiffies, then we fast forward to
>  	 * jiffies otherwise we forward to the next expiry value.
>  	 */
> -	if (time_after(base->next_expiry, jnow))
> +	if (time_after(base->next_expiry, jnow)) {
>  		base->clk = jnow;
> -	else
> +	} else {
> +		if (WARN_ON_ONCE(time_before(base->next_expiry, base->clk)))
> +			return;

I actually ported it to latest RT tree (v5.6.17-rt10) w/o conflicts,
but hit this one above:

 ...
 000: Kernel command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.17-rt10 root=/dev/mapper/rhel_rt--qe--04-root ro crashkernel=auto resume=/dev/mapper/rhel_rt--qe--04-swap rd.lvm.lv=rhel_rt-qe-04/root rd.lvm.lv=rhel_rt-qe-04/swap console=ttyS0,115200
 000: mem auto-init: stack:off, heap alloc:off, heap free:off
 000: Memory: 2102240K/134089036K available (12292K kernel code, 2449K rwdata, 4332K rodata, 2248K init, 15392K bss, 2278548K reserved, 0K cma-reserved)
 000: random: get_random_u64 called from cache_random_seq_create+0x7c/0x140 with crng_init=0
 000: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=24, Nodes=2
 000: Kernel/User page tables isolation: enabled
 000: ftrace: allocating 37680 entries in 148 pages
 000: ftrace: allocated 148 pages with 3 groups
 000: rcu: Preemptible hierarchical RCU implementation.
 000: rcu:        RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=24.
 000: rcu:        RCU priority boosting: priority 1 delay 500 ms.
 000: rcu:        RCU_SOFTIRQ processing moved to rcuc kthreads.
 000:     No expedited grace period (rcu_normal_after_boot).
 000:     Tasks RCU enabled.
 000: rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
 000: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=24
 000: NR_IRQS: 524544, nr_irqs: 1432, preallocated irqs: 16
 000: random: crng done (trusting CPU's manufacturer)
 000: Console: colour VGA+ 80x25
 000: printk: console [ttyS0] enabled
 000: ACPI: Core revision 20200110
 000: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
 000: APIC: Switch to symmetric I/O mode setup
 000: DMAR: Host address width 46
 000: DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
 000: DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
 000: DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
 000: DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020df
 000: DMAR: RMRR base: 0x00000079190000 end: 0x00000079192fff
 000: DMAR: RMRR base: 0x000000791f4000 end: 0x000000791f7fff
 000: DMAR: RMRR base: 0x000000791de000 end: 0x000000791f3fff
 000: DMAR: RMRR base: 0x000000791cb000 end: 0x000000791dbfff
 000: DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ddfff
 000: DMAR: RMRR base: 0x0000005a661000 end: 0x0000005a6a0fff
 000: DMAR-IR: IOAPIC id 10 under DRHD base  0xfbffc000 IOMMU 0
 000: DMAR-IR: IOAPIC id 8 under DRHD base  0xc7ffc000 IOMMU 1
 000: DMAR-IR: IOAPIC id 9 under DRHD base  0xc7ffc000 IOMMU 1
 000: DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
 000: DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
 000: DMAR-IR: Enabled IRQ remapping in x2apic mode
 000: x2apic enabled
 000: Switched APIC routing to cluster x2apic.
 000: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
 000: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x228d095820e, max_idle_ns: 440795295198 ns
 000: ------------[ cut here ]------------
 000: WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:897 add_timer_on+0x129/0x140
 000: Modules linked in:
 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.17-rt10 #1
 000: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
 000: RIP: 0010:add_timer_on+0x129/0x140
 000: Code: 24 48 89 ef e8 e8 ca 7d 00 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75 1f 48 83 c4 10 5b 5d 41 5c c3 48 89 45 48 eb ca 0f 0b <0f> 0b eb c4 e8 53 94 e9 ff e9 7b ff ff ff e8 64 c4 f6 ff 0f 1f 40
 000: RSP: 0000:ffffffffb2c03e80 EFLAGS: 00010083
 000: RAX: 00000000fffb6c25 RBX: ffffffffb3886540 RCX: 0000000000000000
 000: RDX: 00000000fffb6c20 RSI: ffffffffb2c03e80 RDI: 0000000000000001
 000: RBP: ffff92687fa19300 R08: 0000000000000000 R09: 0000000000000001
 000: R10: ffffffffb2c5b4e0 R11: 3235393730343420 R12: 0000000000000000
 000: R13: ffffffffb2cd42c0 R14: 0000000000000000 R15: ffff92687fa27f40
 000: FS:  0000000000000000(0000) GS:ffff92687fa00000(0000) knlGS:0000000000000000
 000: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 000: CR2: ffff92610ac01000 CR3: 00000008c920e001 CR4: 00000000000606b0
 000: Call Trace:
 000:  clocksource_select_watchdog+0x144/0x1a0
 000:  __clocksource_register_scale+0x88/0xf0
 000:  tsc_init+0x1a1/0x268
 000:  start_kernel+0x4ae/0x56e
 000:  secondary_startup_64+0xb6/0xc0
 000: ---[ end trace 0000000000000001 ]---

Guess you might be faster to understand what I'm missing. :-)

Best,

Juri


  reply	other threads:[~2020-07-01 16:35 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-01  1:10 [RFC PATCH 00/10] timer: Reduce timers softirq (and other optimizations) Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 01/10] timer: Prevent base->clk from moving backward Frederic Weisbecker
2020-07-01 16:35   ` Juri Lelli [this message]
2020-07-01 23:20     ` Frederic Weisbecker
2020-07-02  9:59       ` Juri Lelli
2020-07-02 14:04         ` Frederic Weisbecker
2020-07-02 14:32         ` Frederic Weisbecker
2020-07-02 15:57           ` Juri Lelli
2020-07-02  9:48   ` Thomas Gleixner
2020-07-01  1:10 ` [RFC PATCH 02/10] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 03/10] timer: Simplify LVL_START() and calc_index() Frederic Weisbecker
2020-07-02 11:59   ` Thomas Gleixner
2020-07-02 12:27     ` Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 04/10] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 05/10] timers: Always keep track of next expiry Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 06/10] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 07/10] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 08/10] timer: Spare timer softirq until next expiry Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 09/10] timer: Remove must_forward_clk Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 10/10] timer: Lower base clock forwarding threshold Frederic Weisbecker
2020-07-02 13:21   ` Thomas Gleixner
2020-07-02 13:32     ` Frederic Weisbecker
2020-07-02 15:14       ` Thomas Gleixner
2020-07-03  0:12         ` Frederic Weisbecker
2020-07-03  9:13           ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200701163504.GD9670@localhost.localdomain \
    --to=juri.lelli@redhat.com \
    --cc=anna-maria@linutronix.de \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox