All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juri Lelli <juri.lelli@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH 01/10] timer: Prevent base->clk from moving backward
Date: Wed, 1 Jul 2020 18:35:04 +0200	[thread overview]
Message-ID: <20200701163504.GD9670@localhost.localdomain> (raw)
In-Reply-To: <20200701011030.14324-2-frederic@kernel.org>

Hi,

Thanks for the set, first of all. Reviewing and testing it.

On 01/07/20 03:10, Frederic Weisbecker wrote:
> When a timer is enqueued with a negative delta (ie: expiry is below
> base->clk), it gets added to the wheel as expiring now (base->clk).
> 
> Yet the value that gets stored in base->next_expiry, while calling
> trigger_dyntick_cpu(), is the initial timer->expires value. The
> resulting state becomes:
> 
> 	base->next_expiry < base->clk
> 
> On the next timer enqueue, forward_timer_base() may accidentally
> rewind base->clk. As a possible outcome, timers may expire way too
> early, the worst case being that the highest wheel levels get spuriously
> processed again.
> 
> To prevent from that, make sure that base->next_expiry doesn't get below
> base->clk.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> ---
>  kernel/time/timer.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 398e6eadb861..9a838d38dbe6 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -584,7 +584,15 @@ trigger_dyntick_cpu(struct timer_base *base, struct timer_list *timer)
>  	 * Set the next expiry time and kick the CPU so it can reevaluate the
>  	 * wheel:
>  	 */
> -	base->next_expiry = timer->expires;
> +	if (time_before(timer->expires, base->clk)) {
> +		/*
> +		 * Prevent from forward_timer_base() moving the base->clk
> +		 * backward
> +		 */
> +		base->next_expiry = base->clk;
> +	} else {
> +		base->next_expiry = timer->expires;
> +	}
>  	wake_up_nohz_cpu(base->cpu);
>  }
>  
> @@ -896,10 +904,13 @@ static inline void forward_timer_base(struct timer_base *base)
>  	 * If the next expiry value is > jiffies, then we fast forward to
>  	 * jiffies otherwise we forward to the next expiry value.
>  	 */
> -	if (time_after(base->next_expiry, jnow))
> +	if (time_after(base->next_expiry, jnow)) {
>  		base->clk = jnow;
> -	else
> +	} else {
> +		if (WARN_ON_ONCE(time_before(base->next_expiry, base->clk)))
> +			return;

I actually ported it to latest RT tree (v5.6.17-rt10) w/o conflicts,
but hit this one above:

 ...
 000: Kernel command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.17-rt10 root=/dev/mapper/rhel_rt--qe--04-root ro crashkernel=auto resume=/dev/mapper/rhel_rt--qe--04-swap rd.lvm.lv=rhel_rt-qe-04/root rd.lvm.lv=rhel_rt-qe-04/swap console=ttyS0,115200
 000: mem auto-init: stack:off, heap alloc:off, heap free:off
 000: Memory: 2102240K/134089036K available (12292K kernel code, 2449K rwdata, 4332K rodata, 2248K init, 15392K bss, 2278548K reserved, 0K cma-reserved)
 000: random: get_random_u64 called from cache_random_seq_create+0x7c/0x140 with crng_init=0
 000: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=24, Nodes=2
 000: Kernel/User page tables isolation: enabled
 000: ftrace: allocating 37680 entries in 148 pages
 000: ftrace: allocated 148 pages with 3 groups
 000: rcu: Preemptible hierarchical RCU implementation.
 000: rcu:        RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=24.
 000: rcu:        RCU priority boosting: priority 1 delay 500 ms.
 000: rcu:        RCU_SOFTIRQ processing moved to rcuc kthreads.
 000:     No expedited grace period (rcu_normal_after_boot).
 000:     Tasks RCU enabled.
 000: rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
 000: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=24
 000: NR_IRQS: 524544, nr_irqs: 1432, preallocated irqs: 16
 000: random: crng done (trusting CPU's manufacturer)
 000: Console: colour VGA+ 80x25
 000: printk: console [ttyS0] enabled
 000: ACPI: Core revision 20200110
 000: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
 000: APIC: Switch to symmetric I/O mode setup
 000: DMAR: Host address width 46
 000: DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
 000: DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
 000: DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
 000: DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020df
 000: DMAR: RMRR base: 0x00000079190000 end: 0x00000079192fff
 000: DMAR: RMRR base: 0x000000791f4000 end: 0x000000791f7fff
 000: DMAR: RMRR base: 0x000000791de000 end: 0x000000791f3fff
 000: DMAR: RMRR base: 0x000000791cb000 end: 0x000000791dbfff
 000: DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ddfff
 000: DMAR: RMRR base: 0x0000005a661000 end: 0x0000005a6a0fff
 000: DMAR-IR: IOAPIC id 10 under DRHD base  0xfbffc000 IOMMU 0
 000: DMAR-IR: IOAPIC id 8 under DRHD base  0xc7ffc000 IOMMU 1
 000: DMAR-IR: IOAPIC id 9 under DRHD base  0xc7ffc000 IOMMU 1
 000: DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
 000: DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
 000: DMAR-IR: Enabled IRQ remapping in x2apic mode
 000: x2apic enabled
 000: Switched APIC routing to cluster x2apic.
 000: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
 000: clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x228d095820e, max_idle_ns: 440795295198 ns
 000: ------------[ cut here ]------------
 000: WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:897 add_timer_on+0x129/0x140
 000: Modules linked in:
 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.17-rt10 #1
 000: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/25/2017
 000: RIP: 0010:add_timer_on+0x129/0x140
 000: Code: 24 48 89 ef e8 e8 ca 7d 00 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75 1f 48 83 c4 10 5b 5d 41 5c c3 48 89 45 48 eb ca 0f 0b <0f> 0b eb c4 e8 53 94 e9 ff e9 7b ff ff ff e8 64 c4 f6 ff 0f 1f 40
 000: RSP: 0000:ffffffffb2c03e80 EFLAGS: 00010083
 000: RAX: 00000000fffb6c25 RBX: ffffffffb3886540 RCX: 0000000000000000
 000: RDX: 00000000fffb6c20 RSI: ffffffffb2c03e80 RDI: 0000000000000001
 000: RBP: ffff92687fa19300 R08: 0000000000000000 R09: 0000000000000001
 000: R10: ffffffffb2c5b4e0 R11: 3235393730343420 R12: 0000000000000000
 000: R13: ffffffffb2cd42c0 R14: 0000000000000000 R15: ffff92687fa27f40
 000: FS:  0000000000000000(0000) GS:ffff92687fa00000(0000) knlGS:0000000000000000
 000: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 000: CR2: ffff92610ac01000 CR3: 00000008c920e001 CR4: 00000000000606b0
 000: Call Trace:
 000:  clocksource_select_watchdog+0x144/0x1a0
 000:  __clocksource_register_scale+0x88/0xf0
 000:  tsc_init+0x1a1/0x268
 000:  start_kernel+0x4ae/0x56e
 000:  secondary_startup_64+0xb6/0xc0
 000: ---[ end trace 0000000000000001 ]---

Guess you might be faster to understand what I'm missing. :-)

Best,

Juri


  reply	other threads:[~2020-07-01 16:35 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-01  1:10 [RFC PATCH 00/10] timer: Reduce timers softirq (and other optimizations) Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 01/10] timer: Prevent base->clk from moving backward Frederic Weisbecker
2020-07-01 16:35   ` Juri Lelli [this message]
2020-07-01 23:20     ` Frederic Weisbecker
2020-07-02  9:59       ` Juri Lelli
2020-07-02 14:04         ` Frederic Weisbecker
2020-07-02 14:32         ` Frederic Weisbecker
2020-07-02 15:57           ` Juri Lelli
2020-07-02  9:48   ` Thomas Gleixner
2020-07-01  1:10 ` [RFC PATCH 02/10] timer: Move trigger_dyntick_cpu() to enqueue_timer() Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 03/10] timer: Simplify LVL_START() and calc_index() Frederic Weisbecker
2020-07-02 11:59   ` Thomas Gleixner
2020-07-02 12:27     ` Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 04/10] timer: Optimize _next_timer_interrupt() level iteration Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 05/10] timers: Always keep track of next expiry Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 06/10] timer: Reuse next expiry cache after nohz exit Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 07/10] timer: Expand clk forward logic beyond nohz Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 08/10] timer: Spare timer softirq until next expiry Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 09/10] timer: Remove must_forward_clk Frederic Weisbecker
2020-07-01  1:10 ` [RFC PATCH 10/10] timer: Lower base clock forwarding threshold Frederic Weisbecker
2020-07-02 13:21   ` Thomas Gleixner
2020-07-02 13:32     ` Frederic Weisbecker
2020-07-02 15:14       ` Thomas Gleixner
2020-07-03  0:12         ` Frederic Weisbecker
2020-07-03  9:13           ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200701163504.GD9670@localhost.localdomain \
    --to=juri.lelli@redhat.com \
    --cc=anna-maria@linutronix.de \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.