public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: John Stultz <jstultz@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Stephen Boyd <sboyd@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	x86@kernel.org
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.
Date: Mon, 16 Oct 2023 23:47:55 +0200	[thread overview]
Message-ID: <878r826xys.ffs@tglx> (raw)
In-Reply-To: <CANDhNCpw+hEHNbtdAZR01HsHW_L1C0BXjZq21eXouQGNnYuUNQ@mail.gmail.com>

On Mon, Oct 16 2023 at 10:46, John Stultz wrote:
> On Fri, Oct 13, 2023 at 7:51 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Lockdep found that spin_lock(&watchdog_lock) from call_timer_fn()
>> is not safe. Use spin_lock_irqsave(&watchdog_lock, flags) instead.
>>
>> [    0.378387] TSC synchronization [CPU#0 -> CPU#1]:
>> [    0.378387] Measured 55060 cycles TSC warp between CPUs, turning off TSC clock.

<SNIP/>

Please trim splats down to the relevant content. Timestamps and the bulk
of the backtraces are completely uninteresting. Something like this is
completely sufficient and readable:

   tsc: Marking TSC unstable due to check_tsc_sync_source failed
  
   WARNING: inconsistent lock state
   inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
   ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0
   {IN-HARDIRQ-W} state was registered at:
     _raw_spin_lock_irqsave+0x3f/0x60
     clocksource_mark_unstable+0x1b/0x90
     mark_tsc_unstable+0x41/0x50
     check_tsc_sync_source+0x14f/0x180
     sysvec_call_function_single+0x69/0x90

   Possible unsafe locking scenario:
     lock(watchdog_lock);
     <Interrupt>
       lock(watchdog_lock);

   stack backtrace:
    _raw_spin_lock+0x30/0x40
    clocksource_watchdog+0x23/0x5a0
    run_timer_softirq+0x2a/0x50
    sysvec_apic_timer_interrupt+0x6e/0x90

20 lines of useful information vs. ~100 lines containing mostly
noise. No?

Documentation/process/* clearly asks for this.

>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index c108ed8a9804..4e8fc0a5ca9d 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -405,8 +405,9 @@ static void clocksource_watchdog(struct timer_list *unused)
>>         enum wd_read_status read_ret;
>>         unsigned long extra_wait = 0;
>>         u32 md;
>> +       unsigned long flags;

  https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations

>> -       spin_lock(&watchdog_lock);
>> +       spin_lock_irqsave(&watchdog_lock, flags);

That looks obvious, but it's just making the problem worse. See below.

> At first glance this looks sane to me.

Emphasis on "first glance" :)

> The only thing that gives me pause is that I'm a little shocked this
> hasn't been caught earlier, and I don't right off see recent changes
> that would have caused this to trip. But Thomas may have more insight.

From looking at the (for you neatly trimmed) backtrace:

     {IN-HARDIRQ-W} state was registered at:
     _raw_spin_lock_irqsave+0x3f/0x60
     clocksource_mark_unstable+0x1b/0x90
     mark_tsc_unstable+0x41/0x50
     check_tsc_sync_source+0x14f/0x180
     sysvec_call_function_single+0x69/0x90

it's bloody obvious which change caused that:

  9d349d47f0e3 ("x86/smpboot: Make TSC synchronization function call based")

Even if you don't recognize it immediately then:

  git grep -l check_tsc_sync_source | xargs git blame | grep check_tsc_sync_source

is not really rocket science either, right?

Vs. you being shocked: As long as that synchronization function does not
try to mark the TSC unstable, lockdep won't notice. Definitely
surprising that this took a few months to surface.

But that's not what shocks me...

That spinlock_irqsave() conversion is not a solution. Why?

  If that happens on RT then this is even worse. mark_tsc_unstable()
  is invoked from hard interrupt context via smp function call which
  is a NONO due to watchdog_lock being a spinlock and the _irqsave()
  change won't help anything.

  Yes, I broke it and did not think about that particular side effect
  when moving check_tsc_sync_source() from thread to smp function call
  context.

But that's just a distraction. What's even worse is that the patch
blindly slaps spinlock_irqsave() at the problem without actually looking
what the function does and why it does not use the irqsave variant in
the first place.

  The watchdog function deliberately runs with interrupts enabled and
  only explicitly disables them for the actual readout because there is
  no point in delaying interrupts for this horrorshow accross the whole
  function.

  But the explicit local_irq_disable()/enable() pair around the readout
  means that the patch above is obviously broken:

    spin_lock_irqsave(&watchdog_lock, flags);
    cs_watchdog_read()
      local_irq_disable();
      ...
      local_irq_enable();  <--- FAIL

  Lockdep should have yelled at that too, but even if it failed to do
  so then it's still very obviously wrong.

Something like the uncompiled/untested below should cure it for real. It
really does not matter whether the TSC unstable event happens a bit
later. The system is unhappy no matter what.

That said, this whole clocksource watchdog mess wants a proper
overhaul. It has become a pile of warts and duct tape by now and after
staring at it long enough there is no real reason to run it in a timer
callback anymore. It just can move into delayed work and the whole
locking problem can be reduced to the clocksource_mutex and some well
thought out atomic operations to handle the mark unstable case. But
that's a different story and not relevant for curing the problem at
hand.

Thanks,

        tglx
---
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -15,6 +15,7 @@
  * ( The serial nature of the boot logic and the CPU hotplug lock
  *   protects against more than 2 CPUs entering this code. )
  */
+#include <linux/workqueue.h>
 #include <linux/topology.h>
 #include <linux/spinlock.h>
 #include <linux/kernel.h>
@@ -342,6 +343,13 @@ static inline unsigned int loop_timeout(
 	return (cpumask_weight(topology_core_cpumask(cpu)) > 1) ? 2 : 20;
 }
 
+static void tsc_sync_mark_tsc_unstable(struct work_struct *work)
+{
+	mark_tsc_unstable("check_tsc_sync_source failed");
+}
+
+static DECLARE_WORK(tsc_sync_work, tsc_sync_mark_tsc_unstable);
+
 /*
  * The freshly booted CPU initiates this via an async SMP function call.
  */
@@ -395,7 +403,7 @@ static void check_tsc_sync_source(void *
 			"turning off TSC clock.\n", max_warp);
 		if (random_warps)
 			pr_warn("TSC warped randomly between CPUs\n");
-		mark_tsc_unstable("check_tsc_sync_source failed");
+		schedule_work(&tsc_sync_work);
 	}
 
 	/*

  reply	other threads:[~2023-10-16 21:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-13 14:51 [PATCH] clocksource: disable irq when holding watchdog_lock Tetsuo Handa
2023-10-16 17:46 ` John Stultz
2023-10-16 21:47   ` Thomas Gleixner [this message]
2023-10-16 23:03     ` Paul E. McKenney
2023-10-17  6:49       ` Thomas Gleixner
2023-10-17 14:11         ` Paul E. McKenney
2023-10-17 10:37     ` Tetsuo Handa
2023-10-17 14:10       ` Paul E. McKenney
2023-10-19  9:30         ` Tetsuo Handa
2023-10-19 12:14           ` Thomas Gleixner
2023-10-19 14:26             ` Tetsuo Handa
2023-10-20  3:30               ` Paul E. McKenney
2023-10-20 12:50                 ` Tetsuo Handa
2023-10-20 13:40                   ` Paul E. McKenney
2023-10-24 13:00               ` Thomas Gleixner
2023-10-24 14:52                 ` Tetsuo Handa
2023-10-25 21:28                   ` Thomas Gleixner
2023-10-25 21:31       ` [PATCH] x86/tsc: Defer marking TSC unstable to a worker Thomas Gleixner
2023-10-26  9:39         ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
2023-10-27 18:46         ` tip-bot2 for Thomas Gleixner
2023-10-20  9:02     ` [PATCH] clocksource: disable irq when holding watchdog_lock Sebastian Andrzej Siewior
2023-10-26  2:33 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878r826xys.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=bigeasy@linutronix.de \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=sboyd@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox