From: Thomas Gleixner <tglx@linutronix.de>
To: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	Tony Luck <tony.luck@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Len Brown <len.brown@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>,
	Stephane Eranian <eranian@google.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ricardo Neri <ricardo.neri@intel.com>,
	linuxppc-dev@lists.ozlabs.org, iommu@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org,
	Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Subject: Re: [PATCH v7 19/24] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector
Date: Tue, 03 Feb 2026 18:02:14 +0100	[thread overview]
Message-ID: <87y0l93gmh.ffs@tglx> (raw)
In-Reply-To: <20230301234753.28582-20-ricardo.neri-calderon@linux.intel.com>

On Wed, Mar 01 2023 at 15:47, Ricardo Neri wrote:
> + * An HPET channel is reserved for the detector. The channel issues an NMI to
> + * one of the CPUs in @watchdog_allowed_mask. This CPU monitors itself for
> + * hardlockups and sends an NMI IPI to the rest of the CPUs in the system.
> + *
> + * The detector uses IPI shorthands. Thus, all CPUs in the system get the NMI
> + * (offline CPUs also get the NMI but they "ignore" it). A cpumask is used to
> + * specify whether a CPU must check for hardlockups.
> + *
> + * The NMI also disturbs isolated CPUs. The detector fails to initialize if
> + * tick_nohz_full is enabled.

This can be completely avoided. See below.

> +/**
> + * hardlockup_detector_nmi_handler() - NMI Interrupt handler
> + * @type:	Type of NMI handler; not used.
> + * @regs:	Register values as seen when the NMI was asserted
> + *
> + * Check if our HPET channel caused the NMI. If yes, inspect for lockups by
> + * issuing an IPI to the rest of the CPUs. Also, kick the timer if it is
> + * non-periodic.
> + *
> + * Returns:
> + * NMI_DONE if the HPET timer did not cause the interrupt. NMI_HANDLED
> + * otherwise.
> + */
> +static int hardlockup_detector_nmi_handler(unsigned int type,
> +					   struct pt_regs *regs)
> +{
> +	struct hpet_hld_data *hdata = hld_data;
> +	int cpu;
> +
> +	/*
> +	 * The CPU handling the HPET NMI will land here and trigger the
> +	 * inspection of hardlockups in the rest of the monitored
> +	 * CPUs.
> +	 */
> +	if (is_hpet_hld_interrupt(hdata)) {
> +		/*
> +		 * Kick the timer first. If the HPET channel is periodic, it
> +		 * helps to reduce the delta between the expected TSC value and
> +		 * its actual value the next time the HPET channel fires.
> +		 */
> +		kick_timer(hdata, !(hdata->has_periodic));
> +
> +		if (cpumask_weight(hld_data->monitored_cpumask) > 1) {
> +			/*
> +			 * Since we cannot know the source of an NMI, the best
> +			 * we can do is to use a flag to indicate to all online
> +			 * CPUs that they will get an NMI and that the source of
> +			 * that NMI is the hardlockup detector. Offline CPUs
> +			 * also receive the NMI but they ignore it.
> +			 */
> +			cpumask_copy(hld_data->inspect_cpumask,
> +				     cpu_online_mask);
> +
> +			/* If we are here, IPI shorthands are enabled. */
> +			apic->send_IPI_allbutself(NMI_VECTOR);

This should be done differently so that the isolation stuff is
respected.

The HPET NMI is directed to one of the CPUs in the 'to be monitored'
mask. That CPU is known when this is [re-]configured. So this can be
changed to:

        seq = atomic_read(&hld.nmi_seq);
        if (seq == __this_cpu_read(hpet_wd_seq))
                return;

        cpu = raw_smp_processor_id();
        if (cpu == hld.nmi_target) {
                if (!hpet_hld_interrupt(...))
                        return;
        }

        next_cpu = cpumask_next_wrap(cpu, monitored_mask);
        if (next_cpu < nr_cpu_ids) {
                if (cpu == hld.nmi_target)
                        atomic_inc(&hld.nmi_seq);
                apic->send_IPI(next_cpu, NMI_VECTOR);
        }
        __this_cpu_write(hpet_wd_seq, seq);

        watchdog_hardlockup_check();

which has the following downside:

      It requires x2APIC because xAPIC has this two-stage ICR dance,
      which is not NMI safe and should only be used if there is no other
      choice, i.e. for the backtrace trigger from NMI context.

but the following upsides:

    1) The fast path quick check is a trivial memory read/cmp.

       It always drops through for the NMI target CPU, as that one has
       to check for the timing window.

       For the other CPUs it only becomes true once the NMI target CPU
       has incremented the sequence counter and sent an NMI to the next
       one.

    2) Propagating the NMI through the monitored mask avoids the whole
       limitation vs. the shorthand setup, which means the watchdog can
       be enabled early on.

       It also makes the detector usable in isolation scenarios.

Restricting this to x2APIC is in my opinion reasonable, as the people
who crave the lost PMC are surely not those who run 8-CPU machines from
two decades ago.

Thanks,

        tglx
