linux-arm-kernel.lists.infradead.org archive mirror
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
To: linux-arm-kernel@lists.infradead.org
Subject: question about detect hard lockups without NMIs using secondary cpus
Date: Wed, 29 Jul 2015 19:29:06 +0100
Message-ID: <20150729182905.GO7557@n2100.arm.linux.org.uk>
In-Reply-To: <CADUS3onib4DYYtPo7RfLAYiN4fpspzcFCx86-pJwuLc37hz3dQ@mail.gmail.com>

On Thu, Jul 30, 2015 at 12:03:46AM +0800, yoma sophian wrote:
> hi all:
> below link introduced how to emulate NMIs on systems where they are
> not available by using timer interrupts on other cpus.
> 
> http://article.gmane.org/gmane.linux.kernel/1419661
> 
> in kernel/watchdog.c
>     --> watchdog_overflow_callback
>           if (is_hardlockup()) {
>            ...........................
>                 if (hardlockup_panic)
>                         panic("Watchdog detected hard LOCKUP on cpu %d",
>                               this_cpu); /*************/
>                 else
>                         WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
>                              this_cpu);
>              .......................
>         }
> 
> I have some questions:
> a.
> In an SMP system with, say, 4 cores and hardlockup_panic set to 1:
> if Core0 finds Core1 hard locked up in hardIRQ context, the panic()
> call marked with '*' above will fail in smp_send_stop(), and we will
> have no idea where Core1 is trapped, right?

watchdog_overflow_callback() is only ever entered on the failed core itself.
What you left out of the quoted code is:

	int this_cpu = smp_processor_id();

which gets the CPU number of the CPU executing this code.  So, Core 0
will never find Core 1 having locked up via this code path.
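
As a rough illustration of that point, here is a small userspace model
(not the kernel source) of a check that only ever looks at the calling
CPU's own state.  The model_* names and the NR_CPUS value are invented
for this sketch, and the counter comparison is an assumption about the
general shape of such a check rather than a copy of kernel/watchdog.c:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical core count, as in the question */

/* Stand-ins for per-CPU variables, indexed by CPU number. */
static unsigned long hrtimer_interrupts[NR_CPUS];
static unsigned long hrtimer_interrupts_saved[NR_CPUS];

/* Modelled timer interrupt: only runs while 'cpu' still takes IRQs. */
static void model_timer_tick(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	hrtimer_interrupts[cpu]++;
}

/*
 * Modelled NMI-context check, called with this_cpu being the CPU that
 * is executing it (what smp_processor_id() would return).  It reads
 * and updates only this_cpu's counters, so it can never report a
 * lockup on any other CPU.
 */
static bool model_is_hardlockup(int this_cpu)
{
	unsigned long hrint;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	hrint = hrtimer_interrupts[this_cpu];

	/* No timer interrupt since the last check: this CPU is stuck. */
	if (hrtimer_interrupts_saved[this_cpu] == hrint)
		return true;

	hrtimer_interrupts_saved[this_cpu] = hrint;
	return false;
}
```

In this model, ticks on CPU 1 have no effect on what CPU 0 concludes
about itself, which is the sense in which Core 0 cannot find Core 1
locked up via this path.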

> b.
> Things get worse on a single-core system: if a hard lockup happens,
> we have no idea at all what happened.

Basically, without NMIs (or FIQs in ARM speak), lockups with IRQs off are
undetectable by the kernel other than "the system stopped responding".

In an SMP system, there are mechanisms by which other CPUs can detect a
locked-up CPU, and they can call trigger_all_cpu_backtrace() - but that
can only get a trace out of the locked-up CPU if it uses FIQs.  A CPU
which has locked up in an IRQs-off region won't be able to receive an
IRQ or IPI by definition.
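
A minimal userspace sketch of that situation, assuming a simple "each
CPU watches its neighbour" scheme: all names here (struct model_cpu,
the model_* functions, NR_CPUS) are hypothetical, and the model assumes
FIQs are usable and never masked, which will not hold on every system:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical core count */

struct model_cpu {
	unsigned long timer_ticks;	/* bumped by this CPU's timer IRQ */
	unsigned long buddy_seen;	/* buddy's tick count at last check */
	bool irqs_masked;		/* stuck in an IRQs-off region? */
};

static struct model_cpu cpus[NR_CPUS];

/* Each CPU watches the next one, wrapping around at the end. */
static int model_buddy_of(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (cpu + 1) % NR_CPUS;
}

/* Timer IRQ on 'cpu'; returns false if it could not be delivered. */
static bool model_timer_irq(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpus[cpu].irqs_masked)
		return false;
	cpus[cpu].timer_ticks++;
	return true;
}

/* Run on 'cpu': has its buddy taken a timer IRQ since the last check? */
static bool model_buddy_locked_up(int cpu)
{
	unsigned long ticks = cpus[model_buddy_of(cpu)].timer_ticks;

	if (cpus[cpu].buddy_seen == ticks)
		return true;
	cpus[cpu].buddy_seen = ticks;
	return false;
}

/*
 * Ask 'target' for a backtrace.  An IRQ-based IPI is lost on a CPU
 * with IRQs masked; a FIQ (assumed available here) still gets through.
 */
static bool model_request_backtrace(int target, bool use_fiq)
{
	assert(target >= 0 && target < NR_CPUS);
	return use_fiq || !cpus[target].irqs_masked;
}
```

Here another CPU can notice that its buddy has stopped making progress,
but asking the stuck CPU to report where it is only works in the FIQ
case.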

Work has been going on for the last 9 months to try and get a working
trigger_all_cpu_backtrace() implementation on ARM, initially with IRQs
and later with FIQs.

In previous merge windows, we have moved forward with getting some FIQ
changes merged, and in the next merge window, I have patches queued up
(available in linux-next) which add IRQ-based trigger_all_cpu_backtrace()
support.

The next piece of the puzzle is sorting out the patches which bring
FIQ-based trigger_all_cpu_backtrace() support - but even if we do, that
won't be available everywhere - for example, it won't be available if your
kernel runs in the non-secure world with a secure monitor, because FIQs
generally aren't usable in that world.

The only other alternative is a hardware JTAG debugger to inspect the
state of all CPUs in the system.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

Thread overview: 3+ messages
2015-07-29 16:03 question about detect hard lockups without NMIs using secondary cpus yoma sophian
2015-07-29 18:29 ` Russell King - ARM Linux [this message]
2015-07-30 16:20   ` yoma sophian
