From mboxrd@z Thu Jan 1 00:00:00 1970
From: akpm@linux-foundation.org (Andrew Morton)
Date: Mon, 14 Jan 2013 16:25:04 -0800
Subject: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus
In-Reply-To:
References: <1357941108-14138-1-git-send-email-ccross@android.com> <20130114154914.6d69eb27.akpm@linux-foundation.org>
Message-ID: <20130114162504.e667a4be.akpm@linux-foundation.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Mon, 14 Jan 2013 16:19:23 -0800 Colin Cross wrote:

> >> +static void watchdog_check_hardlockup_other_cpu(void)
> >> +{
> >> +	unsigned int next_cpu;
> >> +
> >> +	/*
> >> +	 * Test for hardlockups every 3 samples. The sample period is
> >> +	 * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> >> +	 * watchdog_thresh (over by 20%).
> >> +	 */
> >> +	if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> >> +		return;
> >
> > The hardwired interval Seems Wrong. watchdog_thresh is tunable at runtime.
> >
> > The comment could do with some fleshing out. *why* do we want to test
> > at an interval "slightly over watchdog_thresh"? What's going on here?
>
> I'll reword it. We don't want to be slightly over watchdog_thresh,
> ideally we would be exactly at watchdog_thresh. However, since this
> relies on the hrtimer interrupts that are scheduled at watchdog_thresh
> * 2 / 5, there is no multiple of hrtimer_interrupts that will result
> in watchdog_thresh. watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
> 1.2) is the closest I can get to testing for a hardlockup once every
> watchdog_thresh seconds.

It needs more than rewording, doesn't it? What happens if
watchdog_thresh is altered at runtime?
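
For reference, the arithmetic in the quoted comment can be written out as a
small userspace sketch. This is not kernel code and not part of the patch.
It only assumes what the comment states: the sample period is
watchdog_thresh * 2 / 5 seconds and the check runs on every 3rd sample. The
helper names sample_period_ms() and check_interval_ms() are hypothetical.

```c
/*
 * Userspace illustration only. Both helpers are hypothetical names,
 * not functions from kernel/watchdog.c.
 */

/* Sample period in milliseconds: watchdog_thresh * 2 / 5 seconds. */
static unsigned long sample_period_ms(unsigned int watchdog_thresh)
{
	return (unsigned long)watchdog_thresh * 1000UL * 2UL / 5UL;
}

/* Interval between hardlockup checks when testing every 3rd sample. */
static unsigned long check_interval_ms(unsigned int watchdog_thresh)
{
	return 3UL * sample_period_ms(watchdog_thresh);
}
```

With watchdog_thresh = 10 this gives a 4000 ms sample period and a 12000 ms
check interval, i.e. 1.2 * watchdog_thresh. The ratio is the same for any
threshold, but the sketch says nothing about what happens to samples already
counted under the old period when watchdog_thresh changes at runtime.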