All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu
@ 2014-08-28  4:52 Don Zickus
  2014-08-28 23:07 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Don Zickus @ 2014-08-28  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Ingo Molnar, chai wen, Don Zickus

From: chai wen <chaiw.fnst@cn.fujitsu.com>

For now, soft lockup detector warns once for each case of process softlockup.
But the thread 'watchdog/n' may not always get the cpu at the time slot between
the task switch of two processes hogging that cpu to reset soft_watchdog_warn.

An example would be two processes hogging the cpu.  Process A causes the
softlockup warning and is killed manually by a user.  Process B immediately
becomes the new process hogging the cpu preventing the softlockup code from
resetting the soft_watchdog_warn variable.

This case is a false negative of "warn only once for a process", as there may
be a different process that is going to hog the cpu.  Resolve this by
saving/checking the task pointer of the hogging process and use that to reset
soft_watchdog_warn too.

Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index c3319bd..499f65f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -47,6 +47,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
+static DEFINE_PER_CPU(struct task_struct *, softlockup_task_ptr_saved);
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
@@ -331,8 +332,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 			return HRTIMER_RESTART;
 
 		/* only warn once */
-		if (__this_cpu_read(soft_watchdog_warn) == true)
+		if (__this_cpu_read(soft_watchdog_warn) == true) {
+			/*
+			 * Handle the case where multiple processes are
+			 * causing softlockups but the duration is small
+			 * enough, the softlockup detector can not reset
+			 * itself in time.  Use task pointers to detect this.
+			 */
+			if (__this_cpu_read(softlockup_task_ptr_saved) !=
+			    current) {
+				__this_cpu_write(soft_watchdog_warn, false);
+				__touch_watchdog();
+			}
 			return HRTIMER_RESTART;
+		}
 
 		if (softlockup_all_cpu_backtrace) {
 			/* Prevent multiple soft-lockup reports if one cpu is already
@@ -348,6 +361,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
+		__this_cpu_write(softlockup_task_ptr_saved, current);
 		print_modules();
 		print_irqtrace_events(current);
 		if (regs)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
@ 2014-08-21  2:30 Don Zickus
  2014-08-21  5:42 ` [PATCH] " chai wen
  0 siblings, 1 reply; 9+ messages in thread
From: Don Zickus @ 2014-08-21  2:30 UTC (permalink / raw)
  To: Chai Wen; +Cc: Ingo Molnar, akpm, kvm, pbonzini, mingo, LKML

On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote:
> On 08/19/2014 09:36 AM, Chai Wen wrote:
> 
> > On 08/19/2014 04:38 AM, Don Zickus wrote:
> > 
> >> On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:
> >>>
> >>> * Don Zickus <dzickus@redhat.com> wrote:
> >>>
> >>>>>>> So I agree with the motivation of this improvement, but 
> >>>>>>> is this implementation namespace-safe?
> >>>>>>
> >>>>>> What namespace are you worried about colliding with?  I 
> >>>>>> thought softlockup_ would provide the safety??  Maybe I 
> >>>>>> am missing something obvious. :-(
> >>>>>
> >>>>> I meant PID namespaces - a PID in itself isn't guaranteed 
> >>>>> to be unique across the system.
> >>>>
> >>>> Ah, I don't think we thought about that.  Is there a better 
> >>>> way to do this?  Is there a domain id or something that can 
> >>>> be OR'd with the pid?
> >>>
> >>> What is always unique is the task pointer itself. We use pids 
> >>> when we interface with user-space - but we don't really do that 
> >>> here, right?
> >>
> >> No, I don't believe so.  Ok, so saving 'current' and comparing that should
> >> be enough, correct?
> >>
> > 
> > 
> > I am not sure of the safety about using pid here with namespace.
> > But as to the pointer of process, is there a chance that we got a 'historical'
> > address saved in the 'softlockup_warn_pid(or address)_saved' and the current
> > hogging process happened to get the same task pointer address?
> > If it never happens, I think the comparing of address is ok.
> > 
> 
> 
> Hi Ingo
> 
> what do you think of Don's solution- 'comparing of task pointer' ?
> Anyway this is just an additional check about some very special cases,
> so I think the issue that I am concerned above is not a problem at all.
> And after learning some concepts about PID namespace, I think comparing
> of task pointer is reliable dealing with PID namespace here.
> 
> And Don, If you want me to re-post this patch, please let me know that.

Sure, just quickly test with the task pointer to make sure it still works
and then re-post.

Cheers,
Don

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-08-29  1:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-28  4:52 [PATCH] softlockup: Make detector be aware of task switch of processes hogging cpu Don Zickus
2014-08-28 23:07 ` Andrew Morton
2014-08-29  1:27   ` Don Zickus
  -- strict thread matches above, loose matches on Subject: below --
2014-08-21  2:30 [PATCH 2/5] softlockup: make " Don Zickus
2014-08-21  5:42 ` [PATCH] " chai wen
2014-08-22  1:12   ` Chai Wen
2014-08-22  1:58   ` Don Zickus
2014-08-26 12:51     ` Chai Wen
2014-08-26 14:22       ` Don Zickus
2014-08-27  1:33         ` Chai Wen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.