All of lore.kernel.org
 help / color / mirror / Atom feed
From: Viresh Kumar <viresh.kumar@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
	alex.elder@linaro.org, johan@kernel.org
Subject: Re: [Query] Preemption (hogging) of the work handler
Date: Fri, 1 Jul 2016 10:28:54 -0700	[thread overview]
Message-ID: <20160701172854.GS12473@ubuntu> (raw)
In-Reply-To: <20160701172232.GD28719@htj.duckdns.org>

Thanks for the quick reply Tejun, really appreciate it.

On 01-07-16, 12:22, Tejun Heo wrote:
> Hello, Viresh.
> 
> On Fri, Jul 01, 2016 at 09:59:59AM -0700, Viresh Kumar wrote:
> > The system watchdog uses a delayed-work (1 second) for petting the
> > watchdog (resetting its counter) and if the work doesn't reset the
> > counters in time (another 1 second), the watchdog resets the system.
> > 
> > Petting-time: 1 second
> > Watchdog Reset-time: 2 seconds
> > 
> > The wq is allocated with:
> >         wdog_wq = alloc_workqueue("wdog", WQ_HIGHPRI, 0);
> 
> You probably want WQ_MEM_RECLAIM there to guarantee that it can run
> quickly under memory pressure.  In reality, this shouldn't matter too
> much as there aren't many competing highpri work items and thus it's
> likely that there are ready highpri workers waiting for work items.

Sure. I though we should also have WQ_UNBOUND here to let the handler
run on any CPU.

> > kernel: 3.10 (Yeah, you can rant me for that, but its not something I
> > can decide on :)
> 
> Android?

Yeah.

> > - But somehow, the timer isn't programmed for the right time.
> > 
> > - Something is happening between the time the work-handler starts
> >   running and we read jiffies from the add_timer() function which gets
> >   called from within the queue_delayed_work().
> > 
> > - For example, if the value of jiffies in the pet_watchdog_work()
> >   handler (before calling queue_delayed_work()) is say 1000000, then
> >   the value of jiffies after the call to queue_delayed_work() has
> >   returned becomes 1000310. i.e. it sometimes increases by a value of
> >   over 300, which is 1 second in our setup. I have seen this delta to
> >   vary from 50 to 350. If it crosses 300, the watchdog resets the
> >   system (as it was programmed for 2 seconds).
> 
> That's weird.  Once the work item starts executing, there isn't much
> which can delay it.  queue_delayed_work() doesn't even take any lock
> before reading jiffies.  In the failing cases, what's jiffies right
> before and after pet_watchdog_work()?  Can that take long?

Have verified that and that part isn't taking long even in the cases
where we reboot..

Sometimes (not always), I have read the jiffies around the
lock_irq_save() in queue_delayed_work() and the jiffies had a delta of
300 :)

> > So, we aren't able to queue the next timer in time and that causes all
> > these problems. I haven't concluded on why is that so..
> > 
> > Questions:
> > 
> > - I hope that the wq handler can be preempted, but can it be this bad?
> 
> It doesn't get preempted more than any other kthread w/ -20 nice
> value, so in most systems it shouldn't get preempted at all.

Hmm..

> > - Is it fine to use the wq-handler for petting the watchdog? Or should
> >   that only be done with help of interrupt-handlers?
> 
> It's absoultely fine but you'd prolly want WQ_HIGHPRI |
> WQ_MEM_RECLAIM.

Sure.

> > - Any other clues you can give which can help us figure out what's
> >   going on?
> > 
> > Thanks in advance and sorry to bother you :)
> 
> I'd watch sched TPs and see what actually is going on.  The described
> scenario should work completely fine.

Hmm, I will see if I can get those in..

-- 
viresh

  reply	other threads:[~2016-07-01 17:37 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 16:59 [Query] Preemption (hogging) of the work handler Viresh Kumar
2016-07-01 17:22 ` Tejun Heo
2016-07-01 17:28   ` Viresh Kumar [this message]
2016-07-06 18:28   ` Viresh Kumar
2016-07-06 19:23     ` Steven Rostedt
2016-07-06 19:25       ` Viresh Kumar
2016-07-11 10:26     ` Jan Kara
2016-07-11 15:44       ` Sergey Senozhatsky
2016-07-11 22:35         ` Viresh Kumar
2016-07-11 22:44           ` Rafael J. Wysocki
2016-07-11 22:46             ` Viresh Kumar
2016-07-12 12:24               ` Rafael J. Wysocki
2016-07-12 13:02                 ` Viresh Kumar
2016-07-12 13:56                   ` Petr Mladek
2016-07-12 14:04                     ` Viresh Kumar
2016-07-12  9:38           ` Sergey Senozhatsky
2016-07-12 12:52             ` Petr Mladek
2016-07-12 13:12               ` Viresh Kumar
2016-07-12 17:11                 ` Viresh Kumar
2016-07-12 19:59                   ` Rafael J. Wysocki
2016-07-12 20:08                     ` Viresh Kumar
2016-07-13  7:00                   ` Sergey Senozhatsky
2016-07-13 12:05                     ` Rafael J. Wysocki
2016-07-13 12:57                       ` Sergey Senozhatsky
2016-07-13 13:22                         ` Rafael J. Wysocki
2016-07-12 14:03               ` Sergey Senozhatsky
2016-07-12 14:12                 ` Viresh Kumar
2016-07-14 23:52                 ` Viresh Kumar
2016-07-15 13:11                   ` Sergey Senozhatsky
2016-07-15 15:57                     ` Viresh Kumar
2016-07-12 23:19           ` Viresh Kumar
2016-07-13  0:18             ` Viresh Kumar
2016-07-13  5:45             ` Sergey Senozhatsky
2016-07-13 15:39               ` Viresh Kumar
2016-07-13 23:08                 ` Rafael J. Wysocki
2016-07-13 23:18                   ` Viresh Kumar
2016-07-13 23:38                     ` Greg Kroah-Hartman
2016-07-14  0:55                 ` Sergey Senozhatsky
2016-07-14  1:09                   ` Rafael J. Wysocki
2016-07-14  1:32                     ` Sergey Senozhatsky
2016-07-14 21:57                       ` Viresh Kumar
2016-07-14 21:55                   ` Viresh Kumar
2016-07-14 14:12               ` Jan Kara
2016-07-14 14:33                 ` Rafael J. Wysocki
2016-07-14 14:39                   ` Jan Kara
2016-07-14 14:47                     ` Rafael J. Wysocki
2016-07-14 14:55                       ` Jan Kara
2016-07-14 22:14                         ` Viresh Kumar
2016-07-14 14:34                 ` Sergey Senozhatsky
2016-07-14 15:03                   ` Jan Kara
2016-07-14 22:12                 ` Viresh Kumar
2016-07-18 11:01                   ` Jan Kara
2016-07-18 11:49                     ` Rafael J. Wysocki
2016-07-29 20:42               ` Viresh Kumar
2016-07-30  2:12                 ` Sergey Senozhatsky
2016-07-11 19:03       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160701172854.GS12473@ubuntu \
    --to=viresh.kumar@linaro.org \
    --cc=alex.elder@linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=johan@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=vaibhav.hiremath@linaro.org \
    --cc=vlevenetz@mm-sol.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.