Re: [Query] Preemption (hogging) of the work handler

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Viresh Kumar <viresh.kumar@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
	alex.elder@linaro.org, johan@kernel.org,
	akpm@linux-foundation.org, jack@suse.cz, rostedt@goodmis.org
Subject: Re: [Query] Preemption (hogging) of the work handler
Date: Wed, 6 Jul 2016 11:28:42 -0700	[thread overview]
Message-ID: <20160706182842.GS2671@ubuntu> (raw)
In-Reply-To: <20160701172232.GD28719@htj.duckdns.org>

On 01-07-16, 12:22, Tejun Heo wrote:
> Hello, Viresh.
> 
> On Fri, Jul 01, 2016 at 09:59:59AM -0700, Viresh Kumar wrote:
> > The system watchdog uses a delayed-work (1 second) for petting the
> > watchdog (resetting its counter) and if the work doesn't reset the
> > counters in time (another 1 second), the watchdog resets the system.
> > 
> > Petting-time: 1 second
> > Watchdog Reset-time: 2 seconds
> > 
> > The wq is allocated with:
> >         wdog_wq = alloc_workqueue("wdog", WQ_HIGHPRI, 0);
> 
> You probably want WQ_MEM_RECLAIM there to guarantee that it can run
> quickly under memory pressure.  In reality, this shouldn't matter too
> much as there aren't many competing highpri work items and thus it's
> likely that there are ready highpri workers waiting for work items.
> 
> > The watchdog's work-handler looks like this:
> > 
> > static void pet_watchdog_work(struct work_struct *work)
> > {
> > 
> >         ...
> > 
> >         pet_watchdog(); //Reset its counters
> > 
> >         /* CONFIG_HZ=300, queuing for 1 second */
> >         queue_delayed_work(wdog_wq, &wdog_dwork, 300);
> > }
> > 
> > kernel: 3.10 (Yeah, you can rant me for that, but its not something I
> > can decide on :)
> 
> Android?
> 
> > Symptoms:
> > 
> > - The watchdog reboots the system sometimes. It is more reproducible
> >   in cases where an (out-of-tree) bus enumerated over USB is suddenly
> >   disconnected, which leads to removal of lots of kernel devices on
> >   that bus and a lot of print messages as well, due to failures for
> >   sending any more data for those devices..
> 
> I see.
> 
> > Observations:
> > 
> > I tried to get more into it and found this..
> > 
> > - The timer used by the delayed work fires at the time it was
> >   programmed for (checked timer->expires with value of jiffies) and
> >   the work-handler gets a chance to run and reset the counters pretty
> >   quickly after that.
> 
> Hmmm...
> 
> > - But somehow, the timer isn't programmed for the right time.
> > 
> > - Something is happening between the time the work-handler starts
> >   running and we read jiffies from the add_timer() function which gets
> >   called from within the queue_delayed_work().
> > 
> > - For example, if the value of jiffies in the pet_watchdog_work()
> >   handler (before calling queue_delayed_work()) is say 1000000, then
> >   the value of jiffies after the call to queue_delayed_work() has
> >   returned becomes 1000310. i.e. it sometimes increases by a value of
> >   over 300, which is 1 second in our setup. I have seen this delta to
> >   vary from 50 to 350. If it crosses 300, the watchdog resets the
> >   system (as it was programmed for 2 seconds).
> 
> That's weird.  Once the work item starts executing, there isn't much
> which can delay it.  queue_delayed_work() doesn't even take any lock
> before reading jiffies.  In the failing cases, what's jiffies right
> before and after pet_watchdog_work()?  Can that take long?
> 
> > So, we aren't able to queue the next timer in time and that causes all
> > these problems. I haven't concluded on why is that so..
> > 
> > Questions:
> > 
> > - I hope that the wq handler can be preempted, but can it be this bad?
> 
> It doesn't get preempted more than any other kthread w/ -20 nice
> value, so in most systems it shouldn't get preempted at all.
> 
> > - Is it fine to use the wq-handler for petting the watchdog? Or should
> >   that only be done with help of interrupt-handlers?
> 
> It's absoultely fine but you'd prolly want WQ_HIGHPRI |
> WQ_MEM_RECLAIM.
> 
> > - Any other clues you can give which can help us figure out what's
> >   going on?
> > 
> > Thanks in advance and sorry to bother you :)
> 
> I'd watch sched TPs and see what actually is going on.  The described
> scenario should work completely fine.

Okay, so I did trace what's going on the CPU for that long and I know
what it is now. The *print* messages :)

I enabled traces with '-e all' to look at everything happening on the
CPU.

Following is what starts in the middle of the delayed-work handler:

    kworker/0:1H-40    [000] d..1  2994.918766: console: [ 2994.918754] ***
    kworker/0:1H-40    [000] d..1  2994.923057: console: [ 2994.918817] ***
    kworker/0:1H-40    [000] d..1  2994.935639: console: [ 2994.918842] ***

... (more messages)

    kworker/0:1H-40    [000] d..1  2996.264372: console: [ 2994.943074] ***
    kworker/0:1H-40    [000] d..1  2996.268622: console: [ 2994.943410] ***
    kworker/0:1H-40    [000] d..1  2996.275050: ipi_entry: (Rescheduling interrupts)


(*** are some specific errors to our usecase and aren't relevant here)

These print messages continue from 2994.918 to 2996.268 (1.35 seconds)
and they hog the work-handler for that long, which results in watchdog
reboot in our setup. The 3.10 kernel implementation of the printk
looks like this (if I am not wrong):

        local_irq_save();
        flush-console-buffer(); //console_unlock()
        local_irq_restore();

So, the current CPU will try to print all the messages from the
buffer, before enabling the interrupts again on the local CPU and so I
don't see the hrtimer fire at all for almost a second.


I tried looking at if something related to this changed between 3.10
and mainline, and found few patches at least. One of the important
ones is:

commit 5874af2003b1 ("printk: enable interrupts before calling
console_trylock_for_printk()")

I wasn't able to backport it cleanly to 3.10 yet to see it makes thing
work better though. But it looks like it was targeting similar
problems.

@Jan Kara, Right ?

Any other useful piece of advice/information you guys have? I see Jan,
Andrew and Steven do a bunch of stuff in printk.c to make it better
and so Cc'ing them.

-- 
viresh

next prev parent reply	other threads:[~2016-07-06 18:28 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 16:59 [Query] Preemption (hogging) of the work handler Viresh Kumar
2016-07-01 17:22 ` Tejun Heo
2016-07-01 17:28   ` Viresh Kumar
2016-07-06 18:28   ` Viresh Kumar [this message]
2016-07-06 19:23     ` Steven Rostedt
2016-07-06 19:25       ` Viresh Kumar
2016-07-11 10:26     ` Jan Kara
2016-07-11 15:44       ` Sergey Senozhatsky
2016-07-11 22:35         ` Viresh Kumar
2016-07-11 22:44           ` Rafael J. Wysocki
2016-07-11 22:46             ` Viresh Kumar
2016-07-12 12:24               ` Rafael J. Wysocki
2016-07-12 13:02                 ` Viresh Kumar
2016-07-12 13:56                   ` Petr Mladek
2016-07-12 14:04                     ` Viresh Kumar
2016-07-12  9:38           ` Sergey Senozhatsky
2016-07-12 12:52             ` Petr Mladek
2016-07-12 13:12               ` Viresh Kumar
2016-07-12 17:11                 ` Viresh Kumar
2016-07-12 19:59                   ` Rafael J. Wysocki
2016-07-12 20:08                     ` Viresh Kumar
2016-07-13  7:00                   ` Sergey Senozhatsky
2016-07-13 12:05                     ` Rafael J. Wysocki
2016-07-13 12:57                       ` Sergey Senozhatsky
2016-07-13 13:22                         ` Rafael J. Wysocki
2016-07-12 14:03               ` Sergey Senozhatsky
2016-07-12 14:12                 ` Viresh Kumar
2016-07-14 23:52                 ` Viresh Kumar
2016-07-15 13:11                   ` Sergey Senozhatsky
2016-07-15 15:57                     ` Viresh Kumar
2016-07-12 23:19           ` Viresh Kumar
2016-07-13  0:18             ` Viresh Kumar
2016-07-13  5:45             ` Sergey Senozhatsky
2016-07-13 15:39               ` Viresh Kumar
2016-07-13 23:08                 ` Rafael J. Wysocki
2016-07-13 23:18                   ` Viresh Kumar
2016-07-13 23:38                     ` Greg Kroah-Hartman
2016-07-14  0:55                 ` Sergey Senozhatsky
2016-07-14  1:09                   ` Rafael J. Wysocki
2016-07-14  1:32                     ` Sergey Senozhatsky
2016-07-14 21:57                       ` Viresh Kumar
2016-07-14 21:55                   ` Viresh Kumar
2016-07-14 14:12               ` Jan Kara
2016-07-14 14:33                 ` Rafael J. Wysocki
2016-07-14 14:39                   ` Jan Kara
2016-07-14 14:47                     ` Rafael J. Wysocki
2016-07-14 14:55                       ` Jan Kara
2016-07-14 22:14                         ` Viresh Kumar
2016-07-14 14:34                 ` Sergey Senozhatsky
2016-07-14 15:03                   ` Jan Kara
2016-07-14 22:12                 ` Viresh Kumar
2016-07-18 11:01                   ` Jan Kara
2016-07-18 11:49                     ` Rafael J. Wysocki
2016-07-29 20:42               ` Viresh Kumar
2016-07-30  2:12                 ` Sergey Senozhatsky
2016-07-11 19:03       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160706182842.GS2671@ubuntu \
    --to=viresh.kumar@linaro.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.elder@linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=johan@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vaibhav.hiremath@linaro.org \
    --cc=vlevenetz@mm-sol.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox