All of lore.kernel.org
 help / color / mirror / Atom feed
From: Viresh Kumar <viresh.kumar@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
	alex.elder@linaro.org, johan@kernel.org
Subject: [Query] Preemption (hogging) of the work handler
Date: Fri, 1 Jul 2016 09:59:59 -0700	[thread overview]
Message-ID: <20160701165959.GR12473@ubuntu> (raw)

Hi Tejun,

we are stuck with a typical issue on our octa-core ARM platform and
wanted to make sure that we aren't abusing the workqueue API by using
it for the wrong usecase.

Setup:

The system watchdog uses a delayed-work (1 second) for petting the
watchdog (resetting its counter) and if the work doesn't reset the
counters in time (another 1 second), the watchdog resets the system.

Petting-time: 1 second
Watchdog Reset-time: 2 seconds

The wq is allocated with:
        wdog_wq = alloc_workqueue("wdog", WQ_HIGHPRI, 0);

The watchdog's work-handler looks like this:

static void pet_watchdog_work(struct work_struct *work)
{

        ...

        pet_watchdog(); //Reset its counters

        /* CONFIG_HZ=300, queuing for 1 second */
        queue_delayed_work(wdog_wq, &wdog_dwork, 300);
}

kernel: 3.10 (Yeah, you can rant me for that, but its not something I
can decide on :)

Symptoms:

- The watchdog reboots the system sometimes. It is more reproducible
  in cases where an (out-of-tree) bus enumerated over USB is suddenly
  disconnected, which leads to removal of lots of kernel devices on
  that bus and a lot of print messages as well, due to failures for
  sending any more data for those devices..

Observations:

I tried to get more into it and found this..

- The timer used by the delayed work fires at the time it was
  programmed for (checked timer->expires with value of jiffies) and
  the work-handler gets a chance to run and reset the counters pretty
  quickly after that.

- But somehow, the timer isn't programmed for the right time.

- Something is happening between the time the work-handler starts
  running and we read jiffies from the add_timer() function which gets
  called from within the queue_delayed_work().

- For example, if the value of jiffies in the pet_watchdog_work()
  handler (before calling queue_delayed_work()) is say 1000000, then
  the value of jiffies after the call to queue_delayed_work() has
  returned becomes 1000310. i.e. it sometimes increases by a value of
  over 300, which is 1 second in our setup. I have seen this delta to
  vary from 50 to 350. If it crosses 300, the watchdog resets the
  system (as it was programmed for 2 seconds).


So, we aren't able to queue the next timer in time and that causes all
these problems. I haven't concluded on why is that so..

Questions:

- I hope that the wq handler can be preempted, but can it be this bad?
- Is it fine to use the wq-handler for petting the watchdog? Or should
  that only be done with help of interrupt-handlers?
- Any other clues you can give which can help us figure out what's
  going on?

Thanks in advance and sorry to bother you :)

-- 
viresh

             reply	other threads:[~2016-07-01 17:00 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 16:59 Viresh Kumar [this message]
2016-07-01 17:22 ` [Query] Preemption (hogging) of the work handler Tejun Heo
2016-07-01 17:28   ` Viresh Kumar
2016-07-06 18:28   ` Viresh Kumar
2016-07-06 19:23     ` Steven Rostedt
2016-07-06 19:25       ` Viresh Kumar
2016-07-11 10:26     ` Jan Kara
2016-07-11 15:44       ` Sergey Senozhatsky
2016-07-11 22:35         ` Viresh Kumar
2016-07-11 22:44           ` Rafael J. Wysocki
2016-07-11 22:46             ` Viresh Kumar
2016-07-12 12:24               ` Rafael J. Wysocki
2016-07-12 13:02                 ` Viresh Kumar
2016-07-12 13:56                   ` Petr Mladek
2016-07-12 14:04                     ` Viresh Kumar
2016-07-12  9:38           ` Sergey Senozhatsky
2016-07-12 12:52             ` Petr Mladek
2016-07-12 13:12               ` Viresh Kumar
2016-07-12 17:11                 ` Viresh Kumar
2016-07-12 19:59                   ` Rafael J. Wysocki
2016-07-12 20:08                     ` Viresh Kumar
2016-07-13  7:00                   ` Sergey Senozhatsky
2016-07-13 12:05                     ` Rafael J. Wysocki
2016-07-13 12:57                       ` Sergey Senozhatsky
2016-07-13 13:22                         ` Rafael J. Wysocki
2016-07-12 14:03               ` Sergey Senozhatsky
2016-07-12 14:12                 ` Viresh Kumar
2016-07-14 23:52                 ` Viresh Kumar
2016-07-15 13:11                   ` Sergey Senozhatsky
2016-07-15 15:57                     ` Viresh Kumar
2016-07-12 23:19           ` Viresh Kumar
2016-07-13  0:18             ` Viresh Kumar
2016-07-13  5:45             ` Sergey Senozhatsky
2016-07-13 15:39               ` Viresh Kumar
2016-07-13 23:08                 ` Rafael J. Wysocki
2016-07-13 23:18                   ` Viresh Kumar
2016-07-13 23:38                     ` Greg Kroah-Hartman
2016-07-14  0:55                 ` Sergey Senozhatsky
2016-07-14  1:09                   ` Rafael J. Wysocki
2016-07-14  1:32                     ` Sergey Senozhatsky
2016-07-14 21:57                       ` Viresh Kumar
2016-07-14 21:55                   ` Viresh Kumar
2016-07-14 14:12               ` Jan Kara
2016-07-14 14:33                 ` Rafael J. Wysocki
2016-07-14 14:39                   ` Jan Kara
2016-07-14 14:47                     ` Rafael J. Wysocki
2016-07-14 14:55                       ` Jan Kara
2016-07-14 22:14                         ` Viresh Kumar
2016-07-14 14:34                 ` Sergey Senozhatsky
2016-07-14 15:03                   ` Jan Kara
2016-07-14 22:12                 ` Viresh Kumar
2016-07-18 11:01                   ` Jan Kara
2016-07-18 11:49                     ` Rafael J. Wysocki
2016-07-29 20:42               ` Viresh Kumar
2016-07-30  2:12                 ` Sergey Senozhatsky
2016-07-11 19:03       ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160701165959.GR12473@ubuntu \
    --to=viresh.kumar@linaro.org \
    --cc=alex.elder@linaro.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=johan@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=vaibhav.hiremath@linaro.org \
    --cc=vlevenetz@mm-sol.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.