public inbox for linux-kernel@vger.kernel.org
* [PATCHSET v3 wq/for-6.5] workqueue: Implement automatic CPU intensive detection and add monitoring
@ 2023-05-11 18:19 Tejun Heo
  2023-05-11 18:19 ` [PATCH 1/7] workqueue: Add pwq->stats[] and a monitoring script Tejun Heo
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Tejun Heo @ 2023-05-11 18:19 UTC (permalink / raw)
  To: jiangshanlai; +Cc: torvalds, peterz, linux-kernel, kernel-team

Hello,

v3: * Switched to hooking into scheduler_tick() instead of scheduling paths
      as suggested by Peter. It's less gnarly and works well in general;
      however, as the mechanism is now sampling based, there can be
      contrived cases where detection can be temporarily avoided. Also, it
      wouldn't work on nohz_full CPUs. Neither is critical especially given
      that common offenders are likely to be weeded out with the debug
      reporting over time.

    * As the above means that workqueue is no longer observing all
      scheduling events, it can't track the CPU time being consumed by the
      workers on its own and thus can't use global clocks (e.g. jiffies).
      The CPU time consumption tracking is still done with
      p->se.sum_exec_runtime.

    * The mechanism was incorrectly monitoring the entire CPU time a given
      work item has consumed instead of each stretch without intervening
      sleeps. Fixed.

    * CPU time monitoring is now tick sampling based. The previous
      p->se.sum_exec_runtime implementation was missing CPU time consumed
      between a work item's last scheduling event and its completion, so,
      e.g., work items that never schedule would always be accounted as
      zero CPU time. While the sampling-based implementation
      isn't very accurate, this is good enough for getting the overall
      picture of workqueues that consume a lot of CPU cycles.

    * Patches reordered so that the visibility one can be applied first.
      Documentation improved.

v2: * Lai pointed out that !SM_NONE cases should also be notified. 0001 and
      0004 are updated accordingly.

    * PeterZ suggested reporting on work items that trigger the auto CPU
      intensive mechanism. 0006 adds reporting of work functions that
      trigger the mechanism repeatedly with exponential backoff.
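
The accounting change described in the v3 notes above can be contrasted with a
toy model. This is purely illustrative Python, not kernel code; the 1ms tick
and the helper run_work() are assumptions made for the sketch:

```python
# Toy contrast of the two accounting schemes (illustrative only):
# snapshotting a runtime counter at scheduling events misses the
# stretch between the last schedule and completion, while per-tick
# sampling catches it, if only coarsely.
TICK_NS = 1_000_000  # assumed 1ms tick for the toy

def run_work(stretches_ns):
    """Run a work item as a series of CPU stretches. Each stretch ends
    with a scheduling event, except the last, which ends in completion."""
    snapshot_acct = 0  # old scheme: charge only at scheduling events
    sampled_acct = 0   # new scheme: charge one tick per elapsed tick
    for i, ns in enumerate(stretches_ns):
        sampled_acct += (ns // TICK_NS) * TICK_NS
        if i < len(stretches_ns) - 1:  # a real scheduling event happened
            snapshot_acct += ns
    return snapshot_acct, sampled_acct

# A work item that never schedules: 5ms of CPU, then completes.
snap, samp = run_work([5_000_000])
assert snap == 0           # old scheme: accounted as zero CPU time
assert samp == 5_000_000   # sampling attributes ~5ms
```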

To reduce the number of concurrent worker threads, workqueue holds back
starting per-cpu work items while the previous work item stays in the
RUNNING state. As such, a per-cpu work item which consumes a lot of CPU
cycles, even if it has cond_resched()'s in the right places, can stall other
per-cpu work items.
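
The holdback behavior above can be sketched with a toy model. This is
illustrative Python, not the kernel's implementation; PerCpuPool and its
methods are invented names for the sketch:

```python
from collections import deque

class PerCpuPool:
    """Toy per-cpu worker pool: pending work items are held back while
    the current item stays RUNNING; the next one starts only when the
    current item sleeps (or, with this patchset, is marked CPU intensive)."""
    def __init__(self):
        self.pending = deque()
        self.running = None

    def queue_work(self, work):
        self.pending.append(work)
        self._maybe_start_next()

    def _maybe_start_next(self):
        if self.running is None and self.pending:
            self.running = self.pending.popleft()

    def current_sleeps(self):
        # The running item blocked; concurrency management lets the
        # next pending item start.
        self.running = None
        self._maybe_start_next()

pool = PerCpuPool()
pool.queue_work("hog")
pool.queue_work("victim")
assert pool.running == "hog" and list(pool.pending) == ["victim"]
# As long as "hog" stays RUNNING (cond_resched() doesn't help),
# "victim" is stalled.
pool.current_sleeps()
assert pool.running == "victim"
```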

To support per-cpu work items that may occupy the CPU for a substantial
period of time, workqueue has the WQ_CPU_INTENSIVE flag which exempts work items
issued through the marked workqueue from concurrency management - they're
started immediately and don't block other work items. While this works, it's
error-prone in that a workqueue user can easily forget to set the flag or
set it unnecessarily. Furthermore, the impacts of the wrong flag setting can
be rather indirect and challenging to root-cause.

This patchset makes workqueue auto-detect CPU intensive work items based on
CPU consumption. If a work item consumes more than the threshold (10ms by
default) of CPU time, it's automatically marked as CPU intensive when it
gets scheduled out which unblocks starting of pending per-cpu work items.
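
The detection step can be sketched as follows. Again, a toy in Python rather
than kernel code; the 1ms tick is an assumption for the sketch, while the 10ms
threshold matches the default named above:

```python
CPU_INTENSIVE_THRESH_NS = 10_000_000  # 10ms default threshold
TICK_NS = 1_000_000                   # assumed 1ms tick for the toy

class Worker:
    """Toy worker: charges CPU time per tick and flags the current work
    item CPU intensive once the threshold is crossed."""
    def __init__(self):
        self.current_cputime_ns = 0
        self.cpu_intensive = False

    def on_work_start(self):
        # Each uninterrupted stretch is accounted separately (the v3 fix).
        self.current_cputime_ns = 0
        self.cpu_intensive = False

    def on_tick(self):
        # Called from the simulated scheduler tick while the item runs.
        self.current_cputime_ns += TICK_NS
        if (not self.cpu_intensive
                and self.current_cputime_ns >= CPU_INTENSIVE_THRESH_NS):
            # From here on, pending per-cpu work items may start.
            self.cpu_intensive = True

w = Worker()
w.on_work_start()
for _ in range(9):
    w.on_tick()
assert not w.cpu_intensive  # 9ms: below threshold
w.on_tick()
assert w.cpu_intensive      # 10ms: auto-marked CPU intensive
```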

The mechanism isn't foolproof in that the detection delays can add up if
many CPU-hogging work items are queued at the same time. However, in such
situations, the bigger problem likely is the CPU being saturated with
per-cpu work items and the solution would be making them UNBOUND. Future
changes will make UNBOUND workqueues more attractive by improving their
locality behaviors and configurability. We might eventually remove the
explicit WQ_CPU_INTENSIVE flag.

While at it, add statistics and a monitoring script. Lack of visibility has
always been a bit of a pain point when debugging workqueue-related issues, and
with this change and more drastic ones planned for workqueue, this is a good
time to address the shortcoming.
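
As a rough idea of what interval monitoring over cumulative counters looks
like, here is a toy delta sampler. The counter names are illustrative, not the
actual pwq->stats[] fields, and this is not the wq_monitor.py shipped in the
patchset:

```python
def deltas(prev, cur):
    """Per-interval increments between two snapshots of cumulative counters."""
    return {k: cur[k] - prev.get(k, 0) for k in cur}

prev = {"started": 100, "completed": 98, "cpu_intensive": 3}
cur  = {"started": 150, "completed": 149, "cpu_intensive": 7}
assert deltas(prev, cur) == {"started": 50, "completed": 51, "cpu_intensive": 4}
```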

This patchset was born out of the discussion in the following thread:

 https://lkml.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com

and contains the following patches:

 0001-workqueue-Add-pwq-stats-and-a-monitoring-script.patch
 0002-workqueue-Re-order-struct-worker-fields.patch
 0003-workqueue-Move-worker_set-clr_flags-upwards.patch
 0004-workqueue-Improve-locking-rule-description-for-worke.patch
 0005-workqueue-Automatically-mark-CPU-hogging-work-items-.patch
 0006-workqueue-Report-work-funcs-that-trigger-automatic-C.patch
 0007-workqueue-Track-and-monitor-per-workqueue-CPU-time-u.patch

and also available in the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git auto-cpu-intensive-v3

diffstat follows. Thanks.

 Documentation/core-api/workqueue.rst |   32 +++++
 kernel/sched/core.c                  |    3
 kernel/workqueue.c                   |  337 ++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 kernel/workqueue_internal.h          |   24 ++--
 lib/Kconfig.debug                    |   13 ++
 tools/workqueue/wq_monitor.py        |  169 ++++++++++++++++++++++++++++++
 6 files changed, 507 insertions(+), 71 deletions(-)

--
tejun


* [PATCHSET v4 wq/for-6.5] workqueue: Implement automatic CPU intensive detection and add monitoring
@ 2023-05-18  3:00 Tejun Heo
  2023-05-18  3:00 ` [PATCH 1/7] workqueue: Add pwq->stats[] and a monitoring script Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2023-05-18  3:00 UTC (permalink / raw)
  To: jiangshanlai; +Cc: torvalds, peterz, linux-kernel, kernel-team

Hello,

The only meaningful change is the use of printk_deferred() in 0006 which was
posted in the v3 thread. I'm applying this version to wq/for-6.5. Posting
for the record.

v4: * 0006-workqueue-Report-work-funcs-that-trigger-automatic-C.patch
      updated to use printk_deferred() instead of custom bouncing which was
      broken and didn't resolve the deadlock possibility anyway.

    * Documentation updates.

v3: * Switched to hooking into scheduler_tick() instead of scheduling paths
      as suggested by Peter. It's less gnarly and works well in general;
      however, as the mechanism is now sampling based, there can be
      contrived cases where detection can be temporarily avoided. Also, it
      wouldn't work on nohz_full CPUs. Neither is critical especially given
      that common offenders are likely to be weeded out with the debug
      reporting over time.

    * As the above means that workqueue is no longer observing all
      scheduling events, it can't track the CPU time being consumed by the
      workers on its own and thus can't use global clocks (e.g. jiffies).
      The CPU time consumption tracking is still done with
      p->se.sum_exec_runtime.

    * The mechanism was incorrectly monitoring the entire CPU time a given
      work item has consumed instead of each stretch without intervening
      sleeps. Fixed.

    * CPU time monitoring is now tick sampling based. The previous
      p->se.sum_exec_runtime implementation was missing CPU time consumed
      between a work item's last scheduling event and its completion, so,
      e.g., work items that never schedule would always be accounted as
      zero CPU time. While the sampling-based implementation
      isn't very accurate, this is good enough for getting the overall
      picture of workqueues that consume a lot of CPU cycles.

    * Patches reordered so that the visibility one can be applied first.
      Documentation improved.

v2: * Lai pointed out that !SM_NONE cases should also be notified. 0001 and
      0004 are updated accordingly.

    * PeterZ suggested reporting on work items that trigger the auto CPU
      intensive mechanism. 0006 adds reporting of work functions that
      trigger the mechanism repeatedly with exponential backoff.

To reduce the number of concurrent worker threads, workqueue holds back
starting per-cpu work items while the previous work item stays in the
RUNNING state. As such, a per-cpu work item which consumes a lot of CPU
cycles, even if it has cond_resched()'s in the right places, can stall other
per-cpu work items.

To support per-cpu work items that may occupy the CPU for a substantial
period of time, workqueue has the WQ_CPU_INTENSIVE flag which exempts work items
issued through the marked workqueue from concurrency management - they're
started immediately and don't block other work items. While this works, it's
error-prone in that a workqueue user can easily forget to set the flag or
set it unnecessarily. Furthermore, the impacts of the wrong flag setting can
be rather indirect and challenging to root-cause.

This patchset makes workqueue auto-detect CPU intensive work items based on
CPU consumption. If a work item consumes more than the threshold (10ms by
default) of CPU time, it's automatically marked as CPU intensive when it
gets scheduled out which unblocks starting of pending per-cpu work items.

The mechanism isn't foolproof in that the detection delays can add up if
many CPU-hogging work items are queued at the same time. However, in such
situations, the bigger problem likely is the CPU being saturated with
per-cpu work items and the solution would be making them UNBOUND. Future
changes will make UNBOUND workqueues more attractive by improving their
locality behaviors and configurability. We might eventually remove the
explicit WQ_CPU_INTENSIVE flag.

While at it, add statistics and a monitoring script. Lack of visibility has
always been a bit of a pain point when debugging workqueue-related issues, and
with this change and more drastic ones planned for workqueue, this is a good
time to address the shortcoming.

This patchset was born out of the discussion in the following thread:

 https://lkml.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@mail.gmail.com

and contains the following patches:

 0001-workqueue-Add-pwq-stats-and-a-monitoring-script.patch
 0002-workqueue-Re-order-struct-worker-fields.patch
 0003-workqueue-Move-worker_set-clr_flags-upwards.patch
 0004-workqueue-Improve-locking-rule-description-for-worke.patch
 0005-workqueue-Automatically-mark-CPU-hogging-work-items-.patch
 0006-workqueue-Report-work-funcs-that-trigger-automatic-C.patch
 0007-workqueue-Track-and-monitor-per-workqueue-CPU-time-u.patch

and also available in the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git auto-cpu-intensive-v4

diffstat follows. Thanks.

 Documentation/admin-guide/kernel-parameters.txt |   12 +
 Documentation/core-api/workqueue.rst            |   32 +++++
 kernel/sched/core.c                             |    3
 kernel/workqueue.c                              |  302 ++++++++++++++++++++++++++++++++++++++++----------
 kernel/workqueue_internal.h                     |   24 ++-
 lib/Kconfig.debug                               |   13 ++
 tools/workqueue/wq_monitor.py                   |  168 +++++++++++++++++++++++++++
 7 files changed, 483 insertions(+), 71 deletions(-)

--
tejun



end of thread, other threads:[~2023-07-25 21:52 UTC | newest]

Thread overview: 30+ messages
2023-05-11 18:19 [PATCHSET v3 wq/for-6.5] workqueue: Implement automatic CPU intensive detection and add monitoring Tejun Heo
2023-05-11 18:19 ` [PATCH 1/7] workqueue: Add pwq->stats[] and a monitoring script Tejun Heo
2023-05-11 18:19 ` [PATCH 2/7] workqueue: Re-order struct worker fields Tejun Heo
2023-05-11 18:19 ` [PATCH 3/7] workqueue: Move worker_set/clr_flags() upwards Tejun Heo
2023-05-11 18:19 ` [PATCH 4/7] workqueue: Improve locking rule description for worker fields Tejun Heo
2023-05-11 18:19 ` [PATCH 5/7] workqueue: Automatically mark CPU-hogging work items CPU_INTENSIVE Tejun Heo
2023-05-11 21:23   ` Peter Zijlstra
2023-05-11 22:47     ` Tejun Heo
2023-05-11 18:19 ` [PATCH 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism Tejun Heo
2023-05-11 21:26   ` Peter Zijlstra
2023-05-11 22:52     ` Tejun Heo
2023-05-12 19:42   ` [PATCH v2 " Tejun Heo
2023-07-11 13:55     ` Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism) Geert Uytterhoeven
2023-07-11 14:06       ` Geert Uytterhoeven
2023-07-11 21:39         ` Tejun Heo
2023-07-12  0:30           ` Tejun Heo
2023-07-12  9:57             ` Geert Uytterhoeven
2023-07-17 23:03               ` Tejun Heo
2023-07-18  9:54                 ` Geert Uytterhoeven
2023-07-18 22:01                   ` Tejun Heo
2023-07-25 14:46                     ` Geert Uytterhoeven
2023-07-25 21:52                       ` [PATCH wq/for-6.5-fixes] workqueue: Drop the special locking rule for worker->flags and worker_pool->flags Tejun Heo
2023-07-12  8:05           ` Consider switching to WQ_UNBOUND messages (was: Re: [PATCH v2 6/7] workqueue: Report work funcs that trigger automatic CPU_INTENSIVE mechanism) Peter Zijlstra
2023-07-12  9:04             ` Geert Uytterhoeven
2023-07-12 12:27               ` Peter Zijlstra
2023-07-13 18:53                 ` Tejun Heo
2023-05-11 18:19 ` [PATCH 7/7] workqueue: Track and monitor per-workqueue CPU time usage Tejun Heo
2023-05-11 21:11   ` Peter Zijlstra
2023-05-11 23:03     ` Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2023-05-18  3:00 [PATCHSET v4 wq/for-6.5] workqueue: Implement automatic CPU intensive detection and add monitoring Tejun Heo
2023-05-18  3:00 ` [PATCH 1/7] workqueue: Add pwq->stats[] and a monitoring script Tejun Heo
