public inbox for linux-kernel@vger.kernel.org
From: Tejun Heo <tj@kernel.org>
To: jiangshanlai@gmail.com
Cc: torvalds@linux-foundation.org, peterz@infradead.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	Tejun Heo <tj@kernel.org>
Subject: [PATCH 7/7] workqueue: Track and monitor per-workqueue CPU time usage
Date: Thu, 11 May 2023 08:19:31 -1000
Message-ID: <20230511181931.869812-8-tj@kernel.org>
In-Reply-To: <20230511181931.869812-1-tj@kernel.org>

Now that wq_worker_tick() exists, we can track the approximate CPU time
consumption of each workqueue by charging a whole tick (TICK_USEC) to the
workqueue whenever a tick hits a worker that is executing one of its work
items. While not precise, this provides reasonable visibility into which
workqueues consume a lot of CPU cycles.

wq_monitor.py is updated to report the per-workqueue CPU times.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
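For illustration only, a minimal Python sketch of the accounting described
above. It assumes HZ=250 (so a tick is 4000 usec); charge_ticks(),
format_cpu_time() and the sample data are hypothetical names made up for
this note. Only the per-tick charge of TICK_USEC and the usec-to-seconds
division mirror what the patch does.

```python
from collections import defaultdict

HZ = 250  # assumption: tick rate; the real value depends on the kernel config
USEC_PER_SEC = 1_000_000
TICK_USEC = (USEC_PER_SEC + HZ // 2) // HZ  # usec charged per tick, rounded


def charge_ticks(tick_samples):
    """Charge one whole tick to the workqueue that was running at each tick.

    tick_samples is an iterable of workqueue names, with None standing in
    for ticks that did not hit a worker executing a work item.
    """
    cpu_time_usec = defaultdict(int)
    for wq_name in tick_samples:
        if wq_name is None:
            # Mirrors the early return when there is no current pwq.
            continue
        cpu_time_usec[wq_name] += TICK_USEC
    return dict(cpu_time_usec)


def format_cpu_time(usec):
    """Render accumulated microseconds as seconds, as the CPUtime column does."""
    return f'{usec / 1000000:8.1f}'


# Hypothetical tick samples, not measured data.
samples = ['events'] * 1525 + [None] * 100 + ['events_unbound'] * 25
totals = charge_ticks(samples)

for name, usec in totals.items():
    print(f'{name:24} {format_cpu_time(usec)}')
```

Because a whole tick is charged no matter how much of it the work item
actually used, short work items can be over- or under-counted, which is why
the result is only a ballpark figure.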
 Documentation/core-api/workqueue.rst | 38 ++++++++++++++--------------
 kernel/workqueue.c                   |  3 +++
 tools/workqueue/wq_monitor.py        |  9 ++++++-
 3 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/Documentation/core-api/workqueue.rst b/Documentation/core-api/workqueue.rst
index a389f31b025c..a4c9b9d1905f 100644
--- a/Documentation/core-api/workqueue.rst
+++ b/Documentation/core-api/workqueue.rst
@@ -354,25 +354,25 @@ Monitoring
 Use tools/workqueue/wq_monitor.py to monitor workqueue operations: ::
 
   $ tools/workqueue/wq_monitor.py events
-                              total  infl CPUitsv  CMwake  mayday rescued
-  events                      18545     0       0       5       -       -
-  events_highpri                  8     0       0       0       -       -
-  events_long                     3     0       0       0       -       -
-  events_unbound              38306     0       -       -       -       -
-  events_freezable                0     0       0       0       -       -
-  events_power_efficient      29598     0       0       0       -       -
-  events_freezable_power_        10     0       0       0       -       -
-  sock_diag_events                0     0       0       0       -       -
-
-                              total  infl CPUitsv  CMwake  mayday rescued
-  events                      18548     0       0       5       -       -
-  events_highpri                  8     0       0       0       -       -
-  events_long                     3     0       0       0       -       -
-  events_unbound              38322     0       -       -       -       -
-  events_freezable                0     0       0       0       -       -
-  events_power_efficient      29603     0       0       0       -       -
-  events_freezable_power_        10     0       0       0       -       -
-  sock_diag_events                0     0       0       0       -       -
+                              total  infl  CPUtime CPUitsv  CMwake  mayday rescued
+  events                      18545     0      6.1       0       5       -       -
+  events_highpri                  8     0      0.0       0       0       -       -
+  events_long                     3     0      0.0       0       0       -       -
+  events_unbound              38306     0      0.1       -       -       -       -
+  events_freezable                0     0      0.0       0       0       -       -
+  events_power_efficient      29598     0      0.2       0       0       -       -
+  events_freezable_power_        10     0      0.0       0       0       -       -
+  sock_diag_events                0     0      0.0       0       0       -       -
+
+                              total  infl  CPUtime CPUitsv  CMwake  mayday rescued
+  events                      18548     0      6.1       0       5       -       -
+  events_highpri                  8     0      0.0       0       0       -       -
+  events_long                     3     0      0.0       0       0       -       -
+  events_unbound              38322     0      0.1       -       -       -       -
+  events_freezable                0     0      0.0       0       0       -       -
+  events_power_efficient      29603     0      0.2       0       0       -       -
+  events_freezable_power_        10     0      0.0       0       0       -       -
+  sock_diag_events                0     0      0.0       0       0       -       -
 
   ...
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b85d70f133f7..42c41d5368dc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -212,6 +212,7 @@ struct worker_pool {
 enum pool_workqueue_stats {
 	PWQ_STAT_STARTED,	/* work items started execution */
 	PWQ_STAT_COMPLETED,	/* work items completed execution */
+	PWQ_STAT_CPU_TIME,	/* total CPU time consumed */
 	PWQ_STAT_CPU_INTENSIVE,	/* wq_cpu_intensive_thresh_us violations */
 	PWQ_STAT_CM_WAKEUP,	/* concurrency-management worker wakeups */
 	PWQ_STAT_MAYDAY,	/* maydays to rescuer */
@@ -1170,6 +1171,8 @@ void wq_worker_tick(struct task_struct *task)
 	if (!pwq)
 		return;
 
+	pwq->stats[PWQ_STAT_CPU_TIME] += TICK_USEC;
+
 	/*
 	 * If the current worker is concurrency managed and hogged the CPU for
 	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
diff --git a/tools/workqueue/wq_monitor.py b/tools/workqueue/wq_monitor.py
index 5dc0b61b36e8..3c35239d50c4 100644
--- a/tools/workqueue/wq_monitor.py
+++ b/tools/workqueue/wq_monitor.py
@@ -12,6 +12,10 @@ https://github.com/osandov/drgn.
 
   infl     The number of currently in-flight work items.
 
+  CPUtime  Total CPU time consumed by the workqueue in seconds. This is
+           sampled from scheduler ticks and only provides ballpark
+           measurement. "nohz_full=" CPUs are excluded from measurement.
+
   CPUitsv  The number of times a concurrency-managed work item hogged CPU
            longer than the threshold (workqueue.cpu_intensive_thresh_us)
            and got excluded from concurrency management to avoid stalling
@@ -59,6 +63,7 @@ WQ_MEM_RECLAIM          = prog['WQ_MEM_RECLAIM']
 
 PWQ_STAT_STARTED        = prog['PWQ_STAT_STARTED']      # work items started execution
 PWQ_STAT_COMPLETED      = prog['PWQ_STAT_COMPLETED']	# work items completed execution
+PWQ_STAT_CPU_TIME       = prog['PWQ_STAT_CPU_TIME']     # total CPU time consumed
 PWQ_STAT_CPU_INTENSIVE  = prog['PWQ_STAT_CPU_INTENSIVE'] # wq_cpu_intensive_thresh_us violations
 PWQ_STAT_CM_WAKEUP      = prog['PWQ_STAT_CM_WAKEUP']    # concurrency-management worker wakeups
 PWQ_STAT_MAYDAY         = prog['PWQ_STAT_MAYDAY']	# maydays to rescuer
@@ -82,13 +87,14 @@ PWQ_NR_STATS            = prog['PWQ_NR_STATS']
                  'mem_reclaim'          : self.mem_reclaim,
                  'started'              : self.stats[PWQ_STAT_STARTED],
                  'completed'            : self.stats[PWQ_STAT_COMPLETED],
+                 'cputime'              : self.stats[PWQ_STAT_CPU_TIME],
                  'cpu_intensive'        : self.stats[PWQ_STAT_CPU_INTENSIVE],
                  'cm_wakeup'            : self.stats[PWQ_STAT_CM_WAKEUP],
                  'mayday'               : self.stats[PWQ_STAT_MAYDAY],
                  'rescued'              : self.stats[PWQ_STAT_RESCUED], }
 
     def table_header_str():
-        return f'{"":>24} {"total":>8} {"infl":>5} {"CPUitsv":>7} '\
+        return f'{"":>24} {"total":>8} {"infl":>5} {"CPUtime":>8} {"CPUitsv":>7} '\
             f'{"CMwake":>7} {"mayday":>7} {"rescued":>7}'
 
     def table_row_str(self):
@@ -108,6 +114,7 @@ PWQ_NR_STATS            = prog['PWQ_NR_STATS']
         out = f'{self.name[-24:]:24} ' \
               f'{self.stats[PWQ_STAT_STARTED]:8} ' \
               f'{max(self.stats[PWQ_STAT_STARTED] - self.stats[PWQ_STAT_COMPLETED], 0):5} ' \
+              f'{self.stats[PWQ_STAT_CPU_TIME] / 1000000:8.1f} ' \
               f'{cpu_intensive:>7} ' \
               f'{cm_wakeup:>7} ' \
               f'{mayday:>7} ' \
-- 
2.40.1

