linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>,
	mingo@elte.hu, kosaki.motohiro@jp.fujitsu.com,
	rostedt@goodmis.org, tzanussi@gmail.com,
	linux-kernel@vger.kernel.org, oleg@redhat.com
Subject: Re: [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing
Date: Sat, 25 Apr 2009 00:59:10 +0200	[thread overview]
Message-ID: <20090424225909.GA6658@nowhere> (raw)
In-Reply-To: <20090424130616.a3c217cb.akpm@linux-foundation.org>

On Fri, Apr 24, 2009 at 01:06:16PM -0700, Andrew Morton wrote:
> On Fri, 24 Apr 2009 19:42:19 +0800
> Zhaolei <zhaolei@cn.fujitsu.com> wrote:
> 
> > These patchs add tracepoints for per-worklet tracing.
> > Now we have enough tracepoints to start makeing trace_workqueue.c support
> > worklet time mesurement.
> 
> I'm not seing anywhere in this patchset a description of the user
> interface.  What does the operator/programmer actually get to see from
> the kernel as a result of these changes?
> 
> A complete usage example would be an appropriate way of communicating
> all of this.
> 
> The patches introduce a moderate amount of tracing-specific hooks into
> the core workqueue code, which inevitably increases the maintenance
> load for that code.  It is important that it be demonstrated that the
> benefts of the code are worth that cost.  Hence it is important that
> these benefits be demonstrated to us, by yourself.  Please.
> 
> Another way of looking at it: which previously-encountered problems
> would this facility have helped us to solve?  How will this facility
> help us to solve problems in the future?  Looking at this patch series
> I cannot answer those questions!
> 


Hi Andrew,

Although I'm not the author of this patchset, I'm somewhat
involved in the workqueue tracer and I would like to express
my opinion on what is happening.

Until recently, the workqueue tracer was a basic machine.
It was designed to trace the workqueue level. We were not yet
thinking about the worklet level:

- creation of a workqueue thread
- cleanup/destruction of these threads
- insertion of a work in a workqueue
- execution of this work

The output looked like such following histogram:

#   CPU   INSERTED  EXECUTED   WORKQUEUE THREAD
#
      1    125        125       reiserfs/1
      1      0          0       scsi_tgtd/1
      1      0          0       aio/1
      1      0          0       ata/1
      1    114        114       kblockd/1
      1      0          0       kintegrityd/1
      1   2147       2147       events/1
    
      0      0          0       kpsmoused
      0    105        105       reiserfs/0
      0      0          0       scsi_tgtd/0
      0      0          0       aio/0
      0      0          0       ata_aux
      0      0          0       ata/0
      0      0          0       cqueue
      0      0          0       kacpi_notify
      0      0          0       kacpid
      0    149        149       kblockd/0
      0      0          0       kintegrityd/0
      0   1000       1000       khelper
      0   2270       2270       events/0


Its purpose and the information it gave was limited, though
somehow useful.
It was able to provide some informations about the frequency
of the works inserted.

Why that? Because it pushed some useful informations to help a
kernel developer when it's time to decide how to defer async jobs.
Should he use the traditional and whole purposed events workqueues
or a devoted workqueue?

The decision has to be made relatively to:

- the workload it induces. If we might insert too much
  works at a time, we can create unacceptable latencies
  that may delay other queued works. Events/%d are such
  fragile pieces.

- The workload is not only about work insertion frequency
  but also about how much time is spent to process a work.
  Could it be trying to perform too much things which involve
  too much cpu time? Could it sleep for too much time, waiting
  for an event, a very contended lock or something?

- Another kind of reasons that may encumber the events and
  create bad latencies.

After reading this list of requirements, I guess you noticed
that only capturing the frequency of inserted events is only
one variable that composes the workload recipe.
Not only is it giving incomplete information, but it is also
providing erroneous datas to the user which might focuse
on these frequencies when they become large whereas a single work
can create more burden than hundreds of others.


Then, Zhaolei and Kosaki Motohiro came up recently with a good
update for this tracer by posting a patchset which integrates
the per-worklet tracing dimension.
And the output on my laptop became the following:

$ cat /debug/tracing/trace_stat/workqueues 
# CPU  INSERTED  EXECUTED   TASKNAME:PID
# |      |         |        `-WORKFUNC
# |      |         |          |
  0   3822       3898       events/0:9
  0      1          1        |-hpet_work+0x0/0x340
  0     47         46        |-rt_worker_func+0x0/0x400
  0   2655       2654        |-vmstat_update+0x0/0x40
  0      1          1        |-power_supply_changed_work+0x0/0x60
  0    212        212        |-fb_flashcursor+0x0/0x3c0
  0     80         80        |-test_work+0x0/0x64
  0     10          9        |-rekey_seq_generator+0x0/0x70
  0      8          7        |-lru_add_drain_per_cpu+0x0/0x10
  0    455        455        |-console_callback+0x0/0x1e0
  0    249        249        |-flush_to_ldisc+0x0/0x2f0
  0      1          1        |-atkbd_event_work+0x0/0xb0
  0    103         96        `-dst_gc_task+0x0/0x340
  0      0          0       cpuset:11
  0   1378       1378       khelper:12
  0   1378       1378        `-__call_usermodehelper+0x0/0xe0
  0      0          0       kintegrityd/0:178
  0    333        333       kblockd/0:184
  0    233        233        |-blk_unplug_work+0x0/0x110
  0    100        100        `-cfq_kick_queue+0x0/0x50
  0      0          0       kacpid:190
  0      0          0       kacpi_notify:191
  0      0          0       tifm:498
  0   8048       8048       ata/0:507
  0   8048       8048        `-ata_pio_task+0x0/0x280
  0      0          0       ata_aux:511
  0      0          0       aio/0:871
  0      0          0       crypto/0:897
  0      0          0       scsi_tgtd/0:2368
  0      0          0       iscsi_eh:2387
  0      0          0       kpsmoused:2521
  0      0          0       hd-audio0:2660
  0      0          0       krxrpcd/0:2745
  0     59         59       reiserfs/0:2895
  0     59         59        `-flush_async_commits+0x0/0x90
  1   5480       5503       events/1:10
  1      1          1        |-hpet_work+0x0/0x340
  1   2819       2818        |-vmstat_update+0x0/0x40
  1    464        464        |-console_callback+0x0/0x1e0
  1     80         80        |-test_work+0x0/0x64
  1      1          1        |-sysfs_add_func+0x0/0xb0
  1      8          8        |-lru_add_drain_per_cpu+0x0/0x10
  1      1          1        |-sync_cmos_clock+0x0/0x140
  1   1823       1823        |-fb_flashcursor+0x0/0x3c0
  1    282        282        |-flush_to_ldisc+0x0/0x2f0
  1      1          1        `-atkbd_event_work+0x0/0xb0
  1      0          0       kintegrityd/1:181
  1    254        254       kblockd/1:185
  1    148        148        |-blk_unplug_work+0x0/0x110
  1    106        106        `-cfq_kick_queue+0x0/0x50
  1  10988      10988       ata/1:509
  1  10988      10988        `-ata_pio_task+0x0/0x280
  1      0          0       aio/1:878
  1      0          0       crypto/1:898
  1      0          0       scsi_tgtd/1:2370
  1      0          0       krxrpcd/1:2746
  1     25         25       reiserfs/1:2896
  1     25         25        `-flush_async_commits+0x0/0x90


The trace has evolved by decomposing these frequency stats
per-worklet. Not only does it help for a driver writer to
instrument its own async jobs, but also it can help
any developer to find the culprit source of a latency
issued from a workqueue.

It can at least show a first direction of investigation, because
we are still only talking about frequency.
And speaking about incomplete informations, this output is only
a step before the main goal which is to provide the average and the
maximum time of execution for each worklet.


# CPU  INSERTED  EXECUTED  AVG TIME  MAX TIME	 TASKNAME:PID
# |      |         |          |         |        `-WORKFUNC
# |      |         |          |         |          |
  0   3822       3898        100 us    200us       events/0:9
  0      1          1        12 us     15us        |-hpet_work+0x0/0x340
  0     47         46        10 us     45us        |-rt_worker_func+0x0/0x400


With that we'll have much more indications to start a latency investigation.
But to get this end result, we needed the changes done on this latest patchset
from Zhaolei and Kosaki which, in the middle of other useful changes, turns
the workqueue_execution event into a new dichotomous capture with:

- trace_work_execute
- trace_work_complete

It was a required piece to perform the time of execution measurement.

Also, as I said, the statistical/histogram output is only the first step
of any worklet instrumentation. Once we focuse in a specific culprit,
we can trace all of its lifecycle using the trace events:
insertion, execution, completion, delayed insertion, cancelling, and so on.

All these events provide all the required fine grained instrumentation
to learn any worklet lifecycle details. Combining with well chosen events
toggling and event's fields filtering, we can focus on the only relevant
events at the time. 

Example of such trace:

# cd /debug/tracing

# ls events/workqueue/
filter          worklet_complete  worklet_enqueue_delayed  workqueue_creation     workqueue_flush
worklet_cancel  worklet_enqueue   worklet_execute          workqueue_destruction

# for e in $(ls events/workqueue); do echo 1 > events/workqueue/$e/enable; done

# tail -f trace

 events/1-10    [001]  4687.187314: worklet_execute: thread=events/1:10 work=ffff880001b27200 func=vmstat_update+0x0/0x40
 events/1-10    [001]  4687.187330: worklet_enqueue_delayed: thread=events/1:10 func=vmstat_update+0x0/0x40 cpu=-1 delay=250
 events/1-10    [001]  4687.187331: worklet_complete: thread=events/1:10 work=ffff880001b27200
 events/0-9     [000]  4687.250236: worklet_execute: thread=events/0:9 work=ffff880001b0c200 func=vmstat_update+0x0/0x40
 events/0-9     [000]  4687.250250: worklet_enqueue_delayed: thread=events/0:9 func=vmstat_update+0x0/0x40 cpu=-1 delay=237
 events/0-9     [000]  4687.250251: worklet_complete: thread=events/0:9 work=ffff880001b0c200

If more details are needed, such events could fit nicely inside with a
function graph tracer output.

So this latest patchset provides all these required informations on the events
tracing level. The next step is likely to be on the stat tracing to provide the avg/max time
of execution.


Thanks,
Frederic.


  reply	other threads:[~2009-04-24 22:59 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-13  0:53 [PATCH 0/1] tracing, workqueuetrace: Make workqueue tracepoints use TRACE_EVENT macro Zhaolei
2009-04-13  0:54 ` [PATCH 1/1] " Zhaolei
2009-04-13  1:44   ` KOSAKI Motohiro
2009-04-13  3:25     ` KOSAKI Motohiro
2009-04-13  3:49       ` Zhaolei
2009-04-13  3:57         ` Ingo Molnar
2009-04-13  4:07           ` Re: [PATCH 1/1] tracing, workqueuetrace: Make workqueuetracepoints " Zhaolei
2009-04-13  3:57         ` [PATCH 1/1] tracing, workqueuetrace: Make workqueue tracepoints " KOSAKI Motohiro
2009-04-13  5:52           ` [PATCH v2 1/4] ftrace, " KOSAKI Motohiro
2009-04-13  5:53             ` [PATCH v2 2/4] ftrace: introduce workqueue_handler_exit tracepoint and rename workqueue_execution to workqueue_handler_entry KOSAKI Motohiro
2009-04-13 16:25               ` Frederic Weisbecker
2009-04-15 10:22                 ` Oleg Nesterov
2009-04-15 11:09                   ` Ingo Molnar
2009-04-15 11:33                     ` Oleg Nesterov
2009-04-15 11:45                       ` Ingo Molnar
2009-04-15 12:05                         ` Oleg Nesterov
2009-04-13  5:53             ` [PATCH v2 3/4] ftrace: add max execution time mesurement to workqueue tracer KOSAKI Motohiro
2009-04-13 16:16               ` Frederic Weisbecker
2009-04-13 21:21                 ` Ingo Molnar
2009-04-14  1:43                 ` KOSAKI Motohiro
2009-04-14 11:40                   ` Frederic Weisbecker
2009-04-15  0:31                     ` KOSAKI Motohiro
2009-04-13  5:55             ` [PATCH v2 4/4] ftrace: add latecy mesurement feature " KOSAKI Motohiro
2009-04-14  2:50               ` KOSAKI Motohiro
2009-04-13 15:24             ` [PATCH v2 1/4] ftrace, workqueuetrace: Make workqueue tracepoints use TRACE_EVENT macro Frederic Weisbecker
2009-04-14  4:03             ` [PATCH v2 5/4] ftrace, workqueuetrace: display work name KOSAKI Motohiro
2009-04-14 21:16               ` Frederic Weisbecker
2009-04-14 23:55                 ` KOSAKI Motohiro
2009-04-15  1:15                   ` Frederic Weisbecker
2009-04-15  6:13                     ` KOSAKI Motohiro
2009-04-15  6:17                       ` Zhaolei
2009-04-15  9:44                       ` Ingo Molnar
2009-04-15 16:23                       ` Frederic Weisbecker
2009-04-17  6:56                       ` [PATCH v3 0/1] ftrace, workqueuetrace: Make workqueue tracepoints use TRACE_EVENT macro Zhaolei
2009-04-17  7:15                         ` [PATCH v3 1/1] " Zhaolei
2009-04-17 13:45                           ` Ingo Molnar
2009-04-20  1:30                             ` [PATCH v3 1/1] ftrace, workqueuetrace: Make workqueuetracepoints " Zhaolei
2009-04-20  1:38                               ` KOSAKI Motohiro
2009-04-20  1:43                                 ` Zhaolei
2009-04-20  1:49                                   ` KOSAKI Motohiro
2009-04-20  8:46                                     ` Ingo Molnar
2009-04-20 22:25                                       ` Oleg Nesterov
2009-04-20 23:48                                         ` Frederic Weisbecker
2009-04-21 15:28                                           ` Oleg Nesterov
2009-04-21 15:50                                             ` Oleg Nesterov
2009-04-21 18:33                                               ` Frederic Weisbecker
2009-04-21 18:28                                             ` Frederic Weisbecker
2009-04-21 19:37                                               ` Oleg Nesterov
2009-04-24 11:42                             ` [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing Zhaolei
2009-04-24 11:43                               ` [PATCH 1/4] workqueue_tracepoint: introduce workqueue_handler_exit tracepoint and rename workqueue_execution to workqueue_handler_entry Zhaolei
2009-04-24 11:45                               ` [PATCH 2/4] workqueue_tracepoint: Add workqueue_flush and worklet_cancel tracepoint Zhaolei
2009-04-24 11:45                               ` [PATCH 3/4] workqueue_tracepoint: Change tracepoint name to fit worklet and workqueue lifecycle Zhaolei
2009-04-24 11:46                               ` [PATCH 4/4] workqueue_trace: Separate worklet_insertion into worklet_enqueue and worklet_enqueue_delayed Zhaolei
2009-04-24 20:06                               ` [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for worklet lifecycle tracing Andrew Morton
2009-04-24 22:59                                 ` Frederic Weisbecker [this message]
2009-04-24 23:20                                   ` Andrew Morton
2009-04-25  0:37                                     ` Frederic Weisbecker
2009-04-25  1:28                                       ` Andrew Morton
2009-04-25  2:00                                         ` Steven Rostedt
2009-04-25  2:24                                           ` Andrew Morton
2009-04-25  2:51                                             ` Steven Rostedt
2009-04-25  3:10                                               ` Andrew Morton
2009-04-25  3:32                                                 ` Steven Rostedt
2009-04-26 10:47                                         ` Ingo Molnar
2009-04-27  5:44                                           ` Andrew Morton
2009-04-27 15:02                                             ` Oleg Nesterov
2009-04-27 15:43                                               ` Ingo Molnar
2009-04-27 19:09                                                 ` Oleg Nesterov
2009-04-28 13:42                                               ` Frederic Weisbecker
2009-04-28 16:43                                                 ` Oleg Nesterov
2009-04-28 17:49                                                   ` Frederic Weisbecker
2009-04-24 23:27                                   ` Frederic Weisbecker
2009-04-28 17:24                                 ` Frank Ch. Eigler
2009-04-28 18:48                                   ` Andrew Morton
2009-04-28 20:51                                     ` Frank Ch. Eigler
2009-04-29  4:03                                     ` KOSAKI Motohiro
2009-04-29  4:29                                       ` Andrew Morton
2009-04-29  5:21                                         ` KOSAKI Motohiro
2009-04-20 17:11                           ` [PATCH v3 1/1] ftrace, workqueuetrace: Make workqueue tracepoints use TRACE_EVENT macro Frederic Weisbecker
2009-04-21  1:20                             ` KOSAKI Motohiro
2009-04-20  6:42                       ` [PATCH 0/4] ftrace, workqueuetrace: Add worklet informationame Zhaolei
2009-04-20  6:58                         ` [PATCH 1/4] trace_workqueue: use list_for_each_entry() instead of list_for_each_entry_safe() Zhaolei
2009-04-20  7:26                           ` Frederic Weisbecker
2009-04-20  6:59                         ` [PATCH 2/4] trace_workqueue: Remove cpu_workqueue_stats->first_entry Zhaolei
2009-04-20  7:02                         ` [PATCH 3/4] trace_workqueue: Remove blank line between each cpu Zhaolei
2009-04-20  7:09                         ` [PATCH 4/4] trace_workqueue: Add worklet information Zhaolei
2009-04-20 11:36                           ` Frederic Weisbecker
2009-04-21  1:57                             ` Zhaolei
2009-04-21 23:39                               ` Frederic Weisbecker
2009-04-20  7:23                         ` [PATCH 0/4] ftrace, workqueuetrace: Add worklet informationame Frederic Weisbecker
2009-04-13 14:34     ` [PATCH 1/1] tracing, workqueuetrace: Make workqueue tracepoints use TRACE_EVENT macro Frederic Weisbecker
2009-04-13 14:13   ` Frederic Weisbecker
2009-04-15  6:47     ` KOSAKI Motohiro
2009-04-13 13:57 ` [PATCH 0/1] " Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090424225909.GA6658@nowhere \
    --to=fweisbec@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tzanussi@gmail.com \
    --cc=zhaolei@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).