All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@redhat.com>
To: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Antonov <alexander.antonov@linux.intel.com>,
	Alexei Budankov <abudankov@huawei.com>,
	Riccardo Mancini <rickyman7@gmail.com>
Subject: Re: [PATCH v12 00/16] Introduce threaded trace streaming for basic perf record operation
Date: Sun, 5 Dec 2021 16:15:08 +0100	[thread overview]
Message-ID: <YazXfO9CoCaR4THO@krava> (raw)
In-Reply-To: <cover.1637675515.git.alexey.v.bayduraev@linux.intel.com>

On Tue, Nov 23, 2021 at 05:07:56PM +0300, Alexey Bayduraev wrote:
> Changes in v12:
> - fixed nr_threads=1 cases
> - fixed "Woken up %ld times" message
> - removed unnecessary record__fini_thread_masks function
> - moved bytes written/compressed statistics to struct record_thread
> - moved all unnecessary debug messages to verbose=2 level
> - renamed "socket" option to "package" for consistency with util/cputopo.h
> - excluded single trace file reading patches
> 
> v11: https://lore.kernel.org/lkml/cover.1629186429.git.alexey.v.bayduraev@linux.intel.com/


I'm getting perf record hang with:

[root@krava perf]# ./perf record --threads
^C[ perf record: Woken up 1 times to write data ]

^C^C^C^C^C




with following backtrace:


(gdb) bt
#0  0x00007f8115d2885f in __GI___poll (fds=fds@entry=0x7ffd2116b930, nfds=1, timeout=1000) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f811574029e in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=0x7ffd2116b930) at /usr/include/bits/poll2.h:48
#2  Curl_poll (timeout_ms=<optimized out>, nfds=<optimized out>, ufds=0x7ffd2116b930) at ../../lib/select.c:374
#3  Curl_poll (ufds=0x7ffd2116b930, nfds=<optimized out>, timeout_ms=<optimized out>) at ../../lib/select.c:329
#4  0x00007f8115739cf7 in multi_wait (multi=multi@entry=0x3458c50, extra_fds=extra_fds@entry=0x0, extra_nfds=extra_nfds@entry=0, timeout_ms=<optimized out>, 
    timeout_ms@entry=1000, ret=ret@entry=0x0, extrawait=extrawait@entry=false, use_wakeup=<optimized out>) at ../../lib/multi.c:1282
#5  0x00007f8115739f26 in multi_wait (use_wakeup=false, extrawait=false, ret=0x0, timeout_ms=1000, extra_nfds=0, extra_fds=0x0, multi=0x3458c50)
    at ../../lib/multi.c:1410
#6  curl_multi_wait (multi=multi@entry=0x3458c50, extra_fds=extra_fds@entry=0x0, extra_nfds=extra_nfds@entry=0, timeout_ms=timeout_ms@entry=1000, 
    ret=ret@entry=0x0) at ../../lib/multi.c:1411
#7  0x00007f811653a7f2 in debuginfod_query_server (c=c@entry=0x2571fb0, build_id=build_id@entry=0x7ffd2116ed70 "c6eee0984964c63e328d13be49d68bd52595ad00", 
    build_id_len=build_id_len@entry=0, type=type@entry=0x7f811653c3ba "debuginfo", filename=filename@entry=0x0, path=path@entry=0x7ffd2116dcf8)
    at /usr/src/debug/elfutils-0.186-1.fc35.x86_64/debuginfod/debuginfod-client.c:1057
#8  0x00007f811653b2a6 in debuginfod_find_debuginfo (client=client@entry=0x2571fb0, 
    build_id=build_id@entry=0x7ffd2116ed70 "c6eee0984964c63e328d13be49d68bd52595ad00", build_id_len=build_id_len@entry=0, path=path@entry=0x7ffd2116dcf8)
    at /usr/src/debug/elfutils-0.186-1.fc35.x86_64/debuginfod/debuginfod-client.c:1511
#9  0x00000000004b18e1 in build_id_cache__find_debug (nsi=0x0, sbuild_id=0x7ffd2116ed70 "c6eee0984964c63e328d13be49d68bd52595ad00") at util/build-id.c:660
#10 build_id_cache__add (sbuild_id=sbuild_id@entry=0x7ffd2116ed70 "c6eee0984964c63e328d13be49d68bd52595ad00", 
    name=name@entry=0x259be50 "/lib/modules/5.15.6-200.fc35.x86_64/kernel/sound/core/seq/snd-seq-dummy.ko.xz", 
    realname=realname@entry=0x27255e0 "/usr/lib/modules/5.15.6-200.fc35.x86_64/kernel/sound/core/seq/snd-seq-dummy.ko.xz", nsi=nsi@entry=0x0, 
    is_kallsyms=is_kallsyms@entry=false, is_vdso=is_vdso@entry=false) at util/build-id.c:724
#11 0x00000000004b1b24 in build_id_cache__add_s (is_vdso=false, is_kallsyms=false, nsi=0x0, 
    name=0x259be50 "/lib/modules/5.15.6-200.fc35.x86_64/kernel/sound/core/seq/snd-seq-dummy.ko.xz", 
    sbuild_id=0x7ffd2116ed70 "c6eee0984964c63e328d13be49d68bd52595ad00") at util/build-id.c:805
#12 build_id_cache__add_b (is_vdso=false, is_kallsyms=<optimized out>, nsi=<optimized out>, name=<optimized out>, bid=<optimized out>) at util/build-id.c:821
#13 dso__cache_build_id (dso=<optimized out>, machine=<optimized out>, priv=<optimized out>) at util/build-id.c:889
#14 0x0000000000501674 in machine__for_each_dso (machine=machine@entry=0x2512d58, fn=fn@entry=0x4b1a00 <dso__cache_build_id>, priv=priv@entry=0x0)
    at util/machine.c:3179
#15 0x00000000004b1f09 in machines__for_each_dso (priv=0x0, fn=0x4b1a00 <dso__cache_build_id>, machines=0x2512d58) at util/build-id.c:896
#16 __perf_session__cache_build_ids (priv=0x0, fn=0x4b1a00 <dso__cache_build_id>, session=0x2512b50) at util/build-id.c:917
#17 __perf_session__cache_build_ids (session=0x2512b50, fn=0x4b1a00 <dso__cache_build_id>, priv=0x0) at util/build-id.c:908
#18 0x00000000004e80fb in write_build_id (ff=ff@entry=0x7ffd2116ef40, evlist=evlist@entry=0x2506050) at util/header.c:329
#19 0x00000000004f15c1 in do_write_feat (evlist=0x2506050, p=<synthetic pointer>, type=2, ff=0x7ffd2116ef40) at util/header.c:3428
#20 perf_header__adds_write (fd=3, evlist=0x2506050, header=0x2512b50) at util/header.c:3473
#21 perf_session__write_header (session=0x2512b50, evlist=0x2506050, fd=3, at_exit=at_exit@entry=true) at util/header.c:3565
#22 0x0000000000423ec4 in record__finish_output (rec=0xbba300 <record>) at builtin-record.c:1673
#23 0x0000000000426374 in __cmd_record (argc=argc@entry=0, argv=argv@entry=0x7ffd21174770, rec=0xbba300 <record>) at builtin-record.c:2600
#24 0x00000000004298e7 in cmd_record (argc=<optimized out>, argv=<optimized out>) at builtin-record.c:3969
#25 0x00000000004a42eb in run_builtin (p=p@entry=0xbd0f78 <commands+216>, argc=argc@entry=2, argv=argv@entry=0x7ffd21174770) at perf.c:313
#26 0x000000000040ec23 in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365
#27 run_argv (argv=<optimized out>, argcp=<optimized out>) at perf.c:409
#28 main (argc=2, argv=0x7ffd21174770) at perf.c:539


jirka

> 
> Changes in v11:
> - removed python dependency on zstd (perf test 19)
> - captured tags from Riccardo Mancini 
> 
> v10: https://lore.kernel.org/lkml/cover.1626072008.git.alexey.v.bayduraev@linux.intel.com/
> 
> Changes in v10:
> - renamed fdarray__clone to fdarray__dup_entry_from
> - captured Acked-by: tags by Namhyung Kim for 09/24
> 
> v9: https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
> 
> Changes in v9:
> - fixes in [v9 01/24]:
>   - move 'nr_threads' to before 'thread_masks'
>   - combined decl+assign into one line in record__thread_mask_alloc
>   - releasing masks inplace in record__alloc_thread_masks
> - split patch [v8 02/22] to [v9 02/24] and [v9 03/24]
> - fixes in [v9 03/24]:
>   - renamed 'struct thread_data' to 'struct record_thread'
>   - moved nr_mmaps after ctlfd_pos
>   - releasing resources inplace in record__thread_data_init_maps
>   - initializing pipes by -1 value
>   - added temporary gettid() wrapper
> - split patch [v8 03/22] to [v9 04/24] and [v9 05/24] 
> - removed upstreamed [v8 09/22]
> - split [v8 10/22] to [v9 12/24] and [v9 13/24]
> - moved --threads documentation to the related patches
> - fixed output of written/compressed stats in [v9 10/24]
> - split patch [v8 12/22] to [v9 15/24] and [v9 16/24]
> - fixed order of error checking for decompressed events in [v9 16/24]
> - merged patch [v8 21/22] with [v9 23/24] and [v9 24/24]
> - moved patch [v8 22/22] to [v9 09/24]
> - added max reader size constant in [v9 24/24]
> 
> v8: https://lore.kernel.org/lkml/cover.1625065643.git.alexey.v.bayduraev@linux.intel.com/
> 
> Changes in v8:
> - captured Acked-by: tags by Namhyung Kim
> - merged with origin/perf/core
> - added patch 21/22 introducing READER_NODATA state
> - added patch 22/22 fixing --max-size option
> 
> v7: https://lore.kernel.org/lkml/cover.1624350588.git.alexey.v.bayduraev@linux.intel.com/
> 
> Changes in v7:
> - fixed possible crash after out_free_threads label
> - added missing pthread_attr_destroy() call
> - added check of correctness of user masks 
> - fixed zsts_data finalization
> 
> v6: https://lore.kernel.org/lkml/cover.1622025774.git.alexey.v.bayduraev@linux.intel.com/
> 
> Changes in v6:
> - fixed leaks and possible double free in record__thread_mask_alloc()
> - fixed leaks in record__init_thread_user_masks()
> - fixed final mmaps flushing for threads id > 0
> - merged with origin/perf/core
> 
> v5: https://lore.kernel.org/lkml/cover.1619781188.git.alexey.v.bayduraev@linux.intel.com/
> 
> Changes in v5:
> - fixed leaks in record__init_thread_masks_spec()
> - fixed leaks after failed realloc
> - replaced "%m" to strerror()
> - added masks examples to the documentation
> - captured Acked-by: tags by Andi Kleen
> - do not allow --thread option for full_auxtrace mode 
> - split patch 06/12 to 06/20 and 07/20
> - split patch 08/12 to 09/20 and 10/20
> - split patches 11/12 and 11/12 to 13/20-20/20
> 
> v4: https://lore.kernel.org/lkml/6c15adcb-6a9d-320e-70b5-957c4c8b6ff2@linux.intel.com/
> 
> Changes in v4:
> - renamed 'comm' structure to 'pipes'
> - moved thread fd/maps messages to verbose=2
> - fixed leaks during allocation of thread_data structures
> - fixed leaks during allocation of thread masks
> - fixed possible fails when releasing thread masks
> 
> v3: https://lore.kernel.org/lkml/7d197a2d-56e2-896d-bf96-6de0a4db1fb8@linux.intel.com/
> 
> Changes in v3:
> - avoided skipped redundant patch 3/15
> - applied "data file" and "data directory" terms allover the patch set
> - captured Acked-by: tags by Namhyung Kim
> - avoided braces where don't needed
> - employed thread local variable for serial trace streaming 
> - added specs for --thread option - core, socket, numa and user defined
> - added parallel loading of data directory files similar to the prototype [1]
> 
> v2: https://lore.kernel.org/lkml/1ec29ed6-0047-d22f-630b-a7f5ccee96b4@linux.intel.com/
> 
> Changes in v2:
> - explicitly added credit tags to patches 6/15 and 15/15,
>   additionally to cites [1], [2]
> - updated description of 3/15 to explicitly mention the reason
>   to open data directories in read access mode (e.g. for perf report)
> - implemented fix for compilation error of 2/15
> - explicitly elaborated on found issues to be resolved for
>   threaded AUX trace capture
> 
> v1: https://lore.kernel.org/lkml/810f3a69-0004-9dff-a911-b7ff97220ae0@linux.intel.com/
> 
> Patch set provides parallel threaded trace streaming mode for basic
> perf record operation. Provided mode mitigates profiling data losses
> and resolves scalability issues of serial and asynchronous (--aio)
> trace streaming modes on multicore server systems. The design and
> implementation are based on the prototype [1], [2].
> 
> Parallel threaded mode executes trace streaming threads that read kernel
> data buffers and write captured data into several data files located at
> data directory. Layout of trace streaming threads and their mapping to data
> buffers to read can be configured using a value of --thread command line
> option. Specification value provides masks separated by colon so the masks
> define cpus to be monitored by one thread and thread affinity mask is
> separated by slash. <cpus mask 1>/<affinity mask 1>:<cpu mask 2>/<affinity mask 2>
> specifies parallel threads layout that consists of two threads with
> corresponding assigned cpus to be monitored. Specification value can be
> a string e.g. "cpu", "core" or "socket" meaning creation of data streaming
> thread for monitoring every cpu, whole core or socket. The option provided
> with no or empty value defaults to "cpu" layout creating data streaming
> thread for every cpu being monitored. Specification masks are filtered
> by the mask provided via -C option.
> 
> Parallel streaming mode is compatible with Zstd compression/decompression
> (--compression-level) and external control commands (--control). The mode
> is not enabled for pipe mode. The mode is not enabled for AUX area tracing,
> related and derived modes like --snapshot or --aux-sample. --switch-output-*
> and --timestamp-filename options are not enabled for parallel streaming.
> Initial intent to enable AUX area tracing faced the need to define some
> optimal way to store index data in data directory. --switch-output-* and
> --timestamp-filename use cases are not clear for data directories.
> Asynchronous(--aio) trace streaming and affinity (--affinity) modes are
> mutually exclusive to parallel streaming mode.
> 
> Basic analysis of data directories is provided in perf report mode.
> Raw dump and aggregated reports are available for data directories,
> still with no memory consumption optimizations.
> 
> Tested:
> 
> tools/perf/perf record -o prof.data --threads -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads= -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads=cpu -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads=core -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads=socket -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads=numa -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data -C 2,5 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data -C 3,4 --threads=0-3/3:4-7/4 -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=core -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data -C 0,4,2,6 --threads=numa -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads -g --call-graph dwarf,4096 --compression-level=3 -- matrix.gcc.g.O3
> tools/perf/perf record -o prof.data --threads -a
> tools/perf/perf record -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
> tools/perf/perf record --threads -D -1 -e cpu-cycles -a --control fd:10,11 -- sleep 30
> 
> tools/perf/perf report -i prof.data
> tools/perf/perf report -i prof.data --call-graph=callee
> tools/perf/perf report -i prof.data --stdio --header
> tools/perf/perf report -i prof.data -D --header
> 
> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jolsa@kernel.org/
> 
> Alexey Bayduraev (16):
>   perf record: Introduce thread affinity and mmap masks
>   tools lib: Introduce fdarray duplicate function
>   perf record: Introduce thread specific data array
>   perf record: Introduce function to propagate control commands
>   perf record: Introduce thread local variable
>   perf record: Stop threads in the end of trace streaming
>   perf record: Start threads in the beginning of trace streaming
>   perf record: Introduce data file at mmap buffer object
>   perf record: Introduce bytes written stats
>   perf record: Introduce compressor at mmap buffer object
>   perf record: Introduce data transferred and compressed stats
>   perf record: Introduce --threads command line option
>   perf record: Extend --threads command line option
>   perf record: Implement compatibility checks
>   perf session: Load data directory files for analysis
>   perf report: Output data file name in raw trace dump
> 
>  tools/lib/api/fd/array.c                 |   17 +
>  tools/lib/api/fd/array.h                 |    1 +
>  tools/perf/Documentation/perf-record.txt |   30 +
>  tools/perf/builtin-inject.c              |    3 +-
>  tools/perf/builtin-kvm.c                 |    2 +-
>  tools/perf/builtin-record.c              | 1167 ++++++++++++++++++++--
>  tools/perf/builtin-top.c                 |    2 +-
>  tools/perf/builtin-trace.c               |    2 +-
>  tools/perf/util/evlist.c                 |   16 +
>  tools/perf/util/evlist.h                 |    1 +
>  tools/perf/util/mmap.c                   |   10 +
>  tools/perf/util/mmap.h                   |    3 +
>  tools/perf/util/ordered-events.c         |    3 +-
>  tools/perf/util/ordered-events.h         |    3 +-
>  tools/perf/util/record.h                 |    2 +
>  tools/perf/util/session.c                |  208 +++-
>  tools/perf/util/session.h                |    3 +-
>  tools/perf/util/tool.h                   |    3 +-
>  18 files changed, 1370 insertions(+), 106 deletions(-)
> 
> -- 
> 2.19.0
> 


  parent reply	other threads:[~2021-12-05 15:15 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-23 14:07 [PATCH v12 00/16] Introduce threaded trace streaming for basic perf record operation Alexey Bayduraev
2021-11-23 14:07 ` [PATCH v12 01/16] perf record: Introduce thread affinity and mmap masks Alexey Bayduraev
2021-12-05 15:13   ` Jiri Olsa
2021-12-20 10:37     ` Bayduraev, Alexey V
2021-11-23 14:07 ` [PATCH v12 02/16] tools lib: Introduce fdarray duplicate function Alexey Bayduraev
2021-11-23 14:07 ` [PATCH v12 03/16] perf record: Introduce thread specific data array Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 04/16] perf record: Introduce function to propagate control commands Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 05/16] perf record: Introduce thread local variable Alexey Bayduraev
2021-12-05 15:14   ` Jiri Olsa
2021-12-20  9:49     ` Bayduraev, Alexey V
2021-11-23 14:08 ` [PATCH v12 06/16] perf record: Stop threads in the end of trace streaming Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 07/16] perf record: Start threads in the beginning " Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 08/16] perf record: Introduce data file at mmap buffer object Alexey Bayduraev
2021-12-05 15:14   ` Jiri Olsa
2021-11-23 14:08 ` [PATCH v12 09/16] perf record: Introduce bytes written stats Alexey Bayduraev
2021-12-05 15:14   ` Jiri Olsa
2021-12-06 11:22     ` Bayduraev, Alexey V
2021-11-23 14:08 ` [PATCH v12 10/16] perf record: Introduce compressor at mmap buffer object Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 11/16] perf record: Introduce data transferred and compressed stats Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 12/16] perf record: Introduce --threads command line option Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 13/16] perf record: Extend " Alexey Bayduraev
2021-12-05 15:13   ` Jiri Olsa
2021-12-05 15:14   ` Jiri Olsa
2021-12-05 15:14   ` Jiri Olsa
2021-12-05 15:14   ` Jiri Olsa
2021-11-23 14:08 ` [PATCH v12 14/16] perf record: Implement compatibility checks Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 15/16] perf session: Load data directory files for analysis Alexey Bayduraev
2021-11-23 14:08 ` [PATCH v12 16/16] perf report: Output data file name in raw trace dump Alexey Bayduraev
2021-12-05 15:15 ` Jiri Olsa [this message]
2021-12-09 17:56   ` [PATCH v12 00/16] Introduce threaded trace streaming for basic perf record operation Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YazXfO9CoCaR4THO@krava \
    --to=jolsa@redhat.com \
    --cc=abudankov@huawei.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.antonov@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexey.v.bayduraev@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rickyman7@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.