* [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

Hello,

The perf mem command uses PERF_SAMPLE_DATA_SRC, which carries a lot of
information about memory accesses.  It has various sort keys to group
related samples together, but it's still cumbersome to inspect the
result.  While the perf c2c command provides a way to investigate the
data in a specific way, I'd like to add more generic ways using new
output fields.
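
For reference, the data source is encoded in union perf_mem_data_src.
This is the (abridged) little-endian layout from
include/uapi/linux/perf_event.h; each of the new output fields below
decodes one of these bit groups:

  union perf_mem_data_src {
          __u64 val;
          struct {
                  __u64   mem_op:5,       /* type of opcode */
                          mem_lvl:14,     /* memory hierarchy level */
                          mem_snoop:5,    /* snoop mode */
                          mem_lock:2,     /* lock instr */
                          mem_dtlb:7,     /* tlb access */
                          mem_lvl_num:4,  /* memory hierarchy level number */
                          mem_remote:1,   /* remote */
                          mem_snoopx:2,   /* snoop mode, ext */
                          mem_blk:3,      /* access blocked */
                          mem_hops:3,     /* hop level */
                          mem_rsvd:18;
          };
  };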

For example, the following shows the 'cache' output field, which
breaks down the sample weights into the different levels of cache.

  $ perf mem record -a sleep 1
  
  $ perf mem report -F cache,dso,sym --stdio
  ...
  #
  # -------------- Cache --------------
  #      L1     L2     L3 L1-buf  Other  Shared Object                                  Symbol
  # ...................................  .....................................  .........................................
  #
       0.0%   0.0%   0.0%   0.0% 100.0%  [kernel.kallsyms]                      [k] ioread8
     100.0%   0.0%   0.0%   0.0%   0.0%  [kernel.kallsyms]                      [k] _raw_spin_lock_irq
       0.0%   0.0%   0.0%   0.0% 100.0%  [xhci_hcd]                             [k] xhci_update_erst_dequeue
       0.0%   0.0%   0.0%  95.8%   4.2%  [kernel.kallsyms]                      [k] smaps_account
       0.6%   1.8%  22.7%  45.5%  29.5%  [kernel.kallsyms]                      [k] sched_balance_update_blocked_averages
      29.4%   0.0%   1.6%  58.8%  10.2%  [kernel.kallsyms]                      [k] __update_load_avg_cfs_rq
       0.0%   8.5%   4.3%   0.0%  87.2%  [kernel.kallsyms]                      [k] copy_mc_enhanced_fast_string
      63.9%   0.0%   8.0%  23.8%   4.3%  [kernel.kallsyms]                      [k] psi_group_change
       3.9%   0.0%   9.3%  35.7%  51.1%  [kernel.kallsyms]                      [k] timerqueue_add
      35.9%  10.9%   0.0%  39.0%  14.2%  [kernel.kallsyms]                      [k] memcpy
      94.1%   0.0%   0.0%   5.9%   0.0%  [kernel.kallsyms]                      [k] unmap_page_range
      25.7%   0.0%   4.9%  51.0%  18.4%  [kernel.kallsyms]                      [k] __update_load_avg_se
       0.0%  24.9%  19.4%   9.6%  46.1%  [kernel.kallsyms]                      [k] _copy_to_iter
      12.9%   0.0%   0.0%  87.1%   0.0%  [kernel.kallsyms]                      [k] next_uptodate_folio
      36.8%   0.0%   9.5%  16.6%  37.1%  [kernel.kallsyms]                      [k] update_curr
     100.0%   0.0%   0.0%   0.0%   0.0%  bpf_prog_b9611ccbbb3d1833_dfs_iter     [k] bpf_prog_b9611ccbbb3d1833_dfs_iter
      45.4%   1.8%  20.4%  23.6%   8.8%  [kernel.kallsyms]                      [k] audit_filter_rules.isra.0
      92.8%   0.0%   0.0%   7.2%   0.0%  [kernel.kallsyms]                      [k] filemap_map_pages
      10.6%   0.0%   0.0%  89.4%   0.0%  [kernel.kallsyms]                      [k] smaps_page_accumulate
      38.3%   0.0%  29.6%  27.1%   5.0%  [kernel.kallsyms]                      [k] __schedule

Please see the description of each commit for other fields.

A new mem_stat field was added to the hist_entry to save this
information.  It's a generic data structure (an array) that can handle
different types of information like cache level, memory location,
snoop result, etc.
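
Concretely, patch 4 adds the following per-entry structure; one element
of the array is allocated for each requested stat type:

  #define MEM_STAT_LEN  8

  struct he_mem_stat {
          /* meaning of entries depends on enum mem_stat_type */
          u64     entries[MEM_STAT_LEN];
  };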

The first patch is a fix for the hierarchy mode and was sent
separately.  I just added it here so as not to break the hierarchy
mode.  The second patch makes it possible to enable SAMPLE_DATA_SRC
without SAMPLE_ADDR and perf_event_attr.mmap_data, which generate a
lot more data.

The names of some new fields are the same as the corresponding sort
keys (mem, op, snoop), so I had to change the lookup order that decides
whether a token is applied as an output field or a sort key.  Maybe
it's better to name them differently, but I couldn't come up with
better ideas.

That means you need to use the -F/--fields option to specify those
fields along with the sort keys you want.  Maybe we can change the
default output and sort keys for perf mem report with this.
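
For instance, to combine the new output fields with explicit sort keys
(field names as added by the later patches in this series):

  $ perf mem report -F overhead,cache,snoop -s comm --stdio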

The code is available at 'perf/mem-field-v1' branch in

 git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (11):
  perf hist: Remove output field from sort-list properly
  perf record: Add --sample-mem-info option
  perf hist: Support multi-line header
  perf hist: Add struct he_mem_stat
  perf hist: Basic support for mem_stat accounting
  perf hist: Implement output fields for mem stats
  perf mem: Add 'op' output field
  perf hist: Hide unused mem stat columns
  perf mem: Add 'cache' and 'memory' output fields
  perf mem: Add 'snoop' output field
  perf mem: Add 'dtlb' output field

 tools/perf/Documentation/perf-record.txt |   7 +-
 tools/perf/builtin-record.c              |   6 +
 tools/perf/ui/browsers/hists.c           |  50 ++++-
 tools/perf/ui/hist.c                     | 272 ++++++++++++++++++++++-
 tools/perf/ui/stdio/hist.c               |  57 +++--
 tools/perf/util/evsel.c                  |   2 +-
 tools/perf/util/hist.c                   |  78 +++++++
 tools/perf/util/hist.h                   |  22 ++
 tools/perf/util/mem-events.c             | 183 ++++++++++++++-
 tools/perf/util/mem-events.h             |  57 +++++
 tools/perf/util/record.h                 |   1 +
 tools/perf/util/sort.c                   |  42 +++-
 12 files changed, 718 insertions(+), 59 deletions(-)

-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 01/11] perf hist: Remove output field from sort-list properly
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

When an output format for cancelled children or latency is removed, it
should be deleted from the sort list as well.  Otherwise the assertion
in fmt_free() will fire.

  $ perf report -H --stdio
  perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed.
  Aborted (core dumped)

Also convert the open-coded equivalents to use
perf_hpp__column_unregister().

Fixes: dbd11b6bdab12f60 ("perf hist: Remove formats in hierarchy when cancel children")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/hist.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 3ffce69fc823e0bf..bc0689fceeb18bde 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -696,6 +696,7 @@ void perf_hpp_list__prepend_sort_field(struct perf_hpp_list *list,
 static void perf_hpp__column_unregister(struct perf_hpp_fmt *format)
 {
 	list_del_init(&format->list);
+	list_del_init(&format->sort_list);
 	fmt_free(format);
 }
 
@@ -818,18 +819,12 @@ void perf_hpp__reset_output_field(struct perf_hpp_list *list)
 	struct perf_hpp_fmt *fmt, *tmp;
 
 	/* reset output fields */
-	perf_hpp_list__for_each_format_safe(list, fmt, tmp) {
-		list_del_init(&fmt->list);
-		list_del_init(&fmt->sort_list);
-		fmt_free(fmt);
-	}
+	perf_hpp_list__for_each_format_safe(list, fmt, tmp)
+		perf_hpp__column_unregister(fmt);
 
 	/* reset sort keys */
-	perf_hpp_list__for_each_sort_list_safe(list, fmt, tmp) {
-		list_del_init(&fmt->list);
-		list_del_init(&fmt->sort_list);
-		fmt_free(fmt);
-	}
+	perf_hpp_list__for_each_sort_list_safe(list, fmt, tmp)
+		perf_hpp__column_unregister(fmt);
 }
 
 /*
-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 02/11] perf record: Add --sample-mem-info option
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

There's no way to enable PERF_SAMPLE_DATA_SRC without PERF_SAMPLE_ADDR,
which brings a lot of overhead due to the number of MMAP[2] records it
requires.

Let's add a new option to enable this information separately.
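
A usage sketch (the event name here is the AMD IBS one that also shows
up later in this series; on other systems 'perf mem record' picks
suitable events):

  $ perf record --sample-mem-info -e ibs_op// -a sleep 1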

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-record.txt | 7 ++++++-
 tools/perf/builtin-record.c              | 6 ++++++
 tools/perf/util/evsel.c                  | 2 +-
 tools/perf/util/record.h                 | 1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c7fc1ba265e2755d..c59f1e79f2b4a6f8 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -340,7 +340,7 @@ OPTIONS
 
 -d::
 --data::
-	Record the sample virtual addresses.
+	Record the sample virtual addresses.  Implies --sample-mem-info.
 
 --phys-data::
 	Record the sample physical addresses.
@@ -368,6 +368,11 @@ OPTIONS
 	the sample_type member of the struct perf_event_attr argument to the
 	perf_event_open system call.
 
+--sample-mem-info::
+	Record the sample data source information for memory operations.
+	It requires hardware support and may work on specific events only.
+	Please consider using 'perf mem record' instead if you're not sure.
+
 -n::
 --no-samples::
 	Don't sample.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ba20bf7c011d7765..6637a3acb1f1295f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3436,6 +3436,8 @@ static struct option __record_options[] = {
 		    "Record the sampled data address data page size"),
 	OPT_BOOLEAN(0, "code-page-size", &record.opts.sample_code_page_size,
 		    "Record the sampled code address (ip) page size"),
+	OPT_BOOLEAN(0, "sample-mem-info", &record.opts.sample_data_src,
+		    "Record the data source for memory operations"),
 	OPT_BOOLEAN(0, "sample-cpu", &record.opts.sample_cpu, "Record the sample cpu"),
 	OPT_BOOLEAN(0, "sample-identifier", &record.opts.sample_identifier,
 		    "Record the sample identifier"),
@@ -4130,6 +4132,10 @@ int cmd_record(int argc, const char **argv)
 		goto out_opts;
 	}
 
+	/* For backward compatibility, -d implies --sample-mem-info */
+	if (rec->opts.sample_address)
+		rec->opts.sample_data_src = true;
+
 	/*
 	 * Allow aliases to facilitate the lookup of symbols for address
 	 * filters. Refer to auxtrace_parse_filters().
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d79ffecd41f10ec..0f86df259c822799 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1425,7 +1425,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
 		evsel__set_sample_bit(evsel, CPU);
 	}
 
-	if (opts->sample_address)
+	if (opts->sample_data_src)
 		evsel__set_sample_bit(evsel, DATA_SRC);
 
 	if (opts->sample_phys_addr)
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index a6566134e09e5b19..f1956c4db3195070 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -28,6 +28,7 @@ struct record_opts {
 	bool	      sample_time_set;
 	bool	      sample_cpu;
 	bool	      sample_identifier;
+	bool	      sample_data_src;
 	bool	      period;
 	bool	      period_set;
 	bool	      running_time;
-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 03/11] perf hist: Support multi-line header
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

This is a preparation to support multi-line headers in perf mem report.
Normal sort keys and output fields that don't have content for multiple
lines will print their header string on the last line only.

As multi-line headers are not used normally, this should not change the
output.
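
For example, with the 'op' output field added later in this series
(patch 7), the two header lines look like this; single-line columns
such as Overhead and Command print their names on the last line only:

  #                         ------------------------ Mem Op ------------------------
  # Overhead       Samples     Load  Store  Ld+St Pfetch   Exec  Other    N/A    N/A  Command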

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c | 24 +++++++++-----
 tools/perf/ui/hist.c           |  9 ++++--
 tools/perf/ui/stdio/hist.c     | 57 +++++++++++++++++++++-------------
 tools/perf/util/sort.c         |  8 +++--
 4 files changed, 64 insertions(+), 34 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index cf022e92d06b9b28..67cbdec90d0bf0ea 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1686,7 +1686,8 @@ hists_browser__scnprintf_headers(struct hist_browser *browser, char *buf,
 	return ret;
 }
 
-static int hists_browser__scnprintf_hierarchy_headers(struct hist_browser *browser, char *buf, size_t size)
+static int hists_browser__scnprintf_hierarchy_headers(struct hist_browser *browser,
+						      char *buf, size_t size, int line)
 {
 	struct hists *hists = browser->hists;
 	struct perf_hpp dummy_hpp = {
@@ -1712,7 +1713,7 @@ static int hists_browser__scnprintf_hierarchy_headers(struct hist_browser *brows
 		if (column++ < browser->b.horiz_scroll)
 			continue;
 
-		ret = fmt->header(fmt, &dummy_hpp, hists, 0, NULL);
+		ret = fmt->header(fmt, &dummy_hpp, hists, line, NULL);
 		if (advance_hpp_check(&dummy_hpp, ret))
 			break;
 
@@ -1723,6 +1724,9 @@ static int hists_browser__scnprintf_hierarchy_headers(struct hist_browser *brows
 		first_node = false;
 	}
 
+	if (line < hists->hpp_list->nr_header_lines - 1)
+		return ret;
+
 	if (!first_node) {
 		ret = scnprintf(dummy_hpp.buf, dummy_hpp.size, "%*s",
 				indent * HIERARCHY_INDENT, "");
@@ -1753,7 +1757,7 @@ static int hists_browser__scnprintf_hierarchy_headers(struct hist_browser *brows
 			}
 			first_col = false;
 
-			ret = fmt->header(fmt, &dummy_hpp, hists, 0, NULL);
+			ret = fmt->header(fmt, &dummy_hpp, hists, line, NULL);
 			dummy_hpp.buf[ret] = '\0';
 
 			start = strim(dummy_hpp.buf);
@@ -1772,14 +1776,18 @@ static int hists_browser__scnprintf_hierarchy_headers(struct hist_browser *brows
 
 static void hists_browser__hierarchy_headers(struct hist_browser *browser)
 {
+	struct perf_hpp_list *hpp_list = browser->hists->hpp_list;
 	char headers[1024];
+	int line;
 
-	hists_browser__scnprintf_hierarchy_headers(browser, headers,
-						   sizeof(headers));
+	for (line = 0; line < hpp_list->nr_header_lines; line++) {
+		hists_browser__scnprintf_hierarchy_headers(browser, headers,
+							   sizeof(headers), line);
 
-	ui_browser__gotorc_title(&browser->b, 0, 0);
-	ui_browser__set_color(&browser->b, HE_COLORSET_ROOT);
-	ui_browser__write_nstring(&browser->b, headers, browser->b.width + 1);
+		ui_browser__gotorc_title(&browser->b, line, 0);
+		ui_browser__set_color(&browser->b, HE_COLORSET_ROOT);
+		ui_browser__write_nstring(&browser->b, headers, browser->b.width + 1);
+	}
 }
 
 static void hists_browser__headers(struct hist_browser *browser)
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index bc0689fceeb18bde..ec44633207aa3aba 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -321,11 +321,16 @@ static int hpp__width_fn(struct perf_hpp_fmt *fmt,
 }
 
 static int hpp__header_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
-			  struct hists *hists, int line __maybe_unused,
+			  struct hists *hists, int line,
 			  int *span __maybe_unused)
 {
 	int len = hpp__width_fn(fmt, hpp, hists);
-	return scnprintf(hpp->buf, hpp->size, "%*s", len, fmt->name);
+	const char *hdr = "";
+
+	if (line == hists->hpp_list->nr_header_lines - 1)
+		hdr = fmt->name;
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", len, hdr);
 }
 
 int hpp_color_scnprintf(struct perf_hpp *hpp, const char *fmt, ...)
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 7ac4b98e28bca82e..8c4c8925df2c22fc 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -643,45 +643,58 @@ static int hists__fprintf_hierarchy_headers(struct hists *hists,
 	unsigned header_width = 0;
 	struct perf_hpp_fmt *fmt;
 	struct perf_hpp_list_node *fmt_node;
+	struct perf_hpp_list *hpp_list = hists->hpp_list;
 	const char *sep = symbol_conf.field_sep;
 
 	indent = hists->nr_hpp_node;
 
-	/* preserve max indent depth for column headers */
-	print_hierarchy_indent(sep, indent, " ", fp);
-
 	/* the first hpp_list_node is for overhead columns */
 	fmt_node = list_first_entry(&hists->hpp_formats,
 				    struct perf_hpp_list_node, list);
 
-	perf_hpp_list__for_each_format(&fmt_node->hpp, fmt) {
-		fmt->header(fmt, hpp, hists, 0, NULL);
-		fprintf(fp, "%s%s", hpp->buf, sep ?: "  ");
-	}
+	for (int line = 0; line < hpp_list->nr_header_lines; line++) {
+		/* first # is displayed one level up */
+		if (line)
+			fprintf(fp, "# ");
 
-	/* combine sort headers with ' / ' */
-	first_node = true;
-	list_for_each_entry_continue(fmt_node, &hists->hpp_formats, list) {
-		if (!first_node)
-			header_width += fprintf(fp, " / ");
-		first_node = false;
+		/* preserve max indent depth for column headers */
+		print_hierarchy_indent(sep, indent, " ", fp);
 
-		first_col = true;
 		perf_hpp_list__for_each_format(&fmt_node->hpp, fmt) {
-			if (perf_hpp__should_skip(fmt, hists))
-				continue;
+			fmt->header(fmt, hpp, hists, line, NULL);
+			fprintf(fp, "%s%s", hpp->buf, sep ?: "  ");
+		}
 
-			if (!first_col)
-				header_width += fprintf(fp, "+");
-			first_col = false;
+		if (line < hpp_list->nr_header_lines - 1)
+			goto next_line;
+
+		/* combine sort headers with ' / ' */
+		first_node = true;
+		list_for_each_entry_continue(fmt_node, &hists->hpp_formats, list) {
+			if (!first_node)
+				header_width += fprintf(fp, " / ");
+			first_node = false;
 
-			fmt->header(fmt, hpp, hists, 0, NULL);
+			first_col = true;
+			perf_hpp_list__for_each_format(&fmt_node->hpp, fmt) {
+				if (perf_hpp__should_skip(fmt, hists))
+					continue;
 
-			header_width += fprintf(fp, "%s", strim(hpp->buf));
+				if (!first_col)
+					header_width += fprintf(fp, "+");
+				first_col = false;
+
+				fmt->header(fmt, hpp, hists, line, NULL);
+
+				header_width += fprintf(fp, "%s", strim(hpp->buf));
+			}
 		}
+
+next_line:
+		fprintf(fp, "\n");
 	}
 
-	fprintf(fp, "\n# ");
+	fprintf(fp, "# ");
 
 	/* preserve max indent depth for initial dots */
 	print_hierarchy_indent(sep, indent, dots, fp);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 594b75ca95bf72b2..ae8b8ceb82f3d00b 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2641,18 +2641,22 @@ void perf_hpp__reset_sort_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 }
 
 static int __sort__hpp_header(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
-			      struct hists *hists, int line __maybe_unused,
+			      struct hists *hists, int line,
 			      int *span __maybe_unused)
 {
 	struct hpp_sort_entry *hse;
 	size_t len = fmt->user_len;
+	const char *hdr = "";
+
+	if (line == hists->hpp_list->nr_header_lines - 1)
+		hdr = fmt->name;
 
 	hse = container_of(fmt, struct hpp_sort_entry, hpp);
 
 	if (!len)
 		len = hists__col_len(hists, hse->se->se_width_idx);
 
-	return scnprintf(hpp->buf, hpp->size, "%-*.*s", len, len, fmt->name);
+	return scnprintf(hpp->buf, hpp->size, "%-*.*s", len, len, hdr);
 }
 
 static int __sort__hpp_width(struct perf_hpp_fmt *fmt,
-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 04/11] perf hist: Add struct he_mem_stat
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

The struct he_mem_stat saves detailed information about memory
instructions.  It'll be used to show a breakdown of various data from
PERF_SAMPLE_DATA_SRC.  Note that this structure is generic and its
contents will differ depending on the actual data used later.

The information about the actual data will be saved in struct hists,
and its length is in nr_mem_stats.  This commit just adds the
groundwork and does nothing since hists->nr_mem_stats is 0 for now.
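
A sketch of how the following patches wire this up, using the 'op'
stat from patch 7 as an example:

  /* in struct hists (added in the next patch): which stats to collect */
  hists->nr_mem_stats = 1;
  hists->mem_stat_types[0] = PERF_MEM_STAT_OP;

  /* per hist_entry: one he_mem_stat (MEM_STAT_LEN counters) per type */
  he->mem_stat = calloc(hists->nr_mem_stats, sizeof(*he->mem_stat));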

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 74 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/hist.h |  9 +++++
 2 files changed, 83 insertions(+)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index d65228c1141251fb..fcb9f0db0c92a229 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -336,6 +336,67 @@ static void he_stat__decay(struct he_stat *he_stat)
 	he_stat->latency = (he_stat->latency * 7) / 8;
 }
 
+static int hists__update_mem_stat(struct hists *hists, struct hist_entry *he,
+				  struct mem_info *mi, u64 period)
+{
+	if (hists->nr_mem_stats == 0)
+		return 0;
+
+	if (he->mem_stat == NULL) {
+		he->mem_stat = calloc(hists->nr_mem_stats, sizeof(*he->mem_stat));
+		if (he->mem_stat == NULL)
+			return -1;
+	}
+
+	for (int i = 0; i < hists->nr_mem_stats; i++) {
+		int idx = 0; /* TODO: get correct index from mem info */
+
+		(void)mi;
+		he->mem_stat[i].entries[idx] += period;
+	}
+	return 0;
+}
+
+static void hists__add_mem_stat(struct hists *hists, struct hist_entry *dst,
+				struct hist_entry *src)
+{
+	if (hists->nr_mem_stats == 0)
+		return;
+
+	for (int i = 0; i < hists->nr_mem_stats; i++) {
+		for (int k = 0; k < MEM_STAT_LEN; k++)
+			dst->mem_stat[i].entries[k] += src->mem_stat[i].entries[k];
+	}
+}
+
+static int hists__clone_mem_stat(struct hists *hists, struct hist_entry *dst,
+				  struct hist_entry *src)
+{
+	if (hists->nr_mem_stats == 0)
+		return 0;
+
+	dst->mem_stat = calloc(hists->nr_mem_stats, sizeof(*dst->mem_stat));
+	if (dst->mem_stat == NULL)
+		return -1;
+
+	for (int i = 0; i < hists->nr_mem_stats; i++) {
+		for (int k = 0; k < MEM_STAT_LEN; k++)
+			dst->mem_stat[i].entries[k] = src->mem_stat[i].entries[k];
+	}
+	return 0;
+}
+
+static void hists__decay_mem_stat(struct hists *hists, struct hist_entry *he)
+{
+	if (hists->nr_mem_stats == 0)
+		return;
+
+	for (int i = 0; i < hists->nr_mem_stats; i++) {
+		for (int k = 0; k < MEM_STAT_LEN; k++)
+			he->mem_stat[i].entries[k] = (he->mem_stat[i].entries[k] * 7) / 8;
+	}
+}
+
 static void hists__delete_entry(struct hists *hists, struct hist_entry *he);
 
 static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
@@ -350,6 +411,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 	if (symbol_conf.cumulate_callchain)
 		he_stat__decay(he->stat_acc);
 	decay_callchain(he->callchain);
+	hists__decay_mem_stat(hists, he);
 
 	if (!he->depth) {
 		u64 period_diff = prev_period - he->stat.period;
@@ -693,6 +755,10 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 		he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
 	if (symbol_conf.cumulate_callchain)
 		he_stat__add_cpumode_period(he->stat_acc, al->cpumode, period);
+	if (hists__update_mem_stat(hists, he, entry->mem_info, period) < 0) {
+		hist_entry__delete(he);
+		return NULL;
+	}
 	return he;
 }
 
@@ -1423,6 +1489,7 @@ void hist_entry__delete(struct hist_entry *he)
 	free_callchain(he->callchain);
 	zfree(&he->trace_output);
 	zfree(&he->raw_data);
+	zfree(&he->mem_stat);
 	ops->free(he);
 }
 
@@ -1572,6 +1639,7 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 		cmp = hist_entry__collapse_hierarchy(hpp_list, iter, he);
 		if (!cmp) {
 			he_stat__add_stat(&iter->stat, &he->stat);
+			hists__add_mem_stat(hists, iter, he);
 			return iter;
 		}
 
@@ -1613,6 +1681,11 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 			new->srcfile = NULL;
 	}
 
+	if (hists__clone_mem_stat(hists, new, he) < 0) {
+		hist_entry__delete(new);
+		return NULL;
+	}
+
 	rb_link_node(&new->rb_node_in, parent, p);
 	rb_insert_color_cached(&new->rb_node_in, root, leftmost);
 	return new;
@@ -1695,6 +1768,7 @@ static int hists__collapse_insert_entry(struct hists *hists,
 			he_stat__add_stat(&iter->stat, &he->stat);
 			if (symbol_conf.cumulate_callchain)
 				he_stat__add_stat(iter->stat_acc, he->stat_acc);
+			hists__add_mem_stat(hists, iter, he);
 
 			if (hist_entry__has_callchains(he) && symbol_conf.use_callchain) {
 				struct callchain_cursor *cursor = get_tls_callchain_cursor();
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 76efd8952507a561..aba1d84ca074f27b 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -100,6 +100,13 @@ enum hist_column {
 struct thread;
 struct dso;
 
+#define MEM_STAT_LEN  8
+
+struct he_mem_stat {
+	/* meaning of entries depends on enum mem_stat_type */
+	u64			entries[MEM_STAT_LEN];
+};
+
 struct hists {
 	struct rb_root_cached	entries_in_array[2];
 	struct rb_root_cached	*entries_in;
@@ -125,6 +132,7 @@ struct hists {
 	struct perf_hpp_list	*hpp_list;
 	struct list_head	hpp_formats;
 	int			nr_hpp_node;
+	int			nr_mem_stats;
 };
 
 #define hists__has(__h, __f) (__h)->hpp_list->__f
@@ -232,6 +240,7 @@ struct hist_entry {
 	} pairs;
 	struct he_stat		stat;
 	struct he_stat		*stat_acc;
+	struct he_mem_stat	*mem_stat;
 	struct map_symbol	ms;
 	struct thread		*thread;
 	struct comm		*comm;
-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 05/11] perf hist: Basic support for mem_stat accounting
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

Add logic to account he->mem_stat based on the mem_stat_type in hists.
Each mem_stat entry has a different meaning depending on the type, so
the index into the array is calculated at runtime from the
corresponding value in sample.data_src.

hists still has no mem_stat_types yet, so this code won't do anything
for now.  Later, hists->mem_stat_types will be allocated based on what
users actually want in the output.
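
For example, once patch 7 fills in the PERF_MEM_STAT_OP type, a single
load sample would be accounted roughly like this (a sketch; 'i' is the
position of PERF_MEM_STAT_OP in hists->mem_stat_types[]):

  union perf_mem_data_src src = { .mem_op = PERF_MEM_OP_LOAD };
  int idx = mem_stat_index(PERF_MEM_STAT_OP, src.val);

  /* idx == MEM_STAT_OP_LOAD here */
  he->mem_stat[i].entries[idx] += sample->period;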

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/hist.c         | 39 ++++++++++++++++++++++++++++++++++++
 tools/perf/util/hist.c       |  6 ++++--
 tools/perf/util/hist.h       |  4 ++++
 tools/perf/util/mem-events.c | 18 +++++++++++++++++
 tools/perf/util/mem-events.h |  6 ++++++
 tools/perf/util/sort.c       |  4 ++++
 6 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index ec44633207aa3aba..2aad46bbd2ed4d93 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -11,6 +11,7 @@
 #include "../util/sort.h"
 #include "../util/evsel.h"
 #include "../util/evlist.h"
+#include "../util/mem-events.h"
 #include "../util/thread.h"
 #include "../util/util.h"
 
@@ -500,6 +501,12 @@ static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return 0;
 }
 
+static bool perf_hpp__is_mem_stat_entry(struct perf_hpp_fmt *fmt)
+{
+	(void)fmt;
+	return false;
+}
+
 static bool perf_hpp__is_hpp_entry(struct perf_hpp_fmt *a)
 {
 	return a->header == hpp__header_fn;
@@ -1022,3 +1029,35 @@ int perf_hpp__setup_hists_formats(struct perf_hpp_list *list,
 
 	return 0;
 }
+
+int perf_hpp__alloc_mem_stats(struct perf_hpp_list *list, struct evlist *evlist)
+{
+	struct perf_hpp_fmt *fmt;
+	struct evsel *evsel;
+	enum mem_stat_type mst[16];
+	unsigned nr_mem_stats = 0;
+
+	perf_hpp_list__for_each_format(list, fmt) {
+		if (!perf_hpp__is_mem_stat_entry(fmt))
+			continue;
+
+		assert(nr_mem_stats < ARRAY_SIZE(mst));
+		mst[nr_mem_stats++] = PERF_MEM_STAT_UNKNOWN;
+	}
+
+	if (nr_mem_stats == 0)
+		return 0;
+
+	evlist__for_each_entry(evlist, evsel) {
+		struct hists *hists = evsel__hists(evsel);
+
+		hists->mem_stat_types = calloc(nr_mem_stats,
+					       sizeof(*hists->mem_stat_types));
+		if (hists->mem_stat_types == NULL)
+			return -ENOMEM;
+
+		memcpy(hists->mem_stat_types, mst, nr_mem_stats * sizeof(*mst));
+		hists->nr_mem_stats = nr_mem_stats;
+	}
+	return 0;
+}
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index fcb9f0db0c92a229..7759c1818c1ad168 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -349,9 +349,10 @@ static int hists__update_mem_stat(struct hists *hists, struct hist_entry *he,
 	}
 
 	for (int i = 0; i < hists->nr_mem_stats; i++) {
-		int idx = 0; /* TODO: get correct index from mem info */
+		int idx = mem_stat_index(hists->mem_stat_types[i],
+					 mem_info__const_data_src(mi)->val);
 
-		(void)mi;
+		assert(0 <= idx && idx < MEM_STAT_LEN);
 		he->mem_stat[i].entries[idx] += period;
 	}
 	return 0;
@@ -3052,6 +3053,7 @@ static void hists_evsel__exit(struct evsel *evsel)
 	struct perf_hpp_list_node *node, *tmp;
 
 	hists__delete_all_entries(hists);
+	zfree(&hists->mem_stat_types);
 
 	list_for_each_entry_safe(node, tmp, &hists->hpp_formats, list) {
 		perf_hpp_list__for_each_format_safe(&node->hpp, fmt, pos) {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index aba1d84ca074f27b..509af09691b84e10 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -9,6 +9,7 @@
 #include "events_stats.h"
 #include "evsel.h"
 #include "map_symbol.h"
+#include "mem-events.h"
 #include "mutex.h"
 #include "sample.h"
 #include "spark.h"
@@ -133,6 +134,7 @@ struct hists {
 	struct list_head	hpp_formats;
 	int			nr_hpp_node;
 	int			nr_mem_stats;
+	enum mem_stat_type	*mem_stat_types;
 };
 
 #define hists__has(__h, __f) (__h)->hpp_list->__f
@@ -597,6 +599,8 @@ void perf_hpp__reset_output_field(struct perf_hpp_list *list);
 void perf_hpp__append_sort_keys(struct perf_hpp_list *list);
 int perf_hpp__setup_hists_formats(struct perf_hpp_list *list,
 				  struct evlist *evlist);
+int perf_hpp__alloc_mem_stats(struct perf_hpp_list *list,
+			      struct evlist *evlist);
 
 
 bool perf_hpp__is_sort_entry(struct perf_hpp_fmt *format);
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 884d9aebce9199c0..1bc60ad3dc312542 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -799,3 +799,21 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
 	stats->nomap		+= add->nomap;
 	stats->noparse		+= add->noparse;
 }
+
+/*
+ * It returns an index in hist_entry->mem_stat array for the given val which
+ * represents a data-src based on the mem_stat_type.
+ *
+ * For example, when mst is about cache level, the index can be 1 for L1, 2 for
+ * L2 and so on.
+ */
+int mem_stat_index(const enum mem_stat_type mst, const u64 val)
+{
+	switch (mst) {
+	case PERF_MEM_STAT_UNKNOWN:  /* placeholder */
+	default:
+		break;
+	}
+	(void)val;
+	return -1;
+}
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index a5c19d39ee37147b..2604464f985815f6 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -89,4 +89,10 @@ struct hist_entry;
 int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi);
 void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add);
 
+enum mem_stat_type {
+	PERF_MEM_STAT_UNKNOWN,  /* placeholder */
+};
+
+int mem_stat_index(const enum mem_stat_type mst, const u64 data_src);
+
 #endif /* __PERF_MEM_EVENTS_H */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index ae8b8ceb82f3d00b..6024f588f66f3156 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -4163,6 +4163,10 @@ int setup_sorting(struct evlist *evlist)
 	if (err < 0)
 		return err;
 
+	err = perf_hpp__alloc_mem_stats(&perf_hpp_list, evlist);
+	if (err < 0)
+		return err;
+
 	/* copy sort keys to output fields */
 	perf_hpp__setup_output_field(&perf_hpp_list);
 	/* and then copy output fields to sort keys */
-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 06/11] perf hist: Implement output fields for mem stats
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

This is a preparation for later changes to support mem_stat output.
The new fields will need two lines for the header - the first line
shows the type of mem stat and the second line shows the name of each
item, as returned by mem_stat_name().

Each element in the mem_stat array is printed as a percentage for the
hist_entry, and they sum to 100%.
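
For example, an entry whose 'op' counters are { load: 300, store: 100 }
with all other items zero would render with the " %5.1f%%" format as:

    75.0%  25.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%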

Add a new output field dimension, used only in SORT_MODE__MEM, backed
by mem_stat.  To handle possible name conflicts with existing sort
keys, move the check of the output field dimensions after the sort
dimensions when looking up sort keys.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  11 +++
 tools/perf/ui/hist.c           | 158 ++++++++++++++++++++++++++++++++-
 tools/perf/util/hist.h         |   4 +
 tools/perf/util/mem-events.c   |  12 +++
 tools/perf/util/mem-events.h   |   3 +
 tools/perf/util/sort.c         |  26 ++++--
 6 files changed, 202 insertions(+), 12 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 67cbdec90d0bf0ea..f6ab1310a0bdd6c4 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1266,6 +1266,16 @@ hist_browser__hpp_color_##_type(struct perf_hpp_fmt *fmt,		\
 			_fmttype);					\
 }
 
+#define __HPP_COLOR_MEM_STAT_FN(_name, _type)				\
+static int								\
+hist_browser__hpp_color_mem_stat_##_name(struct perf_hpp_fmt *fmt,	\
+					 struct perf_hpp *hpp,		\
+					 struct hist_entry *he)		\
+{									\
+	return hpp__fmt_mem_stat(fmt, hpp, he, PERF_MEM_STAT_##_type,	\
+				 " %5.1f%%", __hpp__slsmg_color_printf);\
+}
+
 __HPP_COLOR_PERCENT_FN(overhead, period, PERF_HPP_FMT_TYPE__PERCENT)
 __HPP_COLOR_PERCENT_FN(latency, latency, PERF_HPP_FMT_TYPE__LATENCY)
 __HPP_COLOR_PERCENT_FN(overhead_sys, period_sys, PERF_HPP_FMT_TYPE__PERCENT)
@@ -1277,6 +1287,7 @@ __HPP_COLOR_ACC_PERCENT_FN(latency_acc, latency, PERF_HPP_FMT_TYPE__LATENCY)
 
 #undef __HPP_COLOR_PERCENT_FN
 #undef __HPP_COLOR_ACC_PERCENT_FN
+#undef __HPP_COLOR_MEM_STAT_FN
 
 void hist_browser__init_hpp(void)
 {
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 2aad46bbd2ed4d93..2a5c9f2b328b2c5c 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -12,6 +12,7 @@
 #include "../util/evsel.h"
 #include "../util/evlist.h"
 #include "../util/mem-events.h"
+#include "../util/string2.h"
 #include "../util/thread.h"
 #include "../util/util.h"
 
@@ -151,6 +152,45 @@ int hpp__fmt_acc(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	return hpp__fmt(fmt, hpp, he, get_field, fmtstr, print_fn, fmtype);
 }
 
+int hpp__fmt_mem_stat(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
+		      struct hist_entry *he, enum mem_stat_type mst,
+		      const char *fmtstr, hpp_snprint_fn print_fn)
+{
+	struct hists *hists = he->hists;
+	int mem_stat_idx = -1;
+	char *buf = hpp->buf;
+	size_t size = hpp->size;
+	u64 total = 0;
+	int ret = 0;
+
+	for (int i = 0; i < hists->nr_mem_stats; i++) {
+		if (hists->mem_stat_types[i] == mst) {
+			mem_stat_idx = i;
+			break;
+		}
+	}
+	assert(mem_stat_idx != -1);
+
+	for (int i = 0; i < MEM_STAT_LEN; i++)
+		total += he->mem_stat[mem_stat_idx].entries[i];
+	assert(total != 0);
+
+	for (int i = 0; i < MEM_STAT_LEN; i++) {
+		u64 val = he->mem_stat[mem_stat_idx].entries[i];
+
+		ret += hpp__call_print_fn(hpp, print_fn, fmtstr, 100.0 * val / total);
+	}
+
+	/*
+	 * Restore original buf and size as it's where caller expects
+	 * the result will be saved.
+	 */
+	hpp->buf = buf;
+	hpp->size = size;
+
+	return ret;
+}
+
 static int field_cmp(u64 field_a, u64 field_b)
 {
 	if (field_a > field_b)
@@ -295,6 +335,23 @@ static int __hpp__sort_acc(struct hist_entry *a, struct hist_entry *b,
 	return ret;
 }
 
+static bool perf_hpp__is_mem_stat_entry(struct perf_hpp_fmt *fmt);
+
+static enum mem_stat_type hpp__mem_stat_type(struct perf_hpp_fmt *fmt)
+{
+	if (!perf_hpp__is_mem_stat_entry(fmt))
+		return -1;
+
+	pr_debug("Should not reach here\n");
+	return -1;
+}
+
+static int64_t hpp__sort_mem_stat(struct perf_hpp_fmt *fmt __maybe_unused,
+				  struct hist_entry *a, struct hist_entry *b)
+{
+	return a->stat.period - b->stat.period;
+}
+
 static int hpp__width_fn(struct perf_hpp_fmt *fmt,
 			 struct perf_hpp *hpp __maybe_unused,
 			 struct hists *hists)
@@ -334,6 +391,45 @@ static int hpp__header_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 	return scnprintf(hpp->buf, hpp->size, "%*s", len, hdr);
 }
 
+static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+				   struct hists *hists, int line,
+				   int *span __maybe_unused)
+{
+	char *buf = hpp->buf;
+	int ret = 0;
+	int len;
+	enum mem_stat_type mst = hpp__mem_stat_type(fmt);
+
+	(void)hists;
+	if (line == 0) {
+		int left, right;
+
+		len = fmt->len;
+		left = (len - strlen(fmt->name)) / 2 - 1;
+		right = len - left - strlen(fmt->name) - 2;
+
+		if (left < 0)
+			left = 0;
+		if (right < 0)
+			right = 0;
+
+		return scnprintf(hpp->buf, hpp->size, "%.*s %s %.*s",
+				 left, graph_dotted_line, fmt->name, right, graph_dotted_line);
+	}
+
+	len = hpp->size;
+	for (int i = 0; i < MEM_STAT_LEN; i++) {
+		int printed;
+
+		printed = scnprintf(buf, len, "%*s", MEM_STAT_PRINT_LEN,
+				    mem_stat_name(mst, i));
+		ret += printed;
+		buf += printed;
+		len -= printed;
+	}
+	return ret;
+}
+
 int hpp_color_scnprintf(struct perf_hpp *hpp, const char *fmt, ...)
 {
 	va_list args;
@@ -459,6 +555,23 @@ static int64_t hpp__sort_##_type(struct perf_hpp_fmt *fmt __maybe_unused, 	\
 	return __hpp__sort(a, b, he_get_##_field);				\
 }
 
+#define __HPP_COLOR_MEM_STAT_FN(_name, _type)					\
+static int hpp__color_mem_stat_##_name(struct perf_hpp_fmt *fmt,		\
+				       struct perf_hpp *hpp,			\
+				       struct hist_entry *he)			\
+{										\
+	return hpp__fmt_mem_stat(fmt, hpp, he, PERF_MEM_STAT_##_type,		\
+				 " %5.1f%%", hpp_color_scnprintf);		\
+}
+
+#define __HPP_ENTRY_MEM_STAT_FN(_name, _type)					\
+static int hpp__entry_mem_stat_##_name(struct perf_hpp_fmt *fmt, 		\
+				       struct perf_hpp *hpp,			\
+				       struct hist_entry *he)			\
+{										\
+	return hpp__fmt_mem_stat(fmt, hpp, he, PERF_MEM_STAT_##_type,		\
+				 " %5.1f%%", hpp_entry_scnprintf);		\
+}
 
 #define HPP_PERCENT_FNS(_type, _field, _fmttype)			\
 __HPP_COLOR_PERCENT_FN(_type, _field, _fmttype)				\
@@ -478,6 +591,10 @@ __HPP_SORT_RAW_FN(_type, _field)
 __HPP_ENTRY_AVERAGE_FN(_type, _field)					\
 __HPP_SORT_AVERAGE_FN(_type, _field)
 
+#define HPP_MEM_STAT_FNS(_name, _type)					\
+__HPP_COLOR_MEM_STAT_FN(_name, _type)					\
+__HPP_ENTRY_MEM_STAT_FN(_name, _type)
+
 HPP_PERCENT_FNS(overhead, period, PERF_HPP_FMT_TYPE__PERCENT)
 HPP_PERCENT_FNS(latency, latency, PERF_HPP_FMT_TYPE__LATENCY)
 HPP_PERCENT_FNS(overhead_sys, period_sys, PERF_HPP_FMT_TYPE__PERCENT)
@@ -494,6 +611,8 @@ HPP_AVERAGE_FNS(weight1, weight1)
 HPP_AVERAGE_FNS(weight2, weight2)
 HPP_AVERAGE_FNS(weight3, weight3)
 
+HPP_MEM_STAT_FNS(unknown, UNKNOWN)  /* placeholder */
+
 static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			    struct hist_entry *a __maybe_unused,
 			    struct hist_entry *b __maybe_unused)
@@ -503,8 +622,7 @@ static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 
 static bool perf_hpp__is_mem_stat_entry(struct perf_hpp_fmt *fmt)
 {
-	(void)fmt;
-	return false;
+	return fmt->sort == hpp__sort_mem_stat;
 }
 
 static bool perf_hpp__is_hpp_entry(struct perf_hpp_fmt *a)
@@ -520,6 +638,14 @@ static bool hpp__equal(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b)
 	return a->idx == b->idx;
 }
 
+static bool hpp__equal_mem_stat(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b)
+{
+	if (!perf_hpp__is_mem_stat_entry(a) || !perf_hpp__is_mem_stat_entry(b))
+		return false;
+
+	return a->entry == b->entry;
+}
+
 #define HPP__COLOR_PRINT_FNS(_name, _fn, _idx)		\
 	{						\
 		.name   = _name,			\
@@ -561,6 +687,20 @@ static bool hpp__equal(struct perf_hpp_fmt *a, struct perf_hpp_fmt *b)
 		.equal	= hpp__equal,			\
 	}
 
+#define HPP__MEM_STAT_PRINT_FNS(_name, _fn, _type)	\
+	{						\
+		.name   = _name,			\
+		.header	= hpp__header_mem_stat_fn,	\
+		.width	= hpp__width_fn,		\
+		.color	= hpp__color_mem_stat_ ## _fn,	\
+		.entry	= hpp__entry_mem_stat_ ## _fn,	\
+		.cmp	= hpp__nop_cmp,			\
+		.collapse = hpp__nop_cmp,		\
+		.sort	= hpp__sort_mem_stat,		\
+		.idx	= PERF_HPP__MEM_STAT_ ## _type,	\
+		.equal	= hpp__equal_mem_stat,		\
+	}
+
 struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__COLOR_PRINT_FNS("Overhead", overhead, OVERHEAD),
 	HPP__COLOR_PRINT_FNS("Latency", latency, LATENCY),
@@ -575,6 +715,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__PRINT_FNS("Weight1", weight1, WEIGHT1),
 	HPP__PRINT_FNS("Weight2", weight2, WEIGHT2),
 	HPP__PRINT_FNS("Weight3", weight3, WEIGHT3),
+	HPP__MEM_STAT_PRINT_FNS("Unknown", unknown, UNKNOWN),  /* placeholder */
 };
 
 struct perf_hpp_list perf_hpp_list = {
@@ -586,11 +727,13 @@ struct perf_hpp_list perf_hpp_list = {
 #undef HPP__COLOR_PRINT_FNS
 #undef HPP__COLOR_ACC_PRINT_FNS
 #undef HPP__PRINT_FNS
+#undef HPP__MEM_STAT_PRINT_FNS
 
 #undef HPP_PERCENT_FNS
 #undef HPP_PERCENT_ACC_FNS
 #undef HPP_RAW_FNS
 #undef HPP_AVERAGE_FNS
+#undef HPP_MEM_STAT_FNS
 
 #undef __HPP_HEADER_FN
 #undef __HPP_WIDTH_FN
@@ -600,6 +743,9 @@ struct perf_hpp_list perf_hpp_list = {
 #undef __HPP_ENTRY_ACC_PERCENT_FN
 #undef __HPP_ENTRY_RAW_FN
 #undef __HPP_ENTRY_AVERAGE_FN
+#undef __HPP_COLOR_MEM_STAT_FN
+#undef __HPP_ENTRY_MEM_STAT_FN
+
 #undef __HPP_SORT_FN
 #undef __HPP_SORT_ACC_FN
 #undef __HPP_SORT_RAW_FN
@@ -924,6 +1070,10 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 		fmt->len = 8;
 		break;
 
+	case PERF_HPP__MEM_STAT_UNKNOWN:  /* placeholder */
+		fmt->len = MEM_STAT_LEN * MEM_STAT_PRINT_LEN;
+		break;
+
 	default:
 		break;
 	}
@@ -1042,12 +1192,14 @@ int perf_hpp__alloc_mem_stats(struct perf_hpp_list *list, struct evlist *evlist)
 			continue;
 
 		assert(nr_mem_stats < ARRAY_SIZE(mst));
-		mst[nr_mem_stats++] = PERF_MEM_STAT_UNKNOWN;
+		mst[nr_mem_stats++] = hpp__mem_stat_type(fmt);
 	}
 
 	if (nr_mem_stats == 0)
 		return 0;
 
+	list->nr_header_lines = 2;
+
 	evlist__for_each_entry(evlist, evsel) {
 		struct hists *hists = evsel__hists(evsel);
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 509af09691b84e10..18c696d8d568a9fa 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -587,6 +587,7 @@ enum {
 	PERF_HPP__WEIGHT1,
 	PERF_HPP__WEIGHT2,
 	PERF_HPP__WEIGHT3,
+	PERF_HPP__MEM_STAT_UNKNOWN,  /* placeholder */
 
 	PERF_HPP__MAX_INDEX
 };
@@ -656,6 +657,9 @@ int hpp__fmt_acc(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 		 struct hist_entry *he, hpp_field_fn get_field,
 		 const char *fmtstr, hpp_snprint_fn print_fn,
 		 enum perf_hpp_fmt_type fmtype);
+int hpp__fmt_mem_stat(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		      struct hist_entry *he, enum mem_stat_type mst,
+		      const char *fmtstr, hpp_snprint_fn print_fn);
 
 static inline void advance_hpp(struct perf_hpp *hpp, int inc)
 {
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 1bc60ad3dc312542..a4c1e42de30f8307 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -817,3 +817,15 @@ int mem_stat_index(const enum mem_stat_type mst, const u64 val)
 	(void)val;
 	return -1;
 }
+
+/* To align output, returned string should be shorter than MEM_STAT_PRINT_LEN */
+const char *mem_stat_name(const enum mem_stat_type mst, const int idx)
+{
+	switch (mst) {
+	case PERF_MEM_STAT_UNKNOWN:
+	default:
+		break;
+	}
+	(void)idx;
+	return "N/A";
+}
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 2604464f985815f6..7aeb4c5fefc89698 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -93,6 +93,9 @@ enum mem_stat_type {
 	PERF_MEM_STAT_UNKNOWN,  /* placeholder */
 };
 
+#define MEM_STAT_PRINT_LEN  7  /* 1 space + 5 digits + 1 percent sign */
+
 int mem_stat_index(const enum mem_stat_type mst, const u64 data_src);
+const char *mem_stat_name(const enum mem_stat_type mst, const int idx);
 
 #endif /* __PERF_MEM_EVENTS_H */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 6024f588f66f3156..7c669ea27af247e5 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2598,9 +2598,11 @@ struct hpp_dimension {
 	struct perf_hpp_fmt	*fmt;
 	int			taken;
 	int			was_taken;
+	int			mem_mode;
 };
 
 #define DIM(d, n) { .name = n, .fmt = &perf_hpp__format[d], }
+#define DIM_MEM(d, n) { .name = n, .fmt = &perf_hpp__format[d], .mem_mode = 1, }
 
 static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM(PERF_HPP__OVERHEAD, "overhead"),
@@ -2620,8 +2622,11 @@ static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM(PERF_HPP__WEIGHT2, "ins_lat"),
 	DIM(PERF_HPP__WEIGHT3, "retire_lat"),
 	DIM(PERF_HPP__WEIGHT3, "p_stage_cyc"),
+	/* used for output only when SORT_MODE__MEM */
+	DIM_MEM(PERF_HPP__MEM_STAT_UNKNOWN, "unknown"),  /* placeholder */
 };
 
+#undef DIM_MEM
 #undef DIM
 
 struct hpp_sort_entry {
@@ -3608,15 +3613,6 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 		return __sort_dimension__add(sd, list, level);
 	}
 
-	for (i = 0; i < ARRAY_SIZE(hpp_sort_dimensions); i++) {
-		struct hpp_dimension *hd = &hpp_sort_dimensions[i];
-
-		if (strncasecmp(tok, hd->name, strlen(tok)))
-			continue;
-
-		return __hpp_dimension__add(hd, list, level);
-	}
-
 	for (i = 0; i < ARRAY_SIZE(bstack_sort_dimensions); i++) {
 		struct sort_dimension *sd = &bstack_sort_dimensions[i];
 
@@ -3658,6 +3654,15 @@ int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
 		return 0;
 	}
 
+	for (i = 0; i < ARRAY_SIZE(hpp_sort_dimensions); i++) {
+		struct hpp_dimension *hd = &hpp_sort_dimensions[i];
+
+		if (strncasecmp(tok, hd->name, strlen(tok)))
+			continue;
+
+		return __hpp_dimension__add(hd, list, level);
+	}
+
 	if (!add_dynamic_entry(evlist, tok, level))
 		return 0;
 
@@ -4020,6 +4025,9 @@ int output_field_add(struct perf_hpp_list *list, const char *tok, int *level)
 		if (!strcasecmp(tok, "weight"))
 			ui__warning("--fields weight shows the average value unlike in the --sort key.\n");
 
+		if (hd->mem_mode && sort__mode != SORT_MODE__MEMORY)
+			continue;
+
 		return __hpp_dimension__add_output(list, hd, *level);
 	}
 
-- 
2.49.0.906.g1f30a19c02-goog


* [PATCH 07/11] perf mem: Add 'op' output field
From: Namhyung Kim @ 2025-04-30 20:55 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

This is an actual example of the he_mem_stat-based sample breakdown.
It uses the 'mem_op' field of union perf_mem_data_src, which describes
the memory operation.

It'll basically be 'load' or 'store', which can be useful if the PMU
doesn't have separate events for them, like IBS or SPE.  In addition,
there's an entry for the case where load and store happen at the same
time.  It also adds entries for prefetching and execution.
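
For reference, these are the mem_op bits being decoded, from
include/uapi/linux/perf_event.h:

  #define PERF_MEM_OP_NA          0x01 /* not available */
  #define PERF_MEM_OP_LOAD        0x02 /* load instruction */
  #define PERF_MEM_OP_STORE       0x04 /* store instruction */
  #define PERF_MEM_OP_PFETCH      0x08 /* prefetch */
  #define PERF_MEM_OP_EXEC        0x10 /* code (execution) */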

  $ perf mem report -F +op -s comm --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4K of event 'ibs_op//'
  # Total weight : 9559
  # Sort order   : comm
  #
  #                         ------------------------ Mem Op ------------------------
  # Overhead       Samples     Load  Store  Ld+St Pfetch   Exec  Other    N/A    N/A  Command
  # ........  ............  ........................................................  ...............
  #
      44.85%          4077    21.1%  30.7%   0.0%   0.0%   0.0%  48.3%   0.0%   0.0%  swapper
      26.82%            45    98.8%   0.3%   0.0%   0.0%   0.0%   0.9%   0.0%   0.0%  netsli-prober
       7.19%           442    51.7%  13.7%   0.0%   0.0%   0.0%  34.6%   0.0%   0.0%  perf
       5.81%            75    89.7%   2.2%   0.0%   0.0%   0.0%   8.1%   0.0%   0.0%  qemu-system-ppc
       4.77%             1   100.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%  notifications_c
       1.77%            10    95.9%   1.2%   0.0%   0.0%   0.0%   3.0%   0.0%   0.0%  MemoryReleaser
       0.77%            32    71.6%   4.1%   0.0%   0.0%   0.0%  24.3%   0.0%   0.0%  DefaultEventMan
       0.19%            10    66.7%  22.2%   0.0%   0.0%   0.0%  11.1%   0.0%   0.0%  gnome-shell

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  3 +++
 tools/perf/ui/hist.c           | 12 ++++++---
 tools/perf/util/hist.h         |  2 +-
 tools/perf/util/mem-events.c   | 48 ++++++++++++++++++++++++++--------
 tools/perf/util/mem-events.h   | 11 +++++++-
 tools/perf/util/sort.c         |  2 +-
 6 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index f6ab1310a0bdd6c4..66a4c769b2d76436 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1284,6 +1284,7 @@ __HPP_COLOR_PERCENT_FN(overhead_guest_sys, period_guest_sys, PERF_HPP_FMT_TYPE__
 __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, PERF_HPP_FMT_TYPE__PERCENT)
 __HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, PERF_HPP_FMT_TYPE__PERCENT)
 __HPP_COLOR_ACC_PERCENT_FN(latency_acc, latency, PERF_HPP_FMT_TYPE__LATENCY)
+__HPP_COLOR_MEM_STAT_FN(op, OP)
 
 #undef __HPP_COLOR_PERCENT_FN
 #undef __HPP_COLOR_ACC_PERCENT_FN
@@ -1307,6 +1308,8 @@ void hist_browser__init_hpp(void)
 				hist_browser__hpp_color_overhead_acc;
 	perf_hpp__format[PERF_HPP__LATENCY_ACC].color =
 				hist_browser__hpp_color_latency_acc;
+	perf_hpp__format[PERF_HPP__MEM_STAT_OP].color =
+				hist_browser__hpp_color_mem_stat_op;
 
 	res_sample_init();
 }
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 2a5c9f2b328b2c5c..427ce687ad815a62 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -342,6 +342,12 @@ static enum mem_stat_type hpp__mem_stat_type(struct perf_hpp_fmt *fmt)
 	if (!perf_hpp__is_mem_stat_entry(fmt))
 		return -1;
 
+	switch (fmt->idx) {
+	case PERF_HPP__MEM_STAT_OP:
+		return PERF_MEM_STAT_OP;
+	default:
+		break;
+	}
 	pr_debug("Should not reach here\n");
 	return -1;
 }
@@ -611,7 +617,7 @@ HPP_AVERAGE_FNS(weight1, weight1)
 HPP_AVERAGE_FNS(weight2, weight2)
 HPP_AVERAGE_FNS(weight3, weight3)
 
-HPP_MEM_STAT_FNS(unknown, UNKNOWN)  /* placeholder */
+HPP_MEM_STAT_FNS(op, OP)
 
 static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			    struct hist_entry *a __maybe_unused,
@@ -715,7 +721,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__PRINT_FNS("Weight1", weight1, WEIGHT1),
 	HPP__PRINT_FNS("Weight2", weight2, WEIGHT2),
 	HPP__PRINT_FNS("Weight3", weight3, WEIGHT3),
-	HPP__MEM_STAT_PRINT_FNS("Unknown", unknown, UNKNOWN),  /* placeholder */
+	HPP__MEM_STAT_PRINT_FNS("Mem Op", op, OP),
 };
 
 struct perf_hpp_list perf_hpp_list = {
@@ -1070,7 +1076,7 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 		fmt->len = 8;
 		break;
 
-	case PERF_HPP__MEM_STAT_UNKNOWN:  /* placeholder */
+	case PERF_HPP__MEM_STAT_OP:
 		fmt->len = MEM_STAT_LEN * MEM_STAT_PRINT_LEN;
 		break;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 18c696d8d568a9fa..3990cfc21b1615ae 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -587,7 +587,7 @@ enum {
 	PERF_HPP__WEIGHT1,
 	PERF_HPP__WEIGHT2,
 	PERF_HPP__WEIGHT3,
-	PERF_HPP__MEM_STAT_UNKNOWN,  /* placeholder */
+	PERF_HPP__MEM_STAT_OP,
 
 	PERF_HPP__MAX_INDEX
 };
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index a4c1e42de30f8307..1c44ccc026fe9974 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -303,15 +303,12 @@ int perf_mem_events__record_args(const char **rec_argv, int *argv_nr, char **eve
 	}
 
 	if (cpu_map) {
-		struct perf_cpu_map *online = cpu_map__online();
-
-		if (!perf_cpu_map__equal(cpu_map, online)) {
+		if (!perf_cpu_map__equal(cpu_map, cpu_map__online())) {
 			char buf[200];
 
 			cpu_map__snprint(cpu_map, buf, sizeof(buf));
 			pr_warning("Memory events are enabled on a subset of CPUs: %s\n", buf);
 		}
-		perf_cpu_map__put(online);
 		perf_cpu_map__put(cpu_map);
 	}
 
@@ -803,18 +800,32 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
 /*
  * It returns an index in hist_entry->mem_stat array for the given val which
  * represents a data-src based on the mem_stat_type.
- *
- * For example, when mst is about cache level, the index can be 1 for L1, 2 for
- * L2 and so on.
  */
 int mem_stat_index(const enum mem_stat_type mst, const u64 val)
 {
+	union perf_mem_data_src src = {
+		.val = val,
+	};
+
 	switch (mst) {
-	case PERF_MEM_STAT_UNKNOWN:  /* placeholder */
+	case PERF_MEM_STAT_OP:
+		switch (src.mem_op) {
+		case PERF_MEM_OP_LOAD:
+			return MEM_STAT_OP_LOAD;
+		case PERF_MEM_OP_STORE:
+			return MEM_STAT_OP_STORE;
+		case PERF_MEM_OP_LOAD | PERF_MEM_OP_STORE:
+			return MEM_STAT_OP_LDST;
+		default:
+			if (src.mem_op & PERF_MEM_OP_PFETCH)
+				return MEM_STAT_OP_PFETCH;
+			if (src.mem_op & PERF_MEM_OP_EXEC)
+				return MEM_STAT_OP_EXEC;
+			return MEM_STAT_OP_OTHER;
+		}
 	default:
 		break;
 	}
-	(void)val;
 	return -1;
 }
 
@@ -822,10 +833,25 @@ int mem_stat_index(const enum mem_stat_type mst, const u64 val)
 const char *mem_stat_name(const enum mem_stat_type mst, const int idx)
 {
 	switch (mst) {
-	case PERF_MEM_STAT_UNKNOWN:
+	case PERF_MEM_STAT_OP:
+		switch (idx) {
+		case MEM_STAT_OP_LOAD:
+			return "Load";
+		case MEM_STAT_OP_STORE:
+			return "Store";
+		case MEM_STAT_OP_LDST:
+			return "Ld+St";
+		case MEM_STAT_OP_PFETCH:
+			return "Pfetch";
+		case MEM_STAT_OP_EXEC:
+			return "Exec";
+		case MEM_STAT_OP_OTHER:
+			return "Other";
+		default:
+			break;
+		}
 	default:
 		break;
 	}
-	(void)idx;
 	return "N/A";
 }
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 7aeb4c5fefc89698..55e5e2607fb732b4 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -90,7 +90,16 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi);
 void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add);
 
 enum mem_stat_type {
-	PERF_MEM_STAT_UNKNOWN,  /* placeholder */
+	PERF_MEM_STAT_OP,
+};
+
+enum mem_stat_op {
+	MEM_STAT_OP_LOAD,
+	MEM_STAT_OP_STORE,
+	MEM_STAT_OP_LDST,
+	MEM_STAT_OP_PFETCH,
+	MEM_STAT_OP_EXEC,
+	MEM_STAT_OP_OTHER,
 };
 
 #define MEM_STAT_PRINT_LEN  7  /* 1 space + 5 digits + 1 percent sign */
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 7c669ea27af247e5..53fcb9191ea0cdc3 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2623,7 +2623,7 @@ static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM(PERF_HPP__WEIGHT3, "retire_lat"),
 	DIM(PERF_HPP__WEIGHT3, "p_stage_cyc"),
 	/* used for output only when SORT_MODE__MEM */
-	DIM_MEM(PERF_HPP__MEM_STAT_UNKNOWN, "unknown"),  /* placeholder */
+	DIM_MEM(PERF_HPP__MEM_STAT_OP, "op"),
 };
 
 #undef DIM_MEM
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 08/11] perf hist: Hide unused mem stat columns
  2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
                   ` (6 preceding siblings ...)
  2025-04-30 20:55 ` [PATCH 07/11] perf mem: Add 'op' output field Namhyung Kim
@ 2025-04-30 20:55 ` Namhyung Kim
  2025-05-02 16:18   ` Arnaldo Carvalho de Melo
  2025-05-02 16:27   ` Arnaldo Carvalho de Melo
  2025-04-30 20:55 ` [PATCH 09/11] perf mem: Add 'cache' and 'memory' output fields Namhyung Kim
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 23+ messages in thread
From: Namhyung Kim @ 2025-04-30 20:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

Some mem_stat types don't use all 8 columns.  And there are cases where
samples are available only for certain kinds of mem_stat types.  In
that case, hide the columns which have no samples.

The new output for the previous data would be:

  $ perf mem report -F overhead,op,comm --stdio
  ...
  #           ------ Mem Op -------
  # Overhead     Load  Store  Other  Command
  # ........  .....................  ...............
  #
      44.85%    21.1%  30.7%  48.3%  swapper
      26.82%    98.8%   0.3%   0.9%  netsli-prober
       7.19%    51.7%  13.7%  34.6%  perf
       5.81%    89.7%   2.2%   8.1%  qemu-system-ppc
       4.77%   100.0%   0.0%   0.0%  notifications_c
       1.77%    95.9%   1.2%   3.0%  MemoryReleaser
       0.77%    71.6%   4.1%  24.3%  DefaultEventMan
       0.19%    66.7%  22.2%  11.1%  gnome-shell
       ...

On Intel machines, the event is only for loads or stores so it'll have
only one column like below:

  #            Mem Op
  # Overhead     Load  Command
  # ........  .......  ...............
  #
      20.55%   100.0%  swapper
      17.13%   100.0%  chrome
       9.02%   100.0%  data-loop.0
       6.26%   100.0%  pipewire-pulse
       5.63%   100.0%  threaded-ml
       5.47%   100.0%  GraphRunner
       5.37%   100.0%  AudioIP~allback
       5.30%   100.0%  Chrome_ChildIOT
       3.17%   100.0%  Isolated Web Co
       ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/hist.c   | 35 +++++++++++++++++++++++++++++++++--
 tools/perf/util/hist.c |  2 ++
 tools/perf/util/hist.h |  1 +
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 427ce687ad815a62..661922c4d7863224 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -178,6 +178,9 @@ int hpp__fmt_mem_stat(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *
 	for (int i = 0; i < MEM_STAT_LEN; i++) {
 		u64 val = he->mem_stat[mem_stat_idx].entries[i];
 
+		if (hists->mem_stat_total[mem_stat_idx].entries[i] == 0)
+			continue;
+
 		ret += hpp__call_print_fn(hpp, print_fn, fmtstr, 100.0 * val / total);
 	}
 
@@ -405,12 +408,31 @@ static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hp
 	int ret = 0;
 	int len;
 	enum mem_stat_type mst = hpp__mem_stat_type(fmt);
+	int mem_stat_idx = -1;
+
+	for (int i = 0; i < hists->nr_mem_stats; i++) {
+		if (hists->mem_stat_types[i] == mst) {
+			mem_stat_idx = i;
+			break;
+		}
+	}
+	assert(mem_stat_idx != -1);
 
-	(void)hists;
 	if (line == 0) {
 		int left, right;
 
-		len = fmt->len;
+		len = 0;
+		/* update fmt->len for actually used columns only */
+		for (int i = 0; i < MEM_STAT_LEN; i++) {
+			if (hists->mem_stat_total[mem_stat_idx].entries[i])
+				len += MEM_STAT_PRINT_LEN;
+		}
+		fmt->len = len;
+
+		/* print header directly if single column only */
+		if (len == MEM_STAT_PRINT_LEN)
+			return scnprintf(hpp->buf, hpp->size, "%*s", len, fmt->name);
+
 		left = (len - strlen(fmt->name)) / 2 - 1;
 		right = len - left - strlen(fmt->name) - 2;
 
@@ -423,10 +445,14 @@ static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hp
 				 left, graph_dotted_line, fmt->name, right, graph_dotted_line);
 	}
 
+
 	len = hpp->size;
 	for (int i = 0; i < MEM_STAT_LEN; i++) {
 		int printed;
 
+		if (hists->mem_stat_total[mem_stat_idx].entries[i] == 0)
+			continue;
+
 		printed = scnprintf(buf, len, "%*s", MEM_STAT_PRINT_LEN,
 				    mem_stat_name(mst, i));
 		ret += printed;
@@ -1214,6 +1240,11 @@ int perf_hpp__alloc_mem_stats(struct perf_hpp_list *list, struct evlist *evlist)
 		if (hists->mem_stat_types == NULL)
 			return -ENOMEM;
 
+		hists->mem_stat_total = calloc(nr_mem_stats,
+					       sizeof(*hists->mem_stat_total));
+		if (hists->mem_stat_total == NULL)
+			return -ENOMEM;
+
 		memcpy(hists->mem_stat_types, mst, nr_mem_stats * sizeof(*mst));
 		hists->nr_mem_stats = nr_mem_stats;
 	}
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7759c1818c1ad168..afc6855327ab0de6 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -354,6 +354,7 @@ static int hists__update_mem_stat(struct hists *hists, struct hist_entry *he,
 
 		assert(0 <= idx && idx < MEM_STAT_LEN);
 		he->mem_stat[i].entries[idx] += period;
+		hists->mem_stat_total[i].entries[idx] += period;
 	}
 	return 0;
 }
@@ -3054,6 +3055,7 @@ static void hists_evsel__exit(struct evsel *evsel)
 
 	hists__delete_all_entries(hists);
 	zfree(&hists->mem_stat_types);
+	zfree(&hists->mem_stat_total);
 
 	list_for_each_entry_safe(node, tmp, &hists->hpp_formats, list) {
 		perf_hpp_list__for_each_format_safe(&node->hpp, fmt, pos) {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3990cfc21b1615ae..fa5e886e5b04ec9b 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -135,6 +135,7 @@ struct hists {
 	int			nr_hpp_node;
 	int			nr_mem_stats;
 	enum mem_stat_type	*mem_stat_types;
+	struct he_mem_stat	*mem_stat_total;
 };
 
 #define hists__has(__h, __f) (__h)->hpp_list->__f
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 09/11] perf mem: Add 'cache' and 'memory' output fields
  2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
                   ` (7 preceding siblings ...)
  2025-04-30 20:55 ` [PATCH 08/11] perf hist: Hide unused mem stat columns Namhyung Kim
@ 2025-04-30 20:55 ` Namhyung Kim
  2025-04-30 20:55 ` [PATCH 10/11] perf mem: Add 'snoop' output field Namhyung Kim
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2025-04-30 20:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

This is a breakdown of perf_mem_data_src.mem_lvl_num.  It's divided
into two parts because mem_lvl_num has more possible values than the 8
columns a single field can show.

Since there are many entries for different cache levels, the 'cache'
field focuses on them.  I generalized buffers like LFB, MAB and MHB to
L1-buf and L2-buf.

The rest goes to the 'memory' field, which can be RAM, CXL, PMEM, IO, etc.

  $ perf mem report -F cache,mem,dso --stdio
  ...
  #
  # -------------- Cache --------------  --- Memory ---
  #      L1     L2     L3 L1-buf  Other      RAM  Other  Shared Object
  # ...................................  ..............  ....................................
  #
      53.9%   3.6%  16.2%  21.6%   4.8%     4.8%  95.2%  [kernel.kallsyms]
      64.7%   1.7%   3.5%  17.4%  12.8%    12.8%  87.2%  chrome (deleted)
      78.3%   2.8%   0.0%   1.0%  17.9%    17.9%  82.1%  libc.so.6
      39.6%   1.5%   0.0%   5.7%  53.2%    53.2%  46.8%  libxul.so
      26.2%   0.0%   0.0%   0.0%  73.8%    73.8%  26.2%  [unknown]
      85.5%   0.0%   0.0%  14.5%   0.0%     0.0% 100.0%  libspa-audioconvert.so
      66.3%   4.4%   0.0%  29.4%   0.0%     0.0% 100.0%  libglib-2.0.so.0.8200.1 (deleted)
       1.9%   0.0%   0.0%   0.0%  98.1%    98.1%   1.9%  libmutter-cogl-15.so.0.0.0 (deleted)
      10.6%   0.0%   0.0%  89.4%   0.0%     0.0% 100.0%  libpulsecommon-16.1.so
       0.0%   0.0%   0.0% 100.0%   0.0%     0.0% 100.0%  libfreeblpriv3.so (deleted)
       ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  6 +++
 tools/perf/ui/hist.c           | 10 +++++
 tools/perf/util/hist.h         |  2 +
 tools/perf/util/mem-events.c   | 71 +++++++++++++++++++++++++++++++++-
 tools/perf/util/mem-events.h   | 24 +++++++++++-
 tools/perf/util/sort.c         |  2 +
 6 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 66a4c769b2d76436..675dd64067747126 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1285,6 +1285,8 @@ __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us, PERF_HPP_FMT_TYPE__PE
 __HPP_COLOR_ACC_PERCENT_FN(overhead_acc, period, PERF_HPP_FMT_TYPE__PERCENT)
 __HPP_COLOR_ACC_PERCENT_FN(latency_acc, latency, PERF_HPP_FMT_TYPE__LATENCY)
 __HPP_COLOR_MEM_STAT_FN(op, OP)
+__HPP_COLOR_MEM_STAT_FN(cache, CACHE)
+__HPP_COLOR_MEM_STAT_FN(memory, MEMORY)
 
 #undef __HPP_COLOR_PERCENT_FN
 #undef __HPP_COLOR_ACC_PERCENT_FN
@@ -1310,6 +1312,10 @@ void hist_browser__init_hpp(void)
 				hist_browser__hpp_color_latency_acc;
 	perf_hpp__format[PERF_HPP__MEM_STAT_OP].color =
 				hist_browser__hpp_color_mem_stat_op;
+	perf_hpp__format[PERF_HPP__MEM_STAT_CACHE].color =
+				hist_browser__hpp_color_mem_stat_cache;
+	perf_hpp__format[PERF_HPP__MEM_STAT_MEMORY].color =
+				hist_browser__hpp_color_mem_stat_memory;
 
 	res_sample_init();
 }
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 661922c4d7863224..7fc09c738ed02acb 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -348,6 +348,10 @@ static enum mem_stat_type hpp__mem_stat_type(struct perf_hpp_fmt *fmt)
 	switch (fmt->idx) {
 	case PERF_HPP__MEM_STAT_OP:
 		return PERF_MEM_STAT_OP;
+	case PERF_HPP__MEM_STAT_CACHE:
+		return PERF_MEM_STAT_CACHE;
+	case PERF_HPP__MEM_STAT_MEMORY:
+		return PERF_MEM_STAT_MEMORY;
 	default:
 		break;
 	}
@@ -644,6 +648,8 @@ HPP_AVERAGE_FNS(weight2, weight2)
 HPP_AVERAGE_FNS(weight3, weight3)
 
 HPP_MEM_STAT_FNS(op, OP)
+HPP_MEM_STAT_FNS(cache, CACHE)
+HPP_MEM_STAT_FNS(memory, MEMORY)
 
 static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			    struct hist_entry *a __maybe_unused,
@@ -748,6 +754,8 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__PRINT_FNS("Weight2", weight2, WEIGHT2),
 	HPP__PRINT_FNS("Weight3", weight3, WEIGHT3),
 	HPP__MEM_STAT_PRINT_FNS("Mem Op", op, OP),
+	HPP__MEM_STAT_PRINT_FNS("Cache", cache, CACHE),
+	HPP__MEM_STAT_PRINT_FNS("Memory", memory, MEMORY),
 };
 
 struct perf_hpp_list perf_hpp_list = {
@@ -1103,6 +1111,8 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 		break;
 
 	case PERF_HPP__MEM_STAT_OP:
+	case PERF_HPP__MEM_STAT_CACHE:
+	case PERF_HPP__MEM_STAT_MEMORY:
 		fmt->len = MEM_STAT_LEN * MEM_STAT_PRINT_LEN;
 		break;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index fa5e886e5b04ec9b..9de50d929ad1268c 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -589,6 +589,8 @@ enum {
 	PERF_HPP__WEIGHT2,
 	PERF_HPP__WEIGHT3,
 	PERF_HPP__MEM_STAT_OP,
+	PERF_HPP__MEM_STAT_CACHE,
+	PERF_HPP__MEM_STAT_MEMORY,
 
 	PERF_HPP__MAX_INDEX
 };
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 1c44ccc026fe9974..6822815278a4b213 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -823,6 +823,40 @@ int mem_stat_index(const enum mem_stat_type mst, const u64 val)
 				return MEM_STAT_OP_EXEC;
 			return MEM_STAT_OP_OTHER;
 		}
+	case PERF_MEM_STAT_CACHE:
+		switch (src.mem_lvl_num) {
+		case PERF_MEM_LVLNUM_L1:
+			return MEM_STAT_CACHE_L1;
+		case PERF_MEM_LVLNUM_L2:
+			return MEM_STAT_CACHE_L2;
+		case PERF_MEM_LVLNUM_L3:
+			return MEM_STAT_CACHE_L3;
+		case PERF_MEM_LVLNUM_L4:
+			return MEM_STAT_CACHE_L4;
+		case PERF_MEM_LVLNUM_LFB:
+			return MEM_STAT_CACHE_L1_BUF;
+		case PERF_MEM_LVLNUM_L2_MHB:
+			return MEM_STAT_CACHE_L2_BUF;
+		default:
+			return MEM_STAT_CACHE_OTHER;
+		}
+	case PERF_MEM_STAT_MEMORY:
+		switch (src.mem_lvl_num) {
+		case PERF_MEM_LVLNUM_MSC:
+			return MEM_STAT_MEMORY_MSC;
+		case PERF_MEM_LVLNUM_RAM:
+			return MEM_STAT_MEMORY_RAM;
+		case PERF_MEM_LVLNUM_UNC:
+			return MEM_STAT_MEMORY_UNC;
+		case PERF_MEM_LVLNUM_CXL:
+			return MEM_STAT_MEMORY_CXL;
+		case PERF_MEM_LVLNUM_IO:
+			return MEM_STAT_MEMORY_IO;
+		case PERF_MEM_LVLNUM_PMEM:
+			return MEM_STAT_MEMORY_PMEM;
+		default:
+			return MEM_STAT_MEMORY_OTHER;
+		}
 	default:
 		break;
 	}
@@ -846,9 +880,44 @@ const char *mem_stat_name(const enum mem_stat_type mst, const int idx)
 		case MEM_STAT_OP_EXEC:
 			return "Exec";
 		case MEM_STAT_OP_OTHER:
+		default:
+			return "Other";
+		}
+	case PERF_MEM_STAT_CACHE:
+		switch (idx) {
+		case MEM_STAT_CACHE_L1:
+			return "L1";
+		case MEM_STAT_CACHE_L2:
+			return "L2";
+		case MEM_STAT_CACHE_L3:
+			return "L3";
+		case MEM_STAT_CACHE_L4:
+			return "L4";
+		case MEM_STAT_CACHE_L1_BUF:
+			return "L1-buf";
+		case MEM_STAT_CACHE_L2_BUF:
+			return "L2-buf";
+		case MEM_STAT_CACHE_OTHER:
+		default:
 			return "Other";
+		}
+	case PERF_MEM_STAT_MEMORY:
+		switch (idx) {
+		case MEM_STAT_MEMORY_RAM:
+			return "RAM";
+		case MEM_STAT_MEMORY_MSC:
+			return "MSC";
+		case MEM_STAT_MEMORY_UNC:
+			return "Uncach";
+		case MEM_STAT_MEMORY_CXL:
+			return "CXL";
+		case MEM_STAT_MEMORY_IO:
+			return "IO";
+		case MEM_STAT_MEMORY_PMEM:
+			return "PMEM";
+		case MEM_STAT_MEMORY_OTHER:
 		default:
-			break;
+			return "Other";
 		}
 	default:
 		break;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 55e5e2607fb732b4..002e2772400e3dda 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -91,8 +91,12 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add);
 
 enum mem_stat_type {
 	PERF_MEM_STAT_OP,
+	PERF_MEM_STAT_CACHE,
+	PERF_MEM_STAT_MEMORY,
 };
 
+#define MEM_STAT_PRINT_LEN  7  /* 1 space + 5 digits + 1 percent sign */
+
 enum mem_stat_op {
 	MEM_STAT_OP_LOAD,
 	MEM_STAT_OP_STORE,
@@ -102,7 +106,25 @@ enum mem_stat_op {
 	MEM_STAT_OP_OTHER,
 };
 
-#define MEM_STAT_PRINT_LEN  7  /* 1 space + 5 digits + 1 percent sign */
+enum mem_stat_cache {
+	MEM_STAT_CACHE_L1,
+	MEM_STAT_CACHE_L2,
+	MEM_STAT_CACHE_L3,
+	MEM_STAT_CACHE_L4,
+	MEM_STAT_CACHE_L1_BUF,
+	MEM_STAT_CACHE_L2_BUF,
+	MEM_STAT_CACHE_OTHER,
+};
+
+enum mem_stat_memory {
+	MEM_STAT_MEMORY_RAM,
+	MEM_STAT_MEMORY_MSC,
+	MEM_STAT_MEMORY_UNC,
+	MEM_STAT_MEMORY_CXL,
+	MEM_STAT_MEMORY_IO,
+	MEM_STAT_MEMORY_PMEM,
+	MEM_STAT_MEMORY_OTHER,
+};
 
 int mem_stat_index(const enum mem_stat_type mst, const u64 data_src);
 const char *mem_stat_name(const enum mem_stat_type mst, const int idx);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 53fcb9191ea0cdc3..2ad88f7de95a2247 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2624,6 +2624,8 @@ static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM(PERF_HPP__WEIGHT3, "p_stage_cyc"),
 	/* used for output only when SORT_MODE__MEM */
 	DIM_MEM(PERF_HPP__MEM_STAT_OP, "op"),
+	DIM_MEM(PERF_HPP__MEM_STAT_CACHE, "cache"),
+	DIM_MEM(PERF_HPP__MEM_STAT_MEMORY, "memory"),
 };
 
 #undef DIM_MEM
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 10/11] perf mem: Add 'snoop' output field
  2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
                   ` (8 preceding siblings ...)
  2025-04-30 20:55 ` [PATCH 09/11] perf mem: Add 'cache' and 'memory' output fields Namhyung Kim
@ 2025-04-30 20:55 ` Namhyung Kim
  2025-04-30 20:55 ` [PATCH 11/11] perf mem: Add 'dtlb' " Namhyung Kim
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2025-04-30 20:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

This is a breakdown of perf_mem_data_src.mem_snoop values.  For now, it
doesn't use mem_snoopx values like FWD and PEER.

  $ perf mem report -F overhead,snoop,comm --stdio
  ...
  #           ---------- Snoop -----------
  # Overhead      Hit   HitM   Miss  Other  Command
  # ........  ............................  ...............
  #
      34.24%     0.6%   0.0%   0.0%  99.4%  gnome-shell
      12.02%     1.0%   0.0%   0.0%  99.0%  chrome
       9.32%     1.0%   0.0%   0.3%  98.7%  Isolated Web Co
       6.85%     1.0%   0.3%   0.0%  98.6%  swapper
       6.30%     0.8%   0.8%   0.0%  98.5%  Xorg
       3.02%     2.4%   0.0%   0.0%  97.6%  VizCompositorTh
       2.35%     0.0%   0.0%   0.0% 100.0%  firefox-esr
       2.04%     0.0%   0.0%   0.0% 100.0%  JS Helper
       1.51%     3.2%   0.0%   0.0%  96.8%  threaded-ml
       1.44%     0.0%   0.0%   0.0% 100.0%  AudioIP~allback
       ...

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  3 +++
 tools/perf/ui/hist.c           |  5 +++++
 tools/perf/util/hist.h         |  1 +
 tools/perf/util/mem-events.c   | 23 +++++++++++++++++++++++
 tools/perf/util/mem-events.h   |  8 ++++++++
 tools/perf/util/sort.c         |  1 +
 6 files changed, 41 insertions(+)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 675dd64067747126..5b080f5062440246 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1287,6 +1287,7 @@ __HPP_COLOR_ACC_PERCENT_FN(latency_acc, latency, PERF_HPP_FMT_TYPE__LATENCY)
 __HPP_COLOR_MEM_STAT_FN(op, OP)
 __HPP_COLOR_MEM_STAT_FN(cache, CACHE)
 __HPP_COLOR_MEM_STAT_FN(memory, MEMORY)
+__HPP_COLOR_MEM_STAT_FN(snoop, SNOOP)
 
 #undef __HPP_COLOR_PERCENT_FN
 #undef __HPP_COLOR_ACC_PERCENT_FN
@@ -1316,6 +1317,8 @@ void hist_browser__init_hpp(void)
 				hist_browser__hpp_color_mem_stat_cache;
 	perf_hpp__format[PERF_HPP__MEM_STAT_MEMORY].color =
 				hist_browser__hpp_color_mem_stat_memory;
+	perf_hpp__format[PERF_HPP__MEM_STAT_SNOOP].color =
+				hist_browser__hpp_color_mem_stat_snoop;
 
 	res_sample_init();
 }
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 7fc09c738ed02acb..94024dfa8dccf9ba 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -352,6 +352,8 @@ static enum mem_stat_type hpp__mem_stat_type(struct perf_hpp_fmt *fmt)
 		return PERF_MEM_STAT_CACHE;
 	case PERF_HPP__MEM_STAT_MEMORY:
 		return PERF_MEM_STAT_MEMORY;
+	case PERF_HPP__MEM_STAT_SNOOP:
+		return PERF_MEM_STAT_SNOOP;
 	default:
 		break;
 	}
@@ -650,6 +652,7 @@ HPP_AVERAGE_FNS(weight3, weight3)
 HPP_MEM_STAT_FNS(op, OP)
 HPP_MEM_STAT_FNS(cache, CACHE)
 HPP_MEM_STAT_FNS(memory, MEMORY)
+HPP_MEM_STAT_FNS(snoop, SNOOP)
 
 static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			    struct hist_entry *a __maybe_unused,
@@ -756,6 +759,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__MEM_STAT_PRINT_FNS("Mem Op", op, OP),
 	HPP__MEM_STAT_PRINT_FNS("Cache", cache, CACHE),
 	HPP__MEM_STAT_PRINT_FNS("Memory", memory, MEMORY),
+	HPP__MEM_STAT_PRINT_FNS("Snoop", snoop, SNOOP),
 };
 
 struct perf_hpp_list perf_hpp_list = {
@@ -1113,6 +1117,7 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 	case PERF_HPP__MEM_STAT_OP:
 	case PERF_HPP__MEM_STAT_CACHE:
 	case PERF_HPP__MEM_STAT_MEMORY:
+	case PERF_HPP__MEM_STAT_SNOOP:
 		fmt->len = MEM_STAT_LEN * MEM_STAT_PRINT_LEN;
 		break;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 9de50d929ad1268c..c2d286c4ba395674 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -591,6 +591,7 @@ enum {
 	PERF_HPP__MEM_STAT_OP,
 	PERF_HPP__MEM_STAT_CACHE,
 	PERF_HPP__MEM_STAT_MEMORY,
+	PERF_HPP__MEM_STAT_SNOOP,
 
 	PERF_HPP__MAX_INDEX
 };
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 6822815278a4b213..ddcfc6500d77a9e6 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -857,6 +857,17 @@ int mem_stat_index(const enum mem_stat_type mst, const u64 val)
 		default:
 			return MEM_STAT_MEMORY_OTHER;
 		}
+	case PERF_MEM_STAT_SNOOP:
+		switch (src.mem_snoop) {
+		case PERF_MEM_SNOOP_HIT:
+			return MEM_STAT_SNOOP_HIT;
+		case PERF_MEM_SNOOP_HITM:
+			return MEM_STAT_SNOOP_HITM;
+		case PERF_MEM_SNOOP_MISS:
+			return MEM_STAT_SNOOP_MISS;
+		default:
+			return MEM_STAT_SNOOP_OTHER;
+		}
 	default:
 		break;
 	}
@@ -919,6 +930,18 @@ const char *mem_stat_name(const enum mem_stat_type mst, const int idx)
 		default:
 			return "Other";
 		}
+	case PERF_MEM_STAT_SNOOP:
+		switch (idx) {
+		case MEM_STAT_SNOOP_HIT:
+			return "Hit";
+		case MEM_STAT_SNOOP_HITM:
+			return "HitM";
+		case MEM_STAT_SNOOP_MISS:
+			return "Miss";
+		case MEM_STAT_SNOOP_OTHER:
+		default:
+			return "Other";
+		}
 	default:
 		break;
 	}
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 002e2772400e3dda..4d8f18583af42550 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -93,6 +93,7 @@ enum mem_stat_type {
 	PERF_MEM_STAT_OP,
 	PERF_MEM_STAT_CACHE,
 	PERF_MEM_STAT_MEMORY,
+	PERF_MEM_STAT_SNOOP,
 };
 
 #define MEM_STAT_PRINT_LEN  7  /* 1 space + 5 digits + 1 percent sign */
@@ -126,6 +127,13 @@ enum mem_stat_memory {
 	MEM_STAT_MEMORY_OTHER,
 };
 
+enum mem_stat_snoop {
+	MEM_STAT_SNOOP_HIT,
+	MEM_STAT_SNOOP_HITM,
+	MEM_STAT_SNOOP_MISS,
+	MEM_STAT_SNOOP_OTHER,
+};
+
 int mem_stat_index(const enum mem_stat_type mst, const u64 data_src);
 const char *mem_stat_name(const enum mem_stat_type mst, const int idx);
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 2ad88f7de95a2247..51a210d874327d3a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2626,6 +2626,7 @@ static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM_MEM(PERF_HPP__MEM_STAT_OP, "op"),
 	DIM_MEM(PERF_HPP__MEM_STAT_CACHE, "cache"),
 	DIM_MEM(PERF_HPP__MEM_STAT_MEMORY, "memory"),
+	DIM_MEM(PERF_HPP__MEM_STAT_SNOOP, "snoop"),
 };
 
 #undef DIM_MEM
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 11/11] perf mem: Add 'dtlb' output field
  2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
                   ` (9 preceding siblings ...)
  2025-04-30 20:55 ` [PATCH 10/11] perf mem: Add 'snoop' output field Namhyung Kim
@ 2025-04-30 20:55 ` Namhyung Kim
  2025-05-02 16:30   ` Arnaldo Carvalho de Melo
  2025-05-02 16:00 ` [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Arnaldo Carvalho de Melo
  2025-05-08  4:12 ` Ravi Bangoria
  12 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2025-04-30 20:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Ravi Bangoria, Leo Yan

This is a breakdown of perf_mem_data_src.mem_dtlb values.  It assumes
PMU drivers would set the PERF_MEM_TLB_HIT bit with an appropriate
level.  And having PERF_MEM_TLB_MISS means that it failed to find one
in any level of the TLB.  For now, it doesn't use the
PERF_MEM_TLB_{WK,OS} bits.

Also, it seems Intel machines don't distinguish L1 from L2 precisely.
So I added ANY_HIT (printed as "L?-Hit") to handle that case.

  $ perf mem report -F overhead,dtlb,dso --stdio
  ...
  #           --- D-TLB ----
  # Overhead   L?-Hit   Miss  Shared Object
  # ........  ..............  .................
  #
      67.03%    99.5%   0.5%  [unknown]
      31.23%    99.2%   0.8%  [kernel.kallsyms]
       1.08%    97.8%   2.2%  [i915]
       0.36%   100.0%   0.0%  [JIT] tid 6853
       0.12%   100.0%   0.0%  [drm]
       0.05%   100.0%   0.0%  [drm_kms_helper]
       0.05%   100.0%   0.0%  [ext4]
       0.02%   100.0%   0.0%  [aesni_intel]
       0.02%   100.0%   0.0%  [crc32c_intel]
       0.02%   100.0%   0.0%  [dm_crypt]
       ...
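
For illustration, here is a minimal standalone sketch (not part of the
patch itself) of the mapping described above.  It mirrors what this
patch adds to mem_stat_index(): an encoding that sets both level bits
together with PERF_MEM_TLB_HIT lands in the "L?-Hit" bucket:

  #include <stdio.h>
  #include <linux/perf_event.h>

  static const char *dtlb_bucket(union perf_mem_data_src src)
  {
  	switch (src.mem_dtlb) {
  	case PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT:
  		return "L1-Hit";
  	case PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT:
  		return "L2-Hit";
  	case PERF_MEM_TLB_L1 | PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT:
  		return "L?-Hit";	/* hit, but level not distinguished */
  	default:
  		return (src.mem_dtlb & PERF_MEM_TLB_MISS) ? "Miss" : "Other";
  	}
  }

  int main(void)
  {
  	union perf_mem_data_src src = { .val = 0 };

  	/* both level bits set with the HIT bit, as Intel seems to do */
  	src.mem_dtlb = PERF_MEM_TLB_L1 | PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT;
  	printf("%s\n", dtlb_bucket(src));	/* prints "L?-Hit" */
  	return 0;
  }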

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/browsers/hists.c |  3 +++
 tools/perf/ui/hist.c           |  5 +++++
 tools/perf/util/hist.h         |  1 +
 tools/perf/util/mem-events.c   | 27 +++++++++++++++++++++++++++
 tools/perf/util/mem-events.h   |  9 +++++++++
 tools/perf/util/sort.c         |  1 +
 6 files changed, 46 insertions(+)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 5b080f5062440246..d26b925e3d7f46af 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -1288,6 +1288,7 @@ __HPP_COLOR_MEM_STAT_FN(op, OP)
 __HPP_COLOR_MEM_STAT_FN(cache, CACHE)
 __HPP_COLOR_MEM_STAT_FN(memory, MEMORY)
 __HPP_COLOR_MEM_STAT_FN(snoop, SNOOP)
+__HPP_COLOR_MEM_STAT_FN(dtlb, DTLB)
 
 #undef __HPP_COLOR_PERCENT_FN
 #undef __HPP_COLOR_ACC_PERCENT_FN
@@ -1319,6 +1320,8 @@ void hist_browser__init_hpp(void)
 				hist_browser__hpp_color_mem_stat_memory;
 	perf_hpp__format[PERF_HPP__MEM_STAT_SNOOP].color =
 				hist_browser__hpp_color_mem_stat_snoop;
+	perf_hpp__format[PERF_HPP__MEM_STAT_DTLB].color =
+				hist_browser__hpp_color_mem_stat_dtlb;
 
 	res_sample_init();
 }
diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index 94024dfa8dccf9ba..ed5c40ebd906f076 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -354,6 +354,8 @@ static enum mem_stat_type hpp__mem_stat_type(struct perf_hpp_fmt *fmt)
 		return PERF_MEM_STAT_MEMORY;
 	case PERF_HPP__MEM_STAT_SNOOP:
 		return PERF_MEM_STAT_SNOOP;
+	case PERF_HPP__MEM_STAT_DTLB:
+		return PERF_MEM_STAT_DTLB;
 	default:
 		break;
 	}
@@ -653,6 +655,7 @@ HPP_MEM_STAT_FNS(op, OP)
 HPP_MEM_STAT_FNS(cache, CACHE)
 HPP_MEM_STAT_FNS(memory, MEMORY)
 HPP_MEM_STAT_FNS(snoop, SNOOP)
+HPP_MEM_STAT_FNS(dtlb, DTLB)
 
 static int64_t hpp__nop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			    struct hist_entry *a __maybe_unused,
@@ -760,6 +763,7 @@ struct perf_hpp_fmt perf_hpp__format[] = {
 	HPP__MEM_STAT_PRINT_FNS("Cache", cache, CACHE),
 	HPP__MEM_STAT_PRINT_FNS("Memory", memory, MEMORY),
 	HPP__MEM_STAT_PRINT_FNS("Snoop", snoop, SNOOP),
+	HPP__MEM_STAT_PRINT_FNS("D-TLB", dtlb, DTLB),
 };
 
 struct perf_hpp_list perf_hpp_list = {
@@ -1118,6 +1122,7 @@ void perf_hpp__reset_width(struct perf_hpp_fmt *fmt, struct hists *hists)
 	case PERF_HPP__MEM_STAT_CACHE:
 	case PERF_HPP__MEM_STAT_MEMORY:
 	case PERF_HPP__MEM_STAT_SNOOP:
+	case PERF_HPP__MEM_STAT_DTLB:
 		fmt->len = MEM_STAT_LEN * MEM_STAT_PRINT_LEN;
 		break;
 
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index c2d286c4ba395674..355198fd70281f43 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -592,6 +592,7 @@ enum {
 	PERF_HPP__MEM_STAT_CACHE,
 	PERF_HPP__MEM_STAT_MEMORY,
 	PERF_HPP__MEM_STAT_SNOOP,
+	PERF_HPP__MEM_STAT_DTLB,
 
 	PERF_HPP__MAX_INDEX
 };
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ddcfc6500d77a9e6..3e9131e05348a996 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -868,6 +868,19 @@ int mem_stat_index(const enum mem_stat_type mst, const u64 val)
 		default:
 			return MEM_STAT_SNOOP_OTHER;
 		}
+	case PERF_MEM_STAT_DTLB:
+		switch (src.mem_dtlb) {
+		case PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT:
+			return MEM_STAT_DTLB_L1_HIT;
+		case PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT:
+			return MEM_STAT_DTLB_L2_HIT;
+		case PERF_MEM_TLB_L1 | PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT:
+			return MEM_STAT_DTLB_ANY_HIT;
+		default:
+			if (src.mem_dtlb & PERF_MEM_TLB_MISS)
+				return MEM_STAT_DTLB_MISS;
+			return MEM_STAT_DTLB_OTHER;
+		}
 	default:
 		break;
 	}
@@ -942,6 +955,20 @@ const char *mem_stat_name(const enum mem_stat_type mst, const int idx)
 		default:
 			return "Other";
 		}
+	case PERF_MEM_STAT_DTLB:
+		switch (idx) {
+		case MEM_STAT_DTLB_L1_HIT:
+			return "L1-Hit";
+		case MEM_STAT_DTLB_L2_HIT:
+			return "L2-Hit";
+		case MEM_STAT_DTLB_ANY_HIT:
+			return "L?-Hit";
+		case MEM_STAT_DTLB_MISS:
+			return "Miss";
+		case MEM_STAT_DTLB_OTHER:
+		default:
+			return "Other";
+		}
 	default:
 		break;
 	}
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 4d8f18583af42550..5b98076904b0b689 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -94,6 +94,7 @@ enum mem_stat_type {
 	PERF_MEM_STAT_CACHE,
 	PERF_MEM_STAT_MEMORY,
 	PERF_MEM_STAT_SNOOP,
+	PERF_MEM_STAT_DTLB,
 };
 
 #define MEM_STAT_PRINT_LEN  7  /* 1 space + 5 digits + 1 percent sign */
@@ -134,6 +135,14 @@ enum mem_stat_snoop {
 	MEM_STAT_SNOOP_OTHER,
 };
 
+enum mem_stat_dtlb {
+	MEM_STAT_DTLB_L1_HIT,
+	MEM_STAT_DTLB_L2_HIT,
+	MEM_STAT_DTLB_ANY_HIT,
+	MEM_STAT_DTLB_MISS,
+	MEM_STAT_DTLB_OTHER,
+};
+
 int mem_stat_index(const enum mem_stat_type mst, const u64 data_src);
 const char *mem_stat_name(const enum mem_stat_type mst, const int idx);
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 51a210d874327d3a..8efafa7c10822ee9 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -2627,6 +2627,7 @@ static struct hpp_dimension hpp_sort_dimensions[] = {
 	DIM_MEM(PERF_HPP__MEM_STAT_CACHE, "cache"),
 	DIM_MEM(PERF_HPP__MEM_STAT_MEMORY, "memory"),
 	DIM_MEM(PERF_HPP__MEM_STAT_SNOOP, "snoop"),
+	DIM_MEM(PERF_HPP__MEM_STAT_DTLB, "dtlb"),
 };
 
 #undef DIM_MEM
-- 
2.49.0.906.g1f30a19c02-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
  2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
                   ` (10 preceding siblings ...)
  2025-04-30 20:55 ` [PATCH 11/11] perf mem: Add 'dtlb' " Namhyung Kim
@ 2025-05-02 16:00 ` Arnaldo Carvalho de Melo
  2025-05-08  4:12 ` Ravi Bangoria
  12 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-02 16:00 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Joe Mario, Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter,
	Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
	Ravi Bangoria, Leo Yan

On Wed, Apr 30, 2025 at 01:55:37PM -0700, Namhyung Kim wrote:
> Hello,
 
> The perf mem uses PERF_SAMPLE_DATA_SRC which has a lot of information
> for memory access.  It has various sort keys to group related samples
> together but it's still cumbersome to see the result.  While perf c2c
> command provides a way to investigate the data in a specific way, I'd
> like to add more generic ways using new output fields.
 
> For example, the following is the 'cache' output field which breaks
> down the sample weights into different level of caches.

Super cool!
 
>   $ perf mem record -a sleep 1
>   
>   $ perf mem report -F cache,dso,sym --stdio
>   ...
>   #
>   # -------------- Cache --------------
>   #      L1     L2     L3 L1-buf  Other  Shared Object                                  Symbol
>   # ...................................  .....................................  .........................................
>   #
>        0.0%   0.0%   0.0%   0.0% 100.0%  [kernel.kallsyms]                      [k] ioread8
>      100.0%   0.0%   0.0%   0.0%   0.0%  [kernel.kallsyms]                      [k] _raw_spin_lock_irq
>        0.0%   0.0%   0.0%   0.0% 100.0%  [xhci_hcd]                             [k] xhci_update_erst_dequeue
>        0.0%   0.0%   0.0%  95.8%   4.2%  [kernel.kallsyms]                      [k] smaps_account
>        0.6%   1.8%  22.7%  45.5%  29.5%  [kernel.kallsyms]                      [k] sched_balance_update_blocked_averages
>       29.4%   0.0%   1.6%  58.8%  10.2%  [kernel.kallsyms]                      [k] __update_load_avg_cfs_rq
>        0.0%   8.5%   4.3%   0.0%  87.2%  [kernel.kallsyms]                      [k] copy_mc_enhanced_fast_string
>       63.9%   0.0%   8.0%  23.8%   4.3%  [kernel.kallsyms]                      [k] psi_group_change
>        3.9%   0.0%   9.3%  35.7%  51.1%  [kernel.kallsyms]                      [k] timerqueue_add
>       35.9%  10.9%   0.0%  39.0%  14.2%  [kernel.kallsyms]                      [k] memcpy
>       94.1%   0.0%   0.0%   5.9%   0.0%  [kernel.kallsyms]                      [k] unmap_page_range
>       25.7%   0.0%   4.9%  51.0%  18.4%  [kernel.kallsyms]                      [k] __update_load_avg_se
>        0.0%  24.9%  19.4%   9.6%  46.1%  [kernel.kallsyms]                      [k] _copy_to_iter
>       12.9%   0.0%   0.0%  87.1%   0.0%  [kernel.kallsyms]                      [k] next_uptodate_folio
>       36.8%   0.0%   9.5%  16.6%  37.1%  [kernel.kallsyms]                      [k] update_curr
>      100.0%   0.0%   0.0%   0.0%   0.0%  bpf_prog_b9611ccbbb3d1833_dfs_iter     [k] bpf_prog_b9611ccbbb3d1833_dfs_iter
>       45.4%   1.8%  20.4%  23.6%   8.8%  [kernel.kallsyms]                      [k] audit_filter_rules.isra.0
>       92.8%   0.0%   0.0%   7.2%   0.0%  [kernel.kallsyms]                      [k] filemap_map_pages
>       10.6%   0.0%   0.0%  89.4%   0.0%  [kernel.kallsyms]                      [k] smaps_page_accumulate
>       38.3%   0.0%  29.6%  27.1%   5.0%  [kernel.kallsyms]                      [k] __schedule
 
> Please see the description of each commit for other fields.
 
> A new mem_stat field was added to the hist_entry to save this
> information.  It's a generic data structure (array) to handle
> different types of information like cache-level, memory location,
> snoop-result, etc.
 
> The first patch is a fix for the hierarchy mode and it was sent
> separately.  I just added it here so as not to break the hierarchy
> mode.  The second patch is to enable SAMPLE_DATA_SRC without
> SAMPLE_ADDR and
> perf_event_attr.mmap_data which generate a lot more data.

I merged it and added a test for the hierarchy mode as mentioned in my
reply to that patch.
 
> The names of some new fields are the same as the corresponding sort
> keys (mem, op, snoop) so I had to change the order of resolution
> depending on whether it's applied as an output field or a sort key.
> Maybe it's better to name them differently but I couldn't come up
> with better ideas.

Looks ok at first sight.
 
> That means you need to use the -F/--fields option to specify those fields
> and the sort keys you want.  Maybe we can change the default output
> and sort keys for perf mem report with this.

Maybe we can come up with aliases to help using these new features
without having to create a long command line, maybe:

perf cache

Or some other more suitable name.

That would just be translated into the long command line for 'perf
report', kinda like 'perf kvm', but maybe we can do it like with 'perf
archive', i.e. just a shell wrapper?
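
Something like this quick sketch, say (the 'perf-cache' name and the
exact field list here are just made up for illustration):

  #!/bin/sh
  # perf-cache: hypothetical wrapper in the style of perf-archive.sh
  # that expands to a canned 'perf mem report' command line, so users
  # get the new breakdown fields without typing the long -F list.
  exec perf mem report -F overhead,cache,memory,dso "$@"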
 
> The code is available at 'perf/mem-field-v1' branch in

I'll test it, and I'm CCing Joe Mario, who I think will be very much
interested in trying this!

- Arnaldo
 
>  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
 
> Thanks,
> Namhyung
 
> Namhyung Kim (11):
>   perf hist: Remove output field from sort-list properly
>   perf record: Add --sample-mem-info option
>   perf hist: Support multi-line header
>   perf hist: Add struct he_mem_stat
>   perf hist: Basic support for mem_stat accounting
>   perf hist: Implement output fields for mem stats
>   perf mem: Add 'op' output field
>   perf hist: Hide unused mem stat columns
>   perf mem: Add 'cache' and 'memory' output fields
>   perf mem: Add 'snoop' output field
>   perf mem: Add 'dtlb' output field
> 
>  tools/perf/Documentation/perf-record.txt |   7 +-
>  tools/perf/builtin-record.c              |   6 +
>  tools/perf/ui/browsers/hists.c           |  50 ++++-
>  tools/perf/ui/hist.c                     | 272 ++++++++++++++++++++++-
>  tools/perf/ui/stdio/hist.c               |  57 +++--
>  tools/perf/util/evsel.c                  |   2 +-
>  tools/perf/util/hist.c                   |  78 +++++++
>  tools/perf/util/hist.h                   |  22 ++
>  tools/perf/util/mem-events.c             | 183 ++++++++++++++-
>  tools/perf/util/mem-events.h             |  57 +++++
>  tools/perf/util/record.h                 |   1 +
>  tools/perf/util/sort.c                   |  42 +++-
>  12 files changed, 718 insertions(+), 59 deletions(-)
> 
> -- 
> 2.49.0.906.g1f30a19c02-goog

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 08/11] perf hist: Hide unused mem stat columns
  2025-04-30 20:55 ` [PATCH 08/11] perf hist: Hide unused mem stat columns Namhyung Kim
@ 2025-05-02 16:18   ` Arnaldo Carvalho de Melo
  2025-05-02 16:27   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-02 16:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Wed, Apr 30, 2025 at 01:55:45PM -0700, Namhyung Kim wrote:
> Some mem_stat types don't use all 8 columns.  And there are cases where
> samples are available only for certain kinds of mem_stat types.  In
> that case, hide the columns which have no samples.
> 
> The new output for the previous data would be:
> 
>   $ perf mem report -F overhead,op,comm --stdio
>   ...
>   #           ------ Mem Op -------
>   # Overhead     Load  Store  Other  Command
>   # ........  .....................  ...............
>   #
>       44.85%    21.1%  30.7%  48.3%  swapper
>       26.82%    98.8%   0.3%   0.9%  netsli-prober
>        7.19%    51.7%  13.7%  34.6%  perf
>        5.81%    89.7%   2.2%   8.1%  qemu-system-ppc
>        4.77%   100.0%   0.0%   0.0%  notifications_c
>        1.77%    95.9%   1.2%   3.0%  MemoryReleaser
>        0.77%    71.6%   4.1%  24.3%  DefaultEventMan
>        0.19%    66.7%  22.2%  11.1%  gnome-shell
>        ...
> 
> On Intel machines, the event is only for loads or stores so it'll have
> only one column like below:
> 
>   #            Mem Op
>   # Overhead     Load  Command
>   # ........  .......  ...............
>   #
>       20.55%   100.0%  swapper
>       17.13%   100.0%  chrome
>        9.02%   100.0%  data-loop.0
>        6.26%   100.0%  pipewire-pulse
>        5.63%   100.0%  threaded-ml
>        5.47%   100.0%  GraphRunner
>        5.37%   100.0%  AudioIP~allback
>        5.30%   100.0%  Chrome_ChildIOT
>        3.17%   100.0%  Isolated Web Co
>        ...

  # grep "model name" -m1 /proc/cpuinfo
  model name    : AMD Ryzen 9 9950X3D 16-Core Processor
  # perf mem report -F overhead,op,comm --stdio
  # Total Lost Samples: 0
  #
  # Samples: 2K of event 'cycles:P'
  # Total weight : 2637
  # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
  #
  #           ------ Mem Op -------
  # Overhead     Load  Store  Other  Command
  # ........  .....................  ...............
  #
      61.02%    14.4%  25.5%  60.1%  swapper
       5.61%    26.4%  13.5%  60.1%  Isolated Web Co
       5.50%    21.4%  29.7%  49.0%  perf
       4.74%    27.2%  15.2%  57.6%  gnome-shell
       4.63%    33.6%  11.5%  54.9%  mdns_service
       4.29%    28.3%  12.4%  59.3%  ptyxis
       2.16%    24.6%  19.3%  56.1%  DOM Worker
       0.99%    23.1%  34.6%  42.3%  firefox
       0.72%    26.3%  15.8%  57.9%  IPC I/O Parent
       0.61%    12.5%  12.5%  75.0%  kworker/u130:20
       0.61%    37.5%  18.8%  43.8%  podman
       0.57%    33.3%   6.7%  60.0%  Timer
       0.53%    14.3%   7.1%  78.6%  KMS thread
       0.49%    30.8%   7.7%  61.5%  kworker/u130:3-
       0.46%    41.7%  33.3%  25.0%  IPDL Background

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 08/11] perf hist: Hide unused mem stat columns
  2025-04-30 20:55 ` [PATCH 08/11] perf hist: Hide unused mem stat columns Namhyung Kim
  2025-05-02 16:18   ` Arnaldo Carvalho de Melo
@ 2025-05-02 16:27   ` Arnaldo Carvalho de Melo
  2025-05-02 18:21     ` Namhyung Kim
  1 sibling, 1 reply; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-02 16:27 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Wed, Apr 30, 2025 at 01:55:45PM -0700, Namhyung Kim wrote:
> Some mem_stat types don't use all 8 columns.  And there are cases where
> samples are available only for certain kinds of mem_stat types.  In
> that case, hide the columns which have no samples.
> 
> The new output for the previous data would be:
> 
>   $ perf mem report -F overhead,op,comm --stdio
>   ...
>   #           ------ Mem Op -------
>   # Overhead     Load  Store  Other  Command
>   # ........  .....................  ...............
>   #
>       44.85%    21.1%  30.7%  48.3%  swapper
>       26.82%    98.8%   0.3%   0.9%  netsli-prober

/me curious about this "Other" column.

Maps to MEM_STAT_OP_OTHER, that comes from mem_stat_index, that comes
from:

int mem_stat_index(const enum mem_stat_type mst, const u64 val)
{
        union perf_mem_data_src src = {
                .val = val,
        };

                int idx = mem_stat_index(hists->mem_stat_types[i],
                                         mem_info__const_data_src(mi)->val);

struct mem_info *mi


union perf_mem_data_src {
        __u64 val;
        struct {
                __u64   mem_op:5,       /* type of opcode */
                        mem_lvl:14,     /* memory hierarchy level */
                        mem_snoop:5,    /* snoop mode */
                        mem_lock:2,     /* lock instr */
                        mem_dtlb:7,     /* tlb access */
                        mem_lvl_num:4,  /* memory hierarchy level number */
                        mem_remote:1,   /* remote */
                        mem_snoopx:2,   /* snoop mode, ext */
                        mem_blk:3,      /* access blocked */
                        mem_hops:3,     /* hop level */
                        mem_rsvd:18;
        };
};

As the percentage for "Other" is so high I think some other patch in
this series will elucidate that :-)

Lemme continue testing...

- Arnaldo

>        7.19%    51.7%  13.7%  34.6%  perf
>        5.81%    89.7%   2.2%   8.1%  qemu-system-ppc
>        4.77%   100.0%   0.0%   0.0%  notifications_c
>        1.77%    95.9%   1.2%   3.0%  MemoryReleaser
>        0.77%    71.6%   4.1%  24.3%  DefaultEventMan
>        0.19%    66.7%  22.2%  11.1%  gnome-shell
>        ...
> 
> On Intel machines, the event is only for loads or stores so it'll have
> only one column like below:
> 
>   #            Mem Op
>   # Overhead     Load  Command
>   # ........  .......  ...............
>   #
>       20.55%   100.0%  swapper
>       17.13%   100.0%  chrome
>        9.02%   100.0%  data-loop.0
>        6.26%   100.0%  pipewire-pulse
>        5.63%   100.0%  threaded-ml
>        5.47%   100.0%  GraphRunner
>        5.37%   100.0%  AudioIP~allback
>        5.30%   100.0%  Chrome_ChildIOT
>        3.17%   100.0%  Isolated Web Co
>        ...
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/ui/hist.c   | 35 +++++++++++++++++++++++++++++++++--
>  tools/perf/util/hist.c |  2 ++
>  tools/perf/util/hist.h |  1 +
>  3 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
> index 427ce687ad815a62..661922c4d7863224 100644
> --- a/tools/perf/ui/hist.c
> +++ b/tools/perf/ui/hist.c
> @@ -178,6 +178,9 @@ int hpp__fmt_mem_stat(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *
>  	for (int i = 0; i < MEM_STAT_LEN; i++) {
>  		u64 val = he->mem_stat[mem_stat_idx].entries[i];
>  
> +		if (hists->mem_stat_total[mem_stat_idx].entries[i] == 0)
> +			continue;
> +
>  		ret += hpp__call_print_fn(hpp, print_fn, fmtstr, 100.0 * val / total);
>  	}
>  
> @@ -405,12 +408,31 @@ static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hp
>  	int ret = 0;
>  	int len;
>  	enum mem_stat_type mst = hpp__mem_stat_type(fmt);
> +	int mem_stat_idx = -1;
> +
> +	for (int i = 0; i < hists->nr_mem_stats; i++) {
> +		if (hists->mem_stat_types[i] == mst) {
> +			mem_stat_idx = i;
> +			break;
> +		}
> +	}
> +	assert(mem_stat_idx != -1);
>  
> -	(void)hists;
>  	if (line == 0) {
>  		int left, right;
>  
> -		len = fmt->len;
> +		len = 0;
> +		/* update fmt->len for actually used columns only */
> +		for (int i = 0; i < MEM_STAT_LEN; i++) {
> +			if (hists->mem_stat_total[mem_stat_idx].entries[i])
> +				len += MEM_STAT_PRINT_LEN;
> +		}
> +		fmt->len = len;
> +
> +		/* print header directly if single column only */
> +		if (len == MEM_STAT_PRINT_LEN)
> +			return scnprintf(hpp->buf, hpp->size, "%*s", len, fmt->name);
> +
>  		left = (len - strlen(fmt->name)) / 2 - 1;
>  		right = len - left - strlen(fmt->name) - 2;
>  
> @@ -423,10 +445,14 @@ static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hp
>  				 left, graph_dotted_line, fmt->name, right, graph_dotted_line);
>  	}
>  
> +
>  	len = hpp->size;
>  	for (int i = 0; i < MEM_STAT_LEN; i++) {
>  		int printed;
>  
> +		if (hists->mem_stat_total[mem_stat_idx].entries[i] == 0)
> +			continue;
> +
>  		printed = scnprintf(buf, len, "%*s", MEM_STAT_PRINT_LEN,
>  				    mem_stat_name(mst, i));
>  		ret += printed;
> @@ -1214,6 +1240,11 @@ int perf_hpp__alloc_mem_stats(struct perf_hpp_list *list, struct evlist *evlist)
>  		if (hists->mem_stat_types == NULL)
>  			return -ENOMEM;
>  
> +		hists->mem_stat_total = calloc(nr_mem_stats,
> +					       sizeof(*hists->mem_stat_total));
> +		if (hists->mem_stat_total == NULL)
> +			return -ENOMEM;
> +
>  		memcpy(hists->mem_stat_types, mst, nr_mem_stats * sizeof(*mst));
>  		hists->nr_mem_stats = nr_mem_stats;
>  	}
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 7759c1818c1ad168..afc6855327ab0de6 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -354,6 +354,7 @@ static int hists__update_mem_stat(struct hists *hists, struct hist_entry *he,
>  
>  		assert(0 <= idx && idx < MEM_STAT_LEN);
>  		he->mem_stat[i].entries[idx] += period;
> +		hists->mem_stat_total[i].entries[idx] += period;
>  	}
>  	return 0;
>  }
> @@ -3054,6 +3055,7 @@ static void hists_evsel__exit(struct evsel *evsel)
>  
>  	hists__delete_all_entries(hists);
>  	zfree(&hists->mem_stat_types);
> +	zfree(&hists->mem_stat_total);
>  
>  	list_for_each_entry_safe(node, tmp, &hists->hpp_formats, list) {
>  		perf_hpp_list__for_each_format_safe(&node->hpp, fmt, pos) {
> diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> index 3990cfc21b1615ae..fa5e886e5b04ec9b 100644
> --- a/tools/perf/util/hist.h
> +++ b/tools/perf/util/hist.h
> @@ -135,6 +135,7 @@ struct hists {
>  	int			nr_hpp_node;
>  	int			nr_mem_stats;
>  	enum mem_stat_type	*mem_stat_types;
> +	struct he_mem_stat	*mem_stat_total;
>  };
>  
>  #define hists__has(__h, __f) (__h)->hpp_list->__f
> -- 
> 2.49.0.906.g1f30a19c02-goog

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 11/11] perf mem: Add 'dtlb' output field
  2025-04-30 20:55 ` [PATCH 11/11] perf mem: Add 'dtlb' " Namhyung Kim
@ 2025-05-02 16:30   ` Arnaldo Carvalho de Melo
  2025-05-02 18:38     ` Namhyung Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-02 16:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Wed, Apr 30, 2025 at 01:55:48PM -0700, Namhyung Kim wrote:
> This is a breakdown of perf_mem_data_src.mem_dtlb values.  It assumes
> PMU drivers would set the PERF_MEM_TLB_HIT bit with an appropriate
> level.  And having PERF_MEM_TLB_MISS means that it failed to find one
> in any level of the TLB.  For now, it doesn't use the
> PERF_MEM_TLB_{WK,OS} bits.
> 
> Also, it seems Intel machines don't distinguish L1 from L2 precisely.
> So I added ANY_HIT (printed as "L?-Hit") to handle that case.
> 
>   $ perf mem report -F overhead,dtlb,dso --stdio
>   ...
>   #           --- D-TLB ----
>   # Overhead   L?-Hit   Miss  Shared Object
>   # ........  ..............  .................
>   #
>       67.03%    99.5%   0.5%  [unknown]
>       31.23%    99.2%   0.8%  [kernel.kallsyms]
>        1.08%    97.8%   2.2%  [i915]
>        0.36%   100.0%   0.0%  [JIT] tid 6853
>        0.12%   100.0%   0.0%  [drm]
>        0.05%   100.0%   0.0%  [drm_kms_helper]
>        0.05%   100.0%   0.0%  [ext4]
>        0.02%   100.0%   0.0%  [aesni_intel]
>        0.02%   100.0%   0.0%  [crc32c_intel]
>        0.02%   100.0%   0.0%  [dm_crypt]
>        ...

root@number:~# perf report --header | grep cpudesc
# cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
root@number:~# perf mem report -F overhead,dtlb,dso --stdio | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2K of event 'cycles:P'
# Total weight : 2637
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
#
#           ---------- D-TLB -----------                                   
# Overhead   L1-Hit L2-Hit   Miss  Other  Shared Object                    
# ........  ............................  .................................
#
    77.47%    18.4%   0.1%   0.6%  80.9%  [kernel.kallsyms]                
     5.61%    36.5%   0.7%   1.4%  61.5%  libxul.so                        
     2.77%    39.7%   0.0%  12.3%  47.9%  libc.so.6                        
     2.01%    34.0%   1.9%   1.9%  62.3%  libglib-2.0.so.0.8400.1          
     1.93%    31.4%   2.0%   2.0%  64.7%  [amdgpu]                         
     1.63%    48.8%   0.0%   0.0%  51.2%  [JIT] tid 60168                  
     1.14%     3.3%   0.0%   0.0%  96.7%  [vdso]                           
root@number:~#

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 08/11] perf hist: Hide unused mem stat columns
  2025-05-02 16:27   ` Arnaldo Carvalho de Melo
@ 2025-05-02 18:21     ` Namhyung Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2025-05-02 18:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Fri, May 02, 2025 at 01:27:14PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Apr 30, 2025 at 01:55:45PM -0700, Namhyung Kim wrote:
> > Some mem_stat types don't use all 8 columns.  And there are cases where
> > samples are available only for certain kinds of mem_stat types.  In
> > that case, hide the columns which have no samples.
> > 
> > The new output for the previous data would be:
> > 
> >   $ perf mem report -F overhead,op,comm --stdio
> >   ...
> >   #           ------ Mem Op -------
> >   # Overhead     Load  Store  Other  Command
> >   # ........  .....................  ...............
> >   #
> >       44.85%    21.1%  30.7%  48.3%  swapper
> >       26.82%    98.8%   0.3%   0.9%  netsli-prober
> 
> /me curious about this "Other" column.

They are instructions that don't have memory operations.

> 
> Maps to MEM_STAT_OP_OTHER, that comes from mem_stat_index, that comes
> from:
> 
> int mem_stat_index(const enum mem_stat_type mst, const u64 val)
> {
>         union perf_mem_data_src src = {
>                 .val = val,
>         };
> 
>                 int idx = mem_stat_index(hists->mem_stat_types[i],
>                                          mem_info__const_data_src(mi)->val);
> 
> struct mem_info *mi
> 
> 
> union perf_mem_data_src {
>         __u64 val;
>         struct {
>                 __u64   mem_op:5,       /* type of opcode */
>                         mem_lvl:14,     /* memory hierarchy level */
>                         mem_snoop:5,    /* snoop mode */
>                         mem_lock:2,     /* lock instr */
>                         mem_dtlb:7,     /* tlb access */
>                         mem_lvl_num:4,  /* memory hierarchy level number */
>                         mem_remote:1,   /* remote */
>                         mem_snoopx:2,   /* snoop mode, ext */
>                         mem_blk:3,      /* access blocked */
>                         mem_hops:3,     /* hop level */
>                         mem_rsvd:18;
>         };
> };
> 
> As the percentage for "Other" is so high I think some other patch in
> this series will elucidate that :-)

IIUC AMD IBS cannot sample memory instructions specifically.  It'd just
pick random uops/instructions and capture the data.  So it's natural to
see large 'Other' operations on AMD.
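
To illustrate, here's a minimal standalone sketch (not code from this
series) of the same classification mem_stat_index() does for
PERF_MEM_STAT_OP.  A sample taken on a non-memory instruction carries
PERF_MEM_OP_NA in mem_op, which matches none of the load/store cases
and so falls into the 'Other' bucket:

  #include <stdio.h>
  #include <linux/perf_event.h>

  static const char *op_bucket(union perf_mem_data_src src)
  {
  	switch (src.mem_op) {
  	case PERF_MEM_OP_LOAD:
  		return "Load";
  	case PERF_MEM_OP_STORE:
  		return "Store";
  	case PERF_MEM_OP_LOAD | PERF_MEM_OP_STORE:
  		return "Ld+St";
  	default:
  		if (src.mem_op & PERF_MEM_OP_PFETCH)
  			return "Pfetch";
  		if (src.mem_op & PERF_MEM_OP_EXEC)
  			return "Exec";
  		return "Other";
  	}
  }

  int main(void)
  {
  	union perf_mem_data_src src = { .val = 0 };

  	src.mem_op = PERF_MEM_OP_LOAD;
  	printf("%s\n", op_bucket(src));		/* prints "Load" */

  	src.mem_op = PERF_MEM_OP_NA;		/* non-memory uop from IBS */
  	printf("%s\n", op_bucket(src));		/* prints "Other" */
  	return 0;
  }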

> 
> Lemme continue testing...
> 
> - Arnaldo
> 
> >        7.19%    51.7%  13.7%  34.6%  perf
> >        5.81%    89.7%   2.2%   8.1%  qemu-system-ppc
> >        4.77%   100.0%   0.0%   0.0%  notifications_c
> >        1.77%    95.9%   1.2%   3.0%  MemoryReleaser
> >        0.77%    71.6%   4.1%  24.3%  DefaultEventMan
> >        0.19%    66.7%  22.2%  11.1%  gnome-shell
> >        ...
> > 
> > On Intel machines, the event is only for loads or stores so it'll have
> > only one column like below:
> > 
> >   #            Mem Op
> >   # Overhead     Load  Command
> >   # ........  .......  ...............
> >   #
> >       20.55%   100.0%  swapper
> >       17.13%   100.0%  chrome
> >        9.02%   100.0%  data-loop.0
> >        6.26%   100.0%  pipewire-pulse
> >        5.63%   100.0%  threaded-ml
> >        5.47%   100.0%  GraphRunner
> >        5.37%   100.0%  AudioIP~allback
> >        5.30%   100.0%  Chrome_ChildIOT
> >        3.17%   100.0%  Isolated Web Co
> >        ...
> > 
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/ui/hist.c   | 35 +++++++++++++++++++++++++++++++++--
> >  tools/perf/util/hist.c |  2 ++
> >  tools/perf/util/hist.h |  1 +
> >  3 files changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
> > index 427ce687ad815a62..661922c4d7863224 100644
> > --- a/tools/perf/ui/hist.c
> > +++ b/tools/perf/ui/hist.c
> > @@ -178,6 +178,9 @@ int hpp__fmt_mem_stat(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *
> >  	for (int i = 0; i < MEM_STAT_LEN; i++) {
> >  		u64 val = he->mem_stat[mem_stat_idx].entries[i];
> >  
> > +		if (hists->mem_stat_total[mem_stat_idx].entries[i] == 0)
> > +			continue;
> > +
> >  		ret += hpp__call_print_fn(hpp, print_fn, fmtstr, 100.0 * val / total);
> >  	}
> >  
> > @@ -405,12 +408,31 @@ static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hp
> >  	int ret = 0;
> >  	int len;
> >  	enum mem_stat_type mst = hpp__mem_stat_type(fmt);
> > +	int mem_stat_idx = -1;
> > +
> > +	for (int i = 0; i < hists->nr_mem_stats; i++) {
> > +		if (hists->mem_stat_types[i] == mst) {
> > +			mem_stat_idx = i;
> > +			break;
> > +		}
> > +	}
> > +	assert(mem_stat_idx != -1);
> >  
> > -	(void)hists;
> >  	if (line == 0) {
> >  		int left, right;
> >  
> > -		len = fmt->len;
> > +		len = 0;
> > +		/* update fmt->len for actually used columns only */
> > +		for (int i = 0; i < MEM_STAT_LEN; i++) {
> > +			if (hists->mem_stat_total[mem_stat_idx].entries[i])
> > +				len += MEM_STAT_PRINT_LEN;
> > +		}
> > +		fmt->len = len;
> > +
> > +		/* print header directly if single column only */
> > +		if (len == MEM_STAT_PRINT_LEN)
> > +			return scnprintf(hpp->buf, hpp->size, "%*s", len, fmt->name);
> > +
> >  		left = (len - strlen(fmt->name)) / 2 - 1;
> >  		right = len - left - strlen(fmt->name) - 2;
> >  
> > @@ -423,10 +445,14 @@ static int hpp__header_mem_stat_fn(struct perf_hpp_fmt *fmt, struct perf_hpp *hp
> >  				 left, graph_dotted_line, fmt->name, right, graph_dotted_line);
> >  	}
> >  
> > +
> >  	len = hpp->size;
> >  	for (int i = 0; i < MEM_STAT_LEN; i++) {
> >  		int printed;
> >  
> > +		if (hists->mem_stat_total[mem_stat_idx].entries[i] == 0)
> > +			continue;
> > +
> >  		printed = scnprintf(buf, len, "%*s", MEM_STAT_PRINT_LEN,
> >  				    mem_stat_name(mst, i));
> >  		ret += printed;
> > @@ -1214,6 +1240,11 @@ int perf_hpp__alloc_mem_stats(struct perf_hpp_list *list, struct evlist *evlist)
> >  		if (hists->mem_stat_types == NULL)
> >  			return -ENOMEM;
> >  
> > +		hists->mem_stat_total = calloc(nr_mem_stats,
> > +					       sizeof(*hists->mem_stat_total));
> > +		if (hists->mem_stat_total == NULL)
> > +			return -ENOMEM;
> > +
> >  		memcpy(hists->mem_stat_types, mst, nr_mem_stats * sizeof(*mst));
> >  		hists->nr_mem_stats = nr_mem_stats;
> >  	}
> > diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> > index 7759c1818c1ad168..afc6855327ab0de6 100644
> > --- a/tools/perf/util/hist.c
> > +++ b/tools/perf/util/hist.c
> > @@ -354,6 +354,7 @@ static int hists__update_mem_stat(struct hists *hists, struct hist_entry *he,
> >  
> >  		assert(0 <= idx && idx < MEM_STAT_LEN);
> >  		he->mem_stat[i].entries[idx] += period;
> > +		hists->mem_stat_total[i].entries[idx] += period;
> >  	}
> >  	return 0;
> >  }
> > @@ -3054,6 +3055,7 @@ static void hists_evsel__exit(struct evsel *evsel)
> >  
> >  	hists__delete_all_entries(hists);
> >  	zfree(&hists->mem_stat_types);
> > +	zfree(&hists->mem_stat_total);
> >  
> >  	list_for_each_entry_safe(node, tmp, &hists->hpp_formats, list) {
> >  		perf_hpp_list__for_each_format_safe(&node->hpp, fmt, pos) {
> > diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
> > index 3990cfc21b1615ae..fa5e886e5b04ec9b 100644
> > --- a/tools/perf/util/hist.h
> > +++ b/tools/perf/util/hist.h
> > @@ -135,6 +135,7 @@ struct hists {
> >  	int			nr_hpp_node;
> >  	int			nr_mem_stats;
> >  	enum mem_stat_type	*mem_stat_types;
> > +	struct he_mem_stat	*mem_stat_total;
> >  };
> >  
> >  #define hists__has(__h, __f) (__h)->hpp_list->__f
> > -- 
> > 2.49.0.906.g1f30a19c02-goog
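
As an aside, the header centering in hpp__header_mem_stat_fn above can
be checked by hand.  A worked sketch, assuming MEM_STAT_PRINT_LEN == 7
and four D-TLB columns surviving the hiding:

  len   = 4 * 7;            /* four visible columns * MEM_STAT_PRINT_LEN = 28 */
  left  = (28 - 5) / 2 - 1; /* strlen("D-TLB") == 5, so left  = 10 */
  right = 28 - 10 - 5 - 2;  /* and right = 11 */

which renders as "---------- D-TLB -----------", matching the AMD
reports later in the thread.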

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 11/11] perf mem: Add 'dtlb' output field
  2025-05-02 16:30   ` Arnaldo Carvalho de Melo
@ 2025-05-02 18:38     ` Namhyung Kim
  2025-05-02 19:21       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2025-05-02 18:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Fri, May 02, 2025 at 01:30:35PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Apr 30, 2025 at 01:55:48PM -0700, Namhyung Kim wrote:
> > This is a breakdown of perf_mem_data_src.mem_dtlb values.  It assumes
> > PMU drivers would set the PERF_MEM_TLB_HIT bit with an appropriate level.
> > And having PERF_MEM_TLB_MISS means that it failed to find a translation
> > at any level of the TLB.  For now, it doesn't use PERF_MEM_TLB_{WK,OS} bits.
> > 
> > Also it seems Intel machines don't distinguish L1 from L2 precisely.  So I
> > added ANY_HIT (printed as "L?-Hit") to handle the case.
> > 
> >   $ perf mem report -F overhead,dtlb,dso --stdio
> >   ...
> >   #           --- D-TLB ----
> >   # Overhead   L?-Hit   Miss  Shared Object
> >   # ........  ..............  .................
> >   #
> >       67.03%    99.5%   0.5%  [unknown]
> >       31.23%    99.2%   0.8%  [kernel.kallsyms]
> >        1.08%    97.8%   2.2%  [i915]
> >        0.36%   100.0%   0.0%  [JIT] tid 6853
> >        0.12%   100.0%   0.0%  [drm]
> >        0.05%   100.0%   0.0%  [drm_kms_helper]
> >        0.05%   100.0%   0.0%  [ext4]
> >        0.02%   100.0%   0.0%  [aesni_intel]
> >        0.02%   100.0%   0.0%  [crc32c_intel]
> >        0.02%   100.0%   0.0%  [dm_crypt]
> >        ...
> 
> root@number:~# perf report --header | grep cpudesc
> # cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
> root@number:~# perf mem report -F overhead,dtlb,dso --stdio | head -20
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 2K of event 'cycles:P'
> # Total weight : 2637
> # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
> #
> #           ---------- D-TLB -----------                                   
> # Overhead   L1-Hit L2-Hit   Miss  Other  Shared Object                    
> # ........  ............................  .................................
> #
>     77.47%    18.4%   0.1%   0.6%  80.9%  [kernel.kallsyms]                
>      5.61%    36.5%   0.7%   1.4%  61.5%  libxul.so                        
>      2.77%    39.7%   0.0%  12.3%  47.9%  libc.so.6                        
>      2.01%    34.0%   1.9%   1.9%  62.3%  libglib-2.0.so.0.8400.1          
>      1.93%    31.4%   2.0%   2.0%  64.7%  [amdgpu]                         
>      1.63%    48.8%   0.0%   0.0%  51.2%  [JIT] tid 60168                  
>      1.14%     3.3%   0.0%   0.0%  96.7%  [vdso]                           
> root@number:~#

I guess it's because those samples don't have mem_info as they are not
memory instructions.
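
(As background, the classification described in the changelog above
boils down to roughly the sketch below, using the uapi PERF_MEM_TLB_*
bits; the bucket names are made up and the actual patch may differ.  It
also shows why samples without mem_info land in "Other": with no TLB
bits set, neither the MISS nor the HIT branch matches.)

  #include <linux/perf_event.h>

  enum dtlb_col { DTLB_L1_HIT, DTLB_L2_HIT, DTLB_ANY_HIT, DTLB_MISS, DTLB_OTHER };

  /* hedged sketch of the 'dtlb' bucketing, not the patch itself */
  static enum dtlb_col classify_dtlb(__u64 data_src)
  {
          __u64 tlb = data_src >> PERF_MEM_TLB_SHIFT;

          if (tlb & PERF_MEM_TLB_MISS)
                  return DTLB_MISS;       /* missed at every TLB level */
          if (tlb & PERF_MEM_TLB_HIT) {
                  if (tlb & PERF_MEM_TLB_L1)
                          return DTLB_L1_HIT;
                  if (tlb & PERF_MEM_TLB_L2)
                          return DTLB_L2_HIT;
                  return DTLB_ANY_HIT;    /* printed as "L?-Hit" */
          }
          return DTLB_OTHER;              /* no TLB info; PERF_MEM_TLB_{WK,OS} ignored */
  }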

Can you please re-run the perf record with filters like below?

  $ perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 11/11] perf mem: Add 'dtlb' output field
  2025-05-02 18:38     ` Namhyung Kim
@ 2025-05-02 19:21       ` Arnaldo Carvalho de Melo
  2025-05-02 20:01         ` Namhyung Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-05-02 19:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Fri, May 02, 2025 at 11:38:35AM -0700, Namhyung Kim wrote:
> On Fri, May 02, 2025 at 01:30:35PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Apr 30, 2025 at 01:55:48PM -0700, Namhyung Kim wrote:
> > > This is a breakdown of perf_mem_data_src.mem_dtlb values.  It assumes
> > > PMU drivers would set the PERF_MEM_TLB_HIT bit with an appropriate level.
> > > And having PERF_MEM_TLB_MISS means that it failed to find a translation
> > > at any level of the TLB.  For now, it doesn't use PERF_MEM_TLB_{WK,OS} bits.

> > > Also it seems Intel machines don't distinguish L1 from L2 precisely.  So I
> > > added ANY_HIT (printed as "L?-Hit") to handle the case.

> > >   $ perf mem report -F overhead,dtlb,dso --stdio
> > >   ...
> > >   #           --- D-TLB ----
> > >   # Overhead   L?-Hit   Miss  Shared Object
> > >   # ........  ..............  .................
> > >   #
> > >       67.03%    99.5%   0.5%  [unknown]
> > >       31.23%    99.2%   0.8%  [kernel.kallsyms]
> > >        1.08%    97.8%   2.2%  [i915]
> > >        0.36%   100.0%   0.0%  [JIT] tid 6853
> > >        0.12%   100.0%   0.0%  [drm]
> > >        0.05%   100.0%   0.0%  [drm_kms_helper]
> > >        0.05%   100.0%   0.0%  [ext4]
> > >        0.02%   100.0%   0.0%  [aesni_intel]
> > >        0.02%   100.0%   0.0%  [crc32c_intel]
> > >        0.02%   100.0%   0.0%  [dm_crypt]
> > >        ...

> > root@number:~# perf report --header | grep cpudesc
> > # cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
> > root@number:~# perf mem report -F overhead,dtlb,dso --stdio | head -20
> > # To display the perf.data header info, please use --header/--header-only options.
> > #
> > #
> > # Total Lost Samples: 0
> > #
> > # Samples: 2K of event 'cycles:P'
> > # Total weight : 2637
> > # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
> > #
> > #           ---------- D-TLB -----------                                   
> > # Overhead   L1-Hit L2-Hit   Miss  Other  Shared Object                    
> > # ........  ............................  .................................
> > #
> >     77.47%    18.4%   0.1%   0.6%  80.9%  [kernel.kallsyms]                
> >      5.61%    36.5%   0.7%   1.4%  61.5%  libxul.so                        
> >      2.77%    39.7%   0.0%  12.3%  47.9%  libc.so.6                        
> >      2.01%    34.0%   1.9%   1.9%  62.3%  libglib-2.0.so.0.8400.1          
> >      1.93%    31.4%   2.0%   2.0%  64.7%  [amdgpu]                         
> >      1.63%    48.8%   0.0%   0.0%  51.2%  [JIT] tid 60168                  
> >      1.14%     3.3%   0.0%   0.0%  96.7%  [vdso]                           
> > root@number:~#
> 
> I guess it's because those samples don't have mem_info as they are not
> memory instructions.
> 
> Can you please re-run the perf record with filters like below?
> 
>   $ perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1

I tried; it got stuck for more than 1 second, and then control+C also
took a while, and eventually:

root@x1:~# fg
perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
^X^C^C^C^Clibbpf: prog 'perf_sample_filter': BPF program load failed: -EAGAIN
libbpf: prog 'perf_sample_filter': -- BEGIN PROG LOAD LOG --
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
0: R1=ctx() R10=fp0
; kctx = bpf_cast_to_kern_ctx(ctx); @ sample_filter.bpf.c:215
0: (85) call bpf_cast_to_kern_ctx#72125       ; R0_w=trusted_ptr_bpf_perf_event_data_kern()
1: (7b) *(u64 *)(r10 -48) = r0        ; R0_w=trusted_ptr_bpf_perf_event_data_kern() R10=fp0 fp-48_w=trusted_ptr_bpf_perf_event_data_kern()
2: (b7) r6 = 0                        ; R6_w=0
; k = 0; @ sample_filter.bpf.c:217
3: (63) *(u32 *)(r10 -4) = r6         ; R6_w=0 R10=fp0 fp-8=0000????
; if (use_idx_hash) { @ sample_filter.bpf.c:219
4: (18) r1 = 0xffffa9e081be6000       ; R1_w=map_value(map=sample_f.rodata,ks=4,vs=4)
6: (61) r1 = *(u32 *)(r1 +0)          ; R1_w=0
7: (15) if r1 == 0x0 goto pc+42       ; R1_w=0
; k = *idx; @ sample_filter.bpf.c:239
50: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
; entry = bpf_map_lookup_elem(&filters, &k); @ sample_filter.bpf.c:244
51: (07) r2 += -4                     ; R2_w=fp-4
52: (18) r1 = 0xffff8ecc22be8000      ; R1_w=map_ptr(map=filters,ks=4,vs=1536)
54: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=1,map=filters,ks=4,vs=1536)
55: (bf) r6 = r0                      ; R0_w=map_value_or_null(id=1,map=filters,ks=4,vs=1536) R6_w=map_value_or_null(id=1,map=filters,ks=4,vs=1536)
56: (79) r5 = *(u64 *)(r10 -48)       ; R5_w=trusted_ptr_bpf_perf_event_data_kern() R10=fp0 fp-48=trusted_ptr_bpf_perf_event_data_kern()
; if (entry == NULL) @ sample_filter.bpf.c:245
57: (15) if r6 == 0x0 goto pc-21      ; R6_w=map_value(map=filters,ks=4,vs=1536)
58: (b7) r8 = 0                       ; R8_w=0
59: (b7) r9 = 0                       ; R9_w=0
60: (b7) r1 = 0                       ; R1_w=0
61: (7b) *(u64 *)(r10 -40) = r1       ; R1_w=0 R10=fp0 fp-40_w=0
62: (05) goto pc+9
; for (i = 0; i < MAX_FILTERS; i++) { @ sample_filter.bpf.c:248
72: (bf) r7 = r6                      ; R6=map_value(map=filters,ks=4,vs=1536) R7_w=map_value(map=filters,ks=4,vs=1536)
73: (0f) r7 += r8                     ; R7_w=map_value(map=filters,ks=4,vs=1536) R8=0
; struct perf_sample_data___new *data = (void *)kctx->data; @ sample_filter.bpf.c:77


<SNIP tons of verifier lines>

269: (65) if r2 s> 0x1 goto pc+28     ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
270: (15) if r2 == 0x0 goto pc+45     ; R2_w=1
271: (15) if r2 == 0x1 goto pc+1      ; R2_w=1
; CHECK_RESULT(sample_data, !=, entry[i].value) @ sample_filter.bpf.c:256
273: (bf) r2 = r6                     ; R2_w=map_value(map=filters,ks=4,vs=1536) R6=map_value(map=filters,ks=4,vs=1536)
274: (0f) r2 += r8                    ; R2_w=map_value(map=filters,ks=4,vs=1536,off=984) R8_w=984
275: (79) r2 = *(u64 *)(r2 +16)       ; R2_w=scalar()
276: (5d) if r1 != r2 goto pc+101 378: R0_w=1 R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar() R3_w=ptr_perf_sample_data() R4_w=scalar(smin=umin=umin32=1,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R5=trusted_ptr_bpf_perf_event_data_kern() R6=map_value(map=filters,ks=4,vs=1536) R7_w=map_value(map=filters,ks=4,vs=1536,off=984) R8_w=984 R9=0 R10=fp0 fp-8=mmmm???? fp-40=0 fp-48=trusted_ptr_bpf_perf_event_data_kern()
378: (67) r9 <<= 32                   ; R9_w=0
379: (bf) r1 = r9                     ; R1_w=0 R9_w=0
380: (77) r1 >>= 32                   ; R1_w=0
381: (b7) r9 = 0                      ; R9_w=0
382: (15) if r1 == 0x0 goto pc-313    ; R1_w=0
; for (i = 0; i < MAX_FILTERS; i++) { @ sample_filter.bpf.c:248
70: (07) r8 += 24                     ; R8_w=1008
71: (15) if r8 == 0x600 goto pc-25    ; R8_w=1008
72: (bf) r7 = r6                      ; R6=map_value(map=filters,ks=4,vs=1536) R7_w=map_value(map=filters,ks=4,vs=1536)
73: (0f) r7 += r8                     ; R7_w=map_value(map=filters,ks=4,vs=1536,off=1008) R8_w=1008
; struct perf_sample_data___new *data = (void *)kctx->data; @ sample_filter.bpf.c:77
74: (79) r3 = *(u64 *)(r5 +8)
processed 2344 insns (limit 1000000) max_states_per_insn 4 total_states 57 peak_states 57 mark_read 6
-- END PROG LOAD LOG --
libbpf: prog 'perf_sample_filter': failed to load: -EAGAIN
libbpf: failed to load object 'sample_filter_bpf'
libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EAGAIN
Failed to load perf sample-filter BPF skeleton
failed to set filter "BPF" on event cpu_core/cycles/P with 11 (Resource temporarily unavailable)
^Z
[1]+  Stopped                 perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
root@x1:~# 
root@x1:~# fg
perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
^C^C^C^C^C


^C^C^C^C^C

Well, I just had it suspended; I'm not able to stop it:

root@x1:~# ps 
    PID TTY          TIME CMD
1355566 pts/15   00:00:00 sudo
1355567 pts/15   00:00:00 su
1355570 pts/15   00:00:00 bash
1373907 pts/15   00:00:44 perf
1373908 pts/15   00:00:00 perf-exec
1374019 pts/15   00:00:00 ps
root@x1:~# kill -9 1373907
root@x1:~# kill -9 1373907
-bash: kill: (1373907) - No such process
[1]+  Killed                  perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
root@x1:~# 

Ok, killed.

It gets stuck in that MAP_FREEZE sys_bpf call:

root@x1:~# perf trace perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
<SNIP>
 16349.552 ( 0.023 ms): perf/1374043 bpf(uattr: (union bpf_attr){(struct){.map_type = (__u32)2,.key_size = (__u32)4,.value_size = (__u32)4,.max_entries = (__u32)1,.map_flags = (__u32)1152,.map_name = (char[16])['s','a','m','p','l','e','_','f','.','r','o','d','a','t','a',],.btf_fd = (__u32)61,.btf_value_type_id = (__u32)59,},(struct){.map_fd = (__u32)2,.key = (__u64)4294967300,(union){.value = (__u64)1152,.next_key = (__u64)1152,},.flags = (__u64)8101238451258523648,},.batch = (struct){.in_batch = (__u64)17179869186,.out_batch = (__u64)4294967300,.keys = (__u64)1152,.values = (__u64)8101238451258523648,.count = (__u32)1717527916,.map_fd = (__u32)1685025326,.elem_flags = (__u64)6386785,.flags = (__u64)61,},(struct){.prog_type = (__u32)2,.insn_cnt = (__u32)4,.insns = (__u64)4294967300,.license = (__u64)1152,.log_size = (__u32)1886216563,.log_buf = (__u64)7237128669819266412,.kern_version = (__u32)6386785,.prog_name = (char[16])['=',';',],.func_info = (__u64)140735998446044,.func_info_cnt = (__u32)520807888,.line_info = (__u64)140735998445848,.line_info_cnt = (__u32)4874802,.attach_btf_id = (__u32)4,(union){.attach_prog_fd = (__u32)2805057752,.attach_btf_obj_fd = (__u32)2805057752,},.core_relo_cnt = (__u32)32767,.fd_array = (__u64)15131936,.core_relos = (__u64)140735998445952,.core_relo_rec_size = (__u32)15049664,.prog_token_fd = (__s32)-1489909376,},(struct){.pathname = (__u64)17179869186,.bpf_fd = (__u32)4,.file_flags = (__u32)1,.path_fd = (__s32)1152,},(struct){(union){.target_fd = (__u32)2,.target_ifindex = (__u32)2,},.attach_bpf_fd = (__u32)4,.attach_type = (__u32)4,.attach_flags = (__u32)1,.replace_bpf_fd = (__u32)1152,.expected_revision = (__u64)8101238451258523648,},.test = (struct){.prog_fd = (__u32)2,.retval = (__u32)4,.data_size_in = (__u32)4,.data_size_out = (__u32)1,.data_in = (__u64)1152,.data_out = (__u64)8101238451258523648,.repeat = (__u32)1717527916,.duration = (__u32)1685025326,.ctx_size_in = (__u32)6386785,.ctx_in = (__u64)61,.ctx_out = (__u64)59,},(struct){(union){.start_id = (__u32)2,.prog_id = (__u32)2,.map_id = (__u32)2,.btf_id ) = 62
 16349.576 ( 0.002 ms): perf/1374043 dup3(oldfd: 62, newfd: 60<anon_inode:bpf-map>, flags: 524288)         = 60
 16349.582 ( 0.001 ms): perf/1374043 close(fd: 62)                                                         = 0
 16349.586 ( 0.006 ms): perf/1374043 bpf(cmd: MAP_UPDATE_ELEM, uattr: (union bpf_attr){(struct){.map_type = (__u32)60,.value_size = (__u32)2805058140,.max_entries = (__u32)32767,.map_flags = (__u32)1994993664,.inner_map_fd = (__u32)32622,.map_ifindex = (__u32)2,.btf_fd = (__u32)2805058000,.btf_key_type_id = (__u32)32767,.btf_value_type_id = (__u32)16,.btf_vmlinux_value_type_id = (__u32)48,.map_extra = (__u64)140735998446240,.value_type_btf_obj_fd = (__s32)-1489909280,.map_token_fd = (__s32)32767,},(struct){.map_fd = (__u32)60,.key = (__u64)140735998446172,(union){.value = (__u64)140112418123776,.next_key = (__u64)140112418123776,},},.batch = (struct){.in_batch = (__u64)60,.out_batch = (__u64)140735998446172,.keys = (__u64)140112418123776,.count = (__u32)7379400,.elem_flags = (__u64)9110741168,.flags = (__u64)140735998446032,},(struct){.prog_type = (__u32)60,.insns = (__u64)140735998446172,.license = (__u64)140112418123776,.log_buf = (__u64)7379400,.kern_version = (__u32)520806576,.prog_flags = (__u32)2,.prog_name = (char[16])[208,201,'1',167,255,127,16,'0',],.prog_ifindex = (__u32)2805058208,.expected_attach_type = (__u32)32767,.prog_btf_fd = (__u32)2805058016,.func_info_rec_size = (__u32)32767,.func_info = (__u64)410826951296,.func_info_cnt = (__u32)4874930,.line_info = (__u64)40,.line_info_cnt = (__u32)1,.attach_btf_id = (__u32)4,(union){.attach_prog_fd = (__u32)520846144,.attach_btf_obj_fd = (__u32)520846144,},.fd_array = (__u64)60,.core_relos = (__u64)140735998445776,.core_relo_rec_size = (__u32)2805057752,.log_true_size = (__u32)524288,},(struct){.pathname = (__u64)60,.bpf_fd = (__u32)2805058140,.file_flags = (__u32)32767,.path_fd = (__s32)1994993664,},(struct){(union){.target_fd = (__u32)60,.target_ifindex = (__u32)60,},.attach_type = (__u32)2805058140,.attach_flags = (__u32)32767,.replace_bpf_fd = (__u32)1994993664,(union){.relative_fd = (__u32)32622,.relative_id = (__u32)32622,},},.test = (struct){.prog_fd = (__u32)60,.data_size_in = (__u32)2805058140,.data_size_out = (__u32)32767,.data_in = (__u64)140112418123776,.repeat = (__u32)7379400,.) = 0

 16349.594 ( 0.002 ms): perf/1374043 bpf(cmd: MAP_FREEZE, uattr: (union bpf_attr){(struct){.map_type = (__u32)60,.value_size = (__u32)2805058140,.max_entries = (__u32)32767,.map_flags = (__u32)1994993664,.inner_map_fd = (__u32)32622,.map_ifindex = (__u32)2,.btf_fd = (__u32)2805058000,.btf_key_type_id = (__u32)32767,.btf_value_type_id = (__u32)16,.btf_vmlinux_value_type_id = (__u32)48,.map_extra = (__u64)140735998446240,.value_type_btf_obj_fd = (__s32)-1489909280,.map_token_fd = (__s32)32767,},(struct){.map_fd = (__u32)60,.key = (__u64)140735998446172,(union){.value = (__u64)140112418123776,.next_key = (__u64)140112418123776,},},.batch = (struct){.in_batch = (__u64)60,.out_batch = (__u64)140735998446172,.keys = (__u64)140112418123776,.count = (__u32)7379400,.elem_flags = (__u64)9110741168,.flags = (__u64)140735998446032,},(struct){.prog_type = (__u32)60,.insns = (__u64)140735998446172,.license = (__u64)140112418123776,.log_buf = (__u64)7379400,.kern_version = (__u32)520806576,.prog_flags = (__u32)2,.prog_name = (char[16])[208,201,'1',167,255,127,16,'0',],.prog_ifindex = (__u32)2805058208,.expected_attach_type = (__u32)32767,.prog_btf_fd = (__u32)2805058016,.func_info_rec_size = (__u32)32767,.func_info = (__u64)410826951296,.func_info_cnt = (__u32)4874930,.line_info = (__u64)40,.line_info_cnt = (__u32)1,.attach_btf_id = (__u32)4,(union){.attach_prog_fd = (__u32)520846144,.attach_btf_obj_fd = (__u32)520846144,},.fd_array = (__u64)60,.core_relos = (__u64)140735998445776,.core_relo_rec_size = (__u32)2805057752,.log_true_size = (__u32)524288,},(struct){.pathname = (__u64)60,.bpf_fd = (__u32)2805058140,.file_flags = (__u32)32767,.path_fd = (__s32)1994993664,},(struct){(union){.target_fd = (__u32)60,.target_ifindex = (__u32)60,},.attach_type = (__u32)2805058140,.attach_flags = (__u32)32767,.replace_bpf_fd = (__u32)1994993664,(union){.relative_fd = (__u32)32622,.relative_id = (__u32)32622,},},.test = (struct){.prog_fd = (__u32)60,.data_size_in = (__u32)2805058140,.data_size_out = (__u32)32767,.data_in = (__u64)140112418123776,.repeat = (__u32)7379400,.ctx_s) = 0
 o
16349.601 ( 0.050 ms): perf/1374043 mmap(addr: 0x7f6e76e93000, len: 4096, prot: READ, flags: SHARED|FIXED, fd: 60) = 0x7f6e76e93000

root@x1:~# uname -a
Linux x1 6.13.9-100.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Mar 29 01:27:18 UTC 2025 x86_64 GNU/Linux
root@x1:~#

root@x1:~# grep -m1 "model name" /proc/cpuinfo 
model name	: 13th Gen Intel(R) Core(TM) i7-1365U
root@x1:~# 

Now trying on another machine:

root@number:~# uname -a
Linux number 6.15.0-rc4+ #2 SMP PREEMPT_DYNAMIC Tue Apr 29 15:56:43 -03 2025 x86_64 GNU/Linux
root@number:~# grep -m1 "model name" /proc/cpuinfo
model name	: AMD Ryzen 9 9950X3D 16-Core Processor
root@number:~# perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.870 MB perf.data (1199 samples) ]
root@number:~#

root@number:~# perf mem report -F overhead,dtlb,dso --stdio | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:P'
# Total weight : 7879
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
#
#           ---------- D-TLB -----------                                   
# Overhead   L1-Hit L2-Hit   Miss  Other  Shared Object                    
# ........  ............................  .................................
#
    48.51%    44.8%  18.7%  28.3%   8.2%  [kernel.kallsyms]                
    11.97%     2.1%   1.4%  96.0%   0.5%  libc.so.6                        
     8.58%    85.7%  14.1%   0.0%   0.3%  libxul.so                        
     6.76%   100.0%   0.0%   0.0%   0.0%  libfreeblpriv3.so                
     6.08%   100.0%   0.0%   0.0%   0.0%  libsystemd-shared-257.5-2.fc42.so
     4.59%   100.0%   0.0%   0.0%   0.0%  firefox                          
     4.33%   100.0%   0.0%   0.0%   0.0%  libgallium-25.0.4.so             
root@number:~#

Looks better :-)

Having the record as an alias seems interesting; ditto for the report.

- Arnaldo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 11/11] perf mem: Add 'dtlb' output field
  2025-05-02 19:21       ` Arnaldo Carvalho de Melo
@ 2025-05-02 20:01         ` Namhyung Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2025-05-02 20:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Ravi Bangoria, Leo Yan

On Fri, May 02, 2025 at 04:21:13PM -0300, Arnaldo Carvalho de Melo wrote:
> On Fri, May 02, 2025 at 11:38:35AM -0700, Namhyung Kim wrote:
> > On Fri, May 02, 2025 at 01:30:35PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Wed, Apr 30, 2025 at 01:55:48PM -0700, Namhyung Kim wrote:
> > > > This is a breakdown of perf_mem_data_src.mem_dtlb values.  It assumes
> > > > PMU drivers would set the PERF_MEM_TLB_HIT bit with an appropriate level.
> > > > And having PERF_MEM_TLB_MISS means that it failed to find a translation
> > > > at any level of the TLB.  For now, it doesn't use PERF_MEM_TLB_{WK,OS} bits.
> 
> > > > Also it seems Intel machines don't distinguish L1 from L2 precisely.  So I
> > > > added ANY_HIT (printed as "L?-Hit") to handle the case.
> 
> > > >   $ perf mem report -F overhead,dtlb,dso --stdio
> > > >   ...
> > > >   #           --- D-TLB ----
> > > >   # Overhead   L?-Hit   Miss  Shared Object
> > > >   # ........  ..............  .................
> > > >   #
> > > >       67.03%    99.5%   0.5%  [unknown]
> > > >       31.23%    99.2%   0.8%  [kernel.kallsyms]
> > > >        1.08%    97.8%   2.2%  [i915]
> > > >        0.36%   100.0%   0.0%  [JIT] tid 6853
> > > >        0.12%   100.0%   0.0%  [drm]
> > > >        0.05%   100.0%   0.0%  [drm_kms_helper]
> > > >        0.05%   100.0%   0.0%  [ext4]
> > > >        0.02%   100.0%   0.0%  [aesni_intel]
> > > >        0.02%   100.0%   0.0%  [crc32c_intel]
> > > >        0.02%   100.0%   0.0%  [dm_crypt]
> > > >        ...
> 
> > > root@number:~# perf report --header | grep cpudesc
> > > # cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
> > > root@number:~# perf mem report -F overhead,dtlb,dso --stdio | head -20
> > > # To display the perf.data header info, please use --header/--header-only options.
> > > #
> > > #
> > > # Total Lost Samples: 0
> > > #
> > > # Samples: 2K of event 'cycles:P'
> > > # Total weight : 2637
> > > # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
> > > #
> > > #           ---------- D-TLB -----------                                   
> > > # Overhead   L1-Hit L2-Hit   Miss  Other  Shared Object                    
> > > # ........  ............................  .................................
> > > #
> > >     77.47%    18.4%   0.1%   0.6%  80.9%  [kernel.kallsyms]                
> > >      5.61%    36.5%   0.7%   1.4%  61.5%  libxul.so                        
> > >      2.77%    39.7%   0.0%  12.3%  47.9%  libc.so.6                        
> > >      2.01%    34.0%   1.9%   1.9%  62.3%  libglib-2.0.so.0.8400.1          
> > >      1.93%    31.4%   2.0%   2.0%  64.7%  [amdgpu]                         
> > >      1.63%    48.8%   0.0%   0.0%  51.2%  [JIT] tid 60168                  
> > >      1.14%     3.3%   0.0%   0.0%  96.7%  [vdso]                           
> > > root@number:~#
> > 
> > I guess it's because those samples don't have mem_info as they are not
> > memory instructions.
> > 
> > Can you please re-run the perf record with filters like below?
> > 
> >   $ perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> 
> I tried; it got stuck for more than 1 second, and then control+C also
> took a while, and eventually:

I also saw some delay when using BPF.  I suspect it's the BPF loader
taking a long time.

> 
> root@x1:~# fg
> perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> ^X^C^C^C^Clibbpf: prog 'perf_sample_filter': BPF program load failed: -EAGAIN
> libbpf: prog 'perf_sample_filter': -- BEGIN PROG LOAD LOG --
> arg#0 reference type('UNKNOWN ') size cannot be determined: -22
> 0: R1=ctx() R10=fp0
> ; kctx = bpf_cast_to_kern_ctx(ctx); @ sample_filter.bpf.c:215
> 0: (85) call bpf_cast_to_kern_ctx#72125       ; R0_w=trusted_ptr_bpf_perf_event_data_kern()
> 1: (7b) *(u64 *)(r10 -48) = r0        ; R0_w=trusted_ptr_bpf_perf_event_data_kern() R10=fp0 fp-48_w=trusted_ptr_bpf_perf_event_data_kern()
> 2: (b7) r6 = 0                        ; R6_w=0
> ; k = 0; @ sample_filter.bpf.c:217
> 3: (63) *(u32 *)(r10 -4) = r6         ; R6_w=0 R10=fp0 fp-8=0000????
> ; if (use_idx_hash) { @ sample_filter.bpf.c:219
> 4: (18) r1 = 0xffffa9e081be6000       ; R1_w=map_value(map=sample_f.rodata,ks=4,vs=4)
> 6: (61) r1 = *(u32 *)(r1 +0)          ; R1_w=0
> 7: (15) if r1 == 0x0 goto pc+42       ; R1_w=0
> ; k = *idx; @ sample_filter.bpf.c:239
> 50: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
> ; entry = bpf_map_lookup_elem(&filters, &k); @ sample_filter.bpf.c:244
> 51: (07) r2 += -4                     ; R2_w=fp-4
> 52: (18) r1 = 0xffff8ecc22be8000      ; R1_w=map_ptr(map=filters,ks=4,vs=1536)
> 54: (85) call bpf_map_lookup_elem#1   ; R0_w=map_value_or_null(id=1,map=filters,ks=4,vs=1536)
> 55: (bf) r6 = r0                      ; R0_w=map_value_or_null(id=1,map=filters,ks=4,vs=1536) R6_w=map_value_or_null(id=1,map=filters,ks=4,vs=1536)
> 56: (79) r5 = *(u64 *)(r10 -48)       ; R5_w=trusted_ptr_bpf_perf_event_data_kern() R10=fp0 fp-48=trusted_ptr_bpf_perf_event_data_kern()
> ; if (entry == NULL) @ sample_filter.bpf.c:245
> 57: (15) if r6 == 0x0 goto pc-21      ; R6_w=map_value(map=filters,ks=4,vs=1536)
> 58: (b7) r8 = 0                       ; R8_w=0
> 59: (b7) r9 = 0                       ; R9_w=0
> 60: (b7) r1 = 0                       ; R1_w=0
> 61: (7b) *(u64 *)(r10 -40) = r1       ; R1_w=0 R10=fp0 fp-40_w=0
> 62: (05) goto pc+9
> ; for (i = 0; i < MAX_FILTERS; i++) { @ sample_filter.bpf.c:248
> 72: (bf) r7 = r6                      ; R6=map_value(map=filters,ks=4,vs=1536) R7_w=map_value(map=filters,ks=4,vs=1536)
> 73: (0f) r7 += r8                     ; R7_w=map_value(map=filters,ks=4,vs=1536) R8=0
> ; struct perf_sample_data___new *data = (void *)kctx->data; @ sample_filter.bpf.c:77
> 
> 
> <SNIP tons of verifier lines>
> 
> 269: (65) if r2 s> 0x1 goto pc+28     ; R2_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
> 270: (15) if r2 == 0x0 goto pc+45     ; R2_w=1
> 271: (15) if r2 == 0x1 goto pc+1      ; R2_w=1
> ; CHECK_RESULT(sample_data, !=, entry[i].value) @ sample_filter.bpf.c:256
> 273: (bf) r2 = r6                     ; R2_w=map_value(map=filters,ks=4,vs=1536) R6=map_value(map=filters,ks=4,vs=1536)
> 274: (0f) r2 += r8                    ; R2_w=map_value(map=filters,ks=4,vs=1536,off=984) R8_w=984
> 275: (79) r2 = *(u64 *)(r2 +16)       ; R2_w=scalar()
> 276: (5d) if r1 != r2 goto pc+101 378: R0_w=1 R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar() R3_w=ptr_perf_sample_data() R4_w=scalar(smin=umin=umin32=1,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R5=trusted_ptr_bpf_perf_event_data_kern() R6=map_value(map=filters,ks=4,vs=1536) R7_w=map_value(map=filters,ks=4,vs=1536,off=984) R8_w=984 R9=0 R10=fp0 fp-8=mmmm???? fp-40=0 fp-48=trusted_ptr_bpf_perf_event_data_kern()
> 378: (67) r9 <<= 32                   ; R9_w=0
> 379: (bf) r1 = r9                     ; R1_w=0 R9_w=0
> 380: (77) r1 >>= 32                   ; R1_w=0
> 381: (b7) r9 = 0                      ; R9_w=0
> 382: (15) if r1 == 0x0 goto pc-313    ; R1_w=0
> ; for (i = 0; i < MAX_FILTERS; i++) { @ sample_filter.bpf.c:248
> 70: (07) r8 += 24                     ; R8_w=1008
> 71: (15) if r8 == 0x600 goto pc-25    ; R8_w=1008
> 72: (bf) r7 = r6                      ; R6=map_value(map=filters,ks=4,vs=1536) R7_w=map_value(map=filters,ks=4,vs=1536)
> 73: (0f) r7 += r8                     ; R7_w=map_value(map=filters,ks=4,vs=1536,off=1008) R8_w=1008
> ; struct perf_sample_data___new *data = (void *)kctx->data; @ sample_filter.bpf.c:77
> 74: (79) r3 = *(u64 *)(r5 +8)
> processed 2344 insns (limit 1000000) max_states_per_insn 4 total_states 57 peak_states 57 mark_read 6
> -- END PROG LOAD LOG --
> libbpf: prog 'perf_sample_filter': failed to load: -EAGAIN
> libbpf: failed to load object 'sample_filter_bpf'
> libbpf: failed to load BPF skeleton 'sample_filter_bpf': -EAGAIN
> Failed to load perf sample-filter BPF skeleton
> failed to set filter "BPF" on event cpu_core/cycles/P with 11 (Resource temporarily unavailable)
> ^Z
> [1]+  Stopped                 perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> root@x1:~# 
> root@x1:~# fg
> perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> ^C^C^C^C^C
> 
> 
> ^C^C^C^C^C
> 
> Well, I just had it suspended; I'm not able to stop it:
> 
> root@x1:~# ps 
>     PID TTY          TIME CMD
> 1355566 pts/15   00:00:00 sudo
> 1355567 pts/15   00:00:00 su
> 1355570 pts/15   00:00:00 bash
> 1373907 pts/15   00:00:44 perf
> 1373908 pts/15   00:00:00 perf-exec
> 1374019 pts/15   00:00:00 ps
> root@x1:~# kill -9 1373907
> root@x1:~# kill -9 1373907
> -bash: kill: (1373907) - No such process
> [1]+  Killed                  perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> root@x1:~# 
> 
> Ok, killed.
> 
> It gets stuck in that MAP_FREEZE sys_bpf call:

Interesting.

> 
> root@x1:~# perf trace perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> <SNIP>
>  16349.552 ( 0.023 ms): perf/1374043 bpf(uattr: (union bpf_attr){(struct){.map_type = (__u32)2,.key_size = (__u32)4,.value_size = (__u32)4,.max_entries = (__u32)1,.map_flags = (__u32)1152,.map_name = (char[16])['s','a','m','p','l','e','_','f','.','r','o','d','a','t','a',],.btf_fd = (__u32)61,.btf_value_type_id = (__u32)59,},(struct){.map_fd = (__u32)2,.key = (__u64)4294967300,(union){.value = (__u64)1152,.next_key = (__u64)1152,},.flags = (__u64)8101238451258523648,},.batch = (struct){.in_batch = (__u64)17179869186,.out_batch = (__u64)4294967300,.keys = (__u64)1152,.values = (__u64)8101238451258523648,.count = (__u32)1717527916,.map_fd = (__u32)1685025326,.elem_flags = (__u64)6386785,.flags = (__u64)61,},(struct){.prog_type = (__u32)2,.insn_cnt = (__u32)4,.insns = (__u64)4294967300,.license = (__u64)1152,.log_size = (__u32)1886216563,.log_buf = (__u64)7237128669819266412,.kern_version = (__u32)6386785,.prog_name = (char[16])['=',';',],.func_info = (__u64)140735998446044,.func_info_cnt = (__u32)520807888,.line_info = (__u64)140735998445848,.line_info_cnt = (__u32)4874802,.attach_btf_id = (__u32)4,(union){.attach_prog_fd = (__u32)2805057752,.attach_btf_obj_fd = (__u32)2805057752,},.core_relo_cnt = (__u32)32767,.fd_array = (__u64)15131936,.core_relos = (__u64)140735998445952,.core_relo_rec_size = (__u32)15049664,.prog_token_fd = (__s32)-1489909376,},(struct){.pathname = (__u64)17179869186,.bpf_fd = (__u32)4,.file_flags = (__u32)1,.path_fd = (__s32)1152,},(struct){(union){.target_fd = (__u32)2,.target_ifindex = (__u32)2,},.attach_bpf_fd = (__u32)4,.attach_type = (__u32)4,.attach_flags = (__u32)1,.replace_bpf_fd = (__u32)1152,.expected_revision = (__u64)8101238451258523648,},.test = (struct){.prog_fd = (__u32)2,.retval = (__u32)4,.data_size_in = (__u32)4,.data_size_out = (__u32)1,.data_in = (__u64)1152,.data_out = (__u64)8101238451258523648,.repeat = (__u32)1717527916,.duration = (__u32)1685025326,.ctx_size_in = (__u32)6386785,.ctx_in = (__u64)61,.ctx_out = (__u64)59,},(struct){(union){.start_id = (__u32)2,.prog_id = (__u32)2,.map_id = (__u32)2,.btf_id ) = 62

This line is missing the 'cmd:' part.  Probably it was 0 (MAP_CREATE).
It seems we need to print the value even if it's 0.
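
Something along these lines, presumably (a hedged sketch of the idea
only; 'bpf_cmd_names' is a stand-in table, not perf trace's actual
beautifier):

  /* always emit the cmd, even when the enum value is 0 (MAP_CREATE) */
  static size_t print_bpf_cmd(char *bf, size_t size, int cmd)
  {
          if (cmd >= 0 && cmd < (int)ARRAY_SIZE(bpf_cmd_names) &&
              bpf_cmd_names[cmd] != NULL)
                  return scnprintf(bf, size, "cmd: %s", bpf_cmd_names[cmd]);

          return scnprintf(bf, size, "cmd: %d", cmd);
  }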

>  16349.576 ( 0.002 ms): perf/1374043 dup3(oldfd: 62, newfd: 60<anon_inode:bpf-map>, flags: 524288)         = 60
>  16349.582 ( 0.001 ms): perf/1374043 close(fd: 62)                                                         = 0
>  16349.586 ( 0.006 ms): perf/1374043 bpf(cmd: MAP_UPDATE_ELEM, uattr: (union bpf_attr){(struct){.map_type = (__u32)60,.value_size = (__u32)2805058140,.max_entries = (__u32)32767,.map_flags = (__u32)1994993664,.inner_map_fd = (__u32)32622,.map_ifindex = (__u32)2,.btf_fd = (__u32)2805058000,.btf_key_type_id = (__u32)32767,.btf_value_type_id = (__u32)16,.btf_vmlinux_value_type_id = (__u32)48,.map_extra = (__u64)140735998446240,.value_type_btf_obj_fd = (__s32)-1489909280,.map_token_fd = (__s32)32767,},(struct){.map_fd = (__u32)60,.key = (__u64)140735998446172,(union){.value = (__u64)140112418123776,.next_key = (__u64)140112418123776,},},.batch = (struct){.in_batch = (__u64)60,.out_batch = (__u64)140735998446172,.keys = (__u64)140112418123776,.count = (__u32)7379400,.elem_flags = (__u64)9110741168,.flags = (__u64)140735998446032,},(struct){.prog_type = (__u32)60,.insns = (__u64)140735998446172,.license = (__u64)140112418123776,.log_buf = (__u64)7379400,.kern_version = (__u32)520806576,.prog_flags = (__u32)2,.prog_name = (char[16])[208,201,'1',167,255,127,16,'0',],.prog_ifindex = (__u32)2805058208,.expected_attach_type = (__u32)32767,.prog_btf_fd = (__u32)2805058016,.func_info_rec_size = (__u32)32767,.func_info = (__u64)410826951296,.func_info_cnt = (__u32)4874930,.line_info = (__u64)40,.line_info_cnt = (__u32)1,.attach_btf_id = (__u32)4,(union){.attach_prog_fd = (__u32)520846144,.attach_btf_obj_fd = (__u32)520846144,},.fd_array = (__u64)60,.core_relos = (__u64)140735998445776,.core_relo_rec_size = (__u32)2805057752,.log_true_size = (__u32)524288,},(struct){.pathname = (__u64)60,.bpf_fd = (__u32)2805058140,.file_flags = (__u32)32767,.path_fd = (__s32)1994993664,},(struct){(union){.target_fd = (__u32)60,.target_ifindex = (__u32)60,},.attach_type = (__u32)2805058140,.attach_flags = (__u32)32767,.replace_bpf_fd = (__u32)1994993664,(union){.relative_fd = (__u32)32622,.relative_id = (__u32)32622,},},.test = (struct){.prog_fd = (__u32)60,.data_size_in = (__u32)2805058140,.data_size_out = (__u32)32767,.data_in = (__u64)140112418123776,.repeat = (__u32)7379400,.) = 0
> 
>  16349.594 ( 0.002 ms): perf/1374043 bpf(cmd: MAP_FREEZE, uattr: (union bpf_attr){(struct){.map_type = (__u32)60,.value_size = (__u32)2805058140,.max_entries = (__u32)32767,.map_flags = (__u32)1994993664,.inner_map_fd = (__u32)32622,.map_ifindex = (__u32)2,.btf_fd = (__u32)2805058000,.btf_key_type_id = (__u32)32767,.btf_value_type_id = (__u32)16,.btf_vmlinux_value_type_id = (__u32)48,.map_extra = (__u64)140735998446240,.value_type_btf_obj_fd = (__s32)-1489909280,.map_token_fd = (__s32)32767,},(struct){.map_fd = (__u32)60,.key = (__u64)140735998446172,(union){.value = (__u64)140112418123776,.next_key = (__u64)140112418123776,},},.batch = (struct){.in_batch = (__u64)60,.out_batch = (__u64)140735998446172,.keys = (__u64)140112418123776,.count = (__u32)7379400,.elem_flags = (__u64)9110741168,.flags = (__u64)140735998446032,},(struct){.prog_type = (__u32)60,.insns = (__u64)140735998446172,.license = (__u64)140112418123776,.log_buf = (__u64)7379400,.kern_version = (__u32)520806576,.prog_flags = (__u32)2,.prog_name = (char[16])[208,201,'1',167,255,127,16,'0',],.prog_ifindex = (__u32)2805058208,.expected_attach_type = (__u32)32767,.prog_btf_fd = (__u32)2805058016,.func_info_rec_size = (__u32)32767,.func_info = (__u64)410826951296,.func_info_cnt = (__u32)4874930,.line_info = (__u64)40,.line_info_cnt = (__u32)1,.attach_btf_id = (__u32)4,(union){.attach_prog_fd = (__u32)520846144,.attach_btf_obj_fd = (__u32)520846144,},.fd_array = (__u64)60,.core_relos = (__u64)140735998445776,.core_relo_rec_size = (__u32)2805057752,.log_true_size = (__u32)524288,},(struct){.pathname = (__u64)60,.bpf_fd = (__u32)2805058140,.file_flags = (__u32)32767,.path_fd = (__s32)1994993664,},(struct){(union){.target_fd = (__u32)60,.target_ifindex = (__u32)60,},.attach_type = (__u32)2805058140,.attach_flags = (__u32)32767,.replace_bpf_fd = (__u32)1994993664,(union){.relative_fd = (__u32)32622,.relative_id = (__u32)32622,},},.test = (struct){.prog_fd = (__u32)60,.data_size_in = (__u32)2805058140,.data_size_out = (__u32)32767,.data_in = (__u64)140112418123776,.repeat = (__u32)7379400,.ctx_s) = 0
>  o
> 16349.601 ( 0.050 ms): perf/1374043 mmap(addr: 0x7f6e76e93000, len: 4096, prot: READ, flags: SHARED|FIXED, fd: 60) = 0x7f6e76e93000
> 
> root@x1:~# uname -a
> Linux x1 6.13.9-100.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Mar 29 01:27:18 UTC 2025 x86_64 GNU/Linux
> root@x1:~#
> 
> root@x1:~# grep -m1 "model name" /proc/cpuinfo 
> model name	: 13th Gen Intel(R) Core(TM) i7-1365U
> root@x1:~# 

Well.. you don't need that on Intel; just use 'perf mem record'. :)

> 
> Now trying on another machine:
> 
> root@number:~# uname -a
> Linux number 6.15.0-rc4+ #2 SMP PREEMPT_DYNAMIC Tue Apr 29 15:56:43 -03 2025 x86_64 GNU/Linux
> root@number:~# grep -m1 "model name" /proc/cpuinfo
> model name	: AMD Ryzen 9 9950X3D 16-Core Processor
> root@number:~# perf record -aW --sample-mem-info -e cycles:P --filter 'mem_op == load || mem_op == store' sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.870 MB perf.data (1199 samples) ]
> root@number:~#
> 
> root@number:~# perf mem report -F overhead,dtlb,dso --stdio | head -20
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 1K of event 'cycles:P'
> # Total weight : 7879
> # Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc
> #
> #           ---------- D-TLB -----------                                   
> # Overhead   L1-Hit L2-Hit   Miss  Other  Shared Object                    
> # ........  ............................  .................................
> #
>     48.51%    44.8%  18.7%  28.3%   8.2%  [kernel.kallsyms]                
>     11.97%     2.1%   1.4%  96.0%   0.5%  libc.so.6                        
>      8.58%    85.7%  14.1%   0.0%   0.3%  libxul.so                        
>      6.76%   100.0%   0.0%   0.0%   0.0%  libfreeblpriv3.so                
>      6.08%   100.0%   0.0%   0.0%   0.0%  libsystemd-shared-257.5-2.fc42.so
>      4.59%   100.0%   0.0%   0.0%   0.0%  firefox                          
>      4.33%   100.0%   0.0%   0.0%   0.0%  libgallium-25.0.4.so             
> root@number:~#
> 
> Looks better :-)
> 
> Having the record as an alias seems interesting; ditto for the report.

Yep, you can update 'perf mem record'.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
  2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
                   ` (11 preceding siblings ...)
  2025-05-02 16:00 ` [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Arnaldo Carvalho de Melo
@ 2025-05-08  4:12 ` Ravi Bangoria
  2025-05-09 16:17   ` Namhyung Kim
  12 siblings, 1 reply; 23+ messages in thread
From: Ravi Bangoria @ 2025-05-08  4:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Jiri Olsa,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Leo Yan, Ravi Bangoria

Hi Namhyung,

I feel the overall idea is good. Running a few simple perf-mem commands
on AMD works fine too. A few general comments below.

> The names of some new fields are the same as the corresponding sort
> keys (mem, op, snoop), so I had to change the ordering depending on
> whether it's applied as an output field or a sort key.  Maybe it's
> better to name them differently but I couldn't come up with better ideas.

1) These semantic changes of the field names seem counterintuitive
   (to me). Example:

   -F mem:

     Without patch:

     $ perf mem report -F overhead,sample,mem --stdio
     # Overhead       Samples  Memory access
         39.29%             1  L3 hit
         37.50%            21  N/A
         23.21%            13  L1 hit

     With patch:

     $ perf mem report -F overhead,sample,mem --stdio
     #                          Memory
     # Overhead       Samples    Other
        100.00%            35   100.0%

   -F 'snoop':

     Without patch:

     $ perf mem report -F overhead,sample,snoop --stdio
     # Overhead       Samples  Snoop
         60.71%            34  N/A
         39.29%             1  HitM
   
     With patchset:

     $ perf mem report -F overhead,sample,snoop --stdio
     #                         --- Snoop ----
     # Overhead       Samples     HitM  Other
        100.00%            35    39.3%  60.7%

2) It was not intuitive (to me:)) that perf-mem overhead is calculated
   using sample->weight by overwriting sample->period. I also don't see
   it documented anywhere (or did I miss it?)

   perf report:

     $ perf report -F overhead,sample,period,dso --stdio
     # Overhead  Samples   Period  Shared Object
         80.00%       28  2800000  [kernel.kallsyms]
          5.71%        2   200000  ld-linux-x86-64.so.2
          5.71%        2   200000  libc.so.6
          5.71%        2   200000  ls
          2.86%        1   100000  libpcre2-8.so.0.11.2

   perf mem report:

     $ perf mem report -F overhead,sample,period,dso --stdio
     # Overhead  Samples   Period  Shared Object
         87.50%       28       49  [kernel.kallsyms]
          3.57%        2        2  ld-linux-x86-64.so.2
          3.57%        2        2  libc.so.6
          3.57%        2        2  ls
          1.79%        1        1  libpcre2-8.so.0.11.2

3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
   percentages are calculated based on sample->weight.

4) I've recommended a similar perf-mem command in the perf-amd-ibs man
   page. Can you please update the alternate command there?
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167

Please correct me if I'm missing anything.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
  2025-05-08  4:12 ` Ravi Bangoria
@ 2025-05-09 16:17   ` Namhyung Kim
  2025-05-12 10:01     ` Ravi Bangoria
  0 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2025-05-09 16:17 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Jiri Olsa,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Leo Yan, Stephane Eranian

Hi Ravi,

On Thu, May 08, 2025 at 09:42:41AM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
> 
> I feel the overall idea is good. Running a few simple perf-mem commands
> on AMD works fine too. A few general comments below.

Thanks for your review!

> 
> > The names of some new fields are the same as the corresponding sort
> > keys (mem, op, snoop), so I had to change the ordering depending on
> > whether it's applied as an output field or a sort key.  Maybe it's
> > better to name them differently but I couldn't come up with better ideas.
> 
> 1) These semantic changes of the field names seem counterintuitive
>    (to me). Example:
> 
>    -F mem:
> 
>      Without patch:
> 
>      $ perf mem report -F overhead,sample,mem --stdio
>      # Overhead       Samples  Memory access
>          39.29%             1  L3 hit
>          37.50%            21  N/A
>          23.21%            13  L1 hit
> 
>      With patch:
> 
>      $ perf mem report -F overhead,sample,mem --stdio
>      #                          Memory
>      # Overhead       Samples    Other
>         100.00%            35   100.0%

Yep, that's because I split the 'mem' part into 'cache' and 'mem' because
he_mem_stat can handle up to 8 entries.  As your samples mostly hit in
the caches, you'd get a similar result when you run:

  $ perf mem report -F overhead,sample,cache --stdio
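
(For context, the 8-entry limit comes from the accounting struct added
earlier in the series; a rough sketch, assuming MEM_STAT_LEN == 8:)

  #define MEM_STAT_LEN  8                 /* assumed column cap per field */

  struct he_mem_stat {
          u64 entries[MEM_STAT_LEN];      /* weight accumulated per column */
  };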

> 
>    -F 'snoop':
> 
>      Without patch:
> 
>      $ perf mem report -F overhead,sample,snoop --stdio
>      # Overhead       Samples  Snoop
>          60.71%            34  N/A
>          39.29%             1  HitM
>    
>      With patchset:
> 
>      $ perf mem report -F overhead,sample,snoop --stdio
>      #                         --- Snoop ----
>      # Overhead       Samples     HitM  Other
>         100.00%            35    39.3%  60.7%

This matches the 'Overhead' distribution without the patch, right?

> 
> 2) It was not intuitive (to me:)) that perf-mem overhead is calculated
>    using sample->weight by overwriting sample->period. I also don't see
>    it documented anywhere (or did I miss it?)

I don't see the documentation and I also find it confusing.  Sometimes I
think the weight is better but sometimes not. :(  At least we could add
an option to control that (like --use-weight?).

Also we now have a 'weight' output field so users can see it, although it
shows averages.
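
(What's being discussed amounts to roughly the following; a hedged
sketch rather than perf's exact code path:)

  #include "util/sample.h"

  /* sketch: perf mem folds the access weight into the period before
   * histogram accounting, which makes "Overhead" weight-based */
  static void mem_mode_adjust_period(struct perf_sample *sample)
  {
          if (sample->weight)
                  sample->period = sample->weight;
  }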

> 
>    perf report:
> 
>      $ perf report -F overhead,sample,period,dso --stdio
>      # Overhead  Samples   Period  Shared Object
>          80.00%       28  2800000  [kernel.kallsyms]
>           5.71%        2   200000  ld-linux-x86-64.so.2
>           5.71%        2   200000  libc.so.6
>           5.71%        2   200000  ls
>           2.86%        1   100000  libpcre2-8.so.0.11.2
> 
>    perf mem report:
> 
>      $ perf mem report -F overhead,sample,period,dso --stdio
>      # Overhead  Samples   Period  Shared Object
>          87.50%       28       49  [kernel.kallsyms]
>           3.57%        2        2  ld-linux-x86-64.so.2
>           3.57%        2        2  libc.so.6
>           3.57%        2        2  ls
>           1.79%        1        1  libpcre2-8.so.0.11.2
> 
> 3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
>    percentages are calculated based on sample->weight.

Hmm.. ok.  Maybe it's better to use the original period for the
percentage breakdown in the new output fields.  For example, in the above
result you have 13 samples for L1 and 1 sample for L3, but the weight of
the L3 access is bigger.  But I guess users probably want to see that L1
access was dominant.
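
(Working backwards from the '-F overhead,sample,mem' output above, one
consistent assignment is a total weight of 112: the single L3-hit sample
carrying weight 44 gives 44/112 = 39.29%, the 21 N/A samples totaling 42
give 37.50%, and the 13 L1-hit samples totaling 26 give 23.21%.  So L1
dominates by sample count, 13 of 35, but not by weight.)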

> 
> 4) I've recommended a similar perf-mem command in the perf-amd-ibs man
>    page. Can you please update the alternate command there?
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167

Sure will do.

Thanks,
Namhyung

> 
> Please correct me if I'm missing anything.
> 
> Thanks,
> Ravi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1)
  2025-05-09 16:17   ` Namhyung Kim
@ 2025-05-12 10:01     ` Ravi Bangoria
  0 siblings, 0 replies; 23+ messages in thread
From: Ravi Bangoria @ 2025-05-12 10:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Jiri Olsa,
	Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users@vger.kernel.org, Leo Yan, Stephane Eranian,
	Ravi Bangoria

Hi Namhyung,

>>> The names of some new fields are the same as the corresponding sort
>>> keys (mem, op, snoop), so I had to change the ordering depending on
>>> whether it's applied as an output field or a sort key.  Maybe it's
>>> better to name them differently but I couldn't come up with better ideas.
>>
>> 1) These semantic changes of the field names seem counterintuitive
>>    (to me). Example:
>>
>>    -F mem:
>>
>>      Without patch:
>>
>>      $ perf mem report -F overhead,sample,mem --stdio
>>      # Overhead       Samples  Memory access
>>          39.29%             1  L3 hit
>>          37.50%            21  N/A
>>          23.21%            13  L1 hit
>>
>>      With patch:
>>
>>      $ perf mem report -F overhead,sample,mem --stdio
>>      #                          Memory
>>      # Overhead       Samples    Other
>>         100.00%            35   100.0%
> 
> Yep, that's because I split the 'mem' part into 'cache' and 'mem' because
> he_mem_stat can handle up to 8 entries.

+1.

>  As your samples mostly hit in
> the caches, you'd get a similar result when you run:
> 
>   $ perf mem report -F overhead,sample,cache --stdio
> 
>>
>>    -F 'snoop':
>>
>>      Without patch:
>>
>>      $ perf mem report -F overhead,sample,snoop --stdio
>>      # Overhead       Samples  Snoop
>>          60.71%            34  N/A
>>          39.29%             1  HitM
>>    
>>      With patchset:
>>
>>      $ perf mem report -F overhead,sample,snoop --stdio
>>      #                         --- Snoop ----
>>      # Overhead       Samples     HitM  Other
>>         100.00%            35    39.3%  60.7%
> 
> This matches the 'Overhead' distribution without the patch, right?

Right, it does.

>> 2) It was not intuitive (to me:)) that perf-mem overhead is calculated
>>    using sample->weight by overwriting sample->period. I also don't see
>>    it documented anywhere (or did I miss it?)
> 
> I don't see the documentation and I also find it confusing.  Sometimes I
> think the weight is better but sometimes not. :(  At least we could add
> an option to control that (like --use-weight?).

this and below ...

> Also we now have a 'weight' output field so users can see it, although it
> shows averages.
> 
>>
>>    perf report:
>>
>>      $ perf report -F overhead,sample,period,dso --stdio
>>      # Overhead  Samples   Period  Shared Object
>>          80.00%       28  2800000  [kernel.kallsyms]
>>           5.71%        2   200000  ld-linux-x86-64.so.2
>>           5.71%        2   200000  libc.so.6
>>           5.71%        2   200000  ls
>>           2.86%        1   100000  libpcre2-8.so.0.11.2
>>
>>    perf mem report:
>>
>>      $ perf mem report -F overhead,sample,period,dso --stdio
>>      # Overhead  Samples   Period  Shared Object
>>          87.50%       28       49  [kernel.kallsyms]
>>           3.57%        2        2  ld-linux-x86-64.so.2
>>           3.57%        2        2  libc.so.6
>>           3.57%        2        2  ls
>>           1.79%        1        1  libpcre2-8.so.0.11.2
>>
>> 3) Similarly, it was not intuitive (again, to me:)) that -F op/snoop/dtlb
>>    percentages are calculated based on sample->weight.
> 
> Hmm.. ok.  Maybe it's better to use the original period for the
> percentage breakdown in the new output fields.  For example, in the
> above result you have 13 samples for L1 and 1 sample for L3, but the
> weight of the L3 access is bigger.  But I guess users probably want to
> see that L1 access was dominant.

... I'm also not sure. Logically, it makes sense to use weight as overhead.
Also, it dates back to ~2014 and nobody has complained so far, so I'm just
being pedantic 🙂. For now, how about just documenting it in the perf-mem
man page and leaving it? Attaching the patch at the end.

>> 4) I've recommended a similar perf-mem command in the perf-amd-ibs man
>>    page. Can you please update the alternate command there?
>>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/perf-amd-ibs.txt?h=v6.15-rc5#n167
> 
> Sure will do.

Thanks!

------------><---------------
From 7e4393ab7b20f8d89a5dece08fdd925e3e50b15a Mon Sep 17 00:00:00 2001
From: Ravi Bangoria <ravi.bangoria@amd.com>
Date: Mon, 12 May 2025 06:22:57 +0000
Subject: [PATCH] perf mem doc: Describe overhead calculation in brief

Unlike perf-report, which uses the sample period for overhead calculation,
perf-mem calculates overhead using the sample weight. Describe the
perf-mem overhead calculation method in its man page.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/Documentation/perf-mem.txt | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index a9e3c71a2205..965e73d37772 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -137,6 +137,25 @@ REPORT OPTIONS
 In addition, for report all perf report options are valid, and for record
 all perf record options.
 
+OVERHEAD CALCULATION
+--------------------
+Unlike linkperf:perf-report[1], which calculates overhead from the actual
+sample period, perf-mem overhead is calculated using sample weight. E.g.
+there are two samples in the perf.data file, both with the same sample
+period, but one sample with weight 180 and the other with weight 20:
+
+  $ perf script -F period,data_src,weight,ip,sym
+  100000    629080842 |OP LOAD|LVL L3 hit|...     20       7e69b93ca524 strcmp
+  100000   1a29081042 |OP LOAD|LVL RAM hit|...   180   ffffffff82429168 memcpy
+
+  $ perf report -F overhead,symbol
+  50%   [.] strcmp
+  50%   [k] memcpy
+
+  $ perf mem report -F overhead,symbol
+  90%   [k] memcpy
+  10%   [.] strcmp
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
-- 
2.43.0

Thanks,
Ravi

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-05-12 10:01 UTC | newest]

Thread overview: 23+ messages
2025-04-30 20:55 [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Namhyung Kim
2025-04-30 20:55 ` [PATCH 01/11] perf hist: Remove output field from sort-list properly Namhyung Kim
2025-04-30 20:55 ` [PATCH 02/11] perf record: Add --sample-mem-info option Namhyung Kim
2025-04-30 20:55 ` [PATCH 03/11] perf hist: Support multi-line header Namhyung Kim
2025-04-30 20:55 ` [PATCH 04/11] perf hist: Add struct he_mem_stat Namhyung Kim
2025-04-30 20:55 ` [PATCH 05/11] perf hist: Basic support for mem_stat accounting Namhyung Kim
2025-04-30 20:55 ` [PATCH 06/11] perf hist: Implement output fields for mem stats Namhyung Kim
2025-04-30 20:55 ` [PATCH 07/11] perf mem: Add 'op' output field Namhyung Kim
2025-04-30 20:55 ` [PATCH 08/11] perf hist: Hide unused mem stat columns Namhyung Kim
2025-05-02 16:18   ` Arnaldo Carvalho de Melo
2025-05-02 16:27   ` Arnaldo Carvalho de Melo
2025-05-02 18:21     ` Namhyung Kim
2025-04-30 20:55 ` [PATCH 09/11] perf mem: Add 'cache' and 'memory' output fields Namhyung Kim
2025-04-30 20:55 ` [PATCH 10/11] perf mem: Add 'snoop' output field Namhyung Kim
2025-04-30 20:55 ` [PATCH 11/11] perf mem: Add 'dtlb' " Namhyung Kim
2025-05-02 16:30   ` Arnaldo Carvalho de Melo
2025-05-02 18:38     ` Namhyung Kim
2025-05-02 19:21       ` Arnaldo Carvalho de Melo
2025-05-02 20:01         ` Namhyung Kim
2025-05-02 16:00 ` [RFC/PATCHSET 00/11] perf mem: Add new output fields for data source (v1) Arnaldo Carvalho de Melo
2025-05-08  4:12 ` Ravi Bangoria
2025-05-09 16:17   ` Namhyung Kim
2025-05-12 10:01     ` Ravi Bangoria
