linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/5] TPEBS counting mode support
@ 2024-02-21  7:20 weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 1/5] perf stat: Parse and find tpebs events when parsing metrics to prepare for perf record sampling weilin.wang
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: weilin.wang @ 2024-02-21  7:20 UTC (permalink / raw)
  To: weilin.wang, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Adrian Hunter, Kan Liang
  Cc: linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

From: Weilin Wang <weilin.wang@intel.com>

TPEBS is one of the features provided by the next generation of Intel PMU.
Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
Programming Reference" [1] for more details about this feature.

This set of patches supports TPEBS in counting mode. The code works in the
following way: it forks a perf record process from perf stat when retire_latency
of one or more events are used in a metric formula. Perf stat would send a
SIGTERM signal to perf record before it needs the retire latency value for
metric calculation. Perf stat will then process sample data to extract the
retire latency data for metric calculations. Currently, the code uses the
arithmetic average of retire latency values.

[1] https://www.intel.com/content/www/us/en/content-details/812218/intel-architecture-instruction-set-extensions-programming-reference.html?wapkw=future%20features

  perf stat: Parse and find tpebs events when parsing metrics to prepare
    for perf record sampling
  perf stat: Fork and launch perf record when perf stat needs to get
    retire latency value for a metric.
  perf stat: Add retire latency values into the expr_parse_ctx to
    prepare for final metric calculation
  perf stat: Create another thread for sample data processing
  perf stat: Add retire latency print functions to print out at the very
    end of print out

 tools/perf/builtin-stat.c      | 222 ++++++++++++++++++++++++++++++++-
 tools/perf/util/data.c         |   4 +
 tools/perf/util/data.h         |   1 +
 tools/perf/util/metricgroup.c  |  61 +++++++--
 tools/perf/util/metricgroup.h  |  18 ++-
 tools/perf/util/stat-display.c |  65 ++++++++++
 tools/perf/util/stat-shadow.c  |  18 +++
 tools/perf/util/stat.h         |   5 +
 8 files changed, 376 insertions(+), 18 deletions(-)

--
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH v1 1/5] perf stat: Parse and find tpebs events when parsing metrics to prepare for perf record sampling
  2024-02-21  7:20 [RFC PATCH v1 0/5] TPEBS counting mode support weilin.wang
@ 2024-02-21  7:20 ` weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric weilin.wang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: weilin.wang @ 2024-02-21  7:20 UTC (permalink / raw)
  To: weilin.wang, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Adrian Hunter, Kan Liang
  Cc: linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

From: Weilin Wang <weilin.wang@intel.com>

Metrics that use tpebs values would use the :retire_latency keyword in
formulas. We put all these events into a list and pass the list to perf
record to collect their retire latency value.

Signed-off-by: Weilin Wang <weilin.wang@intel.com>
---
 tools/perf/builtin-stat.c     | 38 +++++++++++++++++++++----
 tools/perf/util/metricgroup.c | 52 +++++++++++++++++++++++++++--------
 tools/perf/util/metricgroup.h | 10 ++++++-
 tools/perf/util/stat.h        |  2 ++
 4 files changed, 84 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 07b48f6df48e..93a7cba30f12 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -162,6 +162,7 @@ static struct perf_stat_config stat_config = {
 	.ctl_fd			= -1,
 	.ctl_fd_ack		= -1,
 	.iostat_run		= false,
+	.tpebs_event_size = 0,
 };
 
 static bool cpus_map_matched(struct evsel *a, struct evsel *b)
@@ -686,6 +687,12 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
 	return COUNTER_FATAL;
 }
 
+static int __run_perf_record(void)
+{
+	pr_debug("Prepare perf record for retire_latency\n");
+	return 0;
+}
+
 static int __run_perf_stat(int argc, const char **argv, int run_idx)
 {
 	int interval = stat_config.interval;
@@ -703,6 +710,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	int err;
 	bool second_pass = false;
 
+	//Prepare perf record for sampling event retire_latency before fork and prepare workload
+	if (stat_config.tpebs_event_size > 0) {
+		int ret;
+
+		ret = __run_perf_record();
+		if (ret)
+			return ret;
+	}
+
 	if (forks) {
 		if (evlist__prepare_workload(evsel_list, &target, argv, is_pipe, workload_exec_failed_signal) < 0) {
 			perror("failed to prepare workload");
@@ -2061,7 +2077,9 @@ static int add_default_attributes(void)
 						stat_config.metric_no_threshold,
 						stat_config.user_requested_cpu_list,
 						stat_config.system_wide,
-						&stat_config.metric_events);
+						&stat_config.metric_events,
+						&stat_config.tpebs_events,
+						&stat_config.tpebs_event_size);
 	}
 
 	if (smi_cost) {
@@ -2094,7 +2112,9 @@ static int add_default_attributes(void)
 						stat_config.metric_no_threshold,
 						stat_config.user_requested_cpu_list,
 						stat_config.system_wide,
-						&stat_config.metric_events);
+						&stat_config.metric_events,
+						&stat_config.tpebs_events,
+						&stat_config.tpebs_event_size);
 	}
 
 	if (topdown_run) {
@@ -2128,7 +2148,9 @@ static int add_default_attributes(void)
 						/*metric_no_threshold=*/true,
 						stat_config.user_requested_cpu_list,
 						stat_config.system_wide,
-						&stat_config.metric_events) < 0)
+						&stat_config.metric_events,
+						&stat_config.tpebs_events,
+						&stat_config.tpebs_event_size) < 0)
 			return -1;
 	}
 
@@ -2169,7 +2191,9 @@ static int add_default_attributes(void)
 							/*metric_no_threshold=*/true,
 							stat_config.user_requested_cpu_list,
 							stat_config.system_wide,
-							&stat_config.metric_events) < 0)
+							&stat_config.metric_events,
+							/*&stat_config.tpebs_events=*/NULL,
+							/*stat_config.tpebs_event_size=*/0) < 0)
 				return -1;
 
 			evlist__for_each_entry(metric_evlist, metric_evsel) {
@@ -2689,6 +2713,8 @@ int cmd_stat(int argc, const char **argv)
 		}
 	}
 
+	INIT_LIST_HEAD(&stat_config.tpebs_events);
+
 	/*
 	 * Metric parsing needs to be delayed as metrics may optimize events
 	 * knowing the target is system-wide.
@@ -2702,7 +2728,9 @@ int cmd_stat(int argc, const char **argv)
 					stat_config.metric_no_threshold,
 					stat_config.user_requested_cpu_list,
 					stat_config.system_wide,
-					&stat_config.metric_events);
+					&stat_config.metric_events,
+					&stat_config.tpebs_events,
+					&stat_config.tpebs_event_size);
 		zfree(&metrics);
 	}
 
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 6231044a491e..6c16e5a0b1fc 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -275,7 +275,8 @@ static bool contains_metric_id(struct evsel **metric_events, int num_events,
  */
 static int setup_metric_events(const char *pmu, struct hashmap *ids,
 			       struct evlist *metric_evlist,
-			       struct evsel ***out_metric_events)
+			       struct evsel ***out_metric_events,
+			       size_t tpebs_event_size)
 {
 	struct evsel **metric_events;
 	const char *metric_id;
@@ -284,7 +285,7 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
 	bool all_pmus = !strcmp(pmu, "all") || perf_pmus__num_core_pmus() == 1 || !is_pmu_core(pmu);
 
 	*out_metric_events = NULL;
-	ids_size = hashmap__size(ids);
+	ids_size = hashmap__size(ids) - tpebs_event_size;
 
 	metric_events = calloc(sizeof(void *), ids_size + 1);
 	if (!metric_events)
@@ -321,6 +322,7 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
 		}
 	}
 	if (matched_events < ids_size) {
+		pr_debug("Error: matched_events = %lu, ids_size = %lu\n", matched_events, ids_size);
 		free(metric_events);
 		return -EINVAL;
 	}
@@ -668,7 +670,9 @@ static int decode_all_metric_ids(struct evlist *perf_evlist, const char *modifie
 static int metricgroup__build_event_string(struct strbuf *events,
 					   const struct expr_parse_ctx *ctx,
 					   const char *modifier,
-					   bool group_events)
+					   bool group_events,
+					   struct list_head *tpebs_events __maybe_unused,
+					   size_t *tpebs_event_size)
 {
 	struct hashmap_entry *cur;
 	size_t bkt;
@@ -681,9 +685,21 @@ static int metricgroup__build_event_string(struct strbuf *events,
 	hashmap__for_each_entry(ctx->ids, cur, bkt) {
 		const char *sep, *rsep, *id = cur->pkey;
 		enum perf_tool_event ev;
+		char *p = strstr(id, ":retire_latency");
 
 		pr_debug("found event %s\n", id);
 
+		if (p) {
+			struct tpebs_event *new_event = malloc(sizeof(struct tpebs_event));
+			*p = '\0';
+			new_event->tpebs_name = strdup(id);
+			*tpebs_event_size += 1;
+			pr_debug("retire_latency required, tpebs_event_size=%lu, new_event=%s\n",
+			*tpebs_event_size, new_event->name);
+			list_add_tail(&new_event->nd, tpebs_events);
+			continue;
+		}
+
 		/* Always move tool events outside of the group. */
 		ev = perf_tool_event__from_str(id);
 		if (ev != PERF_TOOL_NONE) {
@@ -1447,7 +1463,8 @@ static int build_combined_expr_ctx(const struct list_head *metric_list,
 static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
 		     struct expr_parse_ctx *ids, const char *modifier,
 		     bool group_events, const bool tool_events[PERF_TOOL_MAX],
-		     struct evlist **out_evlist)
+		     struct evlist **out_evlist, struct list_head *tpebs_events,
+		     size_t *tpebs_event_size)
 {
 	struct parse_events_error parse_error;
 	struct evlist *parsed_evlist;
@@ -1490,7 +1507,7 @@ static int parse_ids(bool metric_no_merge, struct perf_pmu *fake_pmu,
 		}
 	}
 	ret = metricgroup__build_event_string(&events, ids, modifier,
-					      group_events);
+					      group_events, tpebs_events, tpebs_event_size);
 	if (ret)
 		return ret;
 
@@ -1529,7 +1546,9 @@ static int parse_groups(struct evlist *perf_evlist,
 			bool system_wide,
 			struct perf_pmu *fake_pmu,
 			struct rblist *metric_events_list,
-			const struct pmu_metrics_table *table)
+			const struct pmu_metrics_table *table,
+			struct list_head *tpebs_events,
+			size_t *tpebs_event_size)
 {
 	struct evlist *combined_evlist = NULL;
 	LIST_HEAD(metric_list);
@@ -1561,7 +1580,8 @@ static int parse_groups(struct evlist *perf_evlist,
 					/*modifier=*/NULL,
 					/*group_events=*/false,
 					tool_events,
-					&combined_evlist);
+					&combined_evlist,
+					tpebs_events, tpebs_event_size);
 		}
 		if (combined)
 			expr__ctx_free(combined);
@@ -1616,14 +1636,15 @@ static int parse_groups(struct evlist *perf_evlist,
 		}
 		if (!metric_evlist) {
 			ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier,
-					m->group_events, tool_events, &m->evlist);
+					m->group_events, tool_events, &m->evlist,
+					tpebs_events, tpebs_event_size);
 			if (ret)
 				goto out;
 
 			metric_evlist = m->evlist;
 		}
 		ret = setup_metric_events(fake_pmu ? "all" : m->pmu, m->pctx->ids,
-					  metric_evlist, &metric_events);
+					  metric_evlist, &metric_events, *tpebs_event_size);
 		if (ret) {
 			pr_err("Cannot resolve IDs for %s: %s\n",
 				m->metric_name, m->metric_expr);
@@ -1690,16 +1711,21 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
 			      bool metric_no_threshold,
 			      const char *user_requested_cpu_list,
 			      bool system_wide,
-			      struct rblist *metric_events)
+			      struct rblist *metric_events,
+			      struct list_head *tpebs_events,
+			      size_t *tpebs_event_size)
 {
 	const struct pmu_metrics_table *table = pmu_metrics_table__find();
 
+	pr_debug("Test debugging\n");
+
 	if (!table)
 		return -EINVAL;
 
 	return parse_groups(perf_evlist, pmu, str, metric_no_group, metric_no_merge,
 			    metric_no_threshold, user_requested_cpu_list, system_wide,
-			    /*fake_pmu=*/NULL, metric_events, table);
+			    /*fake_pmu=*/NULL, metric_events, table, tpebs_events,
+			    tpebs_event_size);
 }
 
 int metricgroup__parse_groups_test(struct evlist *evlist,
@@ -1713,7 +1739,9 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
 			    /*metric_no_threshold=*/false,
 			    /*user_requested_cpu_list=*/NULL,
 			    /*system_wide=*/false,
-			    &perf_pmu__fake, metric_events, table);
+			    &perf_pmu__fake, metric_events, table,
+			    /*tpebs_events=*/NULL,
+			    /*tpebs_event_size=*/0);
 }
 
 struct metricgroup__has_metric_data {
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index d5325c6ec8e1..7c24ed768ff3 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -66,6 +66,12 @@ struct metric_expr {
 	int runtime;
 };
 
+struct tpebs_event {
+	struct list_head nd;
+	const char *name;
+	const char *tpebs_name;
+};
+
 struct metric_event *metricgroup__lookup(struct rblist *metric_events,
 					 struct evsel *evsel,
 					 bool create);
@@ -77,7 +83,9 @@ int metricgroup__parse_groups(struct evlist *perf_evlist,
 			      bool metric_no_threshold,
 			      const char *user_requested_cpu_list,
 			      bool system_wide,
-			      struct rblist *metric_events);
+			      struct rblist *metric_events,
+			      struct list_head *tpebs_events,
+			      size_t *tpebs_event_size);
 int metricgroup__parse_groups_test(struct evlist *evlist,
 				   const struct pmu_metrics_table *table,
 				   const char *str,
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 325d0fad1842..9d0186316b29 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -109,6 +109,8 @@ struct perf_stat_config {
 	struct cpu_aggr_map	*cpus_aggr_map;
 	u64			*walltime_run;
 	struct rblist		 metric_events;
+	struct list_head	 tpebs_events;
+	size_t			 tpebs_event_size;
 	int			 ctl_fd;
 	int			 ctl_fd_ack;
 	bool			 ctl_fd_close;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
  2024-02-21  7:20 [RFC PATCH v1 0/5] TPEBS counting mode support weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 1/5] perf stat: Parse and find tpebs events when parsing metrics to prepare for perf record sampling weilin.wang
@ 2024-02-21  7:20 ` weilin.wang
  2024-02-21 17:52   ` Arnaldo Carvalho de Melo
  2024-02-21  7:20 ` [RFC PATCH v1 3/5] perf stat: Add retire latency values into the expr_parse_ctx to prepare for final metric calculation weilin.wang
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: weilin.wang @ 2024-02-21  7:20 UTC (permalink / raw)
  To: weilin.wang, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Adrian Hunter, Kan Liang
  Cc: linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

From: Weilin Wang <weilin.wang@intel.com>

Signed-off-by: Weilin Wang <weilin.wang@intel.com>
---
 tools/perf/builtin-stat.c     | 179 +++++++++++++++++++++++++++++++++-
 tools/perf/util/data.c        |   4 +
 tools/perf/util/data.h        |   1 +
 tools/perf/util/metricgroup.c |  11 ++-
 tools/perf/util/metricgroup.h |   7 ++
 tools/perf/util/stat.h        |   3 +
 6 files changed, 201 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 93a7cba30f12..e03665af9da2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -94,8 +94,13 @@
 #include <perf/evlist.h>
 #include <internal/threadmap.h>
 
+#include "util/sample.h"
+#include <sys/param.h>
+#include <subcmd/run-command.h>
+
 #define DEFAULT_SEPARATOR	" "
 #define FREEZE_ON_SMI_PATH	"devices/cpu/freeze_on_smi"
+#define PERF_DATA		"-"
 
 static void print_counters(struct timespec *ts, int argc, const char **argv);
 
@@ -162,7 +167,8 @@ static struct perf_stat_config stat_config = {
 	.ctl_fd			= -1,
 	.ctl_fd_ack		= -1,
 	.iostat_run		= false,
-	.tpebs_event_size = 0,
+	.tpebs_event_size       = 0,
+	.tpebs_pid              = -1,
 };
 
 static bool cpus_map_matched(struct evsel *a, struct evsel *b)
@@ -687,12 +693,163 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
 	return COUNTER_FATAL;
 }
 
-static int __run_perf_record(void)
+static int __run_perf_record(const char **record_argv)
 {
+	int i = 0;
+	struct tpebs_event *e;
 	pr_debug("Prepare perf record for retire_latency\n");
+
+
+	record_argv[i++] = "perf";
+	record_argv[i++] = "record";
+	record_argv[i++] = "-W";
+
+	if (stat_config.user_requested_cpu_list) {
+		record_argv[i++] = "-C";
+		record_argv[i++] = stat_config.user_requested_cpu_list;
+	}
+
+	if (stat_config.system_wide)
+		record_argv[i++] = "-a";
+
+	list_for_each_entry(e, &stat_config.tpebs_events, nd) {
+		record_argv[i++] = "-e";
+		record_argv[i++] = e->name;
+	}
+
+	record_argv[i++] = "-o";
+	record_argv[i++] = PERF_DATA;
+
 	return 0;
 }
 
+static void prepare_run_command(struct child_process *cmd,
+			       const char **argv)
+{
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->argv = argv;
+	cmd->out = -1;
+}
+static int prepare_perf_record(struct child_process *cmd)
+{
+	const char **record_argv;
+
+	record_argv = calloc(10 + 2 * stat_config.tpebs_event_size, sizeof(char *));
+	if (!record_argv)
+		return -1;
+	__run_perf_record(record_argv);
+
+	prepare_run_command(cmd, record_argv);
+	return start_command(cmd);
+}
+
+struct perf_script {
+	struct perf_tool	tool;
+	struct perf_session	*session;
+	struct evswitch		evswitch;
+	struct perf_cpu_map	*cpus;
+	struct perf_thread_map *threads;
+	int			name_width;
+};
+
+static void tpebs_data__delete(void)
+{
+	struct tpebs_retire_lat *r, *rtmp;
+	struct tpebs_event *e, *etmp;
+	list_for_each_entry_safe(r, rtmp, &stat_config.tpebs_results, nd) {
+		list_del_init(&r->nd);
+		free(r);
+	}
+	list_for_each_entry_safe(e, etmp, &stat_config.tpebs_events, nd) {
+		list_del_init(&e->nd);
+		free(e);
+	}
+}
+
+static int process_sample_event(struct perf_tool *tool,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample,
+				struct evsel *evsel,
+				struct machine *machine __maybe_unused)
+{
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	int ret = 0;
+	const char *evname;
+	struct tpebs_retire_lat *t;
+
+	pr_debug("entering function %s\n ", __func__);
+	evname = evsel__name(evsel);
+
+	pr_debug("[%03d] ", sample->cpu);
+	pr_debug("%*s: ", script->name_width, evname ?: "[unknown]");
+	pr_debug("%16" PRIu16, sample->retire_lat);
+	pr_debug("\n");
+
+	// Need to handle per core results?
+	// We are assuming average retire latency value will be used. Save the number of
+	// samples and the sum of retire latency value for each event.
+	list_for_each_entry(t, &stat_config.tpebs_results, nd) {
+		if (!strcmp(evname, t->name)) {
+			t->count += 1;
+			t->sum += sample->retire_lat;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int process_feature_event(struct perf_session *session,
+				 union perf_event *event)
+{
+	if (event->feat.feat_id < HEADER_LAST_FEATURE)
+		return perf_event__process_feature(session, event);
+	return 0;
+}
+
+static int __cmd_script(struct child_process *cmd __maybe_unused)
+{
+	int err = 0;
+	struct perf_session *session;
+	struct perf_data data = {
+		.mode = PERF_DATA_MODE_READ,
+		.path = PERF_DATA,
+		.fd   = cmd->out,
+	};
+	struct perf_script script = {
+		.tool = {
+		.sample		 = process_sample_event,
+		.ordered_events	 = true,
+		.ordering_requires_timestamps = true,
+		.feature	 = process_feature_event,
+		.attr		 = perf_event__process_attr,
+		},
+	};
+	struct tpebs_event *e;
+
+	list_for_each_entry(e, &stat_config.tpebs_events, nd) {
+		struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
+
+		if (!new)
+			return -1;
+		new->name = strdup(e->name);
+		new->tpebs_name = strdup(e->tpebs_name);
+		new->count = 0;
+		new->sum = 0;
+		list_add_tail(&new->nd, &stat_config.tpebs_results);
+	}
+
+	kill(cmd->pid, SIGTERM);
+	session = perf_session__new(&data, &script.tool);
+	if (IS_ERR(session))
+		return PTR_ERR(session);
+	script.session = session;
+	err = perf_session__process_events(session);
+	perf_session__delete(session);
+
+	return err;
+}
+
 static int __run_perf_stat(int argc, const char **argv, int run_idx)
 {
 	int interval = stat_config.interval;
@@ -709,12 +866,14 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	struct affinity saved_affinity, *affinity = NULL;
 	int err;
 	bool second_pass = false;
+	struct child_process cmd;
 
 	//Prepare perf record for sampling event retire_latency before fork and prepare workload
 	if (stat_config.tpebs_event_size > 0) {
 		int ret;
 
-		ret = __run_perf_record();
+		pr_debug("perf stat pid = %d\n", getpid());
+		ret = prepare_perf_record(&cmd);
 		if (ret)
 			return ret;
 	}
@@ -924,6 +1083,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 	t1 = rdclock();
 
+	if (stat_config.tpebs_event_size > 0) {
+		int ret;
+
+		pr_debug("pid = %d\n", getpid());
+		pr_debug("cmd.pid = %d\n", cmd.pid);
+
+		ret = __cmd_script(&cmd);
+		close(cmd.out);
+		pr_debug("%d\n", ret);
+	}
+
 	if (stat_config.walltime_run_table)
 		stat_config.walltime_run[run_idx] = t1 - t0;
 
@@ -2714,6 +2884,7 @@ int cmd_stat(int argc, const char **argv)
 	}
 
 	INIT_LIST_HEAD(&stat_config.tpebs_events);
+	INIT_LIST_HEAD(&stat_config.tpebs_results);
 
 	/*
 	 * Metric parsing needs to be delayed as metrics may optimize events
@@ -2921,5 +3092,7 @@ int cmd_stat(int argc, const char **argv)
 	metricgroup__rblist_exit(&stat_config.metric_events);
 	evlist__close_control(stat_config.ctl_fd, stat_config.ctl_fd_ack, &stat_config.ctl_fd_close);
 
+	tpebs_data__delete();
+
 	return status;
 }
diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
index fc16299c915f..2298ca3b370b 100644
--- a/tools/perf/util/data.c
+++ b/tools/perf/util/data.c
@@ -173,6 +173,10 @@ static bool check_pipe(struct perf_data *data)
 	int fd = perf_data__is_read(data) ?
 		 STDIN_FILENO : STDOUT_FILENO;
 
+	if (data->fd > 0) {
+		fd = data->fd;
+	}
+
 	if (!data->path) {
 		if (!fstat(fd, &st) && S_ISFIFO(st.st_mode))
 			is_pipe = true;
diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
index effcc195d7e9..5554d46ad212 100644
--- a/tools/perf/util/data.h
+++ b/tools/perf/util/data.h
@@ -28,6 +28,7 @@ struct perf_data_file {
 
 struct perf_data {
 	const char		*path;
+	int			 fd;
 	struct perf_data_file	 file;
 	bool			 is_pipe;
 	bool			 is_dir;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 6c16e5a0b1fc..8518e2b3e5be 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -691,8 +691,17 @@ static int metricgroup__build_event_string(struct strbuf *events,
 
 		if (p) {
 			struct tpebs_event *new_event = malloc(sizeof(struct tpebs_event));
-			*p = '\0';
+			char *name;
+
 			new_event->tpebs_name = strdup(id);
+			*p = '\0';
+			name = malloc(strlen(id) + 2);
+			if (!name)
+				return -ENOMEM;
+
+			strcpy(name, id);
+			strcat(name, ":p");
+			new_event->name = name;
 			*tpebs_event_size += 1;
 			pr_debug("retire_latency required, tpebs_event_size=%lu, new_event=%s\n",
 			*tpebs_event_size, new_event->name);
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 7c24ed768ff3..1fa12cc3294e 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -71,6 +71,13 @@ struct tpebs_event {
 	const char *name;
 	const char *tpebs_name;
 };
+struct tpebs_retire_lat {
+	struct list_head nd;
+	const char *name;
+	const char *tpebs_name;
+	size_t count;
+	int sum;
+};
 
 struct metric_event *metricgroup__lookup(struct rblist *metric_events,
 					 struct evsel *evsel,
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 9d0186316b29..8e180f13aa2d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -111,6 +111,9 @@ struct perf_stat_config {
 	struct rblist		 metric_events;
 	struct list_head	 tpebs_events;
 	size_t			 tpebs_event_size;
+	struct list_head	 tpebs_results;
+	pid_t			 tpebs_pid;
+	int			 tpebs_pipe;
 	int			 ctl_fd;
 	int			 ctl_fd_ack;
 	bool			 ctl_fd_close;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v1 3/5] perf stat: Add retire latency values into the expr_parse_ctx to prepare for final metric calculation
  2024-02-21  7:20 [RFC PATCH v1 0/5] TPEBS counting mode support weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 1/5] perf stat: Parse and find tpebs events when parsing metrics to prepare for perf record sampling weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric weilin.wang
@ 2024-02-21  7:20 ` weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 4/5] perf stat: Create another thread for sample data processing weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 5/5] perf stat: Add retire latency print functions to print out at the very end of print out weilin.wang
  4 siblings, 0 replies; 11+ messages in thread
From: weilin.wang @ 2024-02-21  7:20 UTC (permalink / raw)
  To: weilin.wang, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Adrian Hunter, Kan Liang
  Cc: linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

From: Weilin Wang <weilin.wang@intel.com>

Retire latency values of events are used in metric formulas. This update adds
code to process data from perf record for required retire latency values.

Signed-off-by: Weilin Wang <weilin.wang@intel.com>
---
 tools/perf/builtin-stat.c     |  1 +
 tools/perf/util/metricgroup.h |  1 +
 tools/perf/util/stat-shadow.c | 18 ++++++++++++++++++
 3 files changed, 20 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e03665af9da2..948caab9ea91 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -792,6 +792,7 @@ static int process_sample_event(struct perf_tool *tool,
 		if (!strcmp(evname, t->name)) {
 			t->count += 1;
 			t->sum += sample->retire_lat;
+			t->val = t->count > 0 ? t->sum/t->count : 0;
 			break;
 		}
 	}
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index 1fa12cc3294e..08af0f447550 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -77,6 +77,7 @@ struct tpebs_retire_lat {
 	const char *tpebs_name;
 	size_t count;
 	int sum;
+	double val;
 };
 
 struct metric_event *metricgroup__lookup(struct rblist *metric_events,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 1c5c3eeba4cf..262a6960b461 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -355,6 +355,19 @@ static void print_nsecs(struct perf_stat_config *config,
 		print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0);
 }
 
+static int prepare_retire_lat(struct expr_parse_ctx *pctx,
+			     struct list_head *retire_lats)
+{
+	int ret = 0;
+	struct tpebs_retire_lat *t;
+	list_for_each_entry(t, retire_lats, nd) {
+		ret = expr__add_id_val(pctx, strdup(t->tpebs_name), t->val);
+		if (ret < 0)
+			return ret;
+	}
+	return ret;
+}
+
 static int prepare_metric(struct evsel **metric_events,
 			  struct metric_ref *metric_refs,
 			  struct expr_parse_ctx *pctx,
@@ -467,6 +480,11 @@ static void generic_metric(struct perf_stat_config *config,
 		pctx->sctx.user_requested_cpu_list = strdup(config->user_requested_cpu_list);
 	pctx->sctx.runtime = runtime;
 	pctx->sctx.system_wide = config->system_wide;
+	i = prepare_retire_lat(pctx, &config->tpebs_results);
+	if (i < 0) {
+		expr__ctx_free(pctx);
+		return;
+	}
 	i = prepare_metric(metric_events, metric_refs, pctx, aggr_idx);
 	if (i < 0) {
 		expr__ctx_free(pctx);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v1 4/5] perf stat: Create another thread for sample data processing
  2024-02-21  7:20 [RFC PATCH v1 0/5] TPEBS counting mode support weilin.wang
                   ` (2 preceding siblings ...)
  2024-02-21  7:20 ` [RFC PATCH v1 3/5] perf stat: Add retire latency values into the expr_parse_ctx to prepare for final metric calculation weilin.wang
@ 2024-02-21  7:20 ` weilin.wang
  2024-02-21  7:20 ` [RFC PATCH v1 5/5] perf stat: Add retire latency print functions to print out at the very end of print out weilin.wang
  4 siblings, 0 replies; 11+ messages in thread
From: weilin.wang @ 2024-02-21  7:20 UTC (permalink / raw)
  To: weilin.wang, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Adrian Hunter, Kan Liang
  Cc: linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

From: Weilin Wang <weilin.wang@intel.com>

Another thread is required to synchronize between perf stat and perf record
when we pass data through pipe.

Signed-off-by: Weilin Wang <weilin.wang@intel.com>
---
 tools/perf/builtin-stat.c | 58 +++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 948caab9ea91..cbbf6cc2ab6b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -777,7 +777,6 @@ static int process_sample_event(struct perf_tool *tool,
 	const char *evname;
 	struct tpebs_retire_lat *t;
 
-	pr_debug("entering function %s\n ", __func__);
 	evname = evsel__name(evsel);
 
 	pr_debug("[%03d] ", sample->cpu);
@@ -808,9 +807,9 @@ static int process_feature_event(struct perf_session *session,
 	return 0;
 }
 
-static int __cmd_script(struct child_process *cmd __maybe_unused)
+static void *__cmd_script(void *arg __maybe_unused)
 {
-	int err = 0;
+	struct child_process *cmd = arg;
 	struct perf_session *session;
 	struct perf_data data = {
 		.mode = PERF_DATA_MODE_READ,
@@ -826,29 +825,15 @@ static int __cmd_script(struct child_process *cmd __maybe_unused)
 		.attr		 = perf_event__process_attr,
 		},
 	};
-	struct tpebs_event *e;
-
-	list_for_each_entry(e, &stat_config.tpebs_events, nd) {
-		struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
-
-		if (!new)
-			return -1;
-		new->name = strdup(e->name);
-		new->tpebs_name = strdup(e->tpebs_name);
-		new->count = 0;
-		new->sum = 0;
-		list_add_tail(&new->nd, &stat_config.tpebs_results);
-	}
 
-	kill(cmd->pid, SIGTERM);
 	session = perf_session__new(&data, &script.tool);
 	if (IS_ERR(session))
-		return PTR_ERR(session);
+		return NULL;
 	script.session = session;
-	err = perf_session__process_events(session);
+	perf_session__process_events(session);
 	perf_session__delete(session);
 
-	return err;
+	return NULL;
 }
 
 static int __run_perf_stat(int argc, const char **argv, int run_idx)
@@ -868,15 +853,37 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	int err;
 	bool second_pass = false;
 	struct child_process cmd;
+	pthread_t thread_script;
 
 	//Prepare perf record for sampling event retire_latency before fork and prepare workload
 	if (stat_config.tpebs_event_size > 0) {
 		int ret;
 
+		struct tpebs_event *e;
 		pr_debug("perf stat pid = %d\n", getpid());
 		ret = prepare_perf_record(&cmd);
 		if (ret)
 			return ret;
+
+		list_for_each_entry(e, &stat_config.tpebs_events, nd) {
+			struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
+
+			if (!new)
+				return -1;
+			new->name = strdup(e->name);
+			new->tpebs_name = strdup(e->tpebs_name);
+			new->count = 0;
+			new->sum = 0;
+			list_add_tail(&new->nd, &stat_config.tpebs_results);
+		}
+
+		if (pthread_create(&thread_script, NULL, __cmd_script, &cmd)) {
+			kill(cmd.pid, SIGTERM);
+			close(cmd.out);
+			pr_err("Could not create thread to process sample data.\n");
+			return -1;
+		}
+		sleep(2);
 	}
 
 	if (forks) {
@@ -1087,12 +1094,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (stat_config.tpebs_event_size > 0) {
 		int ret;
 
-		pr_debug("pid = %d\n", getpid());
-		pr_debug("cmd.pid = %d\n", cmd.pid);
+		pr_debug("Workload finished, finishing record\n");
+		pr_debug("Perf stat pid = %d, Perf record pid = %d\n", getpid(), cmd.pid);
 
-		ret = __cmd_script(&cmd);
+		kill(cmd.pid, SIGTERM);
+		pthread_join(thread_script, NULL);
 		close(cmd.out);
-		pr_debug("%d\n", ret);
+		ret = finish_command(&cmd);
+		if (ret != -ERR_RUN_COMMAND_WAITPID_SIGNAL)
+			return ret;
 	}
 
 	if (stat_config.walltime_run_table)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH v1 5/5] perf stat: Add retire latency print functions to print out at the very end of print out
  2024-02-21  7:20 [RFC PATCH v1 0/5] TPEBS counting mode support weilin.wang
                   ` (3 preceding siblings ...)
  2024-02-21  7:20 ` [RFC PATCH v1 4/5] perf stat: Create another thread for sample data processing weilin.wang
@ 2024-02-21  7:20 ` weilin.wang
  4 siblings, 0 replies; 11+ messages in thread
From: weilin.wang @ 2024-02-21  7:20 UTC (permalink / raw)
  To: weilin.wang, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Adrian Hunter, Kan Liang
  Cc: linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

From: Weilin Wang <weilin.wang@intel.com>

Add print out functions so that users could read retire latency values.

Signed-off-by: Weilin Wang <weilin.wang@intel.com>
---
 tools/perf/util/stat-display.c | 65 ++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index afe6db8e7bf4..ea0d67d8993e 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -21,6 +21,7 @@
 #include "iostat.h"
 #include "pmu.h"
 #include "pmus.h"
+#include "metricgroup.h"
 
 #define CNTR_NOT_SUPPORTED	"<not supported>"
 #define CNTR_NOT_COUNTED	"<not counted>"
@@ -34,6 +35,7 @@
 #define COMM_LEN     16
 #define PID_LEN       7
 #define CPUS_LEN      4
+#define RETIRE_LEN    8
 
 static int aggr_header_lens[] = {
 	[AGGR_CORE] 	= 18,
@@ -415,6 +417,67 @@ static void print_metric_std(struct perf_stat_config *config,
 	fprintf(out, " %-*s", METRIC_LEN - n - 1, unit);
 }
 
+static void print_retire_lat_std(struct perf_stat_config *config,
+				 struct outstate *os)
+{
+	FILE *out = os->fh;
+	bool newline = os->newline;
+	struct tpebs_retire_lat *t;
+	struct list_head *retire_lats = &config->tpebs_results;
+
+	list_for_each_entry(t, retire_lats, nd) {
+		if (newline)
+			do_new_line_std(config, os);
+		fprintf(out, "%'*.2f %-*s", COUNTS_LEN, t->val, EVNAME_LEN, t->name);
+		fprintf(out, "%*ld %*d\n", RETIRE_LEN, t->count,
+			 RETIRE_LEN, t->sum);
+	}
+}
+
+static void print_retire_lat_csv(struct perf_stat_config *config,
+				 struct outstate *os)
+{
+	FILE *out = os->fh;
+	struct tpebs_retire_lat *t;
+	struct list_head *retire_lats = &config->tpebs_results;
+	const char *sep = config->csv_sep;
+
+	list_for_each_entry(t, retire_lats, nd) {
+		fprintf(out, "%f%s%s%s%s%ld%s%d\n", t->val, sep, sep, t->name, sep,
+			t->count, sep, t->sum);
+	}
+}
+
+static void print_retire_lat_json(struct perf_stat_config *config,
+				  struct outstate *os)
+{
+	FILE *out = os->fh;
+	struct tpebs_retire_lat *t;
+	struct list_head *retire_lats = &config->tpebs_results;
+
+	fprintf(out, "{");
+	list_for_each_entry(t, retire_lats, nd) {
+		fprintf(out, "\"retire_latency-value\" : \"%f\", ", t->val);
+		fprintf(out, "\"event-name\" : \"%s\"", t->name);
+		fprintf(out, "\"sample-counts\" : \"%ld\"", t->count);
+		fprintf(out, "\"retire_latency-sum\" : \"%d\"", t->sum);
+	}
+	fprintf(out, "}");
+}
+
+static void print_retire_lat(struct perf_stat_config *config,
+			     struct outstate *os)
+{
+	if (!&config->tpebs_results)
+		return;
+	if (config->json_output)
+		print_retire_lat_json(config, os);
+	else if (config->csv_output)
+		print_retire_lat_csv(config, os);
+	else
+		print_retire_lat_std(config, os);
+}
+
 static void new_line_csv(struct perf_stat_config *config, void *ctx)
 {
 	struct outstate *os = ctx;
@@ -1591,6 +1654,8 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
 		break;
 	}
 
+	print_retire_lat(config, &os);
+
 	print_footer(config);
 
 	fflush(config->output);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
  2024-02-21  7:20 ` [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric weilin.wang
@ 2024-02-21 17:52   ` Arnaldo Carvalho de Melo
  2024-02-21 20:34     ` Ian Rogers
  0 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-02-21 17:52 UTC (permalink / raw)
  To: weilin.wang
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
	linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

On Wed, Feb 21, 2024 at 02:20:56AM -0500, weilin.wang@intel.com wrote:
> From: Weilin Wang <weilin.wang@intel.com>

You wrote no description for this patch, please add one and show
examples of the feature being used, if possible.

See below for more comments.
 
> Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> +static int __cmd_script(struct child_process *cmd __maybe_unused)
> +{
> +	int err = 0;
> +	struct perf_session *session;
> +	struct perf_data data = {
> +		.mode = PERF_DATA_MODE_READ,
> +		.path = PERF_DATA,
> +		.fd   = cmd->out,
> +	};
> +	struct perf_script script = {
> +		.tool = {
> +		.sample		 = process_sample_event,
> +		.ordered_events	 = true,
> +		.ordering_requires_timestamps = true,
> +		.feature	 = process_feature_event,
> +		.attr		 = perf_event__process_attr,
> +		},
> +	};
> +	struct tpebs_event *e;
> +
> +	list_for_each_entry(e, &stat_config.tpebs_events, nd) {
> +		struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
> +
> +		if (!new)
> +			return -1;
> +		new->name = strdup(e->name);
> +		new->tpebs_name = strdup(e->tpebs_name);
> +		new->count = 0;
> +		new->sum = 0;

Without even having thought that much about this overall architecture,
that looks too heavy at first sight, the above is done in tools/perf/
as:

void tpebs_retire_lat__delete(struct tpebs_retire_lat *retire_lat)
{
	if (retire_lat == NULL)
		return;

	zfree(&retire_lat->tpebs_name);
	zfree(&retire_lat->tpebs_name);
}

struct tpebs_retire_lat__new(tpebs_event *e)
{
	struct tpebs_retire_lat *retire_lat = zalloc(sizeof(*retire_lat));

	if (retire_lat != NULL) {
		retire_lat->name = strdup(e->name);
		retire_lat->tpebs_name = strdup(e->tpebs_name);

		if (retire_lat->name == NULL || retire_lat->tpebs_name == NULL) {
			tpebs_retire_lat__delete(retire_lat);
			retire_lat = NULL;
		}
	}

	return retire_lat;
}

And then you call the constructor  in that loop, and the destructor at
some point when those data structures are not needed.

We do it because perf has a TUI mode and we may end up calling tools
from them in a long running session, so we need to avoid leaks.

Also can we somehow hide Intel specific terms in arch specific files
while leaving something generic, possibly implementable in other arches
in the core code? I mean hiding clearly intel specific stuff like the
"tpebs" term in tools/perf/arch/x86/.

> +		list_add_tail(&new->nd, &stat_config.tpebs_results);
> +	}
> +
> +	kill(cmd->pid, SIGTERM);
> +	session = perf_session__new(&data, &script.tool);
> +	if (IS_ERR(session))
> +		return PTR_ERR(session);
> +	script.session = session;
> +	err = perf_session__process_events(session);
> +	perf_session__delete(session);
> +
> +	return err;
> +}
> +
>  static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  {
>  	int interval = stat_config.interval;
> @@ -709,12 +866,14 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	struct affinity saved_affinity, *affinity = NULL;
>  	int err;
>  	bool second_pass = false;
> +	struct child_process cmd;
>  
>  	//Prepare perf record for sampling event retire_latency before fork and prepare workload
>  	if (stat_config.tpebs_event_size > 0) {
>  		int ret;
>  
> -		ret = __run_perf_record();
> +		pr_debug("perf stat pid = %d\n", getpid());
> +		ret = prepare_perf_record(&cmd);
>  		if (ret)
>  			return ret;
>  	}
> @@ -924,6 +1083,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  
>  	t1 = rdclock();
>  
> +	if (stat_config.tpebs_event_size > 0) {
> +		int ret;
> +
> +		pr_debug("pid = %d\n", getpid());
> +		pr_debug("cmd.pid = %d\n", cmd.pid);
> +
> +		ret = __cmd_script(&cmd);
> +		close(cmd.out);
> +		pr_debug("%d\n", ret);
> +	}
> +
>  	if (stat_config.walltime_run_table)
>  		stat_config.walltime_run[run_idx] = t1 - t0;
>  
> @@ -2714,6 +2884,7 @@ int cmd_stat(int argc, const char **argv)
>  	}
>  
>  	INIT_LIST_HEAD(&stat_config.tpebs_events);
> +	INIT_LIST_HEAD(&stat_config.tpebs_results);
>  
>  	/*
>  	 * Metric parsing needs to be delayed as metrics may optimize events
> @@ -2921,5 +3092,7 @@ int cmd_stat(int argc, const char **argv)
>  	metricgroup__rblist_exit(&stat_config.metric_events);
>  	evlist__close_control(stat_config.ctl_fd, stat_config.ctl_fd_ack, &stat_config.ctl_fd_close);
>  
> +	tpebs_data__delete();
> +
>  	return status;
>  }
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index fc16299c915f..2298ca3b370b 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -173,6 +173,10 @@ static bool check_pipe(struct perf_data *data)
>  	int fd = perf_data__is_read(data) ?
>  		 STDIN_FILENO : STDOUT_FILENO;
>  
> +	if (data->fd > 0) {
> +		fd = data->fd;
> +	}
> +
>  	if (!data->path) {
>  		if (!fstat(fd, &st) && S_ISFIFO(st.st_mode))
>  			is_pipe = true;
> diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> index effcc195d7e9..5554d46ad212 100644
> --- a/tools/perf/util/data.h
> +++ b/tools/perf/util/data.h
> @@ -28,6 +28,7 @@ struct perf_data_file {
>  
>  struct perf_data {
>  	const char		*path;
> +	int			 fd;
>  	struct perf_data_file	 file;
>  	bool			 is_pipe;
>  	bool			 is_dir;
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 6c16e5a0b1fc..8518e2b3e5be 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -691,8 +691,17 @@ static int metricgroup__build_event_string(struct strbuf *events,
>  
>  		if (p) {
>  			struct tpebs_event *new_event = malloc(sizeof(struct tpebs_event));
> -			*p = '\0';
> +			char *name;
> +
>  			new_event->tpebs_name = strdup(id);
> +			*p = '\0';
> +			name = malloc(strlen(id) + 2);
> +			if (!name)
> +				return -ENOMEM;
> +
> +			strcpy(name, id);
> +			strcat(name, ":p");
> +			new_event->name = name;

For such cases we use asprintf(), that allocates and formats the string
in one call, look for examples in other tools/perf/ files.

>  			*tpebs_event_size += 1;
>  			pr_debug("retire_latency required, tpebs_event_size=%lu, new_event=%s\n",
>  			*tpebs_event_size, new_event->name);
> diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> index 7c24ed768ff3..1fa12cc3294e 100644
> --- a/tools/perf/util/metricgroup.h
> +++ b/tools/perf/util/metricgroup.h
> @@ -71,6 +71,13 @@ struct tpebs_event {
>  	const char *name;
>  	const char *tpebs_name;
>  };
> +struct tpebs_retire_lat {
> +	struct list_head nd;
> +	const char *name;
> +	const char *tpebs_name;
> +	size_t count;
> +	int sum;
> +};

Here you declare the constructor and destructor I suggested above.
  
>  struct metric_event *metricgroup__lookup(struct rblist *metric_events,
>  					 struct evsel *evsel,
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 9d0186316b29..8e180f13aa2d 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -111,6 +111,9 @@ struct perf_stat_config {
>  	struct rblist		 metric_events;
>  	struct list_head	 tpebs_events;
>  	size_t			 tpebs_event_size;
> +	struct list_head	 tpebs_results;
> +	pid_t			 tpebs_pid;
> +	int			 tpebs_pipe;
>  	int			 ctl_fd;
>  	int			 ctl_fd_ack;
>  	bool			 ctl_fd_close;
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
  2024-02-21 17:52   ` Arnaldo Carvalho de Melo
@ 2024-02-21 20:34     ` Ian Rogers
  2024-02-23  7:03       ` Namhyung Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2024-02-21 20:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: weilin.wang, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Adrian Hunter, Kan Liang,
	linux-perf-users, linux-kernel, Perry Taylor, Samantha Alt,
	Caleb Biggers

On Wed, Feb 21, 2024 at 9:52 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Wed, Feb 21, 2024 at 02:20:56AM -0500, weilin.wang@intel.com wrote:
> > From: Weilin Wang <weilin.wang@intel.com>
>
> You wrote no description for this patch, please add one and show
> examples of the feature being used, if possible.
>
> See below for more comments.
>
> > Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> > +static int __cmd_script(struct child_process *cmd __maybe_unused)
> > +{
> > +     int err = 0;
> > +     struct perf_session *session;
> > +     struct perf_data data = {
> > +             .mode = PERF_DATA_MODE_READ,
> > +             .path = PERF_DATA,
> > +             .fd   = cmd->out,
> > +     };
> > +     struct perf_script script = {
> > +             .tool = {
> > +             .sample          = process_sample_event,
> > +             .ordered_events  = true,
> > +             .ordering_requires_timestamps = true,
> > +             .feature         = process_feature_event,
> > +             .attr            = perf_event__process_attr,
> > +             },
> > +     };
> > +     struct tpebs_event *e;
> > +
> > +     list_for_each_entry(e, &stat_config.tpebs_events, nd) {
> > +             struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
> > +
> > +             if (!new)
> > +                     return -1;
> > +             new->name = strdup(e->name);
> > +             new->tpebs_name = strdup(e->tpebs_name);
> > +             new->count = 0;
> > +             new->sum = 0;
>
> Without even having thought that much about this overall architecture,
> that looks too heavy at first sight, the above is done in tools/perf/
> as:
>
> void tpebs_retire_lat__delete(struct tpebs_retire_lat *retire_lat)
> {
>         if (retire_lat == NULL)
>                 return;
>
>         zfree(&retire_lat->tpebs_name);
>         zfree(&retire_lat->tpebs_name);
> }
>
> struct tpebs_retire_lat__new(tpebs_event *e)
> {
>         struct tpebs_retire_lat *retire_lat = zalloc(sizeof(*retire_lat));
>
>         if (retire_lat != NULL) {
>                 retire_lat->name = strdup(e->name);
>                 retire_lat->tpebs_name = strdup(e->tpebs_name);
>
>                 if (retire_lat->name == NULL || retire_lat->tpebs_name == NULL) {
>                         tpebs_retire_lat__delete(retire_lat);
>                         retire_lat = NULL;
>                 }
>         }
>
>         return retire_lat;
> }
>
> And then you call the constructor  in that loop, and the destructor at
> some point when those data structures are not needed.
>
> We do it because perf has a TUI mode and we may end up calling tools
> from them in a long running session, so we need to avoid leaks.
>
> Also can we somehow hide Intel specific terms in arch specific files
> while leaving something generic, possibly implementable in other arches
> in the core code? I mean hiding clearly intel specific stuff like the
> "tpebs" term in tools/perf/arch/x86/.

Thanks Arnaldo, TPEBS support is necessary to support TMA metrics 4.7
on meteorlake. The timed values replace hard coded constants that
assume worst case latencies. You can see the metrics here:
https://github.com/intel/perfmon/blob/main/TMA_Metrics-full.csv
in the MTL (meteorlake) column row 29 there is:
MEM_INST_RETIRED.STLB_HIT_LOADS*min($PEBS, #Mem_STLB_Hit_Cost) / CLKS
+ Load_STLB_Miss
where the $PEBS is supposed to be replaced with the latency from a
sample of MEM_INST_RETIRED.STLB_HIT_LOADS. There are meteorlake
machines shipping but currently there are no perf metrics.

Weilin raised the TPEBS problem in the LPC 2023 talk, the issue being
that sampling and counting don't really exist in the current perf tool
code at the same time. BPF could be a workaround but permissions are
an issue. Perhaps leader sampling but then what to do if two latencies
are needed. Forking perf to do this is an expedient and ideally we'd
not do it.

I'm against stuff going in the arch directory in general. If something
is specific to a PMU, let's special case the logic for that PMU in the
tool. The arch directory gets us into messes like:
https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/
Where a heterogenous/hybrid/big.little fix is in the x86 arch folder
(overriding a weak symbol) but not in the arm or other ones leading to
an arm bug.

I think the idea of a count coming from an external or tool source is
something of a more generic concept. I think what Weilin is doing here
is setting the groundwork for doing that, a first implementation. I'd
like the expr literals, like #smt_on, #num_cpus, etc. to be more like
tool events such as duration_time. I think we can move in the
direction that Weilin is adding here and then iterate to clean up
these things, hopefully move them to events that then other tools
could use, etc.

Thanks,
Ian




> > +             list_add_tail(&new->nd, &stat_config.tpebs_results);
> > +     }
> > +
> > +     kill(cmd->pid, SIGTERM);
> > +     session = perf_session__new(&data, &script.tool);
> > +     if (IS_ERR(session))
> > +             return PTR_ERR(session);
> > +     script.session = session;
> > +     err = perf_session__process_events(session);
> > +     perf_session__delete(session);
> > +
> > +     return err;
> > +}
> > +
> >  static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >  {
> >       int interval = stat_config.interval;
> > @@ -709,12 +866,14 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >       struct affinity saved_affinity, *affinity = NULL;
> >       int err;
> >       bool second_pass = false;
> > +     struct child_process cmd;
> >
> >       //Prepare perf record for sampling event retire_latency before fork and prepare workload
> >       if (stat_config.tpebs_event_size > 0) {
> >               int ret;
> >
> > -             ret = __run_perf_record();
> > +             pr_debug("perf stat pid = %d\n", getpid());
> > +             ret = prepare_perf_record(&cmd);
> >               if (ret)
> >                       return ret;
> >       }
> > @@ -924,6 +1083,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >
> >       t1 = rdclock();
> >
> > +     if (stat_config.tpebs_event_size > 0) {
> > +             int ret;
> > +
> > +             pr_debug("pid = %d\n", getpid());
> > +             pr_debug("cmd.pid = %d\n", cmd.pid);
> > +
> > +             ret = __cmd_script(&cmd);
> > +             close(cmd.out);
> > +             pr_debug("%d\n", ret);
> > +     }
> > +
> >       if (stat_config.walltime_run_table)
> >               stat_config.walltime_run[run_idx] = t1 - t0;
> >
> > @@ -2714,6 +2884,7 @@ int cmd_stat(int argc, const char **argv)
> >       }
> >
> >       INIT_LIST_HEAD(&stat_config.tpebs_events);
> > +     INIT_LIST_HEAD(&stat_config.tpebs_results);
> >
> >       /*
> >        * Metric parsing needs to be delayed as metrics may optimize events
> > @@ -2921,5 +3092,7 @@ int cmd_stat(int argc, const char **argv)
> >       metricgroup__rblist_exit(&stat_config.metric_events);
> >       evlist__close_control(stat_config.ctl_fd, stat_config.ctl_fd_ack, &stat_config.ctl_fd_close);
> >
> > +     tpebs_data__delete();
> > +
> >       return status;
> >  }
> > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> > index fc16299c915f..2298ca3b370b 100644
> > --- a/tools/perf/util/data.c
> > +++ b/tools/perf/util/data.c
> > @@ -173,6 +173,10 @@ static bool check_pipe(struct perf_data *data)
> >       int fd = perf_data__is_read(data) ?
> >                STDIN_FILENO : STDOUT_FILENO;
> >
> > +     if (data->fd > 0) {
> > +             fd = data->fd;
> > +     }
> > +
> >       if (!data->path) {
> >               if (!fstat(fd, &st) && S_ISFIFO(st.st_mode))
> >                       is_pipe = true;
> > diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> > index effcc195d7e9..5554d46ad212 100644
> > --- a/tools/perf/util/data.h
> > +++ b/tools/perf/util/data.h
> > @@ -28,6 +28,7 @@ struct perf_data_file {
> >
> >  struct perf_data {
> >       const char              *path;
> > +     int                      fd;
> >       struct perf_data_file    file;
> >       bool                     is_pipe;
> >       bool                     is_dir;
> > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > index 6c16e5a0b1fc..8518e2b3e5be 100644
> > --- a/tools/perf/util/metricgroup.c
> > +++ b/tools/perf/util/metricgroup.c
> > @@ -691,8 +691,17 @@ static int metricgroup__build_event_string(struct strbuf *events,
> >
> >               if (p) {
> >                       struct tpebs_event *new_event = malloc(sizeof(struct tpebs_event));
> > -                     *p = '\0';
> > +                     char *name;
> > +
> >                       new_event->tpebs_name = strdup(id);
> > +                     *p = '\0';
> > +                     name = malloc(strlen(id) + 2);
> > +                     if (!name)
> > +                             return -ENOMEM;
> > +
> > +                     strcpy(name, id);
> > +                     strcat(name, ":p");
> > +                     new_event->name = name;
>
> For such cases we use asprintf(), that allocates and formats the string
> in one call, look for examples in other tools/perf/ files.
>
> >                       *tpebs_event_size += 1;
> >                       pr_debug("retire_latency required, tpebs_event_size=%lu, new_event=%s\n",
> >                       *tpebs_event_size, new_event->name);
> > diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> > index 7c24ed768ff3..1fa12cc3294e 100644
> > --- a/tools/perf/util/metricgroup.h
> > +++ b/tools/perf/util/metricgroup.h
> > @@ -71,6 +71,13 @@ struct tpebs_event {
> >       const char *name;
> >       const char *tpebs_name;
> >  };
> > +struct tpebs_retire_lat {
> > +     struct list_head nd;
> > +     const char *name;
> > +     const char *tpebs_name;
> > +     size_t count;
> > +     int sum;
> > +};
>
> Here you declare the constructor and destructor I suggested above.
>
> >  struct metric_event *metricgroup__lookup(struct rblist *metric_events,
> >                                        struct evsel *evsel,
> > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > index 9d0186316b29..8e180f13aa2d 100644
> > --- a/tools/perf/util/stat.h
> > +++ b/tools/perf/util/stat.h
> > @@ -111,6 +111,9 @@ struct perf_stat_config {
> >       struct rblist            metric_events;
> >       struct list_head         tpebs_events;
> >       size_t                   tpebs_event_size;
> > +     struct list_head         tpebs_results;
> > +     pid_t                    tpebs_pid;
> > +     int                      tpebs_pipe;
> >       int                      ctl_fd;
> >       int                      ctl_fd_ack;
> >       bool                     ctl_fd_close;
> > --
> > 2.43.0
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
  2024-02-21 20:34     ` Ian Rogers
@ 2024-02-23  7:03       ` Namhyung Kim
  2024-02-23  7:47         ` Ian Rogers
  0 siblings, 1 reply; 11+ messages in thread
From: Namhyung Kim @ 2024-02-23  7:03 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, weilin.wang, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers

Hi,

On Wed, Feb 21, 2024 at 12:34 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Feb 21, 2024 at 9:52 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Wed, Feb 21, 2024 at 02:20:56AM -0500, weilin.wang@intel.com wrote:
> > > From: Weilin Wang <weilin.wang@intel.com>
> >
> > You wrote no description for this patch, please add one and show
> > examples of the feature being used, if possible.
> >
> > See below for more comments.
> >
> > > Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> > > +static int __cmd_script(struct child_process *cmd __maybe_unused)
> > > +{
> > > +     int err = 0;
> > > +     struct perf_session *session;
> > > +     struct perf_data data = {
> > > +             .mode = PERF_DATA_MODE_READ,
> > > +             .path = PERF_DATA,
> > > +             .fd   = cmd->out,
> > > +     };
> > > +     struct perf_script script = {
> > > +             .tool = {
> > > +             .sample          = process_sample_event,
> > > +             .ordered_events  = true,
> > > +             .ordering_requires_timestamps = true,
> > > +             .feature         = process_feature_event,
> > > +             .attr            = perf_event__process_attr,
> > > +             },
> > > +     };
> > > +     struct tpebs_event *e;
> > > +
> > > +     list_for_each_entry(e, &stat_config.tpebs_events, nd) {
> > > +             struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
> > > +
> > > +             if (!new)
> > > +                     return -1;
> > > +             new->name = strdup(e->name);
> > > +             new->tpebs_name = strdup(e->tpebs_name);
> > > +             new->count = 0;
> > > +             new->sum = 0;
> >
> > Without even having thought that much about this overall architecture,
> > that looks too heavy at first sight, the above is done in tools/perf/
> > as:
> >
> > void tpebs_retire_lat__delete(struct tpebs_retire_lat *retire_lat)
> > {
> >         if (retire_lat == NULL)
> >                 return;
> >
> >         zfree(&retire_lat->tpebs_name);
> >         zfree(&retire_lat->tpebs_name);
> > }
> >
> > struct tpebs_retire_lat__new(tpebs_event *e)
> > {
> >         struct tpebs_retire_lat *retire_lat = zalloc(sizeof(*retire_lat));
> >
> >         if (retire_lat != NULL) {
> >                 retire_lat->name = strdup(e->name);
> >                 retire_lat->tpebs_name = strdup(e->tpebs_name);
> >
> >                 if (retire_lat->name == NULL || retire_lat->tpebs_name == NULL) {
> >                         tpebs_retire_lat__delete(retire_lat);
> >                         retire_lat = NULL;
> >                 }
> >         }
> >
> >         return retire_lat;
> > }
> >
> > And then you call the constructor  in that loop, and the destructor at
> > some point when those data structures are not needed.
> >
> > We do it because perf has a TUI mode and we may end up calling tools
> > from them in a long running session, so we need to avoid leaks.
> >
> > Also can we somehow hide Intel specific terms in arch specific files
> > while leaving something generic, possibly implementable in other arches
> > in the core code? I mean hiding clearly intel specific stuff like the
> > "tpebs" term in tools/perf/arch/x86/.
>
> Thanks Arnaldo, TPEBS support is necessary to support TMA metrics 4.7
> on meteorlake. The timed values replace hard coded constants that
> assume worst case latencies. You can see the metrics here:
> https://github.com/intel/perfmon/blob/main/TMA_Metrics-full.csv
> in the MTL (meteorlake) column row 29 there is:
> MEM_INST_RETIRED.STLB_HIT_LOADS*min($PEBS, #Mem_STLB_Hit_Cost) / CLKS
> + Load_STLB_Miss
> where the $PEBS is supposed to be replaced with the latency from a
> sample of MEM_INST_RETIRED.STLB_HIT_LOADS. There are meteorlake
> machines shipping but currently there are no perf metrics.
>
> Weilin raised the TPEBS problem in the LPC 2023 talk, the issue being
> that sampling and counting don't really exist in the current perf tool
> code at the same time. BPF could be a workaround but permissions are
> an issue. Perhaps leader sampling but then what to do if two latencies
> are needed. Forking perf to do this is an expedient and ideally we'd
> not do it.

Even with BPF, I think it needs two instances of an event - one for
counting and the other for sampling, right?  I wonder if it can just
use a single event for sampling and show the sum of periods in
PERF_SAMPLE_READ.

I'm not sure if an event group can have sampling and non-sampling
events at the same time.  But it can be done without groups then.
Anyway what's the issue with two latencies?

Thanks,
Namhyung

>
> I'm against stuff going in the arch directory in general. If something
> is specific to a PMU, let's special case the logic for that PMU in the
> tool. The arch directory gets us into messes like:
> https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/
> Where a heterogenous/hybrid/big.little fix is in the x86 arch folder
> (overriding a weak symbol) but not in the arm or other ones leading to
> an arm bug.
>
> I think the idea of a count coming from an external or tool source is
> something of a more generic concept. I think what Weilin is doing here
> is setting the groundwork for doing that, a first implementation. I'd
> like the expr literals, like #smt_on, #num_cpus, etc. to be more like
> tool events such as duration_time. I think we can move in the
> direction that Weilin is adding here and then iterate to clean up
> these things, hopefully move them to events that then other tools
> could use, etc.
>
> Thanks,
> Ian
>
>
>
>
> > > +             list_add_tail(&new->nd, &stat_config.tpebs_results);
> > > +     }
> > > +
> > > +     kill(cmd->pid, SIGTERM);
> > > +     session = perf_session__new(&data, &script.tool);
> > > +     if (IS_ERR(session))
> > > +             return PTR_ERR(session);
> > > +     script.session = session;
> > > +     err = perf_session__process_events(session);
> > > +     perf_session__delete(session);
> > > +
> > > +     return err;
> > > +}
> > > +
> > >  static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > >  {
> > >       int interval = stat_config.interval;
> > > @@ -709,12 +866,14 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > >       struct affinity saved_affinity, *affinity = NULL;
> > >       int err;
> > >       bool second_pass = false;
> > > +     struct child_process cmd;
> > >
> > >       //Prepare perf record for sampling event retire_latency before fork and prepare workload
> > >       if (stat_config.tpebs_event_size > 0) {
> > >               int ret;
> > >
> > > -             ret = __run_perf_record();
> > > +             pr_debug("perf stat pid = %d\n", getpid());
> > > +             ret = prepare_perf_record(&cmd);
> > >               if (ret)
> > >                       return ret;
> > >       }
> > > @@ -924,6 +1083,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > >
> > >       t1 = rdclock();
> > >
> > > +     if (stat_config.tpebs_event_size > 0) {
> > > +             int ret;
> > > +
> > > +             pr_debug("pid = %d\n", getpid());
> > > +             pr_debug("cmd.pid = %d\n", cmd.pid);
> > > +
> > > +             ret = __cmd_script(&cmd);
> > > +             close(cmd.out);
> > > +             pr_debug("%d\n", ret);
> > > +     }
> > > +
> > >       if (stat_config.walltime_run_table)
> > >               stat_config.walltime_run[run_idx] = t1 - t0;
> > >
> > > @@ -2714,6 +2884,7 @@ int cmd_stat(int argc, const char **argv)
> > >       }
> > >
> > >       INIT_LIST_HEAD(&stat_config.tpebs_events);
> > > +     INIT_LIST_HEAD(&stat_config.tpebs_results);
> > >
> > >       /*
> > >        * Metric parsing needs to be delayed as metrics may optimize events
> > > @@ -2921,5 +3092,7 @@ int cmd_stat(int argc, const char **argv)
> > >       metricgroup__rblist_exit(&stat_config.metric_events);
> > >       evlist__close_control(stat_config.ctl_fd, stat_config.ctl_fd_ack, &stat_config.ctl_fd_close);
> > >
> > > +     tpebs_data__delete();
> > > +
> > >       return status;
> > >  }
> > > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> > > index fc16299c915f..2298ca3b370b 100644
> > > --- a/tools/perf/util/data.c
> > > +++ b/tools/perf/util/data.c
> > > @@ -173,6 +173,10 @@ static bool check_pipe(struct perf_data *data)
> > >       int fd = perf_data__is_read(data) ?
> > >                STDIN_FILENO : STDOUT_FILENO;
> > >
> > > +     if (data->fd > 0) {
> > > +             fd = data->fd;
> > > +     }
> > > +
> > >       if (!data->path) {
> > >               if (!fstat(fd, &st) && S_ISFIFO(st.st_mode))
> > >                       is_pipe = true;
> > > diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> > > index effcc195d7e9..5554d46ad212 100644
> > > --- a/tools/perf/util/data.h
> > > +++ b/tools/perf/util/data.h
> > > @@ -28,6 +28,7 @@ struct perf_data_file {
> > >
> > >  struct perf_data {
> > >       const char              *path;
> > > +     int                      fd;
> > >       struct perf_data_file    file;
> > >       bool                     is_pipe;
> > >       bool                     is_dir;
> > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > > index 6c16e5a0b1fc..8518e2b3e5be 100644
> > > --- a/tools/perf/util/metricgroup.c
> > > +++ b/tools/perf/util/metricgroup.c
> > > @@ -691,8 +691,17 @@ static int metricgroup__build_event_string(struct strbuf *events,
> > >
> > >               if (p) {
> > >                       struct tpebs_event *new_event = malloc(sizeof(struct tpebs_event));
> > > -                     *p = '\0';
> > > +                     char *name;
> > > +
> > >                       new_event->tpebs_name = strdup(id);
> > > +                     *p = '\0';
> > > +                     name = malloc(strlen(id) + 2);
> > > +                     if (!name)
> > > +                             return -ENOMEM;
> > > +
> > > +                     strcpy(name, id);
> > > +                     strcat(name, ":p");
> > > +                     new_event->name = name;
> >
> > For such cases we use asprintf(), that allocates and formats the string
> > in one call, look for examples in other tools/perf/ files.
> >
> > >                       *tpebs_event_size += 1;
> > >                       pr_debug("retire_latency required, tpebs_event_size=%lu, new_event=%s\n",
> > >                       *tpebs_event_size, new_event->name);
> > > diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> > > index 7c24ed768ff3..1fa12cc3294e 100644
> > > --- a/tools/perf/util/metricgroup.h
> > > +++ b/tools/perf/util/metricgroup.h
> > > @@ -71,6 +71,13 @@ struct tpebs_event {
> > >       const char *name;
> > >       const char *tpebs_name;
> > >  };
> > > +struct tpebs_retire_lat {
> > > +     struct list_head nd;
> > > +     const char *name;
> > > +     const char *tpebs_name;
> > > +     size_t count;
> > > +     int sum;
> > > +};
> >
> > Here you declare the constructor and destructor I suggested above.
> >
> > >  struct metric_event *metricgroup__lookup(struct rblist *metric_events,
> > >                                        struct evsel *evsel,
> > > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > > index 9d0186316b29..8e180f13aa2d 100644
> > > --- a/tools/perf/util/stat.h
> > > +++ b/tools/perf/util/stat.h
> > > @@ -111,6 +111,9 @@ struct perf_stat_config {
> > >       struct rblist            metric_events;
> > >       struct list_head         tpebs_events;
> > >       size_t                   tpebs_event_size;
> > > +     struct list_head         tpebs_results;
> > > +     pid_t                    tpebs_pid;
> > > +     int                      tpebs_pipe;
> > >       int                      ctl_fd;
> > >       int                      ctl_fd_ack;
> > >       bool                     ctl_fd_close;
> > > --
> > > 2.43.0
> > >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
  2024-02-23  7:03       ` Namhyung Kim
@ 2024-02-23  7:47         ` Ian Rogers
  2024-02-24  2:44           ` Namhyung Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2024-02-23  7:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, weilin.wang, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers

On Thu, Feb 22, 2024 at 11:03 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi,
>
> On Wed, Feb 21, 2024 at 12:34 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Wed, Feb 21, 2024 at 9:52 AM Arnaldo Carvalho de Melo
> > <acme@kernel.org> wrote:
> > >
> > > On Wed, Feb 21, 2024 at 02:20:56AM -0500, weilin.wang@intel.com wrote:
> > > > From: Weilin Wang <weilin.wang@intel.com>
> > >
> > > You wrote no description for this patch, please add one and show
> > > examples of the feature being used, if possible.
> > >
> > > See below for more comments.
> > >
> > > > Signed-off-by: Weilin Wang <weilin.wang@intel.com>
> > > > +static int __cmd_script(struct child_process *cmd __maybe_unused)
> > > > +{
> > > > +     int err = 0;
> > > > +     struct perf_session *session;
> > > > +     struct perf_data data = {
> > > > +             .mode = PERF_DATA_MODE_READ,
> > > > +             .path = PERF_DATA,
> > > > +             .fd   = cmd->out,
> > > > +     };
> > > > +     struct perf_script script = {
> > > > +             .tool = {
> > > > +             .sample          = process_sample_event,
> > > > +             .ordered_events  = true,
> > > > +             .ordering_requires_timestamps = true,
> > > > +             .feature         = process_feature_event,
> > > > +             .attr            = perf_event__process_attr,
> > > > +             },
> > > > +     };
> > > > +     struct tpebs_event *e;
> > > > +
> > > > +     list_for_each_entry(e, &stat_config.tpebs_events, nd) {
> > > > +             struct tpebs_retire_lat *new = malloc(sizeof(struct tpebs_retire_lat));
> > > > +
> > > > +             if (!new)
> > > > +                     return -1;
> > > > +             new->name = strdup(e->name);
> > > > +             new->tpebs_name = strdup(e->tpebs_name);
> > > > +             new->count = 0;
> > > > +             new->sum = 0;
> > >
> > > Without even having thought that much about this overall architecture,
> > > that looks too heavy at first sight, the above is done in tools/perf/
> > > as:
> > >
> > > void tpebs_retire_lat__delete(struct tpebs_retire_lat *retire_lat)
> > > {
> > >         if (retire_lat == NULL)
> > >                 return;
> > >
> > >         zfree(&retire_lat->tpebs_name);
> > >         zfree(&retire_lat->tpebs_name);
> > > }
> > >
> > > struct tpebs_retire_lat__new(tpebs_event *e)
> > > {
> > >         struct tpebs_retire_lat *retire_lat = zalloc(sizeof(*retire_lat));
> > >
> > >         if (retire_lat != NULL) {
> > >                 retire_lat->name = strdup(e->name);
> > >                 retire_lat->tpebs_name = strdup(e->tpebs_name);
> > >
> > >                 if (retire_lat->name == NULL || retire_lat->tpebs_name == NULL) {
> > >                         tpebs_retire_lat__delete(retire_lat);
> > >                         retire_lat = NULL;
> > >                 }
> > >         }
> > >
> > >         return retire_lat;
> > > }
> > >
> > > And then you call the constructor  in that loop, and the destructor at
> > > some point when those data structures are not needed.
> > >
> > > We do it because perf has a TUI mode and we may end up calling tools
> > > from them in a long running session, so we need to avoid leaks.
> > >
> > > Also can we somehow hide Intel specific terms in arch specific files
> > > while leaving something generic, possibly implementable in other arches
> > > in the core code? I mean hiding clearly intel specific stuff like the
> > > "tpebs" term in tools/perf/arch/x86/.
> >
> > Thanks Arnaldo, TPEBS support is necessary to support TMA metrics 4.7
> > on meteorlake. The timed values replace hard coded constants that
> > assume worst case latencies. You can see the metrics here:
> > https://github.com/intel/perfmon/blob/main/TMA_Metrics-full.csv
> > in the MTL (meteorlake) column row 29 there is:
> > MEM_INST_RETIRED.STLB_HIT_LOADS*min($PEBS, #Mem_STLB_Hit_Cost) / CLKS
> > + Load_STLB_Miss
> > where the $PEBS is supposed to be replaced with the latency from a
> > sample of MEM_INST_RETIRED.STLB_HIT_LOADS. There are meteorlake
> > machines shipping but currently there are no perf metrics.
> >
> > Weilin raised the TPEBS problem in the LPC 2023 talk, the issue being
> > that sampling and counting don't really exist in the current perf tool
> > code at the same time. BPF could be a workaround but permissions are
> > an issue. Perhaps leader sampling but then what to do if two latencies
> > are needed. Forking perf to do this is an expedient and ideally we'd
> > not do it.
>
> Even with BPF, I think it needs two instances of an event - one for
> counting and the other for sampling, right?  I wonder if it can just
> use a single event for sampling and show the sum of periods in
> PERF_SAMPLE_READ.
>
> I'm not sure if an event group can have sampling and non-sampling
> events at the same time.  But it can be done without groups then.
> Anyway what's the issue with two latencies?

The latencies come from samples and with leader sampling only the
leader gets sampled so we can't get two latencies. For 2 latencies
we'd need 2 groups for 2 leaders or to modify leader sampling - if we
could encode that we want to sample but don't need the sample in the
mmap, just want the latency being available to be read, etc. This and
BPF are both long-term viable solutions, but forking is the expedient
solution to get something going - we'd likely want it as a fallback
anyway.

Thanks,
Ian

> Thanks,
> Namhyung
>
> >
> > I'm against stuff going in the arch directory in general. If something
> > is specific to a PMU, let's special case the logic for that PMU in the
> > tool. The arch directory gets us into messes like:
> > https://lore.kernel.org/lkml/CAP-5=fVABSBZnsmtRn1uF-k-G1GWM-L5SgiinhPTfHbQsKXb_g@mail.gmail.com/
> > Where a heterogenous/hybrid/big.little fix is in the x86 arch folder
> > (overriding a weak symbol) but not in the arm or other ones leading to
> > an arm bug.
> >
> > I think the idea of a count coming from an external or tool source is
> > something of a more generic concept. I think what Weilin is doing here
> > is setting the groundwork for doing that, a first implementation. I'd
> > like the expr literals, like #smt_on, #num_cpus, etc. to be more like
> > tool events such as duration_time. I think we can move in the
> > direction that Weilin is adding here and then iterate to clean up
> > these things, hopefully move them to events that then other tools
> > could use, etc.
> >
> > Thanks,
> > Ian
> >
> >
> >
> >
> > > > +             list_add_tail(&new->nd, &stat_config.tpebs_results);
> > > > +     }
> > > > +
> > > > +     kill(cmd->pid, SIGTERM);
> > > > +     session = perf_session__new(&data, &script.tool);
> > > > +     if (IS_ERR(session))
> > > > +             return PTR_ERR(session);
> > > > +     script.session = session;
> > > > +     err = perf_session__process_events(session);
> > > > +     perf_session__delete(session);
> > > > +
> > > > +     return err;
> > > > +}
> > > > +
> > > >  static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > > >  {
> > > >       int interval = stat_config.interval;
> > > > @@ -709,12 +866,14 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > > >       struct affinity saved_affinity, *affinity = NULL;
> > > >       int err;
> > > >       bool second_pass = false;
> > > > +     struct child_process cmd;
> > > >
> > > >       //Prepare perf record for sampling event retire_latency before fork and prepare workload
> > > >       if (stat_config.tpebs_event_size > 0) {
> > > >               int ret;
> > > >
> > > > -             ret = __run_perf_record();
> > > > +             pr_debug("perf stat pid = %d\n", getpid());
> > > > +             ret = prepare_perf_record(&cmd);
> > > >               if (ret)
> > > >                       return ret;
> > > >       }
> > > > @@ -924,6 +1083,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > > >
> > > >       t1 = rdclock();
> > > >
> > > > +     if (stat_config.tpebs_event_size > 0) {
> > > > +             int ret;
> > > > +
> > > > +             pr_debug("pid = %d\n", getpid());
> > > > +             pr_debug("cmd.pid = %d\n", cmd.pid);
> > > > +
> > > > +             ret = __cmd_script(&cmd);
> > > > +             close(cmd.out);
> > > > +             pr_debug("%d\n", ret);
> > > > +     }
> > > > +
> > > >       if (stat_config.walltime_run_table)
> > > >               stat_config.walltime_run[run_idx] = t1 - t0;
> > > >
> > > > @@ -2714,6 +2884,7 @@ int cmd_stat(int argc, const char **argv)
> > > >       }
> > > >
> > > >       INIT_LIST_HEAD(&stat_config.tpebs_events);
> > > > +     INIT_LIST_HEAD(&stat_config.tpebs_results);
> > > >
> > > >       /*
> > > >        * Metric parsing needs to be delayed as metrics may optimize events
> > > > @@ -2921,5 +3092,7 @@ int cmd_stat(int argc, const char **argv)
> > > >       metricgroup__rblist_exit(&stat_config.metric_events);
> > > >       evlist__close_control(stat_config.ctl_fd, stat_config.ctl_fd_ack, &stat_config.ctl_fd_close);
> > > >
> > > > +     tpebs_data__delete();
> > > > +
> > > >       return status;
> > > >  }
> > > > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> > > > index fc16299c915f..2298ca3b370b 100644
> > > > --- a/tools/perf/util/data.c
> > > > +++ b/tools/perf/util/data.c
> > > > @@ -173,6 +173,10 @@ static bool check_pipe(struct perf_data *data)
> > > >       int fd = perf_data__is_read(data) ?
> > > >                STDIN_FILENO : STDOUT_FILENO;
> > > >
> > > > +     if (data->fd > 0) {
> > > > +             fd = data->fd;
> > > > +     }
> > > > +
> > > >       if (!data->path) {
> > > >               if (!fstat(fd, &st) && S_ISFIFO(st.st_mode))
> > > >                       is_pipe = true;
> > > > diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> > > > index effcc195d7e9..5554d46ad212 100644
> > > > --- a/tools/perf/util/data.h
> > > > +++ b/tools/perf/util/data.h
> > > > @@ -28,6 +28,7 @@ struct perf_data_file {
> > > >
> > > >  struct perf_data {
> > > >       const char              *path;
> > > > +     int                      fd;
> > > >       struct perf_data_file    file;
> > > >       bool                     is_pipe;
> > > >       bool                     is_dir;
> > > > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> > > > index 6c16e5a0b1fc..8518e2b3e5be 100644
> > > > --- a/tools/perf/util/metricgroup.c
> > > > +++ b/tools/perf/util/metricgroup.c
> > > > @@ -691,8 +691,17 @@ static int metricgroup__build_event_string(struct strbuf *events,
> > > >
> > > >               if (p) {
> > > >                       struct tpebs_event *new_event = malloc(sizeof(struct tpebs_event));
> > > > -                     *p = '\0';
> > > > +                     char *name;
> > > > +
> > > >                       new_event->tpebs_name = strdup(id);
> > > > +                     *p = '\0';
> > > > +                     name = malloc(strlen(id) + 2);
> > > > +                     if (!name)
> > > > +                             return -ENOMEM;
> > > > +
> > > > +                     strcpy(name, id);
> > > > +                     strcat(name, ":p");
> > > > +                     new_event->name = name;
> > >
> > > For such cases we use asprintf(), that allocates and formats the string
> > > in one call, look for examples in other tools/perf/ files.
> > >
> > > >                       *tpebs_event_size += 1;
> > > >                       pr_debug("retire_latency required, tpebs_event_size=%lu, new_event=%s\n",
> > > >                       *tpebs_event_size, new_event->name);
> > > > diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
> > > > index 7c24ed768ff3..1fa12cc3294e 100644
> > > > --- a/tools/perf/util/metricgroup.h
> > > > +++ b/tools/perf/util/metricgroup.h
> > > > @@ -71,6 +71,13 @@ struct tpebs_event {
> > > >       const char *name;
> > > >       const char *tpebs_name;
> > > >  };
> > > > +struct tpebs_retire_lat {
> > > > +     struct list_head nd;
> > > > +     const char *name;
> > > > +     const char *tpebs_name;
> > > > +     size_t count;
> > > > +     int sum;
> > > > +};
> > >
> > > Here you declare the constructor and destructor I suggested above.
> > >
> > > >  struct metric_event *metricgroup__lookup(struct rblist *metric_events,
> > > >                                        struct evsel *evsel,
> > > > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > > > index 9d0186316b29..8e180f13aa2d 100644
> > > > --- a/tools/perf/util/stat.h
> > > > +++ b/tools/perf/util/stat.h
> > > > @@ -111,6 +111,9 @@ struct perf_stat_config {
> > > >       struct rblist            metric_events;
> > > >       struct list_head         tpebs_events;
> > > >       size_t                   tpebs_event_size;
> > > > +     struct list_head         tpebs_results;
> > > > +     pid_t                    tpebs_pid;
> > > > +     int                      tpebs_pipe;
> > > >       int                      ctl_fd;
> > > >       int                      ctl_fd_ack;
> > > >       bool                     ctl_fd_close;
> > > > --
> > > > 2.43.0
> > > >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
  2024-02-23  7:47         ` Ian Rogers
@ 2024-02-24  2:44           ` Namhyung Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Namhyung Kim @ 2024-02-24  2:44 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, weilin.wang, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers

On Thu, Feb 22, 2024 at 11:48 PM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Feb 22, 2024 at 11:03 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi,
> >
> > On Wed, Feb 21, 2024 at 12:34 PM Ian Rogers <irogers@google.com> wrote:
> > > Weilin raised the TPEBS problem in the LPC 2023 talk, the issue being
> > > that sampling and counting don't really exist in the current perf tool
> > > code at the same time. BPF could be a workaround but permissions are
> > > an issue. Perhaps leader sampling but then what to do if two latencies
> > > are needed. Forking perf to do this is an expedient and ideally we'd
> > > not do it.
> >
> > Even with BPF, I think it needs two instances of an event - one for
> > counting and the other for sampling, right?  I wonder if it can just
> > use a single event for sampling and show the sum of periods in
> > PERF_SAMPLE_READ.
> >
> > I'm not sure if an event group can have sampling and non-sampling
> > events at the same time.  But it can be done without groups then.
> > Anyway what's the issue with two latencies?
>
> The latencies come from samples and with leader sampling only the
> leader gets sampled so we can't get two latencies. For 2 latencies
> we'd need 2 groups for 2 leaders or to modify leader sampling

Do those 2 latencies come from 2 events or a single event?

But I realized that PERF_SAMPLE_READ would return the period
only and I guess the latency is in PERF_SAMPLE_WEIGHT(_STRUCT), right?
Then it won't work with PERF_SAMPLE_READ unless we extend the
read format to include the weights.

> - if we
> could encode that we want to sample but don't need the sample in the
> mmap, just want the latency being available to be read, etc. This and
> BPF are both long-term viable solutions, but forking is the expedient
> solution to get something going - we'd likely want it as a fallback
> anyway.

Maybe we can add it to the read format, but I'm not sure how the
kernel maintains the value.  PERF_SAMPLE_READ would be fine
to return the value in the sample.  But it should support read(2) too.

Simply adding the values might not be what users want.  Maybe
average latency/weight is meaningful but it could depend on
what the event measures..

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-02-24  2:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-21  7:20 [RFC PATCH v1 0/5] TPEBS counting mode support weilin.wang
2024-02-21  7:20 ` [RFC PATCH v1 1/5] perf stat: Parse and find tpebs events when parsing metrics to prepare for perf record sampling weilin.wang
2024-02-21  7:20 ` [RFC PATCH v1 2/5] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric weilin.wang
2024-02-21 17:52   ` Arnaldo Carvalho de Melo
2024-02-21 20:34     ` Ian Rogers
2024-02-23  7:03       ` Namhyung Kim
2024-02-23  7:47         ` Ian Rogers
2024-02-24  2:44           ` Namhyung Kim
2024-02-21  7:20 ` [RFC PATCH v1 3/5] perf stat: Add retire latency values into the expr_parse_ctx to prepare for final metric calculation weilin.wang
2024-02-21  7:20 ` [RFC PATCH v1 4/5] perf stat: Create another thread for sample data processing weilin.wang
2024-02-21  7:20 ` [RFC PATCH v1 5/5] perf stat: Add retire latency print functions to print out at the very end of print out weilin.wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).