All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Namhyung Kim <namhyung@kernel.org>,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC] Sharing PMU counters across compatible events
Date: Wed, 6 Dec 2017 12:42:04 +0100	[thread overview]
Message-ID: <20171206114204.GB10580@krava> (raw)
In-Reply-To: <20171201141950.GB2421075@devbig577.frc2.facebook.com>

On Fri, Dec 01, 2017 at 06:19:50AM -0800, Tejun Heo wrote:
> Hello,
> 
> There are cases where a single PMU event, let's say instructions, is
> interesting for different subsets of the system.  For example, it
> could be interesting to monitor instructions system-wide, at cgroup
> level and per each thread.
> 
> This could easily be me not knowing better but I can't think of a way
> to do the above without consuming multiple PMU counters to monitor
> instructions, which quickly gets out of hand with the product of
> multiple domains and counters.  Getting broken down numbers and adding
> up doesn't work at cgroup (the numbers aren't mutually exclusive) or
> thread level.
> 
> If there's a way to achieve this using the existing features, I'd be
> glad to learn about it.  If not, the patch at the bottom is a crude
> half-working proof-of-concept to share PMU counters across compatible
> events.
> 
> In a nutshell, while adding events, it looks whether there already is
> a compatible event.  If so, instead of enabling the new event, it gets
> linked to the already enabled one (the master event) and count of the
> dup event is updated by adding the delta of the master event.
> 
> That patch is just a proof of concept.  It's missing counter
> propagation somewhere and the dup counts end up somewhat lower than
> they should be.  The master selection and switching are half-broken.
> Event compatibility testing is barely implemented, so on and so forth.
> However, it does allow gathering events for different targets without
> consuming multiple PMU counters.
> 
> What do you think?  Would this be something worth pursuing?

I see this rather on the hw level, since it concerns HW counters

I think we could detect same (alias) events at the time counters
are added/removed on/from the cpu and share their HW part like
counter idx, regs and such (struct hw_perf_event_cpu in my changes)

this way it'd be completely transparent for generic code

I made some untested (just compile) changes and attaching the
top patch, that has the main logic, please check all changes
in here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/alias_2

there's big change due to changing the perf_event::hw stuff
but the rest is quite simple (attached).. not completely sure
yet if I'm missing some functionality which would break due
to this change

thoughts?

thanks,
jirka


---
 arch/x86/events/core.c     | 58 +++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/perf_event.h |  4 ++++
 kernel/events/core.c       |  1 +
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2bb5d1b2a622..737857f07881 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1189,6 +1189,54 @@ void x86_pmu_enable_event(struct perf_event *event)
 				       ARCH_PERFMON_EVENTSEL_ENABLE);
 }
 
+static bool is_alias(struct perf_event *alias, struct perf_event *event)
+{
+	if (is_sampling_event(alias))
+		return false;
+
+	return memcmp(&alias->attr, &event->attr, sizeof(alias->attr));
+}
+
+static int alias_add(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	struct perf_event *master;
+	int n;
+
+	for (n = 0; n < cpuc->n_events; n++) {
+		if (is_alias(event, cpuc->event_list[n]))
+			break;
+	}
+
+	if (n == cpuc->n_events)
+		return 0;
+
+	master = cpuc->event_list[n];
+	event->hw.cpu = master->hw.cpu;
+	list_add_tail(&event->hw.alias, &master->hw.master);
+	return 1;
+}
+
+static int alias_del(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	if (event->hw.cpu != &event->hw.cpu_local)
+		return 0;
+
+	event->hw.cpu = &event->hw.cpu_local;
+	list_del(&event->hw.alias);
+	return 1;
+}
+
+static void alias_update(struct perf_event *event)
+{
+	struct perf_event *alias;
+
+	list_for_each_entry(alias, &event->hw.master, hw.alias) {
+		alias->count          = event->count;
+		alias->hw.prev_count  = event->hw.prev_count;
+		alias->hw.period_left = event->hw.period_left;
+	}
+}
+
 /*
  * Add a single event to the PMU.
  *
@@ -1200,7 +1248,10 @@ static int x86_pmu_add(struct perf_event *event, int flags)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc;
 	int assign[X86_PMC_IDX_MAX];
-	int n, n0, ret;
+	int n, n0, ret = 0;
+
+	if (alias_add(cpuc, event))
+		goto out;
 
 	hwc = &event->hw;
 
@@ -1383,11 +1434,16 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		goto do_del;
 
+	if (alias_del(cpuc, event))
+		goto do_del;
+
 	/*
 	 * Not a TXN, therefore cleanup properly.
 	 */
 	x86_pmu_stop(event, PERF_EF_UPDATE);
 
+	alias_update(event);
+
 	for (i = 0; i < cpuc->n_events; i++) {
 		if (event == cpuc->event_list[i])
 			break;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b8bc5ddfbffd..6a09914a3555 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -136,6 +136,10 @@ struct hw_perf_event {
 		struct { /* hardware */
 			struct hw_perf_event_cpu *cpu;
 			struct hw_perf_event_cpu  cpu_local;
+			union {
+				struct list_head master;
+				struct list_head alias;
+			};
 		};
 		struct { /* software */
 			struct hrtimer	hrtimer;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5879c67d90e4..9cf36b2b533f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9513,6 +9513,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	INIT_LIST_HEAD(&event->active_event_entry);
 	INIT_LIST_HEAD(&event->dup_list);
 	INIT_LIST_HEAD(&event->addr_filters.list);
+	INIT_LIST_HEAD(&event->hw.master);
 	INIT_HLIST_NODE(&event->hlist_entry);
 
 
-- 
2.13.6

  reply	other threads:[~2017-12-06 11:42 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-01 14:19 [RFC] Sharing PMU counters across compatible events Tejun Heo
2017-12-06 11:42 ` Jiri Olsa [this message]
2017-12-11 15:34   ` Tejun Heo
2017-12-13 10:18     ` Jiri Olsa
2017-12-13 16:15       ` Tejun Heo
2017-12-06 12:35 ` Peter Zijlstra
2017-12-11 15:47   ` Tejun Heo
2017-12-12 22:37     ` Peter Zijlstra
2017-12-13 16:18       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171206114204.GB10580@krava \
    --to=jolsa@redhat.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.