From: David Carrillo-Cisneros <davidcc@google.com>
To: linux-kernel@vger.kernel.org
Cc: "x86@kernel.org" <x86@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Vegard Nossum <vegard.nossum@gmail.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Nilay Vaish <nilayvaish@gmail.com>, Borislav Petkov <bp@suse.de>,
Vikas Shivappa <vikas.shivappa@linux.intel.com>,
Ravi V Shankar <ravi.v.shankar@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>, Paul Turner <pjt@google.com>,
Stephane Eranian <eranian@google.com>,
David Carrillo-Cisneros <davidcc@google.com>
Subject: [PATCH v3 30/46] perf/x86/intel/cmt: add asynchronous read for task events
Date: Sat, 29 Oct 2016 17:38:27 -0700
Message-ID: <1477787923-61185-31-git-send-email-davidcc@google.com>
In-Reply-To: <1477787923-61185-1-git-send-email-davidcc@google.com>
Reading CMT/MBM task events in intel_cmt poses a challenge, since it
requires reading from multiple sockets (usually accomplished with an
IPI) while being called with interrupts disabled.

The current upstream driver avoids the problematic read with
interrupts disabled by making perf_event_read() a dummy (no-op) for
llc_occupancy task events. The actual read is performed in
perf_event_count(), whenever perf_event_count() is called with
interrupts enabled. This works, but it changes the expected behavior
of perf_event_read() and perf_event_count().
This patch follows a different approach: it performs asynchronous
reads in all remote packages and waits until either the reads
complete or a deadline expires. It returns an error if an IPI does
not complete in time.
This asynchronous approach has several advantages:

1) It does not alter perf_event_count().
2) perf_event_read() performs the read for all event types.
3) Reads in all packages are executed in parallel. Parallel reads are
especially advantageous because reading CMT/MBM events is slow (it
requires a sequential write and read of two MSRs); I measured an
llc_occupancy read on my HSW system to take ~1250 cycles. Parallel
reads will become an even bigger advantage with upcoming larger
microprocessors (up to 8 packages) and once CMT support for L2 is
rolled out, since task events will then require a read of all L2
cache units.
This patch also introduces struct cmt_csd and a per-package array of
cmt_csd's (one per rmid). This array is used to control the
potentially concurrent reads of each rmid's event.
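
For reference, the mechanism boils down to the pattern sketched below.
This is a minimal illustration only: my_csd, my_func and my_read are
hypothetical names, while the smp_*, atomic_* and jiffies helpers are
the stock kernel APIs this patch builds on.

#include <linux/smp.h>
#include <linux/atomic.h>
#include <linux/jiffies.h>

struct my_csd {
	struct call_single_data csd;	/* func/info for the async IPI */
	atomic_t on_read;	/* non-zero while an IPI is in flight */
	u64 value;		/* filled in by the remote CPU */
};

/* Runs on the remote CPU, with interrupts disabled. */
static void my_func(void *info)
{
	struct my_csd *m = info;

	m->value = 42;		/* remote-side work, e.g. an MSR read */
	barrier();		/* publish value before clearing the flag */
	atomic_set(&m->on_read, 0);
}

static int my_read(struct my_csd *m, int cpu, unsigned int wait_ms)
{
	u64 deadline;

	/* One issuer at a time; fail if a previous read never completed. */
	if (atomic_inc_return(&m->on_read) > 1)
		return -EBUSY;

	m->csd.func = my_func;
	m->csd.info = m;
	if (smp_call_function_single_async(cpu, &m->csd))
		return -EBUSY;

	/* Bounded wait: poll the flag until the IPI lands or we time out. */
	deadline = get_jiffies_64() + msecs_to_jiffies(wait_ms);
	while (atomic_read(&m->on_read) &&
	       time_before64(get_jiffies_64(), deadline))
		cpu_relax();
	smp_rmb();		/* read m->value after the flag */

	return atomic_read(&m->on_read) ? -EBUSY : 0;
}

Bounding the wait keeps a stuck IPI from blocking the read path
indefinitely: on timeout the entry simply stays marked as busy until
the IPI eventually completes, after which it can be used again.
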
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
---
arch/x86/events/intel/cmt.c | 206 +++++++++++++++++++++++++++++++++++++++++++-
arch/x86/events/intel/cmt.h | 14 +++
2 files changed, 217 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/cmt.c b/arch/x86/events/intel/cmt.c
index f5ab48e..f9195ec 100644
--- a/arch/x86/events/intel/cmt.c
+++ b/arch/x86/events/intel/cmt.c
@@ -8,6 +8,12 @@
#include "cmt.h"
#include "../perf_event.h"
+#define RMID_VAL_UNAVAIL BIT_ULL(62)
+#define RMID_VAL_ERROR BIT_ULL(63)
+
+#define MSR_IA32_QM_CTR 0x0c8e
+#define MSR_IA32_QM_EVTSEL 0x0c8d
+
#define QOS_L3_OCCUP_EVENT_ID BIT_ULL(0)
#define QOS_EVENT_MASK QOS_L3_OCCUP_EVENT_ID
@@ -1229,6 +1235,41 @@ static bool __match_event(struct perf_event *a, struct perf_event *b)
return false;
}
+/* Must be called on a CPU in the rmid's package. */
+static int cmt_rmid_read(u32 rmid, u64 *val)
+{
+ wrmsr(MSR_IA32_QM_EVTSEL, QOS_L3_OCCUP_EVENT_ID, rmid);
+ rdmsrl(MSR_IA32_QM_CTR, *val);
+
+ /* Ignore this reading on error states and do not update the value. */
+ if (WARN_ON_ONCE(*val & RMID_VAL_ERROR))
+ return -EINVAL;
+ if (WARN_ON_ONCE(*val & RMID_VAL_UNAVAIL))
+ return -ENODATA;
+
+ return 0;
+}
+
+/* Time to wait before timing out an rmid read IPI. */
+#define CMT_IPI_WAIT_TIME 100 /* ms */
+
+static void smp_call_rmid_read(void *data)
+{
+ struct cmt_csd *ccsd = (struct cmt_csd *)data;
+
+ ccsd->ret = cmt_rmid_read(ccsd->rmid, &ccsd->value);
+
+ /*
+ * smp_call_function_single_async must have cleared csd.flags
+ * before invoking func.
+ */
+ WARN_ON_ONCE(ccsd->csd.flags);
+
+ /* ensure values are stored before clearing on_read. */
+ barrier();
+ atomic_set(&ccsd->on_read, 0);
+}
+
static struct pmu intel_cmt_pmu;
/* Try to find a monr with same target, otherwise create new one. */
@@ -1318,9 +1359,145 @@ static struct monr *monr_next_descendant_post(struct monr *pos,
return pos->parent;
}
+/* Issue reads to CPUs in remote packages. */
+static int issue_read_remote_pkgs(struct monr *monr,
+ struct cmt_csd **issued_ccsds,
+ u32 *local_rmid)
+{
+ struct cmt_csd *ccsd;
+ struct pmonr *pmonr;
+ struct pkg_data *pkgd = NULL;
+ union pmonr_rmids rmids;
+ int err = 0, read_cpu;
+ u16 p, local_pkgid = topology_logical_package_id(smp_processor_id());
+
+	/* Issue reads to remote packages. */
+ rcu_read_lock();
+ while ((pkgd = cmt_pkgs_data_next_rcu(pkgd))) {
+
+ pmonr = pkgd_pmonr(pkgd, monr);
+ /* Retrieve rmid and check state without acquiring pkg locks. */
+ rmids.value = atomic64_read(&pmonr->atomic_rmids);
+ /* Skip Off and Unused states. */
+ if (rmids.sched_rmid == INVALID_RMID)
+ continue;
+ /*
+	 * pmonrs in Dep_{Idle,Dirty} states have run without their
+	 * own rmid and would report wrong occupancy.
+ */
+ if (rmids.read_rmid == INVALID_RMID) {
+ err = -EBUSY;
+ goto exit;
+ }
+ p = pkgd->pkgid;
+ if (p == local_pkgid) {
+ *local_rmid = rmids.read_rmid;
+ continue;
+ }
+ ccsd = &pkgd->ccsds[rmids.read_rmid];
+ /*
+ * Reads of remote packages are only required for task events.
+ * pmu->read in task events is serialized by task_ctx->lock in
+ * perf generic code. Events with same task target share rmid
+ * and task_ctx->lock, so there is no need to support
+ * concurrent remote reads to same RMID.
+ *
+	 * ccsd->on_read can be non-zero if a previous read expired;
+	 * in that rare case, fail now and hope that next time the ongoing
+ * IPI will have completed.
+ */
+ if (atomic_inc_return(&ccsd->on_read) > 1) {
+ err = -EBUSY;
+ goto exit;
+ }
+ issued_ccsds[p] = ccsd;
+ read_cpu = cpumask_any(topology_core_cpumask(pkgd->work_cpu));
+ err = smp_call_function_single_async(read_cpu, &ccsd->csd);
+ if (WARN_ON_ONCE(err))
+ goto exit;
+ }
+exit:
+ rcu_read_unlock();
+
+ return err;
+}
+
+/*
+ * Fail if the IPI hasn't finished by @deadline, if @count != NULL.
+ * @count == NULL signals no update and therefore no reason to wait.
+ */
+static int read_issued_pkgs(struct cmt_csd **issued_ccsds,
+ u64 deadline, u64 *count)
+{
+ struct cmt_csd *ccsd;
+ int p;
+
+ for (p = 0; p < CMT_MAX_NR_PKGS; p++) {
+ ccsd = issued_ccsds[p];
+ if (!ccsd)
+ continue;
+
+	/* An smp_cond_acquire on ccsd->on_read, bounded by the deadline. */
+ while (atomic_read(&ccsd->on_read) &&
+ time_before64(get_jiffies_64(), deadline))
+ cpu_relax();
+
+ /*
+	 * Guarantee that ccsd->ret and ccsd->value are read after the
+	 * IPI completed or the deadline expired.
+ */
+ smp_rmb();
+
+ /* last IPI took unusually long. */
+ if (WARN_ON_ONCE(atomic_read(&ccsd->on_read)))
+ return -EBUSY;
+ /* ccsd->on_read is always cleared after csd.flags. */
+ if (WARN_ON_ONCE(ccsd->csd.flags))
+ return -EBUSY;
+ if (ccsd->ret)
+ return ccsd->ret;
+
+ *count += ccsd->value;
+ }
+
+ return 0;
+}
+
+static int read_all_pkgs(struct monr *monr, int wait_time_ms, u64 *count)
+{
+ struct cmt_csd *issued_ccsds[CMT_MAX_NR_PKGS];
+ int err = 0;
+ u32 local_rmid = INVALID_RMID;
+ u64 deadline, val;
+
+ *count = 0;
+ memset(issued_ccsds, 0, CMT_MAX_NR_PKGS * sizeof(*issued_ccsds));
+ err = issue_read_remote_pkgs(monr, issued_ccsds, &local_rmid);
+ if (err)
+ return err;
+ /*
+ * Save deadline after issuing reads so that all packages have at
+ * least wait_time_ms to complete.
+ */
+ deadline = get_jiffies_64() + msecs_to_jiffies(wait_time_ms);
+
+ /* Read local package. */
+ if (local_rmid != INVALID_RMID) {
+ err = cmt_rmid_read(local_rmid, &val);
+ if (WARN_ON_ONCE(err))
+ return err;
+ *count += val;
+ }
+
+ return read_issued_pkgs(issued_ccsds, deadline, count);
+}
+
static int intel_cmt_event_read(struct perf_event *event)
{
struct monr *monr = monr_from_event(event);
+ u64 count;
+ u16 pkgid = topology_logical_package_id(smp_processor_id());
+ int err;
/*
* preemption disabled since called holding
@@ -1342,11 +1519,17 @@ static int intel_cmt_event_read(struct perf_event *event)
}
if (event->attach_state & PERF_ATTACH_TASK) {
+ /* It's a task event. */
+ err = read_all_pkgs(monr, CMT_IPI_WAIT_TIME, &count);
+ } else {
/* To add support in next patches in series */
return -ENOTSUPP;
}
- /* To add support in next patches in series */
- return -ENOTSUPP;
+ if (err)
+ return err;
+ local64_set(&event->count, count);
+
+ return 0;
}
static inline void __intel_cmt_event_start(struct perf_event *event,
@@ -1566,15 +1749,17 @@ void perf_cgroup_arch_css_offline(struct cgroup_subsys_state *css)
static void free_pkg_data(struct pkg_data *pkg_data)
{
+ kfree(pkg_data->ccsds);
kfree(pkg_data);
}
/* Init pkg_data for @cpu 's package. */
static struct pkg_data *alloc_pkg_data(int cpu)
{
+ struct cmt_csd *ccsd;
struct cpuinfo_x86 *c = &cpu_data(cpu);
struct pkg_data *pkgd;
- int numa_node = cpu_to_node(cpu);
+ int r, ccsds_nr_bytes, numa_node = cpu_to_node(cpu);
u16 pkgid = topology_logical_package_id(cpu);
if (pkgid >= CMT_MAX_NR_PKGS) {
@@ -1618,6 +1803,21 @@ static struct pkg_data *alloc_pkg_data(int cpu)
lockdep_set_class(&pkgd->lock, &lock_keys[pkgid]);
#endif
+ ccsds_nr_bytes = (pkgd->max_rmid + 1) * sizeof(*(pkgd->ccsds));
+ pkgd->ccsds = kzalloc_node(ccsds_nr_bytes, GFP_KERNEL, numa_node);
+ if (!pkgd->ccsds) {
+ free_pkg_data(pkgd);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ for (r = 0; r <= pkgd->max_rmid; r++) {
+ ccsd = &pkgd->ccsds[r];
+ ccsd->rmid = r;
+ ccsd->csd.func = smp_call_rmid_read;
+ ccsd->csd.info = ccsd;
+ __set_bit(r, pkgd->free_rmids);
+ }
+
__min_max_rmid = min(__min_max_rmid, pkgd->max_rmid);
return pkgd;
diff --git a/arch/x86/events/intel/cmt.h b/arch/x86/events/intel/cmt.h
index 1e40e6b..8bb43bd 100644
--- a/arch/x86/events/intel/cmt.h
+++ b/arch/x86/events/intel/cmt.h
@@ -191,6 +191,19 @@ struct pmonr {
enum pmonr_state state;
};
+/**
+ * struct cmt_csd - data for the async IPI call that reads rmids on remote packages.
+ *
+ * One per rmid per package. One issuer at a time. Readers wait on @on_read.
+ */
+struct cmt_csd {
+ struct call_single_data csd;
+ atomic_t on_read;
+ u64 value;
+ int ret;
+ u32 rmid;
+};
+
/*
* Compile constant required for bitmap macros.
* Broadwell EP has 2 rmids per logical core, use twice as many as upper bound.
@@ -237,6 +250,7 @@ struct pkg_data {
unsigned int work_cpu;
u32 max_rmid;
u16 pkgid;
+ struct cmt_csd *ccsds;
};
/**
--
2.8.0.rc3.226.g39d4020