From mboxrd@z Thu Jan 1 00:00:00 1970
From: agustinv@codeaurora.org (agustinv at codeaurora.org)
Date: Mon, 21 Mar 2016 11:56:59 -0400
Subject: [PATCH V1] perf: qcom: Add L3 cache PMU driver
In-Reply-To: <20160321090400.GQ6344@twins.programming.kicks-ass.net>
References: <1458333422-8963-1-git-send-email-agustinv@codeaurora.org>
 <20160321090400.GQ6344@twins.programming.kicks-ass.net>
Message-ID:
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 2016-03-21 05:04, Peter Zijlstra wrote:
> On Fri, Mar 18, 2016 at 04:37:02PM -0400, Agustin Vega-Frias wrote:
>> This adds a new dynamic PMU to the Perf Events framework to program
>> and control the L3 cache PMUs in some Qualcomm Technologies SOCs.
>>
>> The driver supports a distributed cache architecture where the overall
>> cache is comprised of multiple slices, each with its own PMU. The
>> driver aggregates counts across the whole system to provide a global
>> picture of the metrics selected by the user.
>
> So is there never a situation where you want to profile just a single
> slice?

No; which slice a given access maps to is determined by hashing on the
target address.

> Is userspace at all aware of these slices through other means?

Userspace is not aware of the actual topology.

> That is; typically we do not aggregate in-kernel like this but simply
> expose each slice as a separate PMU and let userspace sort things.

My decision to use a single PMU rather than one per slice was based on
reducing the overhead of retrieving the system-wide counts, which would
require multiple system calls in the multiple-PMU case.
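
For example, with one PMU per slice a tool would need one read() per slice
every time it samples the system-wide count, and then sum the values itself.
A rough userspace sketch of that steady-state cost (fds[] stands for perf
event file descriptors already opened on each slice's event source; the
names are illustrative, not from the patch):

#include <stdint.h>
#include <unistd.h>

/*
 * Sketch only: fds[] holds perf events opened on each per-slice PMU.
 * Every sample costs nslices read() syscalls plus a sum in userspace.
 */
static uint64_t read_l3_total(const int *fds, int nslices)
{
	uint64_t total = 0, count;
	int i;

	for (i = 0; i < nslices; i++) {
		if (read(fds[i], &count, sizeof(count)) == sizeof(count))
			total += count;		/* one syscall per slice */
	}
	return total;	/* aggregation done by the tool */
}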
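
With the single aggregated PMU the summing happens in the driver's event
read path instead, so userspace gets the system-wide value from a single
read(). A minimal sketch of that path, assuming hypothetical slice
bookkeeping and a read_slice_counter() accessor (illustrative only, not
the code in the patch; counter wraparound handling is omitted):

#include <linux/perf_event.h>
#include <linux/list.h>

/*
 * Sketch only: struct l3cache_pmu, struct hml3_slice, to_l3cache_pmu()
 * and read_slice_counter() are stand-ins for the driver's real
 * structures and accessors.
 */
static void l3cache_pmu_event_read(struct perf_event *event)
{
	struct l3cache_pmu *l3pmu = to_l3cache_pmu(event->pmu);
	struct hml3_slice *slice;
	u64 prev, total;

	do {
		prev = local64_read(&event->hw.prev_count);
		total = 0;
		/* sum the same hardware counter across every slice */
		list_for_each_entry(slice, &l3pmu->slices, node)
			total += read_slice_counter(slice, event->hw.idx);
	} while (local64_cmpxchg(&event->hw.prev_count, prev, total) != prev);

	local64_add(total - prev, &event->count);
}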