[PATCH V5] perf: qcom: Add L3 cache PMU driver

All of lore.kernel.org
 help / color / mirror / Atom feed

From: agustinv@codeaurora.org (Agustin Vega-Frias)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH V5] perf: qcom: Add L3 cache PMU driver
Date: Fri, 31 Mar 2017 11:33:47 -0400	[thread overview]
Message-ID: <f6d5d652b2123a3ad85ef2b0758e4e4a@codeaurora.org> (raw)
In-Reply-To: <20170331135955.GB6488@leverpostej>

Hey Mark,

On 2017-03-31 09:59, Mark Rutland wrote:
> Hi Agustin,
> 
> On Thu, Mar 23, 2017 at 05:03:17PM -0400, Agustin Vega-Frias wrote:
>> This adds a new dynamic PMU to the Perf Events framework to program
>> and control the L3 cache PMUs in some Qualcomm Technologies SOCs.
>> 
>> The driver supports a distributed cache architecture where the overall
>> cache for a socket is comprised of multiple slices each with its own 
>> PMU.
>> Access to each individual PMU is provided even though all CPUs share 
>> all
>> the slices. User space needs to aggregate to individual counts to 
>> provide
>> a global picture.
>> 
>> The driver exports formatting and event information to sysfs so it can
>> be used by the perf user space tools with the syntaxes:
>>    perf stat -a -e l3cache_0_0/read-miss/
>>    perf stat -a -e l3cache_0_0/event=0x21/
>> 
>> Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
> 
> Thanks for respinning this; it looks good to me.
> 
>> +static int qcom_l3_cache__event_init(struct perf_event *event)
>> +{
>> +	struct l3cache_pmu *l3pmu = to_l3cache_pmu(event->pmu);
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	struct perf_event *sibling;
> 
>> +	/*
>> +	 * We must NOT create groups containing mixed PMUs, although 
>> software
>> +	 * events are acceptable
>> +	 */
>> +	if (event->group_leader->pmu != event->pmu &&
>> +			!is_software_event(event->group_leader))
>> +		return -EINVAL;
>> +
>> +	list_for_each_entry(sibling, &event->group_leader->sibling_list,
>> +			group_entry)
>> +		if (sibling->pmu != event->pmu &&
>> +				!is_software_event(sibling))
>> +			return -EINVAL;
> 
> Sorry for spotting this so late, but I just  noticed that we don't
> validate the group isn't too large to ever fit in the HW, which is
> important for avoiding pointless work in perf_rotate_context() when the
> mux hrtimer callback occurs.
> 
> We fail to do that in other drivers, too, so that's something we should
> clean up more generally.
> 
> Here, I'd like to factor the group validation logic out into a helper:
> 
> #define event_num_counters(e)	(event_uses_long_counter(e) ? 2 : 1)
> 
> static bool qcom_l3_cache__validate_event_group(struct perf_event 
> *event)
> {
> 	struct perf_event *leader = event->group_leader;
> 	struct perf_event *sibling;
> 	int counters = 0;
> 
> 	/*
> 	 * We must NOT create groups containing mixed PMUs, although
> 	 * software.
> 	 */
> 	if (leader->pmu != event->pmu && !is_software_event(leader))
> 		return false;
> 
> 	counters = event_num_counters(event);
> 	counters += event_num_counters(leader);
> 
> 	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
> 		if (is_software_event(sibling))
> 			continue;
> 		if (sibling->pmu != event->pmu)
> 			return false;
> 		counters += event_num_counters(sibling);
> 	}
> 
> 	/*
> 	 * If the group requires more counters than the HW has, it
> 	 * cannot ever be scheduled.
> 	 */
> 	return counters <= L3_NUM_COUNTERS;
> }
> 
> ... where in qcom_l3_cache__event_init() we'd do:
> 
> 	if (!qcom_l3_cache__validate_event_group(event))
> 		return -EINVAL;
> 
> If you're happy with that, I can make that change when applying.

Sounds good. I'll also apply this change internally and test it out.
I'll send it as V6 if I get it ready today.

Thanks,
Agustin

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a 
Linux Foundation Collaborative Project.

WARNING: multiple messages have this Message-ID (diff)

From: Agustin Vega-Frias <agustinv@codeaurora.org>
To: Mark Rutland <mark.rutland@arm.com>
Cc: linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Will Deacon <will.deacon@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	timur@codeaurora.org, nleeder@codeaurora.org,
	agross@codeaurora.org, jcm@redhat.com, msalter@redhat.com,
	mlangsdo@redhat.com, ahs3@redhat.com
Subject: Re: [PATCH V5] perf: qcom: Add L3 cache PMU driver
Date: Fri, 31 Mar 2017 11:33:47 -0400	[thread overview]
Message-ID: <f6d5d652b2123a3ad85ef2b0758e4e4a@codeaurora.org> (raw)
In-Reply-To: <20170331135955.GB6488@leverpostej>

Hey Mark,

On 2017-03-31 09:59, Mark Rutland wrote:
> Hi Agustin,
> 
> On Thu, Mar 23, 2017 at 05:03:17PM -0400, Agustin Vega-Frias wrote:
>> This adds a new dynamic PMU to the Perf Events framework to program
>> and control the L3 cache PMUs in some Qualcomm Technologies SOCs.
>> 
>> The driver supports a distributed cache architecture where the overall
>> cache for a socket is comprised of multiple slices each with its own 
>> PMU.
>> Access to each individual PMU is provided even though all CPUs share 
>> all
>> the slices. User space needs to aggregate to individual counts to 
>> provide
>> a global picture.
>> 
>> The driver exports formatting and event information to sysfs so it can
>> be used by the perf user space tools with the syntaxes:
>>    perf stat -a -e l3cache_0_0/read-miss/
>>    perf stat -a -e l3cache_0_0/event=0x21/
>> 
>> Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
> 
> Thanks for respinning this; it looks good to me.
> 
>> +static int qcom_l3_cache__event_init(struct perf_event *event)
>> +{
>> +	struct l3cache_pmu *l3pmu = to_l3cache_pmu(event->pmu);
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	struct perf_event *sibling;
> 
>> +	/*
>> +	 * We must NOT create groups containing mixed PMUs, although 
>> software
>> +	 * events are acceptable
>> +	 */
>> +	if (event->group_leader->pmu != event->pmu &&
>> +			!is_software_event(event->group_leader))
>> +		return -EINVAL;
>> +
>> +	list_for_each_entry(sibling, &event->group_leader->sibling_list,
>> +			group_entry)
>> +		if (sibling->pmu != event->pmu &&
>> +				!is_software_event(sibling))
>> +			return -EINVAL;
> 
> Sorry for spotting this so late, but I just  noticed that we don't
> validate the group isn't too large to ever fit in the HW, which is
> important for avoiding pointless work in perf_rotate_context() when the
> mux hrtimer callback occurs.
> 
> We fail to do that in other drivers, too, so that's something we should
> clean up more generally.
> 
> Here, I'd like to factor the group validation logic out into a helper:
> 
> #define event_num_counters(e)	(event_uses_long_counter(e) ? 2 : 1)
> 
> static bool qcom_l3_cache__validate_event_group(struct perf_event 
> *event)
> {
> 	struct perf_event *leader = event->group_leader;
> 	struct perf_event *sibling;
> 	int counters = 0;
> 
> 	/*
> 	 * We must NOT create groups containing mixed PMUs, although
> 	 * software.
> 	 */
> 	if (leader->pmu != event->pmu && !is_software_event(leader))
> 		return false;
> 
> 	counters = event_num_counters(event);
> 	counters += event_num_counters(leader);
> 
> 	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
> 		if (is_software_event(sibling))
> 			continue;
> 		if (sibling->pmu != event->pmu)
> 			return false;
> 		counters += event_num_counters(sibling);
> 	}
> 
> 	/*
> 	 * If the group requires more counters than the HW has, it
> 	 * cannot ever be scheduled.
> 	 */
> 	return counters <= L3_NUM_COUNTERS;
> }
> 
> ... where in qcom_l3_cache__event_init() we'd do:
> 
> 	if (!qcom_l3_cache__validate_event_group(event))
> 		return -EINVAL;
> 
> If you're happy with that, I can make that change when applying.

Sounds good. I'll also apply this change internally and test it out.
I'll send it as V6 if I get it ready today.

Thanks,
Agustin

-- 
Qualcomm Datacenter Technologies, Inc. on behalf of the Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a 
Linux Foundation Collaborative Project.

next prev parent reply	other threads:[~2017-03-31 15:33 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-23 21:03 [PATCH V5] perf: qcom: Add L3 cache PMU driver Agustin Vega-Frias
2017-03-23 21:03 ` Agustin Vega-Frias
2017-03-31 13:59 ` Mark Rutland
2017-03-31 13:59   ` Mark Rutland
2017-03-31 15:33   ` Agustin Vega-Frias [this message]
2017-03-31 15:33     ` Agustin Vega-Frias

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f6d5d652b2123a3ad85ef2b0758e4e4a@codeaurora.org \
    --to=agustinv@codeaurora.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.