From: "Yan, Zheng" <zheng.z.yan@intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [PATCH 1/7] perf, x86: Reduce lbr_sel_map size
Date: Wed, 26 Jun 2013 14:05:46 +0800
Message-ID: <51CA84BA.8040102@intel.com>
In-Reply-To: <CABPqkBTNRsd4fHF=Rts45pHSApmpEnOkNNcvSWeDXjryc+bs_g@mail.gmail.com>
On 06/25/2013 08:33 PM, Stephane Eranian wrote:
> On Tue, Jun 25, 2013 at 10:47 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> The index of lbr_sel_map is bit value of perf branch_sample_type.
>> We can reduce lbr_sel_map size by using bit shift as index.
>>
>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>> ---
>> arch/x86/kernel/cpu/perf_event.h | 4 +++
>> arch/x86/kernel/cpu/perf_event_intel_lbr.c | 50 ++++++++++++++----------------
>> include/uapi/linux/perf_event.h | 42 +++++++++++++++++--------
>> 3 files changed, 56 insertions(+), 40 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
>> index 108dc75..a74d554 100644
>> --- a/arch/x86/kernel/cpu/perf_event.h
>> +++ b/arch/x86/kernel/cpu/perf_event.h
>> @@ -447,6 +447,10 @@ struct x86_pmu {
>> struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
>> };
>>
>> +enum {
>> + PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE = PERF_SAMPLE_BRANCH_MAX_SHIFT,
>> +};
>> +
>> #define x86_add_quirk(func_) \
>> do { \
>> static struct x86_pmu_quirk __quirk __initdata = { \
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>> index d5be06a..a72e9e9 100644
>> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
>> @@ -69,10 +69,6 @@ static enum {
>> #define LBR_FROM_FLAG_IN_TX (1ULL << 62)
>> #define LBR_FROM_FLAG_ABORT (1ULL << 61)
>>
>> -#define for_each_branch_sample_type(x) \
>> - for ((x) = PERF_SAMPLE_BRANCH_USER; \
>> - (x) < PERF_SAMPLE_BRANCH_MAX; (x) <<= 1)
>> -
>> /*
>> * x86control flow change classification
>> * x86control flow changes include branches, interrupts, traps, faults
>> @@ -387,14 +383,14 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
>> {
>> struct hw_perf_event_extra *reg;
>> u64 br_type = event->attr.branch_sample_type;
>> - u64 mask = 0, m;
>> - u64 v;
>> + u64 mask = 0, v;
>> + int i;
>>
>> - for_each_branch_sample_type(m) {
>> - if (!(br_type & m))
>> + for (i = 0; i < PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE; i++) {
>> + if (!(br_type & (1U << i)))
>
> Needs to be 1ULL to avoid bug later on. br_type is u64.
Thanks, will fix (1ULL instead of 1U).
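For the record, a minimal sketch of why the width of the constant matters. All shifts defined in this patch are below 10, so 1U happens to work today, but br_type is u64 and a future bit at position 32 or above could not be tested with a 32-bit constant. The helper names below are hypothetical and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel code): test bit i of a 64-bit mask.
 */

/* 1U is a 32-bit unsigned int on x86, so 1U << i can only reach bits 0..31. */
static inline int bit_set_narrow(uint64_t mask, unsigned int i)
{
	if (i >= 32)
		return 0;	/* shifting 1U by >= 32 is undefined; bit is unreachable */
	return (mask & (1U << i)) != 0;
}

/* 1ULL is at least 64 bits wide, so every bit of a u64 mask is reachable. */
static inline int bit_set_wide(uint64_t mask, unsigned int i)
{
	return i < 64 && (mask & (1ULL << i)) != 0;
}

/* Small self-check: bit 40 is only visible through the 1ULL form. */
static inline void bit_set_selfcheck(void)
{
	uint64_t br_type = (1ULL << 40) | (1ULL << 3);

	assert(bit_set_narrow(br_type, 3));
	assert(bit_set_wide(br_type, 3));
	assert(!bit_set_narrow(br_type, 40));
	assert(bit_set_wide(br_type, 40));
}
```

The explicit guard in bit_set_narrow() only stands in for the undefined shift; in the loop from the patch the same effect would show up as high bits of br_type being silently skipped.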
>
>> continue;
>>
>> - v = x86_pmu.lbr_sel_map[m];
>> + v = x86_pmu.lbr_sel_map[i];
>> if (v == LBR_NOT_SUPP)
>> return -EOPNOTSUPP;
>>
>> @@ -649,33 +645,33 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>> /*
>> * Map interface branch filters onto LBR filters
>> */
>> -static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
>> - [PERF_SAMPLE_BRANCH_ANY] = LBR_ANY,
>> - [PERF_SAMPLE_BRANCH_USER] = LBR_USER,
>> - [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
>> - [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
>> - [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_REL_JMP
>> - | LBR_IND_JMP | LBR_FAR,
>> +static const int nhm_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
>> + [PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
>> + [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
>> + [PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_KERNEL,
>> + [PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGN,
>> + [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN | LBR_REL_JMP
>> + | LBR_IND_JMP | LBR_FAR,
>> /*
>> * NHM/WSM erratum: must include REL_JMP+IND_JMP to get CALL branches
>> */
>> - [PERF_SAMPLE_BRANCH_ANY_CALL] =
>> + [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] =
>> LBR_REL_CALL | LBR_IND_CALL | LBR_REL_JMP | LBR_IND_JMP | LBR_FAR,
>> /*
>> * NHM/WSM erratum: must include IND_JMP to capture IND_CALL
>> */
>> - [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL | LBR_IND_JMP,
>> + [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL | LBR_IND_JMP,
>> };
>>
> I think it would ease formatting readability
> if the indexes could be constructed from a simple macro:
> #define BR_SHIFT(a) \
> PERF_SAMPLE_##a##_SHIFT
>
> #define BR_SMPL(a) \
> PERF_SAMPLE_##a
#define BR_SHIFT(a) PERF_SAMPLE_BRANCH_##a##_SHIFT
static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
	[BR_SHIFT(ANY)] = LBR_ANY,
	[BR_SHIFT(USER)] = LBR_USER,
	[BR_SHIFT(KERNEL)] = LBR_KERNEL,
	[BR_SHIFT(HV)] = LBR_IGN,
	[BR_SHIFT(ANY_RETURN)] = LBR_RETURN | LBR_FAR,
	[BR_SHIFT(ANY_CALL)] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,
	[BR_SHIFT(CALL_STACK)] = LBR_IND_CALL,
};
The code looks strange to me; I don't think it is any more readable.
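For comparison, here is a self-contained sketch of a shift-indexed table built with such a macro. It assumes the macro pastes the BRANCH_ prefix, so that BR_SHIFT(ANY) expands to PERF_SAMPLE_BRANCH_ANY_SHIFT. The enum is a four-entry subset of the one in the patch, and the DEMO_LBR_* values and demo_* names are placeholders, not the real LBR_* constants:

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the shift enum from the patch; MAX_SHIFT evaluates to 4 here. */
enum {
	PERF_SAMPLE_BRANCH_USER_SHIFT	= 0,
	PERF_SAMPLE_BRANCH_KERNEL_SHIFT	= 1,
	PERF_SAMPLE_BRANCH_HV_SHIFT	= 2,
	PERF_SAMPLE_BRANCH_ANY_SHIFT	= 3,
	PERF_SAMPLE_BRANCH_MAX_SHIFT
};

/* Bit value one past the last flag; the old tables were sized by this. */
#define PERF_SAMPLE_BRANCH_MAX	(1U << PERF_SAMPLE_BRANCH_MAX_SHIFT)

/* Placeholder filter values (assumption: the real LBR_* masks differ). */
enum {
	DEMO_LBR_USER	= 0x1,
	DEMO_LBR_KERNEL	= 0x2,
	DEMO_LBR_IGN	= 0x4,
	DEMO_LBR_ANY	= 0x8,
};

/* Token pasting: BR_SHIFT(ANY) -> PERF_SAMPLE_BRANCH_ANY_SHIFT */
#define BR_SHIFT(a) PERF_SAMPLE_BRANCH_##a##_SHIFT

/* New layout: one slot per flag, indexed by the shift. */
static const int demo_sel_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
	[BR_SHIFT(ANY)]		= DEMO_LBR_ANY,
	[BR_SHIFT(USER)]	= DEMO_LBR_USER,
	[BR_SHIFT(KERNEL)]	= DEMO_LBR_KERNEL,
	[BR_SHIFT(HV)]		= DEMO_LBR_IGN,
};

/* Number of slots in the shift-indexed table. */
static inline size_t demo_new_entries(void)
{
	return sizeof(demo_sel_map) / sizeof(demo_sel_map[0]);
}

/* Number of slots a table indexed by bit value would need. */
static inline size_t demo_old_entries(void)
{
	return PERF_SAMPLE_BRANCH_MAX;
}

/* Self-check of the sizes and of one macro-built index. */
static inline void demo_selfcheck(void)
{
	assert(demo_new_entries() == 4);
	assert(demo_old_entries() == 16);
	assert(demo_sel_map[PERF_SAMPLE_BRANCH_ANY_SHIFT] == DEMO_LBR_ANY);
}
```

With the ten shifts defined in the patch, the same arithmetic gives 10 slots instead of 1024 (1U << 10), which is the size reduction this patch is after, whichever spelling of the index is used.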
Regards
Yan, Zheng
>
>> -static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
>> - [PERF_SAMPLE_BRANCH_ANY] = LBR_ANY,
>> - [PERF_SAMPLE_BRANCH_USER] = LBR_USER,
>> - [PERF_SAMPLE_BRANCH_KERNEL] = LBR_KERNEL,
>> - [PERF_SAMPLE_BRANCH_HV] = LBR_IGN,
>> - [PERF_SAMPLE_BRANCH_ANY_RETURN] = LBR_RETURN | LBR_FAR,
>> - [PERF_SAMPLE_BRANCH_ANY_CALL] = LBR_REL_CALL | LBR_IND_CALL
>> - | LBR_FAR,
>> - [PERF_SAMPLE_BRANCH_IND_CALL] = LBR_IND_CALL,
>> +static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_SELECT_MAP_SIZE] = {
>> + [PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
>> + [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
>> + [PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_KERNEL,
>> + [PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGN,
>> + [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN | LBR_FAR,
>> + [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL
>> + | LBR_FAR,
>> + [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL,
>> };
>>
>> /* core */
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index 0b1df41..2ec219e 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -148,20 +148,36 @@ enum perf_event_sample_format {
>> * The branch types can be combined, however BRANCH_ANY covers all types
>> * of branches and therefore it supersedes all the other types.
>> */
>> +enum perf_branch_sample_type_shift {
>> + PERF_SAMPLE_BRANCH_USER_SHIFT = 0, /* user branches */
>> + PERF_SAMPLE_BRANCH_KERNEL_SHIFT = 1, /* kernel branches */
>> + PERF_SAMPLE_BRANCH_HV_SHIFT = 2, /* hypervisor branches */
>> +
>> + PERF_SAMPLE_BRANCH_ANY_SHIFT = 3, /* any branch types */
>> + PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT = 4, /* any call branch */
>> + PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT = 5, /* any return branch */
>> + PERF_SAMPLE_BRANCH_IND_CALL_SHIFT = 6, /* indirect calls */
>> + PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT = 7, /* transaction aborts */
>> + PERF_SAMPLE_BRANCH_IN_TX_SHIFT = 8, /* in transaction */
>> + PERF_SAMPLE_BRANCH_NO_TX_SHIFT = 9, /* not in transaction */
>> +
>> + PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
>> +};
>> +
>> enum perf_branch_sample_type {
>> - PERF_SAMPLE_BRANCH_USER = 1U << 0, /* user branches */
>> - PERF_SAMPLE_BRANCH_KERNEL = 1U << 1, /* kernel branches */
>> - PERF_SAMPLE_BRANCH_HV = 1U << 2, /* hypervisor branches */
>> -
>> - PERF_SAMPLE_BRANCH_ANY = 1U << 3, /* any branch types */
>> - PERF_SAMPLE_BRANCH_ANY_CALL = 1U << 4, /* any call branch */
>> - PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << 5, /* any return branch */
>> - PERF_SAMPLE_BRANCH_IND_CALL = 1U << 6, /* indirect calls */
>> - PERF_SAMPLE_BRANCH_ABORT_TX = 1U << 7, /* transaction aborts */
>> - PERF_SAMPLE_BRANCH_IN_TX = 1U << 8, /* in transaction */
>> - PERF_SAMPLE_BRANCH_NO_TX = 1U << 9, /* not in transaction */
>> -
>> - PERF_SAMPLE_BRANCH_MAX = 1U << 10, /* non-ABI */
>> + PERF_SAMPLE_BRANCH_USER = 1U << PERF_SAMPLE_BRANCH_USER_SHIFT,
>> + PERF_SAMPLE_BRANCH_KERNEL = 1U << PERF_SAMPLE_BRANCH_KERNEL_SHIFT,
>> + PERF_SAMPLE_BRANCH_HV = 1U << PERF_SAMPLE_BRANCH_HV_SHIFT,
>> +
>> + PERF_SAMPLE_BRANCH_ANY = 1U << PERF_SAMPLE_BRANCH_ANY_SHIFT,
>> + PERF_SAMPLE_BRANCH_ANY_CALL = 1U << PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT,
>> + PERF_SAMPLE_BRANCH_ANY_RETURN = 1U << PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT,
>> + PERF_SAMPLE_BRANCH_IND_CALL = 1U << PERF_SAMPLE_BRANCH_IND_CALL_SHIFT,
>> + PERF_SAMPLE_BRANCH_ABORT_TX = 1U << PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT,
>> + PERF_SAMPLE_BRANCH_IN_TX = 1U << PERF_SAMPLE_BRANCH_IN_TX_SHIFT,
>> + PERF_SAMPLE_BRANCH_NO_TX = 1U << PERF_SAMPLE_BRANCH_NO_TX_SHIFT,
>> +
>> + PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>> };
>>
>> #define PERF_SAMPLE_BRANCH_PLM_ALL \
>> --
>> 1.8.1.4
>>