* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-02-28 5:33 ` [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply Dapeng Mi
@ 2026-03-07 1:27 ` Chen, Zide
2026-03-11 20:03 ` Peter Zijlstra
2026-03-11 20:16 ` Peter Zijlstra
2026-03-16 9:50 ` [tip: perf/urgent] " tip-bot2 for Dapeng Mi
2 siblings, 1 reply; 10+ messages in thread
From: Chen, Zide @ 2026-03-07 1:27 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Falcon Thomas,
Xudong Hao, stable
On 2/27/2026 9:33 PM, Dapeng Mi wrote:
> When running the command:
> 'perf record -e "{instructions,instructions:p}" -j any,counter sleep 1',
> a "shift-out-of-bounds" warning is reported on CWF.
>
> [ 5231.981423][ C17] UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
> [ 5231.981428][ C17] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
> [ 5231.981436][ C17] CPU: 17 UID: 0 PID: 211871 Comm: sleep Tainted: G S W 6.18.0-2025-12-09-intel-next-48166-g6cf574943ba3 #1 PREEMPT(none)
> [ 5231.981445][ C17] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
> [ 5231.981447][ C17] Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.IPC.3544.P98.2508260307 08/26/2025
> [ 5231.981449][ C17] Call Trace:
> [ 5231.981453][ C17] <NMI>
> [ 5231.981455][ C17] dump_stack_lvl+0x4b/0x70
> [ 5231.981463][ C17] ubsan_epilogue+0x5/0x2b
> [ 5231.981468][ C17] __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe6
> [ 5231.981472][ C17] ? __entry_text_end+0x158b/0x102259
> [ 5231.981475][ C17] intel_pmu_lbr_counters_reorder.isra.0.cold+0x2a/0xa7
> [ 5231.981480][ C17] ? __task_pid_nr_ns+0x134/0x2a0
> [ 5231.981483][ C17] ? __pfx_intel_pmu_lbr_counters_reorder.isra.0+0x10/0x10
> [ 5231.981486][ C17] ? __pfx_perf_output_sample+0x10/0x10
> [ 5231.981489][ C17] ? arch_perf_update_userpage+0x293/0x310
> [ 5231.981491][ C17] ? __pfx_arch_perf_update_userpage+0x10/0x10
> [ 5231.981494][ C17] ? local_clock_noinstr+0xd/0x100
> [ 5231.981498][ C17] ? calc_timer_values+0x2cb/0x860
> [ 5231.981501][ C17] ? perf_event_update_userpage+0x399/0x5b0
> [ 5231.981505][ C17] ? __pfx_perf_event_update_userpage+0x10/0x10
> [ 5231.981508][ C17] ? local_clock_noinstr+0xd/0x100
> [ 5231.981511][ C17] ? __perf_event_account_interrupt+0x11c/0x540
> [ 5231.981514][ C17] intel_pmu_lbr_save_brstack+0xc0/0x4c0
> [ 5231.981518][ C17] setup_arch_pebs_sample_data+0x114b/0x2400
> [ 5231.981522][ C17] ? __pfx_x86_perf_event_set_period+0x10/0x10
> [ 5231.981526][ C17] intel_pmu_drain_arch_pebs+0x64d/0xcc0
> [ 5231.981530][ C17] ? __pfx_intel_pmu_drain_arch_pebs+0x10/0x10
> [ 5231.981534][ C17] ? unwind_next_frame+0x11c5/0x1df0
> [ 5231.981541][ C17] ? intel_pmu_drain_bts_buffer+0xbf/0x6e0
> [ 5231.981545][ C17] ? __pfx_intel_pmu_drain_bts_buffer+0x10/0x10
> [ 5231.981550][ C17] handle_pmi_common+0x5c5/0xcb0
> [ 5231.981553][ C17] ? __pfx_handle_pmi_common+0x10/0x10
> [ 5231.981556][ C17] ? intel_idle+0x64/0xb0
> [ 5231.981560][ C17] ? intel_bts_interrupt+0xe5/0x4c0
> [ 5231.981562][ C17] ? __pfx_intel_bts_interrupt+0x10/0x10
> [ 5231.981565][ C17] ? intel_pmu_lbr_filter+0x27f/0x910
> [ 5231.981568][ C17] intel_pmu_handle_irq+0x2ed/0x600
> [ 5231.981571][ C17] perf_event_nmi_handler+0x219/0x280
> [ 5231.981575][ C17] ? __pfx_perf_event_nmi_handler+0x10/0x10
> [ 5231.981579][ C17] ? unwind_next_frame+0x11c5/0x1df0
> [ 5231.981582][ C17] nmi_handle.part.0+0x11b/0x3a0
> [ 5231.981585][ C17] ? unwind_next_frame+0x11c5/0x1df0
> [ 5231.981588][ C17] default_do_nmi+0x6b/0x180
> [ 5231.981591][ C17] fred_exc_nmi+0x3e/0x80
> [ 5231.981594][ C17] asm_fred_entrypoint_kernel+0x41/0x60
> [ 5231.981596][ C17] RIP: 0010:unwind_next_frame+0x11c5/0x1df0
> ......
>
> The warning occurs because the second "instructions:p" event, which
> involves branch counters sampling, is incorrectly programmed to fixed
> counter 0 instead of the general-purpose (GP) counters 0-3 that support
> branch counters sampling. Currently only GP counters 0~3 support branch
> counters sampling on CWF; any event involving branch counters sampling
> should be programmed on GP counters 0~3. Since the counter index of fixed
> counter 0 is 32, the "src" value in the code below is right-shifted by
> 64 bits, triggering the "shift-out-of-bounds" warning.
>
> cnt = (src >> (order[j] * LBR_INFO_BR_CNTR_BITS)) & LBR_INFO_BR_CNTR_MASK;
>
> The root cause is the loss of the branch counters constraint for the
> last event in the branch counters sampling event group. This results in
> the second "instructions:p" event being programmed on fixed counter 0
> incorrectly instead of the appropriate GP counters 0~3.
>
> To address this, we apply the missing branch counters constraint for
> the last event in the group. Additionally, we introduce a new function,
> `intel_set_branch_counter_constr()`, to apply the branch counters
> constraint and avoid code duplication.
>
> Cc: stable@vger.kernel.org
> Reported-by: Xudong Hao <xudong.hao@intel.com>
> Fixes: 33744916196b ("perf/x86/intel: Support branch counters logging")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
Reviewed-by: Zide Chen <zide.chen@intel.com>
> arch/x86/events/intel/core.c | 30 ++++++++++++++++++++----------
> 1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 4768236c054b..4b042d71104f 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4628,6 +4628,19 @@ static inline void intel_pmu_set_acr_caused_constr(struct perf_event *event,
> event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64);
> }
>
> +static inline int intel_set_branch_counter_constr(struct perf_event *event,
> + int *num)
> +{
> + if (branch_sample_call_stack(event))
> + return -EINVAL;
> + if (branch_sample_counters(event)) {
> + (*num)++;
> + event->hw.dyn_constraint &= x86_pmu.lbr_counters;
> + }
> +
> + return 0;
> +}
> +
> static int intel_pmu_hw_config(struct perf_event *event)
> {
> int ret = x86_pmu_hw_config(event);
> @@ -4698,21 +4711,18 @@ static int intel_pmu_hw_config(struct perf_event *event)
> * group, which requires the extra space to store the counters.
> */
> leader = event->group_leader;
> - if (branch_sample_call_stack(leader))
> + if (intel_set_branch_counter_constr(leader, &num))
> return -EINVAL;
> - if (branch_sample_counters(leader)) {
> - num++;
> - leader->hw.dyn_constraint &= x86_pmu.lbr_counters;
> - }
> leader->hw.flags |= PERF_X86_EVENT_BRANCH_COUNTERS;
>
> for_each_sibling_event(sibling, leader) {
> - if (branch_sample_call_stack(sibling))
> + if (intel_set_branch_counter_constr(sibling, &num))
> + return -EINVAL;
> + }
> +
> + if (event != leader) {
> + if (intel_set_branch_counter_constr(event, &num))
> return -EINVAL;
> - if (branch_sample_counters(sibling)) {
> - num++;
> - sibling->hw.dyn_constraint &= x86_pmu.lbr_counters;
> - }
> }
>
> if (num > fls(x86_pmu.lbr_counters))
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-03-07 1:27 ` Chen, Zide
@ 2026-03-11 20:03 ` Peter Zijlstra
2026-03-12 2:02 ` Mi, Dapeng
0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-03-11 20:03 UTC (permalink / raw)
To: Chen, Zide
Cc: Dapeng Mi, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Ian Rogers, Adrian Hunter, Alexander Shishkin, Andi Kleen,
Eranian Stephane, linux-kernel, linux-perf-users, Dapeng Mi,
Falcon Thomas, Xudong Hao, stable
On Fri, Mar 06, 2026 at 05:27:35PM -0800, Chen, Zide wrote:
>
>
> On 2/27/2026 9:33 PM, Dapeng Mi wrote:
> > When running the command:
> > 'perf record -e "{instructions,instructions:p}" -j any,counter sleep 1',
> > a "shift-out-of-bounds" warning is reported on CWF.
> >
> > [ 5231.981423][ C17] UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
> > [ 5231.981428][ C17] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
> > ......
That trace could be reduced to:
UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
shift exponent 64 is too large for 64-bit type 'long long unsigned int'
......
intel_pmu_lbr_counters_reorder.isra.0.cold+0x2a/0xa7
intel_pmu_lbr_save_brstack+0xc0/0x4c0
setup_arch_pebs_sample_data+0x114b/0x2400
Without losing anything valuable.
> > The warning occurs because the second "instructions:p" event, which
> > involves branch counters sampling, is incorrectly programmed to fixed
> > counter 0 instead of the general-purpose (GP) counters 0-3 that support
So here you have 0-3, the normal 'range' notation, but then you go all
funny and use ~ instead:
> > branch counters sampling. Currently only GP counters 0~3 support branch
> > counters sampling on CWF; any event involving branch counters sampling
> > should be programmed on GP counters 0~3. Since the counter index of fixed
> > counter 0 is 32, the "src" value in the code below is right-shifted by
> > 64 bits, triggering the "shift-out-of-bounds" warning.
> >
> > cnt = (src >> (order[j] * LBR_INFO_BR_CNTR_BITS)) & LBR_INFO_BR_CNTR_MASK;
> >
> > The root cause is the loss of the branch counters constraint for the
> > last event in the branch counters sampling event group. This results in
> > the second "instructions:p" event being programmed on fixed counter 0
> > incorrectly instead of the appropriate GP counters 0~3.
s/0~3/0-3/ ?
* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-03-11 20:03 ` Peter Zijlstra
@ 2026-03-12 2:02 ` Mi, Dapeng
0 siblings, 0 replies; 10+ messages in thread
From: Mi, Dapeng @ 2026-03-12 2:02 UTC (permalink / raw)
To: Peter Zijlstra, Chen, Zide
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
linux-kernel, linux-perf-users, Dapeng Mi, Falcon Thomas,
Xudong Hao, stable
On 3/12/2026 4:03 AM, Peter Zijlstra wrote:
> On Fri, Mar 06, 2026 at 05:27:35PM -0800, Chen, Zide wrote:
>>
>> On 2/27/2026 9:33 PM, Dapeng Mi wrote:
>>> When running the command:
>>> 'perf record -e "{instructions,instructions:p}" -j any,counter sleep 1',
>>> a "shift-out-of-bounds" warning is reported on CWF.
>>>
>>> [ 5231.981423][ C17] UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
>>> [ 5231.981428][ C17] shift exponent 64 is too large for 64-bit type 'long long unsigned int'
>>> ......
> That trace could be reduced to:
>
> UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
> shift exponent 64 is too large for 64-bit type 'long long unsigned int'
> ......
> intel_pmu_lbr_counters_reorder.isra.0.cold+0x2a/0xa7
> intel_pmu_lbr_save_brstack+0xc0/0x4c0
> setup_arch_pebs_sample_data+0x114b/0x2400
>
> Without losing anything valuable.
Sure.
>
>
>>> The warning occurs because the second "instructions:p" event, which
>>> involves branch counters sampling, is incorrectly programmed to fixed
>>> counter 0 instead of the general-purpose (GP) counters 0-3 that support
> So here you have 0-3, the normal 'range' notation, but then you go all
> funny and use ~ instead:
😂
>
>>> branch counters sampling. Currently only GP counters 0~3 support branch
>>> counters sampling on CWF; any event involving branch counters sampling
>>> should be programmed on GP counters 0~3. Since the counter index of fixed
>>> counter 0 is 32, the "src" value in the code below is right-shifted by
>>> 64 bits, triggering the "shift-out-of-bounds" warning.
>>>
>>> cnt = (src >> (order[j] * LBR_INFO_BR_CNTR_BITS)) & LBR_INFO_BR_CNTR_MASK;
>>>
>>> The root cause is the loss of the branch counters constraint for the
>>> last event in the branch counters sampling event group. This results in
>>> the second "instructions:p" event being programmed on fixed counter 0
>>> incorrectly instead of the appropriate GP counters 0~3.
> s/0~3/0-3/ ?
Sure.
>
* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-02-28 5:33 ` [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply Dapeng Mi
2026-03-07 1:27 ` Chen, Zide
@ 2026-03-11 20:16 ` Peter Zijlstra
2026-03-12 2:31 ` Mi, Dapeng
2026-03-16 9:50 ` [tip: perf/urgent] " tip-bot2 for Dapeng Mi
2 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-03-11 20:16 UTC (permalink / raw)
To: Dapeng Mi
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
Falcon Thomas, Xudong Hao, stable
On Sat, Feb 28, 2026 at 01:33:20PM +0800, Dapeng Mi wrote:
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 4768236c054b..4b042d71104f 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4628,6 +4628,19 @@ static inline void intel_pmu_set_acr_caused_constr(struct perf_event *event,
> event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64);
> }
>
> +static inline int intel_set_branch_counter_constr(struct perf_event *event,
> + int *num)
> +{
> + if (branch_sample_call_stack(event))
> + return -EINVAL;
> + if (branch_sample_counters(event)) {
> + (*num)++;
> + event->hw.dyn_constraint &= x86_pmu.lbr_counters;
> + }
> +
> + return 0;
> +}
> +
> static int intel_pmu_hw_config(struct perf_event *event)
> {
> int ret = x86_pmu_hw_config(event);
> @@ -4698,21 +4711,18 @@ static int intel_pmu_hw_config(struct perf_event *event)
> * group, which requires the extra space to store the counters.
> */
> leader = event->group_leader;
> + if (intel_set_branch_counter_constr(leader, &num))
> return -EINVAL;
> leader->hw.flags |= PERF_X86_EVENT_BRANCH_COUNTERS;
>
> for_each_sibling_event(sibling, leader) {
> + if (intel_set_branch_counter_constr(sibling, &num))
> + return -EINVAL;
> + }
> +
So the new bit is this, right?
> + if (event != leader) {
> + if (intel_set_branch_counter_constr(event, &num))
> return -EINVAL;
> }
The point being that for_each_sibling_event() will not have iterated the
event because it's not on the list yet?
That wasn't really clear from the changelog and I think that deserves a
comment as well.
Let me go fix that.
* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-03-11 20:16 ` Peter Zijlstra
@ 2026-03-12 2:31 ` Mi, Dapeng
2026-03-12 6:41 ` Peter Zijlstra
0 siblings, 1 reply; 10+ messages in thread
From: Mi, Dapeng @ 2026-03-12 2:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
Falcon Thomas, Xudong Hao, stable
On 3/12/2026 4:16 AM, Peter Zijlstra wrote:
> On Sat, Feb 28, 2026 at 01:33:20PM +0800, Dapeng Mi wrote:
>> ......
>> @@ -4698,21 +4711,18 @@ static int intel_pmu_hw_config(struct perf_event *event)
>> * group, which requires the extra space to store the counters.
>> */
>> leader = event->group_leader;
>> + if (intel_set_branch_counter_constr(leader, &num))
>> return -EINVAL;
>> leader->hw.flags |= PERF_X86_EVENT_BRANCH_COUNTERS;
>>
>> for_each_sibling_event(sibling, leader) {
>> + if (intel_set_branch_counter_constr(sibling, &num))
>> + return -EINVAL;
>> + }
>> +
> So the new bit is this, right?
Actually no, the key change is the one below: the branch counter
constraint is not applied to the last event in the group.

Assume we have an event group {cycles,instructions,branches}. When the 3rd
event "branches" is created, intel_pmu_hw_config() is called for it to
check its config. At that time the group leader is "cycles" and the
sibling list contains only the "instructions" event, since the 3rd event
"branches" is still being created and has not yet been added to the
sibling_list. So for_each_sibling_event() cannot iterate over the
"branches" event.
>
>> + if (event != leader) {
>> + if (intel_set_branch_counter_constr(event, &num))
>> return -EINVAL;
>> }
> The point being that for_each_sibling_event() will not have iterated the
> event because it's not on the list yet?
Yes.
>
> That wasn't really clear from the changelog and I think that deserves a
> comment as well.
Sure. I will add a comment and enhance the changelog to make it clearer. Thanks.
>
> Let me go fix that.
* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-03-12 2:31 ` Mi, Dapeng
@ 2026-03-12 6:41 ` Peter Zijlstra
2026-03-12 6:52 ` Mi, Dapeng
0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-03-12 6:41 UTC (permalink / raw)
To: Mi, Dapeng
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
Falcon Thomas, Xudong Hao, stable
On Thu, Mar 12, 2026 at 10:31:28AM +0800, Mi, Dapeng wrote:
>
> On 3/12/2026 4:16 AM, Peter Zijlstra wrote:
> > On Sat, Feb 28, 2026 at 01:33:20PM +0800, Dapeng Mi wrote:
> >> ......
> > So the new bit is this, right?
>
> Actually no, the key change is the one below: the branch counter
> constraint is not applied to the last event in the group.
>
> Assume we have an event group {cycles,instructions,branches}. When the 3rd
> event "branches" is created, intel_pmu_hw_config() is called for it to
> check its config. At that time the group leader is "cycles" and the
> sibling list contains only the "instructions" event, since the 3rd event
> "branches" is still being created and has not yet been added to the
> sibling_list. So for_each_sibling_event() cannot iterate over the
> "branches" event.
>
>
> >
> >> + if (event != leader) {
> >> + if (intel_set_branch_counter_constr(event, &num))
> >> return -EINVAL;
> >> }
> > The point being that for_each_sibling_event() will not have iterated the
> > event because it's not on the list yet?
>
> Yes.
>
>
> >
> > That wasn't really clear from the changelog and I think that deserves a
> > comment as well.
>
> Sure. I will add a comment and enhance the changelog to make it clearer. Thanks.
>
I already fixed everything up. Should be in queue/perf/urgent.
* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-03-12 6:41 ` Peter Zijlstra
@ 2026-03-12 6:52 ` Mi, Dapeng
2026-03-12 7:40 ` Peter Zijlstra
0 siblings, 1 reply; 10+ messages in thread
From: Mi, Dapeng @ 2026-03-12 6:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
Falcon Thomas, Xudong Hao, stable
On 3/12/2026 2:41 PM, Peter Zijlstra wrote:
> On Thu, Mar 12, 2026 at 10:31:28AM +0800, Mi, Dapeng wrote:
>> ......
> I already fixed everything up. Should be in queue/perf/urgent.
Thanks.
Peter, as Ian points out, the patch "perf/x86: Update cap_user_rdpmc base
on rdpmc user disable state" has a bug
(https://lore.kernel.org/all/CAP-5=fWr2L6miiFZ6Km3HYEdmqp3T0NBL=WY3buKdKztW+HvmA@mail.gmail.com/).
I will post a patch to fix the issue. Thanks.
>
* Re: [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply
2026-03-12 6:52 ` Mi, Dapeng
@ 2026-03-12 7:40 ` Peter Zijlstra
0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2026-03-12 7:40 UTC (permalink / raw)
To: Mi, Dapeng
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
linux-kernel, linux-perf-users, Dapeng Mi, Zide Chen,
Falcon Thomas, Xudong Hao, stable
On Thu, Mar 12, 2026 at 02:52:43PM +0800, Mi, Dapeng wrote:
>
> Peter, as Ian points out, the patch "perf/x86: Update cap_user_rdpmc base
> on rdpmc user disable state" has a bug
> (https://lore.kernel.org/all/CAP-5=fWr2L6miiFZ6Km3HYEdmqp3T0NBL=WY3buKdKztW+HvmA@mail.gmail.com/).
> I will post a patch to fix the issue. Thanks.
>
Yep, saw that, it's gone :-)
* [tip: perf/urgent] perf/x86/intel: Add missing branch counters constraint apply
2026-02-28 5:33 ` [RESEND Patch 2/2] perf/x86/intel: Add missing branch counters constraint apply Dapeng Mi
2026-03-07 1:27 ` Chen, Zide
2026-03-11 20:16 ` Peter Zijlstra
@ 2026-03-16 9:50 ` tip-bot2 for Dapeng Mi
2 siblings, 0 replies; 10+ messages in thread
From: tip-bot2 for Dapeng Mi @ 2026-03-16 9:50 UTC (permalink / raw)
To: linux-tip-commits
Cc: Xudong Hao, Dapeng Mi, Peter Zijlstra (Intel), stable, x86,
linux-kernel
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID: 1d07bbd7ea36ea0b8dfa8068dbe67eb3a32d9590
Gitweb: https://git.kernel.org/tip/1d07bbd7ea36ea0b8dfa8068dbe67eb3a32d9590
Author: Dapeng Mi <dapeng1.mi@linux.intel.com>
AuthorDate: Sat, 28 Feb 2026 13:33:20 +08:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 12 Mar 2026 11:29:16 +01:00
perf/x86/intel: Add missing branch counters constraint apply
When running the command:
'perf record -e "{instructions,instructions:p}" -j any,counter sleep 1',
a "shift-out-of-bounds" warning is reported on CWF.
UBSAN: shift-out-of-bounds in /kbuild/src/consumer/arch/x86/events/intel/lbr.c:970:15
shift exponent 64 is too large for 64-bit type 'long long unsigned int'
......
intel_pmu_lbr_counters_reorder.isra.0.cold+0x2a/0xa7
intel_pmu_lbr_save_brstack+0xc0/0x4c0
setup_arch_pebs_sample_data+0x114b/0x2400
The warning occurs because the second "instructions:p" event, which
involves branch counters sampling, is incorrectly programmed to fixed
counter 0 instead of the general-purpose (GP) counters 0-3 that support
branch counters sampling. Currently only GP counters 0-3 support branch
counters sampling on CWF; any event involving branch counters sampling
should be programmed on GP counters 0-3. Since the counter index of fixed
counter 0 is 32, the "src" value in the code below is right-shifted by
64 bits, triggering the "shift-out-of-bounds" warning.
cnt = (src >> (order[j] * LBR_INFO_BR_CNTR_BITS)) & LBR_INFO_BR_CNTR_MASK;
The root cause is the loss of the branch counters constraint for the
new event in the branch counters sampling event group, since it isn't
yet part of the sibling list. This results in the second
"instructions:p" event incorrectly being programmed on fixed counter 0
instead of the appropriate GP counters 0-3.
To address this, we apply the missing branch counters constraint for
the last event in the group. Additionally, we introduce a new function,
`intel_set_branch_counter_constr()`, to apply the branch counters
constraint and avoid code duplication.
Fixes: 33744916196b ("perf/x86/intel: Support branch counters logging")
Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260228053320.140406-2-dapeng1.mi@linux.intel.com
Cc: stable@vger.kernel.org
---
arch/x86/events/intel/core.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cf3a4fe..36c6821 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4628,6 +4628,19 @@ static inline void intel_pmu_set_acr_caused_constr(struct perf_event *event,
event->hw.dyn_constraint &= hybrid(event->pmu, acr_cause_mask64);
}
+static inline int intel_set_branch_counter_constr(struct perf_event *event,
+ int *num)
+{
+ if (branch_sample_call_stack(event))
+ return -EINVAL;
+ if (branch_sample_counters(event)) {
+ (*num)++;
+ event->hw.dyn_constraint &= x86_pmu.lbr_counters;
+ }
+
+ return 0;
+}
+
static int intel_pmu_hw_config(struct perf_event *event)
{
int ret = x86_pmu_hw_config(event);
@@ -4698,21 +4711,19 @@ static int intel_pmu_hw_config(struct perf_event *event)
* group, which requires the extra space to store the counters.
*/
leader = event->group_leader;
- if (branch_sample_call_stack(leader))
+ if (intel_set_branch_counter_constr(leader, &num))
return -EINVAL;
- if (branch_sample_counters(leader)) {
- num++;
- leader->hw.dyn_constraint &= x86_pmu.lbr_counters;
- }
leader->hw.flags |= PERF_X86_EVENT_BRANCH_COUNTERS;
for_each_sibling_event(sibling, leader) {
- if (branch_sample_call_stack(sibling))
+ if (intel_set_branch_counter_constr(sibling, &num))
+ return -EINVAL;
+ }
+
+ /* event isn't installed as a sibling yet. */
+ if (event != leader) {
+ if (intel_set_branch_counter_constr(event, &num))
return -EINVAL;
- if (branch_sample_counters(sibling)) {
- num++;
- sibling->hw.dyn_constraint &= x86_pmu.lbr_counters;
- }
}
if (num > fls(x86_pmu.lbr_counters))