public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Leo Yan <leo.yan@arm.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh@kernel.org>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jonathan Corbet <corbet@lwn.net>, Marc Zyngier <maz@kernel.org>,
	Oliver Upton <oliver.upton@linux.dev>,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	James Clark <james.clark@linaro.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kvmarm@lists.linux.dev
Subject: Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
Date: Mon, 24 Feb 2025 18:03:01 +0000	[thread overview]
Message-ID: <20250224180301.GI8144@e132581.arm.com> (raw)
In-Reply-To: <Z7yY19UtSnND5KTl@J2N7QTR9R3.cambridge.arm.com>

On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote:

[...]

> > > > Just a minor concern.  I don't see any handling for task migration.
> > > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > > need to save and restore branch records based on BRBE injection.  So
> > > > far, the driver simply invalidates all records.
> > > >
> > > > I think this topic is very likely discussed before.  If this is the
> > > > case, please ignore my comment.  Except this, the code looks good
> > > > to me.
> > > 
> > > Not really discussed on the list, but that was present in v18 (though
> > > not functional because .sched_task() hook wasn't actually enabled) and
> > > Mark removed it. His work is here[1].The only comment was:
> > > 
> > > Note: saving/restoring at context-switch doesn't interact well with
> > > event rotation (e.g. if filters change)
> > 
> > In the brbe_enable() function, it "Merge the permitted branch filters
> > of all events".  Based on current implementation, all events share the
> > same branch filter.
> 
> Critically, the brbe_enable() function merges the filters of all
> *active* events which have been installed into hardware. It does not
> track all events which can be rotated, and the resulting filter is not
> the same -- it can change as a result of rotation.

In a perf session has multiple events, and events have different branch
filters, seems to me, a simple way is to return error for this case.

I would argue BRBE is an IP for recording branches per CPU wise, it does
not support recording for event wise.  If we can unify branch filter
within a perf session, would this be much easier for handling?

> > When event rotation happens, if without context switch, in theory we
> > should can directly use the branch record (no invalidation, no injection)
> > for all events.
> 
> No; that only works in *some* cases, and will produce incorrect results
> in others.
> 
> For example, consider filtering. Imagine a PMU with a single counter,
> and two events, where event-A filters for calls-and-returns and event-B
> filters for calls-only. When switching from event-A to event-B, it's
> theoretically possible to keep the existing records around, knowing that
> the returns can be filtered out later. When switching from event-B to
> event-A we cannot keep the existing records, since there are gaps
> whenever a return should have been recorded.

Seems to me, the problem is not caused by event rotation.  We need to
calculate a correct filter in the first place - the BRBE driver should
calculate a superset for all filters of events for a session.  Then,
generate branch record based event's specific filter.

> There are a number of cases of that shape given the set of configurable
> filters. In theory it's possible to retain those in some cases, but I
> don't think that the complexity is justified.
> 
> Similarly, whenever kernel branches are recorded it's necessary to drop
> the stale branches whenever branch recording is paused, as there's
> necessarily a blackout period and hence a gap in the records.

If we save BRBE record when a process is switched out and then restore
the record when a process is switched in, should we can keep a decent
branch record for performance profiling?

I understand it might be many things happen in the middle of a task
switching or migration, but it is fine for not recording branches during
the blackout period.  The missed branch records are not very helpful for
forming a flow for a profiled program itself, especially, if we
consider we will optimize userspace program in many cases.

> Do you think that you have a case where losing branches across rotation
> *really* matters?

I agreed that event rotation case might be rare and complex.  Please see
a comment below.

> > For a context-switch case, we need to save and re-inject branch record.
> > BRBE record sticks to a process context, no matter what events have been
> > enabled.
> 
> I had originally wanted to keep per-event records around, but it doesn't
> work in all cases. One reason events get discarded at context-switch
> time is that CPU-bound events can sample branches, and would
> mis-attribute stale userspace branches to the wrong context when
> switching tasks. There are explicit comments about this in
> amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().
> 
> Given we discard records when reprogramming events, we *could* try to
> preserve events in some cases, but I suspect that as with the rotation
> case this'll be a lot of complexity for little gain. Note that as we
> discard events when enabling the PMU, we'd throw some task-bound records
> away anyway, and practically the gain would be limited to cpu-bound
> records.
> 
> Do you have a reason why you think we *must* keep events around?

Here I am really concerned are cases when a process is preempted or
migrated.  The driver doesn't save and restore branch records for these
cases, it just invalidates all records when a task is scheduled in.

As a result, if an event overflow is close to context switch, it is
likely to capture incomplete branch records.  For a userspace-only
tracing, it is risk to capture empty branch record after preemption
and migrations.

Thanks,
Leo

  reply	other threads:[~2025-02-24 18:03 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 02/11] perf: arm_pmu: Don't disable counter in armpmu_add() Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event() Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event() Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 07/11] perf: arm_pmu: Move PMUv3-specific data Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 08/11] arm64/sysreg: Add BRBE registers and fields Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 09/11] arm64: Handle BRBE booting requirements Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests Rob Herring (Arm)
2025-02-24 10:41   ` Leo Yan
2025-02-18 20:40 ` [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) Rob Herring (Arm)
2025-02-24 12:25   ` Leo Yan
2025-02-24 12:46     ` Rob Herring
2025-02-24 14:03       ` Leo Yan
2025-02-24 16:05         ` Mark Rutland
2025-02-24 18:03           ` Leo Yan [this message]
2025-02-25  1:31             ` Rob Herring
2025-02-25 12:38               ` Leo Yan
2025-02-25 15:35                 ` Rob Herring
2025-02-25 19:46                 ` Mark Rutland
2025-02-25 12:01             ` Mark Rutland
2025-02-25 17:48               ` Leo Yan
2025-02-25 19:04                 ` Rob Herring
2025-02-25 19:58                 ` Mark Rutland
2025-02-26 13:48                   ` Leo Yan
2025-02-26 14:26                     ` Rob Herring
2025-02-19 16:09 ` [PATCH v20 00/11] arm64/perf: Enable branch stack sampling James Clark
2025-03-01  7:05 ` Will Deacon
2025-03-03 16:44   ` Rob Herring
2025-03-04 11:25     ` Catalin Marinas
2025-03-04 16:25       ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250224180301.GI8144@e132581.arm.com \
    --to=leo.yan@arm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=james.clark@linaro.org \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=robh@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox