* [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
@ 2026-01-23 16:03 James Clark
2026-01-30 20:24 ` Leo Yan
2026-02-02 19:03 ` Will Deacon
0 siblings, 2 replies; 18+ messages in thread
From: James Clark @ 2026-01-23 16:03 UTC (permalink / raw)
To: Will Deacon, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan
Cc: linux-arm-kernel, linux-perf-users, linux-kernel, James Clark
The Arm ARM known issues document [1] states that the architecture will
be relaxed so that the profiling buffer must be correctly configured
when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
PMBLIMITR_EL1.FM != DISCARD:
R24557
While the Profiling Buffer is enabled, profiling is not stopped, and
Discard mode is not enabled, all of the following must be true:
* The current write pointer must be at least one sample record below
the write limit pointer.
The same relaxation also says that writes may be completely ignored:
When the Profiling Buffer is enabled, profiling is not stopped, and
Discard mode is not enabled, the PE might ignore a direct write to any
of the following Profiling Buffer registers, other than a direct write
to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
* The current write pointer, PMBPTR_EL1.
* The Limit pointer, PMBLIMITR_EL1.
* PMBSR_EL1.
When arm_spe_pmu_start() occurs, SPEProfilingStopped() is false
(PMBSR_EL1.S == 0) meaning that the write to PMBLIMITR_EL1 now becomes
the point where the buffer configuration must be correct by, rather than
the "When profiling becomes enabled" (StatisticalProfilingEnabled())
from the old rule which is much later when PMSCR_EL1 is written.
If the writes to PMBLIMITR_EL1 and PMBPTR_EL1 are re-ordered then a
misconfigured state could be observed, resulting in a buffer management
event. Or the write to PMBPTR_EL1 could be ignored.
Fix it by adding an isb() after the write to PMBPTR_EL1 to ensure that
this completes before enabling the buffer.
To avoid redundant isb()s in the IRQ handler, remove the isb() between
the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
order these happen in now that all the previous configuration is covered
by the new isb().
This issue is only present in arm_spe_pmu_start() and not in the IRQ
handler because SPEProfilingStopped() is true in the IRQ handler. Jumps
to the out_write_limit label will skip the isb() but this is ok as they
only happen if discard mode is set or the buffer isn't enabled so
correct configuration is not required.
[1]: https://developer.arm.com/documentation/102105/latest/
Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
Signed-off-by: James Clark <james.clark@linaro.org>
---
A previous version of this was posted here [1] bundled with other
changes to support running in a guest. Since then the known issues doc
linked in the commit message has been released so this is a resend of
only the critical part that also needs to be fixed for hosts.
A redundant isb() has also been removed in this version which is not
present in the previous version.
[1]: https://lore.kernel.org/linux-arm-kernel/20250701-james-spe-vm-interface-v1-0-52a2cd223d00@linaro.org/
---
drivers/perf/arm_spe_pmu.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4801115f2b54..62ae409fd5b4 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -639,6 +639,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
limit += (u64)buf->base;
base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
write_sysreg_s(base, SYS_PMBPTR_EL1);
+ isb();
out_write_limit:
write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
@@ -780,10 +781,8 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
* PMBPTR might be misaligned, but we'll burn that bridge
* when we get to it.
*/
- if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
+ if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED))
arm_spe_perf_aux_output_begin(handle, event);
- isb();
- }
break;
case SPE_PMU_BUF_FAULT_ACT_SPURIOUS:
/* We've seen you before, but GCC has the memory of a sieve. */
---
base-commit: c072629f05d7bca1148ab17690d7922a31423984
change-id: 20260123-james-spe-relaxation-d6621c7a68ff
Best regards,
--
James Clark <james.clark@linaro.org>
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-01-23 16:03 [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer James Clark
@ 2026-01-30 20:24 ` Leo Yan
2026-02-02 16:53 ` Will Deacon
2026-02-02 19:03 ` Will Deacon
1 sibling, 1 reply; 18+ messages in thread
From: Leo Yan @ 2026-01-30 20:24 UTC (permalink / raw)
To: James Clark
Cc: Will Deacon, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> The Arm ARM known issues document [1] states that the architecture will
> be relaxed so that the profiling buffer must be correctly configured
> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> PMBLIMITR_EL1.FM != DISCARD:
>
> R24557
>
> While the Profiling Buffer is enabled, profiling is not stopped, and
> Discard mode is not enabled, all of the following must be true:
>
> * The current write pointer must be at least one sample record below
> the write limit pointer.
>
> The same relaxation also says that writes may be completely ignored:
>
> When the Profiling Buffer is enabled, profiling is not stopped, and
> Discard mode is not enabled, the PE might ignore a direct write to any
> of the following Profiling Buffer registers, other than a direct write
> to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>
> * The current write pointer, PMBPTR_EL1.
> * The Limit pointer, PMBLIMITR_EL1.
> * PMBSR_EL1.
>
> When arm_spe_pmu_start() occurs, SPEProfilingStopped() is false
> (PMBSR_EL1.S == 0) meaning that the write to PMBLIMITR_EL1 now becomes
> the point where the buffer configuration must be correct by, rather than
> the "When profiling becomes enabled" (StatisticalProfilingEnabled())
> from the old rule which is much later when PMSCR_EL1 is written.
>
> If the writes to PMBLIMITR_EL1 and PMBPTR_EL1 are re-ordered then a
> misconfigured state could be observed, resulting in a buffer management
> event. Or the write to PMBPTR_EL1 could be ignored.
>
> Fix it by adding an isb() after the write to PMBPTR_EL1 to ensure that
> this completes before enabling the buffer.
Makes sense.
> To avoid redundant isb()s in the IRQ handler, remove the isb() between
> the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> order these happen in now that all the previous configuration is covered
> by the new isb().
The isb() in the interrupt handler is useful and should not be removed.
See the sequence in the interrupt handler:
arm_spe_perf_aux_output_begin() {
write_sysreg_s(base, SYS_PMBPTR_EL1);
// Ensure the write pointer is ordered
isb();
write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
}
// Ensure the limit pointer is ordered
isb();
// Profiling is enabled:
// (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
write_sysreg_s(0, SYS_PMBSR_EL1);
The first isb() ensures that the write pointer update is ordered. The
second isb() ensures that the limit pointer is visible before profiling
is enabled by clearing the PMBSR_EL1.S bit. When handling a normal
maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
while PMBLIMITR_EL1.E remains set. Clearing PMBSR_EL1.S therefore
effectively re-enables profiling.
Since the second isb() is a synchronization for both the write pointer
and the limit pointer before profiling is enabled, it could be argued
that the first isb() is redundant in the interrupt handling. However,
the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
it in the interrupt handler does no harm and simplifies code maintenance.
In short, if preserves the second isb() instead of removing it, the
change looks good to me.
Thanks,
Leo
> This issue is only present in arm_spe_pmu_start() and not in the IRQ
> handler because SPEProfilingStopped() is true in the IRQ handler. Jumps
> to the out_write_limit label will skip the isb() but this is ok as they
> only happen if discard mode is set or the buffer isn't enabled so
> correct configuration is not required.
>
> [1]: https://developer.arm.com/documentation/102105/latest/
>
> Fixes: d5d9696b0380 ("drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension")
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> A previous version of this was posted here [1] bundled with other
> changes to support running in a guest. Since then the known issues doc
> linked in the commit message has been released so this is a resend of
> only the critical part that also needs to be fixed for hosts.
>
> A redundant isb() has also been removed in this version which is not
> present in the previous version.
>
> [1]: https://lore.kernel.org/linux-arm-kernel/20250701-james-spe-vm-interface-v1-0-52a2cd223d00@linaro.org/
> ---
> drivers/perf/arm_spe_pmu.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 4801115f2b54..62ae409fd5b4 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -639,6 +639,7 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
> limit += (u64)buf->base;
> base = (u64)buf->base + PERF_IDX2OFF(handle->head, buf);
> write_sysreg_s(base, SYS_PMBPTR_EL1);
> + isb();
>
> out_write_limit:
> write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> @@ -780,10 +781,8 @@ static irqreturn_t arm_spe_pmu_irq_handler(int irq, void *dev)
> * PMBPTR might be misaligned, but we'll burn that bridge
> * when we get to it.
> */
> - if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)) {
> + if (!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED))
> arm_spe_perf_aux_output_begin(handle, event);
> - isb();
> - }
> break;
> case SPE_PMU_BUF_FAULT_ACT_SPURIOUS:
> /* We've seen you before, but GCC has the memory of a sieve. */
>
> ---
> base-commit: c072629f05d7bca1148ab17690d7922a31423984
> change-id: 20260123-james-spe-relaxation-d6621c7a68ff
>
> Best regards,
> --
> James Clark <james.clark@linaro.org>
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-01-30 20:24 ` Leo Yan
@ 2026-02-02 16:53 ` Will Deacon
2026-02-02 18:42 ` Leo Yan
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2026-02-02 16:53 UTC (permalink / raw)
To: Leo Yan
Cc: James Clark, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On Fri, Jan 30, 2026 at 08:24:37PM +0000, Leo Yan wrote:
> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> > The Arm ARM known issues document [1] states that the architecture will
> > be relaxed so that the profiling buffer must be correctly configured
> > when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> > PMBLIMITR_EL1.FM != DISCARD:
> >
> > R24557
> >
> > While the Profiling Buffer is enabled, profiling is not stopped, and
> > Discard mode is not enabled, all of the following must be true:
> >
> > * The current write pointer must be at least one sample record below
> > the write limit pointer.
> >
> > The same relaxation also says that writes may be completely ignored:
> >
> > When the Profiling Buffer is enabled, profiling is not stopped, and
> > Discard mode is not enabled, the PE might ignore a direct write to any
> > of the following Profiling Buffer registers, other than a direct write
> > to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
> >
> > * The current write pointer, PMBPTR_EL1.
> > * The Limit pointer, PMBLIMITR_EL1.
> > * PMBSR_EL1.
Oh nice, since when was it ok to relax the architecture and break
existing drivers that were perfectly fine before? The SPE spec's not
worth the paper it's written on...
Anyway, we're not changing the driver without a comment next to the new
isb() explaining the backwards incompatible change.
> > When arm_spe_pmu_start() occurs, SPEProfilingStopped() is false
> > (PMBSR_EL1.S == 0) meaning that the write to PMBLIMITR_EL1 now becomes
> > the point where the buffer configuration must be correct by, rather than
> > the "When profiling becomes enabled" (StatisticalProfilingEnabled())
> > from the old rule which is much later when PMSCR_EL1 is written.
> >
> > If the writes to PMBLIMITR_EL1 and PMBPTR_EL1 are re-ordered then a
> > misconfigured state could be observed, resulting in a buffer management
> > event. Or the write to PMBPTR_EL1 could be ignored.
> >
> > Fix it by adding an isb() after the write to PMBPTR_EL1 to ensure that
> > this completes before enabling the buffer.
>
> Makes sense.
>
> > To avoid redundant isb()s in the IRQ handler, remove the isb() between
> > the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> > order these happen in now that all the previous configuration is covered
> > by the new isb().
>
> The isb() in the interrupt handler is useful and should not be removed.
>
> See the sequence in the interrupt handler:
>
> arm_spe_perf_aux_output_begin() {
> write_sysreg_s(base, SYS_PMBPTR_EL1);
>
> // Ensure the write pointer is ordered
> isb();
>
> write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> }
>
> // Ensure the limit pointer is ordered
> isb();
>
> // Profiling is enabled:
> // (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
> write_sysreg_s(0, SYS_PMBSR_EL1);
>
> The first isb() ensures that the write pointer update is ordered. The
> second isb() ensures that the limit pointer is visible before profiling
> is enabled by clearing the PMBSR_EL1.S bit. When handling a normal
> maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
> while PMBLIMITR_EL1.E remains set. Clearing PMBSR_EL1.S therefore
> effectively re-enables profiling.
>
> Since the second isb() is a synchronization for both the write pointer
> and the limit pointer before profiling is enabled, it could be argued
> that the first isb() is redundant in the interrupt handling. However,
> the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
> it in the interrupt handler does no harm and simplifies code maintenance.
>
> In short, if preserves the second isb() instead of removing it, the
> change looks good to me.
I'm not sure I follow your logic as to why both ISBs are required, but
I'd have thought that if perf_aux_output_begin() fails when called from
arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
because we're going to clear pmblimitr_el1 to 0 and that surely has
to be ordered before clearing pmbsr?
Will
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-02 16:53 ` Will Deacon
@ 2026-02-02 18:42 ` Leo Yan
2026-02-02 18:57 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2026-02-02 18:42 UTC (permalink / raw)
To: Will Deacon
Cc: James Clark, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On Mon, Feb 02, 2026 at 04:53:57PM +0000, Will Deacon wrote:
[...]
> > > To avoid redundant isb()s in the IRQ handler, remove the isb() between
> > > the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> > > order these happen in now that all the previous configuration is covered
> > > by the new isb().
> >
> > The isb() in the interrupt handler is useful and should not be removed.
> >
> > See the sequence in the interrupt handler:
> >
> > arm_spe_perf_aux_output_begin() {
> > write_sysreg_s(base, SYS_PMBPTR_EL1);
> >
> > // Ensure the write pointer is ordered
> > isb();
> >
> > write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> > }
> >
> > // Ensure the limit pointer is ordered
> > isb();
> >
> > // Profiling is enabled:
> > // (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
> > write_sysreg_s(0, SYS_PMBSR_EL1);
> >
> > The first isb() ensures that the write pointer update is ordered. The
> > second isb() ensures that the limit pointer is visible before profiling
> > is enabled by clearing the PMBSR_EL1.S bit. When handling a normal
> > maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
> > while PMBLIMITR_EL1.E remains set. Clearing PMBSR_EL1.S therefore
> > effectively re-enables profiling.
> >
> > Since the second isb() is a synchronization for both the write pointer
> > and the limit pointer before profiling is enabled, it could be argued
> > that the first isb() is redundant in the interrupt handling. However,
> > the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
> > it in the interrupt handler does no harm and simplifies code maintenance.
> >
> > In short, if preserves the second isb() instead of removing it, the
> > change looks good to me.
>
> I'm not sure I follow your logic as to why both ISBs are required, but
> I'd have thought that if perf_aux_output_begin() fails when called from
> arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> because we're going to clear pmblimitr_el1 to 0 and that surely has
> to be ordered before clearing pmbsr?
I think the ISB after arm_spe_perf_aux_output_begin() in the irq
handler is required for both the failure and success cases.
For a normal maintenance interrupt, an ISB is inserted between writing
PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
visible before tracing restarts. This ensures that the following
conditions are safely met:
"While the Profiling Buffer is enabled, profiling is not stopped, and
Discard mode is not enabled, all of the following must be true:
The current write pointer must be at least one sample record below
the write limit pointer.
PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
regardless of the value of the applicable TBI bit."
Thanks,
Leo
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-02 18:42 ` Leo Yan
@ 2026-02-02 18:57 ` Will Deacon
2026-02-02 19:14 ` Leo Yan
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2026-02-02 18:57 UTC (permalink / raw)
To: Leo Yan
Cc: James Clark, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On Mon, Feb 02, 2026 at 06:42:34PM +0000, Leo Yan wrote:
> On Mon, Feb 02, 2026 at 04:53:57PM +0000, Will Deacon wrote:
>
> [...]
>
> > > > To avoid redundant isb()s in the IRQ handler, remove the isb() between
> > > > the PMBLIMITR_EL1 write and SYS_PMBSR_EL1 as it doesn't matter which
> > > > order these happen in now that all the previous configuration is covered
> > > > by the new isb().
> > >
> > > The isb() in the interrupt handler is useful and should not be removed.
> > >
> > > See the sequence in the interrupt handler:
> > >
> > > arm_spe_perf_aux_output_begin() {
> > > write_sysreg_s(base, SYS_PMBPTR_EL1);
> > >
> > > // Ensure the write pointer is ordered
> > > isb();
> > >
> > > write_sysreg_s(limit, SYS_PMBLIMITR_EL1);
> > > }
> > >
> > > // Ensure the limit pointer is ordered
> > > isb();
> > >
> > > // Profiling is enabled:
> > > // (PMBLIMITR_EL1.E==1) && (PMBSR_EL1.S==0) && (not discard mode)
> > > write_sysreg_s(0, SYS_PMBSR_EL1);
> > >
> > > The first isb() ensures that the write pointer update is ordered. The
> > > second isb() ensures that the limit pointer is visible before profiling
> > > is enabled by clearing the PMBSR_EL1.S bit. When handling a normal
> > > maintenance interrupt, PMBSR_EL1.S is set by the SPE to stop tracing,
> > > while PMBLIMITR_EL1.E remains set. Clearing PMBSR_EL1.S therefore
> > > effectively re-enables profiling.
> > >
> > > Since the second isb() is a synchronization for both the write pointer
> > > and the limit pointer before profiling is enabled, it could be argued
> > > that the first isb() is redundant in the interrupt handling. However,
> > > the first isb() is crucial for the arm_spe_pmu_start() case, and keeping
> > > it in the interrupt handler does no harm and simplifies code maintenance.
> > >
> > > In short, if preserves the second isb() instead of removing it, the
> > > change looks good to me.
> >
> > I'm not sure I follow your logic as to why both ISBs are required, but
> > I'd have thought that if perf_aux_output_begin() fails when called from
> > arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> > because we're going to clear pmblimitr_el1 to 0 and that surely has
> > to be ordered before clearing pmbsr?
>
> I think the ISB after arm_spe_perf_aux_output_begin() in the irq
> handler is required for both the failure and success cases.
>
> For a normal maintenance interrupt, an ISB is inserted between writing
> PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
> visible before tracing restarts. This ensures that the following
> conditions are safely met:
>
> "While the Profiling Buffer is enabled, profiling is not stopped, and
> Discard mode is not enabled, all of the following must be true:
>
> The current write pointer must be at least one sample record below
> the write limit pointer.
>
> PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
> regardless of the value of the applicable TBI bit."
Hmm, so let's say we've executed the first ISB. At that point, the
Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
stopped (PMBSR_EL1.S = 1). If we *don't* have the second ISB then either
PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
text you quoted will only come into effect once they've both happened,
right? In which case, why does the order matter for the success case?
Will
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-02 18:57 ` Will Deacon
@ 2026-02-02 19:14 ` Leo Yan
2026-02-03 9:29 ` James Clark
0 siblings, 1 reply; 18+ messages in thread
From: Leo Yan @ 2026-02-02 19:14 UTC (permalink / raw)
To: Will Deacon
Cc: James Clark, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On Mon, Feb 02, 2026 at 06:57:11PM +0000, Will Deacon wrote:
[...]
> > > I'm not sure I follow your logic as to why both ISBs are required, but
> > > I'd have thought that if perf_aux_output_begin() fails when called from
> > > arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> > > because we're going to clear pmblimitr_el1 to 0 and that surely has
> > > to be ordered before clearing pmbsr?
> >
> > I think the ISB after arm_spe_perf_aux_output_begin() in the irq
> > handler is required for both the failure and success cases.
> >
> > For a normal maintenance interrupt, an ISB is inserted between writing
> > PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
> > visible before tracing restarts. This ensures that the following
> > conditions are safely met:
> >
> > "While the Profiling Buffer is enabled, profiling is not stopped, and
> > Discard mode is not enabled, all of the following must be true:
> >
> > The current write pointer must be at least one sample record below
> > the write limit pointer.
> >
> > PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
> > regardless of the value of the applicable TBI bit."
>
> Hmm, so let's say we've executed the first ISB. At that point, the
> Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
> stopped (PMBSR_EL1.S = 1).
This is not true. PMBLIMITR_EL1.E is always 1 during interrupt
handling.
> If we *don't* have the second ISB then either
> PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
> text you quoted will only come into effect once they've both happened,
> right? In which case, why does the order matter for the success case?
Yes, both PMBLIMITR_EL1.E == 1 and PMBSR_EL1.S == 0 must be true to
enable tracing.
However, the tricky part is that PMBLIMITR_EL1.E remains 1 throughout
the sequence. Writing PMBLIMITR_EL1 effectively only sets the limit,
while clearing PMBSR_EL1 is the distinct step that enables tracing.
Thanks,
Leo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-02 19:14 ` Leo Yan
@ 2026-02-03 9:29 ` James Clark
2026-02-03 9:32 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2026-02-03 9:29 UTC (permalink / raw)
To: Leo Yan, Will Deacon
Cc: Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On 02/02/2026 7:14 pm, Leo Yan wrote:
> On Mon, Feb 02, 2026 at 06:57:11PM +0000, Will Deacon wrote:
>
> [...]
>
>
>>>> I'm not sure I follow your logic as to why both ISBs are required, but
>>>> I'd have thought that if perf_aux_output_begin() fails when called from
>>>> arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
>>>> because we're going to clear pmblimitr_el1 to 0 and that surely has
>>>> to be ordered before clearing pmbsr?
>>>
>>> I think the ISB after arm_spe_perf_aux_output_begin() in the irq
>>> handler is required for both the failure and success cases.
>>>
>>> For a normal maintenance interrupt, an ISB is inserted between writing
>>> PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
>>> visible before tracing restarts. This ensures that the following
>>> conditions are safely met:
>>>
>>> "While the Profiling Buffer is enabled, profiling is not stopped, and
>>> Discard mode is not enabled, all of the following must be true:
>>>
>>> The current write pointer must be at least one sample record below
>>> the write limit pointer.
>>>
>>> PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
>>> regardless of the value of the applicable TBI bit."
>>
>> Hmm, so let's say we've executed the first ISB. At that point, the
>> Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
>> stopped (PMBSR_EL1.S = 1).
>
> This is not true. PMBLIMITR_EL1.E is always 1 during interrupt
> handling.
>
>> If we *don't* have the second ISB then either
>> PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
>> text you quoted will only come into effect once they've both happened,
>> right? In which case, why does the order matter for the success case?
>
> Yes, both PMBLIMITR_EL1.E == 1 and PMBSR_EL1.S == 0 must be true to
> enable tracing.
>
> However, the tricky part is that PMBLIMITR_EL1.E remains 1 throughout
> the sequence. Writing PMBLIMITR_EL1 effectively only sets the limit,
> while clearing PMBSR_EL1 is the distinct step that enables tracing.
>
> Thanks,
> Leo
I think Leo is correct that the old isb() is still needed. I removed it
under the assumption that PMBLIMITR_EL1.E was unset in the interrupt
handler. Possibly because the previous version re-arranged the handler
to do that.
If PMBLIMITR_EL1.E is set, we have to make sure clearing PMBSR_EL1 comes
last as it's the thing that defines the point where both pointers must
be correct by.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-03 9:29 ` James Clark
@ 2026-02-03 9:32 ` Will Deacon
0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2026-02-03 9:32 UTC (permalink / raw)
To: James Clark
Cc: Leo Yan, Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
linux-arm-kernel, linux-perf-users, linux-kernel
On Tue, Feb 03, 2026 at 09:29:56AM +0000, James Clark wrote:
>
>
> On 02/02/2026 7:14 pm, Leo Yan wrote:
> > On Mon, Feb 02, 2026 at 06:57:11PM +0000, Will Deacon wrote:
> >
> > [...]
> >
> >
> > > > > I'm not sure I follow your logic as to why both ISBs are required, but
> > > > > I'd have thought that if perf_aux_output_begin() fails when called from
> > > > > arm_spe_perf_aux_output_begin() in the irqhandler, we need the ISB
> > > > > because we're going to clear pmblimitr_el1 to 0 and that surely has
> > > > > to be ordered before clearing pmbsr?
> > > >
> > > > I think the ISB after arm_spe_perf_aux_output_begin() in the irq
> > > > handler is required for both the failure and success cases.
> > > >
> > > > For a normal maintenance interrupt, an ISB is inserted between writing
> > > > PMBLIMITR_EL1 and PMBSR_EL1 to ensure that a valid limit write is
> > > > visible before tracing restarts. This ensures that the following
> > > > conditions are safely met:
> > > >
> > > > "While the Profiling Buffer is enabled, profiling is not stopped, and
> > > > Discard mode is not enabled, all of the following must be true:
> > > >
> > > > The current write pointer must be at least one sample record below
> > > > the write limit pointer.
> > > >
> > > > PMBPTR_EL1.PTR[63:56] must equal PMBLIMITR_EL1.LIMIT[63:56],
> > > > regardless of the value of the applicable TBI bit."
> > >
> > > Hmm, so let's say we've executed the first ISB. At that point, the
> > > Profiling Buffer is disabled (PMBLIMITR_EL1.E = 0) and profiling is
> > > stopped (PMBSR_EL1.S = 1).
> >
> > This is not true. PMBLIMITR_EL1.E is always 1 during interrupt
> > handling.
Ah, yes, thank you for correcting me here.
> > > If we *don't* have the second ISB then either
> > > PMBLIMITR_EL1 is written first or PMBSR_EL1 is written first. But the
> > > text you quoted will only come into effect once they've both happened,
> > > right? In which case, why does the order matter for the success case?
> >
> > Yes, both PMBLIMITR_EL1.E == 1 and PMBSR_EL1.S == 0 must be true to
> > enable tracing.
> >
> > However, the tricky part is that PMBLIMITR_EL1.E remains 1 throughout
> > the sequence. Writing PMBLIMITR_EL1 effectively only sets the limit,
> > while clearing PMBSR_EL1 is the distinct step that enables tracing.
> >
> > Thanks,
> > Leo
>
> I think Leo is correct that the old isb() is still needed. I removed it
> under the assumption that PMBLIMITR_EL1.E was unset in the interrupt
> handler. Possibly because the previous version re-arranged the handler to do
> that.
>
> If PMBLIMITR_EL1.E is set, we have to make sure clearing PMBSR_EL1 comes
> last as it's the thing that defines the point where both pointers must be
> correct by.
Agreed that we can't remove the existing isb, but please see my other
reply as I'm not entirely sure we need to add an extra isb to handle the
arch relaxation.
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-01-23 16:03 [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer James Clark
2026-01-30 20:24 ` Leo Yan
@ 2026-02-02 19:03 ` Will Deacon
2026-02-03 10:46 ` James Clark
1 sibling, 1 reply; 18+ messages in thread
From: Will Deacon @ 2026-02-02 19:03 UTC (permalink / raw)
To: James Clark
Cc: Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> The Arm ARM known issues document [1] states that the architecture will
> be relaxed so that the profiling buffer must be correctly configured
> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> PMBLIMITR_EL1.FM != DISCARD:
>
> R24557
>
> While the Profiling Buffer is enabled, profiling is not stopped, and
> Discard mode is not enabled, all of the following must be true:
>
> * The current write pointer must be at least one sample record below
> the write limit pointer.
>
> The same relaxation also says that writes may be completely ignored:
>
> When the Profiling Buffer is enabled, profiling is not stopped, and
> Discard mode is not enabled, the PE might ignore a direct write to any
> of the following Profiling Buffer registers, other than a direct write
> to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>
> * The current write pointer, PMBPTR_EL1.
> * The Limit pointer, PMBLIMITR_EL1.
> * PMBSR_EL1.
Thinking about this some more, does that mean that the direct write to
PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
determine the write-ignore semantics? If so, doesn't that mean that
we'll get order against a subsequent direct write of PMBLIMITR_EL1
without an ISB thanks to table "D24-1 Synchronization requirements"
which says that an indirect read followed by a direct write doesn't
require synchronisation?
There's also a sentence above the table stating:
"Direct writes to System registers are not allowed to affect any
instructions appearing in program order before the direct write."
so after all that, I'm not really sure why the ISB is required.
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-02 19:03 ` Will Deacon
@ 2026-02-03 10:46 ` James Clark
2026-02-03 11:07 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2026-02-03 10:46 UTC (permalink / raw)
To: Will Deacon
Cc: Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On 02/02/2026 7:03 pm, Will Deacon wrote:
> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
>> The Arm ARM known issues document [1] states that the architecture will
>> be relaxed so that the profiling buffer must be correctly configured
>> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
>> PMBLIMITR_EL1.FM != DISCARD:
>>
>> R24557
>>
>> While the Profiling Buffer is enabled, profiling is not stopped, and
>> Discard mode is not enabled, all of the following must be true:
>>
>> * The current write pointer must be at least one sample record below
>> the write limit pointer.
>>
>> The same relaxation also says that writes may be completely ignored:
>>
>> When the Profiling Buffer is enabled, profiling is not stopped, and
>> Discard mode is not enabled, the PE might ignore a direct write to any
>> of the following Profiling Buffer registers, other than a direct write
>> to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>>
>> * The current write pointer, PMBPTR_EL1.
>> * The Limit pointer, PMBLIMITR_EL1.
>> * PMBSR_EL1.
>
> Thinking about this some more, does that mean that the direct write to
> PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
> determine the write-ignore semantics? If so, doesn't that mean that
> we'll get order against a subsequent direct write of PMBLIMITR_EL1
> without an ISB thanks to table "D24-1 Synchronization requirements"
> which says that an indirect read followed by a direct write doesn't
> require synchronisation?
>
> There's also a sentence above the table stating:
>
> "Direct writes to System registers are not allowed to affect any
> instructions appearing in program order before the direct write."
>
> so after all that, I'm not really sure why the ISB is required.
>
> Will
We were under the impression that this was required for the SPU as it is
treated as a separate entity than the PE.
In "D17.9 Synchronization and Statistical Profiling" there is:
INDWCG
A Context Synchronization event guarantees that a direct write to a
System register made by the PE in program order before the Context
synchronization event are observable by indirect reads and indirect
writes of the same System register made by a profiling operation
relating to a sampled operation in program order after the Context
synchronization event.
That specifically mentions an indirect read following a direct write,
which seems to contradict D24-1. Although I thought this is a special
case for SPE.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-03 10:46 ` James Clark
@ 2026-02-03 11:07 ` Will Deacon
2026-02-06 9:50 ` James Clark
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2026-02-03 11:07 UTC (permalink / raw)
To: James Clark
Cc: Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On Tue, Feb 03, 2026 at 10:46:37AM +0000, James Clark wrote:
>
>
> On 02/02/2026 7:03 pm, Will Deacon wrote:
> > On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
> > > The Arm ARM known issues document [1] states that the architecture will
> > > be relaxed so that the profiling buffer must be correctly configured
> > > when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> > > PMBLIMITR_EL1.FM != DISCARD:
> > >
> > > R24557
> > >
> > > While the Profiling Buffer is enabled, profiling is not stopped, and
> > > Discard mode is not enabled, all of the following must be true:
> > >
> > > * The current write pointer must be at least one sample record below
> > > the write limit pointer.
> > >
> > > The same relaxation also says that writes may be completely ignored:
> > >
> > > When the Profiling Buffer is enabled, profiling is not stopped, and
> > > Discard mode is not enabled, the PE might ignore a direct write to any
> > > of the following Profiling Buffer registers, other than a direct write
> > > to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
> > >
> > > * The current write pointer, PMBPTR_EL1.
> > > * The Limit pointer, PMBLIMITR_EL1.
> > > * PMBSR_EL1.
> >
> > Thinking about this some more, does that mean that the direct write to
> > PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
> > determine the write-ignore semantics? If so, doesn't that mean that
> > we'll get order against a subsequent direct write of PMBLIMITR_EL1
> > without an ISB thanks to table "D24-1 Synchronization requirements"
> > which says that an indirect read followed by a direct write doesn't
> > require synchronisation?
> >
> > There's also a sentence above the table stating:
> >
> > "Direct writes to System registers are not allowed to affect any
> > instructions appearing in program order before the direct write."
> >
> > so after all that, I'm not really sure why the ISB is required.
> >
> > Will
>
> We were under the impression that this was required for the SPU as it is
> treated as a separate entity than the PE.
>
> In "D17.9 Synchronization and Statistical Profiling" there is:
>
> INDWCG
>
> A Context Synchronization event guarantees that a direct write to a
> System register made by the PE in program order before the Context
> synchronization event are observable by indirect reads and indirect
> writes of the same System register made by a profiling operation
> relating to a sampled operation in program order after the Context
> synchronization event.
>
> That specifically mentions an indirect read following a direct write, which
> seems to contradict D24-1. Although I thought this is a special case for
> SPE.
My reading of the the text above is that it is covering the direct write
-> indirect read case, whereas I think the case in the SPE driver that
we're considering for your patch is when we have an indirect read
followed by a direct write.
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-03 11:07 ` Will Deacon
@ 2026-02-06 9:50 ` James Clark
2026-02-19 12:08 ` James Clark
0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2026-02-06 9:50 UTC (permalink / raw)
To: Will Deacon
Cc: Mark Rutland, Catalin Marinas, Alexandru Elisei,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On 03/02/2026 11:07 am, Will Deacon wrote:
> On Tue, Feb 03, 2026 at 10:46:37AM +0000, James Clark wrote:
>>
>>
>> On 02/02/2026 7:03 pm, Will Deacon wrote:
>>> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
>>>> The Arm ARM known issues document [1] states that the architecture will
>>>> be relaxed so that the profiling buffer must be correctly configured
>>>> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
>>>> PMBLIMITR_EL1.FM != DISCARD:
>>>>
>>>> R24557
>>>>
>>>> While the Profiling Buffer is enabled, profiling is not stopped, and
>>>> Discard mode is not enabled, all of the following must be true:
>>>>
>>>> * The current write pointer must be at least one sample record below
>>>> the write limit pointer.
>>>>
>>>> The same relaxation also says that writes may be completely ignored:
>>>>
>>>> When the Profiling Buffer is enabled, profiling is not stopped, and
>>>> Discard mode is not enabled, the PE might ignore a direct write to any
>>>> of the following Profiling Buffer registers, other than a direct write
>>>> to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>>>>
>>>> * The current write pointer, PMBPTR_EL1.
>>>> * The Limit pointer, PMBLIMITR_EL1.
>>>> * PMBSR_EL1.
>>>
>>> Thinking about this some more, does that mean that the direct write to
>>> PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
>>> determine the write-ignore semantics? If so, doesn't that mean that
>>> we'll get order against a subsequent direct write of PMBLIMITR_EL1
>>> without an ISB thanks to table "D24-1 Synchronization requirements"
>>> which says that an indirect read followed by a direct write doesn't
>>> require synchronisation?
>>>
>>> There's also a sentence above the table stating:
>>>
>>> "Direct writes to System registers are not allowed to affect any
>>> instructions appearing in program order before the direct write."
>>>
>>> so after all that, I'm not really sure why the ISB is required.
>>>
>>> Will
>>
>> We were under the impression that this was required for the SPU as it is
>> treated as a separate entity than the PE.
>>
>> In "D17.9 Synchronization and Statistical Profiling" there is:
>>
>> INDWCG
>>
>> A Context Synchronization event guarantees that a direct write to a
>> System register made by the PE in program order before the Context
>> synchronization event are observable by indirect reads and indirect
>> writes of the same System register made by a profiling operation
>> relating to a sampled operation in program order after the Context
>> synchronization event.
>>
>> That specifically mentions an indirect read following a direct write, which
>> seems to contradict D24-1. Although I thought this is a special case for
>> SPE.
>
> My reading of the the text above is that it is covering the direct write
> -> indirect read case, whereas I think the case in the SPE driver that
> we're considering for your patch is when we have an indirect read
> followed by a direct write.
>
> Will
Yeah, and that text also only applies to "profiling operations", not
writes to PMBPTR and PMBLIMITR.
Upon further investigation you are correct about the isb() not being
required, even with the new relaxation. Seems like we just accepted that
the relaxation required some change to the driver without really
thinking about it. But yeah thanks for looking in detail and catching it.
So we can drop this now. Sorry for the noise.
James
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-06 9:50 ` James Clark
@ 2026-02-19 12:08 ` James Clark
2026-02-19 12:57 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2026-02-19 12:08 UTC (permalink / raw)
To: Will Deacon, Michael Williams, Alexandru Elisei
Cc: Mark Rutland, Catalin Marinas, Anshuman Khandual, Rob Herring,
Suzuki Poulose, Robin Murphy, Leo Yan, linux-arm-kernel,
linux-perf-users, linux-kernel
On 06/02/2026 9:50 am, James Clark wrote:
>
>
> On 03/02/2026 11:07 am, Will Deacon wrote:
>> On Tue, Feb 03, 2026 at 10:46:37AM +0000, James Clark wrote:
>>>
>>>
>>> On 02/02/2026 7:03 pm, Will Deacon wrote:
>>>> On Fri, Jan 23, 2026 at 04:03:53PM +0000, James Clark wrote:
>>>>> The Arm ARM known issues document [1] states that the architecture
>>>>> will
>>>>> be relaxed so that the profiling buffer must be correctly configured
>>>>> when ProfilingBufferEnabled() && !SPEProfilingStopped() &&
>>>>> PMBLIMITR_EL1.FM != DISCARD:
>>>>>
>>>>> R24557
>>>>>
>>>>> While the Profiling Buffer is enabled, profiling is not
>>>>> stopped, and
>>>>> Discard mode is not enabled, all of the following must be true:
>>>>>
>>>>> * The current write pointer must be at least one sample record
>>>>> below
>>>>> the write limit pointer.
>>>>>
>>>>> The same relaxation also says that writes may be completely ignored:
>>>>>
>>>>> When the Profiling Buffer is enabled, profiling is not stopped,
>>>>> and
>>>>> Discard mode is not enabled, the PE might ignore a direct write
>>>>> to any
>>>>> of the following Profiling Buffer registers, other than a
>>>>> direct write
>>>>> to PMBLIMITR_EL1 that clears PMBLIMITR_EL1.E from 1 to 0:
>>>>>
>>>>> * The current write pointer, PMBPTR_EL1.
>>>>> * The Limit pointer, PMBLIMITR_EL1.
>>>>> * PMBSR_EL1.
>>>>
>>>> Thinking about this some more, does that mean that the direct write to
>>>> PMBPTR_EL1 performs an indirect read of PMBLIMITR_EL1 so that it can
>>>> determine the write-ignore semantics? If so, doesn't that mean that
>>>> we'll get order against a subsequent direct write of PMBLIMITR_EL1
>>>> without an ISB thanks to table "D24-1 Synchronization requirements"
>>>> which says that an indirect read followed by a direct write doesn't
>>>> require synchronisation?
>>>>
>>>> There's also a sentence above the table stating:
>>>>
>>>> "Direct writes to System registers are not allowed to affect any
>>>> instructions appearing in program order before the direct write."
>>>>
>>>> so after all that, I'm not really sure why the ISB is required.
>>>>
>>>> Will
>>>
>>> We were under the impression that this was required for the SPU as it is
>>> treated as a separate entity than the PE.
>>>
>>> In "D17.9 Synchronization and Statistical Profiling" there is:
>>>
>>> INDWCG
>>>
>>> A Context Synchronization event guarantees that a direct write to a
>>> System register made by the PE in program order before the Context
>>> synchronization event are observable by indirect reads and indirect
>>> writes of the same System register made by a profiling operation
>>> relating to a sampled operation in program order after the Context
>>> synchronization event.
>>>
>>> That specifically mentions an indirect read following a direct write,
>>> which
>>> seems to contradict D24-1. Although I thought this is a special case for
>>> SPE.
>>
>> My reading of the the text above is that it is covering the direct write
>> -> indirect read case, whereas I think the case in the SPE driver that
>> we're considering for your patch is when we have an indirect read
>> followed by a direct write.
>>
>> Will
>
> Yeah, and that text also only applies to "profiling operations", not
> writes to PMBPTR and PMBLIMITR.
>
> Upon further investigation you are correct about the isb() not being
> required, even with the new relaxation. Seems like we just accepted that
> the relaxation required some change to the driver without really
> thinking about it. But yeah thanks for looking in detail and catching it.
>
> So we can drop this now. Sorry for the noise.
>
> James
>
Hi Will,
I'm back to drag this up again. So I think all of the above discussion
relies on the ordering given by the indirect read needed for the "might
ignore a direct write..." part. But it's _might_ ignore a direct write,
it's possible for an implementation to not do that, so there are two
possible implementations:
#1 Where there is an indirect read to give the write ignore outcome
#2 Where there is no write ignore outcome so it doesn't require an
indirect read
For #2 there's nothing to force the ordering. We're writing to two
different registers (PMBPTR_EL1 and PMBLIMITR_EL1) and we have to have
the PMBLIMITR_EL1 write come second for the buffer to be considered
configured correctly. For example if the old value of PMBPTR_EL1 is
higher than the new PMBLIMITR_EL1 and the write to PMBLIMITR_EL1 happens
first then it's misconfigured. That's why we think we need the isb() here.
Thanks
James
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-19 12:08 ` James Clark
@ 2026-02-19 12:57 ` Will Deacon
2026-02-19 13:51 ` James Clark
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2026-02-19 12:57 UTC (permalink / raw)
To: James Clark
Cc: Michael Williams, Alexandru Elisei, Mark Rutland, Catalin Marinas,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On Thu, Feb 19, 2026 at 12:08:27PM +0000, James Clark wrote:
> I'm back to drag this up again. So I think all of the above discussion
> relies on the ordering given by the indirect read needed for the "might
> ignore a direct write..." part. But it's _might_ ignore a direct write, it's
> possible for an implementation to not do that, so there are two possible
> implementations:
>
> #1 Where there is an indirect read to give the write ignore outcome
> #2 Where there is no write ignore outcome so it doesn't require an
> indirect read
>
> For #2 there's nothing to force the ordering. We're writing to two different
> registers (PMBPTR_EL1 and PMBLIMITR_EL1) and we have to have the
> PMBLIMITR_EL1 write come second for the buffer to be considered configured
> correctly. For example if the old value of PMBPTR_EL1 is higher than the new
> PMBLIMITR_EL1 and the write to PMBLIMITR_EL1 happens first then it's
> misconfigured. That's why we think we need the isb() here.
I thought profiling was disabled in these cases, so why is it
misconfigured?
If it is misconfigured, what can go wrong given that we're either stopped
or pmscr is clear?
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-19 12:57 ` Will Deacon
@ 2026-02-19 13:51 ` James Clark
2026-02-19 14:03 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2026-02-19 13:51 UTC (permalink / raw)
To: Will Deacon
Cc: Michael Williams, Alexandru Elisei, Mark Rutland, Catalin Marinas,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On 19/02/2026 12:57 pm, Will Deacon wrote:
> On Thu, Feb 19, 2026 at 12:08:27PM +0000, James Clark wrote:
>> I'm back to drag this up again. So I think all of the above discussion
>> relies on the ordering given by the indirect read needed for the "might
>> ignore a direct write..." part. But it's _might_ ignore a direct write, it's
>> possible for an implementation to not do that, so there are two possible
>> implementations:
>>
>> #1 Where there is an indirect read to give the write ignore outcome
>> #2 Where there is no write ignore outcome so it doesn't require an
>> indirect read
>>
>> For #2 there's nothing to force the ordering. We're writing to two different
>> registers (PMBPTR_EL1 and PMBLIMITR_EL1) and we have to have the
>> PMBLIMITR_EL1 write come second for the buffer to be considered configured
>> correctly. For example if the old value of PMBPTR_EL1 is higher than the new
>> PMBLIMITR_EL1 and the write to PMBLIMITR_EL1 happens first then it's
>> misconfigured. That's why we think we need the isb() here.
>
> I thought profiling was disabled in these cases, so why is it
> misconfigured?
>
> If it is misconfigured, what can go wrong given that we're either stopped
> or pmscr is clear?
>
> Will
It is stopped in the interrupt handler because PMBSR_EL1.S = 1. But in
arm_spe_pmu_start(), PMBSR_EL1.S = 0 so it's not stopped. And with
PMBPTR_EL1 still being set to the value from the last session it could
be higher than PMBLIMITR_EL1.
PMSCR_EL1 doesn't affect SPEProfilingStopped(), it's only:
boolean stopped = (PMBSR_EL1.S == '1');
The conditions for when the buffer needs to be configured correctly from
R24557 are:
ProfilingBufferEnabled() && !SPEProfilingStopped() &&
PMBLIMITR_EL1.FM != DISCARD
I think even if PMSCR_EL1 is clear you can still get a buffer management
error, even if no samples were going to be written into the buffer. It
just says:
While the Profiling Buffer is enabled, profiling is not stopped, and
Discard mode is not enabled, all of the following must be true:
* The current write pointer must be at least one sample record below
the write limit pointer.
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-19 13:51 ` James Clark
@ 2026-02-19 14:03 ` Will Deacon
2026-02-19 14:15 ` James Clark
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2026-02-19 14:03 UTC (permalink / raw)
To: James Clark
Cc: Michael Williams, Alexandru Elisei, Mark Rutland, Catalin Marinas,
Anshuman Khandual, Rob Herring, Suzuki Poulose, Robin Murphy,
Leo Yan, linux-arm-kernel, linux-perf-users, linux-kernel
On Thu, Feb 19, 2026 at 01:51:26PM +0000, James Clark wrote:
> On 19/02/2026 12:57 pm, Will Deacon wrote:
> > On Thu, Feb 19, 2026 at 12:08:27PM +0000, James Clark wrote:
> > > I'm back to drag this up again. So I think all of the above discussion
> > > relies on the ordering given by the indirect read needed for the "might
> > > ignore a direct write..." part. But it's _might_ ignore a direct write, it's
> > > possible for an implementation to not do that, so there are two possible
> > > implementations:
> > >
> > > #1 Where there is an indirect read to give the write ignore outcome
> > > #2 Where there is no write ignore outcome so it doesn't require an
> > > indirect read
> > >
> > > For #2 there's nothing to force the ordering. We're writing to two different
> > > registers (PMBPTR_EL1 and PMBLIMITR_EL1) and we have to have the
> > > PMBLIMITR_EL1 write come second for the buffer to be considered configured
> > > correctly. For example if the old value of PMBPTR_EL1 is higher than the new
> > > PMBLIMITR_EL1 and the write to PMBLIMITR_EL1 happens first then it's
> > > misconfigured. That's why we think we need the isb() here.
> >
> > I thought profiling was disabled in these cases, so why is it
> > misconfigured?
> >
> > If it is misconfigured, what can go wrong given that we're either stopped
> > or pmscr is clear?
> >
> > Will
>
> It is stopped in the interrupt handler because PMBSR_EL1.S = 1. But in
> arm_spe_pmu_start(), PMBSR_EL1.S = 0 so it's not stopped. And with
> PMBPTR_EL1 still being set to the value from the last session it could be
> higher than PMBLIMITR_EL1.
>
> PMSCR_EL1 doesn't affect SPEProfilingStopped(), it's only:
>
> boolean stopped = (PMBSR_EL1.S == '1');
>
> The conditions for when the buffer needs to be configured correctly from
> R24557 are:
>
> ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> PMBLIMITR_EL1.FM != DISCARD
>
> I think even if PMSCR_EL1 is clear you can still get a buffer management
> error, even if no samples were going to be written into the buffer. It just
> says:
>
> While the Profiling Buffer is enabled, profiling is not stopped, and
> Discard mode is not enabled, all of the following must be true:
>
> * The current write pointer must be at least one sample record below
> the write limit pointer.
... but doesn't that mean that R24557 is a breaking change to the
architecture? The current Arm ARM doesn't appear to require this,
existing software doesn't honour it so why should we hack extra barriers
into Linux?
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-19 14:03 ` Will Deacon
@ 2026-02-19 14:15 ` James Clark
2026-02-19 14:30 ` Mark Rutland
0 siblings, 1 reply; 18+ messages in thread
From: James Clark @ 2026-02-19 14:15 UTC (permalink / raw)
To: Will Deacon
Cc: Mark Rutland, Catalin Marinas, Anshuman Khandual, Rob Herring,
Suzuki Poulose, Robin Murphy, Leo Yan, linux-arm-kernel,
linux-perf-users, linux-kernel, Michael Williams,
Alexandru Elisei
On 19/02/2026 2:03 pm, Will Deacon wrote:
> On Thu, Feb 19, 2026 at 01:51:26PM +0000, James Clark wrote:
>> On 19/02/2026 12:57 pm, Will Deacon wrote:
>>> On Thu, Feb 19, 2026 at 12:08:27PM +0000, James Clark wrote:
>>>> I'm back to drag this up again. So I think all of the above discussion
>>>> relies on the ordering given by the indirect read needed for the "might
>>>> ignore a direct write..." part. But it's _might_ ignore a direct write, it's
>>>> possible for an implementation to not do that, so there are two possible
>>>> implementations:
>>>>
>>>> #1 Where there is an indirect read to give the write ignore outcome
>>>> #2 Where there is no write ignore outcome so it doesn't require an
>>>> indirect read
>>>>
>>>> For #2 there's nothing to force the ordering. We're writing to two different
>>>> registers (PMBPTR_EL1 and PMBLIMITR_EL1) and we have to have the
>>>> PMBLIMITR_EL1 write come second for the buffer to be considered configured
>>>> correctly. For example if the old value of PMBPTR_EL1 is higher than the new
>>>> PMBLIMITR_EL1 and the write to PMBLIMITR_EL1 happens first then it's
>>>> misconfigured. That's why we think we need the isb() here.
>>>
>>> I thought profiling was disabled in these cases, so why is it
>>> misconfigured?
>>>
>>> If it is misconfigured, what can go wrong given that we're either stopped
>>> or pmscr is clear?
>>>
>>> Will
>>
>> It is stopped in the interrupt handler because PMBSR_EL1.S = 1. But in
>> arm_spe_pmu_start(), PMBSR_EL1.S = 0 so it's not stopped. And with
>> PMBPTR_EL1 still being set to the value from the last session it could be
>> higher than PMBLIMITR_EL1.
>>
>> PMSCR_EL1 doesn't affect SPEProfilingStopped(), it's only:
>>
>> boolean stopped = (PMBSR_EL1.S == '1');
>>
>> The conditions for when the buffer needs to be configured correctly from
>> R24557 are:
>>
>> ProfilingBufferEnabled() && !SPEProfilingStopped() &&
>> PMBLIMITR_EL1.FM != DISCARD
>>
>> I think even if PMSCR_EL1 is clear you can still get a buffer management
>> error, even if no samples were going to be written into the buffer. It just
>> says:
>>
>> While the Profiling Buffer is enabled, profiling is not stopped, and
>> Discard mode is not enabled, all of the following must be true:
>>
>> * The current write pointer must be at least one sample record below
>> the write limit pointer.
>
> ... but doesn't that mean that R24557 is a breaking change to the
> architecture? The current Arm ARM doesn't appear to require this,
> existing software doesn't honour it so why should we hack extra barriers
> into Linux?
>
> Will
Yes I suppose it is. The current Arm ARM doesn't require it, but R24577
is in the "known issues" document for the Arm ARM, so it's saying the
ARM is incorrect here and you shouldn't trust what it says.
Arm Architecture Reference Manual for A-profile architecture: Known
issues
This document includes the Known Issues for the following documents:
Arm Architecture Reference Manual for A-profile architecture
(DDI0487)
Presumably the fix will make it into the Arm ARM eventually. There is
certainly a discussion to be had about whether this is a good change to
make or not, but at this point I'm only concerned with making the driver
correct using all of the information that is currently published.
I suppose there is a chance that this could be deleted from the known
issues and not make it into a future Arm ARM. TBH I have no experience
or feeling to say how likely that would be.
Purely from a software point of view, ignoring the new rule despite it
being published here doesn't feel like it would be right.
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer
2026-02-19 14:15 ` James Clark
@ 2026-02-19 14:30 ` Mark Rutland
0 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2026-02-19 14:30 UTC (permalink / raw)
To: James Clark
Cc: Will Deacon, Catalin Marinas, Anshuman Khandual, Rob Herring,
Suzuki Poulose, Robin Murphy, Leo Yan, linux-arm-kernel,
linux-perf-users, linux-kernel, Michael Williams,
Alexandru Elisei
On Thu, Feb 19, 2026 at 02:15:22PM +0000, James Clark wrote:
> On 19/02/2026 2:03 pm, Will Deacon wrote:
> > On Thu, Feb 19, 2026 at 01:51:26PM +0000, James Clark wrote:
> > > On 19/02/2026 12:57 pm, Will Deacon wrote:
> > > > On Thu, Feb 19, 2026 at 12:08:27PM +0000, James Clark wrote:
> > > > > I'm back to drag this up again. So I think all of the above discussion
> > > > > relies on the ordering given by the indirect read needed for the "might
> > > > > ignore a direct write..." part. But it's _might_ ignore a direct write, it's
> > > > > possible for an implementation to not do that, so there are two possible
> > > > > implementations:
> > > > >
> > > > > #1 Where there is an indirect read to give the write ignore outcome
> > > > > #2 Where there is no write ignore outcome so it doesn't require an
> > > > > indirect read
> > > > >
> > > > > For #2 there's nothing to force the ordering. We're writing to two different
> > > > > registers (PMBPTR_EL1 and PMBLIMITR_EL1) and we have to have the
> > > > > PMBLIMITR_EL1 write come second for the buffer to be considered configured
> > > > > correctly. For example if the old value of PMBPTR_EL1 is higher than the new
> > > > > PMBLIMITR_EL1 and the write to PMBLIMITR_EL1 happens first then it's
> > > > > misconfigured. That's why we think we need the isb() here.
> > > >
> > > > I thought profiling was disabled in these cases, so why is it
> > > > misconfigured?
> > > >
> > > > If it is misconfigured, what can go wrong given that we're either stopped
> > > > or pmscr is clear?
> > > >
> > > > Will
> > >
> > > It is stopped in the interrupt handler because PMBSR_EL1.S = 1. But in
> > > arm_spe_pmu_start(), PMBSR_EL1.S = 0 so it's not stopped. And with
> > > PMBPTR_EL1 still being set to the value from the last session it could be
> > > higher than PMBLIMITR_EL1.
> > >
> > > PMSCR_EL1 doesn't affect SPEProfilingStopped(), it's only:
> > >
> > > boolean stopped = (PMBSR_EL1.S == '1');
> > >
> > > The conditions for when the buffer needs to be configured correctly from
> > > R24557 are:
> > >
> > > ProfilingBufferEnabled() && !SPEProfilingStopped() &&
> > > PMBLIMITR_EL1.FM != DISCARD
> > >
> > > I think even if PMSCR_EL1 is clear you can still get a buffer management
> > > error, even if no samples were going to be written into the buffer. It just
> > > says:
> > >
> > > While the Profiling Buffer is enabled, profiling is not stopped, and
> > > Discard mode is not enabled, all of the following must be true:
> > >
> > > * The current write pointer must be at least one sample record below
> > > the write limit pointer.
> >
> > ... but doesn't that mean that R24557 is a breaking change to the
> > architecture? The current Arm ARM doesn't appear to require this,
> > existing software doesn't honour it so why should we hack extra barriers
> > into Linux?
> >
> > Will
>
> Yes I suppose it is. The current Arm ARM doesn't require it, but R24577 is
> in the "known issues" document for the Arm ARM, so it's saying the ARM is
> incorrect here and you shouldn't trust what it says.
>
> Arm Architecture Reference Manual for A-profile architecture: Known
> issues
>
> This document includes the Known Issues for the following documents:
>
> Arm Architecture Reference Manual for A-profile architecture
> (DDI0487)
>
> Presumably the fix will make it into the Arm ARM eventually.
While that document is called "Known issues" it really means "Key
changes since the last ARM ARM". The items in that are not all bug
fixes.
In this case, R24577 is intended to be a relaxation (that's what the "R"
stands for), which breaks existing widely-deployed SW. It fails to meet
the usual acceptability requirements for a relaxation.
IMO this should not have made it into the architecture as-is.
> I suppose there is a chance that this could be deleted from the known issues
> and not make it into a future Arm ARM. TBH I have no experience or feeling
> to say how likely that would be.
Broken changes have made it into the architecture before, and have been
subsequently reverted. In general, we try to avoid that in the first
place.
I will raise this internally within Arm.
Mark.
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-02-19 14:30 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-23 16:03 [PATCH] perf: arm_spe: Add barrier before enabling profiling buffer James Clark
2026-01-30 20:24 ` Leo Yan
2026-02-02 16:53 ` Will Deacon
2026-02-02 18:42 ` Leo Yan
2026-02-02 18:57 ` Will Deacon
2026-02-02 19:14 ` Leo Yan
2026-02-03 9:29 ` James Clark
2026-02-03 9:32 ` Will Deacon
2026-02-02 19:03 ` Will Deacon
2026-02-03 10:46 ` James Clark
2026-02-03 11:07 ` Will Deacon
2026-02-06 9:50 ` James Clark
2026-02-19 12:08 ` James Clark
2026-02-19 12:57 ` Will Deacon
2026-02-19 13:51 ` James Clark
2026-02-19 14:03 ` Will Deacon
2026-02-19 14:15 ` James Clark
2026-02-19 14:30 ` Mark Rutland
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox