* [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-28 10:08 ` Jan Beulich
2025-07-28 10:21 ` Andrew Cooper
2025-07-25 15:06 ` [RFC PATCH v1 02/10] arch-x86/pmu.h: document current memory layout for VPMU Edwin Török
` (13 subsequent siblings)
14 siblings, 2 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, andriy.sultanov, boris.ostrovsky
Linux already has a similar BUILD_BUG_ON.
Currently this struct is ~224 bytes on x86-64.
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/arch/x86/cpu/vpmu.c | 1 +
xen/include/public/pmu.h | 3 +++
2 files changed, 4 insertions(+)
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index c28192ea26..7be79c2d00 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -401,6 +401,7 @@ static int vpmu_arch_initialise(struct vcpu *v)
uint8_t vendor = current_cpu_data.x86_vendor;
int ret;
+ BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index af8b7babdd..15decc024d 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -93,6 +93,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
* Architecture-independent fields of xen_pmu_data are WO for the hypervisor
* and RO for the guest but some fields in xen_pmu_arch can be writable
* by both the hypervisor and the guest (see arch-$arch/pmu.h).
+ *
+ * PAGE_SIZE bytes of memory are allocated.
+ * This struct cannot be larger than PAGE_SIZE.
*/
struct xen_pmu_data {
/* Interrupted VCPU */
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
2025-07-25 15:06 ` [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page Edwin Török
@ 2025-07-28 10:08 ` Jan Beulich
2025-07-28 10:21 ` Andrew Cooper
1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2025-07-28 10:08 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, andriy.sultanov,
boris.ostrovsky, xen-devel
On 25.07.2025 17:06, Edwin Török wrote:
> Linux already has a similar BUILD_BUG_ON.
> Currently this struct is ~224 bytes on x86-64.
>
> No functional change.
>
> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
> ---
> xen/arch/x86/cpu/vpmu.c | 1 +
> xen/include/public/pmu.h | 3 +++
> 2 files changed, 4 insertions(+)
>
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index c28192ea26..7be79c2d00 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -401,6 +401,7 @@ static int vpmu_arch_initialise(struct vcpu *v)
> uint8_t vendor = current_cpu_data.x86_vendor;
> int ret;
>
> + BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
> BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
> BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
> BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
I'm fine with this change, and in isolation it can have my ack.
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -93,6 +93,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
> * Architecture-independent fields of xen_pmu_data are WO for the hypervisor
> * and RO for the guest but some fields in xen_pmu_arch can be writable
> * by both the hypervisor and the guest (see arch-$arch/pmu.h).
> + *
> + * PAGE_SIZE bytes of memory are allocated.
> + * This struct cannot be larger than PAGE_SIZE.
> */
> struct xen_pmu_data {
> /* Interrupted VCPU */
I'm not happy about this change: PAGE_SIZE is a Xen-internal entity, which
has no specific meaning in the public interface. (The fact that under io/
there are a number of similar references doesn't justify the use here.)
Jan
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
2025-07-25 15:06 ` [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page Edwin Török
2025-07-28 10:08 ` Jan Beulich
@ 2025-07-28 10:21 ` Andrew Cooper
2025-07-28 10:22 ` Edwin Torok
1 sibling, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2025-07-28 10:21 UTC (permalink / raw)
To: Edwin Török, xen-devel
Cc: Jan Beulich, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, andriy.sultanov,
boris.ostrovsky
On 25/07/2025 4:06 pm, Edwin Török wrote:
> Linux already has a similar BUILD_BUG_ON.
> Currently this struct is ~224 bytes on x86-64.
>
> No functional change.
>
> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
> ---
> xen/arch/x86/cpu/vpmu.c | 1 +
> xen/include/public/pmu.h | 3 +++
> 2 files changed, 4 insertions(+)
>
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index c28192ea26..7be79c2d00 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -401,6 +401,7 @@ static int vpmu_arch_initialise(struct vcpu *v)
> uint8_t vendor = current_cpu_data.x86_vendor;
> int ret;
>
> + BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
> BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
> BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
> BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
This is fine (even if it ought to be elsewhere, but don't worry about that).
> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> index af8b7babdd..15decc024d 100644
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -93,6 +93,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
> * Architecture-independent fields of xen_pmu_data are WO for the hypervisor
> * and RO for the guest but some fields in xen_pmu_arch can be writable
> * by both the hypervisor and the guest (see arch-$arch/pmu.h).
> + *
> + * PAGE_SIZE bytes of memory are allocated.
> + * This struct cannot be larger than PAGE_SIZE.
This isn't. Xen's PAGE_SIZE is not necessarily the same as PAGE_SIZE in
the guest consuming this header.
This highlights one of the problems that Xen's ABI entrenches. Being
x86-only, it's 4k in practice, but there's no easy solution.
I'd just skip this comment. Anything else is going to get tied up in
unrelated bigger problems.
~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
2025-07-28 10:21 ` Andrew Cooper
@ 2025-07-28 10:22 ` Edwin Torok
2025-07-28 10:25 ` Jan Beulich
0 siblings, 1 reply; 30+ messages in thread
From: Edwin Torok @ 2025-07-28 10:22 UTC (permalink / raw)
To: Andrew Cooper
Cc: xen-devel, Jan Beulich, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, andriy.sultanov,
boris.ostrovsky
On Mon, Jul 28, 2025 at 11:21 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 25/07/2025 4:06 pm, Edwin Török wrote:
> > Linux already has a similar BUILD_BUG_ON.
> > Currently this struct is ~224 bytes on x86-64.
> >
> > No functional change.
> >
> > Signed-off-by: Edwin Török <edwin.torok@cloud.com>
> > ---
> > xen/arch/x86/cpu/vpmu.c | 1 +
> > xen/include/public/pmu.h | 3 +++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> > index c28192ea26..7be79c2d00 100644
> > --- a/xen/arch/x86/cpu/vpmu.c
> > +++ b/xen/arch/x86/cpu/vpmu.c
> > @@ -401,6 +401,7 @@ static int vpmu_arch_initialise(struct vcpu *v)
> > uint8_t vendor = current_cpu_data.x86_vendor;
> > int ret;
> >
> > + BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
> > BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
> > BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
> > BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
>
> This is fine (even if it ought to be elsewhere, but don't worry about that).
>
> > diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
> > index af8b7babdd..15decc024d 100644
> > --- a/xen/include/public/pmu.h
> > +++ b/xen/include/public/pmu.h
> > @@ -93,6 +93,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
> > * Architecture-independent fields of xen_pmu_data are WO for the hypervisor
> > * and RO for the guest but some fields in xen_pmu_arch can be writable
> > * by both the hypervisor and the guest (see arch-$arch/pmu.h).
> > + *
> > + * PAGE_SIZE bytes of memory are allocated.
> > + * This struct cannot be larger than PAGE_SIZE.
>
> This isn't. Xen's PAGE_SIZE is not necessarily the same as PAGE_SIZE in
> the guest consuming this header.
>
> This highlights one of the problems that Xen's ABI entrenches. Being
> x86-only, it's 4k in practice, but there's no easy solution.
>
> I'd just skip this comment. Anything else is going to get tied up in
> unrelated bigger problems.
Thanks, I'll drop this comment in the next version of the series.
--Edwin
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
2025-07-28 10:22 ` Edwin Torok
@ 2025-07-28 10:25 ` Jan Beulich
0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2025-07-28 10:25 UTC (permalink / raw)
To: Edwin Torok
Cc: xen-devel, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, andriy.sultanov,
boris.ostrovsky, Andrew Cooper
On 28.07.2025 12:22, Edwin Torok wrote:
> On Mon, Jul 28, 2025 at 11:21 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 25/07/2025 4:06 pm, Edwin Török wrote:
>>> Linux already has a similar BUILD_BUG_ON.
>>> Currently this struct is ~224 bytes on x86-64.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
>>> ---
>>> xen/arch/x86/cpu/vpmu.c | 1 +
>>> xen/include/public/pmu.h | 3 +++
>>> 2 files changed, 4 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
>>> index c28192ea26..7be79c2d00 100644
>>> --- a/xen/arch/x86/cpu/vpmu.c
>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>> @@ -401,6 +401,7 @@ static int vpmu_arch_initialise(struct vcpu *v)
>>> uint8_t vendor = current_cpu_data.x86_vendor;
>>> int ret;
>>>
>>> + BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
>>> BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
>>> BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
>>> BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
>>
>> This is fine (even if it ought to be elsewhere, but don't worry about that).
>>
>>> diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
>>> index af8b7babdd..15decc024d 100644
>>> --- a/xen/include/public/pmu.h
>>> +++ b/xen/include/public/pmu.h
>>> @@ -93,6 +93,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
>>> * Architecture-independent fields of xen_pmu_data are WO for the hypervisor
>>> * and RO for the guest but some fields in xen_pmu_arch can be writable
>>> * by both the hypervisor and the guest (see arch-$arch/pmu.h).
>>> + *
>>> + * PAGE_SIZE bytes of memory are allocated.
>>> + * This struct cannot be larger than PAGE_SIZE.
>>
>> This isn't. Xen's PAGE_SIZE is not necessarily the same as PAGE_SIZE in
>> the guest consuming this header.
>>
>> This highlights one of the problems that Xen's ABI entrenches. Being
>> x86-only, it's 4k in practice, but there's no easy solution.
>>
>> I'd just skip this comment. Anything else is going to get tied up in
>> unrelated bigger problems.
>
> Thanks, I'll drop this comment in the next version of the series.
As said, I'm happy to ack the change with the comment adjustment dropped.
That is, I could easily carry out what you say above while committing.
Jan
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v1 02/10] arch-x86/pmu.h: document current memory layout for VPMU
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode Edwin Török
` (12 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, andriy.sultanov, boris.ostrovsky
There are nested structs, unions, padding and flexible array members.
The interpretation of the flexible array members is all done with
pointer arithmetic, it is useful to visualize the actual memory layout.
The ascii-art drawing is compatible with ascii-art-to-unicode (aa2u) fro
m hackage.
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/include/public/arch-x86/pmu.h | 63 +++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
index d0a99268af..8be71a88ee 100644
--- a/xen/include/public/arch-x86/pmu.h
+++ b/xen/include/public/arch-x86/pmu.h
@@ -129,6 +129,69 @@ struct xen_pmu_arch {
typedef struct xen_pmu_arch xen_pmu_arch_t;
DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
+/* Memory layout:
+* .---------------------.
+* | struct xen_pmu_data |
+* +==============+=====================+=======================+ <.
+* | vcpu_id | |
+* +------------------------------------------------------------+ |
+* | pcpu_id | |
+* +------------------------------------------------------------+ |
+* | domain_id | |
+* +------------------------------------------------------------+ |
+* |##pad#######################################################| |
+* +====+=+===+==================+==============================+ |
+* | pmu| | r | regs |##pad#########################| |
+* +----. +---. (xen or guest) |##############################| |
+* | +======================+==============================+ |
+* | | pmu_flags | |
+* | +===+====================+============================+ |
+* | | l | lapic_lvtpc |############################| |
+* | +---. ###################|##pad#######################| |
+* | | ###################|############################| |
+* | +===+=+=======+=====+====+====+=======+========+======+ |
+* | | c | | | amd | | | intel | |#####| |
+* | +---+ | .-----. | .-------. |#####| |
+* | | | counter | fixed_counters |#####| |
+* | | +------------------+----------------------+#####| |
+* | | | ctrls | arch_counters |#####| |
+* | | +=====+========+===+----------------------+#####| |
+* | | | | regs[] | :| global_ctrl |#####| |
+* | | | +--------. :+----------------------+#####| |
+* | | |struct :| global_ovf_ctrl |#####| |
+* | | |xen_pmu_cntr_pair:+----------------------+#####| |
+* | | |[counters] :| global_status |#####| |
+* | | | :+----------------------+#####| |
+* | | | :| fixed_ctrl |#####| |
+* | | | :+----------------------+#####| |
+* | | | :| ds_area |#####| |
+* | | | :+----------------------+#####| |
+* | | | :| pebs_enable |#pad#| |
+* | | | :+----------------------+#####| |
+* | | | v| debugctl |#####| |
+* | | |##################+=======+========+=====+#####| |
+* | | |##################| | regs[] | :[0]|#####| |
+* | | |##################| +--------. : |#####| |
+* | | |##################| uint64_t : |#####| |
+* | | |##################| [fixed_counters] : |#####| |
+* | | |##################| : |#####| |
+* | | |##################| : |#####| |
+* | | |##################| -----------------: |#####| |
+* | | |##################| struct : |#####| |
+* | | |##################| xen_pmu_cntr_pair: |#####| |
+* | | +==================+ [arch_counters] : +=====+ |
+* | | | : | | |
+* | | | v | | |
+* | | +======================+ | |
+* | +=====================================================+ |
+* +==========================+=================================+ |
+* |############################################################| |
+* :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :
+* :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :
+* |############################################################| | PAGE_SIZE
+* +=========+==================================================+ <.
+*/
+
#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
/*
* Local variables:
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 01/10] pmu.h: add a BUILD_BUG_ON to ensure it fits within one page Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 02/10] arch-x86/pmu.h: document current memory layout for VPMU Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-28 10:22 ` Jan Beulich
2025-07-25 15:06 ` [RFC PATCH v1 04/10] vpmu.c: factor out register conversion Edwin Török
` (11 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, andriy.sultanov, boris.ostrovsky
Use `aa2u` (ascii-art-to-unicode from Hackage) to convert to
Unicode box drawing characters.
The full list of supported drawing characters can be seen in the
examples at:
https://github.com/fmthoma/ascii-art-to-unicode/blob/master/src/Text/AsciiArt.hs
For future maintenance: conversion can be done incrementally
(mixing ascii art with already converted Unicode,
and running the conversion tool again), or by hand.
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/include/public/arch-x86/pmu.h | 120 +++++++++++++++---------------
1 file changed, 60 insertions(+), 60 deletions(-)
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
index 8be71a88ee..5bd0aa6f77 100644
--- a/xen/include/public/arch-x86/pmu.h
+++ b/xen/include/public/arch-x86/pmu.h
@@ -130,66 +130,66 @@ typedef struct xen_pmu_arch xen_pmu_arch_t;
DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
/* Memory layout:
-* .---------------------.
-* | struct xen_pmu_data |
-* +==============+=====================+=======================+ <.
-* | vcpu_id | |
-* +------------------------------------------------------------+ |
-* | pcpu_id | |
-* +------------------------------------------------------------+ |
-* | domain_id | |
-* +------------------------------------------------------------+ |
-* |##pad#######################################################| |
-* +====+=+===+==================+==============================+ |
-* | pmu| | r | regs |##pad#########################| |
-* +----. +---. (xen or guest) |##############################| |
-* | +======================+==============================+ |
-* | | pmu_flags | |
-* | +===+====================+============================+ |
-* | | l | lapic_lvtpc |############################| |
-* | +---. ###################|##pad#######################| |
-* | | ###################|############################| |
-* | +===+=+=======+=====+====+====+=======+========+======+ |
-* | | c | | | amd | | | intel | |#####| |
-* | +---+ | .-----. | .-------. |#####| |
-* | | | counter | fixed_counters |#####| |
-* | | +------------------+----------------------+#####| |
-* | | | ctrls | arch_counters |#####| |
-* | | +=====+========+===+----------------------+#####| |
-* | | | | regs[] | :| global_ctrl |#####| |
-* | | | +--------. :+----------------------+#####| |
-* | | |struct :| global_ovf_ctrl |#####| |
-* | | |xen_pmu_cntr_pair:+----------------------+#####| |
-* | | |[counters] :| global_status |#####| |
-* | | | :+----------------------+#####| |
-* | | | :| fixed_ctrl |#####| |
-* | | | :+----------------------+#####| |
-* | | | :| ds_area |#####| |
-* | | | :+----------------------+#####| |
-* | | | :| pebs_enable |#pad#| |
-* | | | :+----------------------+#####| |
-* | | | v| debugctl |#####| |
-* | | |##################+=======+========+=====+#####| |
-* | | |##################| | regs[] | :[0]|#####| |
-* | | |##################| +--------. : |#####| |
-* | | |##################| uint64_t : |#####| |
-* | | |##################| [fixed_counters] : |#####| |
-* | | |##################| : |#####| |
-* | | |##################| : |#####| |
-* | | |##################| -----------------: |#####| |
-* | | |##################| struct : |#####| |
-* | | |##################| xen_pmu_cntr_pair: |#####| |
-* | | +==================+ [arch_counters] : +=====+ |
-* | | | : | | |
-* | | | v | | |
-* | | +======================+ | |
-* | +=====================================================+ |
-* +==========================+=================================+ |
-* |############################################################| |
-* :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :
-* :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :
-* |############################################################| | PAGE_SIZE
-* +=========+==================================================+ <.
+* ╭─────────────────────╮
+* │ struct xen_pmu_data │
+* ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
+* │ vcpu_id │ │
+* ├────────────────────────────────────────────────────────────┤ │
+* │ pcpu_id │ │
+* ├────────────────────────────────────────────────────────────┤ │
+* │ domain_id │ │
+* ├────────────────────────────────────────────────────────────┤ │
+* │██pad███████████████████████████████████████████████████████│ │
+* ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
+* │ pmu│ │ r │ regs │██pad█████████████████████████│ │
+* ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
+* │ ╞══════════════════════╧══════════════════════════════╡ │
+* │ │ pmu_flags │ │
+* │ ╞═══╤════════════════════╤════════════════════════════╡ │
+* │ │ l │ lapic_lvtpc │████████████████████████████│ │
+* │ ├───╯ ███████████████████│██pad███████████████████████│ │
+* │ │ ███████████████████│████████████████████████████│ │
+* │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
+* │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
+* │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
+* │ │ │ counter │ fixed_counters │█████│ │
+* │ │ ├──────────────────┼──────────────────────┤█████│ │
+* │ │ │ ctrls │ arch_counters │█████│ │
+* │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
+* │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
+* │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
+* │ │ │struct ┆│ global_ovf_ctrl │█████│ │
+* │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
+* │ │ │[counters] ┆│ global_status │█████│ │
+* │ │ │ ┆├──────────────────────┤█████│ │
+* │ │ │ ┆│ fixed_ctrl │█████│ │
+* │ │ │ ┆├──────────────────────┤█████│ │
+* │ │ │ ┆│ ds_area │█████│ │
+* │ │ │ ┆├──────────────────────┤█████│ │
+* │ │ │ ┆│ pebs_enable │█pad█│ │
+* │ │ │ ┆├──────────────────────┤█████│ │
+* │ │ │ ▽│ debugctl │█████│ │
+* │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
+* │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
+* │ │ │██████████████████│ └────────╯ ┆ │█████│ │
+* │ │ │██████████████████│ uint64_t ┆ │█████│ │
+* │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
+* │ │ │██████████████████│ ┆ │█████│ │
+* │ │ │██████████████████│ ┆ │█████│ │
+* │ │ │██████████████████│ ─────────────────┆ │█████│ │
+* │ │ │██████████████████│ struct ┆ │█████│ │
+* │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
+* │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
+* │ │ │ ┆ │ │ │
+* │ │ │ ▽ │ │ │
+* │ │ ╘══════════════════════╛ │ │
+* │ ╘═════════════════════════════════════════════════════╡ │
+* ╞════════════════════════════════════════════════════════════╡ │
+* │████████████████████████████████████████████████████████████│ │
+* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
+* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
+* │████████████████████████████████████████████████████████████│ │ PAGE_SIZE
+* ╘════════════════════════════════════════════════════════════╛ ◁╯
*/
#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode
2025-07-25 15:06 ` [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode Edwin Török
@ 2025-07-28 10:22 ` Jan Beulich
2025-07-28 16:07 ` Edwin Torok
0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2025-07-28 10:22 UTC (permalink / raw)
To: Edwin Török, xen-devel
Cc: Andrew Cooper, Roger Pau Monné, andriy.sultanov,
boris.ostrovsky
On 25.07.2025 17:06, Edwin Török wrote:
> Use `aa2u` (ascii-art-to-unicode from Hackage) to convert to
> Unicode box drawing characters.
>
> The full list of supported drawing characters can be seen in the
> examples at:
> https://github.com/fmthoma/ascii-art-to-unicode/blob/master/src/Text/AsciiArt.hs
>
> For future maintenance: conversion can be done incrementally
> (mixing ascii art with already converted Unicode,
> and running the conversion tool again), or by hand.
>
> No functional change.
>
> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
> ---
> xen/include/public/arch-x86/pmu.h | 120 +++++++++++++++---------------
> 1 file changed, 60 insertions(+), 60 deletions(-)
I'm already unconvinced of the earlier patch: The whole construct isn't self-
explanatory, and it lacks a legend. There's also the question of legibility.
The change here has the main problem of making readability dependent upon the
capabilities of the editor / viewer / etc one is using. For example, the '┆'
character as well as the arrow ones I can't get Win10's console subsystem to
display properly.
In addition this pretty extensive use of Unicode chars then raises a concern
that was brought up for the binutils project a little while ago [1]. As I
have indicated in replies there, isolated uses may be appropriate in certain
cases. A larger comment like the one here is different though, as the
slipping in of problematic characters (now or in the future) might go
unnoticed.
Jan
[1] https://inbox.sourceware.org/binutils/72219ed1-147d-4dd3-b503-363d981528a2@arm.com/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode
2025-07-28 10:22 ` Jan Beulich
@ 2025-07-28 16:07 ` Edwin Torok
2025-07-29 6:45 ` Jan Beulich
0 siblings, 1 reply; 30+ messages in thread
From: Edwin Torok @ 2025-07-28 16:07 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Andrew Cooper, Roger Pau Monné, andriy.sultanov,
boris.ostrovsky
On Mon, Jul 28, 2025 at 11:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 25.07.2025 17:06, Edwin Török wrote:
> > Use `aa2u` (ascii-art-to-unicode from Hackage) to convert to
> > Unicode box drawing characters.
> >
> > The full list of supported drawing characters can be seen in the
> > examples at:
> > https://github.com/fmthoma/ascii-art-to-unicode/blob/master/src/Text/AsciiArt.hs
> >
> > For future maintenance: conversion can be done incrementally
> > (mixing ascii art with already converted Unicode,
> > and running the conversion tool again), or by hand.
> >
> > No functional change.
> >
> > Signed-off-by: Edwin Török <edwin.torok@cloud.com>
> > ---
> > xen/include/public/arch-x86/pmu.h | 120 +++++++++++++++---------------
> > 1 file changed, 60 insertions(+), 60 deletions(-)
>
> I'm already unconvinced of the earlier patch: The whole construct isn't self-
> explanatory, and it lacks a legend. There's also the question of legibility.
> The change here has the main problem of making readability dependent upon the
> capabilities of the editor / viewer / etc one is using. For example, the '┆'
> character as well as the arrow ones I can't get Win10's console subsystem to
> display properly.
>
The original ASCII diagram could also be moved to another file that
contains only documentation and is not used during compilation.
There is https://ivanceras.github.io/svgbob-editor/ which can then
create an SVG out of it if needed.
The SVG (or its ASCII source) wouldn't be restricted to 80 chars, and
could contain more details.
Although if it is a separate file it is more likely to go stale when
the .h is updated.
Here is a solution that works with ASCII instead then (the diagram
itself is not very readable in pure ASCII).
I think my main goal was to understand what the flexible array member
would contain, and that could actually be explained in a sort of
pseudo-C.
Something like:
```
struct ... {
uint32_t fixed_counters;
uint32_t arch_counters;
....
union {
uint64_t regs[];
struct {
uint64_t fixed[fixed_counters];
struct xen_pmu_cntr_pair arch[arch_counters];
}
}
}
```
This isn't (yet?) valid C, although it follows the trend the C
standard is evolving to, e.g. you can already refer to array
dimensions in function arguments, where the array dimension is another
function argument, in fact the manpages already started to get updated
to follow this new style, and newer versions of GCC support it, e.g.
memcpy: https://man7.org/linux/man-pages/man3/memcpy.3.html
I don't know whether future C standards would ever add support for
flexible array members where the size depends on another struct field,
but it should be fine as a comment, and perhaps easier to maintain
than a diagram. If it ever becomes valid C it can be promoted from a
comment to actual code.
It'd retain the main benefit: being able to see the memory layout,
without having to read through the source code and all the
sizeof/offset pointer calculation to figure out what is actually
stored in regs[] and how big it could be.
What do you think?
> In addition this pretty extensive use of Unicode chars then raises a concern
> that was brought up for the binutils project a little while ago [1]. As I
> have indicated in replies there, isolated uses may be appropriate in certain
> cases. A larger comment like the one here is different though, as the
> slipping in of problematic characters (now or in the future) might go
> unnoticed.
https://github.com/rurban/libu8ident might help (there is also a
configuration there that'd match what is proposed for C26)
GCC 12+ also has some protection against it
https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12
Safely using Unicode in source code is a rather complex (and perhaps
not entirely solved) topic, so I understand if you'd rather avoid
extensive Unicode
use.
Best regards,
--Edwin
>
> Jan
>
> [1] https://inbox.sourceware.org/binutils/72219ed1-147d-4dd3-b503-363d981528a2@arm.com/
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode
2025-07-28 16:07 ` Edwin Torok
@ 2025-07-29 6:45 ` Jan Beulich
0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2025-07-29 6:45 UTC (permalink / raw)
To: Edwin Torok
Cc: xen-devel, Andrew Cooper, Roger Pau Monné, andriy.sultanov,
boris.ostrovsky
On 28.07.2025 18:07, Edwin Torok wrote:
> On Mon, Jul 28, 2025 at 11:23 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 25.07.2025 17:06, Edwin Török wrote:
>>> Use `aa2u` (ascii-art-to-unicode from Hackage) to convert to
>>> Unicode box drawing characters.
>>>
>>> The full list of supported drawing characters can be seen in the
>>> examples at:
>>> https://github.com/fmthoma/ascii-art-to-unicode/blob/master/src/Text/AsciiArt.hs
>>>
>>> For future maintenance: conversion can be done incrementally
>>> (mixing ascii art with already converted Unicode,
>>> and running the conversion tool again), or by hand.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
>>> ---
>>> xen/include/public/arch-x86/pmu.h | 120 +++++++++++++++---------------
>>> 1 file changed, 60 insertions(+), 60 deletions(-)
>>
>> I'm already unconvinced of the earlier patch: The whole construct isn't self-
>> explanatory, and it lacks a legend. There's also the question of legibility.
>> The change here has the main problem of making readability dependent upon the
>> capabilities of the editor / viewer / etc one is using. For example, the '┆'
>> character as well as the arrow ones I can't get Win10's console subsystem to
>> display properly.
>>
>
> The original ASCII diagram could also be moved to another file that
> contains only documentation and is not used during compilation.
> There is https://ivanceras.github.io/svgbob-editor/ which can then
> create an SVG out of it if needed.
> The SVG (or its ASCII source) wouldn't be restricted to 80 chars, and
> could contain more details.
>
> Although if it is a separate file it is more likely to go stale when
> the .h is updated.
>
> Here is a solution that works with ASCII instead then (the diagram
> itself is not very readable in pure ASCII).
> I think my main goal was to understand what the flexible array member
> would contain, and that could actually be explained in a sort of
> pseudo-C.
> Something like:
>
> ```
> struct ... {
> uint32_t fixed_counters;
> uint32_t arch_counters;
> ....
> union {
> uint64_t regs[];
> struct {
> uint64_t fixed[fixed_counters];
> struct xen_pmu_cntr_pair arch[arch_counters];
> }
> }
> }
> ```
>
> This isn't (yet?) valid C, although it follows the trend the C
> standard is evolving to, e.g. you can already refer to array
> dimensions in function arguments, where the array dimension is another
> function argument, in fact the manpages already started to get updated
> to follow this new style, and newer versions of GCC support it, e.g.
> memcpy: https://man7.org/linux/man-pages/man3/memcpy.3.html
> I don't know whether future C standards would ever add support for
> flexible array members where the size depends on another struct field,
> but it should be fine as a comment, and perhaps easier to maintain
> than a diagram. If it ever becomes valid C it can be promoted from a
> comment to actual code.
Somewhat related (but afaict not really usable here, leaving aside that
this is a public header and hence needs to remain free of extension uses)
is gcc's relatively new counted_by attribute.
> It'd retain the main benefit: being able to see the memory layout,
> without having to read through the source code and all the
> sizeof/offset pointer calculation to figure out what is actually
> stored in regs[] and how big it could be.
>
> What do you think?
Why not.
Jan
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v1 04/10] vpmu.c: factor out register conversion
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (2 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 03/10] arch-x86/pmu.h: convert ascii art drawing to Unicode Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-28 10:25 ` Andrew Cooper
2025-07-25 15:06 ` [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area Edwin Török
` (10 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, andriy.sultanov, boris.ostrovsky
A followup commit will use this to store the guest's regs when domid ==
DOMID_XEN.
To avoid code duplication move the code into a function.
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/arch/x86/cpu/vpmu.c | 49 ++++++++++++++++++++++-------------------
1 file changed, 26 insertions(+), 23 deletions(-)
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 7be79c2d00..713311a1ac 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -160,6 +160,31 @@ static inline struct vcpu *choose_hwdom_vcpu(void)
return hardware_domain->vcpu[idx];
}
+static inline void vpmu_convert_regs(struct xen_pmu_regs *r, uint64_t *flags,
+ struct vcpu *sampled,
+ const struct cpu_user_regs *cur_regs) {
+ r->ip = cur_regs->rip;
+ r->sp = cur_regs->rsp;
+ r->flags = cur_regs->rflags;
+
+ if (!is_hvm_vcpu(sampled)) {
+ r->ss = cur_regs->ss;
+ r->cs = cur_regs->cs;
+ if (!(sampled->arch.flags & TF_kernel_mode))
+ *flags |= PMU_SAMPLE_USER;
+ } else {
+ struct segment_register seg;
+
+ hvm_get_segment_register(sampled, x86_seg_cs, &seg);
+ r->cs = seg.sel;
+ hvm_get_segment_register(sampled, x86_seg_ss, &seg);
+ r->ss = seg.sel;
+ r->cpl = seg.dpl;
+ if (!(sampled->arch.hvm.guest_cr[0] & X86_CR0_PE))
+ *flags |= PMU_SAMPLE_REAL;
+ }
+}
+
void vpmu_do_interrupt(void)
{
struct vcpu *sampled = current, *sampling;
@@ -255,29 +280,7 @@ void vpmu_do_interrupt(void)
else
cur_regs = guest_cpu_user_regs();
- r->ip = cur_regs->rip;
- r->sp = cur_regs->rsp;
- r->flags = cur_regs->rflags;
-
- if ( !is_hvm_vcpu(sampled) )
- {
- r->ss = cur_regs->ss;
- r->cs = cur_regs->cs;
- if ( !(sampled->arch.flags & TF_kernel_mode) )
- *flags |= PMU_SAMPLE_USER;
- }
- else
- {
- struct segment_register seg;
-
- hvm_get_segment_register(sampled, x86_seg_cs, &seg);
- r->cs = seg.sel;
- hvm_get_segment_register(sampled, x86_seg_ss, &seg);
- r->ss = seg.sel;
- r->cpl = seg.dpl;
- if ( !(sampled->arch.hvm.guest_cr[0] & X86_CR0_PE) )
- *flags |= PMU_SAMPLE_REAL;
- }
+ vpmu_convert_regs(r, flags, sampled, cur_regs);
}
vpmu->xenpmu_data->domain_id = domid;
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 04/10] vpmu.c: factor out register conversion
2025-07-25 15:06 ` [RFC PATCH v1 04/10] vpmu.c: factor out register conversion Edwin Török
@ 2025-07-28 10:25 ` Andrew Cooper
2025-07-28 14:17 ` Edwin Torok
0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2025-07-28 10:25 UTC (permalink / raw)
To: Edwin Török, xen-devel
Cc: Jan Beulich, Roger Pau Monné, andriy.sultanov,
boris.ostrovsky
On 25/07/2025 4:06 pm, Edwin Török wrote:
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index 7be79c2d00..713311a1ac 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -160,6 +160,31 @@ static inline struct vcpu *choose_hwdom_vcpu(void)
> return hardware_domain->vcpu[idx];
> }
>
> +static inline void vpmu_convert_regs(struct xen_pmu_regs *r, uint64_t *flags,
> + struct vcpu *sampled,
> + const struct cpu_user_regs *cur_regs) {
> + r->ip = cur_regs->rip;
> + r->sp = cur_regs->rsp;
> + r->flags = cur_regs->rflags;
> +
> + if (!is_hvm_vcpu(sampled)) {
> + r->ss = cur_regs->ss;
> + r->cs = cur_regs->cs;
> + if (!(sampled->arch.flags & TF_kernel_mode))
> + *flags |= PMU_SAMPLE_USER;
> + } else {
> + struct segment_register seg;
> +
> + hvm_get_segment_register(sampled, x86_seg_cs, &seg);
> + r->cs = seg.sel;
> + hvm_get_segment_register(sampled, x86_seg_ss, &seg);
> + r->ss = seg.sel;
> + r->cpl = seg.dpl;
> + if (!(sampled->arch.hvm.guest_cr[0] & X86_CR0_PE))
> + *flags |= PMU_SAMPLE_REAL;
> + }
> +}
> +
This is fine in principle, except that you're changing the style away
from Xen style.
I can fix it on commit, but it's going to collide massively later in the
series.
~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 04/10] vpmu.c: factor out register conversion
2025-07-28 10:25 ` Andrew Cooper
@ 2025-07-28 14:17 ` Edwin Torok
0 siblings, 0 replies; 30+ messages in thread
From: Edwin Torok @ 2025-07-28 14:17 UTC (permalink / raw)
To: Andrew Cooper
Cc: xen-devel, Jan Beulich, Roger Pau Monné, andriy.sultanov,
boris.ostrovsky
On Mon, Jul 28, 2025 at 11:25 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 25/07/2025 4:06 pm, Edwin Török wrote:
> > diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> > index 7be79c2d00..713311a1ac 100644
> > --- a/xen/arch/x86/cpu/vpmu.c
> > +++ b/xen/arch/x86/cpu/vpmu.c
> > @@ -160,6 +160,31 @@ static inline struct vcpu *choose_hwdom_vcpu(void)
> > return hardware_domain->vcpu[idx];
> > }
> >
> > +static inline void vpmu_convert_regs(struct xen_pmu_regs *r, uint64_t *flags,
> > + struct vcpu *sampled,
> > + const struct cpu_user_regs *cur_regs) {
> > + r->ip = cur_regs->rip;
> > + r->sp = cur_regs->rsp;
> > + r->flags = cur_regs->rflags;
> > +
> > + if (!is_hvm_vcpu(sampled)) {
> > + r->ss = cur_regs->ss;
> > + r->cs = cur_regs->cs;
> > + if (!(sampled->arch.flags & TF_kernel_mode))
> > + *flags |= PMU_SAMPLE_USER;
> > + } else {
> > + struct segment_register seg;
> > +
> > + hvm_get_segment_register(sampled, x86_seg_cs, &seg);
> > + r->cs = seg.sel;
> > + hvm_get_segment_register(sampled, x86_seg_ss, &seg);
> > + r->ss = seg.sel;
> > + r->cpl = seg.dpl;
> > + if (!(sampled->arch.hvm.guest_cr[0] & X86_CR0_PE))
> > + *flags |= PMU_SAMPLE_REAL;
> > + }
> > +}
> > +
>
> This is fine in principle, except that you're changing the style away
> from Xen style.
Ah looks like we lack a .clang-format file, so `gq` reformatted the
code with Clang's default settings.
I'll avoid using `gq` for now, and have redone the commit using '<' instead.
>
> I can fix it on commit, but it's going to collide massively later in the
> series.
I've pushed a WiP branch which contains this change, and fixed the
rebase conflicts in the followup patches:
https://gitlab.com/xen-project/people/edwintorok/xen/-/commits/pmustack-next?ref_type=heads
Best regards,
--Edwin
>
> ~Andrew
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (3 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 04/10] vpmu.c: factor out register conversion Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-08-05 6:40 ` Andriy Sultanov
2025-08-28 12:04 ` Jan Beulich
2025-07-25 15:06 ` [RFC PATCH v1 06/10] arch-x86/pmu.h: convert ascii art diagram to Unicode Edwin Török
` (9 subsequent siblings)
14 siblings, 2 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
andriy.sultanov, boris.ostrovsky
The guest always allocates a full page that is mapped for 'struct
xen_pmu_data' (there is no smaller mapping granularity).
The existing struct contains a flexible array member that is used to
store architectural performance counters on Intel (their number is
dynamically determined from CPUID).
AFAICT their number is currently limited to 32 due to the bitmap in the
CPUID only having 32 bits defined for this, although the cpuid has 8
bits reserved to specify the number of these counters, so this may grow.
For backwards compatibility and to leave as much room for future growth
in these counters as possible: the newly introduced hypervisor
stacktrace area will be stored at the end of the page.
Conceptually this would've been:
```
struct xen_pmu_data_v2 {
union {
struct xen_pmu_data v1;
uint8_t pad[PAGE_SIZE - sizeof(struct xen_pmu_hv_stacktrace)];
};
struct xen_pmu_hv_stacktrace pmu_stack;
};
```
But that is not valid C due to the flexible array member in v1.
127 is chosen as the stacktrace limit to match the default stacktrace
limit in Linux.
This is also compatible with old hypervisors, where stacktrace_nr would
be left at 0 (the entire page is zeroed at allocation), and would just
never return a stacktrace (compatible behaviour).
Even on hypervisors that support the stacktrace struct, the stacktrace
may only be available in certain configurations for now (e.g.
CONFIG_FRAME_POINTER on x86-64 only for now).
The guest registers however will be available whenever domain_id ==
DOMID_XEN, to enable guest kernels to profile themselves.
There is code duplication in 'struct xen_pmu_arch_guest',
but xlat.h checker fails if I try to reuse that in the definition of
xen_pmu_arch_t.
There is padding at the end of `xen_pmu_hv_stacktrace` for
future-proofing.
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/arch/x86/cpu/vpmu.c | 2 +-
xen/arch/x86/cpu/vpmu_amd.c | 2 +-
xen/arch/x86/cpu/vpmu_intel.c | 2 +-
xen/include/public/arch-arm.h | 1 +
xen/include/public/arch-ppc.h | 1 +
xen/include/public/arch-riscv.h | 1 +
xen/include/public/arch-x86/pmu.h | 157 ++++++++++++++++++------------
xen/include/public/pmu.h | 36 ++++++-
8 files changed, 137 insertions(+), 65 deletions(-)
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 713311a1ac..286dc2af5f 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -404,7 +404,7 @@ static int vpmu_arch_initialise(struct vcpu *v)
uint8_t vendor = current_cpu_data.x86_vendor;
int ret;
- BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
+ BUILD_BUG_ON(sizeof(struct xen_pmu_data) > MAX_XEN_PMU_DATA_SIZE);
BUILD_BUG_ON(sizeof(struct xen_pmu_intel_ctxt) > XENPMU_CTXT_PAD_SZ);
BUILD_BUG_ON(sizeof(struct xen_pmu_amd_ctxt) > XENPMU_CTXT_PAD_SZ);
BUILD_BUG_ON(sizeof(struct xen_pmu_regs) > XENPMU_REGS_PAD_SZ);
diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c
index a6117dfebf..024d0c1eb7 100644
--- a/xen/arch/x86/cpu/vpmu_amd.c
+++ b/xen/arch/x86/cpu/vpmu_amd.c
@@ -543,7 +543,7 @@ static const struct arch_vpmu_ops *__init common_init(void)
}
if ( sizeof(struct xen_pmu_data) +
- 2 * sizeof(uint64_t) * num_counters > PAGE_SIZE )
+ 2 * sizeof(uint64_t) * num_counters > MAX_XEN_PMU_DATA_SIZE)
{
printk(XENLOG_WARNING
"VPMU: Register bank does not fit into VPMU shared page\n");
diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
index 7ce98ee42e..9c8b5c1907 100644
--- a/xen/arch/x86/cpu/vpmu_intel.c
+++ b/xen/arch/x86/cpu/vpmu_intel.c
@@ -967,7 +967,7 @@ const struct arch_vpmu_ops *__init core2_vpmu_init(void)
pmc_quirk = current_cpu_data.x86 == 6;
if ( sizeof(struct xen_pmu_data) + sizeof(uint64_t) * fixed_pmc_cnt +
- sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > PAGE_SIZE )
+ sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > MAX_XEN_PMU_DATA_SIZE)
{
printk(XENLOG_WARNING
"VPMU: Register bank does not fit into VPMU share page\n");
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index e2412a1747..3d61337fda 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -540,6 +540,7 @@ typedef uint64_t xen_callback_t;
#ifndef __ASSEMBLY__
/* Stub definition of PMU structure */
typedef struct xen_pmu_arch { uint8_t dummy; } xen_pmu_arch_t;
+struct xen_pmu_arch_guest { uint8_t dummy; };
#endif
#endif /* __XEN_PUBLIC_ARCH_ARM_H__ */
diff --git a/xen/include/public/arch-ppc.h b/xen/include/public/arch-ppc.h
index 4ca453a284..5e019ac8d5 100644
--- a/xen/include/public/arch-ppc.h
+++ b/xen/include/public/arch-ppc.h
@@ -101,6 +101,7 @@ struct xen_arch_domainconfig {
};
typedef struct xen_pmu_arch { uint8_t dummy; } xen_pmu_arch_t;
+struct xen_pmu_arch_guest { uint8_t dummy; };
#endif /* !__ASSEMBLY__ */
diff --git a/xen/include/public/arch-riscv.h b/xen/include/public/arch-riscv.h
index 168263b920..30f9ec609a 100644
--- a/xen/include/public/arch-riscv.h
+++ b/xen/include/public/arch-riscv.h
@@ -78,6 +78,7 @@ typedef struct arch_shared_info arch_shared_info_t;
/* Stub definition of PMU structure */
typedef struct xen_pmu_arch { uint8_t dummy; } xen_pmu_arch_t;
+struct xen_pmu_arch_guest { uint8_t dummy; };
#endif
#endif /* __XEN_PUBLIC_ARCH_RISCV_H__ */
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
index 5bd0aa6f77..bdc8218cbe 100644
--- a/xen/include/public/arch-x86/pmu.h
+++ b/xen/include/public/arch-x86/pmu.h
@@ -74,6 +74,23 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
#define PMU_SAMPLE_REAL (1<<2) /* Sample is from realmode */
#define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
+/*
+ * Architecture-specific information describing the state of the guest at
+ * the time of a PMU interrupt.
+ */
+struct xen_pmu_arch_guest {
+ union {
+ /*
+ * Processor's registers at the time of interrupt.
+ * WO for hypervisor, RO for guests.
+ */
+ xen_pmu_regs_t regs;
+ /* Padding for adding new registers to xen_pmu_regs in the future */
+#define XENPMU_REGS_PAD_SZ 64
+ uint8_t pad[XENPMU_REGS_PAD_SZ];
+ } r;
+};
+
/*
* Architecture-specific information describing state of the processor at
* the time of PMU interrupt.
@@ -129,67 +146,86 @@ struct xen_pmu_arch {
typedef struct xen_pmu_arch xen_pmu_arch_t;
DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
+
/* Memory layout:
-* ╭─────────────────────╮
-* │ struct xen_pmu_data │
-* ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
-* │ vcpu_id │ │
-* ├────────────────────────────────────────────────────────────┤ │
-* │ pcpu_id │ │
-* ├────────────────────────────────────────────────────────────┤ │
-* │ domain_id │ │
-* ├────────────────────────────────────────────────────────────┤ │
-* │██pad███████████████████████████████████████████████████████│ │
-* ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
-* │ pmu│ │ r │ regs │██pad█████████████████████████│ │
-* ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
-* │ ╞══════════════════════╧══════════════════════════════╡ │
-* │ │ pmu_flags │ │
-* │ ╞═══╤════════════════════╤════════════════════════════╡ │
-* │ │ l │ lapic_lvtpc │████████████████████████████│ │
-* │ ├───╯ ███████████████████│██pad███████████████████████│ │
-* │ │ ███████████████████│████████████████████████████│ │
-* │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
-* │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
-* │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
-* │ │ │ counter │ fixed_counters │█████│ │
-* │ │ ├──────────────────┼──────────────────────┤█████│ │
-* │ │ │ ctrls │ arch_counters │█████│ │
-* │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
-* │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
-* │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
-* │ │ │struct ┆│ global_ovf_ctrl │█████│ │
-* │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
-* │ │ │[counters] ┆│ global_status │█████│ │
-* │ │ │ ┆├──────────────────────┤█████│ │
-* │ │ │ ┆│ fixed_ctrl │█████│ │
-* │ │ │ ┆├──────────────────────┤█████│ │
-* │ │ │ ┆│ ds_area │█████│ │
-* │ │ │ ┆├──────────────────────┤█████│ │
-* │ │ │ ┆│ pebs_enable │█pad█│ │
-* │ │ │ ┆├──────────────────────┤█████│ │
-* │ │ │ ▽│ debugctl │█████│ │
-* │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
-* │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
-* │ │ │██████████████████│ └────────╯ ┆ │█████│ │
-* │ │ │██████████████████│ uint64_t ┆ │█████│ │
-* │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
-* │ │ │██████████████████│ ┆ │█████│ │
-* │ │ │██████████████████│ ┆ │█████│ │
-* │ │ │██████████████████│ ─────────────────┆ │█████│ │
-* │ │ │██████████████████│ struct ┆ │█████│ │
-* │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
-* │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
-* │ │ │ ┆ │ │ │
-* │ │ │ ▽ │ │ │
-* │ │ ╘══════════════════════╛ │ │
-* │ ╘═════════════════════════════════════════════════════╡ │
-* ╞════════════════════════════════════════════════════════════╡ │
-* │████████████████████████████████████████████████████████████│ │
-* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
-* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
-* │████████████████████████████████████████████████████████████│ │ PAGE_SIZE
-* ╘════════════════════════════════════════════════════════════╛ ◁╯
+ * ╭─────────────────────╮
+ * │ struct xen_pmu_data │
+ * ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
+ * │ vcpu_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ pcpu_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ domain_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │██pad███████████████████████████████████████████████████████│ │
+ * ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
+ * │ pmu│ │ r │ regs │██pad█████████████████████████│ │
+ * ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
+ * │ ╞══════════════════════╧══════════════════════════════╡ │
+ * │ │ pmu_flags │ │
+ * │ ╞═══╤════════════════════╤════════════════════════════╡ │
+ * │ │ l │ lapic_lvtpc │████████████████████████████│ │
+ * │ ├───╯ ███████████████████│██pad███████████████████████│ │
+ * │ │ ███████████████████│████████████████████████████│ │
+ * │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
+ * │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
+ * │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
+ * │ │ │ counter │ fixed_counters │█████│ │
+ * │ │ ├──────────────────┼──────────────────────┤█████│ │
+ * │ │ │ ctrls │ arch_counters │█████│ │
+ * │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
+ * │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
+ * │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
+ * │ │ │struct ┆│ global_ovf_ctrl │█████│ │
+ * │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
+ * │ │ │[counters] ┆│ global_status │█████│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ┆│ fixed_ctrl │█████│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ┆│ ds_area │█████│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ┆│ pebs_enable │█pad█│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ▽│ debugctl │█████│ │
+ * │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
+ * │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
+ * │ │ │██████████████████│ └────────╯ ┆ │█████│ │
+ * │ │ │██████████████████│ uint64_t ┆ │█████│ │
+ * │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
+ * │ │ │██████████████████│ ┆ │█████│ │
+ * │ │ │██████████████████│ ┆ │█████│ │
+ * │ │ │██████████████████│ ─────────────────┆ │█████│ │
+ * │ │ │██████████████████│ struct ┆ │█████│ │
+ * │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
+ * │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
+ * │ │ │ ┆ │ │ │
+ * │ │ │ ▽ │ │ │
+ * │ │ ╘══════════════════════╛ │ │
+ * │ ╘═════════════════════════════════════════════════════╡ │
+ * ╞════════════════════════════════════════════════════════════╡ │
+ * │████████████████████████████████████████████████████████████│ │
+ * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
+ * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
+ * │████████████████████████████████████████████████████████████│ │
+ * |############################################################| |
+ * |##########.------------------------------.##################| |
+ * |##########| struct xen_pmu_hv_stacktrace |##################| |
+ * +==========+==============================+==================+ |
+ * | ^ [stacktrace_nr-1] | |
+ * | : | |
+ * | stacktrace[stacktrace_nr] : [0] | |
+ * +------------------------------------------------------------+ |
+ * | stacktrace_nr | |
+ * +------------------------------------------------------------+ |
+ * | guest_domain_id | |
+ * +------------------------------------------------------------+ |
+ * |##pad#######################################################| |
+ * +=======+=+===+==================+===========================+ |
+ * | guest | | r | regs |##pad######################| |
+ * +-------. +---. (xen or guest) |###########################| |
+ * | +======================+===========================+ |
+ * | |##pad2############################################| | PAGE_SIZE
+ * +=========+==================================================+ <.
*/
#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
@@ -202,4 +238,3 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
* indent-tabs-mode: nil
* End:
*/
-
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 15decc024d..1879914ea6 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -20,7 +20,7 @@
#endif
#define XENPMU_VER_MAJ 0
-#define XENPMU_VER_MIN 1
+#define XENPMU_VER_MIN 2
/*
* ` enum neg_errnoval
@@ -120,6 +120,40 @@ struct xen_pmu_data {
xen_pmu_arch_t pmu;
};
+/* stacktrace entry populated from the end,
+ * so stacktrace_nr == 1, means that stacktrace[PMU_MAX_STACKTRCE-1] is valid.
+ * This is done, so that PMU_MAX_STACKTRACE can be changed in the future, without breaking the ABI.
+ * The struct itself (and thus the stacktrace_nr field) will always be placed at the end of a page.
+ *
+ * See arch-x86/pmu.h for an example memory layout on x86.
+ *
+ * */
+#define PMU_MAX_STACKTRACE 127
+
+/* WO for hypervisor, RO for guest */
+struct xen_pmu_hv_stacktrace {
+ uint64_t stacktrace[PMU_MAX_STACKTRACE];
+ uint64_t stacktrace_nr;
+
+ /* Like xen_pmu_data.domain_id, but instead of DOMID_XEN always contains the
+ * domain that was interrupted (DOMID_SELF if it matches the sampling
+ * domain).
+ */
+ domid_t guest_domain_id;
+ uint8_t pad[6];
+
+ /* When xen_pmu_data.domain_id == DOMID_XEN, this will contain
+ * the registers of the guest that was interrupted.
+ * This is useful for Dom0 kernel stacktraces, even if the interrupt
+ * arrives while in Xen.
+ * */
+ struct xen_pmu_arch_guest guest;
+#define XEN_PMU_STACKTRACE_PAD 48
+ uint8_t pad2[XEN_PMU_STACKTRACE_PAD];
+};
+
+#define MAX_XEN_PMU_DATA_SIZE (PAGE_SIZE - sizeof(struct xen_pmu_hv_stacktrace))
+
#endif /* __XEN_PUBLIC_PMU_H__ */
/*
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area
2025-07-25 15:06 ` [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area Edwin Török
@ 2025-08-05 6:40 ` Andriy Sultanov
2025-08-28 12:04 ` Jan Beulich
1 sibling, 0 replies; 30+ messages in thread
From: Andriy Sultanov @ 2025-08-05 6:40 UTC (permalink / raw)
To: Edwin Török, xen-devel
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
Volodymyr Babchuk, boris.ostrovsky
On 7/25/25 5:07 PM, Edwin Török wrote:
> diff --git a/xen/arch/x86/cpu/vpmu_amd.c b/xen/arch/x86/cpu/vpmu_amd.c
> index a6117dfebf..024d0c1eb7 100644
> --- a/xen/arch/x86/cpu/vpmu_amd.c
> +++ b/xen/arch/x86/cpu/vpmu_amd.c
> @@ -543,7 +543,7 @@ static const struct arch_vpmu_ops *__init common_init(void)
> }
>
> if ( sizeof(struct xen_pmu_data) +
> - 2 * sizeof(uint64_t) * num_counters > PAGE_SIZE )
> + 2 * sizeof(uint64_t) * num_counters > MAX_XEN_PMU_DATA_SIZE)
> {
> printk(XENLOG_WARNING
> "VPMU: Register bank does not fit into VPMU shared page\n");
You've lost a space before the end of the parenthesis here
> diff --git a/xen/arch/x86/cpu/vpmu_intel.c b/xen/arch/x86/cpu/vpmu_intel.c
> index 7ce98ee42e..9c8b5c1907 100644
> --- a/xen/arch/x86/cpu/vpmu_intel.c
> +++ b/xen/arch/x86/cpu/vpmu_intel.c
> @@ -967,7 +967,7 @@ const struct arch_vpmu_ops *__init core2_vpmu_init(void)
> pmc_quirk = current_cpu_data.x86 == 6;
>
> if ( sizeof(struct xen_pmu_data) + sizeof(uint64_t) * fixed_pmc_cnt +
> - sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > PAGE_SIZE )
> + sizeof(struct xen_pmu_cntr_pair) * arch_pmc_cnt > MAX_XEN_PMU_DATA_SIZE)
> {
> printk(XENLOG_WARNING
> "VPMU: Register bank does not fit into VPMU share page\n");
and here
> diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
> index 5bd0aa6f77..bdc8218cbe 100644
> --- a/xen/include/public/arch-x86/pmu.h
> +++ b/xen/include/public/arch-x86/pmu.h
> @@ -129,67 +146,86 @@ struct xen_pmu_arch {
> typedef struct xen_pmu_arch xen_pmu_arch_t;
> DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
>
> +
> /* Memory layout:
> -* ╭─────────────────────╮
> -* │ struct xen_pmu_data │
> -* ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
> -* │ vcpu_id │ │
> -* ├────────────────────────────────────────────────────────────┤ │
> -* │ pcpu_id │ │
> -* ├────────────────────────────────────────────────────────────┤ │
> -* │ domain_id │ │
> -* ├────────────────────────────────────────────────────────────┤ │
> -* │██pad███████████████████████████████████████████████████████│ │
> -* ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
> -* │ pmu│ │ r │ regs │██pad█████████████████████████│ │
> -* ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
> -* │ ╞══════════════════════╧══════════════════════════════╡ │
> -* │ │ pmu_flags │ │
> -* │ ╞═══╤════════════════════╤════════════════════════════╡ │
> -* │ │ l │ lapic_lvtpc │████████████████████████████│ │
> -* │ ├───╯ ███████████████████│██pad███████████████████████│ │
> -* │ │ ███████████████████│████████████████████████████│ │
> -* │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
> -* │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
> -* │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
> -* │ │ │ counter │ fixed_counters │█████│ │
> -* │ │ ├──────────────────┼──────────────────────┤█████│ │
> -* │ │ │ ctrls │ arch_counters │█████│ │
> -* │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
> -* │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
> -* │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
> -* │ │ │struct ┆│ global_ovf_ctrl │█████│ │
> -* │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
> -* │ │ │[counters] ┆│ global_status │█████│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ┆│ fixed_ctrl │█████│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ┆│ ds_area │█████│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ┆│ pebs_enable │█pad█│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ▽│ debugctl │█████│ │
it looks like there is a stray space character here breaking up
the vertical lines
> -* │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
> -* │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
> -* │ │ │██████████████████│ └────────╯ ┆ │█████│ │
> -* │ │ │██████████████████│ uint64_t ┆ │█████│ │
> -* │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
> -* │ │ │██████████████████│ ┆ │█████│ │
> -* │ │ │██████████████████│ ┆ │█████│ │
> -* │ │ │██████████████████│ ─────────────────┆ │█████│ │
> -* │ │ │██████████████████│ struct ┆ │█████│ │
> -* │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
> -* │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
> -* │ │ │ ┆ │ │ │
> -* │ │ │ ▽ │ │ │
same here
> -* │ │ ╘══════════════════════╛ │ │
> -* │ ╘═════════════════════════════════════════════════════╡ │
> -* ╞════════════════════════════════════════════════════════════╡ │
> -* │████████████████████████████████████████████████████████████│ │
> -* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> -* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> -* │████████████████████████████████████████████████████████████│ │ PAGE_SIZE
> -* ╘════════════════════════════════════════════════════════════╛ ◁╯
> + * ╭─────────────────────╮
> + * │ struct xen_pmu_data │
> + * ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
> + * │ vcpu_id │ │
> + * ├────────────────────────────────────────────────────────────┤ │
> + * │ pcpu_id │ │
> + * ├────────────────────────────────────────────────────────────┤ │
> + * │ domain_id │ │
> + * ├────────────────────────────────────────────────────────────┤ │
> + * │██pad███████████████████████████████████████████████████████│ │
> + * ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
> + * │ pmu│ │ r │ regs │██pad█████████████████████████│ │
> + * ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
> + * │ ╞══════════════════════╧══════════════════════════════╡ │
> + * │ │ pmu_flags │ │
> + * │ ╞═══╤════════════════════╤════════════════════════════╡ │
> + * │ │ l │ lapic_lvtpc │████████████████████████████│ │
> + * │ ├───╯ ███████████████████│██pad███████████████████████│ │
> + * │ │ ███████████████████│████████████████████████████│ │
> + * │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
> + * │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
> + * │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
> + * │ │ │ counter │ fixed_counters │█████│ │
> + * │ │ ├──────────────────┼──────────────────────┤█████│ │
> + * │ │ │ ctrls │ arch_counters │█████│ │
> + * │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
> + * │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
> + * │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
> + * │ │ │struct ┆│ global_ovf_ctrl │█████│ │
> + * │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
> + * │ │ │[counters] ┆│ global_status │█████│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ┆│ fixed_ctrl │█████│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ┆│ ds_area │█████│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ┆│ pebs_enable │█pad█│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ▽│ debugctl │█████│ │
and here
> + * │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
> + * │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
> + * │ │ │██████████████████│ └────────╯ ┆ │█████│ │
> + * │ │ │██████████████████│ uint64_t ┆ │█████│ │
> + * │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
> + * │ │ │██████████████████│ ┆ │█████│ │
> + * │ │ │██████████████████│ ┆ │█████│ │
> + * │ │ │██████████████████│ ─────────────────┆ │█████│ │
> + * │ │ │██████████████████│ struct ┆ │█████│ │
> + * │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
> + * │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
> + * │ │ │ ┆ │ │ │
> + * │ │ │ ▽ │ │ │
and here
> + * │ │ ╘══════════════════════╛ │ │
> + * │ ╘═════════════════════════════════════════════════════╡ │
> + * ╞════════════════════════════════════════════════════════════╡ │
> + * │████████████████████████████████████████████████████████████│ │
> + * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> + * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> + * │████████████████████████████████████████████████████████████│ │
> + * |############################################################| |
> + * |##########.------------------------------.##################| |
> + * |##########| struct xen_pmu_hv_stacktrace |##################| |
> + * +==========+==============================+==================+ |
> + * | ^ [stacktrace_nr-1] | |
> + * | : | |
> + * | stacktrace[stacktrace_nr] : [0] | |
> + * +------------------------------------------------------------+ |
> + * | stacktrace_nr | |
> + * +------------------------------------------------------------+ |
> + * | guest_domain_id | |
> + * +------------------------------------------------------------+ |
> + * |##pad#######################################################| |
> + * +=======+=+===+==================+===========================+ |
> + * | guest | | r | regs |##pad######################| |
> + * +-------. +---. (xen or guest) |###########################| |
> + * | +======================+===========================+ |
> + * | |##pad2############################################| | PAGE_SIZE
> + * +=========+==================================================+ <.
> */
>
> #endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
Andriy Sultanov | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area
2025-07-25 15:06 ` [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area Edwin Török
2025-08-05 6:40 ` Andriy Sultanov
@ 2025-08-28 12:04 ` Jan Beulich
1 sibling, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2025-08-28 12:04 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Bertrand Marquis,
Volodymyr Babchuk, andriy.sultanov, boris.ostrovsky, xen-devel
On 25.07.2025 17:06, Edwin Török wrote:
> The guest always allocates a full page that is mapped for 'struct
> xen_pmu_data' (there is no smaller mapping granularity).
>
> The existing struct contains a flexible array member that is used to
> store architectural performance counters on Intel (their number is
> dynamically determined from CPUID).
>
> AFAICT their number is currently limited to 32 due to the bitmap in the
> CPUID only having 32 bits defined for this, although the cpuid has 8
> bits reserved to specify the number of these counters, so this may grow.
>
> For backwards compatibility and to leave as much room for future growth
> in these counters as possible: the newly introduced hypervisor
> stacktrace area will be stored at the end of the page.
>
> Conceptually this would've been:
> ```
> struct xen_pmu_data_v2 {
> union {
> struct xen_pmu_data v1;
> uint8_t pad[PAGE_SIZE - sizeof(struct xen_pmu_hv_stacktrace)];
> };
> struct xen_pmu_hv_stacktrace pmu_stack;
> };
> ```
>
> But that is not valid C due to the flexible array member in v1.
And I fear it's not a good interface anyway. What's wrong with the caller
dedicating another page to holding the stack trace related data?
> There is code duplication in 'struct xen_pmu_arch_guest',
> but xlat.h checker fails if I try to reuse that in the definition of
> xen_pmu_arch_t.
What exactly is this referring to?
> --- a/xen/include/public/arch-x86/pmu.h
> +++ b/xen/include/public/arch-x86/pmu.h
> @@ -74,6 +74,23 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
> #define PMU_SAMPLE_REAL (1<<2) /* Sample is from realmode */
> #define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
>
> +/*
> + * Architecture-specific information describing the state of the guest at
> + * the time of a PMU interrupt.
> + */
> +struct xen_pmu_arch_guest {
> + union {
> + /*
> + * Processor's registers at the time of interrupt.
> + * WO for hypervisor, RO for guests.
> + */
> + xen_pmu_regs_t regs;
How's this related to stack traces?
> @@ -129,67 +146,86 @@ struct xen_pmu_arch {
> typedef struct xen_pmu_arch xen_pmu_arch_t;
> DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
>
> +
Stray double blank lines.
> /* Memory layout:
> -* ╭─────────────────────╮
> -* │ struct xen_pmu_data │
> -* ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
> -* │ vcpu_id │ │
> -* ├────────────────────────────────────────────────────────────┤ │
> -* │ pcpu_id │ │
> -* ├────────────────────────────────────────────────────────────┤ │
> -* │ domain_id │ │
> -* ├────────────────────────────────────────────────────────────┤ │
> -* │██pad███████████████████████████████████████████████████████│ │
> -* ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
> -* │ pmu│ │ r │ regs │██pad█████████████████████████│ │
> -* ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
> -* │ ╞══════════════════════╧══════════════════════════════╡ │
> -* │ │ pmu_flags │ │
> -* │ ╞═══╤════════════════════╤════════════════════════════╡ │
> -* │ │ l │ lapic_lvtpc │████████████████████████████│ │
> -* │ ├───╯ ███████████████████│██pad███████████████████████│ │
> -* │ │ ███████████████████│████████████████████████████│ │
> -* │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
> -* │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
> -* │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
> -* │ │ │ counter │ fixed_counters │█████│ │
> -* │ │ ├──────────────────┼──────────────────────┤█████│ │
> -* │ │ │ ctrls │ arch_counters │█████│ │
> -* │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
> -* │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
> -* │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
> -* │ │ │struct ┆│ global_ovf_ctrl │█████│ │
> -* │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
> -* │ │ │[counters] ┆│ global_status │█████│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ┆│ fixed_ctrl │█████│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ┆│ ds_area │█████│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ┆│ pebs_enable │█pad█│ │
> -* │ │ │ ┆├──────────────────────┤█████│ │
> -* │ │ │ ▽│ debugctl │█████│ │
> -* │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
> -* │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
> -* │ │ │██████████████████│ └────────╯ ┆ │█████│ │
> -* │ │ │██████████████████│ uint64_t ┆ │█████│ │
> -* │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
> -* │ │ │██████████████████│ ┆ │█████│ │
> -* │ │ │██████████████████│ ┆ │█████│ │
> -* │ │ │██████████████████│ ─────────────────┆ │█████│ │
> -* │ │ │██████████████████│ struct ┆ │█████│ │
> -* │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
> -* │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
> -* │ │ │ ┆ │ │ │
> -* │ │ │ ▽ │ │ │
> -* │ │ ╘══════════════════════╛ │ │
> -* │ ╘═════════════════════════════════════════════════════╡ │
> -* ╞════════════════════════════════════════════════════════════╡ │
> -* │████████████████████████████████████████████████████████████│ │
> -* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> -* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> -* │████████████████████████████████████████████████████████████│ │ PAGE_SIZE
> -* ╘════════════════════════════════════════════════════════════╛ ◁╯
> + * ╭─────────────────────╮
> + * │ struct xen_pmu_data │
> + * ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
> + * │ vcpu_id │ │
> + * ├────────────────────────────────────────────────────────────┤ │
> + * │ pcpu_id │ │
> + * ├────────────────────────────────────────────────────────────┤ │
> + * │ domain_id │ │
> + * ├────────────────────────────────────────────────────────────┤ │
> + * │██pad███████████████████████████████████████████████████████│ │
> + * ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
> + * │ pmu│ │ r │ regs │██pad█████████████████████████│ │
> + * ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
> + * │ ╞══════════════════════╧══════════════════════════════╡ │
> + * │ │ pmu_flags │ │
> + * │ ╞═══╤════════════════════╤════════════════════════════╡ │
> + * │ │ l │ lapic_lvtpc │████████████████████████████│ │
> + * │ ├───╯ ███████████████████│██pad███████████████████████│ │
> + * │ │ ███████████████████│████████████████████████████│ │
> + * │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
> + * │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
> + * │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
> + * │ │ │ counter │ fixed_counters │█████│ │
> + * │ │ ├──────────────────┼──────────────────────┤█████│ │
> + * │ │ │ ctrls │ arch_counters │█████│ │
> + * │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
> + * │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
> + * │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
> + * │ │ │struct ┆│ global_ovf_ctrl │█████│ │
> + * │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
> + * │ │ │[counters] ┆│ global_status │█████│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ┆│ fixed_ctrl │█████│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ┆│ ds_area │█████│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ┆│ pebs_enable │█pad█│ │
> + * │ │ │ ┆├──────────────────────┤█████│ │
> + * │ │ │ ▽│ debugctl │█████│ │
> + * │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
> + * │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
> + * │ │ │██████████████████│ └────────╯ ┆ │█████│ │
> + * │ │ │██████████████████│ uint64_t ┆ │█████│ │
> + * │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
> + * │ │ │██████████████████│ ┆ │█████│ │
> + * │ │ │██████████████████│ ┆ │█████│ │
> + * │ │ │██████████████████│ ─────────────────┆ │█████│ │
> + * │ │ │██████████████████│ struct ┆ │█████│ │
> + * │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
> + * │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
> + * │ │ │ ┆ │ │ │
> + * │ │ │ ▽ │ │ │
> + * │ │ ╘══════════════════════╛ │ │
> + * │ ╘═════════════════════════════════════════════════════╡ │
> + * ╞════════════════════════════════════════════════════════════╡ │
> + * │████████████████████████████████████████████████████████████│ │
> + * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> + * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
> + * │████████████████████████████████████████████████████████████│ │
> + * |############################################################| |
> + * |##########.------------------------------.##################| |
> + * |##########| struct xen_pmu_hv_stacktrace |##################| |
> + * +==========+==============================+==================+ |
> + * | ^ [stacktrace_nr-1] | |
> + * | : | |
> + * | stacktrace[stacktrace_nr] : [0] | |
> + * +------------------------------------------------------------+ |
> + * | stacktrace_nr | |
> + * +------------------------------------------------------------+ |
> + * | guest_domain_id | |
> + * +------------------------------------------------------------+ |
> + * |##pad#######################################################| |
> + * +=======+=+===+==================+===========================+ |
> + * | guest | | r | regs |##pad######################| |
> + * +-------. +---. (xen or guest) |###########################| |
> + * | +======================+===========================+ |
> + * | |##pad2############################################| | PAGE_SIZE
> + * +=========+==================================================+ <.
> */
>
> #endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
> @@ -202,4 +238,3 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
> * indent-tabs-mode: nil
> * End:
> */
> -
Stray change?
> @@ -120,6 +120,40 @@ struct xen_pmu_data {
> xen_pmu_arch_t pmu;
> };
>
> +/* stacktrace entry populated from the end,
> + * so stacktrace_nr == 1, means that stacktrace[PMU_MAX_STACKTRCE-1] is valid.
> + * This is done, so that PMU_MAX_STACKTRACE can be changed in the future, without breaking the ABI.
> + * The struct itself (and thus the stacktrace_nr field) will always be placed at the end of a page.
> + *
> + * See arch-x86/pmu.h for an example memory layout on x86.
> + *
> + * */
> +#define PMU_MAX_STACKTRACE 127
> +
> +/* WO for hypervisor, RO for guest */
> +struct xen_pmu_hv_stacktrace {
> + uint64_t stacktrace[PMU_MAX_STACKTRACE];
> + uint64_t stacktrace_nr;
> +
> + /* Like xen_pmu_data.domain_id, but instead of DOMID_XEN always contains the
> + * domain that was interrupted (DOMID_SELF if it matches the sampling
> + * domain).
> + */
> + domid_t guest_domain_id;
How's this related to stack traces? And how would it help associating back
what vCPU the data belongs to?
> + uint8_t pad[6];
> +
> + /* When xen_pmu_data.domain_id == DOMID_XEN, this will contain
> + * the registers of the guest that was interrupted.
> + * This is useful for Dom0 kernel stacktraces, even if the interrupt
> + * arrives while in Xen.
> + * */
Nit: Indentation.
Jan
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v1 06/10] arch-x86/pmu.h: convert ascii art diagram to Unicode
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (4 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 05/10] pmu.h: introduce a stacktrace area Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-31 15:35 ` Jan Beulich
2025-07-25 15:06 ` [RFC PATCH v1 07/10] arch-x86/vpmu.c: store guest registers when domain_id == DOMID_XEN Edwin Török
` (8 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall,
Stefano Stabellini, andriy.sultanov, boris.ostrovsky
Using `aa2u` tool.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/include/public/arch-x86/pmu.h | 47 ++++++++++++++++---------------
xen/include/public/pmu.h | 2 +-
2 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/xen/include/public/arch-x86/pmu.h b/xen/include/public/arch-x86/pmu.h
index bdc8218cbe..92ae4dbb1d 100644
--- a/xen/include/public/arch-x86/pmu.h
+++ b/xen/include/public/arch-x86/pmu.h
@@ -75,8 +75,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
#define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
/*
- * Architecture-specific information describing the state of the guest at
- * the time of a PMU interrupt.
+ * Architecture-specific information describing state of the guest at
+ * the time of PMU interrupt.
+ * Even if the interrupt arrived while inside Xen, this will always contain
+ * the guest's state.
*/
struct xen_pmu_arch_guest {
union {
@@ -146,11 +148,10 @@ struct xen_pmu_arch {
typedef struct xen_pmu_arch xen_pmu_arch_t;
DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
-
/* Memory layout:
* ╭─────────────────────╮
* │ struct xen_pmu_data │
- * ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁╮
+ * ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁│
* │ vcpu_id │ │
* ├────────────────────────────────────────────────────────────┤ │
* │ pcpu_id │ │
@@ -207,25 +208,25 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
* ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
* │████████████████████████████████████████████████████████████│ │
- * |############################################################| |
- * |##########.------------------------------.##################| |
- * |##########| struct xen_pmu_hv_stacktrace |##################| |
- * +==========+==============================+==================+ |
- * | ^ [stacktrace_nr-1] | |
- * | : | |
- * | stacktrace[stacktrace_nr] : [0] | |
- * +------------------------------------------------------------+ |
- * | stacktrace_nr | |
- * +------------------------------------------------------------+ |
- * | guest_domain_id | |
- * +------------------------------------------------------------+ |
- * |##pad#######################################################| |
- * +=======+=+===+==================+===========================+ |
- * | guest | | r | regs |##pad######################| |
- * +-------. +---. (xen or guest) |###########################| |
- * | +======================+===========================+ |
- * | |##pad2############################################| | PAGE_SIZE
- * +=========+==================================================+ <.
+ * │████████████████████████████████████████████████████████████│ │
+ * │██████████╭──────────────────────────────╮██████████████████│ │
+ * │██████████│ struct xen_pmu_hv_stacktrace │██████████████████│ │
+ * ╞══════════╧══════════════════════════════╧══════════════════╡ │
+ * │ △ [stacktrace_nr-1] │ │
+ * │ ┆ │ │
+ * │ stacktrace[stacktrace_nr] ┆ [0] │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ stacktrace_nr │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ guest_domain_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │██pad███████████████████████████████████████████████████████│ │
+ * ╞═══════╤═╤═══╤══════════════════╤═══════════════════════════╡ │
+ * │ guest │ │ r │ regs │██pad██████████████████████│ │
+ * ├───────╯ ├───╯ (xen or guest) │███████████████████████████│ │
+ * │ ╞══════════════════════╧═══════════════════════════╡ │
+ * │ │██pad2████████████████████████████████████████████│ │ PAGE_SIZE
+ * ╘═════════╧══════════════════════════════════════════════════╛ ◁╯
*/
#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 1879914ea6..6366a79169 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -148,7 +148,7 @@ struct xen_pmu_hv_stacktrace {
* arrives while in Xen.
* */
struct xen_pmu_arch_guest guest;
-#define XEN_PMU_STACKTRACE_PAD 48
+#define XEN_PMU_STACKTRACE_PAD 56
uint8_t pad2[XEN_PMU_STACKTRACE_PAD];
};
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 06/10] arch-x86/pmu.h: convert ascii art diagram to Unicode
2025-07-25 15:06 ` [RFC PATCH v1 06/10] arch-x86/pmu.h: convert ascii art diagram to Unicode Edwin Török
@ 2025-07-31 15:35 ` Jan Beulich
2025-08-01 9:53 ` Edwin Torok
0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2025-07-31 15:35 UTC (permalink / raw)
To: Edwin Török
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, andriy.sultanov,
boris.ostrovsky, xen-devel
On 25.07.2025 17:06, Edwin Török wrote:
> Using `aa2u` tool.
>
> Signed-off-by: Edwin Török <edwin.torok@cloud.com>
How come the use of that tool made ...
> --- a/xen/include/public/arch-x86/pmu.h
> +++ b/xen/include/public/arch-x86/pmu.h
> @@ -75,8 +75,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
> #define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
>
> /*
> - * Architecture-specific information describing the state of the guest at
> - * the time of a PMU interrupt.
> + * Architecture-specific information describing state of the guest at
> + * the time of PMU interrupt.
> + * Even if the interrupt arrived while inside Xen, this will always contain
> + * the guest's state.
> */
> struct xen_pmu_arch_guest {
... this comment change, or yet more interesting, ...
> --- a/xen/include/public/pmu.h
> +++ b/xen/include/public/pmu.h
> @@ -148,7 +148,7 @@ struct xen_pmu_hv_stacktrace {
> * arrives while in Xen.
> * */
> struct xen_pmu_arch_guest guest;
> -#define XEN_PMU_STACKTRACE_PAD 48
> +#define XEN_PMU_STACKTRACE_PAD 56
> uint8_t pad2[XEN_PMU_STACKTRACE_PAD];
> };
... this value in the public interface?
Jan
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 06/10] arch-x86/pmu.h: convert ascii art diagram to Unicode
2025-07-31 15:35 ` Jan Beulich
@ 2025-08-01 9:53 ` Edwin Torok
0 siblings, 0 replies; 30+ messages in thread
From: Edwin Torok @ 2025-08-01 9:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, andriy.sultanov,
boris.ostrovsky, xen-devel
On Thu, Jul 31, 2025 at 4:35 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 25.07.2025 17:06, Edwin Török wrote:
> > Using `aa2u` tool.
> >
> > Signed-off-by: Edwin Török <edwin.torok@cloud.com>
>
> How come the use of that tool made ...
>
> > --- a/xen/include/public/arch-x86/pmu.h
> > +++ b/xen/include/public/arch-x86/pmu.h
> > @@ -75,8 +75,10 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_regs_t);
> > #define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
> >
> > /*
> > - * Architecture-specific information describing the state of the guest at
> > - * the time of a PMU interrupt.
> > + * Architecture-specific information describing state of the guest at
> > + * the time of PMU interrupt.
> > + * Even if the interrupt arrived while inside Xen, this will always contain
> > + * the guest's state.
> > */
> > struct xen_pmu_arch_guest {
Thanks for spotting, according to my evolog this one ended up here
while fixing a rebase conflict:
```
○ lrwoxosk hidden edwin.torok@cloud.com 2025-07-24 13:20:56 215f9e58
│ arch-x86/pmu.h: convert ascii art diagram to Unicode
│ -- operation 073584199528 (2025-07-24 13:20:56) snapshot working copy
│ diff --git a/xen/include/public/arch-x86/pmu.h
b/xen/include/public/arch-x86/pmu.h
│ index 0000000000..4dc3d9a20a 100644
│ --- a/xen/include/public/arch-x86/pmu.h
│ +++ b/xen/include/public/arch-x86/pmu.h
│ @@ -75,8 +75,10 @@
│ #define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
│
│ /*
│ - * Architecture-specific information describing the state of the guest at
│ - * the time of a PMU interrupt.
│ + * Architecture-specific information describing state of the guest at
│ + * the time of PMU interrupt.
│ + * Even if the interrupt arrived while inside Xen, this will always contain
│ + * the guest's state.
│ */
│ struct xen_pmu_arch_guest {
│ union {
│ @@ -149,178 +151,89 @@
│ typedef struct xen_pmu_arch xen_pmu_arch_t;
│ DEFINE_XEN_GUEST_HANDLE(xen_pmu_arch_t);
│
│ -
│ /* Memory layout:
│ -<<<<<<< Conflict 1 of 1
│ -+++++++ Contents of side #1
```
>
> ... this comment change, or yet more interesting, ...
>
> > --- a/xen/include/public/pmu.h
> > +++ b/xen/include/public/pmu.h
> > @@ -148,7 +148,7 @@ struct xen_pmu_hv_stacktrace {
> > * arrives while in Xen.
> > * */
> > struct xen_pmu_arch_guest guest;
> > -#define XEN_PMU_STACKTRACE_PAD 48
> > +#define XEN_PMU_STACKTRACE_PAD 56
> > uint8_t pad2[XEN_PMU_STACKTRACE_PAD];
> > };
>
> ... this value in the public interface?
Thanks for spotting, it doesn't belong in this commit.
I would assume that this happened by squashing a commit into the wrong
place, or by editing the wrong commit while rebasing.
Luckily I use 'jj', and it has an evolog that stores the full history
of how a commit changed over split/squash/rebase, so I don't have to
guess, but can give you a precise answer.
My evolog says that this change came from editing the source code
while having the wrong commit checked out (I kept tweaking those
values):
```
○ lrwoxosk hidden edwin.torok@cloud.com 2025-07-24 13:46:44 bfa29564
│ arch-x86/pmu.h: convert ascii art diagram to Unicode
│ -- operation 3f48136a1b6a (2025-07-24 13:46:44) snapshot working copy
│ diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
│ index 1879914ea6..6366a79169 100644
│ --- a/xen/include/public/pmu.h
│ +++ b/xen/include/public/pmu.h
│ @@ -148,7 +148,7 @@
│ * arrives while in Xen.
│ * */
│ struct xen_pmu_arch_guest guest;
│ -#define XEN_PMU_STACKTRACE_PAD 48
│ +#define XEN_PMU_STACKTRACE_PAD 56
│ uint8_t pad2[XEN_PMU_STACKTRACE_PAD];
│ };
```
Before I submit the next version I'll check whether the other hunks
ended up in the right place after all the
squashing/rebasing/splitting
Best regards,
--Edwin
>
> Jan
^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFC PATCH v1 07/10] arch-x86/vpmu.c: store guest registers when domain_id == DOMID_XEN
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (5 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 06/10] arch-x86/pmu.h: convert ascii art diagram to Unicode Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 08/10] pmu.h: expose a hypervisor stacktrace feature Edwin Török
` (7 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, andriy.sultanov, boris.ostrovsky
When Xen profiling is enabled (for HW domain only), then domain_id is
set to DOMID_XEN, and Xen's IP is reported as the sample location.
With VPMU >= 0.2 we can now report more information to help a guest
construct a stacktrace, and store the guest's registers and domain_id
into the new 'struct xen_pmu_hv_stracktrace'.
Privileged (HW domain) guests can then trace themselves, even if the
sample interrupt triggered inside Xen. This is useful if kernel or
userspace stacktrace gather is enabled.
For this to be effective a kernel change is required too, but it is
backwards compatible with old kernels, that:
* would ignore the newly stored data (it is towards the end of the page,
in a previously unused area)
* would report VPMU 0.1, and thus Xen would have xen_pmu_hv_stacktrace ==
NULL, and not report this information
To avoid stale values the guest_domain_id is always initialized to the
correct value, and ip is set to 0.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/arch/x86/cpu/vpmu.c | 33 ++++++++++++++++++++++++++++-----
xen/arch/x86/include/asm/vpmu.h | 1 +
2 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 286dc2af5f..770f63f95a 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -170,7 +170,7 @@ static inline void vpmu_convert_regs(struct xen_pmu_regs *r, uint64_t *flags,
if (!is_hvm_vcpu(sampled)) {
r->ss = cur_regs->ss;
r->cs = cur_regs->cs;
- if (!(sampled->arch.flags & TF_kernel_mode))
+ if (flags && !(sampled->arch.flags & TF_kernel_mode))
*flags |= PMU_SAMPLE_USER;
} else {
struct segment_register seg;
@@ -180,7 +180,7 @@ static inline void vpmu_convert_regs(struct xen_pmu_regs *r, uint64_t *flags,
hvm_get_segment_register(sampled, x86_seg_ss, &seg);
r->ss = seg.sel;
r->cpl = seg.dpl;
- if (!(sampled->arch.hvm.guest_cr[0] & X86_CR0_PE))
+ if (flags && !(sampled->arch.hvm.guest_cr[0] & X86_CR0_PE))
*flags |= PMU_SAMPLE_REAL;
}
}
@@ -240,6 +240,14 @@ void vpmu_do_interrupt(void)
else
domid = sampled->domain->domain_id;
+ if (vpmu->xenpmu_hv_stacktrace)
+ {
+ vpmu->xenpmu_hv_stacktrace->guest_domain_id = domid;
+
+ /* avoid stale values when domid != DOMID_XEN */
+ vpmu->xenpmu_hv_stacktrace->guest.r.regs.ip = 0;
+ }
+
/* Store appropriate registers in xenpmu_data */
#ifdef CONFIG_COMPAT
/* FIXME: 32-bit PVH should go here as well */
@@ -275,6 +283,11 @@ void vpmu_do_interrupt(void)
is_hardware_domain(sampling->domain) )
{
cur_regs = regs;
+ if (vpmu->xenpmu_hv_stacktrace)
+ {
+ vpmu_convert_regs(&vpmu->xenpmu_hv_stacktrace->guest.r.regs,
+ NULL, sampled, guest_cpu_user_regs());
+ }
domid = DOMID_XEN;
}
else
@@ -546,6 +559,7 @@ static void vpmu_cleanup(struct vcpu *v)
vpmu_arch_destroy(v);
xenpmu_data = vpmu->xenpmu_data;
vpmu->xenpmu_data = NULL;
+ vpmu->xenpmu_hv_stacktrace = NULL;
spin_unlock(&vpmu->vpmu_lock);
@@ -572,6 +586,7 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
struct vpmu_struct *vpmu;
struct page_info *page;
uint64_t gfn = params->val;
+ void *vpmu_page;
if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) )
return -EINVAL;
@@ -601,7 +616,8 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
return -EEXIST;
}
- v->arch.vpmu.xenpmu_data = __map_domain_page_global(page);
+ vpmu_page = __map_domain_page_global(page);
+ v->arch.vpmu.xenpmu_data = vpmu_page;
if ( !v->arch.vpmu.xenpmu_data )
{
spin_unlock(&vpmu->vpmu_lock);
@@ -609,8 +625,15 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params)
return -ENOMEM;
}
- if ( vpmu_arch_initialise(v) )
- put_vpmu(v);
+ if (params->version.maj > 0 || params->version.min >= 2)
+ v->arch.vpmu.xenpmu_hv_stacktrace =
+ (void *)((uint8_t *)vpmu_page + PAGE_SIZE -
+ sizeof(struct xen_pmu_hv_stacktrace));
+ else
+ v->arch.vpmu.xenpmu_hv_stacktrace = NULL;
+
+ if (vpmu_arch_initialise(v))
+ put_vpmu(v);
spin_unlock(&vpmu->vpmu_lock);
diff --git a/xen/arch/x86/include/asm/vpmu.h b/xen/arch/x86/include/asm/vpmu.h
index dae9b43dac..df28f80f0f 100644
--- a/xen/arch/x86/include/asm/vpmu.h
+++ b/xen/arch/x86/include/asm/vpmu.h
@@ -55,6 +55,7 @@ struct vpmu_struct {
size_t context_size;
size_t priv_context_size;
struct xen_pmu_data *xenpmu_data;
+ struct xen_pmu_hv_stacktrace *xenpmu_hv_stacktrace; /* only set if client vpmu >= 0.2 */
spinlock_t vpmu_lock;
};
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC PATCH v1 08/10] pmu.h: expose a hypervisor stacktrace feature
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (6 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 07/10] arch-x86/vpmu.c: store guest registers when domain_id == DOMID_XEN Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 09/10] vpmu.c hypervisor stacktrace support in vPMU Edwin Török
` (6 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Andrew Cooper, Anthony PERARD,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, andriy.sultanov, boris.ostrovsky
For now this'll only be expected to work when the hypervisor is compiled
with frame pointers, and only when the domain is privileged enough to
profile Xen (i.e. it is the hardware domain running in HV or ALL modes).
The stacktrace feature is a flag, rather than a new mode, and would
simply report a stacktrace of depth 0 in other cases.
Old hypervisors with VPMU 0.1 would also report a stracktrace of depth
0, due to implicit zero-filling of the page that contains 'struct xen_pmu_data'.
This is just the interface, followup commit implements it.
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/include/public/pmu.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/xen/include/public/pmu.h b/xen/include/public/pmu.h
index 6366a79169..85b2bbed74 100644
--- a/xen/include/public/pmu.h
+++ b/xen/include/public/pmu.h
@@ -80,10 +80,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_pmu_params_t);
* Architectural Performance Events exposed by
* cpuid and listed in the Intel developer's manual
* (ignored on AMD).
+ * - XENPMU_FEATURE_HV_STACKTRACE: Hypervisor stacktraces (when compiled with frame pointers)
*/
#define XENPMU_FEATURE_INTEL_BTS (1<<0)
#define XENPMU_FEATURE_IPC_ONLY (1<<1)
#define XENPMU_FEATURE_ARCH_ONLY (1<<2)
+#define XENPMU_FEATURE_HV_STACKTRACE (1<<3)
/*
* Shared PMU data between hypervisor and PV(H) domains.
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC PATCH v1 09/10] vpmu.c hypervisor stacktrace support in vPMU
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (7 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 08/10] pmu.h: expose a hypervisor stacktrace feature Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC PATCH v1 10/10] xen/tools/pyperf.py: example script to parse perf output Edwin Török
` (5 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Jan Beulich, Andrew Cooper,
Roger Pau Monné, andriy.sultanov, boris.ostrovsky
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/arch/x86/cpu/vpmu.c | 53 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 770f63f95a..ad02ab9dd8 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -160,9 +160,51 @@ static inline struct vcpu *choose_hwdom_vcpu(void)
return hardware_domain->vcpu[idx];
}
+static inline void vpmu_hypervisor_stacktrace(struct xen_pmu_hv_stacktrace *xenpmu_stack, uint64_t rsp, uint64_t rbp)
+{
+#ifdef CONFIG_FRAME_POINTER
+ /* Based on _show_trace in ../traps.c,
+ * TODO: there should be an iterator to share this code
+ * */
+ unsigned long *frame, addr;
+ uint64_t low = rsp, high = get_stack_trace_bottom(rsp), next = rbp;
+
+ while(xenpmu_stack->stacktrace_nr < sizeof(xenpmu_stack->stacktrace) / sizeof(xenpmu_stack->stacktrace[0])) {
+ /* Valid frame pointer? */
+ if ( (next < low) || (next >= high) )
+ {
+ /*
+ * Exception stack frames have a different layout, denoted by an
+ * inverted frame pointer.
+ */
+ next = ~next;
+ if ( (next < low) || (next >= high) )
+ break;
+ frame = (unsigned long *)next;
+ next = frame[0];
+ addr = frame[(offsetof(struct cpu_user_regs, rip) -
+ offsetof(struct cpu_user_regs, rbp))
+ / BYTES_PER_LONG];
+ }
+ else
+ {
+ /* Ordinary stack frame. */
+ frame = (unsigned long *)next;
+ next = frame[0];
+ addr = frame[1];
+ }
+
+ xenpmu_stack->stacktrace[PMU_MAX_STACKTRACE - 1 - xenpmu_stack->stacktrace_nr++] = addr;
+
+ low = (unsigned long)&frame[2];
+ }
+#endif
+}
+
static inline void vpmu_convert_regs(struct xen_pmu_regs *r, uint64_t *flags,
struct vcpu *sampled,
- const struct cpu_user_regs *cur_regs) {
+ const struct cpu_user_regs *cur_regs)
+{
r->ip = cur_regs->rip;
r->sp = cur_regs->rsp;
r->flags = cur_regs->rflags;
@@ -246,6 +288,7 @@ void vpmu_do_interrupt(void)
/* avoid stale values when domid != DOMID_XEN */
vpmu->xenpmu_hv_stacktrace->guest.r.regs.ip = 0;
+ vpmu->xenpmu_hv_stacktrace->stacktrace_nr = 0;
}
/* Store appropriate registers in xenpmu_data */
@@ -287,6 +330,11 @@ void vpmu_do_interrupt(void)
{
vpmu_convert_regs(&vpmu->xenpmu_hv_stacktrace->guest.r.regs,
NULL, sampled, guest_cpu_user_regs());
+
+ /* can only call this when domid == DOMID_XEN */
+ if (vpmu_features & XENPMU_FEATURE_HV_STACKTRACE)
+ vpmu_hypervisor_stacktrace(vpmu->xenpmu_hv_stacktrace,
+ regs->rsp, regs->rbp);
}
domid = DOMID_XEN;
}
@@ -747,7 +795,8 @@ long do_xenpmu_op(
case XENPMU_feature_set:
if ( pmu_params.val & ~(XENPMU_FEATURE_INTEL_BTS |
XENPMU_FEATURE_IPC_ONLY |
- XENPMU_FEATURE_ARCH_ONLY))
+ XENPMU_FEATURE_ARCH_ONLY |
+ XENPMU_FEATURE_HV_STACKTRACE))
return -EINVAL;
spin_lock(&vpmu_lock);
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC PATCH v1 10/10] xen/tools/pyperf.py: example script to parse perf output
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (8 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 09/10] vpmu.c hypervisor stacktrace support in vPMU Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC LINUX PATCH v1 1/3] perf kvm: introduce a hypervisor_callchain callback Edwin Török
` (4 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel
Cc: Edwin Török, Andrew Cooper, Anthony PERARD,
Michal Orzel, Jan Beulich, Julien Grall, Roger Pau Monné,
Stefano Stabellini, andriy.sultanov, boris.ostrovsky
Looks up symbols by parsing the output of 'perf script -i
perf.data.kvm'.
Note that addr2line is very slow, and invoking it for each symbol would
results in the script taking several minutes to process a trace that
only lasts a few seconds.
Processing only unique addresses, and processing multiple addresses in
one go reduces this to acceptable running times.
Based on an earlier script by Andriy Sultanov for parsing 'perf kvm report
--stdio' output.
This doesn't have to be added to the tree, eventually perf itself should
be capable of performing this lookup, but meanwhile it is useful to have
some way for automatically converting stacktraces.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
xen/tools/pyperf.py | 146 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 146 insertions(+)
create mode 100644 xen/tools/pyperf.py
diff --git a/xen/tools/pyperf.py b/xen/tools/pyperf.py
new file mode 100644
index 0000000000..881e6e1e0b
--- /dev/null
+++ b/xen/tools/pyperf.py
@@ -0,0 +1,146 @@
+#!/usr/bin/env python3
+## run after 'perf kvm --host --guest record -a -F 99 -- xe vm-list'
+# wraps the call to 'perf script' to get xen symbols instead of just addresses, like:
+#
+# perf 236848 [000] 67971.504330: 1 cycles:P:
+# ffff82d04029fc73 [unknown] ([unknown])
+# ffff82d04029ebc2 [unknown] ([unknown])
+# ffff82d04029ef1f [unknown] ([unknown])
+# ffff82d0403081df [unknown] ([unknown])
+# ffff82d0402012d3 [unknown] ([unknown])
+# 7f2ea410469d __GI___ioctl+0x3d (/usr/lib64/libc.so.6)
+# 55f9b3a8df5c perf_evsel__run_ioctl+0x6c (/usr/bin/perf)
+# 55f9b397b152 __evlist__enable.constprop.0+0x112 (/usr/bin/perf)
+# 55f9b38c2643 __cmd_record.constprop.0+0x2833 (/usr/bin/perf)
+# 55f9b38c3e1c cmd_record+0xeec (/usr/bin/perf)
+# 55f9b38e3ed0 cmd_kvm+0x7d0 (/usr/bin/perf)
+# 55f9b3961ff1 run_builtin+0x71 (/usr/bin/perf)
+# 55f9b38adbeb main+0x68b (/usr/bin/perf)
+# 7f2ea403314a __libc_start_call_main+0x7a (/usr/lib64/libc.so.6)
+# 7f2ea403320b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
+# 55f9b38ade55 _start+0x25 (/usr/bin/perf)
+#
+# Or with perf script -F+srcline:
+# perf 236848 [000] 67971.504330: 1 cycles:P:
+# ffff82d04029fc73 [unknown] ([unknown])
+# ffff82d04029ebc2 [unknown] ([unknown])
+# ffff82d04029ef1f [unknown] ([unknown])
+# ffff82d0403081df [unknown] ([unknown])
+# ffff82d0402012d3 [unknown] ([unknown])
+# 7f2ea410469d __GI___ioctl+0x3d (/usr/lib64/libc.so.6)
+# libc.so.6[d569d]
+# 55f9b3a8df5c perf_evsel__run_ioctl+0x6c (/usr/bin/perf)
+# evsel.c:435
+# 55f9b397b152 __evlist__enable.constprop.0+0x112 (/usr/bin/perf)
+# evlist.c:573
+# 55f9b38c2643 __cmd_record.constprop.0+0x2833 (/usr/bin/perf)
+# builtin-record.c:2524
+# 55f9b38c3e1c cmd_record+0xeec (/usr/bin/perf)
+# builtin-record.c:4215
+# 55f9b38e3ed0 cmd_kvm+0x7d0 (/usr/bin/perf)
+# builtin-kvm.c:2081
+# 55f9b3961ff1 run_builtin+0x71 (/usr/bin/perf)
+# perf.c:322
+# 55f9b38adbeb main+0x68b (/usr/bin/perf)
+# perf.c:375
+# 7f2ea403314a __libc_start_call_main+0x7a (/usr/lib64/libc.so.6)
+# libc.so.6[414a]
+# 7f2ea403320b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
+# libc.so.6[420b]
+# 55f9b38ade55 _start+0x25 (/usr/bin/perf)
+# ??:0
+#
+# TODO: BUG when we have only 1 unknown and not from hypervisor we print 2 bogus values
+# and claim it was hypervisor
+# This is useful for FlameGraphs
+
+from collections import defaultdict
+from typing import Optional
+import subprocess
+import tempfile
+
+with open('/sys/hypervisor/version/major', 'r') as f:
+ major = f.read().rstrip()
+with open('/sys/hypervisor/version/minor', 'r') as f:
+ minor = f.read().rstrip()
+with open('/sys/hypervisor/version/extra', 'r') as f:
+ extra = f.read().rstrip()
+
+xen_version = f'{major}.{minor}{extra}'
+xen_symbols = f'/boot/xen-syms-{xen_version}'
+# TODO: srcline transform as we do for ours
+report: bytes = subprocess.check_output(['perf', 'script', '-i', 'perf.data.kvm'])
+
+# invoking addr2line for each symbol is very slow
+# (takes several minutes to process a trace that way)
+# Instead collect all addresses, and invoke addr2line once
+# (this way we can also query it only for unique addresses)
+maybe_in_hypervisor = False
+
+addresses = set()
+lines = []
+
+for rawline in report.splitlines():
+ line = rawline.decode()
+ columns = line.split()
+ if line and line[0] != ' ':
+ # beginning of new stacktrace, could start in hypervisor
+ maybe_in_hypervisor = True
+ if len(columns) == 3 and maybe_in_hypervisor and columns[1] == '[unknown]' and columns[2] == '([unknown])':
+ addr = columns[0]
+ addresses.add(addr)
+ lines.append((line, columns))
+ else:
+ maybe_in_hypervisor = False
+ lines.append((line, []))
+
+addr2loc: defaultdict[str, list[str]] = defaultdict(list)
+with tempfile.NamedTemporaryFile() as tmp:
+ # TODO: add back -i, but sharing needs to be fixed
+ tmp.write(f"-asfe {xen_symbols}\n".encode("utf-8"))
+ for addr in addresses:
+ tmp.write((addr + "\n").encode("utf-8"))
+ tmp.flush()
+ cmd = ['addr2line', '@' + tmp.name]
+ symbols = subprocess.check_output(cmd).decode().splitlines()
+ # output format:
+ # address
+ # function_name
+ # source_file:line_number
+ # inlined by function_name
+ # inlined by source_file:line_number
+ # we keep each of these separate, but change the order,
+ # so that it works for a flamegraph (from callees to callers)
+ # source_file:line_number
+ # function_name
+ # inlined by source_file:line_number
+ # inlined by function_name
+
+ # sometimes we get extra info in parenthesis, drop it:
+ # /builddir/build/BUILD/xen-4.21.0/xen/build-xen-debug/../arch/x86/pv/emul-priv-op.c:1349 (discriminator 8)
+ addr = ""
+ prev: Optional[str] = None
+ for symline in symbols:
+ if symline.startswith("0x"):
+ addr = symline[2:]
+ addr2loc[addr] = []
+ prev = None
+ else:
+ if prev:
+ for line in [symline, prev]:
+ addr2loc[addr].append("%024s %s_[h] ([%s])" %
+ (addr, line.split()[0], xen_symbols))
+ prev = symline
+
+ for (line, columns) in lines:
+ if len(columns) == 3:
+ addr = columns[0]
+ loc = addr2loc.get(addr)
+ if loc:
+ # TODO: stacktrace needs to be shared when inlined functions have common parent
+ for out in loc:
+ print(out)
+ else:
+ print(line)
+ else:
+ print(line)
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC LINUX PATCH v1 1/3] perf kvm: introduce a hypervisor_callchain callback
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (9 preceding siblings ...)
2025-07-25 15:06 ` [RFC PATCH v1 10/10] xen/tools/pyperf.py: example script to parse perf output Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC LINUX PATCH v1 2/3] xen/{interface,xenpmu}.h: update with VPMU 0.2 from Xen Edwin Török
` (3 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel; +Cc: Edwin Török
`perf kvm` currently assumes that it can construct a stacktrace by
looking up stack pointer addresses in the current kernel's address
space.
That only works if the hypervisor is the same as the kernel (i.e. KVM),
but doesn't work if the hypervisor is separate from the kernel (Xen,
with Linux as Dom0).
Introduce a callback to enable Xen to retrieve the stacktrace from Xen
instead when a sample is inside the hypervisor (domid == DOMID_XEN
instead of DOMID_SELF).
The callback can replace the registers with the guest kernel's registers
upon return when domid == DOMID_SELF, so that we can continue with the
kernel stacktrace.
Both KVM and Xen define this as NULL, a followup commit will implement
the callback for Xen (KVM doesn't need a callback implementation).
No functional change.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
arch/x86/xen/pmu.c | 2 ++
include/linux/perf_event.h | 12 ++++++++++++
kernel/events/core.c | 5 +++++
virt/kvm/kvm_main.c | 1 +
4 files changed, 20 insertions(+)
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 246d67dab510..b92dc739fdfb 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -466,6 +466,7 @@ static unsigned long xen_get_guest_ip(void)
static struct perf_guest_info_callbacks xen_guest_cbs = {
.state = xen_guest_state,
.get_ip = xen_get_guest_ip,
+ .hypervisor_callchain = NULL
};
/* Convert registers from Xen's format to Linux' */
@@ -489,6 +490,7 @@ static void xen_convert_regs(const struct xen_pmu_regs *xen_regs,
}
}
+
irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
{
int err, ret = IRQ_NONE;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 90c782749b05..d82aeaddadb8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -29,10 +29,14 @@
#define PERF_GUEST_ACTIVE 0x01
#define PERF_GUEST_USER 0x02
+struct perf_callchain_entry_ctx;
+
struct perf_guest_info_callbacks {
unsigned int (*state)(void);
unsigned long (*get_ip)(void);
unsigned int (*handle_intel_pt_intr)(void);
+ void (*hypervisor_callchain)(struct perf_callchain_entry_ctx *pc,
+ struct pt_regs *regs);
};
#ifdef CONFIG_HAVE_HW_BREAKPOINT
@@ -1514,6 +1518,7 @@ extern struct perf_guest_info_callbacks __rcu *perf_guest_cbs;
DECLARE_STATIC_CALL(__perf_guest_state, *perf_guest_cbs->state);
DECLARE_STATIC_CALL(__perf_guest_get_ip, *perf_guest_cbs->get_ip);
DECLARE_STATIC_CALL(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr);
+DECLARE_STATIC_CALL(__perf_hypervisor_callchain, *perf_guest_cbs->hypervisor_callchain);
static inline unsigned int perf_guest_state(void)
{
@@ -1527,12 +1532,19 @@ static inline unsigned int perf_guest_handle_intel_pt_intr(void)
{
return static_call(__perf_guest_handle_intel_pt_intr)();
}
+static inline void
+perf_hypervisor_callchain(struct perf_callchain_entry_ctx *entry,
+ struct pt_regs *regs)
+{
+ static_call(__perf_hypervisor_callchain)(entry, regs);
+}
extern void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs);
extern void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs);
#else
static inline unsigned int perf_guest_state(void) { return 0; }
static inline unsigned long perf_guest_get_ip(void) { return 0; }
static inline unsigned int perf_guest_handle_intel_pt_intr(void) { return 0; }
+static inline void perf_hypervisor_callchain(struct perf_callchain_entry_ctx *) { return; }
#endif /* CONFIG_GUEST_PERF_EVENTS */
extern void perf_event_exec(void);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3a33d9c1b1b2..a8535294018b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6917,6 +6917,7 @@ struct perf_guest_info_callbacks __rcu *perf_guest_cbs;
DEFINE_STATIC_CALL_RET0(__perf_guest_state, *perf_guest_cbs->state);
DEFINE_STATIC_CALL_RET0(__perf_guest_get_ip, *perf_guest_cbs->get_ip);
DEFINE_STATIC_CALL_RET0(__perf_guest_handle_intel_pt_intr, *perf_guest_cbs->handle_intel_pt_intr);
+DEFINE_STATIC_CALL_NULL(__perf_hypervisor_callchain, *perf_guest_cbs->hypervisor_callchain);
void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
{
@@ -6931,6 +6932,9 @@ void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
if (cbs->handle_intel_pt_intr)
static_call_update(__perf_guest_handle_intel_pt_intr,
cbs->handle_intel_pt_intr);
+ if (cbs->hypervisor_callchain)
+ static_call_update(__perf_hypervisor_callchain,
+ cbs->hypervisor_callchain);
}
EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
@@ -6944,6 +6948,7 @@ void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
static_call_update(__perf_guest_get_ip, (void *)&__static_call_return0);
static_call_update(__perf_guest_handle_intel_pt_intr,
(void *)&__static_call_return0);
+ static_call_update(__perf_hypervisor_callchain, (void *)&__static_call_return0);
synchronize_rcu();
}
EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 44c228bcd699..20a03dd9cc42 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6038,6 +6038,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
.state = kvm_guest_state,
.get_ip = kvm_guest_get_ip,
.handle_intel_pt_intr = NULL,
+ .hypervisor_callchain = NULL
};
void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void))
base-commit: dbcb8d8e4163e46066f43e2bd9a6779e594ec900
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC LINUX PATCH v1 2/3] xen/{interface,xenpmu}.h: update with VPMU 0.2 from Xen
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (10 preceding siblings ...)
2025-07-25 15:06 ` [RFC LINUX PATCH v1 1/3] perf kvm: introduce a hypervisor_callchain callback Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:06 ` [RFC LINUX PATCH v1 3/3] perf kvm: implement Xen hypervisor stacktraces Edwin Török
` (2 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel; +Cc: Edwin Török
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
arch/x86/include/asm/xen/interface.h | 100 +++++++++++++++++++++++++++
include/xen/interface/xenpmu.h | 56 +++++++++++++--
2 files changed, 152 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index baca0b00ef76..f3667831573f 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -320,6 +320,25 @@ struct xen_pmu_regs {
#define PMU_SAMPLE_REAL (1<<2) /* Sample is from realmode */
#define PMU_SAMPLE_PV (1<<3) /* Sample from a PV guest */
+/*
+ * Architecture-specific information describing state of the guest at
+ * the time of PMU interrupt.
+ * Even if the interrupt arrived while inside Xen, this will always contain
+ * the guest's state.
+ */
+struct xen_pmu_arch_guest {
+ union {
+ /*
+ * Processor's registers at the time of interrupt.
+ * WO for hypervisor, RO for guests.
+ */
+ struct xen_pmu_regs regs;
+ /* Padding for adding new registers to xen_pmu_regs in the future */
+#define XENPMU_REGS_PAD_SZ 64
+ uint8_t pad[XENPMU_REGS_PAD_SZ];
+ } r;
+};
+
/*
* Architecture-specific information describing state of the processor at
* the time of PMU interrupt.
@@ -376,6 +395,87 @@ struct xen_pmu_arch {
} c;
};
+/* Memory layout:
+ * ╭─────────────────────╮
+ * │ struct xen_pmu_data │
+ * ╒══════════════╧═════════════════════╧═══════════════════════╕ ◁│
+ * │ vcpu_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ pcpu_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ domain_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │██pad███████████████████████████████████████████████████████│ │
+ * ╞════╤═╤═══╤══════════════════╤══════════════════════════════╡ │
+ * │ pmu│ │ r │ regs │██pad█████████████████████████│ │
+ * ├────╯ ├───╯ (xen or guest) │██████████████████████████████│ │
+ * │ ╞══════════════════════╧══════════════════════════════╡ │
+ * │ │ pmu_flags │ │
+ * │ ╞═══╤════════════════════╤════════════════════════════╡ │
+ * │ │ l │ lapic_lvtpc │████████████████████████████│ │
+ * │ ├───╯ ███████████████████│██pad███████████████████████│ │
+ * │ │ ███████████████████│████████████████████████████│ │
+ * │ ╞═══╤═╤═══════╤═════╤════╪════╤═══════╤═══════════════╡ │
+ * │ │ c │ │ │ amd │ │ │ intel │ │█████│ │
+ * │ ├───┘ │ ╰─────╯ │ ╰───────╯ │█████│ │
+ * │ │ │ counter │ fixed_counters │█████│ │
+ * │ │ ├──────────────────┼──────────────────────┤█████│ │
+ * │ │ │ ctrls │ arch_counters │█████│ │
+ * │ │ ╞═════╤════════╤═══├──────────────────────┤█████│ │
+ * │ │ │ │ regs[] │ ┆│ global_ctrl │█████│ │
+ * │ │ │ └────────╯ ┆├──────────────────────┤█████│ │
+ * │ │ │struct ┆│ global_ovf_ctrl │█████│ │
+ * │ │ │xen_pmu_cntr_pair┆├──────────────────────┤█████│ │
+ * │ │ │[counters] ┆│ global_status │█████│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ┆│ fixed_ctrl │█████│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ┆│ ds_area │█████│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ┆│ pebs_enable │█pad█│ │
+ * │ │ │ ┆├──────────────────────┤█████│ │
+ * │ │ │ ▽│ debugctl │█████│ │
+ * │ │ │██████████████████╞═══════╤════════╤═════╡█████│ │
+ * │ │ │██████████████████│ │ regs[] │ ┆[0]│█████│ │
+ * │ │ │██████████████████│ └────────╯ ┆ │█████│ │
+ * │ │ │██████████████████│ uint64_t ┆ │█████│ │
+ * │ │ │██████████████████│ [fixed_counters] ┆ │█████│ │
+ * │ │ │██████████████████│ ┆ │█████│ │
+ * │ │ │██████████████████│ ┆ │█████│ │
+ * │ │ │██████████████████│ ─────────────────┆ │█████│ │
+ * │ │ │██████████████████│ struct ┆ │█████│ │
+ * │ │ │██████████████████│ xen_pmu_cntr_pair┆ │█████│ │
+ * │ │ ╘══════════════════╡ [arch_counters] ┆ ╞═════╡ │
+ * │ │ │ ┆ │ │ │
+ * │ │ │ ▽ │ │ │
+ * │ │ ╘══════════════════════╛ │ │
+ * │ ╘═════════════════════════════════════════════════════╡ │
+ * ╞════════════════════════════════════════════════════════════╡ │
+ * │████████████████████████████████████████████████████████████│ │
+ * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
+ * ┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆┆ ┆
+ * │████████████████████████████████████████████████████████████│ │
+ * │████████████████████████████████████████████████████████████│ │
+ * │██████████╭──────────────────────────────╮██████████████████│ │
+ * │██████████│ struct xen_pmu_hv_stacktrace │██████████████████│ │
+ * ╞══════════╧══════════════════════════════╧══════════════════╡ │
+ * │ △ [stacktrace_nr-1] │ │
+ * │ ┆ │ │
+ * │ stacktrace[stacktrace_nr] ┆ [0] │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ stacktrace_nr │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │ guest_domain_id │ │
+ * ├────────────────────────────────────────────────────────────┤ │
+ * │██pad███████████████████████████████████████████████████████│ │
+ * ╞═══════╤═╤═══╤══════════════════╤═══════════════════════════╡ │
+ * │ guest │ │ r │ regs │██pad██████████████████████│ │
+ * ├───────╯ ├───╯ (xen or guest) │███████████████████████████│ │
+ * │ ╞══════════════════════╧═══════════════════════════╡ │
+ * │ │██pad2████████████████████████████████████████████│ │ PAGE_SIZE
+ * ╘═════════╧══════════════════════════════════════════════════╛ ◁╯
+ */
+
#endif /* !__ASSEMBLY__ */
/*
diff --git a/include/xen/interface/xenpmu.h b/include/xen/interface/xenpmu.h
index e2ee73d91bd6..c4dfa8e349f7 100644
--- a/include/xen/interface/xenpmu.h
+++ b/include/xen/interface/xenpmu.h
@@ -5,7 +5,7 @@
#include "xen.h"
#define XENPMU_VER_MAJ 0
-#define XENPMU_VER_MIN 1
+#define XENPMU_VER_MIN 2
/*
* ` enum neg_errnoval
@@ -22,8 +22,7 @@
#define XENPMU_init 4
#define XENPMU_finish 5
#define XENPMU_lvtpc_set 6
-#define XENPMU_flush 7
-
+#define XENPMU_flush 7 /* Write cached MSR values to HW */
/* ` } */
/* Parameters structure for HYPERVISOR_xenpmu_op call */
@@ -56,8 +55,20 @@ struct xen_pmu_params {
/*
* PMU features:
* - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
+ * - XENPMU_FEATURE_IPC_ONLY: Restrict PMCs to the most minimum set possible.
+ * Instructions, cycles, and ref cycles. Can be
+ * used to calculate instructions-per-cycle (IPC)
+ * (ignored on AMD).
+ * - XENPMU_FEATURE_ARCH_ONLY: Restrict PMCs to the Intel Pre-Defined
+ * Architectural Performance Events exposed by
+ * cpuid and listed in the Intel developer's manual
+ * (ignored on AMD).
+ * - XENPMU_FEATURE_HV_STACKTRACE: Hypervisor stacktraces (when compiled with frame pointers)
*/
-#define XENPMU_FEATURE_INTEL_BTS 1
+#define XENPMU_FEATURE_INTEL_BTS (1<<0)
+#define XENPMU_FEATURE_IPC_ONLY (1<<1)
+#define XENPMU_FEATURE_ARCH_ONLY (1<<2)
+#define XENPMU_FEATURE_HV_STACKTRACE (1<<3)
/*
* Shared PMU data between hypervisor and PV(H) domains.
@@ -67,6 +78,9 @@ struct xen_pmu_params {
* Architecture-independent fields of xen_pmu_data are WO for the hypervisor
* and RO for the guest but some fields in xen_pmu_arch can be writable
* by both the hypervisor and the guest (see arch-$arch/pmu.h).
+ *
+ * PAGE_SIZE bytes of memory are allocated.
+ * This struct cannot be larger than PAGE_SIZE.
*/
struct xen_pmu_data {
/* Interrupted VCPU */
@@ -92,4 +106,38 @@ struct xen_pmu_data {
struct xen_pmu_arch pmu;
};
+/* stacktrace entry populated from the end,
+ * so stacktrace_nr == 1, means that stacktrace[PMU_MAX_STACKTRCE-1] is valid.
+ * This is done, so that PMU_MAX_STACKTRACE can be changed in the future, without breaking the ABI.
+ * The struct itself (and thus the stacktrace_nr field) will always be placed at the end of a page.
+ *
+ * See arch-x86/pmu.h for an example memory layout on x86.
+ *
+ */
+#define PMU_MAX_STACKTRACE 127
+
+/* WO for hypervisor, RO for guest */
+struct xen_pmu_hv_stacktrace {
+ uint64_t stacktrace[PMU_MAX_STACKTRACE];
+ uint64_t stacktrace_nr;
+
+ /* Like xen_pmu_data.domain_id, but instead of DOMID_XEN always contains the
+ * domain that was interrupted (DOMID_SELF if it matches the sampling
+ * domain).
+ */
+ domid_t guest_domain_id;
+ uint8_t pad[6];
+
+ /* When xen_pmu_data.domain_id == DOMID_XEN, this will contain
+ * the registers of the guest that was interrupted.
+ * This is useful for Dom0 kernel stacktraces, even if the interrupt
+ * arrives while in Xen.
+ */
+ struct xen_pmu_arch_guest guest;
+#define XEN_PMU_STACKTRACE_PAD 56
+ uint8_t pad2[XEN_PMU_STACKTRACE_PAD];
+};
+
+#define MAX_XEN_PMU_DATA_SIZE (PAGE_SIZE - sizeof(struct xen_pmu_hv_stacktrace))
+
#endif /* __XEN_PUBLIC_XENPMU_H__ */
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* [RFC LINUX PATCH v1 3/3] perf kvm: implement Xen hypervisor stacktraces
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (11 preceding siblings ...)
2025-07-25 15:06 ` [RFC LINUX PATCH v1 2/3] xen/{interface,xenpmu}.h: update with VPMU 0.2 from Xen Edwin Török
@ 2025-07-25 15:06 ` Edwin Török
2025-07-25 15:48 ` [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Torok
2025-07-25 22:25 ` Demi Marie Obenour
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Török @ 2025-07-25 15:06 UTC (permalink / raw)
To: xen-devel; +Cc: Edwin Török
Using the new VPMU 0.2 interface.
This is backwards compatible with VPMU 0.1:
the new 'struct xen_pmu_hv_stacktrace` is stored at the end of the page,
and stacktrace_nr would be 0 on old hypervisors.
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
---
arch/x86/events/core.c | 4 ++-
arch/x86/xen/pmu.c | 73 ++++++++++++++++++++++++++++++++++++++----
2 files changed, 70 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ad63bd408cd9..1fca4a77f353 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2764,12 +2764,14 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
struct unwind_state state;
unsigned long addr;
+ perf_hypervisor_callchain(entry, regs);
+
if (perf_guest_state()) {
/* TODO: We don't support guest os callchain now */
return;
}
- if (perf_callchain_store(entry, regs->ip))
+ if (!regs->ip || perf_callchain_store(entry, regs->ip))
return;
if (perf_hw_regs(regs))
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index b92dc739fdfb..4996b6904e0b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -19,11 +19,13 @@
struct xenpmu {
/* Shared page between hypervisor and domain */
struct xen_pmu_data *xenpmu_data;
+ const struct xen_pmu_hv_stacktrace *xenpmu_hv_stacktrace;
uint8_t flags;
};
static DEFINE_PER_CPU(struct xenpmu, xenpmu_shared);
#define get_xenpmu_data() (this_cpu_ptr(&xenpmu_shared)->xenpmu_data)
+#define get_xenpmu_hv_stacktrace() (this_cpu_ptr(&xenpmu_shared)->xenpmu_hv_stacktrace)
#define get_xenpmu_flags() (this_cpu_ptr(&xenpmu_shared)->flags)
/* Macro for computing address of a PMU MSR bank */
@@ -436,8 +438,19 @@ static unsigned int xen_guest_state(void)
return state;
}
- if (!xen_initial_domain() || (xenpmu_data->domain_id >= DOMID_SELF))
- return state;
+ if (!xen_initial_domain() || (xenpmu_data->domain_id >= DOMID_SELF)) {
+ if (xenpmu_data->domain_id == DOMID_XEN) {
+ /* when inside Xen we output the hypervisor stacktrace if available,
+ * but only look at guest stacktrace if this is our domid
+ */
+ const struct xen_pmu_hv_stacktrace *xenpmu_hv_stacktrace =
+ get_xenpmu_hv_stacktrace();
+ if (!xenpmu_hv_stacktrace ||
+ xenpmu_hv_stacktrace->guest_domain_id == DOMID_SELF)
+ return state;
+ } else
+ return state;
+ }
state |= PERF_GUEST_ACTIVE;
@@ -463,10 +476,54 @@ static unsigned long xen_get_guest_ip(void)
return xenpmu_data->pmu.r.regs.ip;
}
+static void xen_convert_regs(const struct xen_pmu_regs *xen_regs,
+ struct pt_regs *regs, uint64_t pmu_flags);
+
+static void xen_hypervisor_callchain(struct perf_callchain_entry_ctx *entry,
+ struct pt_regs *regs)
+{
+ if (!entry)
+ return;
+
+ const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
+
+ if (!xenpmu_data) {
+ pr_warn_once("%s: pmudata not initialized\n", __func__);
+ return;
+ }
+
+ if (xenpmu_data->domain_id != DOMID_XEN)
+ return;
+
+ if (!regs->ip || perf_callchain_store(entry, regs->ip))
+ return;
+
+ const struct xen_pmu_hv_stacktrace *pmu_stack =
+ get_xenpmu_hv_stacktrace();
+
+ const unsigned int stacktrace_nr = pmu_stack->stacktrace_nr;
+
+ if (stacktrace_nr > ARRAY_SIZE(pmu_stack->stacktrace)) {
+ pr_warn_once("%s: stacktrace_nr out of bounds: %d", __func__,
+ stacktrace_nr);
+ return;
+ }
+
+ for (unsigned int i = 0; i < stacktrace_nr; i++) {
+ uint64_t addr =
+ pmu_stack->stacktrace[PMU_MAX_STACKTRACE - 1 - i];
+ if (!addr || perf_callchain_store(entry, addr))
+ break;
+ }
+
+ xen_convert_regs(&pmu_stack->guest.r.regs, regs,
+ xenpmu_data->pmu.pmu_flags);
+}
+
static struct perf_guest_info_callbacks xen_guest_cbs = {
.state = xen_guest_state,
.get_ip = xen_get_guest_ip,
- .hypervisor_callchain = NULL
+ .hypervisor_callchain = xen_hypervisor_callchain
};
/* Convert registers from Xen's format to Linux' */
@@ -490,7 +547,6 @@ static void xen_convert_regs(const struct xen_pmu_regs *xen_regs,
}
}
-
irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
{
int err, ret = IRQ_NONE;
@@ -527,7 +583,7 @@ void xen_pmu_init(int cpu)
{
int err;
struct xen_pmu_params xp;
- unsigned long pfn;
+ unsigned long pfn, pmu_page;
struct xen_pmu_data *xenpmu_data;
BUILD_BUG_ON(sizeof(struct xen_pmu_data) > PAGE_SIZE);
@@ -535,7 +591,8 @@ void xen_pmu_init(int cpu)
if (xen_hvm_domain() || (cpu != 0 && !is_xen_pmu))
return;
- xenpmu_data = (struct xen_pmu_data *)get_zeroed_page(GFP_KERNEL);
+ pmu_page = get_zeroed_page(GFP_KERNEL);
+ xenpmu_data = (struct xen_pmu_data *)pmu_page;
if (!xenpmu_data) {
pr_err("VPMU init: No memory\n");
return;
@@ -551,6 +608,10 @@ void xen_pmu_init(int cpu)
goto fail;
per_cpu(xenpmu_shared, cpu).xenpmu_data = xenpmu_data;
+ per_cpu(xenpmu_shared, cpu).xenpmu_hv_stacktrace =
+ (const struct xen_pmu_hv_stacktrace *)
+ (pmu_page + PAGE_SIZE -
+ sizeof(struct xen_pmu_hv_stacktrace));
per_cpu(xenpmu_shared, cpu).flags = 0;
if (!is_xen_pmu) {
--
2.47.1
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (12 preceding siblings ...)
2025-07-25 15:06 ` [RFC LINUX PATCH v1 3/3] perf kvm: implement Xen hypervisor stacktraces Edwin Török
@ 2025-07-25 15:48 ` Edwin Torok
2025-07-25 22:25 ` Demi Marie Obenour
14 siblings, 0 replies; 30+ messages in thread
From: Edwin Torok @ 2025-07-25 15:48 UTC (permalink / raw)
To: xen-devel
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
Volodymyr Babchuk, andriy.sultanov, boris.ostrovsky
On Fri, Jul 25, 2025 at 4:07 PM Edwin Török <edwin.torok@cloud.com> wrote:
> [...]
>
> Edwin Török (10):
> pmu.h: add a BUILD_BUG_ON to ensure it fits within one page
> arch-x86/pmu.h: document current memory layout for VPMU
> arch-x86/pmu.h: convert ascii art drawing to Unicode
> vpmu.c: factor out register conversion
> pmu.h: introduce a stacktrace area
> arch-x86/pmu.h: convert ascii art diagram to Unicode
> arch-x86/vpmu.c: store guest registers when domain_id == DOMID_XEN
> pmu.h: expose a hypervisor stacktrace feature
> vpmu.c hypervisor stacktrace support in vPMU
> xen/tools/pyperf.py: example script to parse perf output
>
> xen/arch/x86/cpu/vpmu.c | 130 ++++++++++++++++++++------
> xen/arch/x86/cpu/vpmu_amd.c | 2 +-
> xen/arch/x86/cpu/vpmu_intel.c | 2 +-
> xen/arch/x86/include/asm/vpmu.h | 1 +
> xen/include/public/arch-arm.h | 1 +
> xen/include/public/arch-ppc.h | 1 +
> xen/include/public/arch-riscv.h | 1 +
> xen/include/public/arch-x86/pmu.h | 101 ++++++++++++++++++++-
> xen/include/public/pmu.h | 41 ++++++++-
> xen/tools/pyperf.py | 146 ++++++++++++++++++++++++++++++
> 10 files changed, 395 insertions(+), 31 deletions(-)
> create mode 100644 xen/tools/pyperf.py
>
For convenience this is also available as a git repository here:
https://gitlab.com/xen-project/people/edwintorok/xen/-/commits/pmustack?ref_type=heads
https://github.com/edwintorok/linux-stable/commits/pmustack/
Best regards,
--Edwin
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support
2025-07-25 15:06 [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Török
` (13 preceding siblings ...)
2025-07-25 15:48 ` [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support Edwin Torok
@ 2025-07-25 22:25 ` Demi Marie Obenour
2025-07-28 9:25 ` Edwin Torok
14 siblings, 1 reply; 30+ messages in thread
From: Demi Marie Obenour @ 2025-07-25 22:25 UTC (permalink / raw)
To: Edwin Török, xen-devel
Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, Bertrand Marquis,
Volodymyr Babchuk, andriy.sultanov, boris.ostrovsky
[-- Attachment #1.1.1: Type: text/plain, Size: 2195 bytes --]
On 7/25/25 11:06, Edwin Török wrote:
> Caveats:
> * x86-only for now
> * only tested on AMD EPYC 8124P
> * Xen PMU support was broken to begin with on Xeon Silver 4514Y, so I
> wasn't able to test there ('perf top' fails to parse samples). I'll
> try to figure out what is wrong there separately
> * for now I edit the release config in xen.spec to enable frame
> pointers. Eventually it might be useful to have a 3rd build variant:
> release-fp. Or teach Xen to produce/parse ORC or SFrame formats without
> requiring frame pointers.
That would definitely be nice.
> * perf produces raw hex addresses, and a python script is used to
> post-process it and obtain symbols. Eventually perf should be updated
> to do this processing itself (there was an old patch for Linux 3.12 by Borislav Petkov)
> * I've only tested capturing Dom0 stack traces. Linux doesn't support
> guest stacktraces yet (it can only lookup the guest RIP)
What would be needed to fix this? Capturing guest stacktraces from the host
or Xen seems like a really bad idea, but it might make sense to interrupt the
guest and allow it to provide a (strictly validated) stack trace for use by
the host. This would need to be done asynchronously, as Linux is moving
towards generating stack traces outside of the NMI handler.
> * the Linux patch will need to be forwarded ported to master before submission
> * All the caveats for using regular VPMU apply, except for the lack of
> stacktraces, that is fixed here!
> * Dom0 must run hard pinned on all host CPUs
> * Watchdog must be disabled
> * not security supported
> * x86 only
> * secureboot needs to be disabled
What would be needed to fix these limitations? With them it isn't really
possible to do profiling on production systems, only on dedicated development
boxes. That works great if you have a dev box and can create a realistic
workload with non-sensitive data, but less great if you have a problem that
you can't reproduce on a non-production system. It's also not usable
for real-time monitoring of production environments.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 7253 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC PATCH v1 00/10] Xen flamegraph (hypervisor stacktrace profile) support
2025-07-25 22:25 ` Demi Marie Obenour
@ 2025-07-28 9:25 ` Edwin Torok
0 siblings, 0 replies; 30+ messages in thread
From: Edwin Torok @ 2025-07-28 9:25 UTC (permalink / raw)
To: Demi Marie Obenour
Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini,
Bertrand Marquis, Volodymyr Babchuk, andriy.sultanov,
boris.ostrovsky
On Fri, Jul 25, 2025 at 11:26 PM Demi Marie Obenour
<demiobenour@gmail.com> wrote:
>
> On 7/25/25 11:06, Edwin Török wrote:
> > Caveats:
> > * x86-only for now
> > * only tested on AMD EPYC 8124P
> > * Xen PMU support was broken to begin with on Xeon Silver 4514Y, so I
> > wasn't able to test there ('perf top' fails to parse samples). I'll
> > try to figure out what is wrong there separately
> > * for now I edit the release config in xen.spec to enable frame
> > pointers. Eventually it might be useful to have a 3rd build variant:
> > release-fp. Or teach Xen to produce/parse ORC or SFrame formats without
> > requiring frame pointers.
>
> That would definitely be nice.
>
> > * perf produces raw hex addresses, and a python script is used to
> > post-process it and obtain symbols. Eventually perf should be updated
> > to do this processing itself (there was an old patch for Linux 3.12 by Borislav Petkov)
> > * I've only tested capturing Dom0 stack traces. Linux doesn't support
> > guest stacktraces yet (it can only lookup the guest RIP)
>
> What would be needed to fix this? Capturing guest stacktraces from the host
> or Xen seems like a really bad idea, but it might make sense to interrupt the
> guest and allow it to provide a (strictly validated) stack trace for use by
> the host. This would need to be done asynchronously, as Linux is moving
> towards generating stack traces outside of the NMI handler.
The way perf captures stacktraces for userspace is that it either
walks its stack by following framepointers
and copying memory from userspace, or it can take a copy of the entire
userspace stack (up to a limit of ~64KiB),
and let perf userspace construct a stacktrace from that (for --callgraph=dwarf).
I'd expect that copying from userspace is a lot faster than copying
from a guest, because for a guest you'd also need to map
the page first, which would be an additional cost (and you'd have to
be careful not to infinitely recurse if you get another interrupt
while mapping), unless you keep the entire guest address space mapped,
or have a cache of mapped stack pages.
You can let a guest profile itself though, in which case it can
process its own stacktrace, but exposing Xen's stacktrace to untrusted
guests is probably not a good idea.
You could try to also do what I've done with Xen here: have the guest
provide the stacktrace to the hypervisor, which provides it to Dom0.
But then you'd need to run some code inside the guest, and that may
not be possible if you are currently handling something on behalf of
the guest in Xen.
I'd first wait to see whether KVM implements this, and then implement
something similar for Xen. AFAICT KVM doesn't support this either.
>
> > * the Linux patch will need to be forwarded ported to master before submission
> > * All the caveats for using regular VPMU apply, except for the lack of
> > stacktraces, that is fixed here!
> What would be needed to fix these limitations?
See below for my answers to each one, although others on this mailing
list might be able to provide a more correct answer.
> > * Dom0 must run hard pinned on all host CPUs
Not sure. I think Dom0 needs to be able to run some code whenever the
NMI arrives, and that needs to run on the CPU it arrived on, unless
you define a way for one CPU to also receive and process interrupts
for CPUs that Dom0 doesn't run on.
The pinning requirement could be lifted if everything is correctly
context switched
> > * Watchdog must be disabled
IIUC the Xen watchdog and the profiling interrupt both use NMIs, so
you can only have one of them active.
In fact even with bare metal Linux the NMI watchdog sometimes needs to
be disabled for certain perf counters to work, although basic timer
based profiling and most counters work with NMI enabled. If needed
'perf' prints a message to disable the Linux NMI watchdog, but if you
follow those instructions literally the host will panic and reboot 20s
later because the soft lockup detector won't work anymore (so that too
would need to be disabled).
> > * not security supported
See https://xenbits.xen.org/xsa/advisory-163.html
Also even if you ignore security support, using vPMU on production
systems currently is probably not a good idea, there are probably lots
of pre-existing bugs to fix, and the bugs might be micro-architecture
specific.
E.g. with vPMU enabled running 'perf stat -ddd' in Dom0 caused one of
my (older) hosts to completely freeze (all vCPUs except one stuck in a
spinlock, and the last one not running anywhere), whereas it ran
perfectly fine on other (newer) hosts. I haven't debugged yet what is
causing it (could also be a bug in Linux, or the Linux Xen PMU driver
and not Xen).
There is a way to restrict what performance counters are exposed to
guests, and e.g. I think EC2 used to expose some of these to guests.
Initially temperatures/turbo boost could be measured from guests, but
that got disabled following an XSA:
https://www.brendangregg.com/blog/2014-09-15/the-msrs-of-ec2.html.
Later a restricted set of PMCs got exposed (vpmu=ipc, or vpmu=arch),
which then got enabled for EC2 guests (don't know whether they still
expose these): https://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2.html
If that is enabled, the stacktrace is already suitably restricted to
Dom0-only, so should be safe to use, i.e. even if you can't use
`vpmu=on`, you might be able to use `vpmu=ipc`.
Currently neither of these is security supported though.
> > * x86 only
This one should be fixable, all it needs is a way to do a stacktrace,
which should already be present in the arch-specific traps.c (although
AFAICT only X86 and ARM implement stack traces currently), although
that of course assumes that other arches would have a PMU
implementation to begin with.
AFAICT xenpmu_op is only implemented on x86:
```
#ifdef CONFIG_X86
xenpmu_op(unsigned int op, xen_pmu_params_t *arg)
#endif
```
> > * secureboot needs to be disabled
>
This is because to enable vpmu you need to modify the Xen cmdline, and
that is restricted under secure boot.
If you enable vpmu at build time then it might work, but see above
about no security support.
> With them it isn't really
> possible to do profiling on production systems, only on dedicated development
> boxes.
I'd like to be able to do profiling on production too. But I'm taking
it one step at a time, at least now I'll have a way to do profiling on
development/test boxes.
For production use a different approach might be needed, e.g. LBR, or
a dedicated way to get just a hypervisor stacktrace on a timer,
without involving the (v)PMU at all.
That would require some new integration with `perf` too.
> That works great if you have a dev box and can create a realistic
> workload with non-sensitive data, but less great if you have a problem that
> you can't reproduce on a non-production system. It's also not usable
> for real-time monitoring of production environments.
Best regards,
--Edwin
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
^ permalink raw reply [flat|nested] 30+ messages in thread